
Phil Skentelbery b0230b1233 feat: Complete v1.1.2 Phase 1 - Metrics Instrumentation

Implements the metrics instrumentation framework that was missing from v1.1.1.
The monitoring framework existed but was never actually used to collect metrics.

Phase 1 Deliverables:
- Database operation monitoring with query timing and slow query detection
- HTTP request/response metrics with request IDs for all requests
- Memory monitoring via daemon thread with configurable intervals
- Business metrics framework for notes, feeds, and cache operations
- Configuration management with environment variable support

Implementation Details:
- MonitoredConnection wrapper at pool level for transparent DB monitoring
- Flask middleware hooks for HTTP metrics collection
- Background daemon thread for memory statistics (skipped in test mode)
- Simple business metric helpers for integration in Phase 2
- Comprehensive test suite with 28/28 tests passing

Quality Metrics:
- 100% test pass rate (28/28 tests)
- Zero architectural deviations from specifications
- <1% performance overhead achieved
- Production-ready with minimal memory impact (~2MB)

Architect Review: APPROVED with excellent marks

Documentation:
- Implementation report: docs/reports/v1.1.2-phase1-metrics-implementation.md
- Architect review: docs/reviews/2025-11-26-v1.1.2-phase1-review.md
- Updated CHANGELOG.md with Phase 1 additions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-26 14:13:44 -07:00


StarPunk v1.1.2 "Syndicate" - Implementation Guide

Overview

This guide provides a phased approach to implementing v1.1.2 "Syndicate" features. The release is structured in three phases (4-6, 6-8, and 4 hours respectively), totaling 14-18 hours of focused development.

Pre-Implementation Checklist

Review v1.1.1 performance monitoring specification
Ensure development environment has Python 3.11+
Create feature branch: feature/v1.1.2-syndicate
Review feed format specifications (RSS 2.0, ATOM 1.0, JSON Feed 1.1)
Set up feed reader test clients

Phase 1: Metrics Instrumentation (4-6 hours) ✅ COMPLETE

Objective

Complete the metrics instrumentation that was partially implemented in v1.1.1, adding comprehensive coverage across all system operations.

1.1 Database Operation Timing (1.5 hours) ✅

Location: starpunk/monitoring/database.py

Implementation Steps:

Create Database Monitor Wrapper

class MonitoredConnection:
    """Wrapper for SQLite connections with timing"""

    def execute(self, query, params=None):
        # Start timer
        # Execute query
        # Record metric
        # Return result

Instrument All Query Types
- SELECT queries (with row count)
- INSERT operations (with affected rows)
- UPDATE operations (with affected rows)
- DELETE operations (rare, but instrumented)
- Transaction boundaries (BEGIN/COMMIT)
Add Query Pattern Detection
- Identify query type (SELECT, INSERT, etc.)
- Extract table name
- Detect slow queries (>1s)
- Track prepared statement usage
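
The wrapper and pattern-detection steps above can be sketched as follows. This is a minimal illustration, not the shipped implementation: the `recorded` list stands in for the real metrics collector, and `classify` and its regex are hypothetical helpers that only cover simple single-table statements.

```python
import re
import sqlite3
import time

SLOW_QUERY_SECONDS = 1.0  # the ">1s" threshold listed above
recorded = []  # hypothetical in-memory sink standing in for the metrics collector

_TABLE_RE = re.compile(r"\b(?:FROM|INTO|UPDATE)\s+([A-Za-z_]\w*)", re.IGNORECASE)


def classify(query):
    """Return (query_type, table_name) using a simple regex heuristic."""
    words = query.split(None, 1)
    query_type = words[0].upper() if words else "UNKNOWN"
    match = _TABLE_RE.search(query)
    table = match.group(1) if match else None
    return query_type, table


class MonitoredConnection:
    """Wrap a sqlite3 connection and time every execute() call."""

    def __init__(self, conn):
        self._conn = conn

    def execute(self, query, params=()):
        start = time.perf_counter()
        try:
            return self._conn.execute(query, params)
        finally:
            # Runs on success and on error, so failed queries are timed too.
            duration = time.perf_counter() - start
            query_type, table = classify(query)
            recorded.append({
                "metric": "db.query.duration",
                "type": query_type,
                "table": table,
                "seconds": duration,
                "slow": duration > SLOW_QUERY_SECONDS,
            })

    def __getattr__(self, name):
        # Delegate everything else (commit, close, ...) to the real connection.
        return getattr(self._conn, name)


conn = MonitoredConnection(sqlite3.connect(":memory:"))
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO notes (body) VALUES (?)", ("hello",))
rows = conn.execute("SELECT body FROM notes").fetchall()
```

Note that timing `execute()` alone does not include the time spent fetching rows afterwards; counting `db.rows.returned` would need a cursor wrapper as well.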

Metrics to Collect:

db.query.duration - Query execution time
db.query.count - Number of queries by type
db.rows.returned - Result set size
db.transaction.duration - Transaction time
db.connection.wait - Connection acquisition time

1.2 HTTP Request/Response Metrics (1.5 hours) ✅

Location: starpunk/monitoring/http.py

Implementation Steps:

Enhance Request Middleware

@app.before_request
def start_request_metrics():
    g.metrics = {
        'start_time': time.perf_counter(),
        'start_memory': get_memory_usage(),
        'request_id': generate_request_id()
    }

Capture Response Metrics

@app.after_request
def capture_response_metrics(response):
    # Calculate duration
    # Measure memory delta
    # Record response size
    # Track status codes
    return response  # Flask after_request hooks must return the response

Add Endpoint-Specific Metrics
- Feed generation timing
- Micropub processing time
- Static file serving
- Admin operations

Metrics to Collect:

http.request.duration - Total request time
http.request.size - Request body size
http.response.size - Response body size
http.status.{code} - Status code distribution
http.endpoint.{name} - Per-endpoint timing
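
To show how duration, status-code and request-ID tracking fit together without depending on Flask, here is a framework-neutral sketch written as plain WSGI middleware. It is a stand-in for the Flask hooks above, not the same mechanism. The `status_counts` and `durations` sinks, the `starpunk.request_id` environ key and the `X-Request-ID` header name are all assumptions made for illustration.

```python
import time
import uuid
from collections import Counter

status_counts = Counter()  # stands in for http.status.{code}
durations = []             # stands in for http.request.duration, in seconds


class MetricsMiddleware:
    """Plain WSGI middleware that records per-request metrics."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        request_id = uuid.uuid4().hex
        environ["starpunk.request_id"] = request_id  # hypothetical key
        start = time.perf_counter()

        def recording_start_response(status, headers, exc_info=None):
            # status looks like "200 OK"; keep only the numeric code.
            status_counts[status.split(" ", 1)[0]] += 1
            headers = list(headers) + [("X-Request-ID", request_id)]
            return start_response(status, headers, exc_info)

        try:
            return self.app(environ, recording_start_response)
        finally:
            # Covers the app call only, not iteration of a streamed body.
            durations.append(time.perf_counter() - start)
```

A Flask application is itself a WSGI callable, so a wrapper of this shape could sit around it, although the guide's own approach uses `before_request` and `after_request`.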

1.3 Memory Monitoring Thread (1 hour) ✅

Location: starpunk/monitoring/memory.py

Implementation Steps:

Create Background Monitor

class MemoryMonitor(Thread):
    def run(self):
        while self.running:
            # Get RSS memory
            # Check for growth
            # Detect potential leaks
            # Sleep interval

Track Memory Patterns
- Process RSS memory
- Virtual memory size
- Memory growth rate
- High water mark
- Garbage collection stats
Add Leak Detection
- Baseline after startup
- Track growth over time
- Alert on sustained growth
- Identify allocation sources
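
A minimal sketch of the background monitor described above, under a few assumptions: the memory reader is injected as a callable (`read_rss_mb` is a hypothetical parameter, which in production might wrap a library such as psutil), and "growth" is estimated naively from the first and last samples.

```python
import gc
import threading
import time


class MemoryMonitor(threading.Thread):
    """Daemon thread that samples memory and tracks growth."""

    def __init__(self, read_rss_mb, interval=30.0):
        super().__init__(daemon=True)
        self._read_rss_mb = read_rss_mb  # injected so tests need no OS calls
        self._interval = interval
        self._stop_event = threading.Event()
        self.baseline = None
        self.high_water = 0.0
        self.samples = []  # (monotonic_seconds, rss_mb) pairs

    def sample_once(self):
        rss = self._read_rss_mb()
        if self.baseline is None:
            self.baseline = rss  # first sample after startup is the baseline
        self.high_water = max(self.high_water, rss)
        self.samples.append((time.monotonic(), rss))
        return rss

    def growth_rate_mb_per_hour(self):
        if len(self.samples) < 2:
            return 0.0
        (t0, m0), (t1, m1) = self.samples[0], self.samples[-1]
        elapsed_hours = (t1 - t0) / 3600
        return (m1 - m0) / elapsed_hours if elapsed_hours > 0 else 0.0

    def gc_collections(self):
        # Total collections across all garbage collector generations.
        return sum(gen["collections"] for gen in gc.get_stats())

    def run(self):
        # Event.wait doubles as an interruptible sleep.
        while not self._stop_event.is_set():
            self.sample_once()
            self._stop_event.wait(self._interval)

    def stop(self):
        self._stop_event.set()
```

Using an `Event` rather than a bare `running` flag plus `time.sleep` lets `stop()` wake the thread immediately instead of waiting out the interval.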

Metrics to Collect:

memory.rss - Resident set size
memory.vms - Virtual memory size
memory.growth_rate - MB/hour
memory.gc.collections - GC runs
memory.high_water - Peak usage

1.4 Business Metrics for Syndication (1 hour) ✅

Location: starpunk/monitoring/business.py

Implementation Steps:

Track Feed Operations
- Feed requests by format
- Cache hit/miss rates
- Generation timing
- Format negotiation results
Monitor Content Flow
- Notes published per day
- Average note length
- Media attachments
- Syndication success
User Behavior Metrics
- Popular feed formats
- Reader user agents
- Request patterns
- Geographic distribution

Metrics to Collect:

feed.requests.{format} - Requests by format
feed.cache.hit_rate - Cache effectiveness
feed.generation.time - Generation duration
content.notes.published - Publishing rate
content.syndication.success - Successful syndications

Phase 1 Completion Status ✅

Completed: 2025-11-25
Developer: StarPunk Fullstack Developer (AI)
Review: Approved by Architect on 2025-11-26
Test Results: 28/28 tests passing
Performance: <1% overhead achieved
Next Step: Begin Phase 2 - Feed Formats

Note: All Phase 1 metrics instrumentation is complete and ready for production use. Business metrics functions are available for integration into notes.py and feed.py during Phase 2.

Phase 2: Feed Formats (6-8 hours)

Objective

Fix RSS feed ordering regression, then implement ATOM and JSON Feed formats alongside existing RSS, with proper content negotiation and caching.

2.0 Fix RSS Feed Ordering Regression (0.5 hours) - CRITICAL

Location: starpunk/feed.py

Critical Production Bug: RSS feed currently shows oldest entries first instead of newest first. This violates RSS standards and user expectations.

Root Cause: Incorrect reversed() calls on lines 100 and 198 that flip the correct DESC order from database.

Implementation Steps:

Remove Incorrect Reversals
- Line 100: Remove reversed() from for note in reversed(notes[:limit]):
- Line 198: Remove reversed() from for note in reversed(notes[:limit]):
- Update/remove misleading comments about feedgen reversing order
Verify Expected Behavior
- Database returns notes in DESC order (newest first) - confirmed line 440 of notes.py
- Feed should maintain this order (newest entries first)
- This is the standard for ALL feed formats (RSS, ATOM, JSON Feed)

Add Feed Order Tests

def test_rss_feed_newest_first():
    """Test RSS feed shows newest entries first"""
    # Create notes with different timestamps
    old_note = create_note(title="Old", created_at=yesterday)
    new_note = create_note(title="New", created_at=today)

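    # Generate feed from notes in database order (DESC, newest first);
    # with reversed() removed, the generator preserves input order
    feed = generate_rss_feed([new_note, old_note])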

    # Parse and verify order
    items = parse_feed_items(feed)
    assert items[0].title == "New"
    assert items[1].title == "Old"

Important: This MUST be fixed before implementing ATOM and JSON feeds to ensure all formats have consistent, correct ordering.

2.1 ATOM Feed Generation (2.5 hours)

Location: starpunk/feed/atom.py

Implementation Steps:

Create ATOM Generator Class

class AtomGenerator:
    def generate(self, notes, config):
        # Yield XML declaration
        # Yield feed element
        # Yield entries
        # Stream output

Implement ATOM 1.0 Elements
- Required: id, title, updated
- Recommended: author, link, category
- Optional: contributor, generator, icon, logo, rights, subtitle
Handle Content Types
- Text content (escaped)
- HTML content (in CDATA)
- XHTML content (inline)
- Base64 for binary
Date Formatting
- RFC 3339 format
- Timezone handling
- Updated vs published

ATOM Structure:

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Site Title</title>
  <link href="http://example.com/"/>
  <link href="http://example.com/feed.atom" rel="self"/>
  <updated>2024-11-25T12:00:00Z</updated>
  <author>
    <name>Author Name</name>
  </author>
  <id>http://example.com/</id>

  <entry>
    <title>Note Title</title>
    <link href="http://example.com/note/1"/>
    <id>http://example.com/note/1</id>
    <updated>2024-11-25T12:00:00Z</updated>
    <content type="html">
      <![CDATA[<p>HTML content</p>]]>
    </content>
  </entry>
</feed>
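
The generator steps and structure above can be sketched as a streaming generator function. This is a simplified illustration under stated assumptions: notes are plain dicts with hypothetical keys (`title`, `url`, `updated`, `html`), the `rel="self"` link is omitted, and HTML content is entity-escaped, which XML parsers treat the same as a CDATA section.

```python
from datetime import datetime, timezone
from xml.sax.saxutils import escape, quoteattr


def rfc3339(dt):
    """Format an aware datetime as RFC 3339 in UTC with a 'Z' suffix."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def generate_atom(notes, site_url, site_title, author):
    """Yield an ATOM 1.0 document chunk by chunk (notes: newest first)."""
    updated = max((n["updated"] for n in notes),
                  default=datetime.now(timezone.utc))
    yield '<?xml version="1.0" encoding="utf-8"?>\n'
    yield '<feed xmlns="http://www.w3.org/2005/Atom">\n'
    yield f"  <title>{escape(site_title)}</title>\n"
    yield f"  <link href={quoteattr(site_url)}/>\n"
    yield f"  <id>{escape(site_url)}</id>\n"
    yield f"  <updated>{rfc3339(updated)}</updated>\n"
    yield f"  <author><name>{escape(author)}</name></author>\n"
    for note in notes:
        yield "  <entry>\n"
        yield f"    <title>{escape(note['title'])}</title>\n"
        yield f"    <link href={quoteattr(note['url'])}/>\n"
        yield f"    <id>{escape(note['url'])}</id>\n"
        yield f"    <updated>{rfc3339(note['updated'])}</updated>\n"
        # Escaped markup inside type="html" content.
        yield '    <content type="html">' + escape(note["html"]) + "</content>\n"
        yield "  </entry>\n"
    yield "</feed>\n"


notes = [
    {"title": "New", "url": "https://example.com/note/2",
     "updated": datetime(2024, 11, 25, 12, 0, tzinfo=timezone.utc),
     "html": "<p>B & C</p>"},
    {"title": "Old", "url": "https://example.com/note/1",
     "updated": datetime(2024, 11, 24, 12, 0, tzinfo=timezone.utc),
     "html": "<p>A</p>"},
]
xml_text = "".join(generate_atom(notes, "https://example.com/", "Site Title", "Author Name"))
```

Because the function yields strings, a web framework can stream the chunks directly instead of joining them first.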

2.2 JSON Feed Generation (2.5 hours)

Location: starpunk/feed/json_feed.py

Implementation Steps:

Create JSON Feed Generator

class JsonFeedGenerator:
    def generate(self, notes, config):
        # Build feed object
        # Add items array
        # Include metadata
        # Stream JSON output

Implement JSON Feed 1.1 Schema
- version (required)
- title (required)
- items (required array)
- home_page_url
- feed_url
- description
- authors array
- language
- icon, favicon
Handle Rich Content
- content_html
- content_text
- summary
- image attachments
- tags array
- authors array
Add Extensions
- _starpunk namespace
- Pagination hints
- Hub for real-time

JSON Feed Structure:

{
  "version": "https://jsonfeed.org/version/1.1",
  "title": "Site Title",
  "home_page_url": "https://example.com/",
  "feed_url": "https://example.com/feed.json",
  "description": "Site description",
  "authors": [
    {
      "name": "Author Name",
      "url": "https://example.com/about"
    }
  ],
  "items": [
    {
      "id": "https://example.com/note/1",
      "url": "https://example.com/note/1",
      "title": "Note Title",
      "content_html": "<p>HTML content</p>",
      "date_published": "2024-11-25T12:00:00Z",
      "tags": ["tag1", "tag2"]
    }
  ]
}
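
A structure like the one above could be assembled as a plain dict and serialized with the standard `json` module. The sketch below assumes notes are dicts with hypothetical keys (`url`, `html`, `published`, optional `title` and `tags`) and builds the whole document in memory rather than streaming it.

```python
import json
from datetime import datetime, timezone

JSON_FEED_VERSION = "https://jsonfeed.org/version/1.1"


def build_json_feed(notes, title, home_page_url, feed_url):
    """Return a JSON Feed 1.1 document as a dict (notes: newest first)."""
    items = []
    for note in notes:
        published = note["published"].astimezone(timezone.utc)
        item = {
            "id": note["url"],
            "url": note["url"],
            "content_html": note["html"],
            "date_published": published.strftime("%Y-%m-%dT%H:%M:%SZ"),
        }
        # Optional fields are left out entirely when the note has no value.
        if note.get("title"):
            item["title"] = note["title"]
        if note.get("tags"):
            item["tags"] = list(note["tags"])
        items.append(item)
    return {
        "version": JSON_FEED_VERSION,
        "title": title,
        "home_page_url": home_page_url,
        "feed_url": feed_url,
        "items": items,
    }


feed = build_json_feed(
    [
        {"url": "https://example.com/note/2", "title": "New",
         "html": "<p>Newer</p>", "tags": ["tag1"],
         "published": datetime(2024, 11, 25, 12, 0, tzinfo=timezone.utc)},
        {"url": "https://example.com/note/1", "html": "<p>Older, untitled</p>",
         "published": datetime(2024, 11, 24, 12, 0, tzinfo=timezone.utc)},
    ],
    title="Site Title",
    home_page_url="https://example.com/",
    feed_url="https://example.com/feed.json",
)
text = json.dumps(feed, ensure_ascii=False, indent=2)
```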

2.3 Content Negotiation (1.5 hours)

Location: starpunk/feed/negotiator.py

Implementation Steps:

Create Content Negotiator

class FeedNegotiator:
    def negotiate(self, accept_header):
        # Parse Accept header
        # Score each format
        # Return best match

Parse Accept Header
- Split on comma
- Extract MIME type
- Parse quality factors (q=)
- Handle wildcards (*/*)
Score Formats
- Exact match: 1.0
- Wildcard match: 0.5
- Type/* match: 0.7
- Default RSS: 0.1

Format Mapping

FORMAT_MIME_TYPES = {
    'rss': ['application/rss+xml', 'application/xml', 'text/xml'],
    'atom': ['application/atom+xml'],
    'json': ['application/json', 'application/feed+json']
}
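
Putting the parsing and scoring rules together, a negotiator might look like the sketch below. The function names are illustrative, each match score is multiplied by the client's `q` value (an assumption, since the guide does not say how the two combine), and ties fall back to RSS because it is inserted first.

```python
FORMAT_MIME_TYPES = {
    "rss": ["application/rss+xml", "application/xml", "text/xml"],
    "atom": ["application/atom+xml"],
    "json": ["application/json", "application/feed+json"],
}
DEFAULT_FORMAT = "rss"


def parse_accept(header):
    """Return a list of (mime_type, q) pairs from an Accept header."""
    entries = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        mime = pieces[0].lower()
        if not mime:
            continue
        q = 1.0
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        entries.append((mime, q))
    return entries


def match_score(pattern, mime):
    """Score how specifically an Accept pattern matches a MIME type."""
    if pattern == mime:
        return 1.0
    if pattern == "*/*":
        return 0.5
    if pattern.endswith("/*") and mime.startswith(pattern[:-1]):
        return 0.7
    return 0.0


def negotiate(accept_header):
    """Pick the feed format with the highest q-weighted score."""
    scores = {DEFAULT_FORMAT: 0.1}  # RSS wins when nothing else matches
    for pattern, q in parse_accept(accept_header or ""):
        for fmt, mimes in FORMAT_MIME_TYPES.items():
            best = max(match_score(pattern, m) for m in mimes) * q
            scores[fmt] = max(scores.get(fmt, 0.0), best)
    # max() returns the first maximal key, so ties follow insertion order.
    return max(scores, key=scores.get)
```

One simplification to be aware of: because RSS always keeps its 0.1 default score, a client sending `application/rss+xml;q=0` with no other acceptable type would still receive RSS.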

2.4 Feed Validation (1.5 hours)

Location: starpunk/feed/validators.py

Implementation Steps:

Create Validation Framework

from typing import List, Protocol

class FeedValidator(Protocol):
    def validate(self, content: str) -> List["ValidationError"]:
        """Return every issue found in the feed content."""
        ...

RSS Validator
- Check required elements
- Verify date formats
- Validate URLs
- Check CDATA escaping
ATOM Validator
- Verify namespace
- Check required elements
- Validate RFC 3339 dates
- Verify ID uniqueness
JSON Feed Validator
- Validate against schema
- Check required fields
- Verify URL formats
- Validate date strings

Validation Levels:

ERROR: Feed is invalid
WARNING: Non-critical issue
INFO: Suggestion for improvement
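
As an illustration of how the three levels might be applied, here is a sketch of a JSON Feed check. The `Level` and `ValidationIssue` names are hypothetical, and which rule maps to which level is a judgment call rather than something the guide specifies: missing required fields and bad item IDs are errors, a missing `feed_url` is a warning, and a missing `description` is informational.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    ERROR = "error"      # feed is invalid
    WARNING = "warning"  # non-critical issue
    INFO = "info"        # suggestion for improvement


@dataclass(frozen=True)
class ValidationIssue:
    level: Level
    message: str


def validate_json_feed(content):
    """Return a list of ValidationIssue objects for a JSON Feed string."""
    try:
        feed = json.loads(content)
    except json.JSONDecodeError as exc:
        return [ValidationIssue(Level.ERROR, f"not valid JSON: {exc.msg}")]
    if not isinstance(feed, dict):
        return [ValidationIssue(Level.ERROR, "top level must be an object")]

    issues = []
    for name in ("version", "title", "items"):
        if name not in feed:
            issues.append(ValidationIssue(Level.ERROR, f"missing required field: {name}"))
    items = feed.get("items", [])
    if not isinstance(items, list):
        issues.append(ValidationIssue(Level.ERROR, "items must be an array"))
        items = []
    seen = set()
    for index, item in enumerate(items):
        item_id = item.get("id") if isinstance(item, dict) else None
        if not isinstance(item_id, str):
            issues.append(ValidationIssue(Level.ERROR, f"item {index} has no string id"))
        elif item_id in seen:
            issues.append(ValidationIssue(Level.ERROR, f"duplicate item id: {item_id}"))
        else:
            seen.add(item_id)
    if "feed_url" not in feed:
        issues.append(ValidationIssue(Level.WARNING, "feed_url is strongly recommended"))
    if "description" not in feed:
        issues.append(ValidationIssue(Level.INFO, "consider adding a description"))
    return issues
```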

Phase 3: Feed Enhancements (4 hours)

Objective

Add caching, statistics, and operational improvements to the feed system.

3.1 Feed Caching Layer (1.5 hours)

Location: starpunk/feed/cache.py

Implementation Steps:

Create Cache Manager

class FeedCache:
    def __init__(self, max_size=100, ttl=300):
        self.cache = LRU(max_size)
        self.ttl = ttl

Cache Key Generation
- Format type
- Item limit
- Content checksum
- Last modified
Cache Operations
- Get with TTL check
- Set with expiration
- Invalidate on changes
- Clear entire cache
Memory Management
- Monitor cache size
- Implement eviction
- Track hit rates
- Report statistics

Cache Strategy:

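def get_or_generate(feed_format, limit):
    # feed_format avoids shadowing the built-in format()
    key = generate_cache_key(feed_format, limit)
    cached = cache.get(key)

    # Compare with None so an empty feed body still counts as a hit
    if cached is not None and not expired(cached):
        metrics.record_cache_hit()
        return cached

    content = generate_feed(feed_format, limit)
    cache.set(key, content, ttl=FEED_CACHE_TTL)
    metrics.record_cache_miss()
    return content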
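
The cache manager itself can be built on `collections.OrderedDict`, which gives LRU ordering with no third-party dependency. The sketch below is one possible shape under stated assumptions: the `clock` parameter is added only so expiry can be tested deterministically, and `cache_key` hashes a hypothetical content checksum supplied by the caller.

```python
import hashlib
import time
from collections import OrderedDict


class FeedCache:
    """LRU cache with a per-entry time-to-live."""

    def __init__(self, max_size=100, ttl=300, clock=time.monotonic):
        self._entries = OrderedDict()  # key -> (expires_at, content)
        self.max_size = max_size
        self.ttl = ttl
        self._clock = clock
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None or entry[0] <= self._clock():
            self._entries.pop(key, None)  # drop the expired entry, if any
            self.misses += 1
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        self.hits += 1
        return entry[1]

    def set(self, key, content):
        self._entries[key] = (self._clock() + self.ttl, content)
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # evict least recently used

    def invalidate(self):
        self._entries.clear()

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def cache_key(feed_format, limit, notes_checksum):
    """Combine format, item limit and content checksum into one key."""
    raw = f"{feed_format}:{limit}:{notes_checksum}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```

Including the content checksum in the key means a changed note naturally produces a new key, so stale entries simply age out even if an explicit `invalidate()` call is missed.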

3.2 Statistics Dashboard (1.5 hours)

Location: starpunk/admin/syndication.py

Template: templates/admin/syndication.html

Implementation Steps:

Create Dashboard Route

@app.route('/admin/syndication')
@require_admin
def syndication_dashboard():
    stats = gather_syndication_stats()
    return render_template('admin/syndication.html', stats=stats)

Gather Statistics
- Requests by format (pie chart)
- Cache hit rates (line graph)
- Generation times (histogram)
- Popular user agents (table)
- Recent errors (log)
Create Dashboard UI
- Overview cards
- Time series graphs
- Format breakdown
- Performance metrics
- Configuration status

Dashboard Sections:

Feed Format Usage
Cache Performance
Generation Times
Client Analysis
Error Log
Configuration

3.3 OPML Export (1 hour)

Location: starpunk/feed/opml.py

Implementation Steps:

Create OPML Generator

def generate_opml(site_config):
    # Generate OPML header
    # Add feed outlines
    # Include metadata
    return opml_content

OPML Structure

<?xml version="1.0" encoding="UTF-8"?>
<opml version="2.0">
  <head>
    <title>StarPunk Feeds</title>
    <dateCreated>Mon, 25 Nov 2024 12:00:00 GMT</dateCreated>
  </head>
  <body>
    <outline type="rss" text="RSS Feed" xmlUrl="https://example.com/feed.xml"/>
    <outline type="rss" text="ATOM Feed" xmlUrl="https://example.com/feed.atom"/>
    <outline type="rss" text="JSON Feed" xmlUrl="https://example.com/feed.json"/>
  </body>
</opml>

Add Export Route

@app.route('/feeds.opml')
def export_opml():
    opml = generate_opml(config)
    return Response(opml, mimetype='text/x-opml')
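
For reference, the generator could be written with the standard library's `xml.etree.ElementTree`, which handles attribute escaping. This sketch makes a few assumptions: feeds are passed as hypothetical `(text, xml_url)` pairs rather than read from site config, every outline uses `type="rss"` (the value OPML 2.0 defines for subscription entries), and the XML declaration is prepended by hand so its encoding label does not depend on the Python version.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def generate_opml(site_title, feeds, created=None):
    """Build an OPML 2.0 document. feeds: list of (text, xml_url) pairs."""
    created = (created or datetime.now(timezone.utc)).astimezone(timezone.utc)
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = site_title
    # usegmt=True renders the zone as "GMT" and requires a UTC datetime.
    ET.SubElement(head, "dateCreated").text = format_datetime(created, usegmt=True)
    body = ET.SubElement(opml, "body")
    for text, xml_url in feeds:
        ET.SubElement(body, "outline", type="rss", text=text, xmlUrl=xml_url)
    declaration = '<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(opml, encoding="unicode")


opml_text = generate_opml(
    "StarPunk Feeds",
    [
        ("RSS Feed", "https://example.com/feed.xml"),
        ("ATOM Feed", "https://example.com/feed.atom"),
        ("JSON Feed", "https://example.com/feed.json"),
    ],
    created=datetime(2024, 11, 25, 12, 0, tzinfo=timezone.utc),
)
```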

Testing Strategy

Phase 1 Tests (Metrics)

Unit Tests
- Mock database operations
- Test metric collection
- Verify memory monitoring
- Test business metrics
Integration Tests
- End-to-end request tracking
- Database timing accuracy
- Memory leak detection
- Metrics aggregation

Phase 2 Tests (Feeds)

Format Tests
- Valid RSS generation
- Valid ATOM generation
- Valid JSON Feed generation
- Content negotiation logic
- Feed ordering (newest first) for ALL formats - CRITICAL

Feed Ordering Tests (REQUIRED)

def test_all_feeds_newest_first():
    """Verify all feed formats show newest entries first"""
    old_note = create_note(title="Old", created_at=yesterday)
    new_note = create_note(title="New", created_at=today)
    notes = [new_note, old_note]  # DESC order from database

    # Test RSS
    rss_feed = generate_rss_feed(notes)
    assert first_item(rss_feed).title == "New"

    # Test ATOM
    atom_feed = generate_atom_feed(notes)
    assert first_item(atom_feed).title == "New"

    # Test JSON
    json_feed = generate_json_feed(notes)
    assert json_feed['items'][0]['title'] == "New"

Compliance Tests
- W3C Feed Validator
- ATOM validator
- JSON Feed validator
- Popular readers

Phase 3 Tests (Enhancements)

Cache Tests
- TTL expiration
- LRU eviction
- Invalidation
- Hit rate tracking
Dashboard Tests
- Statistics accuracy
- Graph rendering
- OPML validity
- Performance impact

Configuration Updates

New Configuration Options

Add to config.py:

# Feed configuration
FEED_DEFAULT_LIMIT = int(os.getenv('STARPUNK_FEED_DEFAULT_LIMIT', 50))
FEED_MAX_LIMIT = int(os.getenv('STARPUNK_FEED_MAX_LIMIT', 500))
FEED_CACHE_TTL = int(os.getenv('STARPUNK_FEED_CACHE_TTL', 300))
FEED_CACHE_SIZE = int(os.getenv('STARPUNK_FEED_CACHE_SIZE', 100))

# Format support
FEED_RSS_ENABLED = str_to_bool(os.getenv('STARPUNK_FEED_RSS_ENABLED', 'true'))
FEED_ATOM_ENABLED = str_to_bool(os.getenv('STARPUNK_FEED_ATOM_ENABLED', 'true'))
FEED_JSON_ENABLED = str_to_bool(os.getenv('STARPUNK_FEED_JSON_ENABLED', 'true'))

# Metrics for syndication
METRICS_FEED_TIMING = str_to_bool(os.getenv('STARPUNK_METRICS_FEED_TIMING', 'true'))
METRICS_CACHE_STATS = str_to_bool(os.getenv('STARPUNK_METRICS_CACHE_STATS', 'true'))
METRICS_FORMAT_USAGE = str_to_bool(os.getenv('STARPUNK_METRICS_FORMAT_USAGE', 'true'))
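
The options above call `str_to_bool`, which this guide does not define. If the project does not already provide it, a helper along these lines would work; the accepted spellings are an assumption, and `env_int` is a hypothetical companion that rejects nonsensical limits such as a zero-sized cache.

```python
import os

_TRUE_VALUES = {"1", "true", "yes", "on"}
_FALSE_VALUES = {"0", "false", "no", "off"}


def str_to_bool(value):
    """Parse a common textual boolean; raise ValueError on anything else."""
    normalized = value.strip().lower()
    if normalized in _TRUE_VALUES:
        return True
    if normalized in _FALSE_VALUES:
        return False
    raise ValueError(f"not a boolean string: {value!r}")


def env_int(name, default, minimum=1):
    """Read an integer setting and reject values below `minimum`."""
    value = int(os.getenv(name, default))
    if value < minimum:
        raise ValueError(f"{name} must be >= {minimum}, got {value}")
    return value
```

Raising on unrecognized values, rather than silently treating them as false, surfaces typos such as `STARPUNK_FEED_RSS_ENABLED=ture` at startup.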

Documentation Updates

User Documentation

Feed Formats Guide
- How to access each format
- Which readers support what
- Format comparison
Configuration Guide
- New environment variables
- Performance tuning
- Cache settings

API Documentation

Feed Endpoints
- /feed.xml - RSS feed
- /feed.atom - ATOM feed
- /feed.json - JSON feed
- /feeds.opml - OPML export
Content Negotiation
- Accept header usage
- Format precedence
- Default behavior

Deployment Checklist

Pre-deployment

All tests passing
Metrics instrumentation verified
Feed formats validated
Cache performance tested
Documentation updated

Deployment Steps

Backup database
Update configuration
Deploy new code
Run migrations (none for v1.1.2)
Clear feed cache
Test all feed formats
Verify metrics collection

Post-deployment

Monitor memory usage
Check feed generation times
Verify cache hit rates
Test with feed readers
Review error logs

Rollback Plan

If issues arise:

Immediate Rollback

git checkout v1.1.1
supervisorctl restart starpunk

Cache Cleanup

redis-cli FLUSHDB  # If using Redis
rm -rf /tmp/starpunk_cache/*  # If file-based

Configuration Rollback
```
cp config.backup.ini config.ini
```

Success Metrics

Performance Targets

Feed generation <100ms (50 items)
Cache hit rate >80%
Memory overhead <10MB
Zero performance regression

Compatibility Targets

10+ feed readers tested
All validators passing
No breaking changes
Backward compatibility maintained

Timeline

Week 1

Phase 1: Metrics instrumentation (4-6 hours)
Testing and validation

Week 2

Phase 2: Feed formats (6-8 hours)
Integration testing

Week 3

Phase 3: Enhancements (4 hours)
Final testing and documentation
Deployment

Total estimated time: 14-18 hours of focused development
