StarPunk v1.1.2 "Syndicate" - Architecture Overview

Executive Summary

Version 1.1.2 "Syndicate" enhances StarPunk's content distribution capabilities by completing the metrics instrumentation from v1.1.1 and adding comprehensive feed format support. This release focuses on making content accessible to the widest possible audience through multiple syndication formats while maintaining visibility into system performance.

Architecture Goals

  1. Complete Observability: Fully instrument all system operations for performance monitoring
  2. Multi-Format Syndication: Support RSS, ATOM, and JSON Feed formats
  3. Efficient Generation: Stream-based feed generation for memory efficiency
  4. Content Negotiation: Smart format selection based on client preferences
  5. Caching Strategy: Minimize regeneration overhead
  6. Standards Compliance: Full adherence to feed specifications

System Architecture

Component Overview

┌─────────────────────────────────────────────────────────┐
│                    HTTP Request Layer                    │
│                          ↓                               │
│              ┌──────────────────────┐                   │
│              │  Content Negotiator   │                   │
│              │  (Accept header)      │                   │
│              └──────────┬───────────┘                   │
│                         ↓                                │
│         ┌───────────────┴────────────────┐              │
│         ↓               ↓                ↓              │
│   ┌──────────┐    ┌──────────┐    ┌──────────┐        │
│   │   RSS    │    │   ATOM   │    │   JSON   │        │
│   │Generator │    │Generator │    │ Generator│        │
│   └────┬─────┘    └────┬─────┘    └────┬─────┘        │
│        └───────────────┬────────────────┘              │
│                        ↓                                │
│              ┌──────────────────────┐                   │
│              │   Feed Cache Layer   │                   │
│              │  (LRU with TTL)      │                   │
│              └──────────┬───────────┘                   │
│                         ↓                                │
│              ┌──────────────────────┐                   │
│              │    Data Layer        │                   │
│              │  (Notes Repository)  │                   │
│              └──────────┬───────────┘                   │
│                         ↓                                │
│              ┌──────────────────────┐                   │
│              │  Metrics Collector   │                   │
│              │  (All operations)    │                   │
│              └──────────────────────┘                   │
└─────────────────────────────────────────────────────────┘

Data Flow

  1. Request Processing

    • Client sends HTTP request with Accept header
    • Content negotiator determines optimal format
    • Check cache for existing feed
  2. Feed Generation

    • If cache miss, fetch notes from database
    • Generate feed using appropriate generator
    • Stream response to client
    • Update cache asynchronously
  3. Metrics Collection

    • Record request timing
    • Track cache hit/miss rates
    • Monitor generation performance
    • Log format popularity
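The request flow above can be sketched as a single generator that serves from cache on a hit and, on a miss, streams chunks to the client while keeping a copy for the cache. This is a minimal sketch under stated assumptions: `serve_feed`, the plain-dict cache, and the `stats` counters are hypothetical names, and the cache write happens synchronously at the end of the stream rather than in a background task.

```python
from typing import Callable, Dict, Iterator, List


def serve_feed(
    fmt: str,
    cache: Dict[str, str],
    fetch_notes: Callable[[], List[str]],
    generators: Dict[str, Callable[[List[str]], Iterator[str]]],
    stats: Dict[str, int],
) -> Iterator[str]:
    """Yield feed chunks, serving from the cache when possible."""
    cached = cache.get(fmt)
    if cached is not None:
        stats["hit"] = stats.get("hit", 0) + 1
        yield cached
        return

    stats["miss"] = stats.get("miss", 0) + 1
    chunks: List[str] = []
    for chunk in generators[fmt](fetch_notes()):
        chunks.append(chunk)  # keep a copy for the cache
        yield chunk           # hand the chunk to the client immediately
    # Reached only once the consumer has read the whole stream.
    cache[fmt] = "".join(chunks)
```

Because the cache is only populated after the last chunk is consumed, a client that disconnects mid-stream leaves the cache untouched.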

Key Components

1. Metrics Instrumentation Layer

Purpose: Complete visibility into all system operations

Components:

  • Database operation timing (all queries)
  • HTTP request/response metrics
  • Memory monitoring thread
  • Business metrics (syndication stats)

Integration Points:

  • Database connection wrapper
  • Flask middleware hooks
  • Background thread for memory
  • Feed generation decorators
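One way the feed generation decorators could look is a small timing wrapper. The sketch below is illustrative only: `timed`, `TIMINGS`, and the operation name `feed.generate.rss` are assumptions, not names taken from the StarPunk code base.

```python
import time
from collections import defaultdict
from functools import wraps
from typing import Callable, DefaultDict, List

# Hypothetical in-memory store: operation name -> durations in milliseconds.
TIMINGS: DefaultDict[str, List[float]] = defaultdict(list)


def timed(operation: str) -> Callable:
    """Record how long each call of the wrapped function takes."""
    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000.0
                TIMINGS[operation].append(elapsed_ms)
        return wrapper
    return decorator


@timed("feed.generate.rss")
def build_feed(count: int) -> str:
    return "".join(f"<item>{i}</item>" for i in range(count))
```

Note that wrapping a generator function this way would only time the creation of the generator object, not its iteration, so a streaming generator would need the timing placed around the consuming loop instead.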

2. Content Negotiation Service

Purpose: Determine optimal feed format based on client preferences

Algorithm:

1. Parse Accept header
2. Score each format:
   - Exact match: 1.0
   - Wildcard match: 0.5
   - No match: 0.0
3. Consider quality factors (q=)
4. Return highest scoring format
5. Default to RSS if no preference

Supported MIME Types:

  • RSS: application/rss+xml, application/xml, text/xml
  • ATOM: application/atom+xml
  • JSON: application/json, application/feed+json
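The algorithm and MIME table above might be implemented roughly as follows. This is a sketch, not the project's implementation: `parse_accept` and `negotiate` are hypothetical names, and multiplying the match score by the `q` value is one plausible reading of "consider quality factors". It does not implement the full media-range precedence rules of the HTTP specification.

```python
from typing import Dict, List, Tuple

# MIME types from the list above, mapped to format names (RSS first).
MIME_TO_FORMAT: Dict[str, str] = {
    "application/rss+xml": "rss",
    "application/xml": "rss",
    "text/xml": "rss",
    "application/atom+xml": "atom",
    "application/json": "json",
    "application/feed+json": "json",
}


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Return (media_range, q) pairs from an Accept header value."""
    ranges: List[Tuple[str, float]] = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media = pieces[0].lower()
        if not media:
            continue
        q = 1.0
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = max(0.0, min(1.0, float(value)))
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        ranges.append((media, q))
    return ranges


def negotiate(header: str, default: str = "rss") -> str:
    """Pick the highest scoring format; fall back to the default."""
    best_format, best_score = default, 0.0
    for media, q in parse_accept(header or ""):
        for mime, fmt in MIME_TO_FORMAT.items():
            main_type = mime.split("/")[0]
            if media == mime:
                match = 1.0   # exact match
            elif media in ("*/*", f"{main_type}/*"):
                match = 0.5   # wildcard match
            else:
                match = 0.0   # no match
            score = match * q
            if score > best_score:  # strict ">" keeps the earliest format on ties
                best_format, best_score = fmt, score
    return best_format
```

For example, `negotiate("text/html, application/feed+json;q=0.8, */*;q=0.1")` selects `json`, while a bare `*/*` or an empty header falls back to `rss`.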

3. Feed Generators

Shared Interface:

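from __future__ import annotations  # Note, FeedConfig, ValidationError: assumed to be defined elsewhere in StarPunk
from typing import Iterator, List, Protocol

class FeedGenerator(Protocol):
    def generate(self, notes: List[Note], config: FeedConfig) -> Iterator[str]:
        """Generate feed chunks"""

    def validate(self, feed_content: str) -> List[ValidationError]:
        """Validate generated feed"""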

RSS Generator (existing, enhanced):

  • RSS 2.0 specification
  • Streaming generation
  • CDATA wrapping for HTML
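As an illustration of streaming generation with CDATA wrapping, the sketch below yields an RSS 2.0 document chunk by chunk. The function names and the `(title, html)` item tuples are assumptions made for the example. The one subtle point is that a literal `]]>` inside the HTML would end the CDATA section early, so it is split across two sections.

```python
from typing import Iterable, Iterator, Tuple
from xml.sax.saxutils import escape


def cdata(html: str) -> str:
    """Wrap HTML in CDATA, splitting any embedded ']]>' terminator."""
    return "<![CDATA[" + html.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def rss_chunks(
    title: str,
    link: str,
    description: str,
    items: Iterable[Tuple[str, str]],
) -> Iterator[str]:
    """Yield an RSS 2.0 feed as a sequence of string chunks."""
    yield '<?xml version="1.0" encoding="UTF-8"?>\n<rss version="2.0"><channel>'
    yield (
        f"<title>{escape(title)}</title>"
        f"<link>{escape(link)}</link>"
        f"<description>{escape(description)}</description>"
    )
    for item_title, html in items:  # one chunk per item
        yield (
            f"<item><title>{escape(item_title)}</title>"
            f"<description>{cdata(html)}</description></item>"
        )
    yield "</channel></rss>"
```

Since `items` can itself be a generator, no more than one item needs to be held in memory at a time.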

ATOM Generator (new):

  • ATOM 1.0 specification
  • RFC 3339 date formatting
  • Author metadata support
  • Category/tag support
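A minimal sketch of the RFC 3339 date formatting and category support, assuming notes carry a `datetime` and a list of tag strings. `rfc3339` and `atom_entry` are hypothetical helpers, and the entry shown is a fragment: a complete Atom feed also needs feed-level elements such as `id`, `title`, `updated`, and an author.

```python
from datetime import datetime, timezone
from typing import Iterable
from xml.sax.saxutils import escape, quoteattr


def rfc3339(dt: datetime) -> str:
    """Format a datetime as an RFC 3339 UTC timestamp.

    Naive datetimes are assumed (for this sketch) to already be in UTC.
    """
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def atom_entry(entry_id: str, title: str, updated: datetime,
               tags: Iterable[str] = ()) -> str:
    """Render one Atom entry with optional category elements."""
    categories = "".join(f"<category term={quoteattr(tag)}/>" for tag in tags)
    return (
        f"<entry><id>{escape(entry_id)}</id>"
        f"<title>{escape(title)}</title>"
        f"<updated>{rfc3339(updated)}</updated>"
        f"{categories}</entry>"
    )
```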

JSON Feed Generator (new):

  • JSON Feed 1.1 specification
  • Attachment support for media
  • Author object with avatar
  • Hub support for real-time updates
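The shape of a JSON Feed 1.1 document can be sketched with plain dictionaries. The helper names and parameters below are hypothetical. The field names (`version`, `authors`, `items`, `attachments`, `mime_type`) follow the JSON Feed 1.1 specification.

```python
import json
from typing import Any, Dict, Iterable, List, Optional, Tuple


def feed_item(note_id: str, url: str, html: str, published: str,
              attachments: Iterable[Tuple[str, str]] = ()) -> Dict[str, Any]:
    """Build one item; attachments are (url, mime_type) pairs."""
    item: Dict[str, Any] = {
        "id": note_id,
        "url": url,
        "content_html": html,
        "date_published": published,  # RFC 3339 string
    }
    media = [{"url": u, "mime_type": m} for u, m in attachments]
    if media:
        item["attachments"] = media
    return item


def build_json_feed(title: str, home_page_url: str, feed_url: str,
                    items: List[Dict[str, Any]],
                    author: Optional[Dict[str, str]] = None) -> str:
    """Serialize a JSON Feed 1.1 document."""
    feed: Dict[str, Any] = {
        "version": "https://jsonfeed.org/version/1.1",
        "title": title,
        "home_page_url": home_page_url,
        "feed_url": feed_url,
        "items": items,
    }
    if author:
        feed["authors"] = [author]  # e.g. {"name": ..., "avatar": ...}
    return json.dumps(feed, ensure_ascii=False)
```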

4. Feed Cache System

Purpose: Minimize regeneration overhead

Design:

  • LRU cache with configurable size
  • TTL-based expiration (default: 5 minutes)
  • Format-specific cache keys
  • Invalidation on note changes

Cache Key Structure:

feed:{format}:{limit}:{checksum}

Where checksum is based on:

  • Latest note timestamp
  • Total note count
  • Site configuration
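The key structure and the LRU-with-TTL behaviour could be sketched as below. Everything here is an assumption made for illustration: the `feed_cache_key` and `FeedCache` names, the use of SHA-256 truncated to 16 hex characters, and the injectable clock (added to make expiry testable). The sketch is not thread-safe.

```python
import hashlib
import time
from collections import OrderedDict
from typing import Callable, Optional, Tuple


def feed_cache_key(fmt: str, limit: int, latest_ts: str,
                   note_count: int, site_config: str) -> str:
    """Build a key of the form feed:{format}:{limit}:{checksum}."""
    raw = f"{latest_ts}|{note_count}|{site_config}".encode("utf-8")
    checksum = hashlib.sha256(raw).hexdigest()[:16]
    return f"feed:{fmt}:{limit}:{checksum}"


class FeedCache:
    """LRU cache whose entries also expire after a fixed TTL."""

    def __init__(self, max_size: int = 100, ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_size = max_size
        self._ttl = ttl
        self._clock = clock
        self._entries: "OrderedDict[str, Tuple[float, str]]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]       # expired: drop and report a miss
            return None
        self._entries.move_to_end(key)   # mark as most recently used
        return value

    def set(self, key: str, value: str) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # evict least recently used

    def clear(self) -> None:
        self._entries.clear()
```

Because the checksum changes whenever the latest note timestamp or note count changes, stale entries simply stop being requested and age out through TTL or LRU eviction. An explicit `clear()` on note changes is a stricter alternative.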

5. Statistics Dashboard

Purpose: Track syndication performance and usage

Metrics Tracked:

  • Feed requests by format
  • Cache hit rates
  • Generation times
  • Client user agents
  • Geographic distribution (via IP)

Dashboard Location: /admin/syndication

6. OPML Export

Purpose: Allow users to share their feed collection

Implementation:

  • Generate OPML 2.0 document
  • Include all available feed formats
  • Add metadata (title, owner, date)
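An OPML 2.0 export along these lines can be produced with the standard library. This is a sketch: `build_opml` is a hypothetical helper, the feed URLs passed in are up to the caller, and using `type="rss"` for every outline (including Atom and JSON feeds) is a convention assumed here rather than something the design specifies.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Mapping


def build_opml(title: str, owner: str, feeds: Mapping[str, str],
               created: datetime) -> str:
    """Return an OPML 2.0 document listing the given feeds.

    `feeds` maps a display label to a feed URL; `created` must be
    timezone-aware.
    """
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    # OPML dates use the RFC 822 style, e.g. "Thu, 02 Jan 2025 03:04:05 GMT".
    ET.SubElement(head, "dateCreated").text = format_datetime(
        created.astimezone(timezone.utc), usegmt=True
    )
    ET.SubElement(head, "ownerName").text = owner
    body = ET.SubElement(opml, "body")
    for label, url in feeds.items():
        ET.SubElement(body, "outline", type="rss", text=label, xmlUrl=url)
    return ET.tostring(opml, encoding="unicode")
```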

Performance Considerations

Memory Management

Streaming Generation:

  • Generate feeds in chunks
  • Yield results incrementally
  • Avoid loading all notes at once
  • Use generators throughout

Cache Sizing:

  • Monitor memory usage
  • Implement cache eviction
  • Configurable cache limits

Database Optimization

Query Optimization:

  • Index on published status
  • Index on created_at for ordering
  • Limit fetched columns
  • Use prepared statements
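The query optimizations above might look like this with SQLite. The table layout, index names, and `iter_published_notes` are assumptions for the sake of a runnable example and are not taken from StarPunk's actual schema. The query selects only the columns it needs, binds `limit` as a parameter, and fetches rows in batches so the full result set is never materialized at once.

```python
import sqlite3
from typing import Iterator, Tuple

# Hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE notes (
    id INTEGER PRIMARY KEY,
    content TEXT NOT NULL,
    published INTEGER NOT NULL DEFAULT 0,
    created_at TEXT NOT NULL
);
CREATE INDEX idx_notes_published ON notes (published);
CREATE INDEX idx_notes_created_at ON notes (created_at);
"""


def iter_published_notes(conn: sqlite3.Connection, limit: int,
                         batch_size: int = 100) -> Iterator[Tuple[int, str]]:
    """Yield (id, content) rows for published notes, newest first."""
    cursor = conn.execute(
        "SELECT id, content FROM notes "
        "WHERE published = 1 ORDER BY created_at DESC LIMIT ?",
        (limit,),
    )
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield from rows
```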

Connection Pooling:

  • Reuse database connections
  • Monitor pool usage
  • Track connection wait times

HTTP Optimization

Compression:

  • gzip for text formats (RSS, ATOM)
  • JSON Feed is already compact (compression is less critical)
  • Configurable compression level
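A simple form of response compression, assuming the whole body is available as bytes: `maybe_compress` is a hypothetical helper, and its `Accept-Encoding` check is deliberately naive (it ignores `q=0`). Compressing a fully buffered body like this does not combine with chunked streaming, which would need an incremental compressor instead.

```python
import gzip
from typing import Dict, Tuple


def maybe_compress(body: bytes, accept_encoding: str,
                   level: int = 6) -> Tuple[bytes, Dict[str, str]]:
    """Gzip the body when the client advertises gzip support."""
    offered = [token.split(";")[0].strip().lower()
               for token in accept_encoding.split(",")]
    headers = {"Vary": "Accept-Encoding"}
    if "gzip" in offered:
        headers["Content-Encoding"] = "gzip"
        return gzip.compress(body, compresslevel=level), headers
    return body, headers
```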

Caching Headers:

  • ETag based on content hash
  • Last-Modified from latest note
  • Cache-Control with max-age
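The three caching headers can be derived as in the sketch below. The helper names, the truncated SHA-256 ETag, and the `public` directive are illustrative assumptions. `latest_note` must be a timezone-aware datetime.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Dict


def caching_headers(body: bytes, latest_note: datetime,
                    max_age: int = 300) -> Dict[str, str]:
    """Build ETag, Last-Modified and Cache-Control for a feed response."""
    etag = '"' + hashlib.sha256(body).hexdigest()[:32] + '"'
    return {
        "ETag": etag,
        "Last-Modified": format_datetime(
            latest_note.astimezone(timezone.utc), usegmt=True
        ),
        "Cache-Control": f"public, max-age={max_age}",
    }


def is_not_modified(if_none_match: str, etag: str) -> bool:
    """Weak comparison of an If-None-Match header against our ETag."""
    def strip_weak(tag: str) -> str:
        return tag[2:] if tag.startswith("W/") else tag

    candidates = [strip_weak(t.strip()) for t in if_none_match.split(",")]
    return "*" in candidates or strip_weak(etag) in candidates
```

When `is_not_modified` returns true, the endpoint can answer `304 Not Modified` without generating or sending the feed body.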

Security Considerations

Input Validation

  • Validate Accept headers
  • Sanitize format parameters
  • Limit feed size
  • Rate limit feed endpoints

Content Security

  • Escape XML entities properly
  • Valid JSON encoding
  • No script injection in feeds
  • CORS headers for JSON feeds

Resource Protection

  • Rate limiting per IP
  • Maximum feed items limit
  • Timeout for generation
  • Circuit breaker for database
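Per-IP rate limiting can be approximated with a sliding window of recent request times. `RateLimiter` and its parameters are hypothetical, the state lives in process memory only, and the clock is injectable purely so the behaviour can be tested deterministically.

```python
import time
from collections import defaultdict, deque
from typing import Callable, DefaultDict, Deque


class RateLimiter:
    """Allow at most `max_requests` per client within a sliding window."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock
        self._hits: DefaultDict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_ip: str) -> bool:
        now = self._clock()
        hits = self._hits[client_ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self._window:
            hits.popleft()
        if len(hits) >= self._max:
            return False
        hits.append(now)
        return True
```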

Configuration

Feed Settings

# Feed generation
STARPUNK_FEED_DEFAULT_LIMIT = 50
STARPUNK_FEED_MAX_LIMIT = 500
STARPUNK_FEED_CACHE_TTL = 300  # seconds
STARPUNK_FEED_CACHE_SIZE = 100  # entries

# Format support
STARPUNK_FEED_RSS_ENABLED = true
STARPUNK_FEED_ATOM_ENABLED = true
STARPUNK_FEED_JSON_ENABLED = true

# Performance
STARPUNK_FEED_STREAMING = true
STARPUNK_FEED_COMPRESSION = true
STARPUNK_FEED_COMPRESSION_LEVEL = 6
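If these settings are read from environment variables (an assumption; the document does not say how they are loaded), parsing and clamping could look like the sketch below. `env_bool`, `env_int`, `load_feed_config`, and `clamp_limit` are hypothetical helpers. The defaults mirror the values listed above.

```python
import os
from typing import Any, Dict, Mapping, Optional

_TRUE = {"1", "true", "yes", "on"}


def env_bool(env: Mapping[str, str], name: str, default: bool) -> bool:
    raw = env.get(name)
    return default if raw is None else raw.strip().lower() in _TRUE


def env_int(env: Mapping[str, str], name: str, default: int,
            minimum: int = 1) -> int:
    raw = env.get(name)
    try:
        value = int(raw) if raw is not None else default
    except ValueError:
        value = default  # fall back on malformed input
    return max(minimum, value)


def load_feed_config(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    env = os.environ if env is None else env
    max_limit = env_int(env, "STARPUNK_FEED_MAX_LIMIT", 500)
    return {
        "default_limit": min(
            env_int(env, "STARPUNK_FEED_DEFAULT_LIMIT", 50), max_limit
        ),
        "max_limit": max_limit,
        "cache_ttl": env_int(env, "STARPUNK_FEED_CACHE_TTL", 300, minimum=0),
        "cache_size": env_int(env, "STARPUNK_FEED_CACHE_SIZE", 100),
        "streaming": env_bool(env, "STARPUNK_FEED_STREAMING", True),
        "compression": env_bool(env, "STARPUNK_FEED_COMPRESSION", True),
        "compression_level": min(
            9, env_int(env, "STARPUNK_FEED_COMPRESSION_LEVEL", 6, minimum=0)
        ),
    }


def clamp_limit(requested: Optional[int], config: Mapping[str, Any]) -> int:
    """Bound a client-supplied item limit to [1, max_limit]."""
    if requested is None:
        return config["default_limit"]
    return max(1, min(requested, config["max_limit"]))
```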

Monitoring Settings

# Metrics collection
STARPUNK_METRICS_FEED_TIMING = true
STARPUNK_METRICS_CACHE_STATS = true
STARPUNK_METRICS_FORMAT_USAGE = true

# Dashboard
STARPUNK_SYNDICATION_DASHBOARD = true
STARPUNK_SYNDICATION_STATS_RETENTION = 7  # days

Testing Strategy

Unit Tests

  1. Content Negotiation

    • Accept header parsing
    • Format scoring algorithm
    • Default behavior
  2. Feed Generators

    • Valid output for each format
    • Streaming behavior
    • Error handling
  3. Cache System

    • LRU eviction
    • TTL expiration
    • Invalidation logic

Integration Tests

  1. End-to-End Feeds

    • Request with various Accept headers
    • Verify correct format returned
    • Check caching behavior
  2. Performance Tests

    • Measure generation time
    • Monitor memory usage
    • Verify streaming works
  3. Compliance Tests

    • Validate against feed specs
    • Test with popular feed readers
    • Check encoding edge cases

Migration Path

From v1.1.1 to v1.1.2

  1. Database: No schema changes required
  2. Configuration: New feed options (backward compatible)
  3. URLs: Existing /feed.xml continues to work
  4. Cache: New cache system, no migration needed

Rollback Plan

  1. Keep v1.1.1 database backup
  2. Configuration rollback script
  3. Clear feed cache
  4. Revert to previous version

Future Considerations

v1.2.0 Possibilities

  1. WebSub Support: Real-time feed updates
  2. Custom Feeds: User-defined filters
  3. Feed Analytics: Detailed reader statistics
  4. Podcast Support: Audio enclosures
  5. ActivityPub: Fediverse integration

Technical Debt

  1. Refactor feed module into package
  2. Extract cache to separate service
  3. Implement feed preview UI
  4. Add feed validation endpoint

Success Metrics

  1. Performance

    • Feed generation <100ms for 50 items
    • Cache hit rate >80%
    • Memory usage <10MB for feeds
  2. Compatibility

    • Works with 10 major feed readers
    • Passes all format validators
    • Zero regression on existing RSS
  3. Usage

    • 20% adoption of non-RSS formats
    • Reduced server load via caching
    • Positive user feedback

Risk Mitigation

Performance Risks

Risk: Feed generation slows down site

Mitigation:

  • Streaming generation
  • Aggressive caching
  • Request timeouts
  • Rate limiting

Compatibility Risks

Risk: Feed readers reject new formats

Mitigation:

  • Extensive testing with readers
  • Strict spec compliance
  • Format validation
  • Fallback to RSS

Operational Risks

Risk: Cache grows unbounded

Mitigation:

  • LRU eviction
  • Size limits
  • Memory monitoring
  • Auto-cleanup

Conclusion

StarPunk v1.1.2 "Syndicate" creates a robust, standards-compliant syndication platform while completing the observability foundation started in v1.1.1. The architecture prioritizes performance through streaming and caching, compatibility through strict standards adherence, and maintainability through clean component separation.

The design balances feature richness with StarPunk's core philosophy of simplicity, adding only what's necessary to serve content to the widest possible audience while maintaining operational visibility.