StarPunk v1.1.2 "Syndicate" - Architecture Overview

Executive Summary

Version 1.1.2 "Syndicate" enhances StarPunk's content distribution capabilities by completing the metrics instrumentation from v1.1.1 and adding comprehensive feed format support. This release focuses on making content accessible to the widest possible audience through multiple syndication formats while maintaining visibility into system performance.

Architecture Goals

Complete Observability: Fully instrument all system operations for performance monitoring
Multi-Format Syndication: Support RSS, ATOM, and JSON Feed formats
Efficient Generation: Stream-based feed generation for memory efficiency
Content Negotiation: Smart format selection based on client preferences
Caching Strategy: Minimize regeneration overhead
Standards Compliance: Full adherence to feed specifications

System Architecture

Component Overview

┌─────────────────────────────────────────────────────────┐
│                    HTTP Request Layer                    │
│                          ↓                               │
│              ┌──────────────────────┐                   │
│              │  Content Negotiator   │                   │
│              │  (Accept header)      │                   │
│              └──────────┬───────────┘                   │
│                         ↓                                │
│         ┌───────────────┴────────────────┐              │
│         ↓               ↓                ↓              │
│   ┌──────────┐    ┌──────────┐    ┌──────────┐        │
│   │   RSS    │    │   ATOM   │    │   JSON   │        │
│   │Generator │    │Generator │    │ Generator│        │
│   └────┬─────┘    └────┬─────┘    └────┬─────┘        │
│        └───────────────┬────────────────┘              │
│                        ↓                                │
│              ┌──────────────────────┐                   │
│              │   Feed Cache Layer   │                   │
│              │  (LRU with TTL)      │                   │
│              └──────────┬───────────┘                   │
│                         ↓                                │
│              ┌──────────────────────┐                   │
│              │    Data Layer        │                   │
│              │  (Notes Repository)  │                   │
│              └──────────┬───────────┘                   │
│                         ↓                                │
│              ┌──────────────────────┐                   │
│              │  Metrics Collector   │                   │
│              │  (All operations)    │                   │
│              └──────────────────────┘                   │
└─────────────────────────────────────────────────────────┘

Data Flow

Request Processing
- Client sends HTTP request with Accept header
- Content negotiator determines optimal format
- Check cache for existing feed
Feed Generation
- If cache miss, fetch notes from database
- Generate feed using appropriate generator
- Stream response to client
- Update cache asynchronously
Metrics Collection
- Record request timing
- Track cache hit/miss rates
- Monitor generation performance
- Log format popularity
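
The flow above (check cache, generate on a miss, stream, then fill the cache) can be sketched as a generator that tees chunks into the cache while yielding them. This is only an illustration: `serve_feed`, the dict-like `cache`, and the `generate` callable are hypothetical names, and the cache write here happens synchronously once the last chunk has been consumed rather than asynchronously.

```python
def serve_feed(key, cache, generate):
    """Yield feed chunks, serving from `cache` or filling it on a miss.

    `cache` is any dict-like object; `generate` is a zero-argument callable
    returning an iterator of string chunks (both are assumptions).
    """
    cached = cache.get(key)
    if cached is not None:
        yield cached  # cache hit: one chunk, no regeneration
        return
    chunks = []
    for chunk in generate():
        chunks.append(chunk)  # keep a copy for the cache
        yield chunk           # stream to the client right away
    # Reached only if the consumer read the whole feed.
    cache[key] = "".join(chunks)
```

Buffering the chunks trades some of the memory benefit of streaming for a reusable cache entry, so a real implementation would likely cap the size of what it stores.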

Key Components

1. Metrics Instrumentation Layer

Purpose: Complete visibility into all system operations

Components:

Database operation timing (all queries)
HTTP request/response metrics
Memory monitoring thread
Business metrics (syndication stats)

Integration Points:

Database connection wrapper
Flask middleware hooks
Background thread for memory
Feed generation decorators
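
As an illustration of the "feed generation decorators" integration point, a timing decorator could record call durations into an in-memory collector. The names `timed`, `METRICS`, and the operation label below are hypothetical and are not taken from StarPunk's code.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical in-memory collector: operation name -> list of durations (seconds).
METRICS = defaultdict(list)


def timed(operation):
    """Record the wall-clock duration of each call under `operation`."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Recorded even when the wrapped call raises.
                METRICS[operation].append(time.perf_counter() - start)
        return wrapper
    return decorator


@timed("feed.generate.rss")
def generate_rss(notes):
    # Stand-in for a real generator, used only to exercise the decorator.
    return "".join(f"<item>{n}</item>" for n in notes)
```

Note that wrapping a generator function this way would only time the creation of the generator object, not its consumption, so streaming generators would need the timing placed around the iteration instead.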

2. Content Negotiation Service

Purpose: Determine optimal feed format based on client preferences

Algorithm:

1. Parse Accept header
2. Score each format:
   - Exact match: 1.0
   - Wildcard match: 0.5
   - No match: 0.0
3. Weight each score by the client's quality factor (q=)
4. Return highest scoring format
5. Default to RSS if no preference

Supported MIME Types:

RSS: application/rss+xml, application/xml, text/xml
ATOM: application/atom+xml
JSON: application/json, application/feed+json
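
A minimal sketch of the scoring algorithm, using the MIME types listed above. The document does not say how match scores and q-values are combined, so multiplying them is an assumption here, as are the function names and the tie-breaking order.

```python
# Mapping taken from the supported MIME type list.
MIME_TO_FORMAT = {
    "application/rss+xml": "rss",
    "application/xml": "rss",
    "text/xml": "rss",
    "application/atom+xml": "atom",
    "application/json": "json",
    "application/feed+json": "json",
}


def parse_accept(header):
    """Parse an Accept header into (media_range, q) pairs."""
    ranges = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media = pieces[0].lower()
        if not media:
            continue
        q = 1.0
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        ranges.append((media, q))
    return ranges


def negotiate(header, default="rss"):
    """Return the best format; score = match weight * q (an assumption)."""
    best = {}
    for media, q in parse_accept(header or ""):
        for mime, fmt in MIME_TO_FORMAT.items():
            if media == mime:
                weight = 1.0  # exact match
            elif media == "*/*" or media == mime.split("/")[0] + "/*":
                weight = 0.5  # wildcard match
            else:
                continue      # no match
            best[fmt] = max(best.get(fmt, 0.0), weight * q)
    if not best or max(best.values()) == 0.0:
        return default
    top = max(best.values())
    # Ties resolve in favor of the default, then in rss/atom/json order.
    for fmt in (default, "rss", "atom", "json"):
        if best.get(fmt) == top:
            return fmt
    return default
```

With this sketch, a header with no supported type (for example `text/html`) also falls back to RSS; whether to answer 406 instead is a design decision the document leaves open.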

3. Feed Generators

Shared Interface:

from __future__ import annotations

from typing import Iterator, List, Protocol

class FeedGenerator(Protocol):
    def generate(self, notes: List[Note], config: FeedConfig) -> Iterator[str]:
        """Generate feed chunks"""
        ...

    def validate(self, feed_content: str) -> List[ValidationError]:
        """Validate generated feed"""
        ...

RSS Generator (existing, enhanced):

RSS 2.0 specification
Streaming generation
CDATA wrapping for HTML

ATOM Generator (new):

ATOM 1.0 specification
RFC 3339 date formatting
Author metadata support
Category/tag support
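
Atom timestamps use RFC 3339. A small helper along these lines could produce them; treating naive datetimes as UTC is an assumption of this sketch, not a documented StarPunk convention, and the name `rfc3339` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def rfc3339(dt):
    """Format a datetime as an RFC 3339 timestamp in UTC."""
    if dt.tzinfo is None:
        # Assumption: naive values are already in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Usage: a timestamp at UTC-7 is normalized to UTC.
mst = timezone(timedelta(hours=-7))
print(rfc3339(datetime(2025, 12, 10, 11, 24, 23, tzinfo=mst)))  # 2025-12-10T18:24:23Z
```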

JSON Feed Generator (new):

JSON Feed 1.1 specification
Attachment support for media
Author object with avatar
Hub support for real-time updates
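
The following sketch shows how a JSON Feed 1.1 document might be emitted in chunks. The `Note` dataclass, the `/feed.json` path, and the function name are hypothetical stand-ins; only the top-level field names (`version`, `title`, `home_page_url`, `feed_url`, `authors`, `items`) come from the JSON Feed specification.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Note:  # hypothetical stand-in for StarPunk's Note model
    url: str
    html: str
    published: str  # RFC 3339 timestamp


def generate_json_feed(
    notes: Iterable[Note],
    title: str,
    site_url: str,
    author: Optional[dict] = None,
) -> Iterator[str]:
    """Yield a JSON Feed 1.1 document in chunks, one item at a time."""
    header = {
        "version": "https://jsonfeed.org/version/1.1",
        "title": title,
        "home_page_url": site_url,
        "feed_url": f"{site_url}/feed.json",  # assumed path
    }
    if author:
        header["authors"] = [author]  # e.g. {"name": ..., "avatar": ...}
    # Emit the top-level object without its closing brace, then open "items".
    yield json.dumps(header)[:-1] + ', "items": ['
    for index, note in enumerate(notes):
        item = {
            "id": note.url,
            "url": note.url,
            "content_html": note.html,
            "date_published": note.published,
        }
        yield ("," if index else "") + json.dumps(item)
    yield "]}"
```

Joining the chunks yields a single valid JSON document, so the same function can serve both streaming and cached responses.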

4. Feed Cache System

Purpose: Minimize regeneration overhead

Design:

LRU cache with configurable size
TTL-based expiration (default: 5 minutes)
Format-specific cache keys
Invalidation on note changes
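
An LRU cache with TTL expiry can be built on `collections.OrderedDict`. The class below is a minimal, single-threaded sketch under assumed names (`FeedCache`, `invalidate`); it is not StarPunk's implementation and would need locking in a multi-threaded server.

```python
import time
from collections import OrderedDict


class FeedCache:
    """Minimal LRU cache with per-entry TTL (not thread-safe)."""

    def __init__(self, max_entries=100, ttl=300.0, clock=time.monotonic):
        self.max_entries = max_entries
        self.ttl = ttl
        self.clock = clock  # injectable so tests can control time
        self._entries = OrderedDict()  # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # expired
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key, value):
        self._entries[key] = (self.clock() + self.ttl, value)
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict least recently used

    def invalidate(self, prefix="feed:"):
        """Drop every entry whose key starts with `prefix` (e.g. on note change)."""
        for key in [k for k in self._entries if k.startswith(prefix)]:
            del self._entries[key]
```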

Cache Key Structure:

feed:{format}:{limit}:{checksum}

Where checksum is based on:

Latest note timestamp
Total note count
Site configuration
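
One way to derive such a key is to hash the three inputs together. The choice of SHA-256, the 16-character truncation, and the function name are assumptions made for this sketch.

```python
import hashlib


def feed_cache_key(fmt, limit, latest_timestamp, note_count, site_config):
    """Build a key of the form feed:{format}:{limit}:{checksum}."""
    # Sort config items so the checksum does not depend on dict ordering.
    config_part = repr(sorted(site_config.items()))
    material = f"{latest_timestamp}|{note_count}|{config_part}"
    checksum = hashlib.sha256(material.encode("utf-8")).hexdigest()[:16]
    return f"feed:{fmt}:{limit}:{checksum}"
```

Because the checksum changes whenever a note is added or the latest timestamp moves, stale entries simply stop being requested and age out through TTL or LRU eviction.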

5. Statistics Dashboard

Purpose: Track syndication performance and usage

Metrics Tracked:

Feed requests by format
Cache hit rates
Generation times
Client user agents
Geographic distribution (via IP)

Dashboard Location: /admin/syndication

6. OPML Export

Purpose: Allow users to share their feed collection

Implementation:

Generate OPML 2.0 document
Include all available feed formats
Add metadata (title, owner, date)
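
A sketch of the export using the standard library's ElementTree. The function name and the `(text, url)` input shape are hypothetical. OPML 2.0 defines `type="rss"` for subscription outlines; using it for Atom and JSON Feed entries as well is an assumption here.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_opml(title, owner, feeds, created=None):
    """Return an OPML 2.0 document listing `feeds` as (text, url) pairs."""
    created = created or datetime.now(timezone.utc)
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    ET.SubElement(head, "ownerName").text = owner
    # OPML dates use the RFC 822 style produced by format_datetime.
    ET.SubElement(head, "dateCreated").text = format_datetime(created)
    body = ET.SubElement(opml, "body")
    for text, url in feeds:
        ET.SubElement(body, "outline", text=text, type="rss", xmlUrl=url)
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(opml, encoding="unicode")
```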

Performance Considerations

Memory Management

Streaming Generation:

Generate feeds in chunks
Yield results incrementally
Avoid loading all notes at once
Use generators throughout
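
To make the streaming approach concrete, here is a sketch of an RSS 2.0 generator that yields one chunk per item and wraps HTML in CDATA. The dict-shaped notes and the function names are assumptions for illustration only.

```python
from typing import Iterable, Iterator
from xml.sax.saxutils import escape


def cdata(html: str) -> str:
    """Wrap HTML in CDATA, splitting any embedded ']]>' terminator."""
    return "<![CDATA[" + html.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def stream_rss(notes: Iterable[dict], title: str, link: str) -> Iterator[str]:
    """Yield an RSS 2.0 document chunk by chunk instead of building one string."""
    yield '<?xml version="1.0" encoding="UTF-8"?>\n<rss version="2.0"><channel>'
    yield f"<title>{escape(title)}</title><link>{escape(link)}</link>"
    yield f"<description>{escape(title)}</description>"
    for note in notes:  # `notes` may itself be a lazy database cursor
        yield (
            f"<item><link>{escape(note['url'])}</link>"
            f"<description>{cdata(note['html'])}</description></item>"
        )
    yield "</channel></rss>"
```

Because `notes` is only iterated once and never materialized, peak memory stays proportional to a single item rather than the whole feed.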

Cache Sizing:

Monitor memory usage
Implement cache eviction
Configurable cache limits

Database Optimization

Query Optimization:

Index on published status
Index on created_at for ordering
Limit fetched columns
Use prepared statements

Connection Pooling:

Reuse database connections
Monitor pool usage
Track connection wait times

HTTP Optimization

Compression:

gzip for text formats (RSS, ATOM)
JSON Feed is already compact, so compression matters less
Configurable compression level
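
A hedged sketch of conditional gzip compression with a configurable level. The helper name is hypothetical, and for brevity it ignores q-values in `Accept-Encoding` (so `gzip;q=0` would still be treated as acceptable).

```python
import gzip


def maybe_compress(body: bytes, accept_encoding: str, level: int = 6):
    """Return (payload, headers); gzip only if the client advertises support."""
    encodings = [e.split(";")[0].strip().lower() for e in accept_encoding.split(",")]
    if "gzip" not in encodings:
        return body, {}
    headers = {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}
    return gzip.compress(body, compresslevel=level), headers
```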

Caching Headers:

ETag based on content hash
Last-Modified from latest note
Cache-Control with max-age
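
The three headers could be computed as follows. The function names, the SHA-256 choice, and the `public` directive are assumptions. The `If-None-Match` check is simplified and does not handle weak (`W/`) validators.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import format_datetime


def caching_headers(body: bytes, latest_note: datetime, max_age: int = 300) -> dict:
    """Build ETag, Last-Modified and Cache-Control headers for a feed body."""
    etag = '"' + hashlib.sha256(body).hexdigest()[:32] + '"'
    return {
        "ETag": etag,
        # usegmt=True requires an aware datetime in UTC, hence astimezone().
        "Last-Modified": format_datetime(latest_note.astimezone(timezone.utc), usegmt=True),
        "Cache-Control": f"public, max-age={max_age}",
    }


def is_not_modified(headers: dict, if_none_match: str) -> bool:
    """True when the client's If-None-Match header contains our ETag (or '*')."""
    candidates = [tag.strip() for tag in (if_none_match or "").split(",")]
    return "*" in candidates or headers["ETag"] in candidates
```

When `is_not_modified` returns true, the server can answer `304 Not Modified` with an empty body and skip both generation and transfer.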

Security Considerations

Input Validation

Validate Accept headers
Sanitize format parameters
Limit feed size
Rate limit feed endpoints

Content Security

Escape XML entities properly
Valid JSON encoding
No script injection in feeds
CORS headers for JSON feeds

Resource Protection

Rate limiting per IP
Maximum feed items limit
Timeout for generation
Circuit breaker for database
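
Per-IP rate limiting can be illustrated with a sliding-window counter. The class name, the default limits, and the in-memory storage are assumptions for this sketch; the document does not specify an algorithm.

```python
import time
from collections import defaultdict, deque


class RateLimiter:
    """Allow at most `limit` requests per `window` seconds for each client IP."""

    def __init__(self, limit=60, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so tests can control time
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip):
        now = self.clock()
        hits = self._hits[ip]
        while hits and now - hits[0] >= self.window:
            hits.popleft()  # drop requests that left the window
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

A production version would also need to evict idle IPs so the tracking table itself does not grow without bound.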

Configuration

Feed Settings

# Feed generation
STARPUNK_FEED_DEFAULT_LIMIT = 50
STARPUNK_FEED_MAX_LIMIT = 500
STARPUNK_FEED_CACHE_TTL = 300  # seconds
STARPUNK_FEED_CACHE_SIZE = 100  # entries

# Format support
STARPUNK_FEED_RSS_ENABLED = true
STARPUNK_FEED_ATOM_ENABLED = true
STARPUNK_FEED_JSON_ENABLED = true

# Performance
STARPUNK_FEED_STREAMING = true
STARPUNK_FEED_COMPRESSION = true
STARPUNK_FEED_COMPRESSION_LEVEL = 6

Monitoring Settings

# Metrics collection
STARPUNK_METRICS_FEED_TIMING = true
STARPUNK_METRICS_CACHE_STATS = true
STARPUNK_METRICS_FORMAT_USAGE = true

# Dashboard
STARPUNK_SYNDICATION_DASHBOARD = true
STARPUNK_SYNDICATION_STATS_RETENTION = 7  # days
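
Assuming these settings are supplied as environment variables (the document does not state the mechanism), they could be read with typed helpers like the ones below. The helper names and the set of accepted truthy strings are hypothetical; the defaults mirror the values listed above.

```python
import os

_TRUE = {"1", "true", "yes", "on"}


def env_bool(name, default, environ=os.environ):
    raw = environ.get(name)
    return default if raw is None else raw.strip().lower() in _TRUE


def env_int(name, default, environ=os.environ):
    raw = environ.get(name)
    return default if raw is None else int(raw)


def load_feed_config(environ=os.environ):
    """Read feed settings, falling back to the documented defaults."""
    max_limit = env_int("STARPUNK_FEED_MAX_LIMIT", 500, environ)
    return {
        # The default limit is clamped so it can never exceed the maximum.
        "default_limit": min(env_int("STARPUNK_FEED_DEFAULT_LIMIT", 50, environ), max_limit),
        "max_limit": max_limit,
        "cache_ttl": env_int("STARPUNK_FEED_CACHE_TTL", 300, environ),
        "cache_size": env_int("STARPUNK_FEED_CACHE_SIZE", 100, environ),
        "streaming": env_bool("STARPUNK_FEED_STREAMING", True, environ),
        "compression_level": env_int("STARPUNK_FEED_COMPRESSION_LEVEL", 6, environ),
    }
```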

Testing Strategy

Unit Tests

Content Negotiation
- Accept header parsing
- Format scoring algorithm
- Default behavior
Feed Generators
- Valid output for each format
- Streaming behavior
- Error handling
Cache System
- LRU eviction
- TTL expiration
- Invalidation logic

Integration Tests

End-to-End Feeds
- Request with various Accept headers
- Verify correct format returned
- Check caching behavior
Performance Tests
- Measure generation time
- Monitor memory usage
- Verify streaming works
Compliance Tests
- Validate against feed specs
- Test with popular feed readers
- Check encoding edge cases

Migration Path

From v1.1.1 to v1.1.2

Database: No schema changes required
Configuration: New feed options (backward compatible)
URLs: Existing /feed.xml continues to work
Cache: New cache system, no migration needed

Rollback Plan

Keep v1.1.1 database backup
Configuration rollback script
Clear feed cache
Revert to previous version

Future Considerations

v1.2.0 Possibilities

WebSub Support: Real-time feed updates
Custom Feeds: User-defined filters
Feed Analytics: Detailed reader statistics
Podcast Support: Audio enclosures
ActivityPub: Fediverse integration

Technical Debt

Refactor feed module into package
Extract cache to separate service
Implement feed preview UI
Add feed validation endpoint

Success Metrics

Performance
- Feed generation <100ms for 50 items
- Cache hit rate >80%
- Memory usage <10MB for feeds
Compatibility
- Works with 10 major feed readers
- Passes all format validators
- Zero regression on existing RSS
Usage
- 20% adoption of non-RSS formats
- Reduced server load via caching
- Positive user feedback

Risk Mitigation

Performance Risks

Risk: Feed generation slows down site
Mitigation:

Streaming generation
Aggressive caching
Request timeouts
Rate limiting

Compatibility Risks

Risk: Feed readers reject new formats
Mitigation:

Extensive testing with readers
Strict spec compliance
Format validation
Fallback to RSS

Operational Risks

Risk: Cache grows unbounded
Mitigation:

LRU eviction
Size limits
Memory monitoring
Auto-cleanup

Conclusion

StarPunk v1.1.2 "Syndicate" creates a robust, standards-compliant syndication platform while completing the observability foundation started in v1.1.1. The architecture prioritizes performance through streaming and caching, compatibility through strict standards adherence, and maintainability through clean component separation.

The design balances feature richness with StarPunk's core philosophy of simplicity, adding only what's necessary to serve content to the widest possible audience while maintaining operational visibility.
