ADR-014: RSS Feed Implementation Strategy
Status
Accepted
Context
Phase 5 requires implementing RSS feed generation for syndicating published notes. We need to decide on the implementation approach, feed format, caching strategy, and technical details for generating a standards-compliant RSS feed.
Requirements
- Standard Compliance: Feed must be valid RSS 2.0
- Content Inclusion: Include all published notes (up to configured limit)
- Performance: Feed generation should be fast and cacheable
- Simplicity: Minimal dependencies, straightforward implementation
- IndieWeb Friendly: Support feed discovery and proper metadata
Key Questions
- Which feed format(s) should we support?
- How should we generate the RSS XML?
- What caching strategy should we use?
- How should we handle note titles (notes may not have explicit titles)?
- How should we format dates for RSS?
- What should the feed item limit be?
Decision
1. Feed Format: RSS 2.0 Only (V1)
Choice: Implement RSS 2.0 exclusively for V1
Rationale:
- RSS 2.0 is supported by virtually all feed readers
- Simpler than Atom (fewer required elements)
- Sufficient for V1 needs (notes syndication)
- feedgen library handles RSS 2.0 well
- Defer Atom and JSON Feed to V2+
Alternatives Considered:
- Atom 1.0: More modern, better extensibility
  - Rejected: More complex, not needed for basic notes
  - May add in V2
- JSON Feed: Developer-friendly format
  - Rejected: Less universal support, not essential
  - May add in V2
- Multiple formats: Support RSS + Atom + JSON
  - Rejected: Adds complexity, not justified for V1
  - Single format keeps implementation simple
2. XML Generation: feedgen Library
Choice: Use feedgen library (already in dependencies)
Rationale:
- Already a dependency (listed in the architecture overview)
- Handles RSS/Atom generation correctly
- Produces valid, compliant XML
- Saves time vs. manual XML generation
- Well-maintained, stable library
Alternatives Considered:
- Manual XML generation (ElementTree or string templates)
  - Rejected: Error-prone, easy to produce invalid XML
  - Would need extensive validation
- PyRSS2Gen library
  - Rejected: Last updated 2007, unmaintained
- Django Syndication Framework
  - Rejected: Requires Django, too heavyweight
3. Feed Caching Strategy: Simple In-Memory Cache
Choice: 5-minute in-memory cache with ETag support
Implementation:
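import hashlib
import time

FEED_CACHE_SECONDS = 300  # Cache for 5 minutes

_feed_cache = {
    'xml': None,
    'timestamp': None,
    'etag': None
}

def get_cached_feed(generate_xml):
    """Return (xml, etag), regenerating the feed when the cache is stale."""
    now = time.time()
    if _feed_cache['xml'] is None or now - _feed_cache['timestamp'] >= FEED_CACHE_SECONDS:
        # Cache is empty or stale: generate a fresh feed and update the cache
        xml = generate_xml()
        # ETag as an MD5 of the XML is an assumption; any stable hash works
        etag = hashlib.md5(xml.encode('utf-8')).hexdigest()
        _feed_cache.update(xml=xml, timestamp=now, etag=etag)
    # Cache is fresh (or was just refreshed): return XML with its ETag
    return _feed_cache['xml'], _feed_cache['etag']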
Rationale:
- 5 minutes is an acceptable delay for note updates
- RSS readers typically poll every 15-60 minutes
- In-memory cache is simple (no external dependencies)
- ETag enables conditional requests
- Cache-Control header enables client-side caching
- Low complexity, easy to implement
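The ETag and Cache-Control points above can be sketched as a small, framework-independent helper. This is a minimal sketch, not the project's route handler: the name `build_feed_response`, its parameters, and the `(status, headers, body)` return shape are all hypothetical. A real Flask route would translate the result into a response object.

```python
def build_feed_response(xml, etag, if_none_match=None, max_age=300):
    """Return (status, headers, body) for a feed request.

    Hypothetical helper: the name and return shape are illustrative
    assumptions, not part of the ADR.
    """
    quoted = f'"{etag}"'  # ETag header values are quoted strings
    headers = {
        'ETag': quoted,
        'Cache-Control': f'public, max-age={max_age}',
    }
    # If-None-Match may list several tags, or "*" to match any
    candidates = [t.strip() for t in (if_none_match or '').split(',')]
    # Weak validators (W/"...") still match for conditional GETs
    candidates = [t[2:] if t.startswith('W/') else t for t in candidates]
    if quoted in candidates or '*' in candidates:
        # Client copy is current: send headers only, no body
        return 304, headers, b''
    headers['Content-Type'] = 'application/rss+xml; charset=utf-8'
    return 200, headers, xml.encode('utf-8')


status, headers, body = build_feed_response('<rss/>', 'abc123')
print(status, headers['ETag'])

status, _, body = build_feed_response('<rss/>', 'abc123', if_none_match='"abc123"')
print(status, body)
```

A reader that sends back the ETag it last saw receives an empty 304 response, so an unchanged feed costs almost no bandwidth even after the 5-minute cache expires.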
Alternatives Considered:
- No caching: Generate on every request
  - Rejected: Wasteful, feed generation involves DB + file reads
- Flask-Caching with Redis
  - Rejected: Adds external dependency (Redis)
  - Overkill for single-user system
- File-based cache
  - Rejected: Complicates invalidation, I/O overhead
- Longer cache duration (30+ minutes)
  - Rejected: Notes should appear reasonably quickly
  - 5 minutes balances performance and freshness
4. Note Titles: First Line or Timestamp
Choice: Extract first line (max 100 chars) or use timestamp
Algorithm:
def get_note_title(note):
    """Return the note's first line (max 100 chars), or its timestamp."""
    # Try first line, dropping leading Markdown heading markers
    lines = note.content.strip().split('\n')
    title = lines[0].lstrip('#').strip()
    if title:
        return title[:100]  # Truncate to 100 chars
    # Fall back to timestamp
    return note.created_at.strftime('%B %d, %Y at %I:%M %p')
Rationale:
- Notes (per the IndieWeb spec) are not required to have titles
- First line often serves as implicit title
- Timestamp fallback ensures every item has a title
- 100 char limit prevents overly long titles
- Simple, deterministic algorithm
Alternatives Considered:
- Always use timestamp: Too generic, not descriptive
- Use content hash: Not human-friendly
- Require explicit title: Breaks note simplicity
- Use first sentence: Complex parsing, can be long
- Content preview (first 50 chars): May not be meaningful
5. Date Formatting: RFC-822
Choice: RFC-822 format as required by RSS 2.0 spec
Format: Mon, 18 Nov 2024 12:00:00 +0000
Implementation:
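from datetime import timezone

def format_rfc822_date(dt):
    """Format datetime to RFC-822"""
    # Ensure UTC
    if dt.tzinfo is None:
        # Naive datetimes are assumed to already be in UTC
        dt_utc = dt.replace(tzinfo=timezone.utc)
    else:
        # Aware datetimes are converted, not just relabeled
        dt_utc = dt.astimezone(timezone.utc)
    # RFC-822 format
    return dt_utc.strftime('%a, %d %b %Y %H:%M:%S %z')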
Rationale:
- Required by RSS 2.0 specification
- Standard format recognized by all feed readers
- Python datetime supports formatting
- Always use UTC to avoid timezone confusion
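One caveat with strftime: the %a and %b directives follow the process locale, so a non-English locale could yield day and month names that are not valid in RFC-822 dates. A standard-library alternative is `email.utils.format_datetime`, which always emits English names. The sketch below is an alternative, not the ADR's chosen implementation. The function name `format_rfc822_date_stdlib` is hypothetical, and treating naive datetimes as UTC is an assumption.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def format_rfc822_date_stdlib(dt):
    """Format a datetime as an RFC-822 style date string in UTC."""
    if dt.tzinfo is None:
        # Assumption: naive datetimes are stored as UTC
        dt = dt.replace(tzinfo=timezone.utc)
    else:
        dt = dt.astimezone(timezone.utc)
    return format_datetime(dt)


print(format_rfc822_date_stdlib(datetime(2024, 11, 18, 12, 0, 0)))
# Mon, 18 Nov 2024 12:00:00 +0000

# An aware datetime in UTC-5 is converted to the same instant in UTC
eastern = timezone(timedelta(hours=-5))
print(format_rfc822_date_stdlib(datetime(2024, 11, 18, 7, 0, 0, tzinfo=eastern)))
# Mon, 18 Nov 2024 12:00:00 +0000
```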
Alternatives Considered:
- ISO 8601 format: Used by Atom, not valid for RSS 2.0
- Unix timestamp: Not human-readable, not standard
- Local timezone: Ambiguous, causes parsing issues
6. Feed Item Limit: 50 (Configurable)
Choice: Default limit of 50 items, configurable via FEED_MAX_ITEMS
Rationale:
- 50 items is sufficient for typical use (notes, not articles)
- RSS readers handle 50 items well
- Keeps feed size reasonable (< 100KB typical)
- Configurable for users with different needs
- Balances completeness and performance
Alternatives Considered:
- No limit: Feed could become very large
  - Rejected: Performance issues, large XML
- Limit of 10-20: Too few, users might want more history
- Pagination: Complex, not well-supported by readers
  - Deferred to V2 if needed
- Dynamic limit based on date: Complicated logic
7. Content Inclusion: Full HTML in CDATA
Choice: Include full rendered HTML content in CDATA wrapper
Format:
<description><![CDATA[
<p>Rendered HTML content here</p>
]]></description>
Rationale:
- RSS readers expect HTML in description
- CDATA prevents XML parsing issues
- Already have rendered HTML from markdown
- Provides full context to readers
- Standard practice for content-rich feeds
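The CDATA approach has one edge case. A CDATA section ends at the first `]]>`, so rendered HTML that happens to contain that sequence would close the section early. A common workaround is to split the sequence across two adjacent CDATA sections. The helper below is a sketch with a hypothetical name (`wrap_cdata`). A feed library may already handle escaping itself, in which case this step is unnecessary.

```python
import xml.etree.ElementTree as ET


def wrap_cdata(html):
    """Wrap HTML in a CDATA section, splitting any embedded ']]>'."""
    # End the section after ']]' and reopen it before '>'
    safe = html.replace(']]>', ']]]]><![CDATA[>')
    return f'<![CDATA[{safe}]]>'


html = '<p>Note with a tricky ]]> sequence</p>'
description = f'<description>{wrap_cdata(html)}</description>'

# Round-trip through an XML parser to confirm the content survives intact
assert ET.fromstring(description).text == html
print(description)
```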
Alternatives Considered:
- Plain text only: Loses formatting
- Markdown in description: Not rendered by readers
- Summary/excerpt: Notes are short, full content appropriate
- External link only: Forces reader to leave feed
8. Feed Discovery: Standard Link Element
Choice: Add <link rel="alternate"> to all HTML pages
Implementation:
<link rel="alternate" type="application/rss+xml"
title="Site Name RSS Feed"
href="https://example.com/feed.xml">
Rationale:
- Standard HTML feed discovery mechanism
- RSS readers auto-detect feeds
- IndieWeb recommended practice
- No JavaScript required
- Works in all browsers
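If the discovery tag is ever built in Python rather than in a template, the site name and feed URL should be attribute-escaped. The following is a minimal sketch: `feed_discovery_link` is a hypothetical helper name, and a templating engine with autoescaping would make it unnecessary.

```python
from html import escape


def feed_discovery_link(site_name, feed_url):
    """Build the <link rel="alternate"> tag used for RSS feed discovery."""
    title = escape(f'{site_name} RSS Feed', quote=True)
    href = escape(feed_url, quote=True)
    return (
        '<link rel="alternate" type="application/rss+xml" '
        f'title="{title}" href="{href}">'
    )


print(feed_discovery_link('Site Name', 'https://example.com/feed.xml'))
```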
Alternatives Considered:
- No discovery: Users must know feed URL
  - Rejected: Poor user experience
- JavaScript-based discovery: Unnecessary complexity
- HTTP Link header: Less common, harder to discover
Implementation Details
Module Structure
File: starpunk/feed.py
Functions:
- generate_feed(): Main feed generation
- format_rfc822_date(): Date formatting
- get_note_title(): Title extraction
- clean_html_for_rss(): HTML sanitization
Dependencies: feedgen library (already included)
Route
Path: /feed.xml
Handler: public.feed() in starpunk/routes/public.py
Caching: In-memory cache + ETag + Cache-Control
Configuration
Environment Variables:
- FEED_MAX_ITEMS: Maximum feed items (default: 50)
- FEED_CACHE_SECONDS: Cache duration (default: 300)
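As a rough illustration of how these two variables might be read with their defaults, here is a minimal sketch. The helper name `load_feed_config` is hypothetical, and rejecting non-positive values is an assumption rather than something this ADR specifies.

```python
import os


def load_feed_config(environ=None):
    """Read feed settings from the environment, applying the ADR defaults.

    Hypothetical helper: rejecting non-positive values is an assumption.
    """
    environ = os.environ if environ is None else environ
    defaults = {'FEED_MAX_ITEMS': 50, 'FEED_CACHE_SECONDS': 300}
    config = {}
    for name, default in defaults.items():
        raw = environ.get(name)
        # int() raises ValueError for non-integer strings
        value = default if raw is None else int(raw)
        if value <= 0:
            raise ValueError(f'{name} must be a positive integer, got {value}')
        config[name] = value
    return config


print(load_feed_config({}))
# {'FEED_MAX_ITEMS': 50, 'FEED_CACHE_SECONDS': 300}
print(load_feed_config({'FEED_MAX_ITEMS': '20'}))
# {'FEED_MAX_ITEMS': 20, 'FEED_CACHE_SECONDS': 300}
```

Passing a plain dict instead of `os.environ` keeps the helper easy to exercise in tests.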
Required Channel Elements
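Per RSS 2.0 spec, <title>, <link>, and <description> are mandatory; the remaining elements are optional in the spec but always included in this feed:
- <title>: Site name
- <link>: Site URL
- <description>: Site description
- <language>: en-us
- <lastBuildDate>: Feed generation time
- <atom:link rel="self">: Feed URL (for discovery)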
Required Item Elements
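The RSS 2.0 spec requires only that each item have a <title> or a <description>; this feed always includes all of the following:
- <title>: Note title
- <link>: Note permalink
- <guid isPermaLink="true">: Note permalink
- <pubDate>: Note publication date
- <description>: Full HTML content in CDATA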
Consequences
Positive
- Standard Compliance: Valid RSS 2.0 feeds work everywhere
- Performance: Caching reduces load, fast responses
- Simplicity: Single feed format, straightforward implementation
- Reliability: feedgen library ensures valid XML
- Flexibility: Configurable limits accommodate different needs
- Discovery: Auto-detection in feed readers
- Complete Content: Full HTML in feed, no truncation
Negative
- Single Format: No Atom or JSON Feed in V1
  - Mitigation: Can add in V2 if requested
- Fixed Cache Duration: Not dynamically adjusted
  - Mitigation: 5 minutes is reasonable compromise
- Memory-Based Cache: Lost on restart
  - Mitigation: Acceptable, regenerates quickly
- No Pagination: Large archives not fully accessible
  - Mitigation: 50 items is sufficient for notes
Neutral
- Title Algorithm: May not always produce ideal titles
  - Acceptable: Notes don't require titles, algorithm is reasonable
- UTC Timestamps: Users might prefer local time
  - Standard: UTC is RSS standard practice
Validation
The decision will be validated by:
- W3C Feed Validator: Feed must pass without errors
- Feed Reader Testing: Test in multiple readers (Feedly, NewsBlur, etc.)
- Performance Testing: Feed generation < 100ms uncached
- Caching Testing: Cache reduces load and serves the cached feed correctly until it expires
- Standards Review: RSS 2.0 spec compliance verification
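A lightweight structural check can complement these validation steps, but it does not replace the W3C Feed Validator. The sketch below uses the standard library to parse a feed and report missing channel and item elements. The function name `find_missing_elements`, the element tuples, and the sample URLs are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Elements this check looks for (illustrative, based on the lists in this ADR)
CHANNEL_ELEMENTS = ('title', 'link', 'description')
ITEM_ELEMENTS = ('title', 'link', 'guid', 'pubDate', 'description')


def find_missing_elements(feed_xml):
    """Return a list of structural problems found in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    if root.tag != 'rss' or root.get('version') != '2.0':
        return ['root element is not <rss version="2.0">']
    channel = root.find('channel')
    if channel is None:
        return ['missing <channel>']
    problems = [
        f'channel: missing <{name}>'
        for name in CHANNEL_ELEMENTS
        if channel.find(name) is None
    ]
    for index, item in enumerate(channel.findall('item')):
        problems.extend(
            f'item {index}: missing <{name}>'
            for name in ITEM_ELEMENTS
            if item.find(name) is None
        )
    return problems


sample = """<rss version="2.0"><channel>
  <title>Site</title><link>https://example.com/</link>
  <description>Notes</description>
  <item><title>Hello</title><link>https://example.com/note/1</link></item>
</channel></rss>"""

print(find_missing_elements(sample))
# ['item 0: missing <guid>', 'item 0: missing <pubDate>', 'item 0: missing <description>']
```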
Alternatives Rejected
Use Django Syndication Framework
Reason: Requires Django, which we're not using (Flask project)
Generate RSS Manually with Templates
Reason: Error-prone, hard to maintain, easy to produce invalid XML
Support Multiple Feed Formats in V1
Reason: Adds complexity without clear benefit, RSS 2.0 is sufficient
No Feed Caching
Reason: Wasteful, feed generation involves DB + file I/O
Per-Tag Feeds
Reason: V1 doesn't have tags, defer to V2
WebSub (PubSubHubbub) Support
Reason: Adds complexity, external dependency, not essential for V1
ADR: 014
Status: Accepted
Date: 2025-11-18
Author: StarPunk Architect
Related: ADR-002 (Flask Extensions), Phase 5 Design