StarPunk/docs/decisions/ADR-014-rss-feed-implementation.md
Phil Skentelbery 6863bcae67 docs: add Phase 5 design and architectural review documentation
- Add ADR-014: RSS Feed Implementation
- Add ADR-015: Phase 5 Implementation Approach
- Add Phase 5 design documents (RSS and container)
- Add pre-implementation review
- Add RSS and container validation reports
- Add architectural approval for v0.6.0 release

Architecture reviews confirm 98/100 (RSS) and 96/100 (container) scores.
Phase 5 approved for production deployment.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-19 10:30:55 -07:00


# ADR-014: RSS Feed Implementation Strategy
## Status
Accepted
## Context
Phase 5 requires implementing RSS feed generation for syndicating published notes. We need to decide on the implementation approach, feed format, caching strategy, and technical details for generating a standards-compliant RSS feed.
### Requirements
1. **Standard Compliance**: Feed must be valid RSS 2.0
2. **Content Inclusion**: Include all published notes (up to configured limit)
3. **Performance**: Feed generation should be fast and cacheable
4. **Simplicity**: Minimal dependencies, straightforward implementation
5. **IndieWeb Friendly**: Support feed discovery and proper metadata
### Key Questions
1. Which feed format(s) should we support?
2. How should we generate the RSS XML?
3. What caching strategy should we use?
4. How should we handle note titles (notes may not have explicit titles)?
5. How should we format dates for RSS?
6. What should the feed item limit be?
## Decision
### 1. Feed Format: RSS 2.0 Only (V1)
**Choice**: Implement RSS 2.0 exclusively for V1
**Rationale**:
- RSS 2.0 is supported by virtually every feed reader
- Simpler than Atom (fewer required elements)
- Sufficient for V1 needs (notes syndication)
- feedgen library handles RSS 2.0 well
- Defer Atom and JSON Feed to V2+
**Alternatives Considered**:
- **Atom 1.0**: More modern, better extensibility
  - Rejected: More complex, not needed for basic notes
  - May add in V2
- **JSON Feed**: Developer-friendly format
  - Rejected: Less universal support, not essential
  - May add in V2
- **Multiple formats**: Support RSS + Atom + JSON
  - Rejected: Adds complexity, not justified for V1
  - Single format keeps implementation simple
### 2. XML Generation: feedgen Library
**Choice**: Use feedgen library (already in dependencies)
**Rationale**:
- Already a dependency (listed in the architecture overview)
- Handles RSS/Atom generation correctly
- Produces valid, compliant XML
- Saves time vs. manual XML generation
- Well-maintained, stable library
**Alternatives Considered**:
- **Manual XML generation** (ElementTree or string templates)
  - Rejected: Error-prone, easy to produce invalid XML
  - Would need extensive validation
- **PyRSS2Gen library**
  - Rejected: Last updated 2007, unmaintained
- **Django Syndication Framework**
  - Rejected: Requires Django, too heavyweight
### 3. Feed Caching Strategy: Simple In-Memory Cache
**Choice**: 5-minute in-memory cache with ETag support
**Implementation**:
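```python
import hashlib
import time

FEED_CACHE_SECONDS = 300  # Cache for 5 minutes (configurable)

_feed_cache = {
    'xml': None,
    'timestamp': None,
    'etag': None,
}

def get_cached_feed(generate_feed):
    """Illustrative sketch: return (xml, etag), regenerating only when stale."""
    now = time.monotonic()
    fresh = (_feed_cache['timestamp'] is not None
             and now - _feed_cache['timestamp'] < FEED_CACHE_SECONDS)
    if not fresh:
        # Cache is empty or expired: generate a fresh feed and update the cache
        xml = generate_feed()
        _feed_cache['xml'] = xml
        _feed_cache['timestamp'] = now
        # New ETag derived from the XML body, so it changes only when content does
        _feed_cache['etag'] = hashlib.sha256(xml.encode('utf-8')).hexdigest()
    return _feed_cache['xml'], _feed_cache['etag']
```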
**Rationale**:
- 5 minutes is an acceptable delay for note updates
- RSS readers typically poll every 15-60 minutes
- In-memory cache is simple (no external dependencies)
- ETag enables conditional requests
- Cache-Control header enables client-side caching
- Low complexity, easy to implement
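The ETag and `Cache-Control` points above amount to answering conditional requests with `304 Not Modified`. A framework-agnostic sketch of that logic follows; the function name `feed_response` and the `public, max-age` policy are assumptions, not part of this ADR. In the Flask route, the raw header would come from `request.headers.get('If-None-Match')`.

```python
def feed_response(xml, etag, if_none_match=None, max_age=300):
    """Build (status, headers, body) for a feed request.

    etag is the unquoted tag value; if_none_match is the raw If-None-Match
    request header, or None when the client did not send one.
    """
    quoted = f'"{etag}"'
    headers = {'ETag': quoted, 'Cache-Control': f'public, max-age={max_age}'}
    if if_none_match:
        # If-None-Match uses weak comparison, so any W/ prefix is ignored
        tags = [t.strip().removeprefix('W/') for t in if_none_match.split(',')]
        if '*' in tags or quoted in tags:
            # The client's copy is current: send no body
            return 304, headers, ''
    headers['Content-Type'] = 'application/rss+xml; charset=utf-8'
    return 200, headers, xml
```

Because the ETag is tied to the cached XML, it stays stable for the whole cache window, so readers that poll repeatedly mostly receive empty `304` responses.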
**Alternatives Considered**:
- **No caching**: Generate on every request
  - Rejected: Wasteful, feed generation involves DB + file reads
- **Flask-Caching with Redis**
  - Rejected: Adds external dependency (Redis)
  - Overkill for single-user system
- **File-based cache**
  - Rejected: Complicates invalidation, I/O overhead
- **Longer cache duration** (30+ minutes)
  - Rejected: Notes should appear reasonably quickly
  - 5 minutes balances performance and freshness
### 4. Note Titles: First Line or Timestamp
**Choice**: Extract first line (max 100 chars) or use timestamp
**Algorithm**:
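```python
def get_note_title(note):
    """Return a feed item title: the note's first line, or its timestamp."""
    # Try the first line, dropping leading Markdown heading markers ('#')
    lines = note.content.strip().split('\n')
    if lines:
        title = lines[0].lstrip('#').strip()
        if title:
            return title[:100]  # Truncate to 100 chars
    # Fall back to the creation timestamp, e.g. "November 18, 2024 at 12:00 PM"
    return note.created_at.strftime('%B %d, %Y at %I:%M %p')
```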
**Rationale**:
- Notes (per IndieWeb spec) don't have required titles
- First line often serves as implicit title
- Timestamp fallback ensures every item has title
- 100 char limit prevents overly long titles
- Simple, deterministic algorithm
**Alternatives Considered**:
- **Always use timestamp**: Too generic, not descriptive
- **Use content hash**: Not human-friendly
- **Require explicit title**: Breaks note simplicity
- **Use first sentence**: Complex parsing, can be long
- **Content preview (first 50 chars)**: May not be meaningful
### 5. Date Formatting: RFC-822
**Choice**: RFC-822 format as required by RSS 2.0 spec
**Format**: `Mon, 18 Nov 2024 12:00:00 +0000`
**Implementation**:
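```python
from datetime import timezone

def format_rfc822_date(dt):
    """Format a datetime as an RFC-822 date string in UTC."""
    # Naive datetimes are assumed to already be UTC; aware ones are converted
    if dt.tzinfo is None:
        dt_utc = dt.replace(tzinfo=timezone.utc)
    else:
        dt_utc = dt.astimezone(timezone.utc)
    # RFC-822 format; %a and %b follow the process locale (assumes English/C)
    return dt_utc.strftime('%a, %d %b %Y %H:%M:%S %z')
```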
**Rationale**:
- Required by RSS 2.0 specification
- Standard format recognized by all feed readers
- Python's `datetime.strftime` can produce this format directly
- Always use UTC to avoid timezone confusion
**Alternatives Considered**:
- **ISO 8601 format**: Used by Atom, not valid for RSS 2.0
- **Unix timestamp**: Not human-readable, not standard
- **Local timezone**: Ambiguous, causes parsing issues
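One caveat with `strftime`: `%a` and `%b` follow the process locale, so a server running under a non-English locale would emit day and month names that RSS parsers do not recognize. A locale-independent sketch uses the standard library's `email.utils.format_datetime`, which always writes English names. It keeps the same assumption as above that naive datetimes are already UTC.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def format_rfc822_date(dt):
    """Locale-independent RFC-822 style date for an RSS <pubDate>."""
    if dt.tzinfo is None:
        # Assumption: naive datetimes from the database are already UTC
        dt = dt.replace(tzinfo=timezone.utc)
    return format_datetime(dt.astimezone(timezone.utc))


print(format_rfc822_date(datetime(2024, 11, 18, 12, 0, 0)))
# Mon, 18 Nov 2024 12:00:00 +0000
```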
### 6. Feed Item Limit: 50 (Configurable)
**Choice**: Default limit of 50 items, configurable via FEED_MAX_ITEMS
**Rationale**:
- 50 items is sufficient for typical use (notes, not articles)
- RSS readers handle 50 items well
- Keeps feed size reasonable (< 100KB typical)
- Configurable for users with different needs
- Balances completeness and performance
**Alternatives Considered**:
- **No limit**: Feed could become very large
  - Rejected: Performance issues, large XML
- **Limit of 10-20**: Too few, users might want more history
- **Pagination**: Complex, not well-supported by readers
  - Deferred to V2 if needed
- **Dynamic limit based on date**: Complicated logic
### 7. Content Inclusion: Full HTML in CDATA
**Choice**: Include full rendered HTML content in CDATA wrapper
**Format**:
```xml
<description><![CDATA[
<p>Rendered HTML content here</p>
]]></description>
```
**Rationale**:
- RSS readers expect HTML in description
- CDATA lets the HTML be embedded without escaping every `<` and `&` as an XML entity
- Already have rendered HTML from markdown
- Provides full context to readers
- Standard practice for content-rich feeds
**Alternatives Considered**:
- **Plain text only**: Loses formatting
- **Markdown in description**: Not rendered by readers
- **Summary/excerpt**: Notes are short, full content appropriate
- **External link only**: Forces reader to leave feed
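CDATA has one edge case: the wrapped content must not contain the sequence `]]>`, because that closes the section early. If whichever component writes the wrapper does so by hand, the standard workaround is to split the sequence across two adjacent CDATA sections. A minimal sketch, with a hypothetical helper name `wrap_cdata`:

```python
import xml.etree.ElementTree as ET


def wrap_cdata(html):
    """Wrap HTML in a CDATA section that survives a literal ']]>' in the content."""
    # ']]>' would end the section early, so split it across two sections:
    # the first section ends with ']]', the next one starts with '>'
    return '<![CDATA[' + html.replace(']]>', ']]]]><![CDATA[>') + ']]>'


# Round-trip check: an XML parser recovers the original HTML unchanged
html = '<p>Use ]]> with care</p>'
doc = '<description>' + wrap_cdata(html) + '</description>'
assert ET.fromstring(doc).text == html
```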
### 8. Feed Discovery: Standard Link Element
**Choice**: Add `<link rel="alternate">` to all HTML pages
**Implementation**:
```html
<link rel="alternate" type="application/rss+xml"
title="Site Name RSS Feed"
href="https://example.com/feed.xml">
```
**Rationale**:
- Standard HTML feed discovery mechanism
- RSS readers auto-detect feeds
- IndieWeb recommended practice
- No JavaScript required
- Works in all browsers
**Alternatives Considered**:
- **No discovery**: Users must know feed URL
  - Rejected: Poor user experience
- **JavaScript-based discovery**: Unnecessary complexity
- **HTTP Link header**: Less common, harder to discover
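To make the discovery mechanism concrete, this is roughly what a feed reader does with the `<link rel="alternate">` element: scan the page's `<link>` tags and resolve the `href` of any that advertise a feed type. A standard-library sketch; the names `FeedLinkFinder` and `discover_feeds` are hypothetical and not part of StarPunk.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_TYPES = {'application/rss+xml', 'application/atom+xml'}


class FeedLinkFinder(HTMLParser):
    """Collect feed URLs advertised with <link rel="alternate" type="...">."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != 'link':
            return
        a = dict(attrs)
        # rel may hold several space-separated values; values can be None
        rels = (a.get('rel') or '').lower().split()
        is_feed = (a.get('type') or '').lower() in FEED_TYPES
        if 'alternate' in rels and is_feed and a.get('href'):
            # Resolve relative hrefs against the page URL
            self.feeds.append(urljoin(self.base_url, a['href']))


def discover_feeds(html, base_url):
    finder = FeedLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.feeds
```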
## Implementation Details
### Module Structure
**File**: `starpunk/feed.py`
**Functions**:
1. `generate_feed()` - Main feed generation
2. `format_rfc822_date()` - Date formatting
3. `get_note_title()` - Title extraction
4. `clean_html_for_rss()` - HTML sanitization
**Dependencies**: feedgen library (already included)
### Route
**Path**: `/feed.xml`
**Handler**: `public.feed()` in `starpunk/routes/public.py`
**Caching**: In-memory cache + ETag + Cache-Control
### Configuration
**Environment Variables**:
- `FEED_MAX_ITEMS` - Maximum feed items (default: 50)
- `FEED_CACHE_SECONDS` - Cache duration (default: 300)
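A minimal sketch of reading these two variables with their documented defaults. The function name `load_feed_config` and the rule that both values must be positive integers are assumptions for illustration, not requirements stated in this ADR.

```python
import os


def load_feed_config(environ=os.environ):
    """Read feed settings from the environment, using the documented defaults."""

    def positive_int(name, default):
        raw = environ.get(name, '').strip()
        if not raw:
            return default  # unset or empty: fall back to the default
        value = int(raw)  # raises ValueError for non-numeric input
        if value <= 0:
            raise ValueError(f'{name} must be a positive integer, got {raw!r}')
        return value

    return {
        'FEED_MAX_ITEMS': positive_int('FEED_MAX_ITEMS', 50),
        'FEED_CACHE_SECONDS': positive_int('FEED_CACHE_SECONDS', 300),
    }
```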
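### Channel Elements
The RSS 2.0 spec requires `<title>`, `<link>`, and `<description>`; the remaining elements are optional but included:
- `<title>` - Site name (required)
- `<link>` - Site URL (required)
- `<description>` - Site description (required)
- `<language>` - en-us
- `<lastBuildDate>` - Feed generation time
- `<atom:link rel="self">` - The feed's own URL (recommended by the W3C Feed Validator)
### Item Elements
RSS 2.0 makes every item element optional, provided each item has at least a `<title>` or a `<description>`. StarPunk includes all of the following:
- `<title>` - Note title
- `<link>` - Note permalink
- `<guid isPermaLink="true">` - Note permalink
- `<pubDate>` - Note publication date (RFC-822)
- `<description>` - Full HTML content in CDATA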
## Consequences
### Positive
1. **Standard Compliance**: Valid RSS 2.0 feeds work everywhere
2. **Performance**: Caching reduces load, fast responses
3. **Simplicity**: Single feed format, straightforward implementation
4. **Reliability**: feedgen library ensures valid XML
5. **Flexibility**: Configurable limits accommodate different needs
6. **Discovery**: Auto-detection in feed readers
7. **Complete Content**: Full HTML in feed, no truncation
### Negative
1. **Single Format**: No Atom or JSON Feed in V1
   - Mitigation: Can add in V2 if requested
2. **Fixed Cache Duration**: Not dynamically adjusted
   - Mitigation: 5 minutes is a reasonable compromise
3. **Memory-Based Cache**: Lost on restart
   - Mitigation: Acceptable, regenerates quickly
4. **No Pagination**: Large archives not fully accessible
   - Mitigation: 50 items is sufficient for notes
### Neutral
1. **Title Algorithm**: May not always produce ideal titles
   - Acceptable: Notes don't require titles, algorithm is reasonable
2. **UTC Timestamps**: Users might prefer local time
   - Acceptable: UTC is common practice, and most feed readers display dates in the reader's local time
## Validation
The decision will be validated by:
1. **W3C Feed Validator**: Feed must pass without errors
2. **Feed Reader Testing**: Test in multiple readers (Feedly, NewsBlur, etc.)
3. **Performance Testing**: Feed generation < 100ms uncached
4. **Caching Testing**: Repeat requests within the cache window are served from the cache, and the feed is regenerated once the window expires
5. **Standards Review**: RSS 2.0 spec compliance verification
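Part of the standards review can be automated. The sketch below uses only the standard library to parse a generated feed and flag missing channel elements (RSS 2.0 requires `<title>`, `<link>` and `<description>` on the channel), items with neither a `<title>` nor a `<description>`, and `<pubDate>` values that do not parse as RFC-822 dates. The `check_feed` name is hypothetical, and this is a smoke test, not a substitute for the W3C Feed Validator.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

REQUIRED_CHANNEL = ('title', 'link', 'description')


def check_feed(xml_text):
    """Return a list of problems found in an RSS 2.0 document (empty if none)."""
    problems = []
    root = ET.fromstring(xml_text)
    if root.tag != 'rss' or root.get('version') != '2.0':
        problems.append('root element is not <rss version="2.0">')
    channel = root.find('channel')
    if channel is None:
        return problems + ['missing <channel>']
    for name in REQUIRED_CHANNEL:
        if not (channel.findtext(name) or '').strip():
            problems.append(f'channel missing <{name}>')
    for i, item in enumerate(channel.findall('item')):
        if item.find('title') is None and item.find('description') is None:
            problems.append(f'item {i} has neither <title> nor <description>')
        pub = item.findtext('pubDate')
        if pub is not None:
            try:
                parsedate_to_datetime(pub)
            except (TypeError, ValueError):  # older Pythons raise TypeError
                problems.append(f'item {i} has unparseable <pubDate>')
    return problems
```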
## Alternatives Rejected
### Use Django Syndication Framework
**Reason**: Requires Django, which we're not using (Flask project)
### Generate RSS Manually with Templates
**Reason**: Error-prone, hard to maintain, easy to produce invalid XML
### Support Multiple Feed Formats in V1
**Reason**: Adds complexity without clear benefit, RSS 2.0 is sufficient
### No Feed Caching
**Reason**: Wasteful, feed generation involves DB + file I/O
### Per-Tag Feeds
**Reason**: V1 doesn't have tags, defer to V2
### WebSub (PubSubHubbub) Support
**Reason**: Adds complexity, external dependency, not essential for V1
## References
### Standards
- [RSS 2.0 Specification](https://www.rssboard.org/rss-specification)
- [RFC-822 Date Format](https://www.rfc-editor.org/rfc/rfc822)
- [W3C Feed Validator](https://validator.w3.org/feed/)
### Libraries
- [feedgen Documentation](https://feedgen.kiesow.be/)
- [Python datetime Documentation](https://docs.python.org/3/library/datetime.html)
### IndieWeb
- [IndieWeb RSS](https://indieweb.org/RSS)
- [Feed Discovery](https://indieweb.org/feed_discovery)
### Internal Documentation
- [Architecture Overview](../architecture/overview.md)
- [Phase 5 Design](../designs/phase-5-rss-and-container.md)
---
**ADR**: 014
**Status**: Accepted
**Date**: 2025-11-18
**Author**: StarPunk Architect
**Related**: ADR-002 (Flask Extensions), Phase 5 Design