# ADR-014: RSS Feed Implementation Strategy

## Status

Accepted

## Context

Phase 5 requires implementing RSS feed generation for syndicating published notes. We need to decide on the implementation approach, feed format, caching strategy, and technical details for generating a standards-compliant RSS feed.

### Requirements

1. **Standard Compliance**: Feed must be valid RSS 2.0
2. **Content Inclusion**: Include all published notes (up to configured limit)
3. **Performance**: Feed generation should be fast and cacheable
4. **Simplicity**: Minimal dependencies, straightforward implementation
5. **IndieWeb Friendly**: Support feed discovery and proper metadata

### Key Questions

1. Which feed format(s) should we support?
2. How should we generate the RSS XML?
3. What caching strategy should we use?
4. How should we handle note titles (notes may not have explicit titles)?
5. How should we format dates for RSS?
6. What should the feed item limit be?

## Decision

### 1. Feed Format: RSS 2.0 Only (V1)

**Choice**: Implement RSS 2.0 exclusively for V1

**Rationale**:
- RSS 2.0 is supported by virtually all feed readers
- Simpler than Atom (fewer required elements)
- Sufficient for V1 needs (notes syndication)
- feedgen library handles RSS 2.0 well
- Defer Atom and JSON Feed to V2+

**Alternatives Considered**:
- **Atom 1.0**: More modern, better extensibility
  - Rejected: More complex, not needed for basic notes
  - May add in V2
- **JSON Feed**: Developer-friendly format
  - Rejected: Less universal support, not essential
  - May add in V2
- **Multiple formats**: Support RSS + Atom + JSON
  - Rejected: Adds complexity, not justified for V1
  - Single format keeps implementation simple

### 2. XML Generation: feedgen Library

**Choice**: Use feedgen library (already in dependencies)

**Rationale**:
- Already a dependency (referenced in the architecture overview)
- Handles RSS/Atom generation correctly
- Produces valid, compliant XML
- Saves time vs. manual XML generation
- Well-maintained, stable library

**Alternatives Considered**:
- **Manual XML generation** (ElementTree or string templates)
  - Rejected: Error-prone, easy to produce invalid XML
  - Would need extensive validation
- **PyRSS2Gen library**
  - Rejected: Last updated 2007, unmaintained
- **Django Syndication Framework**
  - Rejected: Requires Django, too heavyweight

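### 3. Feed Caching Strategy: Simple In-Memory Cache

**Choice**: 5-minute in-memory cache with ETag support

**Implementation**:
```python
import hashlib
import time

CACHE_SECONDS = 300  # Cache for 5 minutes

_feed_cache = {
    'xml': None,
    'timestamp': None,
    'etag': None
}


def get_cached_feed(generate):
    """Sketch: return (xml, etag); `generate` is any callable producing feed XML."""
    now = time.time()
    cached_at = _feed_cache['timestamp']
    if cached_at is not None and now - cached_at < CACHE_SECONDS:
        # Cache is fresh: return cached XML with its ETag
        return _feed_cache['xml'], _feed_cache['etag']

    # Cache is stale or empty: generate fresh feed and update cache.
    # Deriving the ETag from a content hash is an assumption, not part of this ADR.
    xml = generate()
    etag = hashlib.sha256(xml.encode('utf-8')).hexdigest()
    _feed_cache.update(xml=xml, timestamp=now, etag=etag)
    return xml, etag
```
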
**Rationale**:
- 5 minutes is acceptable delay for note updates
- RSS readers typically poll every 15-60 minutes
- In-memory cache is simple (no external dependencies)
- ETag enables conditional requests
- Cache-Control header enables client-side caching
- Low complexity, easy to implement

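
The ETag and Cache-Control points above can be illustrated with a framework-independent sketch of conditional request handling. The function name `conditional_response`, its parameters, and the `public, max-age` policy are assumptions made for illustration; the real `/feed.xml` handler would wire equivalent logic into Flask.

```python
def conditional_response(if_none_match, xml, etag, max_age=300):
    """Return (status, headers, body) for a feed request.

    `if_none_match` is the raw If-None-Match request header, or None.
    Hypothetical helper: names and header policy are illustrative only.
    """
    quoted = f'"{etag}"'
    headers = {
        'ETag': quoted,
        'Cache-Control': f'public, max-age={max_age}',
    }
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(',')]
        # Weak validators (W/"...") match under weak comparison
        candidates = [t[2:] if t.startswith('W/') else t for t in candidates]
        if '*' in candidates or quoted in candidates:
            # Client already holds the current feed: send no body
            return 304, headers, b''
    headers['Content-Type'] = 'application/rss+xml; charset=utf-8'
    return 200, headers, xml.encode('utf-8')
```

A reader that sends back the ETag it received gets a 304 with an empty body, so a poll costs only headers even after the client's own cache has expired.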
**Alternatives Considered**:
- **No caching**: Generate on every request
  - Rejected: Wasteful, feed generation involves DB + file reads
- **Flask-Caching with Redis**
  - Rejected: Adds external dependency (Redis)
  - Overkill for single-user system
- **File-based cache**
  - Rejected: Complicates invalidation, I/O overhead
- **Longer cache duration** (30+ minutes)
  - Rejected: Notes should appear reasonably quickly
  - 5 minutes balances performance and freshness

### 4. Note Titles: First Line or Timestamp

**Choice**: Extract first line (max 100 chars) or use timestamp

**Algorithm**:
```python
def get_note_title(note):
    """Return the note's first line (max 100 chars), else a timestamp."""
    # Try first line, dropping only leading Markdown heading markers
    # (lstrip, so a title ending in '#' such as "Learning C#" stays intact)
    lines = note.content.strip().split('\n')
    title = lines[0].lstrip('#').strip()
    if title:
        return title[:100]  # Truncate to 100 chars

    # Fall back to timestamp
    return note.created_at.strftime('%B %d, %Y at %I:%M %p')
```

**Rationale**:
- Per IndieWeb convention, notes do not require titles
- First line often serves as implicit title
- Timestamp fallback ensures every item has a title
- 100 char limit prevents overly long titles
- Simple, deterministic algorithm

**Alternatives Considered**:
- **Always use timestamp**: Too generic, not descriptive
- **Use content hash**: Not human-friendly
- **Require explicit title**: Breaks note simplicity
- **Use first sentence**: Complex parsing, can be long
- **Content preview (first 50 chars)**: May not be meaningful

### 5. Date Formatting: RFC-822

**Choice**: RFC-822 format as required by RSS 2.0 spec

**Format**: `Mon, 18 Nov 2024 12:00:00 +0000`

**Implementation**:
```python
from datetime import timezone


def format_rfc822_date(dt):
    """Format datetime to RFC-822"""
    # Ensure UTC: treat naive datetimes as UTC, convert aware ones
    if dt.tzinfo is None:
        dt_utc = dt.replace(tzinfo=timezone.utc)
    else:
        dt_utc = dt.astimezone(timezone.utc)
    # RFC-822 format
    return dt_utc.strftime('%a, %d %b %Y %H:%M:%S %z')
```

**Rationale**:
- Required by RSS 2.0 specification
- Standard format recognized by all feed readers
- Python datetime supports formatting
- Always use UTC to avoid timezone confusion

**Alternatives Considered**:
- **ISO 8601 format**: Used by Atom, not valid for RSS 2.0
- **Unix timestamp**: Not human-readable, not standard
- **Local timezone**: Ambiguous, causes parsing issues

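
One caveat with `strftime` is that `%a` and `%b` follow the process locale, so a non-English locale could emit day and month names that feed readers reject. A possible alternative, sketched here under the assumption that naive datetimes are already in UTC, is the standard library's `email.utils.format_datetime`, which always uses English abbreviations. The function name `format_rfc822_date_stdlib` is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def format_rfc822_date_stdlib(dt):
    """Hypothetical locale-independent variant of format_rfc822_date."""
    if dt.tzinfo is None:
        # Assumption: naive datetimes are stored in UTC
        dt = dt.replace(tzinfo=timezone.utc)
    # Convert aware datetimes to UTC, then format with a numeric offset
    return format_datetime(dt.astimezone(timezone.utc))


print(format_rfc822_date_stdlib(datetime(2024, 11, 18, 12, 0, 0)))
# Mon, 18 Nov 2024 12:00:00 +0000
```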
### 6. Feed Item Limit: 50 (Configurable)

**Choice**: Default limit of 50 items, configurable via FEED_MAX_ITEMS

**Rationale**:
- 50 items is sufficient for typical use (notes, not articles)
- RSS readers handle 50 items well
- Keeps feed size reasonable (typically under 100 KB)
- Configurable for users with different needs
- Balances completeness and performance

**Alternatives Considered**:
- **No limit**: Feed could become very large
  - Rejected: Performance issues, large XML
- **Limit of 10-20**: Too few, users might want more history
- **Pagination**: Complex, not well-supported by readers
  - Deferred to V2 if needed
- **Dynamic limit based on date**: Complicated logic

### 7. Content Inclusion: Full HTML in CDATA

**Choice**: Include full rendered HTML content in CDATA wrapper

**Format**:
```xml
<description><![CDATA[
  <p>Rendered HTML content here</p>
]]></description>
```

**Rationale**:
- RSS readers expect HTML in description
- CDATA prevents XML parsing issues
- Already have rendered HTML from markdown
- Provides full context to readers
- Standard practice for content-rich feeds

**Alternatives Considered**:
- **Plain text only**: Loses formatting
- **Markdown in description**: Not rendered by readers
- **Summary/excerpt**: Notes are short, full content appropriate
- **External link only**: Forces reader to leave feed

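
One edge case with CDATA is that the sequence `]]>` inside the content closes the section early. When a library serializes the feed, it is responsible for this; the sketch below only illustrates the pitfall for hand-written output, and `wrap_cdata` is a hypothetical helper rather than part of the planned module.

```python
import xml.etree.ElementTree as ET


def wrap_cdata(html):
    """Wrap HTML in a CDATA section, splitting any embedded ']]>'.

    The terminator is split across two adjacent CDATA sections so that
    a parser reassembles the original text unchanged.
    """
    return '<![CDATA[' + html.replace(']]>', ']]]]><![CDATA[>') + ']]>'


# Round trip through a parser to confirm the content survives intact
content = '<p>Edge case: ]]> inside a note</p>'
element = ET.fromstring('<description>' + wrap_cdata(content) + '</description>')
assert element.text == content
```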
### 8. Feed Discovery: Standard Link Element

**Choice**: Add `<link rel="alternate">` to all HTML pages

**Implementation**:
```html
<link rel="alternate" type="application/rss+xml"
      title="Site Name RSS Feed"
      href="https://example.com/feed.xml">
```

**Rationale**:
- Standard HTML feed discovery mechanism
- RSS readers auto-detect feeds
- IndieWeb recommended practice
- No JavaScript required
- Works in all browsers

**Alternatives Considered**:
- **No discovery**: Users must know feed URL
  - Rejected: Poor user experience
- **JavaScript-based discovery**: Unnecessary complexity
- **HTTP Link header**: Less common, harder to discover

## Implementation Details

### Module Structure

**File**: `starpunk/feed.py`

**Functions**:
1. `generate_feed()` - Main feed generation
2. `format_rfc822_date()` - Date formatting
3. `get_note_title()` - Title extraction
4. `clean_html_for_rss()` - HTML sanitization

**Dependencies**: feedgen library (already included)

### Route

**Path**: `/feed.xml`

**Handler**: `public.feed()` in `starpunk/routes/public.py`

**Caching**: In-memory cache + ETag + Cache-Control

### Configuration

**Environment Variables**:
- `FEED_MAX_ITEMS` - Maximum feed items (default: 50)
- `FEED_CACHE_SECONDS` - Cache duration (default: 300)

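
A minimal sketch of reading these two variables with their documented defaults follows. The helper name `load_feed_config` and the choice to reject non-positive values are assumptions; the project's actual configuration loading may differ.

```python
import os


def load_feed_config(env=None):
    """Hypothetical helper: read feed settings, falling back to defaults."""
    env = os.environ if env is None else env

    def positive_int(name, default):
        raw = env.get(name)
        if raw is None or raw.strip() == '':
            return default
        value = int(raw)  # raises ValueError on non-numeric input
        if value <= 0:
            raise ValueError(f'{name} must be a positive integer, got {raw!r}')
        return value

    return {
        'FEED_MAX_ITEMS': positive_int('FEED_MAX_ITEMS', 50),
        'FEED_CACHE_SECONDS': positive_int('FEED_CACHE_SECONDS', 300),
    }
```

Passing a plain dictionary as `env` keeps the helper easy to test without touching the real environment.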
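### Required Channel Elements

The RSS 2.0 spec requires only `<title>`, `<link>`, and `<description>` at the channel level; the remaining elements are optional but always included in this feed:
- `<title>` - Site name (required by spec)
- `<link>` - Site URL (required by spec)
- `<description>` - Site description (required by spec)
- `<language>` - en-us (optional)
- `<lastBuildDate>` - Feed generation time (optional)
- `<atom:link rel="self">` - Feed URL (Atom namespace extension that identifies the feed's own location)

### Required Item Elements

The RSS 2.0 spec makes every item element optional, provided at least one of `<title>` or `<description>` is present. This feed always emits:
- `<title>` - Note title
- `<link>` - Note permalink
- `<guid isPermaLink="true">` - Note permalink
- `<pubDate>` - Note publication date
- `<description>` - Full HTML content in CDATA
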
## Consequences

### Positive

1. **Standard Compliance**: Valid RSS 2.0 feeds work everywhere
2. **Performance**: Caching reduces load, fast responses
3. **Simplicity**: Single feed format, straightforward implementation
4. **Reliability**: feedgen library ensures valid XML
5. **Flexibility**: Configurable limits accommodate different needs
6. **Discovery**: Auto-detection in feed readers
7. **Complete Content**: Full HTML in feed, no truncation

### Negative

1. **Single Format**: No Atom or JSON Feed in V1
   - Mitigation: Can add in V2 if requested
2. **Fixed Cache Duration**: Not dynamically adjusted
   - Mitigation: 5 minutes is reasonable compromise
3. **Memory-Based Cache**: Lost on restart
   - Mitigation: Acceptable, regenerates quickly
4. **No Pagination**: Large archives not fully accessible
   - Mitigation: 50 items is sufficient for notes

### Neutral

1. **Title Algorithm**: May not always produce ideal titles
   - Acceptable: Notes don't require titles, algorithm is reasonable
2. **UTC Timestamps**: Users might prefer local time
   - Standard: UTC is RSS standard practice

## Validation

The decision will be validated by:

1. **W3C Feed Validator**: Feed must pass without errors
2. **Feed Reader Testing**: Test in multiple readers (Feedly, NewsBlur, etc.)
3. **Performance Testing**: Uncached feed generation completes in under 100 ms
4. **Caching Testing**: Cache reduces load and serves cached responses correctly until expiry
5. **Standards Review**: RSS 2.0 spec compliance verification

## Alternatives Rejected

### Use Django Syndication Framework

**Reason**: Requires Django, which we're not using (Flask project)

### Generate RSS Manually with Templates

**Reason**: Error-prone, hard to maintain, easy to produce invalid XML

### Support Multiple Feed Formats in V1

**Reason**: Adds complexity without clear benefit, RSS 2.0 is sufficient

### No Feed Caching

**Reason**: Wasteful, feed generation involves DB + file I/O

### Per-Tag Feeds

**Reason**: V1 doesn't have tags, defer to V2

### WebSub (PubSubHubbub) Support

**Reason**: Adds complexity, external dependency, not essential for V1

## References

### Standards
- [RSS 2.0 Specification](https://www.rssboard.org/rss-specification)
- [RFC-822 Date Format](https://www.rfc-editor.org/rfc/rfc822)
- [W3C Feed Validator](https://validator.w3.org/feed/)

### Libraries
- [feedgen Documentation](https://feedgen.kiesow.be/)
- [Python datetime Documentation](https://docs.python.org/3/library/datetime.html)

### IndieWeb
- [IndieWeb RSS](https://indieweb.org/RSS)
- [Feed Discovery](https://indieweb.org/feed_discovery)

### Internal Documentation
- [Architecture Overview](/home/phil/Projects/starpunk/docs/architecture/overview.md)
- [Phase 5 Design](/home/phil/Projects/starpunk/docs/designs/phase-5-rss-and-container.md)

---

**ADR**: 014
**Status**: Accepted
**Date**: 2025-11-18
**Author**: StarPunk Architect
**Related**: ADR-002 (Flask Extensions), Phase 5 Design