feat: Implement Phase 2 Feed Formats - ATOM, JSON Feed, RSS fix (Phases 2.0-2.3)
This commit implements the first three phases of v1.1.2 Phase 2 Feed Formats, adding ATOM 1.0 and JSON Feed 1.1 support alongside the existing RSS feed.

CRITICAL BUG FIX:
- Fixed RSS streaming feed ordering (was showing oldest-first instead of newest-first)
- Streaming RSS: removed incorrect reversed() call at line 198
- Feedgen RSS: kept correct reversed() to compensate for library behavior

NEW FEATURES:
- ATOM 1.0 feed generation (RFC 4287 compliant)
  - Proper XML namespacing and RFC 3339 dates
  - Streaming and non-streaming methods
  - 11 comprehensive tests
- JSON Feed 1.1 generation (JSON Feed spec compliant)
  - RFC 3339 dates and UTF-8 JSON output
  - Custom _starpunk extension with permalink_path and word_count
  - 13 comprehensive tests

REFACTORING:
- Restructured feed code into starpunk/feeds/ module
  - feeds/rss.py - RSS 2.0 (moved from feed.py)
  - feeds/atom.py - ATOM 1.0 (new)
  - feeds/json_feed.py - JSON Feed 1.1 (new)
- Backward compatible feed.py shim for existing imports
- Business metrics integrated into all feed generators

TESTING:
- Created shared test helper tests/helpers/feed_ordering.py
- Helper validates newest-first ordering across all formats
- 48 total feed tests, all passing
  - RSS: 24 tests
  - ATOM: 11 tests
  - JSON Feed: 13 tests

FILES CHANGED:
- Modified: starpunk/feed.py (now compatibility shim)
- New: starpunk/feeds/ module with rss.py, atom.py, json_feed.py
- New: tests/helpers/feed_ordering.py (shared test helper)
- New: tests/test_feeds_atom.py, tests/test_feeds_json.py
- Modified: CHANGELOG.md (Phase 2 entries)
- New: docs/reports/2025-11-26-v1.1.2-phase2-feed-formats-partial.md

NEXT STEPS:
Phase 2.4 (Content Negotiation) pending - will add /feed endpoint with Accept header negotiation and explicit format endpoints.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@@ -1,272 +0,0 @@
# ADR-054: Feed Generation and Caching Architecture

## Status

Proposed

## Context

StarPunk v1.1.2 "Syndicate" adds ATOM and JSON Feed support alongside the existing RSS feed. We need to decide how to generate, cache, and serve all three formats efficiently.

Key considerations:
- Memory efficiency for large feeds (100+ items)
- Cache invalidation strategy
- Content negotiation approach
- Performance impact on the main application
- Backward compatibility with the existing RSS feed

## Decision

Implement a unified feed generation system with the following architecture:

### 1. Streaming Generation

All feed generators will use streaming (generator-based) output rather than building complete documents in memory:

```python
from typing import Iterable, Iterator


def generate(notes: Iterable) -> Iterator[str]:
    # Each yield is one chunk; the full document is never held in memory
    yield '<?xml version="1.0" encoding="utf-8"?>'
    yield '<feed xmlns="http://www.w3.org/2005/Atom">'
    for note in notes:
        yield '<entry>...</entry>'  # entry rendering elided
    yield '</feed>'
```

**Rationale**:
- Reduces memory footprint for large feeds
- Allows progressive rendering to clients
- Lower time to first byte, because output begins before every entry has been rendered
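
Below is a fuller sketch of the streaming approach. It assumes a hypothetical `Note` dataclass with `title`, `url`, `updated` and `content` fields; StarPunk's real model may differ. Every user-supplied value is escaped with the standard library's `xml.sax.saxutils.escape` and `quoteattr`, in line with the content-escaping requirement under Security Considerations. `generate_atom` and its parameters are illustrative names, not the project's API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator
from xml.sax.saxutils import escape, quoteattr

ATOM_NS = "http://www.w3.org/2005/Atom"


@dataclass
class Note:
    """Hypothetical stand-in for StarPunk's note model."""
    title: str
    url: str
    updated: datetime  # must be timezone-aware so isoformat() yields RFC 3339
    content: str       # rendered HTML


def generate_atom(
    notes: Iterable[Note], title: str, feed_id: str, author: str, updated: datetime
) -> Iterator[str]:
    """Stream a minimal ATOM 1.0 feed; only one entry is rendered at a time."""
    yield '<?xml version="1.0" encoding="utf-8"?>\n'
    yield f'<feed xmlns="{ATOM_NS}">\n'
    yield f"<id>{escape(feed_id)}</id><title>{escape(title)}</title>\n"
    yield f"<updated>{updated.isoformat()}</updated>\n"
    yield f"<author><name>{escape(author)}</name></author>\n"
    for note in notes:
        # Every user-supplied string is escaped before it enters the stream
        yield (
            "<entry>"
            f"<id>{escape(note.url)}</id>"
            f"<title>{escape(note.title)}</title>"
            f"<updated>{note.updated.isoformat()}</updated>"
            f"<link href={quoteattr(note.url)}/>"
            f'<content type="html">{escape(note.content)}</content>'
            "</entry>\n"
        )
    yield "</feed>\n"
```

Because it is a generator, nothing is rendered until the caller iterates. When a test needs the whole document, `"".join(generate_atom(...))` produces it. The feed-level `updated` timestamp is passed in rather than computed from the notes, so the generator never has to materialize the full list.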

### 2. Format-Agnostic Cache Layer

Implement an LRU cache with TTL that works across all feed formats:

```python
cache_key = f"feed:{format}:{limit}:{content_checksum}"
```

**Cache Strategy**:
- LRU eviction when the size limit is reached
- TTL-based expiration (default: 5 minutes)
- Checksum-based invalidation on content changes
- In-memory storage (no external dependencies)

**Rationale**:
- Simple, no external dependencies
- Fast access times
- Bounded memory use through LRU eviction
- Works for all formats uniformly
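
Here is one way the LRU-with-TTL strategy could look, using only `collections.OrderedDict` and `time.monotonic`. `FeedCache`, its default `max_size` of 50 and the injectable `clock` are assumptions for illustration. Only the 5-minute TTL comes from this ADR.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class FeedCache:
    """In-memory LRU cache with per-entry TTL (illustrative, not StarPunk's class)."""

    def __init__(
        self,
        max_size: int = 50,  # assumed limit
        ttl: float = 300.0,  # seconds; matches the 5-minute default
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_size = max_size
        self.ttl = ttl
        self._clock = clock
        # key -> (expires_at, rendered feed); order tracks recency of use
        self._entries: "OrderedDict[str, tuple]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            self.misses += 1
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry and report a miss
            del self._entries[key]
            self.misses += 1
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        self.hits += 1
        return value

    def set(self, key: str, value: str) -> None:
        self._entries[key] = (self._clock() + self.ttl, value)
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # evict least recently used
```

The cached value here is the fully rendered feed string. On a cache miss, the feed would be streamed once, joined, and stored. That trade-off (cache the document, stream only on a miss) is an assumption of this sketch, not a decision recorded in this ADR. The `hits` and `misses` counters supply the hit rate that the implementation notes ask to monitor.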

### 3. Content Negotiation via Accept Headers

Use HTTP `Accept` header parsing with quality factors (`q` values):

```
Accept: application/atom+xml;q=0.9, application/rss+xml
```

Here the client prefers RSS (implicit `q=1`) over ATOM (`q=0.9`).

**Negotiation Rules**:
1. An exact MIME type match scores highest
2. Quality factors are applied as multipliers
3. Wildcards (`*/*`) score lowest
4. Default to RSS if the client expresses no preference

**Rationale**:
- Standards-compliant approach
- Allows client preference
- Backward compatible (RSS default)
- Works with existing feed readers
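
The negotiation rules can be sketched as a small scoring function. Each media range in the header contributes `q × specificity`. An exact MIME match weighs 3, a type wildcard such as `application/*` weighs 2, and `*/*` weighs 1. RSS wins ties and empty headers. The `FORMATS` mapping, the weights and the function names are assumptions. `application/feed+json` is the media type recommended by JSON Feed 1.1.

```python
from typing import Dict, List, Tuple

# Hypothetical format table; insertion order makes RSS win ties
FORMATS: Dict[str, str] = {
    "rss": "application/rss+xml",
    "atom": "application/atom+xml",
    "json": "application/feed+json",
}
DEFAULT_FORMAT = "rss"


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media_range, q) pairs, tolerating junk."""
    ranges = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_range = pieces[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # malformed q-value: treat the range as unacceptable
        ranges.append((media_range, max(0.0, min(q, 1.0))))
    return ranges


def negotiate(header: str) -> str:
    """Return the best feed format for an Accept header, defaulting to RSS."""
    best_format, best_score = DEFAULT_FORMAT, 0.0
    for media_range, q in parse_accept(header or ""):
        if q == 0:
            continue
        for fmt, mime in FORMATS.items():
            if media_range == mime:
                weight = 3  # exact match
            elif media_range == mime.split("/")[0] + "/*":
                weight = 2  # type wildcard
            elif media_range == "*/*":
                weight = 1  # full wildcard scores lowest
            else:
                continue
            score = q * weight  # quality factor applied as a multiplier
            if score > best_score:
                best_format, best_score = fmt, score
    return best_format
```

This multiplier scheme follows the rules listed above and is simpler than the precedence rules in RFC 9110, which take the q-value from the most specific matching range. For example, `application/rss+xml;q=0.5, */*` picks RSS here. RFC 9110 would instead rank ATOM and JSON Feed (q=1 via `*/*`) above RSS. Negotiated responses should also send `Vary: Accept` so that shared caches do not serve one format to a client that asked for another.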

### 4. Unified Feed Interface

All generators implement a common protocol:

```python
from __future__ import annotations  # `Note` below is not evaluated at runtime

from typing import Any, Dict, Iterator, List, Protocol


class FeedGenerator(Protocol):
    def generate(self, notes: List[Note], config: Dict[str, Any]) -> Iterator[str]:
        """Generate feed content as a stream (Note is StarPunk's note model)"""

    def get_content_type(self) -> str:
        """Return the appropriate MIME type"""
```

**Rationale**:
- Consistent interface across formats
- Easy to add new formats
- Simplifies routing logic
- Statically type-checkable through structural typing (`typing.Protocol`)
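
A concrete generator satisfies the protocol structurally, with no inheritance needed. It might stream JSON Feed 1.1 as shown below. The `Note` fields and the `title`, `site_url` and `feed_url` config keys are hypothetical. Several details follow the JSON Feed 1.1 spec: the top-level `version` URL, the required `title` and `items` members, and the per-item `id` plus `content_html`.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, Iterable, Iterator


@dataclass
class Note:
    """Hypothetical stand-in for StarPunk's note model."""
    url: str
    content_html: str
    published: datetime  # timezone-aware, so isoformat() yields RFC 3339


class JsonFeedGenerator:
    """Streams a JSON Feed 1.1 document; matches FeedGenerator structurally."""

    def generate(self, notes: Iterable[Note], config: Dict[str, Any]) -> Iterator[str]:
        header = {
            "version": "https://jsonfeed.org/version/1.1",
            "title": config["title"],
            "home_page_url": config["site_url"],
            "feed_url": config["feed_url"],
        }
        # Emit the header object minus its closing brace, then open "items"
        yield json.dumps(header, ensure_ascii=False)[:-1] + ', "items": ['
        for i, note in enumerate(notes):
            item = {
                "id": note.url,
                "url": note.url,
                "content_html": note.content_html,
                "date_published": note.published.isoformat(),
            }
            # Prefix a comma on every item but the first; no count needed up front
            yield ("," if i else "") + json.dumps(item, ensure_ascii=False)
        yield "]}"

    def get_content_type(self) -> str:
        return "application/feed+json"
```

Items are serialized one at a time, so memory use stays proportional to a single item rather than the whole feed.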

## Rationale

### Why Streaming Over Document Building?

**Option 1: Build Complete Document** (Not Chosen)
```python
def generate(notes):
    doc = build_document(notes)
    return doc.to_string()
```
- Pros: Simpler implementation, easier testing
- Cons: High memory usage, slower for large feeds

**Option 2: Streaming Generation** (Chosen)
```python
def generate(notes):
    yield from generate_chunks(notes)
```
- Pros: Low memory usage, faster first byte, scalable
- Cons: More complex implementation, harder to test

We chose streaming because memory efficiency is critical for a self-hosted application.

### Why In-Memory Cache Over External Cache?

**Option 1: Redis/Memcached** (Not Chosen)
- Pros: Distributed, persistent (Redis), feature-rich
- Cons: External dependency, complex setup, overkill for a single-user application

**Option 2: File-Based Cache** (Not Chosen)
- Pros: Persistent, simple
- Cons: Slower, I/O overhead, cleanup complexity

**Option 3: In-Memory LRU** (Chosen)
- Pros: Fast, simple, no dependencies, automatic cleanup
- Cons: Lost on restart, limited by RAM

We chose in-memory because StarPunk is single-user and simplicity is paramount.

### Why Content Negotiation Over Separate Endpoints?

**Option 1: Separate Endpoints** (Not Chosen)
```
/feed.rss
/feed.atom
/feed.json
```
- Pros: Explicit, simple routing
- Cons: Multiple URLs to maintain, no automatic selection

**Option 2: Format Parameter** (Not Chosen)
```
/feed?format=atom
```
- Pros: Single endpoint, explicit format
- Cons: Not RESTful, requires parameter handling

**Option 3: Content Negotiation** (Chosen)
```
GET /feed
Accept: application/atom+xml
```
- Pros: Standards-compliant, automatic selection, single endpoint
- Cons: More complex implementation

We chose content negotiation because it's the standard HTTP approach and provides the best user experience.

## Consequences

### Positive

1. **Memory Efficient**: Streaming is expected to reduce memory usage by about 90% for large feeds (to be verified in Phase 1)
2. **Fast Response**: First byte delivered quickly with streaming
3. **Standards Compliant**: Proper HTTP content negotiation
4. **Simple Dependencies**: No external cache services required
5. **Unified Architecture**: All formats handled consistently
6. **Backward Compatible**: Existing RSS URLs continue working

### Negative

1. **Testing Complexity**: Streaming is harder to test than complete documents
2. **Cache Volatility**: The in-memory cache is lost on restart
3. **Limited Cache Size**: Bounded by available RAM
4. **No Distributed Cache**: Can't share cache across instances

### Mitigations

1. **Testing**: Provide test helpers that collect streams for assertions
2. **Cache Warming**: Pre-generate popular feeds on startup
3. **Cache Monitoring**: Track memory usage and adjust size dynamically
4. **Future Enhancement**: Add optional Redis support later if needed
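
The testing mitigation could be as small as two helpers. One drains a stream into a string. The other parses XML output with the standard library's `xml.etree.ElementTree`. `collect` and `parse_xml_feed` are hypothetical names, not StarPunk's actual test helpers.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List


def collect(stream: Iterable[str]) -> str:
    """Drain a streaming generator into one string so tests can assert on it."""
    chunks: List[str] = []
    for chunk in stream:
        # Chunks must be text; rejecting bytes surfaces encoding mistakes early
        if not isinstance(chunk, str):
            raise TypeError(f"expected str chunk, got {type(chunk).__name__}")
        chunks.append(chunk)
    return "".join(chunks)


def parse_xml_feed(stream: Iterable[str]) -> ET.Element:
    """Collect an XML feed stream and parse it; malformed output raises ParseError."""
    return ET.fromstring(collect(stream).encode("utf-8"))
```

Parsing the collected output, rather than comparing it to a golden string, keeps tests independent of how the generator happens to split its chunks.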

## Alternatives Considered

### 1. Pre-Generated Static Files

- **Approach**: Generate feeds as static files on note changes
- **Pros**: Zero generation latency, nginx can serve directly
- **Cons**: Storage overhead, complex invalidation, multiple files
- **Decision**: Too complex for minimal benefit

### 2. Worker Process Generation

- **Approach**: Background worker generates and caches feeds
- **Pros**: Main app stays responsive, can pre-generate
- **Cons**: Complex architecture, process management overhead
- **Decision**: Over-engineered for a single-user system

### 3. Database-Cached Feeds

- **Approach**: Store generated feeds in the database
- **Pros**: Persistent, queryable, transactional
- **Cons**: Database bloat, slower than memory, cleanup needed
- **Decision**: Inappropriate use of the database

### 4. No Caching

- **Approach**: Generate fresh on every request
- **Pros**: Simplest implementation, always current
- **Cons**: High CPU usage, slow response times
- **Decision**: Poor user experience

## Implementation Notes

### Phase 1: Streaming Infrastructure
- Implement streaming for existing RSS
- Add performance tests
- Verify memory usage reduction

### Phase 2: Cache Layer
- Implement LRU cache with TTL
- Add cache statistics
- Monitor hit rates

### Phase 3: New Formats
- Add ATOM generator with streaming
- Add JSON Feed generator
- Implement content negotiation

### Phase 4: Monitoring
- Add cache dashboard
- Track generation times
- Monitor format usage

## Security Considerations

1. **Cache Poisoning**: Use a cryptographic checksum for cache keys
2. **Memory Exhaustion**: Enforce a hard limit on cache size
3. **Header Injection**: Validate Accept headers
4. **Content Security**: Escape all user content in feeds
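
Here is a sketch of checksum-based cache keys using `hashlib.sha256`. It assumes, as a hypothetical choice, that the checksum is computed from each note's slug and last-modified timestamp. The key follows the `feed:{format}:{limit}:{checksum}` shape from the cache layer decision.

```python
import hashlib
from typing import Iterable, Tuple


def content_checksum(notes: Iterable[Tuple[str, str]]) -> str:
    """SHA-256 over each note's (slug, updated_at) pair, in feed order."""
    digest = hashlib.sha256()
    for slug, updated_at in notes:
        for field in (slug, updated_at):
            data = field.encode("utf-8")
            # Length-prefix each field so ("ab", "c") and ("a", "bc") differ
            digest.update(len(data).to_bytes(8, "big"))
            digest.update(data)
    return digest.hexdigest()


def cache_key(feed_format: str, limit: int, checksum: str) -> str:
    """Build a key in the feed:{format}:{limit}:{checksum} shape."""
    return f"feed:{feed_format}:{limit}:{checksum}"
```

Every key component comes from server-side state or from the already-negotiated format name, never from raw request headers. A client therefore cannot steer a response into another variant's cache slot. Editing any note changes its timestamp, which changes the checksum, so stale entries are simply never looked up again and age out through TTL or LRU eviction.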

## Performance Targets

- Feed generation: <100 ms for 50 items
- Cache hit rate: >80% in production
- Memory per feed: <100 KB
- Streaming chunk size: 4 KB
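
One way to meet the 4 KB chunk-size target is to buffer the many small pieces a generator yields and re-emit them as fixed-size blocks. `coalesce` is a hypothetical helper. It measures size in UTF-8 bytes, which is an assumption about how the target is counted.

```python
from typing import Iterable, Iterator

CHUNK_SIZE = 4096  # 4 KB target, counted in UTF-8 bytes (assumption)


def coalesce(stream: Iterable[str], chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Re-chunk many small string pieces into chunk_size-byte blocks."""
    buffer = bytearray()
    for piece in stream:
        buffer += piece.encode("utf-8")
        # Flush full blocks as soon as enough data has accumulated
        while len(buffer) >= chunk_size:
            yield bytes(buffer[:chunk_size])
            del buffer[:chunk_size]
    if buffer:
        yield bytes(buffer)  # final partial block
```

A block may split a multi-byte UTF-8 character across a boundary. That is harmless for an HTTP body, because the client decodes the reassembled byte stream rather than individual chunks.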

## Migration Path

1. Existing `/feed.xml` continues to work (returns RSS)
2. New `/feed` endpoint with content negotiation
3. Both endpoints available during transition
4. Deprecate `/feed.xml` in v2.0

## References

- [HTTP Content Negotiation](https://developer.mozilla.org/en-US/docs/Web/HTTP/Content_negotiation)
- [RSS 2.0 Specification](https://www.rssboard.org/rss-specification)
- [ATOM 1.0 RFC 4287](https://tools.ietf.org/html/rfc4287)
- [JSON Feed 1.1](https://www.jsonfeed.org/version/1.1/)
- [Python Generators](https://docs.python.org/3/howto/functional.html#generators)

## Document History

- 2024-11-25: Initial draft for v1.1.2 planning