feat: Implement Phase 2 Feed Formats - ATOM, JSON Feed, RSS fix (Phases 2.0-2.3)

This commit implements the first three phases of v1.1.2 Phase 2 Feed Formats, adding ATOM 1.0 and JSON Feed 1.1 support alongside the existing RSS feed. CRITICAL BUG FIX: - Fixed RSS streaming feed ordering (was showing oldest-first instead of newest-first) - Streaming RSS removed incorrect reversed() call at line 198 - Feedgen RSS kept correct reversed() to compensate for library behavior NEW FEATURES: - ATOM 1.0 feed generation (RFC 4287 compliant) - Proper XML namespacing and RFC 3339 dates - Streaming and non-streaming methods - 11 comprehensive tests - JSON Feed 1.1 generation (JSON Feed spec compliant) - RFC 3339 dates and UTF-8 JSON output - Custom _starpunk extension with permalink_path and word_count - 13 comprehensive tests REFACTORING: - Restructured feed code into starpunk/feeds/ module - feeds/rss.py - RSS 2.0 (moved from feed.py) - feeds/atom.py - ATOM 1.0 (new) - feeds/json_feed.py - JSON Feed 1.1 (new) - Backward compatible feed.py shim for existing imports - Business metrics integrated into all feed generators TESTING: - Created shared test helper tests/helpers/feed_ordering.py - Helper validates newest-first ordering across all formats - 48 total feed tests, all passing - RSS: 24 tests - ATOM: 11 tests - JSON Feed: 13 tests FILES CHANGED: - Modified: starpunk/feed.py (now compatibility shim) - New: starpunk/feeds/ module with rss.py, atom.py, json_feed.py - New: tests/helpers/feed_ordering.py (shared test helper) - New: tests/test_feeds_atom.py, tests/test_feeds_json.py - Modified: CHANGELOG.md (Phase 2 entries) - New: docs/reports/2025-11-26-v1.1.2-phase2-feed-formats-partial.md NEXT STEPS: Phase 2.4 (Content Negotiation) pending - will add /feed endpoint with Accept header negotiation and explicit format endpoints. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 14:54:52 -07:00
parent b0230b1233
commit 59e9d402c6
14 changed files with 2663 additions and 637 deletions
--- a/docs/design/v1.1.2/developer-qa.md
+++ b/docs/design/v1.1.2/developer-qa.md
@@ -13,6 +13,59 @@ This document provides definitive answers to all 30 developer questions about v1

 ## Critical Questions (Must be answered before implementation)

+### C2: Feed Generator Module Structure
+
+**Question**: How should we organize the feed generator code as we add ATOM and JSON formats?
+1. Keep single file: Add ATOM and JSON to existing `feed.py`
+2. Split by format: Create `feed/rss.py`, `feed/atom.py`, `feed/json.py`
+3. Hybrid: Keep RSS in `feed.py`, new formats in `feed/` subdirectory
+
+**Answer**: **Option 2 - Split by format into separate modules** (`feed/rss.py`, `feed/atom.py`, `feed/json.py`).
+
+**Rationale**: This provides the cleanest separation of concerns and follows the single responsibility principle. Each feed format has distinct specifications, escaping rules, and structure. Separate files prevent the code from becoming unwieldy and make it easier to maintain each format independently. This also aligns with the existing pattern where distinct functionality gets its own module.
+
+**Implementation Guidance**:
+```
+starpunk/feeds/
+├── __init__.py          # Exports main interface functions
+├── rss.py               # RSSFeedGenerator class
+├── atom.py              # AtomFeedGenerator class
+├── json.py              # JSONFeedGenerator class
+├── opml.py              # OPMLGenerator class
+├── cache.py             # FeedCache class
+├── content_negotiator.py # ContentNegotiator class
+└── validators.py        # Feed validators (test use only)
+```
+
+In `feeds/__init__.py`:
+```python
+from .rss import RSSFeedGenerator
+from .atom import AtomFeedGenerator
+from .json import JSONFeedGenerator
+from .cache import FeedCache
+from .content_negotiator import ContentNegotiator
+
+def generate_feed(format, notes, config):
+    """Factory function to generate feed in specified format"""
+    generators = {
+        'rss': RSSFeedGenerator,
+        'atom': AtomFeedGenerator,
+        'json': JSONFeedGenerator
+    }
+
+    generator_class = generators.get(format)
+    if not generator_class:
+        raise ValueError(f"Unknown feed format: {format}")
+
+    return generator_class(notes, config).generate()
+```
+
+Move existing RSS code to `feeds/rss.py` during Phase 2.0.
+
+---
+
+## Critical Questions (Must be answered before implementation)
+
 ### CQ1: Database Instrumentation Integration

 **Answer**: Wrap connections at the pool level by modifying `get_connection()` to return `MonitoredConnection` instances.
@@ -322,6 +375,57 @@ def test_feed_order_newest_first():

 **Critical Note**: There is currently a bug in RSS feed generation (lines 100 and 198 of feed.py) where `reversed()` is incorrectly applied. This MUST be fixed in Phase 2 before implementing ATOM and JSON feeds.

+### C1: RSS Fix Testing Strategy
+
+**Question**: How should we test the RSS ordering fix?
+1. Minimal: Single test verifying newest-first order
+2. Comprehensive: Multiple tests covering edge cases
+3. Cross-format: Shared test helper for all 3 formats
+
+**Answer**: **Option 3 - Cross-format shared test helper** that will be used for RSS now and ATOM/JSON later.
+
+**Rationale**: The ordering requirement is identical across all feed formats (newest first). Creating a shared test helper now ensures consistency and prevents duplicating test logic. This minimal extra effort now saves time and prevents bugs when implementing ATOM and JSON formats.
+
+**Implementation Guidance**:
+```python
+# In tests/test_feeds.py
+
+def assert_feed_ordering_newest_first(feed_content, format):
+    """Shared helper to verify feed items are in newest-first order"""
+    if format == 'rss':
+        items = parse_rss_items(feed_content)
+        dates = [item.pubDate for item in items]
+    elif format == 'atom':
+        items = parse_atom_entries(feed_content)
+        dates = [item.published for item in items]
+    elif format == 'json':
+        items = json.loads(feed_content)['items']
+        dates = [item['date_published'] for item in items]
+
+    # Verify descending order (newest first)
+    for i in range(len(dates) - 1):
+        assert dates[i] > dates[i + 1], f"Item {i} should be newer than item {i+1}"
+
+    return True
+
+# Test for RSS fix in Phase 2.0
+def test_rss_feed_newest_first():
+    """Verify RSS feed shows newest entries first (regression test)"""
+    old_note = create_test_note(published=yesterday)
+    new_note = create_test_note(published=today)
+
+    generator = RSSFeedGenerator([new_note, old_note], config)
+    feed = generator.generate()
+
+    assert_feed_ordering_newest_first(feed, 'rss')
+```
+
+Also create edge case tests:
+- Empty feed
+- Single item
+- Items with identical timestamps
+- Items spanning months/years
+
 ---

 ## Important Questions (Should be answered for Phase 1)
@@ -585,6 +689,132 @@ class SyndicationStats:
        }
 ```

+### I1: Business Metrics Integration Timing
+
+**Question**: When should we integrate business metrics into feed generation?
+1. During Phase 2.0 RSS fix (add to existing feed.py)
+2. During Phase 2.1 when creating new feed structure
+3. Deferred to Phase 3
+
+**Answer**: **Option 2 - During Phase 2.1 when creating the new feed structure**.
+
+**Rationale**: Adding metrics to the old `feed.py` that we're about to refactor is throwaway work. Since you're creating the new `feeds/` module structure in Phase 2.1, integrate metrics properly from the start. This avoids refactoring metrics code immediately after adding it.
+
+**Implementation Guidance**:
+```python
+# In feeds/rss.py (and similarly for atom.py, json.py)
+class RSSFeedGenerator:
+    def __init__(self, notes, config, metrics_collector=None):
+        self.notes = notes
+        self.config = config
+        self.metrics_collector = metrics_collector
+
+    def generate(self):
+        start_time = time.time()
+        feed_content = ''.join(self.generate_streaming())
+
+        if self.metrics_collector:
+            self.metrics_collector.record_business_metric(
+                'feed_generated',
+                {
+                    'format': 'rss',
+                    'item_count': len(self.notes),
+                    'duration': time.time() - start_time
+                }
+            )
+
+        return feed_content
+```
+
+For Phase 2.0, focus solely on fixing the RSS ordering bug. Keep changes minimal.
+
+### I2: Streaming vs Non-Streaming for ATOM/JSON
+
+**Question**: Should we implement both streaming and non-streaming methods for ATOM/JSON like RSS?
+1. Implement both methods like RSS
+2. Implement streaming only
+3. Implement non-streaming only
+
+**Answer**: **Option 1 - Implement both methods** (streaming and non-streaming) for consistency.
+
+**Rationale**: This matches the existing RSS pattern established in CQ6. The non-streaming method (`generate()`) is required for caching, while the streaming method (`generate_streaming()`) provides memory efficiency for large feeds. Consistency across all feed formats simplifies maintenance and usage.
+
+**Implementation Guidance**:
+```python
+# Pattern for all feed generators
+class AtomFeedGenerator:
+    def generate(self) -> str:
+        """Generate complete feed for caching"""
+        return ''.join(self.generate_streaming())
+
+    def generate_streaming(self) -> Iterator[str]:
+        """Generate feed in chunks for memory efficiency"""
+        yield '<?xml version="1.0" encoding="utf-8"?>\n'
+        yield '<feed xmlns="http://www.w3.org/2005/Atom">\n'
+        # ... yield chunks ...
+
+# Usage in routes
+if cache_enabled:
+    content = generator.generate()  # Full string for caching
+    cache.set(key, content)
+    return Response(content, mimetype='application/atom+xml')
+else:
+    return Response(
+        generator.generate_streaming(),  # Stream directly
+        mimetype='application/atom+xml'
+    )
+```
+
+### I3: XML Escaping for ATOM
+
+**Question**: How should we handle XML generation and escaping for ATOM?
+1. Use feedgen library
+2. Write manual XML generation with custom escaping
+3. Use xml.etree.ElementTree
+
+**Answer**: **Option 3 - Use xml.etree.ElementTree** from the Python standard library.
+
+**Rationale**: ElementTree is in the standard library (no new dependencies), handles escaping correctly, and is simpler than manual XML string building. While feedgen is powerful, it's overkill for our simple needs and adds an unnecessary dependency. ElementTree provides the right balance of safety and simplicity.
+
+**Implementation Guidance**:
+```python
+# In feeds/atom.py
+import xml.etree.ElementTree as ET
+from xml.dom import minidom
+
+class AtomFeedGenerator:
+    def generate_streaming(self):
+        # Build tree
+        feed = ET.Element('feed', xmlns='http://www.w3.org/2005/Atom')
+
+        # Add metadata
+        ET.SubElement(feed, 'title').text = self.config.FEED_TITLE
+        ET.SubElement(feed, 'id').text = self.config.SITE_URL + '/feed.atom'
+
+        # Add entries
+        for note in self.notes:
+            entry = ET.SubElement(feed, 'entry')
+            ET.SubElement(entry, 'title').text = note.title or note.slug
+            ET.SubElement(entry, 'id').text = f"{self.config.SITE_URL}/notes/{note.slug}"
+
+            # Content with proper escaping
+            content = ET.SubElement(entry, 'content')
+            content.set('type', 'html' if note.html else 'text')
+            content.text = note.html or note.content  # ElementTree handles escaping
+
+        # Convert to string
+        rough_string = ET.tostring(feed, encoding='unicode')
+
+        # Pretty print for readability (optional)
+        if self.config.DEBUG:
+            dom = minidom.parseString(rough_string)
+            yield dom.toprettyxml(indent="  ")
+        else:
+            yield rough_string
+```
+
+This ensures proper escaping without manual string manipulation.
+
 ---

 ## Nice-to-Have Clarifications (Can defer if needed)
@@ -775,6 +1005,53 @@ def validate_feed_config():
        logger.warning("FEED_CACHE_TTL > 1h may serve stale content")
 ```

+### N1: Feed Discovery Link Tags
+
+**Question**: Should we automatically add feed discovery `<link>` tags to HTML pages?
+
+**Answer**: **Yes, add discovery links to all HTML responses** that have the main layout template.
+
+**Rationale**: Feed discovery is a web standard that improves user experience. Browsers and feed readers use these tags to detect available feeds. The overhead is minimal (a few bytes of HTML).
+
+**Implementation Guidance**:
+```html
+<!-- In base template head section -->
+{% if config.FEED_RSS_ENABLED %}
+<link rel="alternate" type="application/rss+xml" title="RSS Feed" href="/feed.rss">
+{% endif %}
+{% if config.FEED_ATOM_ENABLED %}
+<link rel="alternate" type="application/atom+xml" title="Atom Feed" href="/feed.atom">
+{% endif %}
+{% if config.FEED_JSON_ENABLED %}
+<link rel="alternate" type="application/json" title="JSON Feed" href="/feed.json">
+{% endif %}
+```
+
+### N2: Feed Icons/Badges
+
+**Question**: Should we add visual feed subscription buttons/icons to the site?
+
+**Answer**: **No visual feed buttons for v1.1.2**. Focus on the API functionality.
+
+**Rationale**: Visual design is not part of this technical release. The discovery link tags provide the functionality for feed readers. Visual subscription buttons can be added in a future UI-focused release.
+
+**Implementation Guidance**: Skip any visual feed indicators. The discovery links in N1 are sufficient for feed reader detection.
+
+### N3: Feed Pagination Support
+
+**Question**: Should feeds support pagination for sites with many notes?
+
+**Answer**: **No pagination for v1.1.2**. Use simple limit parameter only.
+
+**Rationale**: The spec already includes a configurable limit (default 50 items). This is sufficient for v1. RFC 5005 (Feed Paging and Archiving) can be considered for v1.2 if users need access to older entries via feeds.
+
+**Implementation Guidance**:
+- Stick with the simple `limit` parameter in the current design
+- Document the limit in the feed itself using appropriate elements:
+  - RSS: Add comment `<!-- Limited to 50 most recent entries -->`
+  - ATOM: Could add `<link rel="self">` with `?limit=50`
+  - JSON: Add to `_starpunk` extension: `"limit": 50`
+
 ---

 ## Summary
@@ -814,6 +1091,6 @@ Remember: When in doubt during implementation, choose the simpler approach. You

 ---

-**Document Version**: 1.0.0
-**Last Updated**: 2025-11-25
-**Status**: Ready for implementation
+**Document Version**: 1.1.0
+**Last Updated**: 2025-11-26
+**Status**: All questions answered - Ready for Phase 2 implementation