
Phil Skentelbery 6863bcae67 docs: add Phase 5 design and architectural review documentation

- Add ADR-014: RSS Feed Implementation
- Add ADR-015: Phase 5 Implementation Approach
- Add Phase 5 design documents (RSS and container)
- Add pre-implementation review
- Add RSS and container validation reports
- Add architectural approval for v0.6.0 release

Architecture reviews confirm 98/100 (RSS) and 96/100 (container) scores.
Phase 5 approved for production deployment.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-19 10:30:55 -07:00


ADR-014: RSS Feed Implementation Strategy

Status

Accepted

Context

Phase 5 requires implementing RSS feed generation for syndicating published notes. We need to decide on the implementation approach, feed format, caching strategy, and technical details for generating a standards-compliant RSS feed.

Requirements

Standard Compliance: Feed must be valid RSS 2.0
Content Inclusion: Include all published notes (up to configured limit)
Performance: Feed generation should be fast and cacheable
Simplicity: Minimal dependencies, straightforward implementation
IndieWeb Friendly: Support feed discovery and proper metadata

Key Questions

Which feed format(s) should we support?
How should we generate the RSS XML?
What caching strategy should we use?
How should we handle note titles (notes may not have explicit titles)?
How should we format dates for RSS?
What should the feed item limit be?

Decision

1. Feed Format: RSS 2.0 Only (V1)

Choice: Implement RSS 2.0 exclusively for V1

Rationale:

RSS 2.0 is supported by virtually all feed readers
Simpler than Atom (fewer required elements)
Sufficient for V1 needs (notes syndication)
feedgen library handles RSS 2.0 well
Defer Atom and JSON Feed to V2+

Alternatives Considered:

Atom 1.0: More modern, better extensibility
- Rejected: More complex, not needed for basic notes
- May add in V2
JSON Feed: Developer-friendly format
- Rejected: Less universal support, not essential
- May add in V2
Multiple formats: Support RSS + Atom + JSON
- Rejected: Adds complexity, not justified for V1
- Single format keeps implementation simple

2. XML Generation: feedgen Library

Choice: Use feedgen library (already in dependencies)

Rationale:

Already a dependency (listed in the architecture overview)
Handles RSS/Atom generation correctly
Produces valid, compliant XML
Saves time vs. manual XML generation
Well-maintained, stable library

Alternatives Considered:

Manual XML generation (ElementTree or string templates)
- Rejected: Error-prone, easy to produce invalid XML
- Would need extensive validation
PyRSS2Gen library
- Rejected: Last updated 2007, unmaintained
Django Syndication Framework
- Rejected: Requires Django, too heavyweight

3. Feed Caching Strategy: Simple In-Memory Cache

Choice: 5-minute in-memory cache with ETag support

Implementation:

_feed_cache = {
    'xml': None,
    'timestamp': None,
    'etag': None
}

# Cache for 5 minutes
if cache is fresh:
    return cached_xml with ETag
else:
    generate fresh feed
    update cache
    return new XML with new ETag
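
The pseudocode above could be made concrete along the following lines. This is a minimal sketch, not the project's actual code: `get_cached_feed`, `build_feed`, `CACHE_SECONDS`, and the choice of a SHA-256 hash for the ETag are all assumptions made for illustration.

```python
import hashlib
import time

# Module-level cache, mirroring the dictionary in the pseudocode above.
_feed_cache = {"xml": None, "timestamp": None, "etag": None}

CACHE_SECONDS = 300  # 5 minutes


def get_cached_feed(build_feed, now=None):
    """Return (xml, etag), rebuilding the feed only when the cache is stale.

    build_feed is a hypothetical zero-argument callable returning the feed
    XML as a str; now is injectable so the expiry logic can be tested.
    """
    now = time.monotonic() if now is None else now
    cached_at = _feed_cache["timestamp"]
    is_fresh = cached_at is not None and (now - cached_at) < CACHE_SECONDS

    if not is_fresh:
        xml = build_feed()
        _feed_cache["xml"] = xml
        _feed_cache["timestamp"] = now
        # Assumption: the ETag is a quoted hash of the feed body.
        digest = hashlib.sha256(xml.encode("utf-8")).hexdigest()
        _feed_cache["etag"] = f'"{digest}"'

    return _feed_cache["xml"], _feed_cache["etag"]
```

Deriving the ETag from the body means an unchanged feed keeps the same ETag across regenerations, so clients holding the old tag can still receive a 304.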

Rationale:

5 minutes is an acceptable delay for note updates
RSS readers typically poll every 15-60 minutes
In-memory cache is simple (no external dependencies)
ETag enables conditional requests
Cache-Control header enables client-side caching
Low complexity, easy to implement

Alternatives Considered:

No caching: Generate on every request
- Rejected: Wasteful, feed generation involves DB + file reads
Flask-Caching with Redis
- Rejected: Adds external dependency (Redis)
- Overkill for single-user system
File-based cache
- Rejected: Complicates invalidation, I/O overhead
Longer cache duration (30+ minutes)
- Rejected: Notes should appear reasonably quickly
- 5 minutes balances performance and freshness

4. Note Titles: First Line or Timestamp

Choice: Extract first line (max 100 chars) or use timestamp

Algorithm:

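def get_note_title(note):
    # Try the first line, dropping any leading Markdown heading markers
    # (lstrip keeps a trailing '#', e.g. in "Learning C#")
    lines = note.content.strip().split('\n')
    title = lines[0].lstrip('#').strip()
    if title:
        return title[:100]  # Truncate to 100 chars

    # Fall back to timestamp when the note has no usable first line
    return note.created_at.strftime('%B %d, %Y at %I:%M %p')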

Rationale:

Per IndieWeb convention, notes are not required to have titles
First line often serves as implicit title
Timestamp fallback ensures every item has a title
100 char limit prevents overly long titles
Simple, deterministic algorithm

Alternatives Considered:

Always use timestamp: Too generic, not descriptive
Use content hash: Not human-friendly
Require explicit title: Breaks note simplicity
Use first sentence: Complex parsing, can be long
Content preview (first 50 chars): May not be meaningful

5. Date Formatting: RFC-822

Choice: RFC-822 format as required by RSS 2.0 spec

Format: Mon, 18 Nov 2024 12:00:00 +0000

Implementation:

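from datetime import timezone

def format_rfc822_date(dt):
    """Format datetime to RFC-822"""
    # Treat naive datetimes as UTC; convert aware ones to UTC
    if dt.tzinfo is None:
        dt_utc = dt.replace(tzinfo=timezone.utc)
    else:
        dt_utc = dt.astimezone(timezone.utc)
    # RFC-822 format (%a and %b assume an English/C locale)
    return dt_utc.strftime('%a, %d %b %Y %H:%M:%S %z')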

Rationale:

Required by RSS 2.0 specification
Standard format recognized by all feed readers
Python datetime supports formatting
Always use UTC to avoid timezone confusion
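
Because `strftime`'s `%a` and `%b` follow the process locale, one locale-independent option is the standard library's `email.utils.format_datetime`, which always emits English day and month names. The sketch below is an alternative formulation, not the adopted implementation; treating naive datetimes as UTC is an assumption.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def format_rfc822_date(dt: datetime) -> str:
    """Format a datetime as an RFC 822 date string in UTC."""
    if dt.tzinfo is None:
        # Assumption: naive datetimes are stored as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    else:
        # Aware datetimes are converted, not merely relabelled.
        dt = dt.astimezone(timezone.utc)
    return format_datetime(dt)


print(format_rfc822_date(datetime(2024, 11, 18, 12, 0, 0)))
# Mon, 18 Nov 2024 12:00:00 +0000
```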

Alternatives Considered:

ISO 8601 format: Used by Atom, not valid for RSS 2.0
Unix timestamp: Not human-readable, not standard
Local timezone: Ambiguous, causes parsing issues

6. Feed Item Limit: 50 (Configurable)

Choice: Default limit of 50 items, configurable via FEED_MAX_ITEMS

Rationale:

50 items is sufficient for typical use (notes, not articles)
RSS readers handle 50 items well
Keeps feed size reasonable (< 100KB typical)
Configurable for users with different needs
Balances completeness and performance

Alternatives Considered:

No limit: Feed could become very large
- Rejected: Performance issues, large XML
Limit of 10-20: Too few, users might want more history
Pagination: Complex, not well-supported by readers
- Deferred to V2 if needed
Dynamic limit based on date: Complicated logic

7. Content Inclusion: Full HTML in CDATA

Choice: Include full rendered HTML content in CDATA wrapper

Format:

<description><![CDATA[
  <p>Rendered HTML content here</p>
]]></description>

Rationale:

RSS readers expect HTML in description
CDATA prevents XML parsing issues
Already have rendered HTML from markdown
Provides full context to readers
Standard practice for content-rich feeds
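
One edge case is worth noting: a CDATA section ends at the first `]]>`, so rendered HTML containing that sequence would close the section early. A feed library is expected to handle escaping itself, and this ADR does not state how feedgen does so. The hypothetical helper below (`wrap_cdata` is not part of the described module) only illustrates the usual workaround of splitting the sequence across two CDATA sections.

```python
def wrap_cdata(html: str) -> str:
    """Wrap rendered HTML in a CDATA section, splitting any embedded ']]>'."""
    # ']]>' would end the section early, so close and reopen CDATA around it.
    safe = html.replace("]]>", "]]]]><![CDATA[>")
    return f"<![CDATA[{safe}]]>"
```

An XML parser reading the result joins the adjacent sections, so the reader sees the original HTML unchanged.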

Alternatives Considered:

Plain text only: Loses formatting
Markdown in description: Not rendered by readers
Summary/excerpt: Notes are short, full content appropriate
External link only: Forces reader to leave feed

8. Feed Discovery: Standard Link Element

Choice: Add <link rel="alternate"> to all HTML pages

Implementation:

<link rel="alternate" type="application/rss+xml"
      title="Site Name RSS Feed"
      href="https://example.com/feed.xml">

Rationale:

Standard HTML feed discovery mechanism
RSS readers auto-detect feeds
IndieWeb recommended practice
No JavaScript required
Works in all browsers
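
To show what auto-detection involves on the reader side, here is a rough sketch of how a client might find the feed URL in a page's HTML using only the standard library. `FeedLinkFinder` and `discover_feeds` are illustrative names, and real feed readers may apply additional rules.

```python
from html.parser import HTMLParser

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect href values of <link rel="alternate"> feed elements."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens.
        rels = (attr.get("rel") or "").lower().split()
        mime = (attr.get("type") or "").lower()
        if "alternate" in rels and mime in FEED_TYPES and attr.get("href"):
            self.feeds.append(attr["href"])


def discover_feeds(html: str) -> list[str]:
    """Return feed URLs advertised by the given HTML document."""
    finder = FeedLinkFinder()
    finder.feed(html)
    return finder.feeds
```

A check like this can also serve as a template test confirming that every page carries the discovery link.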

Alternatives Considered:

No discovery: Users must know feed URL
- Rejected: Poor user experience
JavaScript-based discovery: Unnecessary complexity
HTTP Link header: Less common, harder to discover

Implementation Details

Module Structure

File: starpunk/feed.py

Functions:

generate_feed() - Main feed generation
format_rfc822_date() - Date formatting
get_note_title() - Title extraction
clean_html_for_rss() - HTML sanitization

Dependencies: feedgen library (already included)

Route

Path: /feed.xml

Handler: public.feed() in starpunk/routes/public.py

Caching: In-memory cache + ETag + Cache-Control
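
How the ETag and Cache-Control headers combine on a request can be sketched independently of Flask. The function below is an assumption-laden illustration: `feed_response` is a hypothetical name, the `public` directive and the content type are choices this ADR does not specify, and a real handler would wrap the result in a Flask response.

```python
def feed_response(xml, etag, if_none_match=None, max_age=300):
    """Return (status, headers, body) for a feed request.

    if_none_match is the raw If-None-Match request header, if any.
    """
    headers = {
        "ETag": etag,
        "Cache-Control": f"public, max-age={max_age}",
    }
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        # Weak comparison: ignore a leading W/ on client-supplied tags.
        candidates = [t[2:] if t.startswith("W/") else t for t in candidates]
        if "*" in candidates or etag in candidates:
            # Client copy is current: send headers only, no body.
            return 304, headers, b""
    headers["Content-Type"] = "application/rss+xml; charset=utf-8"
    return 200, headers, xml.encode("utf-8")
```

A reader that polls with a matching `If-None-Match` then receives an empty 304 instead of the full feed body.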

Configuration

Environment Variables:

FEED_MAX_ITEMS - Maximum feed items (default: 50)
FEED_CACHE_SECONDS - Cache duration (default: 300)
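
Reading these two variables might look like the sketch below. The helper names are hypothetical, and silently falling back to the default on an invalid or non-positive value is an assumption; failing fast at startup would be an equally reasonable policy.

```python
import os


def read_positive_int(name, default, environ=None):
    """Read a positive integer setting; use default if unset or invalid."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value > 0 else default


def load_feed_config(environ=None):
    """Return the feed settings with the defaults documented above."""
    return {
        "FEED_MAX_ITEMS": read_positive_int("FEED_MAX_ITEMS", 50, environ),
        "FEED_CACHE_SECONDS": read_positive_int("FEED_CACHE_SECONDS", 300, environ),
    }
```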

Required Channel Elements

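RSS 2.0 requires only <title>, <link>, and <description> on the channel; the other elements below are optional but emitted in every feed:

<title> - Site name
<link> - Site URL
<description> - Site description
<language> - en-us
<lastBuildDate> - Feed generation time
<atom:link rel="self"> - Feed's own URL (Atom namespace extension, recommended by feed validators)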

Required Item Elements

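RSS 2.0 makes every item element optional, provided each item has a title or a description. Each item in this feed includes: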

<title> - Note title
<link> - Note permalink
<guid isPermaLink="true"> - Note permalink
<pubDate> - Note publication date
<description> - Full HTML content in CDATA

Consequences

Positive

Standard Compliance: Valid RSS 2.0 feeds work everywhere
Performance: Caching reduces load, fast responses
Simplicity: Single feed format, straightforward implementation
Reliability: feedgen library ensures valid XML
Flexibility: Configurable limits accommodate different needs
Discovery: Auto-detection in feed readers
Complete Content: Full HTML in feed, no truncation

Negative

Single Format: No Atom or JSON Feed in V1
- Mitigation: Can add in V2 if requested
Fixed Cache Duration: Not dynamically adjusted
- Mitigation: 5 minutes is reasonable compromise
Memory-Based Cache: Lost on restart
- Mitigation: Acceptable, regenerates quickly
No Pagination: Large archives not fully accessible
- Mitigation: 50 items is sufficient for notes

Neutral

Title Algorithm: May not always produce ideal titles
- Acceptable: Notes don't require titles, algorithm is reasonable
UTC Timestamps: Users might prefer local time
- Standard: UTC is RSS standard practice

Validation

The decision will be validated by:

W3C Feed Validator: Feed must pass without errors
Feed Reader Testing: Test in multiple readers (Feedly, NewsBlur, etc.)
Performance Testing: Feed generation < 100ms uncached
Caching Testing: Cache reduces load and serves the cached feed until it expires
Standards Review: RSS 2.0 spec compliance verification
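
As a complement to the external validator, an automated test could at least confirm that the elements listed under Implementation Details are present. The following is a minimal sketch with hypothetical names (`find_missing_elements` and the two tuples); it checks structure only and does not replace a full RSS 2.0 validation. The namespaced `atom:link` element is left out for brevity.

```python
import xml.etree.ElementTree as ET

CHANNEL_ELEMENTS = ("title", "link", "description", "language", "lastBuildDate")
ITEM_ELEMENTS = ("title", "link", "guid", "pubDate", "description")


def find_missing_elements(feed_xml: str) -> list[str]:
    """Return readable paths of expected elements missing from the feed."""
    root = ET.fromstring(feed_xml)
    channel = root.find("channel")
    if root.tag != "rss" or channel is None:
        return ["rss/channel"]

    missing = [f"channel/{name}" for name in CHANNEL_ELEMENTS
               if channel.find(name) is None]
    for index, item in enumerate(channel.findall("item")):
        missing += [f"item[{index}]/{name}" for name in ITEM_ELEMENTS
                    if item.find(name) is None]
    return missing
```

An empty list means every expected element was found, so a test can simply assert that the result is empty.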

Alternatives Rejected

Use Django Syndication Framework

Reason: Requires Django, which we're not using (Flask project)

Generate RSS Manually with Templates

Reason: Error-prone, hard to maintain, easy to produce invalid XML

Support Multiple Feed Formats in V1

Reason: Adds complexity without clear benefit, RSS 2.0 is sufficient

No Feed Caching

Reason: Wasteful, feed generation involves DB + file I/O

Per-Tag Feeds

Reason: V1 doesn't have tags, defer to V2

WebSub (PubSubHubbub) Support

Reason: Adds complexity, external dependency, not essential for V1


ADR: 014
Status: Accepted
Date: 2025-11-18
Author: StarPunk Architect
Related: ADR-002 (Flask Extensions), Phase 5 Design
