StarPunk/docs/design/v1.1.2/atom-feed-specification.md
Phil Skentelbery b0230b1233 feat: Complete v1.1.2 Phase 1 - Metrics Instrumentation
2025-11-26 14:13:44 -07:00

ATOM Feed Specification - v1.1.2

Overview

This specification defines the implementation of ATOM 1.0 feed generation for StarPunk, providing an alternative syndication format to RSS with enhanced metadata support and standardized content handling.

Requirements

Functional Requirements

  1. ATOM 1.0 Compliance

    • Full conformance to RFC 4287
    • Valid XML namespace declarations
    • Required elements present
    • Proper content type handling
  2. Content Support

    • Text content (escaped)
    • HTML content (escaped or CDATA)
    • XHTML content (inline XML)
    • Base64 for binary (future)
  3. Metadata Richness

    • Author information
    • Category/tag support
    • Updated vs published dates
    • Link relationships
  4. Streaming Generation

    • Memory-efficient output
    • Chunked response support
    • No full document in memory
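
The streaming requirement can be combined with a target chunk size by coalescing small fragments before they are sent. The following is a minimal sketch, not StarPunk's implementation: the helper name `rechunk` and its `target_bytes` parameter are hypothetical, and it assumes the generator yields many short strings.

```python
from typing import Iterable, Iterator


def rechunk(fragments: Iterable[str], target_bytes: int = 4096) -> Iterator[bytes]:
    """Coalesce small text fragments into UTF-8 chunks of roughly target_bytes.

    Hypothetical helper: only one buffer is held in memory at a time, so the
    full document is never materialised.
    """
    buffer = bytearray()
    for fragment in fragments:
        buffer.extend(fragment.encode("utf-8"))
        # Emit as soon as the buffer reaches the target. A chunk can overshoot
        # by at most one fragment, hence "roughly".
        if len(buffer) >= target_bytes:
            yield bytes(buffer)
            buffer.clear()
    if buffer:
        yield bytes(buffer)  # flush whatever is left at the end
```

Wrapping a fragment iterator as `rechunk(fragments)` would then produce chunks near the 4KB target while keeping memory use bounded by a single buffer.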

Non-Functional Requirements

  1. Performance

    • Generation time <100ms for 50 entries
    • Streaming chunks of ~4KB
    • Minimal memory footprint
  2. Compatibility

    • Works with major feed readers
    • Valid per W3C Feed Validator
    • Proper content negotiation
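
One way to approach content negotiation is to inspect the HTTP Accept header and choose between the two feed formats. The sketch below is illustrative only: the function name `pick_feed_format`, the `FEED_TYPES` mapping, and the choice of RSS as the fallback are assumptions, and wildcard media types are deliberately left to the default.

```python
# Media types this sketch recognises (assumed mapping, not from the spec).
FEED_TYPES = {
    "application/atom+xml": "atom",
    "application/rss+xml": "rss",
}


def pick_feed_format(accept_header: str, default: str = "rss") -> str:
    """Return 'atom' or 'rss' for an HTTP Accept header value.

    The feed type with the highest q-value wins. Wildcards and unknown
    types are ignored, so they fall through to the default.
    """
    best_format, best_q = default, 0.0
    for item in accept_header.split(","):
        media_type, _, params = item.strip().partition(";")
        media_type = media_type.strip().lower()
        if media_type not in FEED_TYPES:
            continue
        q = 1.0  # q defaults to 1 when the parameter is absent
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        if q > best_q:
            best_format, best_q = FEED_TYPES[media_type], q
    return best_format
```

For example, `pick_feed_format("application/rss+xml;q=0.5, application/atom+xml;q=0.9")` selects `"atom"`, while a browser-style header such as `"text/html, */*"` falls back to the default.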

ATOM Feed Structure

Namespace and Root Element

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <!-- Feed elements here -->
</feed>

Feed-Level Elements

Required Elements

| Element | Description | Example |
|---------|-------------|---------|
| id | Permanent, unique identifier | <id>https://example.com/</id> |
| title | Human-readable title | <title>StarPunk Notes</title> |
| updated | Last significant update | <updated>2024-11-25T12:00:00Z</updated> |

Recommended Elements

| Element | Description | Example |
|---------|-------------|---------|
| author | Feed author | <author><name>John Doe</name></author> |
| link | Feed relationships | <link rel="self" href="..."/> |
| subtitle | Feed description | <subtitle>Personal notes</subtitle> |

Optional Elements

| Element | Description |
|---------|-------------|
| category | Categorization scheme |
| contributor | Secondary contributors |
| generator | Software that generated feed |
| icon | Small visual identification |
| logo | Larger visual identification |
| rights | Copyright/license info |

Entry-Level Elements

Required Elements

| Element | Description | Example |
|---------|-------------|---------|
| id | Permanent, unique identifier | <id>https://example.com/note/123</id> |
| title | Entry title | <title>My Note Title</title> |
| updated | Last modification | <updated>2024-11-25T12:00:00Z</updated> |

Recommended Elements

| Element | Description |
|---------|-------------|
| author | Entry author (if different from feed) |
| content | Full content |
| link | Entry URL |
| summary | Short summary |

Optional Elements

| Element | Description |
|---------|-------------|
| category | Entry categories/tags |
| contributor | Secondary contributors |
| published | Initial publication time |
| rights | Entry-specific rights |
| source | If republished from elsewhere |

Implementation Design

ATOM Generator Class

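from datetime import datetime, timezone
from typing import Iterator, List

# Note is StarPunk's note model; import it from wherever the project defines it


class AtomGenerator:
    """ATOM 1.0 feed generator with streaming support"""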

    def __init__(self, site_url: str, site_name: str, site_description: str):
        self.site_url = site_url.rstrip('/')
        self.site_name = site_name
        self.site_description = site_description

    def generate(self, notes: List[Note], limit: int = 50) -> Iterator[str]:
        """Generate ATOM feed as stream of chunks

        IMPORTANT: Notes are expected to be in DESC order (newest first)
        from the database. This order MUST be preserved in the feed.
        """
        # Yield XML declaration
        yield '<?xml version="1.0" encoding="utf-8"?>\n'

        # Yield feed opening with namespace
        yield '<feed xmlns="http://www.w3.org/2005/Atom">\n'

        # Yield feed metadata
        yield from self._generate_feed_metadata()

        # Yield entries - maintain DESC order (newest first)
        # DO NOT reverse! Database order is correct
        for note in notes[:limit]:
            yield from self._generate_entry(note)

        # Yield closing tag
        yield '</feed>\n'

    def _generate_feed_metadata(self) -> Iterator[str]:
        """Generate feed-level metadata"""
        # Required elements
        yield f'  <id>{self._escape_xml(self.site_url)}/</id>\n'
        yield f'  <title>{self._escape_xml(self.site_name)}</title>\n'
        yield f'  <updated>{self._format_atom_date(datetime.now(timezone.utc))}</updated>\n'

        # Links
        yield f'  <link rel="alternate" type="text/html" href="{self._escape_xml(self.site_url)}"/>\n'
        yield f'  <link rel="self" type="application/atom+xml" href="{self._escape_xml(self.site_url)}/feed.atom"/>\n'

        # Optional elements
        if self.site_description:
            yield f'  <subtitle>{self._escape_xml(self.site_description)}</subtitle>\n'

        # Generator
        yield '  <generator version="1.1.2" uri="https://starpunk.app">StarPunk</generator>\n'

    def _generate_entry(self, note: Note) -> Iterator[str]:
        """Generate a single entry"""
        permalink = f"{self.site_url}{note.permalink}"

        yield '  <entry>\n'

        # Required elements
        yield f'    <id>{self._escape_xml(permalink)}</id>\n'
        yield f'    <title>{self._escape_xml(note.title)}</title>\n'
        yield f'    <updated>{self._format_atom_date(note.updated_at or note.created_at)}</updated>\n'

        # Link to entry
        yield f'    <link rel="alternate" type="text/html" href="{self._escape_xml(permalink)}"/>\n'

        # Published date (if different from updated)
        if note.created_at != note.updated_at:
            yield f'    <published>{self._format_atom_date(note.created_at)}</published>\n'

        # Author (if available); getattr also covers an attribute set to None
        author = getattr(note, 'author', None)
        if author:
            yield '    <author>\n'
            yield f'      <name>{self._escape_xml(author.name)}</name>\n'
            if getattr(author, 'email', None):
                yield f'      <email>{self._escape_xml(author.email)}</email>\n'
            if getattr(author, 'uri', None):
                yield f'      <uri>{self._escape_xml(author.uri)}</uri>\n'
            yield '    </author>\n'

        # Content
        yield from self._generate_content(note)

        # Categories/tags
        if hasattr(note, 'tags') and note.tags:
            for tag in note.tags:
                yield f'    <category term="{self._escape_xml(tag)}"/>\n'

        yield '  </entry>\n'

    def _generate_content(self, note: Note) -> Iterator[str]:
        """Generate content element with proper type"""
        # Determine content type based on note format
        if note.html:
            # HTML content - use escaped HTML
            yield '    <content type="html">'
            yield self._escape_xml(note.html)
            yield '</content>\n'
        else:
            # Plain text content
            yield '    <content type="text">'
            yield self._escape_xml(note.content)
            yield '</content>\n'

        # Add summary if available
        if hasattr(note, 'summary') and note.summary:
            yield '    <summary type="text">'
            yield self._escape_xml(note.summary)
            yield '</summary>\n'

Date Formatting

ATOM uses RFC 3339 date format, which is a profile of ISO 8601.

def _format_atom_date(self, dt: datetime) -> str:
    """Format datetime to RFC 3339 for ATOM

    Format: 2024-11-25T12:00:00Z or 2024-11-25T12:00:00-05:00

    Args:
        dt: Datetime object (naive assumed UTC)

    Returns:
        RFC 3339 formatted string
    """
    # Ensure timezone aware
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)

    # Use 'Z' for any zero offset, not only the timezone.utc singleton
    if not dt.utcoffset():
        return dt.strftime('%Y-%m-%dT%H:%M:%SZ')

    # isoformat() writes the offset as -05:00; strftime('%z') would give
    # -0500, which RFC 3339 does not allow
    return dt.isoformat(timespec='seconds')
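
The offset form matters because RFC 3339 requires a colon in numeric offsets (`-05:00`), whereas `strftime('%z')` produces `-0500`. The standalone sketch below shows the difference on two sample timestamps; the function name `format_rfc3339` is illustrative and not part of the generator.

```python
from datetime import datetime, timedelta, timezone


def format_rfc3339(dt: datetime) -> str:
    """Format a datetime as RFC 3339, treating naive values as UTC."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    if dt.utcoffset() == timedelta(0):
        return dt.strftime("%Y-%m-%dT%H:%M:%SZ")
    # isoformat() writes the offset as +HH:MM, the form RFC 3339 requires.
    return dt.isoformat(timespec="seconds")


utc = datetime(2024, 11, 25, 12, 0, 0, tzinfo=timezone.utc)
est = datetime(2024, 11, 25, 12, 0, 0, tzinfo=timezone(timedelta(hours=-5)))

print(format_rfc3339(utc))   # 2024-11-25T12:00:00Z
print(format_rfc3339(est))   # 2024-11-25T12:00:00-05:00
print(est.strftime("%z"))    # -0500 (no colon, so not valid RFC 3339)
```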

XML Escaping

def _escape_xml(self, text: str) -> str:
    """Escape special XML characters

    Escapes: & < > " '

    Args:
        text: Text to escape

    Returns:
        XML-safe escaped text
    """
    if not text:
        return ''

    # Order matters: & must be first
    text = text.replace('&', '&amp;')
    text = text.replace('<', '&lt;')
    text = text.replace('>', '&gt;')
    text = text.replace('"', '&quot;')
    text = text.replace("'", '&apos;')

    return text
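
As an alternative sketch, the standard library's `xml.sax.saxutils.escape` covers `&`, `<` and `>`, and accepts extra entities for attribute values. Escaping alone does not make every string XML-safe: XML 1.0 forbids most C0 control characters even in escaped form, so this example strips them first. The helper names `clean_text` and `clean_attr` are hypothetical.

```python
import re
from typing import Optional
from xml.sax.saxutils import escape

# C0 control characters that XML 1.0 does not allow (tab, LF and CR are fine).
_ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def clean_text(text: Optional[str]) -> str:
    """Drop illegal control characters, then escape & < > for element content."""
    return escape(_ILLEGAL_XML_CHARS.sub("", text or ""))


def clean_attr(text: Optional[str]) -> str:
    """Same as clean_text, plus double quotes for use inside attr="..." values."""
    return escape(_ILLEGAL_XML_CHARS.sub("", text or ""), {'"': "&quot;"})


print(clean_text("Tom & Jerry <3\x00"))  # Tom &amp; Jerry &lt;3
print(clean_attr('a "b" & c'))           # a &quot;b&quot; &amp; c
```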

Content Type Handling

Text Content

Plain text, must be escaped:

<content type="text">This is plain text with &lt;escaped&gt; characters</content>

HTML Content

HTML as escaped text:

<content type="html">&lt;p&gt;This is &lt;strong&gt;HTML&lt;/strong&gt; content&lt;/p&gt;</content>
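
The requirements also permit CDATA for HTML content. A CDATA section cannot contain the sequence `]]>`, so any occurrence has to be split across two sections. The following is a small sketch of that technique; the helper name `cdata` is an assumption, and the round trip through the standard-library parser is only there to show that the original HTML is recovered.

```python
import xml.etree.ElementTree as ET


def cdata(html: str) -> str:
    """Wrap HTML in a CDATA section, splitting any embedded ']]>' terminator."""
    return "<![CDATA[" + html.replace("]]>", "]]]]><![CDATA[>") + "]]>"


html = "<p>Edge case: ]]> inside <strong>HTML</strong></p>"
element = f'<content type="html">{cdata(html)}</content>'

# Parsing the element gives back the original HTML string unchanged.
recovered = ET.fromstring(element).text
print(recovered == html)  # True
```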

XHTML Content (Future)

Well-formed XML inline:

<content type="xhtml">
  <div xmlns="http://www.w3.org/1999/xhtml">
    <p>This is <strong>XHTML</strong> content</p>
  </div>
</content>

Complete ATOM Feed Example

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <id>https://example.com/</id>
  <title>StarPunk Notes</title>
  <updated>2024-11-25T12:00:00Z</updated>
  <link rel="alternate" type="text/html" href="https://example.com"/>
  <link rel="self" type="application/atom+xml" href="https://example.com/feed.atom"/>
  <author>
    <name>John Doe</name>
  </author>
  <subtitle>Personal notes and thoughts</subtitle>
  <generator version="1.1.2" uri="https://starpunk.app">StarPunk</generator>

  <entry>
    <id>https://example.com/notes/2024/11/25/first-note</id>
    <title>My First Note</title>
    <updated>2024-11-25T10:30:00Z</updated>
    <published>2024-11-25T10:00:00Z</published>
    <link rel="alternate" type="text/html" href="https://example.com/notes/2024/11/25/first-note"/>
    <author>
      <name>John Doe</name>
      <email>john@example.com</email>
    </author>
    <content type="html">&lt;p&gt;This is my first note with &lt;strong&gt;bold&lt;/strong&gt; text.&lt;/p&gt;</content>
    <category term="personal"/>
    <category term="introduction"/>
  </entry>

  <entry>
    <id>https://example.com/notes/2024/11/24/another-note</id>
    <title>Another Note</title>
    <updated>2024-11-24T15:45:00Z</updated>
    <link rel="alternate" type="text/html" href="https://example.com/notes/2024/11/24/another-note"/>
    <content type="text">Plain text content for this note.</content>
    <summary type="text">A brief summary of the note</summary>
  </entry>
</feed>

Validation

W3C Feed Validator Compliance

The generated ATOM feed must pass validation with the W3C Feed Validation Service.

Common Validation Issues

  1. Missing Required Elements

    • Ensure id, title, updated are present
    • Each entry must have these elements too
  2. Invalid Dates

    • Must be RFC 3339 format
    • Include timezone information
  3. Improper Escaping

    • All XML entities must be escaped
    • No raw HTML in text content
  4. Namespace Issues

    • Correct namespace declaration
    • No prefixed elements without a declared namespace
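
Several of these issues can be caught locally before submitting a feed to an external validator. The sketch below is a rough pre-check under stated assumptions, not a substitute for full validation: the function name `precheck_atom` is hypothetical, the date pattern is deliberately conservative, and only the required elements and date formats are inspected.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

ATOM = "{http://www.w3.org/2005/Atom}"
# Conservative RFC 3339 pattern: upper-case T, and Z or a +HH:MM offset.
RFC3339 = re.compile(
    r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$"
)


def precheck_atom(xml_text: str) -> List[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    try:
        root = ET.fromstring(xml_text.encode("utf-8"))
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    if root.tag != f"{ATOM}feed":
        return [f"unexpected root element: {root.tag}"]

    # Check the feed itself, then each entry in document order.
    nodes = [("feed", root)]
    for index, entry in enumerate(root.findall(f"{ATOM}entry"), start=1):
        nodes.append((f"entry {index}", entry))

    problems: List[str] = []
    for label, node in nodes:
        for name in ("id", "title", "updated"):
            if node.find(f"{ATOM}{name}") is None:
                problems.append(f"{label}: missing <{name}>")
        for name in ("updated", "published"):
            value = node.findtext(f"{ATOM}{name}")
            if value is not None and not RFC3339.match(value.strip()):
                problems.append(f"{label}: <{name}> is not RFC 3339: {value!r}")
    return problems
```

A feed whose entry carries `<updated>2024-11-25T10:30:00-0500</updated>` would be reported here, since the offset lacks the colon.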

Testing Strategy

Unit Tests

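from datetime import datetime, timezone
import xml.etree.ElementTree as etree  # lxml.etree offers the same calls used here


class TestAtomGenerator: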
    def test_required_elements(self):
        """Test all required ATOM elements are present"""
        generator = AtomGenerator(site_url, site_name, site_description)
        feed = ''.join(generator.generate(notes))

        assert '<id>' in feed
        assert '<title>' in feed
        assert '<updated>' in feed

    def test_feed_order_newest_first(self):
        """Test ATOM feed shows newest entries first

        RFC 4287 assigns no significance to entry order; newest-first is the
        convention feed readers expect.
        """
        # Create notes with different timestamps
        old_note = Note(
            title="Old Note",
            created_at=datetime(2024, 11, 20, 10, 0, 0, tzinfo=timezone.utc)
        )
        new_note = Note(
            title="New Note",
            created_at=datetime(2024, 11, 25, 10, 0, 0, tzinfo=timezone.utc)
        )

        # Generate feed with notes in DESC order (as from database)
        generator = AtomGenerator(site_url, site_name, site_description)
        feed = ''.join(generator.generate([new_note, old_note]))

        # Parse feed and verify order
        root = etree.fromstring(feed.encode())
        entries = root.findall('{http://www.w3.org/2005/Atom}entry')

        # First entry should be newest
        first_title = entries[0].find('{http://www.w3.org/2005/Atom}title').text
        assert first_title == "New Note"

        # Second entry should be oldest
        second_title = entries[1].find('{http://www.w3.org/2005/Atom}title').text
        assert second_title == "Old Note"

    def test_xml_escaping(self):
        """Test special characters are properly escaped"""
        note = Note(title="Test & <Special> Characters")
        generator = AtomGenerator(site_url, site_name, site_description)
        feed = ''.join(generator.generate([note]))

        assert '&amp;' in feed
        assert '&lt;Special&gt;' in feed

    def test_date_formatting(self):
        """Test RFC 3339 date formatting"""
        generator = AtomGenerator(site_url, site_name, site_description)
        dt = datetime(2024, 11, 25, 12, 0, 0, tzinfo=timezone.utc)
        formatted = generator._format_atom_date(dt)

        assert formatted == '2024-11-25T12:00:00Z'

    def test_streaming_generation(self):
        """Test feed is generated as stream"""
        generator = AtomGenerator(site_url, site_name, site_description)
        chunks = list(generator.generate(notes))

        assert len(chunks) > 1  # Multiple chunks
        assert chunks[0].startswith('<?xml')
        assert chunks[-1].endswith('</feed>\n')

Integration Tests

def test_atom_feed_endpoint():
    """Test ATOM feed endpoint returns the ATOM media type"""
    response = client.get('/feed.atom')

    assert response.status_code == 200
    # mimetype ignores parameters such as "; charset=utf-8"
    assert response.mimetype == 'application/atom+xml'

    # Parse and validate
    feed = etree.fromstring(response.data)
    assert feed.tag == '{http://www.w3.org/2005/Atom}feed'

def test_feed_reader_compatibility():
    """Test with popular feed readers"""
    readers = [
        'Feedly',
        'Inoreader',
        'NewsBlur',
        'The Old Reader'
    ]

    for reader in readers:
        # Test parsing with reader's validator
        assert validate_with_reader(feed_url, reader)

Validation Tests

def test_w3c_validation():
    """Validate against W3C Feed Validator"""
    generator = AtomGenerator(site_url, site_name, site_description)
    feed = ''.join(generator.generate(sample_notes))

    # Submit to W3C validator API
    result = validate_feed(feed, format='atom')
    assert result['valid']
    assert len(result['errors']) == 0

Performance Benchmarks

Generation Speed

def benchmark_atom_generation():
    """Benchmark ATOM feed generation"""
    notes = generate_sample_notes(100)
    generator = AtomGenerator(site_url, site_name, site_description)

    start = time.perf_counter()
    feed = ''.join(generator.generate(notes, limit=50))
    duration = time.perf_counter() - start

    assert duration < 0.1  # Less than 100ms
    assert len(feed) > 0

Memory Usage

def test_streaming_memory_usage():
    """Verify streaming doesn't load entire feed in memory"""
    notes = generate_sample_notes(1000)
    generator = AtomGenerator(site_url, site_name, site_description)

    initial_memory = get_memory_usage()

    # Generate but don't concatenate (streaming); raise the limit so all
    # 1000 notes are rendered rather than the default 50
    for chunk in generator.generate(notes, limit=len(notes)):
        pass  # Process chunk

    memory_delta = get_memory_usage() - initial_memory
    assert memory_delta < 1  # Less than 1MB increase
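
The test above relies on a `get_memory_usage()` helper that is not defined in this document. One possible way to measure the effect, sketched here with the standard-library `tracemalloc` module, is to compare peak allocation when chunks are discarded against peak allocation when they are joined. The names `peak_memory_mb` and `fake_feed` are hypothetical stand-ins, and `fake_feed` only imitates a streaming generator.

```python
import tracemalloc
from typing import Callable, Iterator


def peak_memory_mb(work: Callable[[], None]) -> float:
    """Run work() and return the peak traced Python allocation in MiB."""
    tracemalloc.start()
    try:
        work()
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak / (1024 * 1024)


def fake_feed(entries: int) -> Iterator[str]:
    """Stand-in for a streaming generator: one small chunk per entry."""
    for i in range(entries):
        yield f"<entry><id>urn:example:{i}</id></entry>\n"


def stream() -> None:
    for _chunk in fake_feed(20_000):
        pass  # each chunk is dropped as soon as it has been seen


def join() -> None:
    "".join(fake_feed(20_000))  # materialises the whole document at once


streamed = peak_memory_mb(stream)
joined = peak_memory_mb(join)
print(f"streamed peak: {streamed:.3f} MiB, joined peak: {joined:.3f} MiB")
```

Measuring the peak rather than a before/after delta avoids missing memory that was allocated and freed during generation.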

Configuration

ATOM-Specific Settings

# ATOM feed configuration
STARPUNK_FEED_ATOM_ENABLED=true
STARPUNK_FEED_ATOM_AUTHOR_NAME=John Doe
STARPUNK_FEED_ATOM_AUTHOR_EMAIL=john@example.com
STARPUNK_FEED_ATOM_AUTHOR_URI=https://example.com/about
STARPUNK_FEED_ATOM_ICON=https://example.com/icon.png
STARPUNK_FEED_ATOM_LOGO=https://example.com/logo.png
STARPUNK_FEED_ATOM_RIGHTS=© 2024 John Doe. CC BY-SA 4.0
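
These variables could be read into a typed settings object at startup. The sketch below is illustrative only: the `AtomFeedConfig` dataclass, the `load_atom_config` function, the accepted truthy strings, and the default of enabling the feed when the variable is unset are all assumptions rather than StarPunk's actual configuration code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

PREFIX = "STARPUNK_FEED_ATOM_"


@dataclass(frozen=True)
class AtomFeedConfig:
    """Hypothetical container for the ATOM-specific settings."""
    enabled: bool = True
    author_name: Optional[str] = None
    author_email: Optional[str] = None
    author_uri: Optional[str] = None
    icon: Optional[str] = None
    logo: Optional[str] = None
    rights: Optional[str] = None


def load_atom_config(env: Mapping[str, str] = os.environ) -> AtomFeedConfig:
    """Build the config from environment-style key/value pairs."""

    def text(key: str) -> Optional[str]:
        # Blank or missing values become None so callers can skip the element.
        value = env.get(PREFIX + key, "").strip()
        return value or None

    enabled_raw = env.get(PREFIX + "ENABLED", "true").strip().lower()
    return AtomFeedConfig(
        enabled=enabled_raw in {"1", "true", "yes", "on"},
        author_name=text("AUTHOR_NAME"),
        author_email=text("AUTHOR_EMAIL"),
        author_uri=text("AUTHOR_URI"),
        icon=text("ICON"),
        logo=text("LOGO"),
        rights=text("RIGHTS"),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to exercise in tests.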

Security Considerations

  1. XML Injection Prevention

    • All user content must be escaped
    • No raw XML from user input
    • Validate all URLs
  2. Content Security

    • HTML content properly escaped
    • No script tags allowed
    • Sanitize all metadata
  3. Resource Limits

    • Maximum feed size limits
    • Timeout on generation
    • Rate limiting on endpoint
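
As one illustration of the "validate all URLs" item, a check along the following lines would reject relative references and non-HTTP schemes such as `javascript:` before a value is written into an `href` or `<uri>` element. The function name `is_safe_feed_url` and the choice to allow only `http` and `https` are assumptions made for this sketch.

```python
from urllib.parse import urlsplit


def is_safe_feed_url(url: str) -> bool:
    """Accept only absolute http(s) URLs that include a host."""
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        return False  # urlsplit rejects some malformed inputs, e.g. bad IPv6
    return parts.scheme in ("http", "https") and bool(parts.netloc)


print(is_safe_feed_url("https://example.com/about"))  # True
print(is_safe_feed_url("javascript:alert(1)"))        # False
```

A URL that passes this check still needs XML escaping before it is emitted; the two steps address different risks.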

Migration Notes

Adding ATOM to Existing RSS

  • ATOM runs parallel to RSS
  • No changes to existing RSS feed
  • Both formats available simultaneously
  • Shared caching infrastructure

Acceptance Criteria

  1. Valid ATOM 1.0 feed generation
  2. All required elements present
  3. RFC 3339 date formatting correct
  4. XML properly escaped
  5. Streaming generation working
  6. W3C validator passing
  7. Works with 5+ major feed readers
  8. Performance target met (<100ms)
  9. Memory efficient streaming
  10. Security review passed