
# ATOM Feed Specification - v1.1.2

## Overview

This specification defines the implementation of ATOM 1.0 feed generation for StarPunk, providing an alternative syndication format to RSS with enhanced metadata support and standardized content handling.

## Requirements

### Functional Requirements

1. **ATOM 1.0 Compliance**
   - Full conformance to RFC 4287
   - Valid XML namespace declarations
   - Required elements present
   - Proper content type handling

2. **Content Support**
   - Text content (escaped)
   - HTML content (escaped or CDATA)
   - XHTML content (inline XML)
   - Base64 for binary (future)

3. **Metadata Richness**
   - Author information
   - Category/tag support
   - Updated vs published dates
   - Link relationships

4. **Streaming Generation**
   - Memory-efficient output
   - Chunked response support
   - No full document in memory
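
`AtomGenerator.generate()` (under Implementation Design) yields one short string per XML element, while the Non-Functional Requirements target chunks of about 4KB. One way to bridge the two is a small buffering wrapper around the generator. The sketch below uses a hypothetical `coalesce_chunks` helper that is not part of StarPunk. It counts UTF-8 bytes, since the target concerns bytes on the wire.

```python
from typing import Iterable, Iterator


def coalesce_chunks(parts: Iterable[str], chunk_size: int = 4096) -> Iterator[str]:
    """Group small string fragments into chunks of at least chunk_size bytes.

    Hypothetical helper: only the final chunk may be smaller than chunk_size.
    """
    buffer: list[str] = []
    buffered_bytes = 0
    for part in parts:
        buffer.append(part)
        # Measure the UTF-8 encoded size, because the ~4KB target is about bytes
        buffered_bytes += len(part.encode("utf-8"))
        if buffered_bytes >= chunk_size:
            yield "".join(buffer)
            buffer.clear()
            buffered_bytes = 0
    # Flush whatever is left over
    if buffer:
        yield "".join(buffer)
```

Wrapped as `coalesce_chunks(generator.generate(notes))`, the response still streams, but each write carries a few kilobytes instead of a single element.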

### Non-Functional Requirements

1. **Performance**
   - Generation time <100ms for 50 entries
   - Streaming chunks of ~4KB
   - Minimal memory footprint

2. **Compatibility**
   - Works with major feed readers
   - Valid per W3C Feed Validator
   - Proper content negotiation
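
For the content negotiation requirement, a minimal sketch could pick a format from the HTTP `Accept` header. It assumes a hypothetical shared feed endpoint that serves either Atom or the existing RSS feed, plus a hypothetical `choose_feed_format` helper that falls back to RSS. Wildcards such as `*/*` are deliberately simplified to "use the default".

```python
from typing import Optional

# Media types this sketch can serve, mapped to internal format names
SUPPORTED_FEED_TYPES = {
    "application/atom+xml": "atom",
    "application/rss+xml": "rss",
}


def choose_feed_format(accept_header: Optional[str], default: str = "rss") -> str:
    """Return "atom" or "rss" for an HTTP Accept header.

    Picks the supported media type with the highest q-value. Anything else,
    including wildcards like */*, leaves the default in place.
    """
    if not accept_header:
        return default

    best_format, best_q = default, 0.0
    for media_range in accept_header.split(","):
        fields = [field.strip() for field in media_range.split(";")]
        media_type = fields[0].lower()
        q = 1.0  # q defaults to 1 when no parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        # Strictly greater: on a tie the earlier entry wins, and q=0 never wins
        if media_type in SUPPORTED_FEED_TYPES and q > best_q:
            best_format, best_q = SUPPORTED_FEED_TYPES[media_type], q
    return best_format
```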

## ATOM Feed Structure

### Namespace and Root Element

```xml
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <!-- Feed elements here -->
</feed>
```

### Feed-Level Elements

#### Required Elements

| Element | Description | Example |
|---------|-------------|---------|
| `id` | Permanent, unique identifier | `<id>https://example.com/</id>` |
| `title` | Human-readable title | `<title>StarPunk Notes</title>` |
| `updated` | Last significant update | `<updated>2024-11-25T12:00:00Z</updated>` |

#### Recommended Elements

| Element | Description | Example |
|---------|-------------|---------|
| `author` | Feed author (required unless every entry has its own `author`, per RFC 4287) | `<author><name>John Doe</name></author>` |
| `link` | Feed relationships | `<link rel="self" href="..."/>` |
| `subtitle` | Feed description | `<subtitle>Personal notes</subtitle>` |

#### Optional Elements

| Element | Description |
|---------|-------------|
| `category` | Categorization scheme |
| `contributor` | Secondary contributors |
| `generator` | Software that generated feed |
| `icon` | Small visual identification |
| `logo` | Larger visual identification |
| `rights` | Copyright/license info |

### Entry-Level Elements

#### Required Elements

| Element | Description | Example |
|---------|-------------|---------|
| `id` | Permanent, unique identifier | `<id>https://example.com/note/123</id>` |
| `title` | Entry title | `<title>My Note Title</title>` |
| `updated` | Last modification | `<updated>2024-11-25T12:00:00Z</updated>` |

#### Recommended Elements

| Element | Description |
|---------|-------------|
| `author` | Entry author (if different from the feed author; required when the feed has none) |
| `content` | Full content |
| `link` | Entry URL |
| `summary` | Short summary |

#### Optional Elements

| Element | Description |
|---------|-------------|
| `category` | Entry categories/tags |
| `contributor` | Secondary contributors |
| `published` | Initial publication time |
| `rights` | Entry-specific rights |
| `source` | If republished from elsewhere |

## Implementation Design

### ATOM Generator Class

```python
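from __future__ import annotations

from datetime import datetime, timezone
from typing import Iterator, List, Optional

# `Note` is StarPunk's note model, defined elsewhere in the codebase.


class AtomGenerator:
    """ATOM 1.0 feed generator with streaming support"""

    def __init__(self, site_url: str, site_name: str, site_description: str,
                 author_name: Optional[str] = None):
        self.site_url = site_url.rstrip('/')
        self.site_name = site_name
        self.site_description = site_description
        # Feed-level author (e.g. STARPUNK_FEED_ATOM_AUTHOR_NAME). RFC 4287
        # requires one unless every entry has its own <author>.
        self.author_name = author_name

    def generate(self, notes: List[Note], limit: int = 50) -> Iterator[str]:
        """Generate ATOM feed as stream of chunks

        IMPORTANT: Notes are expected to be in DESC order (newest first)
        from the database. This order MUST be preserved in the feed.
        """
        entries = notes[:limit]

        # Yield XML declaration
        yield '<?xml version="1.0" encoding="utf-8"?>\n'

        # Yield feed opening with namespace
        yield '<feed xmlns="http://www.w3.org/2005/Atom">\n'

        # Yield feed metadata (needs the entries to compute <updated>)
        yield from self._generate_feed_metadata(entries)

        # Yield entries - maintain DESC order (newest first)
        # DO NOT reverse! Database order is correct
        for note in entries:
            yield from self._generate_entry(note)

        # Yield closing tag
        yield '</feed>\n'

    def _generate_feed_metadata(self, entries: List[Note]) -> Iterator[str]:
        """Generate feed-level metadata"""
        # Feed <updated> is the most recent entry change, so it only moves when
        # content does (naive datetimes compared as UTC; "now" if no entries)
        feed_updated = max(
            (note.updated_at or note.created_at for note in entries),
            key=lambda dt: dt if dt.tzinfo is not None else dt.replace(tzinfo=timezone.utc),
            default=datetime.now(timezone.utc),
        )

        # Required elements
        yield f'  <id>{self._escape_xml(self.site_url)}/</id>\n'
        yield f'  <title>{self._escape_xml(self.site_name)}</title>\n'
        yield f'  <updated>{self._format_atom_date(feed_updated)}</updated>\n'

        # Links
        yield f'  <link rel="alternate" type="text/html" href="{self._escape_xml(self.site_url)}"/>\n'
        yield f'  <link rel="self" type="application/atom+xml" href="{self._escape_xml(self.site_url)}/feed.atom"/>\n'

        # Optional elements
        if self.site_description:
            yield f'  <subtitle>{self._escape_xml(self.site_description)}</subtitle>\n'

        # Feed-level author, when configured
        if self.author_name:
            yield '  <author>\n'
            yield f'    <name>{self._escape_xml(self.author_name)}</name>\n'
            yield '  </author>\n'

        # Generator
        yield '  <generator version="1.1.2" uri="https://starpunk.app">StarPunk</generator>\n'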

    def _generate_entry(self, note: Note) -> Iterator[str]:
        """Generate a single entry"""
        permalink = f"{self.site_url}{note.permalink}"

        yield '  <entry>\n'

        # Required elements
        yield f'    <id>{self._escape_xml(permalink)}</id>\n'
        yield f'    <title>{self._escape_xml(note.title)}</title>\n'
        yield f'    <updated>{self._format_atom_date(note.updated_at or note.created_at)}</updated>\n'

        # Link to entry
        yield f'    <link rel="alternate" type="text/html" href="{self._escape_xml(permalink)}"/>\n'

        # Published date (only when the note was edited after creation)
        if note.updated_at and note.updated_at != note.created_at:
            yield f'    <published>{self._format_atom_date(note.created_at)}</published>\n'

        # Author (if available); getattr guards against a missing or None author
        author = getattr(note, 'author', None)
        if author:
            yield '    <author>\n'
            yield f'      <name>{self._escape_xml(author.name)}</name>\n'
            if author.email:
                yield f'      <email>{self._escape_xml(author.email)}</email>\n'
            if author.uri:
                yield f'      <uri>{self._escape_xml(author.uri)}</uri>\n'
            yield '    </author>\n'

        # Content
        yield from self._generate_content(note)

        # Categories/tags
        if hasattr(note, 'tags') and note.tags:
            for tag in note.tags:
                yield f'    <category term="{self._escape_xml(tag)}"/>\n'

        yield '  </entry>\n'

    def _generate_content(self, note: Note) -> Iterator[str]:
        """Generate content element with proper type"""
        # Determine content type based on note format
        if note.html:
            # HTML content - use escaped HTML
            yield '    <content type="html">'
            yield self._escape_xml(note.html)
            yield '</content>\n'
        else:
            # Plain text content
            yield '    <content type="text">'
            yield self._escape_xml(note.content)
            yield '</content>\n'

        # Add summary if available
        if hasattr(note, 'summary') and note.summary:
            yield '    <summary type="text">'
            yield self._escape_xml(note.summary)
            yield '</summary>\n'
```

### Date Formatting

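ATOM uses the RFC 3339 date format, which is a profile of ISO 8601. RFC 4287 additionally requires an uppercase `T` between date and time, and an uppercase `Z` when no numeric offset is given.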

```python
def _format_atom_date(self, dt: datetime) -> str:
    """Format datetime to RFC 3339 for ATOM

    Format: 2024-11-25T12:00:00Z or 2024-11-25T12:00:00-05:00

    Args:
        dt: Datetime object (naive assumed UTC)

    Returns:
        RFC 3339 formatted string
    """
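    # Ensure timezone aware (naive datetimes are treated as UTC)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)

    # Format to RFC 3339: 'Z' for a zero UTC offset, otherwise a
    # colon-separated offset. strftime('%z') would produce '-0500', which
    # RFC 3339 does not allow, so isoformat() is used instead.
    if not dt.utcoffset():  # timedelta(0) is falsy
        return dt.strftime('%Y-%m-%dT%H:%M:%SZ')
    return dt.isoformat(timespec='seconds')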
```

### XML Escaping

```python
def _escape_xml(self, text: str) -> str:
    """Escape special XML characters

    Escapes: & < > " '

    Args:
        text: Text to escape

    Returns:
        XML-safe escaped text
    """
    if not text:
        return ''

    # Order matters: & must be first
    text = text.replace('&', '&amp;')
    text = text.replace('<', '&lt;')
    text = text.replace('>', '&gt;')
    text = text.replace('"', '&quot;')
    text = text.replace("'", '&apos;')

    return text
```
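
The standard library provides an equivalent escape, and XML 1.0 also forbids some characters outright (NUL and most C0 control characters), which no amount of escaping can fix. A sketch, assuming a hypothetical `escape_xml_text` helper, that strips those characters and then escapes the same five characters as `_escape_xml`:

```python
import re
from xml.sax.saxutils import escape

# Characters outside the XML 1.0 Char production (NUL, most C0 control
# characters, lone surrogates, U+FFFE/U+FFFF) make a document ill-formed
# even when escaped, so they are removed
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def escape_xml_text(text: str) -> str:
    """Drop characters XML 1.0 forbids, then escape & < > " '."""
    if not text:
        return ""
    cleaned = _INVALID_XML_CHARS.sub("", text)
    # escape() handles &, < and > (ampersand first); the mapping adds quotes
    return escape(cleaned, {'"': "&quot;", "'": "&apos;"})
```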

## Content Type Handling

### Text Content

Plain text, must be escaped:

```xml
<content type="text">This is plain text with &lt;escaped&gt; characters</content>
```

### HTML Content

HTML markup, escaped as text (the form `_generate_content` emits):

```xml
<content type="html">&lt;p&gt;This is &lt;strong&gt;HTML&lt;/strong&gt; content&lt;/p&gt;</content>
```
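
Since the requirements also allow CDATA for HTML content, a hypothetical `cdata_wrap` helper might look like the sketch below. The only subtlety is a literal `]]>` inside the HTML, which would close the section early, so it is split across two adjacent CDATA sections.

```python
def cdata_wrap(html: str) -> str:
    """Wrap HTML in a CDATA section for <content type="html">.

    A literal "]]>" would end the section early, so it is split: "]]" stays
    in the first section and ">" starts the next one.
    """
    return "<![CDATA[" + html.replace("]]>", "]]]]><![CDATA[>") + "]]>"
```

Escaping, as `_generate_content` does, avoids this edge case entirely, which is one reason to prefer it.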

### XHTML Content (Future)

Well-formed XHTML inline, wrapped in a single XHTML `div` element:

```xml
<content type="xhtml">
  <div xmlns="http://www.w3.org/1999/xhtml">
    <p>This is <strong>XHTML</strong> content</p>
  </div>
</content>
```
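
Because `type="xhtml"` content is parsed as part of the feed document, note HTML can only be inlined if it is well-formed XML. A sketch, assuming a hypothetical `xhtml_content` helper, that checks this with `xml.etree.ElementTree` and returns `None` so the caller can fall back to escaped HTML:

```python
import xml.etree.ElementTree as ET
from typing import Optional

XHTML_NS = "http://www.w3.org/1999/xhtml"


def xhtml_content(fragment: str) -> Optional[str]:
    """Wrap an HTML fragment as type="xhtml" Atom content.

    Returns None when the fragment is not well-formed XML (an unclosed tag,
    an HTML-only entity such as &nbsp;, or markup that closes the wrapper
    div early), so the caller can use escaped type="html" content instead.
    """
    wrapped = f'<div xmlns="{XHTML_NS}">{fragment}</div>'
    try:
        # Parsing only checks well-formedness; the markup is emitted unchanged
        ET.fromstring(wrapped)
    except ET.ParseError:
        return None
    return f'<content type="xhtml">{wrapped}</content>'
```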

## Complete ATOM Feed Example

```xml
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <id>https://example.com/</id>
  <title>StarPunk Notes</title>
  <updated>2024-11-25T12:00:00Z</updated>
  <link rel="alternate" type="text/html" href="https://example.com"/>
  <link rel="self" type="application/atom+xml" href="https://example.com/feed.atom"/>
  <subtitle>Personal notes and thoughts</subtitle>
  <author>
    <name>John Doe</name>
  </author>
  <generator version="1.1.2" uri="https://starpunk.app">StarPunk</generator>

  <entry>
    <id>https://example.com/notes/2024/11/25/first-note</id>
    <title>My First Note</title>
    <updated>2024-11-25T10:30:00Z</updated>
    <published>2024-11-25T10:00:00Z</published>
    <link rel="alternate" type="text/html" href="https://example.com/notes/2024/11/25/first-note"/>
    <author>
      <name>John Doe</name>
      <email>john@example.com</email>
    </author>
    <content type="html">&lt;p&gt;This is my first note with &lt;strong&gt;bold&lt;/strong&gt; text.&lt;/p&gt;</content>
    <category term="personal"/>
    <category term="introduction"/>
  </entry>

  <entry>
    <id>https://example.com/notes/2024/11/24/another-note</id>
    <title>Another Note</title>
    <updated>2024-11-24T15:45:00Z</updated>
    <link rel="alternate" type="text/html" href="https://example.com/notes/2024/11/24/another-note"/>
    <content type="text">Plain text content for this note.</content>
    <summary type="text">A brief summary of the note</summary>
  </entry>
</feed>
```

## Validation

### W3C Feed Validator Compliance

The generated ATOM feed must pass validation at:
- https://validator.w3.org/feed/

### Common Validation Issues

1. **Missing Required Elements**
   - Ensure `id`, `title`, and `updated` are present on the feed
   - Each entry must have these elements too
   - Provide an `author` on the feed, or on every entry

2. **Invalid Dates**
   - Must be RFC 3339 format
   - Include timezone information

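3. **Improper Escaping**
   - Escape XML special characters (`&`, `<`, `>`, and quotes in attribute values)
   - Never embed raw HTML markup; send it as escaped `type="html"` content

4. **Namespace Issues**
   - Correct namespace declaration
   - No prefixed elements without a matching namespace declaration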
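
Several of these issues can be caught locally before submitting to the W3C validator. The sketch below is a hypothetical `find_feed_problems` checker built on `xml.etree.ElementTree`. It covers required elements, the feed-or-entry `author` rule, and the RFC 3339 date shape only, so it complements full validation rather than replacing it.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

ATOM = "{http://www.w3.org/2005/Atom}"
# RFC 3339 date-time as profiled by RFC 4287: uppercase "T", then "Z" or a
# colon-separated numeric offset
RFC3339 = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})")


def _check_required(element: ET.Element, label: str) -> List[str]:
    """Report missing id/title/updated children and a malformed <updated>."""
    problems = []
    for name in ("id", "title", "updated"):
        child = element.find(ATOM + name)
        if child is None:
            problems.append(f"{label}: missing <{name}>")
        elif name == "updated" and not RFC3339.fullmatch(child.text or ""):
            problems.append(f"{label}: <updated> is not RFC 3339")
    return problems


def find_feed_problems(feed_xml: bytes) -> List[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    root = ET.fromstring(feed_xml)
    if root.tag != ATOM + "feed":
        return [f"root element is {root.tag}, expected Atom <feed>"]

    problems = _check_required(root, "feed")
    feed_has_author = root.find(ATOM + "author") is not None
    for index, entry in enumerate(root.findall(ATOM + "entry")):
        label = f"entry {index}"
        problems += _check_required(entry, label)
        published = entry.find(ATOM + "published")
        if published is not None and not RFC3339.fullmatch(published.text or ""):
            problems.append(f"{label}: <published> is not RFC 3339")
        # Simplification: ignores authors supplied through <source>
        if not feed_has_author and entry.find(ATOM + "author") is None:
            problems.append(f"{label}: no <author> here or at feed level")
    return problems
```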

## Testing Strategy

### Unit Tests

```python
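from datetime import datetime, timezone
import xml.etree.ElementTree as etree

# Shared test data (illustrative values). AtomGenerator and Note come from
# StarPunk; `notes` is assumed to be a list of sample Note objects supplied
# by a fixture or helper.
site_url = "https://example.com"
site_name = "StarPunk Notes"
site_description = "Personal notes"


class TestAtomGenerator: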
    def test_required_elements(self):
        """Test all required ATOM elements are present"""
        generator = AtomGenerator(site_url, site_name, site_description)
        feed = ''.join(generator.generate(notes))

        assert '<id>' in feed
        assert '<title>' in feed
        assert '<updated>' in feed

    def test_feed_order_newest_first(self):
        """Test ATOM feed shows newest entries first (a StarPunk convention; RFC 4287 assigns no significance to entry order)"""
        # Create notes with different timestamps
        old_note = Note(
            title="Old Note",
            created_at=datetime(2024, 11, 20, 10, 0, 0, tzinfo=timezone.utc)
        )
        new_note = Note(
            title="New Note",
            created_at=datetime(2024, 11, 25, 10, 0, 0, tzinfo=timezone.utc)
        )

        # Generate feed with notes in DESC order (as from database)
        generator = AtomGenerator(site_url, site_name, site_description)
        feed = ''.join(generator.generate([new_note, old_note]))

        # Parse feed and verify order
        root = etree.fromstring(feed.encode())
        entries = root.findall('{http://www.w3.org/2005/Atom}entry')

        # First entry should be newest
        first_title = entries[0].find('{http://www.w3.org/2005/Atom}title').text
        assert first_title == "New Note"

        # Second entry should be oldest
        second_title = entries[1].find('{http://www.w3.org/2005/Atom}title').text
        assert second_title == "Old Note"

    def test_xml_escaping(self):
        """Test special characters are properly escaped"""
        note = Note(title="Test & <Special> Characters")
        generator = AtomGenerator(site_url, site_name, site_description)
        feed = ''.join(generator.generate([note]))

        assert '&amp;' in feed
        assert '&lt;Special&gt;' in feed

    def test_date_formatting(self):
        """Test RFC 3339 date formatting"""
        generator = AtomGenerator(site_url, site_name, site_description)
        dt = datetime(2024, 11, 25, 12, 0, 0, tzinfo=timezone.utc)
        formatted = generator._format_atom_date(dt)

        assert formatted == '2024-11-25T12:00:00Z'

    def test_streaming_generation(self):
        """Test feed is generated as stream"""
        generator = AtomGenerator(site_url, site_name, site_description)
        chunks = list(generator.generate(notes))

        assert len(chunks) > 1  # Multiple chunks
        assert chunks[0].startswith('<?xml')
        assert chunks[-1].endswith('</feed>\n')
```

### Integration Tests

```python
def test_atom_feed_endpoint():
    """Test the ATOM feed endpoint returns a parseable Atom document"""
    response = client.get('/feed.atom')

    assert response.status_code == 200
    # mimetype excludes parameters such as "; charset=utf-8"
    assert response.mimetype == 'application/atom+xml'

    # Parse and validate
    feed = etree.fromstring(response.data)
    assert feed.tag == '{http://www.w3.org/2005/Atom}feed'

def test_feed_reader_compatibility():
    """Test with popular feed readers"""
    readers = [
        'Feedly',
        'Inoreader',
        'NewsBlur',
        'The Old Reader'
    ]

    for reader in readers:
        # validate_with_reader and feed_url are placeholders for a manual or
        # scripted check against each reader
        assert validate_with_reader(feed_url, reader)
```

### Validation Tests

```python
def test_w3c_validation():
    """Validate against W3C Feed Validator"""
    generator = AtomGenerator(site_url, site_name, site_description)
    feed = ''.join(generator.generate(sample_notes))

    # validate_feed is a placeholder wrapper around the W3C Feed Validation Service
    result = validate_feed(feed, format='atom')
    assert result['valid']
    assert len(result['errors']) == 0
```

## Performance Benchmarks

### Generation Speed

```python
import time


def benchmark_atom_generation():
    """Benchmark ATOM feed generation"""
    notes = generate_sample_notes(100)  # hypothetical test helper
    generator = AtomGenerator(site_url, site_name, site_description)

    start = time.perf_counter()
    feed = ''.join(generator.generate(notes, limit=50))
    duration = time.perf_counter() - start

    assert duration < 0.1  # Less than 100ms
    assert len(feed) > 0
```
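
A single timed run is sensitive to scheduler and garbage-collection noise, which can make a fixed 100ms threshold flaky. A hypothetical `best_time` helper built on `timeit.repeat` takes the fastest of several runs instead:

```python
import timeit
from typing import Callable


def best_time(func: Callable[[], object], repeat: int = 5) -> float:
    """Return the fastest of `repeat` single runs of func, in seconds.

    The minimum is the run least disturbed by other activity on the machine,
    which makes it a steadier value to compare against a fixed threshold.
    """
    return min(timeit.repeat(func, number=1, repeat=repeat))
```

With it, `best_time(lambda: ''.join(generator.generate(notes, limit=50))) < 0.1` could replace the single `perf_counter` measurement.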

### Memory Usage

```python
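import tracemalloc


def test_streaming_memory_usage():
    """Verify streaming doesn't hold the entire feed in memory"""
    notes = generate_sample_notes(1000)  # hypothetical test helper
    generator = AtomGenerator(site_url, site_name, site_description)

    tracemalloc.start()
    try:
        # Consume chunks one at a time without concatenating them (streaming);
        # limit=len(notes) so all 1000 notes are rendered, not the default 50
        for chunk in generator.generate(notes, limit=len(notes)):
            pass  # Process chunk (e.g. write it to the response)
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()

    assert peak_bytes < 1024 * 1024  # Less than 1MB peak during generation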
```

## Configuration

### ATOM-Specific Settings

```ini
# ATOM feed configuration
STARPUNK_FEED_ATOM_ENABLED=true
STARPUNK_FEED_ATOM_AUTHOR_NAME=John Doe
STARPUNK_FEED_ATOM_AUTHOR_EMAIL=john@example.com
STARPUNK_FEED_ATOM_AUTHOR_URI=https://example.com/about
STARPUNK_FEED_ATOM_ICON=https://example.com/icon.png
STARPUNK_FEED_ATOM_LOGO=https://example.com/logo.png
STARPUNK_FEED_ATOM_RIGHTS=© 2024 John Doe. CC BY-SA 4.0
```
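
These variables could be loaded into a typed settings object at startup. A sketch, assuming a hypothetical `AtomSettings` dataclass and `load_atom_settings` function; treating an unset `STARPUNK_FEED_ATOM_ENABLED` as disabled is also an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

PREFIX = "STARPUNK_FEED_ATOM_"


@dataclass(frozen=True)
class AtomSettings:
    enabled: bool
    author_name: Optional[str] = None
    author_email: Optional[str] = None
    author_uri: Optional[str] = None
    icon: Optional[str] = None
    logo: Optional[str] = None
    rights: Optional[str] = None


def load_atom_settings(environ: Mapping[str, str] = os.environ) -> AtomSettings:
    """Read STARPUNK_FEED_ATOM_* variables; blank or missing values become None."""

    def get(name: str) -> Optional[str]:
        value = environ.get(PREFIX + name, "").strip()
        return value or None

    # Assumption: the Atom feed stays disabled unless explicitly enabled
    enabled = (get("ENABLED") or "false").lower() in {"1", "true", "yes", "on"}
    return AtomSettings(
        enabled=enabled,
        author_name=get("AUTHOR_NAME"),
        author_email=get("AUTHOR_EMAIL"),
        author_uri=get("AUTHOR_URI"),
        icon=get("ICON"),
        logo=get("LOGO"),
        rights=get("RIGHTS"),
    )
```

Passing an explicit mapping instead of `os.environ` keeps the loader easy to test.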

## Security Considerations

1. **XML Injection Prevention**
   - All user content must be escaped
   - No raw XML from user input
   - Validate all URLs (e.g. accept only absolute `http`/`https` URLs)

2. **Content Security**
   - HTML content properly escaped
   - Strip `<script>` tags and other active content before the HTML is escaped into the feed
   - Sanitize all metadata

3. **Resource Limits**
   - Maximum feed size limits
   - Timeout on generation
   - Rate limiting on endpoint
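
For URL validation, one minimal check is to accept only absolute `http`/`https` URLs, which rules out `javascript:` and similar schemes in author URIs, icons, and logos. A sketch with a hypothetical `is_safe_feed_url` helper using `urllib.parse.urlsplit`:

```python
from urllib.parse import urlsplit


def is_safe_feed_url(url: str) -> bool:
    """Accept only absolute http(s) URLs with a host component."""
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        # urlsplit rejects some malformed input, e.g. an unbalanced IPv6 bracket
        return False
    # urlsplit lowercases the scheme, so "HTTPS://..." is accepted too
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```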

## Migration Notes

### Adding ATOM to Existing RSS

- ATOM runs parallel to RSS
- No changes to existing RSS feed
- Both formats available simultaneously
- Shared caching infrastructure

## Acceptance Criteria

1. ✅ Valid ATOM 1.0 feed generation
2. ✅ All required elements present
3. ✅ RFC 3339 date formatting correct
4. ✅ XML properly escaped
5. ✅ Streaming generation working
6. ✅ W3C validator passing
7. ✅ Works with 5+ major feed readers
8. ✅ Performance target met (<100ms)
9. ✅ Memory efficient streaming
10. ✅ Security review passed