StarPunk Notes

# ATOM Feed Specification - v1.1.2

## Overview

This specification defines the implementation of ATOM 1.0 feed generation for StarPunk, providing an alternative syndication format to RSS with enhanced metadata support and standardized content handling.

## Requirements

### Functional Requirements

1. **ATOM 1.0 Compliance**
   - Full conformance to RFC 4287
   - Valid XML namespace declarations
   - Required elements present
   - Proper content type handling

2. **Content Support**
   - Text content (escaped)
   - HTML content (escaped or CDATA)
   - XHTML content (inline XML)
   - Base64 for binary (future)

3. **Metadata Richness**
   - Author information
   - Category/tag support
   - Updated vs published dates
   - Link relationships

4. **Streaming Generation**
   - Memory-efficient output
   - Chunked response support
   - No full document in memory

### Non-Functional Requirements

1. **Performance**
   - Generation time <100ms for 50 entries
   - Streaming chunks of ~4KB
   - Minimal memory footprint

2. **Compatibility**
   - Works with major feed readers
   - Valid per W3C Feed Validator
   - Proper content negotiation

## ATOM Feed Structure

### Namespace and Root Element

```xml
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
```

### Feed-Level Elements

#### Required Elements

| Element | Description | Example |
|---------|-------------|---------|
| `id` | Permanent, unique identifier | `https://example.com/` |
| `title` | Human-readable title | `StarPunk Notes` |
| `updated` | Last significant update | `2024-11-25T12:00:00Z` |

#### Recommended Elements

| Element | Description | Example |
|---------|-------------|---------|
| `author` | Feed author | `John Doe` |
| `link` | Feed relationships | `<link rel="self" href="https://example.com/feed.atom"/>` |
| `subtitle` | Feed description | `Personal notes` |

#### Optional Elements

| Element | Description |
|---------|-------------|
| `category` | Categorization scheme |
| `contributor` | Secondary contributors |
| `generator` | Software that generated feed |
| `icon` | Small visual identification |
| `logo` | Larger visual identification |
| `rights` | Copyright/license info |

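
As a rough illustration of the required feed-level elements listed above, the sketch below parses a feed with Python's standard `xml.etree.ElementTree` and reports which of `id`, `title`, and `updated` are missing. The helper name `missing_required` and the `SAMPLE` document are hypothetical and not part of StarPunk; this is a minimal check, not a substitute for a full validator.

```python
from typing import List
from xml.etree import ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
REQUIRED = ("id", "title", "updated")


def missing_required(element: ET.Element) -> List[str]:
    """Return the required Atom child elements that `element` lacks.

    Hypothetical helper: works for any element that shares the same
    three required children, so it is not tied to the feed root.
    """
    return [
        name
        for name in REQUIRED
        if element.find(f"{{{ATOM_NS}}}{name}") is None
    ]


# Sample feed that deliberately omits <updated>
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <id>https://example.com/</id>
  <title>StarPunk Notes</title>
</feed>"""

# Parse from bytes so the encoding declaration is honoured
root = ET.fromstring(SAMPLE.encode("utf-8"))
print(missing_required(root))  # ['updated']
```

Because the lookup uses the namespace-qualified tag name, an element declared outside the ATOM namespace is reported as missing, which also catches a wrong or absent `xmlns` declaration.
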
### Entry-Level Elements

#### Required Elements

| Element | Description | Example |
|---------|-------------|---------|
| `id` | Permanent, unique identifier | `https://example.com/note/123` |
| `title` | Entry title | `My Note Title` |
| `updated` | Last modification | `2024-11-25T12:00:00Z` |

#### Recommended Elements

| Element | Description |
|---------|-------------|
| `author` | Entry author (if different from feed) |
| `content` | Full content |
| `link` | Entry URL |
| `summary` | Short summary |

#### Optional Elements

| Element | Description |
|---------|-------------|
| `category` | Entry categories/tags |
| `contributor` | Secondary contributors |
| `published` | Initial publication time |
| `rights` | Entry-specific rights |
| `source` | If republished from elsewhere |

## Implementation Design

### ATOM Generator Class

```python
from datetime import datetime, timezone
from typing import Iterator, List

# `Note` is StarPunk's note model. It is referenced as a string annotation
# so this listing does not depend on its import path.


class AtomGenerator:
    """ATOM 1.0 feed generator with streaming support"""

    def __init__(self, site_url: str, site_name: str, site_description: str):
        self.site_url = site_url.rstrip('/')
        self.site_name = site_name
        self.site_description = site_description

    def generate(self, notes: List["Note"], limit: int = 50) -> Iterator[str]:
        """Generate ATOM feed as stream of chunks

        IMPORTANT: Notes are expected to be in DESC order (newest first)
        from the database. This order MUST be preserved in the feed.
        """
        # Yield XML declaration
        yield '<?xml version="1.0" encoding="utf-8"?>\n'

        # Yield feed opening with namespace
        yield '<feed xmlns="http://www.w3.org/2005/Atom">\n'

        # Yield feed metadata
        yield from self._generate_feed_metadata()

        # Yield entries - maintain DESC order (newest first)
        # DO NOT reverse! Database order is correct
        for note in notes[:limit]:
            yield from self._generate_entry(note)

        # Yield closing tag
        yield '</feed>\n'

    def _generate_feed_metadata(self) -> Iterator[str]:
        """Generate feed-level metadata"""
        # Required elements
        yield f'  <id>{self._escape_xml(self.site_url)}/</id>\n'
        yield f'  <title>{self._escape_xml(self.site_name)}</title>\n'
        yield f'  <updated>{self._format_atom_date(datetime.now(timezone.utc))}</updated>\n'

        # Links (attribute values are escaped as well)
        yield f'  <link href="{self._escape_xml(self.site_url)}/" rel="alternate" type="text/html"/>\n'
        yield f'  <link href="{self._escape_xml(self.site_url)}/feed.atom" rel="self" type="application/atom+xml"/>\n'

        # Optional elements
        if self.site_description:
            yield f'  <subtitle>{self._escape_xml(self.site_description)}</subtitle>\n'

        # Generator
        yield '  <generator>StarPunk</generator>\n'

    def _generate_entry(self, note: "Note") -> Iterator[str]:
        """Generate a single entry"""
        permalink = f"{self.site_url}{note.permalink}"

        yield '  <entry>\n'

        # Required elements
        yield f'    <id>{self._escape_xml(permalink)}</id>\n'
        yield f'    <title>{self._escape_xml(note.title)}</title>\n'
        yield f'    <updated>{self._format_atom_date(note.updated_at or note.created_at)}</updated>\n'

        # Link to entry
        yield f'    <link href="{self._escape_xml(permalink)}" rel="alternate" type="text/html"/>\n'

        # Published date (if different from updated)
        if note.created_at != note.updated_at:
            yield f'    <published>{self._format_atom_date(note.created_at)}</published>\n'

        # Author (if available); getattr also covers an author set to None
        author = getattr(note, 'author', None)
        if author:
            yield '    <author>\n'
            yield f'      <name>{self._escape_xml(author.name)}</name>\n'
            if author.email:
                yield f'      <email>{self._escape_xml(author.email)}</email>\n'
            if author.uri:
                yield f'      <uri>{self._escape_xml(author.uri)}</uri>\n'
            yield '    </author>\n'

        # Content
        yield from self._generate_content(note)

        # Categories/tags
        if hasattr(note, 'tags') and note.tags:
            for tag in note.tags:
                yield f'    <category term="{self._escape_xml(tag)}"/>\n'

        yield '  </entry>\n'

    def _generate_content(self, note: "Note") -> Iterator[str]:
        """Generate content element with proper type"""
        # Determine content type based on note format
        if note.html:
            # HTML content - use escaped HTML
            yield '    <content type="html">'
            yield self._escape_xml(note.html)
            yield '</content>\n'
        else:
            # Plain text content
            yield '    <content type="text">'
            yield self._escape_xml(note.content)
            yield '</content>\n'

        # Add summary if available
        if hasattr(note, 'summary') and note.summary:
            yield '    <summary>'
            yield self._escape_xml(note.summary)
            yield '</summary>\n'
```

### Date Formatting

ATOM uses the RFC 3339 date format, which is a profile of ISO 8601.

```python
from datetime import datetime, timedelta, timezone


def _format_atom_date(self, dt: datetime) -> str:
    """Format datetime to RFC 3339 for ATOM

    Format: 2024-11-25T12:00:00Z or 2024-11-25T12:00:00-05:00

    Args:
        dt: Datetime object (naive assumed UTC)

    Returns:
        RFC 3339 formatted string
    """
    # Ensure timezone aware
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)

    # Use 'Z' for UTC, otherwise a numeric offset.
    # Comparing the offset (not the tzinfo object) also catches
    # UTC zones that are not the timezone.utc singleton.
    if dt.utcoffset() == timedelta(0):
        return dt.strftime('%Y-%m-%dT%H:%M:%SZ')

    # isoformat() writes the offset as +HH:MM; strftime('%z') would
    # produce +HHMM, which is not valid RFC 3339
    return dt.isoformat(timespec='seconds')
```

### XML Escaping

```python
def _escape_xml(self, text: str) -> str:
    """Escape special XML characters

    Escapes: & < > " '

    Args:
        text: Text to escape

    Returns:
        XML-safe escaped text
    """
    if not text:
        return ''

    # Order matters: & must be first
    text = text.replace('&', '&amp;')
    text = text.replace('<', '&lt;')
    text = text.replace('>', '&gt;')
    text = text.replace('"', '&quot;')
    text = text.replace("'", '&apos;')

    return text
```

## Content Type Handling

### Text Content

Plain text must be escaped:

```xml
<content type="text">This is plain text with &lt;escaped&gt; characters</content>
```

### HTML Content

HTML as escaped text:

```xml
<content type="html">&lt;p&gt;This is &lt;strong&gt;HTML&lt;/strong&gt; content&lt;/p&gt;</content>
```

### XHTML Content (Future)

Well-formed XML inline, wrapped in a single XHTML `div`:

```xml
<content type="xhtml">
  <div xmlns="http://www.w3.org/1999/xhtml">
    <p>This is XHTML content</p>
  </div>
</content>
```

## Complete ATOM Feed Example

```xml
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <id>https://example.com/</id>
  <title>StarPunk Notes</title>
  <updated>2024-11-25T12:00:00Z</updated>
  <link href="https://example.com/" rel="alternate" type="text/html"/>
  <link href="https://example.com/feed.atom" rel="self" type="application/atom+xml"/>
  <subtitle>Personal notes and thoughts</subtitle>
  <generator>StarPunk</generator>

  <entry>
    <id>https://example.com/notes/2024/11/25/first-note</id>
    <title>My First Note</title>
    <updated>2024-11-25T10:30:00Z</updated>
    <link href="https://example.com/notes/2024/11/25/first-note" rel="alternate" type="text/html"/>
    <published>2024-11-25T10:00:00Z</published>
    <author>
      <name>John Doe</name>
      <email>john@example.com</email>
    </author>
    <content type="html">&lt;p&gt;This is my first note with &lt;strong&gt;bold&lt;/strong&gt; text.&lt;/p&gt;</content>
  </entry>

  <entry>
    <id>https://example.com/notes/2024/11/24/another-note</id>
    <title>Another Note</title>
    <updated>2024-11-24T15:45:00Z</updated>
    <link href="https://example.com/notes/2024/11/24/another-note" rel="alternate" type="text/html"/>
    <content type="text">Plain text content for this note.</content>
    <summary>A brief summary of the note</summary>
  </entry>
</feed>
```

## Validation

### W3C Feed Validator Compliance

The generated ATOM feed must pass validation at:
- https://validator.w3.org/feed/

### Common Validation Issues

1. **Missing Required Elements**
   - Ensure id, title, updated are present
   - Each entry must have these elements too

2. **Invalid Dates**
   - Must be RFC 3339 format
   - Include timezone information

3. **Improper Escaping**
   - All XML entities must be escaped
   - No raw HTML in text content

4. **Namespace Issues**
   - Correct namespace declaration
   - No prefixed elements without namespace

## Testing Strategy

### Unit Tests

```python
from datetime import datetime, timezone
from xml.etree import ElementTree as etree  # lxml.etree offers the same calls

# site_url, site_name, site_description, notes and Note are expected
# to come from the test fixtures.


class TestAtomGenerator:
    def test_required_elements(self):
        """Test all required ATOM elements are present"""
        generator = AtomGenerator(site_url, site_name, site_description)
        feed = ''.join(generator.generate(notes))

        assert '<id>' in feed
        assert '<title>' in feed
        assert '<updated>' in feed

    def test_feed_order_newest_first(self):
        """Test ATOM feed shows newest entries first

        RFC 4287 assigns no significance to entry order; newest-first
        is the convention feed readers expect.
        """
        # Create notes with different timestamps
        old_note = Note(
            title="Old Note",
            created_at=datetime(2024, 11, 20, 10, 0, 0, tzinfo=timezone.utc)
        )
        new_note = Note(
            title="New Note",
            created_at=datetime(2024, 11, 25, 10, 0, 0, tzinfo=timezone.utc)
        )

        # Generate feed with notes in DESC order (as from database)
        generator = AtomGenerator(site_url, site_name, site_description)
        feed = ''.join(generator.generate([new_note, old_note]))

        # Parse feed and verify order
        root = etree.fromstring(feed.encode())
        entries = root.findall('{http://www.w3.org/2005/Atom}entry')

        # First entry should be newest
        first_title = entries[0].find('{http://www.w3.org/2005/Atom}title').text
        assert first_title == "New Note"

        # Second entry should be oldest
        second_title = entries[1].find('{http://www.w3.org/2005/Atom}title').text
        assert second_title == "Old Note"

    def test_xml_escaping(self):
        """Test special characters are properly escaped"""
        note = Note(title="Test & <Special> Characters")
        generator = AtomGenerator(site_url, site_name, site_description)
        feed = ''.join(generator.generate([note]))

        assert '&amp;' in feed
        assert '&lt;Special&gt;' in feed

    def test_date_formatting(self):
        """Test RFC 3339 date formatting"""
        generator = AtomGenerator(site_url, site_name, site_description)
        dt = datetime(2024, 11, 25, 12, 0, 0, tzinfo=timezone.utc)
        formatted = generator._format_atom_date(dt)

        assert formatted == '2024-11-25T12:00:00Z'

    def test_streaming_generation(self):
        """Test feed is generated as stream"""
        generator = AtomGenerator(site_url, site_name, site_description)
        chunks = list(generator.generate(notes))

        assert len(chunks) > 1  # Multiple chunks
        assert chunks[0].startswith('<?xml')
        assert chunks[-1].endswith('</feed>\n')
```

### Integration Tests

```python
def test_atom_feed_endpoint():
    """Test ATOM feed endpoint with content negotiation"""
    response = client.get('/feed.atom')

    assert response.status_code == 200
    # The header may carry a charset parameter, so match the prefix only
    assert response.content_type.startswith('application/atom+xml')

    # Parse and validate
    feed = etree.fromstring(response.data)
    assert feed.tag == '{http://www.w3.org/2005/Atom}feed'

def test_feed_reader_compatibility():
    """Test with popular feed readers"""
    readers = [
        'Feedly',
        'Inoreader',
        'NewsBlur',
        'The Old Reader'
    ]

    for reader in readers:
        # Test parsing with reader's validator
        assert validate_with_reader(feed_url, reader)
```

### Validation Tests

```python
def test_w3c_validation():
    """Validate against W3C Feed Validator"""
    generator = AtomGenerator(site_url, site_name, site_description)
    feed = ''.join(generator.generate(sample_notes))

    # Submit to W3C validator API
    result = validate_feed(feed, format='atom')
    assert result['valid']
    assert len(result['errors']) == 0
```

## Performance Benchmarks

### Generation Speed

```python
import time


def benchmark_atom_generation():
    """Benchmark ATOM feed generation"""
    notes = generate_sample_notes(100)
    generator = AtomGenerator(site_url, site_name, site_description)

    start = time.perf_counter()
    feed = ''.join(generator.generate(notes, limit=50))
    duration = time.perf_counter() - start

    assert duration < 0.1  # Less than 100ms
    assert len(feed) > 0
```

### Memory Usage

```python
def test_streaming_memory_usage():
    """Verify streaming doesn't load entire feed in memory"""
    notes = generate_sample_notes(1000)
    generator = AtomGenerator(site_url, site_name, site_description)

    # get_memory_usage() is expected to report megabytes
    initial_memory = get_memory_usage()

    # Generate but don't concatenate (streaming)
    for chunk in generator.generate(notes):
        pass  # Process chunk

    memory_delta = get_memory_usage() - initial_memory
    assert memory_delta < 1  # Less than 1MB increase
```

## Configuration

### ATOM-Specific Settings

```ini
# ATOM feed configuration
STARPUNK_FEED_ATOM_ENABLED=true
STARPUNK_FEED_ATOM_AUTHOR_NAME=John Doe
STARPUNK_FEED_ATOM_AUTHOR_EMAIL=john@example.com
STARPUNK_FEED_ATOM_AUTHOR_URI=https://example.com/about
STARPUNK_FEED_ATOM_ICON=https://example.com/icon.png
STARPUNK_FEED_ATOM_LOGO=https://example.com/logo.png
STARPUNK_FEED_ATOM_RIGHTS=© 2024 John Doe. CC BY-SA 4.0
```

## Security Considerations

1. **XML Injection Prevention**
   - All user content must be escaped
   - No raw XML from user input
   - Validate all URLs

2. **Content Security**
   - HTML content properly escaped
   - No script tags allowed
   - Sanitize all metadata

3. **Resource Limits**
   - Maximum feed size limits
   - Timeout on generation
   - Rate limiting on endpoint

## Migration Notes

### Adding ATOM to Existing RSS

- ATOM runs parallel to RSS
- No changes to existing RSS feed
- Both formats available simultaneously
- Shared caching infrastructure

## Acceptance Criteria

1. ✅ Valid ATOM 1.0 feed generation
2. ✅ All required elements present
3. ✅ RFC 3339 date formatting correct
4. ✅ XML properly escaped
5. ✅ Streaming generation working
6. ✅ W3C validator passing
7. ✅ Works with 5+ major feed readers
8. ✅ Performance target met (<100ms)
9. ✅ Memory efficient streaming
10. ✅ Security review passed
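
The streaming requirement asks for chunks of ~4KB, while a generator such as `AtomGenerator.generate` yields one short string per element. One possible way to bridge the two is to coalesce fragments into larger byte chunks before they reach the response. The sketch below is an assumption about how that could look, not StarPunk's actual implementation; the name `coalesce_chunks` and the `target_bytes` parameter are hypothetical.

```python
from typing import Iterable, Iterator, List


def coalesce_chunks(fragments: Iterable[str],
                    target_bytes: int = 4096) -> Iterator[bytes]:
    """Group small string fragments into UTF-8 chunks of about target_bytes.

    A chunk is emitted as soon as the buffered size reaches the target,
    so chunks can exceed it by at most one fragment. Only the current
    buffer is held in memory, never the whole feed.
    """
    buffer: List[bytes] = []
    size = 0
    for fragment in fragments:
        data = fragment.encode("utf-8")
        buffer.append(data)
        size += len(data)
        if size >= target_bytes:
            yield b"".join(buffer)
            buffer, size = [], 0
    # Flush whatever is left after the last fragment
    if buffer:
        yield b"".join(buffer)


# Ten 1000-byte fragments are grouped five at a time
chunks = list(coalesce_chunks(["a" * 1000] * 10))
print([len(chunk) for chunk in chunks])  # [5000, 5000]
```

Measuring the size after encoding matters because non-ASCII characters occupy more than one byte in UTF-8, so counting characters would underestimate the chunk size for notes with such content.
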