ATOM Feed Specification - v1.1.2
Overview
This specification defines the implementation of ATOM 1.0 feed generation for StarPunk, providing an alternative syndication format to RSS with enhanced metadata support and standardized content handling.
Requirements
Functional Requirements
- ATOM 1.0 Compliance
  - Full conformance to RFC 4287
  - Valid XML namespace declarations
  - Required elements present
  - Proper content type handling
- Content Support
  - Text content (escaped)
  - HTML content (escaped or CDATA)
  - XHTML content (inline XML)
  - Base64 for binary (future)
- Metadata Richness
  - Author information
  - Category/tag support
  - Updated vs published dates
  - Link relationships
- Streaming Generation
  - Memory-efficient output
  - Chunked response support
  - No full document in memory
Non-Functional Requirements
- Performance
  - Generation time <100ms for 50 entries
  - Streaming chunks of ~4KB
  - Minimal memory footprint
- Compatibility
  - Works with major feed readers
  - Valid per W3C Feed Validator
  - Proper content negotiation
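As a rough illustration of the content negotiation requirement, the sketch below picks a feed format from an `Accept` header. The function name `pick_feed_format`, the `FEED_TYPES` mapping, and the choice of RSS as the fallback are assumptions made for this example, not part of the specification. Wildcard media types such as `*/*` are deliberately ignored and fall through to the default.

```python
from typing import Optional

# Hypothetical mapping of media types to feed formats (illustration only).
FEED_TYPES = {
    "application/atom+xml": "atom",
    "application/rss+xml": "rss",
}


def pick_feed_format(accept_header: Optional[str], default: str = "rss") -> str:
    """Return the known feed format with the highest q-value, else the default."""
    if not accept_header:
        return default
    best_format, best_q = default, 0.0
    for item in accept_header.split(","):
        media_type, _, params = item.strip().partition(";")
        q = 1.0  # q defaults to 1.0 when not given
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        fmt = FEED_TYPES.get(media_type.strip().lower())
        if fmt and q > best_q:
            best_format, best_q = fmt, q
    return best_format
```

A request that sends `Accept: application/rss+xml;q=0.5, application/atom+xml;q=0.9` would be served ATOM under this sketch, while a browser-style `text/html` header falls back to the default.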
ATOM Feed Structure
Namespace and Root Element
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <!-- Feed elements here -->
</feed>
Feed-Level Elements
Required Elements
| Element | Description | Example |
|---|---|---|
| `id` | Permanent, unique identifier | `<id>https://example.com/</id>` |
| `title` | Human-readable title | `<title>StarPunk Notes</title>` |
| `updated` | Last significant update | `<updated>2024-11-25T12:00:00Z</updated>` |
Recommended Elements
| Element | Description | Example |
|---|---|---|
| `author` | Feed author | `<author><name>John Doe</name></author>` |
| `link` | Feed relationships | `<link rel="self" href="..."/>` |
| `subtitle` | Feed description | `<subtitle>Personal notes</subtitle>` |
Optional Elements
| Element | Description |
|---|---|
| `category` | Categorization scheme |
| `contributor` | Secondary contributors |
| `generator` | Software that generated feed |
| `icon` | Small visual identification |
| `logo` | Larger visual identification |
| `rights` | Copyright/license info |
Entry-Level Elements
Required Elements
| Element | Description | Example |
|---|---|---|
| `id` | Permanent, unique identifier | `<id>https://example.com/note/123</id>` |
| `title` | Entry title | `<title>My Note Title</title>` |
| `updated` | Last modification | `<updated>2024-11-25T12:00:00Z</updated>` |
Recommended Elements
| Element | Description |
|---|---|
| `author` | Entry author (if different from feed) |
| `content` | Full content |
| `link` | Entry URL |
| `summary` | Short summary |
Optional Elements
| Element | Description |
|---|---|
| `category` | Entry categories/tags |
| `contributor` | Secondary contributors |
| `published` | Initial publication time |
| `rights` | Entry-specific rights |
| `source` | If republished from elsewhere |
Implementation Design
ATOM Generator Class
from datetime import datetime, timezone
from typing import Iterator, List

# "Note" is StarPunk's note model; it is quoted in annotations below so this
# snippet does not depend on where the project defines it.


class AtomGenerator:
    """ATOM 1.0 feed generator with streaming support"""

    def __init__(self, site_url: str, site_name: str, site_description: str):
        self.site_url = site_url.rstrip('/')
        self.site_name = site_name
        self.site_description = site_description

    def generate(self, notes: List["Note"], limit: int = 50) -> Iterator[str]:
        """Generate ATOM feed as stream of chunks

        IMPORTANT: Notes are expected to be in DESC order (newest first)
        from the database. This order MUST be preserved in the feed.
        """
        # Yield XML declaration
        yield '<?xml version="1.0" encoding="utf-8"?>\n'

        # Yield feed opening with namespace
        yield '<feed xmlns="http://www.w3.org/2005/Atom">\n'

        # Yield feed metadata
        yield from self._generate_feed_metadata()

        # Yield entries - maintain DESC order (newest first)
        # DO NOT reverse! Database order is correct
        for note in notes[:limit]:
            yield from self._generate_entry(note)

        # Yield closing tag
        yield '</feed>\n'

    def _generate_feed_metadata(self) -> Iterator[str]:
        """Generate feed-level metadata"""
        # Required elements
        yield f'  <id>{self._escape_xml(self.site_url)}/</id>\n'
        yield f'  <title>{self._escape_xml(self.site_name)}</title>\n'
        yield f'  <updated>{self._format_atom_date(datetime.now(timezone.utc))}</updated>\n'

        # Links
        yield f'  <link rel="alternate" type="text/html" href="{self._escape_xml(self.site_url)}"/>\n'
        yield f'  <link rel="self" type="application/atom+xml" href="{self._escape_xml(self.site_url)}/feed.atom"/>\n'

        # Optional elements
        if self.site_description:
            yield f'  <subtitle>{self._escape_xml(self.site_description)}</subtitle>\n'

        # Generator
        yield '  <generator version="1.1.2" uri="https://starpunk.app">StarPunk</generator>\n'

    def _generate_entry(self, note: "Note") -> Iterator[str]:
        """Generate a single entry"""
        permalink = f"{self.site_url}{note.permalink}"

        yield '  <entry>\n'

        # Required elements
        yield f'    <id>{self._escape_xml(permalink)}</id>\n'
        yield f'    <title>{self._escape_xml(note.title)}</title>\n'
        yield f'    <updated>{self._format_atom_date(note.updated_at or note.created_at)}</updated>\n'

        # Link to entry
        yield f'    <link rel="alternate" type="text/html" href="{self._escape_xml(permalink)}"/>\n'

        # Published date (only when the note was modified after creation;
        # a missing updated_at means <updated> already equals created_at)
        if note.updated_at and note.updated_at != note.created_at:
            yield f'    <published>{self._format_atom_date(note.created_at)}</published>\n'

        # Author (if available; tolerate both a missing attribute and None)
        author = getattr(note, 'author', None)
        if author:
            yield '    <author>\n'
            yield f'      <name>{self._escape_xml(author.name)}</name>\n'
            if author.email:
                yield f'      <email>{self._escape_xml(author.email)}</email>\n'
            if author.uri:
                yield f'      <uri>{self._escape_xml(author.uri)}</uri>\n'
            yield '    </author>\n'

        # Content
        yield from self._generate_content(note)

        # Categories/tags
        if hasattr(note, 'tags') and note.tags:
            for tag in note.tags:
                yield f'    <category term="{self._escape_xml(tag)}"/>\n'

        yield '  </entry>\n'

    def _generate_content(self, note: "Note") -> Iterator[str]:
        """Generate content element with proper type"""
        # Determine content type based on note format
        if note.html:
            # HTML content - use escaped HTML
            yield '    <content type="html">'
            yield self._escape_xml(note.html)
            yield '</content>\n'
        else:
            # Plain text content
            yield '    <content type="text">'
            yield self._escape_xml(note.content)
            yield '</content>\n'

        # Add summary if available
        if hasattr(note, 'summary') and note.summary:
            yield '    <summary type="text">'
            yield self._escape_xml(note.summary)
            yield '</summary>\n'
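The generator above yields many small strings, while the performance requirement targets chunks of roughly 4KB. One way to bridge the two, sketched here under the assumption that the response layer accepts an iterator of bytes, is to coalesce fragments before sending them. The helper name `rechunk` is hypothetical.

```python
from typing import Iterable, Iterator


def rechunk(parts: Iterable[str], target_bytes: int = 4096) -> Iterator[bytes]:
    """Coalesce small string fragments into UTF-8 chunks of target_bytes each.

    Only one partially filled buffer is held at a time, so memory use stays
    bounded by target_bytes plus the largest single fragment.
    """
    buffer = bytearray()
    for part in parts:
        buffer.extend(part.encode("utf-8"))
        # Emit as many full chunks as the buffer currently holds.
        while len(buffer) >= target_bytes:
            yield bytes(buffer[:target_bytes])
            del buffer[:target_bytes]
    if buffer:
        yield bytes(buffer)  # final, possibly shorter, chunk
```

A chunk boundary may fall in the middle of a multi-byte UTF-8 character. That is harmless for a byte stream that the client reassembles, but each chunk should not be decoded on its own.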
Date Formatting
ATOM uses RFC 3339 date format, which is a profile of ISO 8601.
def _format_atom_date(self, dt: datetime) -> str:
    """Format datetime to RFC 3339 for ATOM

    Format: 2024-11-25T12:00:00Z or 2024-11-25T12:00:00-05:00

    Args:
        dt: Datetime object (naive assumed UTC)

    Returns:
        RFC 3339 formatted string
    """
    # Ensure timezone aware
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)

    # Use 'Z' for a zero UTC offset
    if not dt.utcoffset():
        return dt.strftime('%Y-%m-%dT%H:%M:%SZ')

    # strftime('%z') produces -0500, which RFC 3339 does not allow;
    # isoformat() emits the required colon (-05:00). Microseconds are
    # dropped so the output matches the documented format.
    return dt.replace(microsecond=0).isoformat()
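The difference between a valid and an invalid RFC 3339 offset is a single colon, which is easy to overlook. The standalone sketch below (the function name `format_rfc3339` is illustrative and not part of the generator) contrasts `strftime('%z')` with `isoformat()` for a non-UTC timestamp.

```python
from datetime import datetime, timedelta, timezone


def format_rfc3339(dt: datetime) -> str:
    """Format a datetime as RFC 3339, treating naive values as UTC."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    dt = dt.replace(microsecond=0)
    if dt.utcoffset() == timedelta(0):
        return dt.strftime("%Y-%m-%dT%H:%M:%SZ")
    return dt.isoformat()  # offset rendered as +HH:MM / -HH:MM


est = timezone(timedelta(hours=-5))
stamp = datetime(2024, 11, 25, 12, 0, 0, tzinfo=est)

print(stamp.strftime("%Y-%m-%dT%H:%M:%S%z"))  # 2024-11-25T12:00:00-0500 (no colon)
print(format_rfc3339(stamp))                   # 2024-11-25T12:00:00-05:00
print(format_rfc3339(datetime(2024, 11, 25, 12, 0, 0)))  # 2024-11-25T12:00:00Z
```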
XML Escaping
def _escape_xml(self, text: str) -> str:
    """Escape special XML characters

    Escapes: & < > " '

    Args:
        text: Text to escape

    Returns:
        XML-safe escaped text
    """
    if not text:
        return ''

    # Order matters: & must be first
    text = text.replace('&', '&amp;')
    text = text.replace('<', '&lt;')
    text = text.replace('>', '&gt;')
    text = text.replace('"', '&quot;')
    text = text.replace("'", '&apos;')

    return text
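For comparison, Python's standard library offers equivalent helpers in `xml.sax.saxutils`. The sketch below is an alternative to the hand-rolled method, not the approach this specification prescribes; the wrapper names `escape_text` and `escape_attr` are made up for the example.

```python
from xml.sax.saxutils import escape, quoteattr


def escape_text(text: str) -> str:
    """Escape &, < and > for use in element content."""
    return escape(text or "")


def escape_attr(value: str) -> str:
    """Return the value escaped AND wrapped in quotes, ready for an attribute."""
    return quoteattr(value or "")


print(escape_text('Tom & Jerry <"Best of">'))
# Tom &amp; Jerry &lt;"Best of"&gt;
print(f'<link href={escape_attr("https://example.com/?a=1&b=2")}/>')
# <link href="https://example.com/?a=1&amp;b=2"/>
```

Note that `escape` leaves quotes alone by default, which is fine inside element content, while `quoteattr` picks the quoting style for you. Mixing the two styles with the f-string templates above would require removing the literal quotes around attribute values.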
Content Type Handling
Text Content
Plain text, must be escaped:
<content type="text">This is plain text with &lt;escaped&gt; characters</content>
HTML Content
HTML as escaped text:
<content type="html">&lt;p&gt;This is &lt;strong&gt;HTML&lt;/strong&gt; content&lt;/p&gt;</content>
XHTML Content (Future)
Well-formed XML inline:
<content type="xhtml">
  <div xmlns="http://www.w3.org/1999/xhtml">
    <p>This is <strong>XHTML</strong> content</p>
  </div>
</content>
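A quick way to convince yourself that `type="html"` content is escaped correctly is a round trip: after XML parsing, the element's text should equal the original HTML and the element should have no child nodes. The sketch below uses only the standard library; `html_content_element` is a hypothetical helper written for this illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

ATOM_NS = "http://www.w3.org/2005/Atom"


def html_content_element(html: str) -> str:
    """Build a standalone <content type="html"> element with escaped markup."""
    return f'<content xmlns="{ATOM_NS}" type="html">{escape(html)}</content>'


original = "<p>This is <strong>HTML</strong> content</p>"
element = ET.fromstring(html_content_element(original))

# The parser undoes the escaping, so a feed reader sees the original markup
# as a string, not as XML child elements.
assert element.text == original
assert len(element) == 0
```

If the markup were emitted unescaped, the same check would fail: `element.text` would be empty and the `<p>` would appear as a child element, which is only valid for `type="xhtml"`.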
Complete ATOM Feed Example
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <id>https://example.com/</id>
  <title>StarPunk Notes</title>
  <updated>2024-11-25T12:00:00Z</updated>
  <link rel="alternate" type="text/html" href="https://example.com"/>
  <link rel="self" type="application/atom+xml" href="https://example.com/feed.atom"/>
  <subtitle>Personal notes and thoughts</subtitle>
  <generator version="1.1.2" uri="https://starpunk.app">StarPunk</generator>
  <entry>
    <id>https://example.com/notes/2024/11/25/first-note</id>
    <title>My First Note</title>
    <updated>2024-11-25T10:30:00Z</updated>
    <published>2024-11-25T10:00:00Z</published>
    <link rel="alternate" type="text/html" href="https://example.com/notes/2024/11/25/first-note"/>
    <author>
      <name>John Doe</name>
      <email>john@example.com</email>
    </author>
    <content type="html">&lt;p&gt;This is my first note with &lt;strong&gt;bold&lt;/strong&gt; text.&lt;/p&gt;</content>
    <category term="personal"/>
    <category term="introduction"/>
  </entry>
  <entry>
    <id>https://example.com/notes/2024/11/24/another-note</id>
    <title>Another Note</title>
    <updated>2024-11-24T15:45:00Z</updated>
    <link rel="alternate" type="text/html" href="https://example.com/notes/2024/11/24/another-note"/>
    <content type="text">Plain text content for this note.</content>
    <summary type="text">A brief summary of the note</summary>
  </entry>
</feed>
Validation
W3C Feed Validator Compliance
The generated ATOM feed must pass the W3C Feed Validator without errors.
Common Validation Issues
- Missing Required Elements
  - Ensure id, title, updated are present
  - Each entry must have these elements too
- Invalid Dates
  - Must be RFC 3339 format
  - Include timezone information
- Improper Escaping
  - All XML entities must be escaped
  - No raw HTML in text content
- Namespace Issues
  - Correct namespace declaration
  - No prefixed elements without namespace
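Several of the issues listed above can be caught locally before a feed is submitted to an external validator. The following is a minimal pre-flight sketch using only the standard library. The name `check_feed` and the simplified date pattern are assumptions for illustration (the pattern accepts only an uppercase `T` and `Z`), and the check is far less thorough than a real validator.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

ATOM = "{http://www.w3.org/2005/Atom}"
# Simplified RFC 3339 pattern: date, 'T', time, optional fraction, 'Z' or +HH:MM.
RFC3339 = re.compile(
    r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$"
)


def check_feed(xml_text: str) -> List[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    try:
        root = ET.fromstring(xml_text.encode("utf-8"))
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    if root.tag != f"{ATOM}feed":
        return [f"unexpected root element: {root.tag}"]

    problems: List[str] = []
    nodes = [("feed", root)] + [
        (f"entry {i}", entry)
        for i, entry in enumerate(root.findall(f"{ATOM}entry"), start=1)
    ]
    for label, node in nodes:
        for name in ("id", "title", "updated"):
            child = node.find(f"{ATOM}{name}")
            if child is None:
                problems.append(f"{label}: missing <{name}>")
            elif name == "updated" and not RFC3339.match((child.text or "").strip()):
                problems.append(f"{label}: <updated> is not RFC 3339")
    return problems
```

Because the root tag is compared against the namespaced name, a feed with a missing or wrong namespace declaration is reported as an unexpected root element.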
Testing Strategy
Unit Tests
class TestAtomGenerator:
    def test_required_elements(self):
        """Test all required ATOM elements are present"""
        generator = AtomGenerator(site_url, site_name, site_description)
        feed = ''.join(generator.generate(notes))

        assert '<id>' in feed
        assert '<title>' in feed
        assert '<updated>' in feed

    def test_feed_order_newest_first(self):
        """Test ATOM feed shows newest entries first

        RFC 4287 assigns no significance to entry order; newest-first is
        the convention feed readers expect and the order this spec requires.
        """
        # Create notes with different timestamps
        old_note = Note(
            title="Old Note",
            created_at=datetime(2024, 11, 20, 10, 0, 0, tzinfo=timezone.utc)
        )
        new_note = Note(
            title="New Note",
            created_at=datetime(2024, 11, 25, 10, 0, 0, tzinfo=timezone.utc)
        )

        # Generate feed with notes in DESC order (as from database)
        generator = AtomGenerator(site_url, site_name, site_description)
        feed = ''.join(generator.generate([new_note, old_note]))

        # Parse feed and verify order
        root = etree.fromstring(feed.encode())
        entries = root.findall('{http://www.w3.org/2005/Atom}entry')

        # First entry should be newest
        first_title = entries[0].find('{http://www.w3.org/2005/Atom}title').text
        assert first_title == "New Note"

        # Second entry should be oldest
        second_title = entries[1].find('{http://www.w3.org/2005/Atom}title').text
        assert second_title == "Old Note"

    def test_xml_escaping(self):
        """Test special characters are properly escaped"""
        note = Note(title="Test & <Special> Characters")
        generator = AtomGenerator(site_url, site_name, site_description)
        feed = ''.join(generator.generate([note]))

        assert '&amp;' in feed
        assert '&lt;Special&gt;' in feed

    def test_date_formatting(self):
        """Test RFC 3339 date formatting"""
        generator = AtomGenerator(site_url, site_name, site_description)
        dt = datetime(2024, 11, 25, 12, 0, 0, tzinfo=timezone.utc)
        formatted = generator._format_atom_date(dt)

        assert formatted == '2024-11-25T12:00:00Z'

    def test_streaming_generation(self):
        """Test feed is generated as stream"""
        generator = AtomGenerator(site_url, site_name, site_description)
        chunks = list(generator.generate(notes))

        assert len(chunks) > 1  # Multiple chunks
        assert chunks[0].startswith('<?xml')
        assert chunks[-1].endswith('</feed>\n')
Integration Tests
def test_atom_feed_endpoint():
    """Test ATOM feed endpoint with content negotiation"""
    response = client.get('/feed.atom')

    assert response.status_code == 200
    # Compare the bare media type; content_type may carry a charset parameter
    assert response.mimetype == 'application/atom+xml'

    # Parse and validate
    feed = etree.fromstring(response.data)
    assert feed.tag == '{http://www.w3.org/2005/Atom}feed'


def test_feed_reader_compatibility():
    """Test with popular feed readers"""
    readers = [
        'Feedly',
        'Inoreader',
        'NewsBlur',
        'The Old Reader'
    ]

    for reader in readers:
        # Test parsing with reader's validator
        assert validate_with_reader(feed_url, reader)
Validation Tests
def test_w3c_validation():
    """Validate against W3C Feed Validator"""
    generator = AtomGenerator(site_url, site_name, site_description)
    feed = ''.join(generator.generate(sample_notes))

    # Submit to W3C validator API
    result = validate_feed(feed, format='atom')

    assert result['valid']
    assert len(result['errors']) == 0
Performance Benchmarks
Generation Speed
def benchmark_atom_generation():
    """Benchmark ATOM feed generation"""
    notes = generate_sample_notes(100)
    generator = AtomGenerator(site_url, site_name, site_description)

    start = time.perf_counter()
    feed = ''.join(generator.generate(notes, limit=50))
    duration = time.perf_counter() - start

    assert duration < 0.1  # Less than 100ms
    assert len(feed) > 0
Memory Usage
def test_streaming_memory_usage():
    """Verify streaming doesn't load entire feed in memory"""
    notes = generate_sample_notes(1000)
    generator = AtomGenerator(site_url, site_name, site_description)

    initial_memory = get_memory_usage()

    # Generate but don't concatenate (streaming)
    for chunk in generator.generate(notes):
        pass  # Process chunk

    memory_delta = get_memory_usage() - initial_memory
    assert memory_delta < 1  # Less than 1MB increase
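The memory test relies on a `get_memory_usage()` helper that this document does not define. One possible stand-in, sketched here as an assumption and not the project's actual helper, measures peak Python allocations with the standard library's `tracemalloc` while a stream is consumed. The name `peak_memory_mb` is hypothetical.

```python
import tracemalloc
from typing import Iterable


def peak_memory_mb(chunks: Iterable[str]) -> float:
    """Consume chunks without retaining them; return peak traced memory in MB."""
    tracemalloc.start()
    try:
        for _chunk in chunks:
            pass  # a real consumer would write the chunk to the response here
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak / (1024 * 1024)


# Usage with a synthetic stream of 10,000 small chunks:
peak = peak_memory_mb("x" * 100 for _ in range(10_000))
assert peak < 1  # chunks are discarded as they are produced
```

`tracemalloc` counts only allocations made through Python's allocator, so the figure is not the same as process resident memory. It is, however, deterministic enough for a regression test of the streaming property.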
Configuration
ATOM-Specific Settings
# ATOM feed configuration
STARPUNK_FEED_ATOM_ENABLED=true
STARPUNK_FEED_ATOM_AUTHOR_NAME=John Doe
STARPUNK_FEED_ATOM_AUTHOR_EMAIL=john@example.com
STARPUNK_FEED_ATOM_AUTHOR_URI=https://example.com/about
STARPUNK_FEED_ATOM_ICON=https://example.com/icon.png
STARPUNK_FEED_ATOM_LOGO=https://example.com/logo.png
STARPUNK_FEED_ATOM_RIGHTS=© 2024 John Doe. CC BY-SA 4.0
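The settings above could be read into a typed object at startup. The sketch below is one possible shape, not StarPunk's actual configuration code: the `AtomFeedConfig` class, the `load_atom_config` function, the set of accepted truthy strings, and the default of enabled when the variable is absent are all assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

PREFIX = "STARPUNK_FEED_ATOM_"
TRUTHY = {"1", "true", "yes", "on"}  # assumed spellings for "enabled"


@dataclass(frozen=True)
class AtomFeedConfig:
    enabled: bool
    author_name: Optional[str]
    author_email: Optional[str]
    author_uri: Optional[str]
    icon: Optional[str]
    logo: Optional[str]
    rights: Optional[str]


def load_atom_config(env: Mapping[str, str] = os.environ) -> AtomFeedConfig:
    """Build the config from environment-style variables; empty values become None."""

    def get(name: str) -> Optional[str]:
        return env.get(PREFIX + name) or None

    return AtomFeedConfig(
        enabled=env.get(PREFIX + "ENABLED", "true").strip().lower() in TRUTHY,
        author_name=get("AUTHOR_NAME"),
        author_email=get("AUTHOR_EMAIL"),
        author_uri=get("AUTHOR_URI"),
        icon=get("ICON"),
        logo=get("LOGO"),
        rights=get("RIGHTS"),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to test, for example `load_atom_config({"STARPUNK_FEED_ATOM_ENABLED": "false"})`.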
Security Considerations
- XML Injection Prevention
  - All user content must be escaped
  - No raw XML from user input
  - Validate all URLs
- Content Security
  - HTML content properly escaped
  - No script tags allowed
  - Sanitize all metadata
- Resource Limits
  - Maximum feed size limits
  - Timeout on generation
  - Rate limiting on endpoint
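The "validate all URLs" item can be made concrete with a small allow-list check applied before a URL is written into `href`, `uri`, `icon`, or `logo`. This is a minimal sketch under the assumption that only absolute `http` and `https` URLs are acceptable in feed output; the function name `is_safe_feed_url` is hypothetical.

```python
from urllib.parse import urlsplit


def is_safe_feed_url(url: str) -> bool:
    """Accept only absolute http(s) URLs that have a host and contain no
    whitespace or control characters."""
    if not url or any(ch.isspace() or ord(ch) < 32 for ch in url):
        return False
    try:
        parts = urlsplit(url)
    except ValueError:  # raised for some malformed inputs, e.g. bad IPv6 literals
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

Escaping alone does not neutralize a `javascript:` URL, because the value is still syntactically valid XML. An allow-list on the scheme closes that gap, and values that fail the check are best omitted from the feed.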
Migration Notes
Adding ATOM to Existing RSS
- ATOM runs parallel to RSS
- No changes to existing RSS feed
- Both formats available simultaneously
- Shared caching infrastructure
Acceptance Criteria
- ✅ Valid ATOM 1.0 feed generation
- ✅ All required elements present
- ✅ RFC 3339 date formatting correct
- ✅ XML properly escaped
- ✅ Streaming generation working
- ✅ W3C validator passing
- ✅ Works with 5+ major feed readers
- ✅ Performance target met (<100ms)
- ✅ Memory efficient streaming
- ✅ Security review passed