# ATOM Feed Specification - v1.1.2

## Overview

This specification defines the implementation of ATOM 1.0 feed generation for StarPunk, providing an alternative syndication format to RSS with enhanced metadata support and standardized content handling.

## Requirements

### Functional Requirements

1. **ATOM 1.0 Compliance**
   - Full conformance to RFC 4287
   - Valid XML namespace declarations
   - Required elements present
   - Proper content type handling

2. **Content Support**
   - Text content (escaped)
   - HTML content (escaped or CDATA)
   - XHTML content (inline XML)
   - Base64 for binary (future)

3. **Metadata Richness**
   - Author information
   - Category/tag support
   - Updated vs published dates
   - Link relationships

4. **Streaming Generation**
   - Memory-efficient output
   - Chunked response support
   - No full document in memory

### Non-Functional Requirements

1. **Performance**
   - Generation time <100ms for 50 entries
   - Streaming chunks of ~4KB
   - Minimal memory footprint

2. **Compatibility**
   - Works with major feed readers
   - Valid per W3C Feed Validator
   - Proper content negotiation

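The ~4KB chunk target could be met by buffering the small strings a generator yields before handing them to the response. The following is a minimal sketch, not part of the StarPunk codebase: `batch_chunks` is a hypothetical helper name, and it assumes chunk size is measured in UTF-8 encoded bytes.

```python
from typing import Iterable, Iterator


def batch_chunks(parts: Iterable[str], target_bytes: int = 4096) -> Iterator[bytes]:
    """Group small string fragments into byte chunks of roughly target_bytes.

    Hypothetical helper for illustration. Only the current buffer is held
    in memory, never the whole document.
    """
    buffer = []
    size = 0
    for part in parts:
        data = part.encode("utf-8")
        buffer.append(data)
        size += len(data)
        if size >= target_bytes:
            yield b"".join(buffer)
            buffer, size = [], 0
    if buffer:  # flush whatever is left after the last fragment
        yield b"".join(buffer)


# Stand-in for the fragments a feed generator would yield
fragments = (f"<entry>{i}</entry>\n" for i in range(1000))
chunks = list(batch_chunks(fragments))
```

Every chunk except possibly the last is at least `target_bytes` long, so the number of writes stays small even when the generator yields one line at a time.
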
## ATOM Feed Structure

### Namespace and Root Element

```xml
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <!-- Feed elements here -->
</feed>
```

### Feed-Level Elements

#### Required Elements

| Element | Description | Example |
|---------|-------------|---------|
| `id` | Permanent, unique identifier | `<id>https://example.com/</id>` |
| `title` | Human-readable title | `<title>StarPunk Notes</title>` |
| `updated` | Last significant update | `<updated>2024-11-25T12:00:00Z</updated>` |

#### Recommended Elements

| Element | Description | Example |
|---------|-------------|---------|
| `author` | Feed author (RFC 4287 requires one at feed level unless every entry carries its own `author`) | `<author><name>John Doe</name></author>` |
| `link` | Feed relationships | `<link rel="self" href="..."/>` |
| `subtitle` | Feed description | `<subtitle>Personal notes</subtitle>` |

#### Optional Elements

| Element | Description |
|---------|-------------|
| `category` | Categorization scheme |
| `contributor` | Secondary contributors |
| `generator` | Software that generated feed |
| `icon` | Small visual identification |
| `logo` | Larger visual identification |
| `rights` | Copyright/license info |

### Entry-Level Elements

#### Required Elements

| Element | Description | Example |
|---------|-------------|---------|
| `id` | Permanent, unique identifier | `<id>https://example.com/note/123</id>` |
| `title` | Entry title | `<title>My Note Title</title>` |
| `updated` | Last modification | `<updated>2024-11-25T12:00:00Z</updated>` |

#### Recommended Elements

| Element | Description |
|---------|-------------|
| `author` | Entry author (if different from feed) |
| `content` | Full content |
| `link` | Entry URL |
| `summary` | Short summary |

#### Optional Elements

| Element | Description |
|---------|-------------|
| `category` | Entry categories/tags |
| `contributor` | Secondary contributors |
| `published` | Initial publication time |
| `rights` | Entry-specific rights |
| `source` | If republished from elsewhere |

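The required-element tables can be turned into a quick structural check. The sketch below uses only the standard library; `missing_required` is a hypothetical helper name, and it covers just the three required elements at feed and entry level, not full RFC 4287 conformance.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
REQUIRED = ("id", "title", "updated")


def missing_required(feed_xml: str) -> list:
    """Return human-readable descriptions of missing required elements."""
    root = ET.fromstring(feed_xml.encode("utf-8"))
    problems = []
    for name in REQUIRED:
        if root.find(f"{{{ATOM_NS}}}{name}") is None:
            problems.append(f"feed: missing <{name}>")
    for index, entry in enumerate(root.findall(f"{{{ATOM_NS}}}entry")):
        for name in REQUIRED:
            if entry.find(f"{{{ATOM_NS}}}{name}") is None:
                problems.append(f"entry {index}: missing <{name}>")
    return problems


# The entry below deliberately omits <title>
sample = """<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <id>https://example.com/</id>
  <title>StarPunk Notes</title>
  <updated>2024-11-25T12:00:00Z</updated>
  <entry>
    <id>https://example.com/note/123</id>
    <updated>2024-11-25T12:00:00Z</updated>
  </entry>
</feed>"""

print(missing_required(sample))  # ['entry 0: missing <title>']
```
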
## Implementation Design

### ATOM Generator Class

```python
from datetime import datetime, timezone
from typing import Iterator, List

# Note is StarPunk's note model and must be imported from the application code.


class AtomGenerator:
    """ATOM 1.0 feed generator with streaming support"""

    def __init__(self, site_url: str, site_name: str, site_description: str):
        self.site_url = site_url.rstrip('/')
        self.site_name = site_name
        self.site_description = site_description

    def generate(self, notes: List[Note], limit: int = 50) -> Iterator[str]:
        """Generate ATOM feed as stream of chunks

        IMPORTANT: Notes are expected to be in DESC order (newest first)
        from the database. This order MUST be preserved in the feed.
        """
        # Yield XML declaration
        yield '<?xml version="1.0" encoding="utf-8"?>\n'

        # Yield feed opening with namespace
        yield '<feed xmlns="http://www.w3.org/2005/Atom">\n'

        # Yield feed metadata
        yield from self._generate_feed_metadata()

        # Yield entries - maintain DESC order (newest first)
        # DO NOT reverse! Database order is correct
        for note in notes[:limit]:
            yield from self._generate_entry(note)

        # Yield closing tag
        yield '</feed>\n'

    def _generate_feed_metadata(self) -> Iterator[str]:
        """Generate feed-level metadata"""
        # Required elements
        yield f'  <id>{self._escape_xml(self.site_url)}/</id>\n'
        yield f'  <title>{self._escape_xml(self.site_name)}</title>\n'
        yield f'  <updated>{self._format_atom_date(datetime.now(timezone.utc))}</updated>\n'

        # Links
        yield f'  <link rel="alternate" type="text/html" href="{self._escape_xml(self.site_url)}"/>\n'
        yield f'  <link rel="self" type="application/atom+xml" href="{self._escape_xml(self.site_url)}/feed.atom"/>\n'

        # Optional elements
        if self.site_description:
            yield f'  <subtitle>{self._escape_xml(self.site_description)}</subtitle>\n'

        # Generator
        yield '  <generator version="1.1.2" uri="https://starpunk.app">StarPunk</generator>\n'

    def _generate_entry(self, note: Note) -> Iterator[str]:
        """Generate a single entry"""
        permalink = f"{self.site_url}{note.permalink}"

        yield '  <entry>\n'

        # Required elements
        yield f'    <id>{self._escape_xml(permalink)}</id>\n'
        yield f'    <title>{self._escape_xml(note.title)}</title>\n'
        yield f'    <updated>{self._format_atom_date(note.updated_at or note.created_at)}</updated>\n'

        # Link to entry
        yield f'    <link rel="alternate" type="text/html" href="{self._escape_xml(permalink)}"/>\n'

        # Published date (only if the note was modified after creation;
        # a missing updated_at means <updated> already equals created_at)
        if note.updated_at and note.updated_at != note.created_at:
            yield f'    <published>{self._format_atom_date(note.created_at)}</published>\n'

        # Author (if available; guards against a missing or None attribute)
        author = getattr(note, 'author', None)
        if author:
            yield '    <author>\n'
            yield f'      <name>{self._escape_xml(author.name)}</name>\n'
            if author.email:
                yield f'      <email>{self._escape_xml(author.email)}</email>\n'
            if author.uri:
                yield f'      <uri>{self._escape_xml(author.uri)}</uri>\n'
            yield '    </author>\n'

        # Content
        yield from self._generate_content(note)

        # Categories/tags
        if hasattr(note, 'tags') and note.tags:
            for tag in note.tags:
                yield f'    <category term="{self._escape_xml(tag)}"/>\n'

        yield '  </entry>\n'

    def _generate_content(self, note: Note) -> Iterator[str]:
        """Generate content element with proper type"""
        # Determine content type based on note format
        if note.html:
            # HTML content - use escaped HTML
            yield '    <content type="html">'
            yield self._escape_xml(note.html)
            yield '</content>\n'
        else:
            # Plain text content
            yield '    <content type="text">'
            yield self._escape_xml(note.content)
            yield '</content>\n'

        # Add summary if available
        if hasattr(note, 'summary') and note.summary:
            yield '    <summary type="text">'
            yield self._escape_xml(note.summary)
            yield '</summary>\n'
```

### Date Formatting

ATOM uses the RFC 3339 date format, which is a profile of ISO 8601.

```python
from datetime import datetime, timedelta, timezone


def _format_atom_date(self, dt: datetime) -> str:
    """Format datetime to RFC 3339 for ATOM

    Format: 2024-11-25T12:00:00Z or 2024-11-25T12:00:00-05:00

    Args:
        dt: Datetime object (naive assumed UTC)

    Returns:
        RFC 3339 formatted string
    """
    # Ensure timezone aware
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)

    # Format to RFC 3339
    # Use 'Z' for UTC, otherwise a numeric offset
    if dt.utcoffset() == timedelta(0):
        return dt.strftime('%Y-%m-%dT%H:%M:%SZ')
    else:
        # isoformat() writes the offset as -05:00; strftime('%z') would
        # write -0500, which RFC 3339 does not allow
        return dt.isoformat(timespec='seconds')
```

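The offset form deserves a quick check, because `strftime('%z')` and `isoformat()` render it differently. A small standard-library sketch, using an arbitrary fixed UTC-5 offset as the example zone:

```python
from datetime import datetime, timedelta, timezone

eastern = timezone(timedelta(hours=-5))
stamp = datetime(2024, 11, 25, 12, 0, 0, tzinfo=eastern)

with_strftime = stamp.strftime("%Y-%m-%dT%H:%M:%S%z")  # 2024-11-25T12:00:00-0500
with_isoformat = stamp.isoformat(timespec="seconds")   # 2024-11-25T12:00:00-05:00
```

Only the second form is valid RFC 3339, which requires the colon in a numeric offset.
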
### XML Escaping

```python
def _escape_xml(self, text: str) -> str:
    """Escape special XML characters

    Escapes: & < > " '

    Args:
        text: Text to escape

    Returns:
        XML-safe escaped text
    """
    if not text:
        return ''

    # Order matters: & must be first
    text = text.replace('&', '&amp;')
    text = text.replace('<', '&lt;')
    text = text.replace('>', '&gt;')
    text = text.replace('"', '&quot;')
    text = text.replace("'", '&apos;')

    return text
```

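For comparison, the standard library already ships equivalent helpers in `xml.sax.saxutils`. This is an alternative sketch, not the approach the specification prescribes; the sample strings are made up.

```python
from xml.sax.saxutils import escape, quoteattr

title = 'Tom & Jerry <"best of">'

# escape() handles &, < and > by default; further entities are opt-in
element_text = escape(title)
full_escape = escape(title, {'"': "&quot;", "'": "&apos;"})

# quoteattr() escapes the value and adds the surrounding quotes
category = f"<category term={quoteattr('R&D')}/>"

print(element_text)  # Tom &amp; Jerry &lt;"best of"&gt;
print(category)      # <category term="R&amp;D"/>
```

Like the hand-written version, `escape()` replaces `&` first, so already-escaped output is never double-processed within a single call.
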
## Content Type Handling

### Text Content

Plain text, must be escaped:

```xml
<content type="text">This is plain text with &lt;escaped&gt; characters</content>
```

### HTML Content

HTML as escaped text:

```xml
<content type="html">&lt;p&gt;This is &lt;strong&gt;HTML&lt;/strong&gt; content&lt;/p&gt;</content>
```

### XHTML Content (Future)

Well-formed XML inline:

```xml
<content type="xhtml">
  <div xmlns="http://www.w3.org/1999/xhtml">
    <p>This is <strong>XHTML</strong> content</p>
  </div>
</content>
```

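For the `text` and `html` types the payload is escaped exactly once, and a feed reader's XML parser undoes that escaping to recover the original string. A minimal sketch of that round trip, assuming a hypothetical `content_element` helper and the standard-library `escape()`:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def content_element(body: str, is_html: bool) -> str:
    """Render an Atom <content> element; both types carry escaped text."""
    content_type = "html" if is_html else "text"
    return f'<content type="{content_type}">{escape(body)}</content>'


html_xml = content_element("<p>This is <strong>HTML</strong> content</p>", is_html=True)
text_xml = content_element("5 < 6 & 7 > 2", is_html=False)

# Parsing unescapes the payload once, giving back the original markup
recovered = ET.fromstring(html_xml).text
```

The only difference between the two types is how the reader treats the recovered string: as markup to render for `html`, and as literal characters for `text`.
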
## Complete ATOM Feed Example

```xml
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <id>https://example.com/</id>
  <title>StarPunk Notes</title>
  <updated>2024-11-25T12:00:00Z</updated>
  <link rel="alternate" type="text/html" href="https://example.com"/>
  <link rel="self" type="application/atom+xml" href="https://example.com/feed.atom"/>
  <subtitle>Personal notes and thoughts</subtitle>
  <generator version="1.1.2" uri="https://starpunk.app">StarPunk</generator>

  <entry>
    <id>https://example.com/notes/2024/11/25/first-note</id>
    <title>My First Note</title>
    <updated>2024-11-25T10:30:00Z</updated>
    <published>2024-11-25T10:00:00Z</published>
    <link rel="alternate" type="text/html" href="https://example.com/notes/2024/11/25/first-note"/>
    <author>
      <name>John Doe</name>
      <email>john@example.com</email>
    </author>
    <content type="html">&lt;p&gt;This is my first note with &lt;strong&gt;bold&lt;/strong&gt; text.&lt;/p&gt;</content>
    <category term="personal"/>
    <category term="introduction"/>
  </entry>

  <entry>
    <id>https://example.com/notes/2024/11/24/another-note</id>
    <title>Another Note</title>
    <updated>2024-11-24T15:45:00Z</updated>
    <link rel="alternate" type="text/html" href="https://example.com/notes/2024/11/24/another-note"/>
    <content type="text">Plain text content for this note.</content>
    <summary type="text">A brief summary of the note</summary>
  </entry>
</feed>
```

## Validation

### W3C Feed Validator Compliance

The generated ATOM feed must pass validation at:
- https://validator.w3.org/feed/

### Common Validation Issues

1. **Missing Required Elements**
   - Ensure id, title, updated are present
   - Each entry must have these elements too

2. **Invalid Dates**
   - Must be RFC 3339 format
   - Include timezone information

3. **Improper Escaping**
   - All XML entities must be escaped
   - No raw HTML in text content

4. **Namespace Issues**
   - Correct namespace declaration
   - No prefixed elements without namespace

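Several of these issues can be caught locally before a feed is submitted to the W3C validator. The sketch below is an illustrative pre-flight check, not a replacement for the validator: `preflight` is a hypothetical name, and it only tests well-formedness, the root namespace, and the shape of `updated`/`published` dates.

```python
import re
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# RFC 3339 date-time shape: date, "T", time, optional fraction, then Z or +hh:mm
RFC3339 = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$")


def preflight(feed_xml: str) -> list:
    """Return a list of problems found; an empty list means the checks passed."""
    try:
        root = ET.fromstring(feed_xml.encode("utf-8"))
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    problems = []
    if root.tag != f"{{{ATOM_NS}}}feed":
        problems.append(f"unexpected root element: {root.tag}")
    for tag in ("updated", "published"):
        for node in root.iter(f"{{{ATOM_NS}}}{tag}"):
            if not RFC3339.match(node.text or ""):
                problems.append(f"invalid {tag} date: {node.text!r}")
    return problems


# Missing "T" separator and timezone designator
bad_feed = (
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    "<updated>2024-11-25 12:00:00</updated></feed>"
)
print(preflight(bad_feed))  # ["invalid updated date: '2024-11-25 12:00:00'"]
```
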
## Testing Strategy

### Unit Tests

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as etree  # stdlib parser assumed; lxml.etree offers the same calls

# site_url, site_name, site_description and notes are assumed test fixtures.


class TestAtomGenerator:
    def test_required_elements(self):
        """Test all required ATOM elements are present"""
        generator = AtomGenerator(site_url, site_name, site_description)
        feed = ''.join(generator.generate(notes))

        assert '<id>' in feed
        assert '<title>' in feed
        assert '<updated>' in feed

    def test_feed_order_newest_first(self):
        """Test ATOM feed shows newest entries first

        Newest-first is the convention readers expect; RFC 4287 itself
        assigns no significance to entry order.
        """
        # Create notes with different timestamps
        old_note = Note(
            title="Old Note",
            created_at=datetime(2024, 11, 20, 10, 0, 0, tzinfo=timezone.utc)
        )
        new_note = Note(
            title="New Note",
            created_at=datetime(2024, 11, 25, 10, 0, 0, tzinfo=timezone.utc)
        )

        # Generate feed with notes in DESC order (as from database)
        generator = AtomGenerator(site_url, site_name, site_description)
        feed = ''.join(generator.generate([new_note, old_note]))

        # Parse feed and verify order
        root = etree.fromstring(feed.encode())
        entries = root.findall('{http://www.w3.org/2005/Atom}entry')

        # First entry should be newest
        first_title = entries[0].find('{http://www.w3.org/2005/Atom}title').text
        assert first_title == "New Note"

        # Second entry should be oldest
        second_title = entries[1].find('{http://www.w3.org/2005/Atom}title').text
        assert second_title == "Old Note"

    def test_xml_escaping(self):
        """Test special characters are properly escaped"""
        note = Note(title="Test & <Special> Characters")
        generator = AtomGenerator(site_url, site_name, site_description)
        feed = ''.join(generator.generate([note]))

        assert '&amp;' in feed
        assert '&lt;Special&gt;' in feed

    def test_date_formatting(self):
        """Test RFC 3339 date formatting"""
        generator = AtomGenerator(site_url, site_name, site_description)
        dt = datetime(2024, 11, 25, 12, 0, 0, tzinfo=timezone.utc)
        formatted = generator._format_atom_date(dt)

        assert formatted == '2024-11-25T12:00:00Z'

    def test_streaming_generation(self):
        """Test feed is generated as stream"""
        generator = AtomGenerator(site_url, site_name, site_description)
        chunks = list(generator.generate(notes))

        assert len(chunks) > 1  # Multiple chunks
        assert chunks[0].startswith('<?xml')
        assert chunks[-1].endswith('</feed>\n')
```

### Integration Tests

```python
import xml.etree.ElementTree as etree  # stdlib parser assumed; lxml.etree offers the same calls


def test_atom_feed_endpoint():
    """Test ATOM feed endpoint with content negotiation"""
    response = client.get('/feed.atom')

    assert response.status_code == 200
    # The header may carry a charset parameter, so compare the prefix only
    assert response.content_type.startswith('application/atom+xml')

    # Parse and validate
    feed = etree.fromstring(response.data)
    assert feed.tag == '{http://www.w3.org/2005/Atom}feed'


def test_feed_reader_compatibility():
    """Test with popular feed readers"""
    readers = [
        'Feedly',
        'Inoreader',
        'NewsBlur',
        'The Old Reader'
    ]

    for reader in readers:
        # Test parsing with reader's validator
        assert validate_with_reader(feed_url, reader)
```

### Validation Tests

```python
def test_w3c_validation():
    """Validate against W3C Feed Validator"""
    generator = AtomGenerator(site_url, site_name, site_description)
    feed = ''.join(generator.generate(sample_notes))

    # Submit to W3C validator API
    result = validate_feed(feed, format='atom')
    assert result['valid']
    assert len(result['errors']) == 0
```

## Performance Benchmarks

### Generation Speed

```python
import time


def benchmark_atom_generation():
    """Benchmark ATOM feed generation"""
    notes = generate_sample_notes(100)
    generator = AtomGenerator(site_url, site_name, site_description)

    start = time.perf_counter()
    feed = ''.join(generator.generate(notes, limit=50))
    duration = time.perf_counter() - start

    assert duration < 0.1  # Less than 100ms
    assert len(feed) > 0
```

### Memory Usage

```python
def test_streaming_memory_usage():
    """Verify streaming doesn't load entire feed in memory"""
    notes = generate_sample_notes(1000)
    generator = AtomGenerator(site_url, site_name, site_description)

    initial_memory = get_memory_usage()

    # Generate but don't concatenate (streaming)
    for chunk in generator.generate(notes):
        pass  # Process chunk

    memory_delta = get_memory_usage() - initial_memory
    assert memory_delta < 1  # Less than 1MB increase
```

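The specification does not define `get_memory_usage()`. One way to measure the effect it is after is the standard library's `tracemalloc`, which reports peak allocation directly. The sketch below is an assumption-laden illustration: `sample_stream` is a stand-in for the real generator, and the figures only cover Python-level allocations.

```python
import tracemalloc
from typing import Callable, Iterator


def sample_stream(count: int) -> Iterator[str]:
    """Stand-in for a feed generator: yields `count` chunks of about 1 KB."""
    for i in range(count):
        yield f"<entry>{i}</entry>" + "x" * 1000


def peak_bytes(consume: Callable[[Iterator[str]], object], count: int = 2000) -> int:
    """Peak traced allocation while `consume` drains the stream."""
    tracemalloc.start()
    try:
        consume(sample_stream(count))
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def drain(stream: Iterator[str]) -> None:
    for _chunk in stream:
        pass  # a real consumer would write the chunk to the socket


streaming_peak = peak_bytes(drain)    # holds one chunk at a time
joined_peak = peak_bytes("".join)     # materializes the whole document
```

Comparing the two peaks shows the benefit of streaming: draining the generator keeps the peak near the size of one chunk, while joining grows with the size of the feed.
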
## Configuration

### ATOM-Specific Settings

```ini
# ATOM feed configuration
STARPUNK_FEED_ATOM_ENABLED=true
STARPUNK_FEED_ATOM_AUTHOR_NAME=John Doe
STARPUNK_FEED_ATOM_AUTHOR_EMAIL=john@example.com
STARPUNK_FEED_ATOM_AUTHOR_URI=https://example.com/about
STARPUNK_FEED_ATOM_ICON=https://example.com/icon.png
STARPUNK_FEED_ATOM_LOGO=https://example.com/logo.png
STARPUNK_FEED_ATOM_RIGHTS=© 2024 John Doe. CC BY-SA 4.0
```

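These settings could be read into a typed object at startup. The following is a sketch only: `AtomFeedConfig` and `load_atom_config` are hypothetical names, and treating a missing `ENABLED` flag as enabled is an assumption the specification does not state.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AtomFeedConfig:
    enabled: bool
    author_name: Optional[str]
    author_email: Optional[str]
    author_uri: Optional[str]
    icon: Optional[str]
    logo: Optional[str]
    rights: Optional[str]


def load_atom_config(env: Mapping[str, str] = os.environ) -> AtomFeedConfig:
    """Build the config from STARPUNK_FEED_ATOM_* variables.

    Assumption: a missing ENABLED flag means enabled.
    """
    prefix = "STARPUNK_FEED_ATOM_"

    def optional(name: str) -> Optional[str]:
        # Empty strings are treated the same as unset variables
        return env.get(prefix + name) or None

    flag = env.get(prefix + "ENABLED", "true").strip().lower()
    return AtomFeedConfig(
        enabled=flag in {"1", "true", "yes", "on"},
        author_name=optional("AUTHOR_NAME"),
        author_email=optional("AUTHOR_EMAIL"),
        author_uri=optional("AUTHOR_URI"),
        icon=optional("ICON"),
        logo=optional("LOGO"),
        rights=optional("RIGHTS"),
    )


config = load_atom_config({
    "STARPUNK_FEED_ATOM_ENABLED": "false",
    "STARPUNK_FEED_ATOM_AUTHOR_NAME": "John Doe",
})
```

Passing the environment as a parameter keeps the loader easy to test without touching the real process environment.
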
## Security Considerations

1. **XML Injection Prevention**
   - All user content must be escaped
   - No raw XML from user input
   - Validate all URLs

2. **Content Security**
   - HTML content properly escaped
   - No script tags allowed
   - Sanitize all metadata

3. **Resource Limits**
   - Maximum feed size limits
   - Timeout on generation
   - Rate limiting on endpoint

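Two of these points, URL validation and a maximum feed size, lend themselves to small guards. The sketch below is illustrative: `is_safe_url` and `cap_size` are hypothetical helpers, and the allowed schemes are an assumption rather than a rule from the specification.

```python
from typing import Iterable, Iterator
from urllib.parse import urlsplit


def is_safe_url(url: str) -> bool:
    """Accept only absolute http(s) URLs that name a host."""
    try:
        parts = urlsplit(url)
    except ValueError:  # raised for some malformed inputs, e.g. bad IPv6 literals
        return False
    return parts.scheme in {"http", "https"} and bool(parts.netloc)


def cap_size(chunks: Iterable[str], max_bytes: int) -> Iterator[str]:
    """Pass chunks through, raising once the running total exceeds max_bytes."""
    total = 0
    for chunk in chunks:
        total += len(chunk.encode("utf-8"))
        if total > max_bytes:
            raise ValueError(f"feed exceeds {max_bytes} bytes")
        yield chunk


print(is_safe_url("https://example.com/about"))  # True
print(is_safe_url("javascript:alert(1)"))        # False
```

Because `cap_size` raises mid-stream, a caller streaming to a client would already have sent part of the feed; limiting the number of entries up front avoids that situation.
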
## Migration Notes

### Adding ATOM to Existing RSS

- ATOM runs parallel to RSS
- No changes to existing RSS feed
- Both formats available simultaneously
- Shared caching infrastructure

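With both formats served side by side, a shared endpoint could pick one from the request's `Accept` header. This is a simplified sketch under stated assumptions: `choose_feed_type` is a hypothetical helper, RSS is assumed to be the default so existing clients see no change, and only q-values and the `*/*` and `application/*` wildcards are handled.

```python
from typing import Optional

SUPPORTED = ("application/atom+xml", "application/rss+xml")


def choose_feed_type(accept_header: str, default: str = "application/rss+xml") -> Optional[str]:
    """Pick the supported media type with the highest q-value.

    Returns None when the client rules out every supported type.
    """
    if not accept_header.strip():
        return default
    best, best_q = None, 0.0
    for item in accept_header.split(","):
        media, _, params = item.strip().partition(";")
        media = media.strip().lower()
        q = 1.0
        for param in params.split(";"):
            key, _, value = param.strip().partition("=")
            if key == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if media in ("*/*", "application/*"):
            candidate = default  # wildcard: fall back to the default format
        elif media in SUPPORTED:
            candidate = media
        else:
            continue
        if q > best_q:
            best, best_q = candidate, q
    return best


header = "text/html, application/rss+xml;q=0.8, application/atom+xml;q=0.9"
print(choose_feed_type(header))  # application/atom+xml
```

A `None` result would typically map to a `406 Not Acceptable` response, though serving the default format instead is also a common choice.
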
## Acceptance Criteria

1. ✅ Valid ATOM 1.0 feed generation
2. ✅ All required elements present
3. ✅ RFC 3339 date formatting correct
4. ✅ XML properly escaped
5. ✅ Streaming generation working
6. ✅ W3C validator passing
7. ✅ Works with 5+ major feed readers
8. ✅ Performance target met (<100ms)
9. ✅ Memory efficient streaming
10. ✅ Security review passed