StarPunk/docs/design/v1.1.2/json-feed-specification.md
Phil Skentelbery b0230b1233 feat: Complete v1.1.2 Phase 1 - Metrics Instrumentation
2025-11-26 14:13:44 -07:00

JSON Feed Specification - v1.1.2

Overview

This specification defines the implementation of JSON Feed 1.1 format for StarPunk, providing a modern, developer-friendly syndication format that's easier to parse than XML-based feeds.

Requirements

Functional Requirements

  1. JSON Feed 1.1 Compliance

    • Full conformance to JSON Feed 1.1 spec
    • Valid JSON structure
    • Required fields present
    • Proper date formatting
  2. Rich Content Support

    • HTML content
    • Plain text content
    • Summary field
    • Image attachments
    • External URLs
  3. Enhanced Metadata

    • Author objects with avatars
    • Tags array
    • Language specification
    • Custom extensions
  4. Efficient Generation

    • Streaming JSON output
    • Minimal memory usage
    • Fast serialization

Non-Functional Requirements

  1. Performance

    • Generation <50ms for 50 items
    • Compact JSON output
    • Efficient serialization
  2. Compatibility

    • Valid JSON syntax
    • Works with JSON Feed readers
    • Proper MIME type handling

JSON Feed Structure

Top-Level Object

{
  "version": "https://jsonfeed.org/version/1.1",
  "title": "Required: Feed title",
  "items": [],

  "home_page_url": "https://example.com/",
  "feed_url": "https://example.com/feed.json",
  "description": "Feed description",
  "user_comment": "Free-form comment",
  "next_url": "https://example.com/feed.json?page=2",
  "icon": "https://example.com/icon.png",
  "favicon": "https://example.com/favicon.ico",
  "authors": [],
  "language": "en-US",
  "expired": false,
  "hubs": []
}

Required Fields

| Field | Type | Description |
| --- | --- | --- |
| version | String | Must be "https://jsonfeed.org/version/1.1" |
| title | String | Feed title |
| items | Array | Array of item objects |
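
The required fields above can be checked with a few lines of code. The following is a minimal sketch, not part of the StarPunk codebase; the function name `missing_feed_fields` and the constant `JSON_FEED_VERSION` are hypothetical names chosen for illustration.

```python
from typing import Any, Dict, List

# Version string required by the table above
JSON_FEED_VERSION = "https://jsonfeed.org/version/1.1"


def missing_feed_fields(feed: Dict[str, Any]) -> List[str]:
    """Return a list of problems with the required top-level fields.

    Hypothetical helper: an empty list means all three required
    fields are present and have the expected type.
    """
    problems = []
    if feed.get("version") != JSON_FEED_VERSION:
        problems.append("version must be " + JSON_FEED_VERSION)
    if not isinstance(feed.get("title"), str):
        problems.append("title must be a string")
    if not isinstance(feed.get("items"), list):
        problems.append("items must be an array")
    return problems
```

An empty `items` array passes this check, which matches the minimal feed shown in the top-level object example.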

Optional Feed Fields

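| Field | Type | Description |
| --- | --- | --- |
| home_page_url | String | Website URL |
| feed_url | String | URL of this feed |
| description | String | Feed description |
| user_comment | String | Free-form note for people reading the raw JSON |
| next_url | String | URL of the next page (pagination) |
| icon | String | Square image, 512x512 or larger |
| favicon | String | Website favicon |
| authors | Array | Feed authors |
| language | String | RFC 5646 language tag |
| expired | Boolean | Feed will no longer be updated |
| hubs | Array | Real-time notification hubs (for example WebSub) |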

Item Object Structure

{
  "id": "Required: unique ID",
  "url": "https://example.com/note/123",
  "external_url": "https://external.com/article",
  "title": "Item title",
  "content_html": "<p>HTML content</p>",
  "content_text": "Plain text content",
  "summary": "Brief summary",
  "image": "https://example.com/image.jpg",
  "banner_image": "https://example.com/banner.jpg",
  "date_published": "2024-11-25T12:00:00Z",
  "date_modified": "2024-11-25T13:00:00Z",
  "authors": [],
  "tags": ["tag1", "tag2"],
  "language": "en",
  "attachments": [],
  "_custom": {}
}

Required Item Fields

| Field | Type | Description |
| --- | --- | --- |
| id | String | Unique, stable ID |
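
Besides `id`, the JSON Feed 1.1 specification expects every item to carry at least one of `content_html` or `content_text`. A minimal per-item check might look like the sketch below; `item_problems` is a hypothetical name, not an existing StarPunk function.

```python
from typing import Any, Dict, List


def item_problems(item: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a single feed item.

    Hypothetical helper covering only the rules that make an item
    invalid; optional fields are not inspected.
    """
    problems = []
    item_id = item.get("id")
    if not isinstance(item_id, str) or not item_id:
        problems.append("id must be a non-empty string")
    # JSON Feed 1.1: one or both content fields must be present
    if "content_html" not in item and "content_text" not in item:
        problems.append("one of content_html or content_text must be present")
    return problems
```

This matters for notes that have neither rendered HTML nor source text, since such a note would otherwise produce an item without any content field.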

Optional Item Fields

| Field | Type | Description |
| --- | --- | --- |
| url | String | Item permalink |
| external_url | String | Link to external content |
| title | String | Item title |
| content_html | String | HTML content |
| content_text | String | Plain text content |
| summary | String | Brief summary |
| image | String | Main image URL |
| banner_image | String | Wide banner image |
| date_published | String | RFC 3339 date |
| date_modified | String | RFC 3339 date |
| authors | Array | Item authors |
| tags | Array | String tags |
| language | String | Language code |
| attachments | Array | File attachments |

Author Object

{
  "name": "Author Name",
  "url": "https://example.com/about",
  "avatar": "https://example.com/avatar.jpg"
}

Attachment Object

{
  "url": "https://example.com/file.pdf",
  "mime_type": "application/pdf",
  "title": "Attachment Title",
  "size_in_bytes": 1024000,
  "duration_in_seconds": 300
}
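
In JSON Feed 1.1, `url` and `mime_type` are required on an attachment, while the other three fields are optional. The sketch below shows one way to build such an object and leave out unset fields; `build_attachment` is a hypothetical helper name.

```python
from typing import Any, Dict, Optional


def build_attachment(url: str, mime_type: str,
                     title: Optional[str] = None,
                     size_in_bytes: Optional[int] = None,
                     duration_in_seconds: Optional[int] = None) -> Dict[str, Any]:
    """Build an attachment object, omitting optional fields that are unset."""
    if not url or not mime_type:
        raise ValueError("url and mime_type are required")

    attachment: Dict[str, Any] = {"url": url, "mime_type": mime_type}
    optional = {
        "title": title,
        "size_in_bytes": size_in_bytes,
        "duration_in_seconds": duration_in_seconds,
    }
    # Only include optional fields that were actually provided
    attachment.update({k: v for k, v in optional.items() if v is not None})
    return attachment
```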

Implementation Design

JSON Feed Generator Class

import json
from datetime import datetime, timezone
from typing import Any, Dict, Iterator, List, Optional

# `Note` is StarPunk's note model; its import is omitted in this design sketch.

class JsonFeedGenerator:
    """JSON Feed 1.1 generator with streaming support"""

    def __init__(self, site_url: str, site_name: str, site_description: str,
                 author_name: Optional[str] = None, author_url: Optional[str] = None,
                 author_avatar: Optional[str] = None):
        self.site_url = site_url.rstrip('/')
        self.site_name = site_name
        self.site_description = site_description
        self.author = {
            'name': author_name,
            'url': author_url,
            'avatar': author_avatar
        } if author_name else None

    def generate(self, notes: List[Note], limit: int = 50) -> str:
        """Generate complete JSON feed

        IMPORTANT: Notes are expected to be in DESC order (newest first)
        from the database. This order MUST be preserved in the feed.
        """
        feed = self._build_feed_object(notes[:limit])
        return json.dumps(feed, ensure_ascii=False, indent=2)

    def generate_streaming(self, notes: List[Note], limit: int = 50) -> Iterator[str]:
        """Generate JSON feed as stream of chunks

        IMPORTANT: Notes are expected to be in DESC order (newest first)
        from the database. This order MUST be preserved in the feed.
        """
        # Start feed object
        yield '{\n'
        yield '  "version": "https://jsonfeed.org/version/1.1",\n'
        yield f'  "title": {json.dumps(self.site_name)},\n'

        # Add optional feed metadata
        yield from self._stream_feed_metadata()

        # Start items array
        yield '  "items": [\n'

        # Stream items - maintain DESC order (newest first)
        # DO NOT reverse! Database order is correct
        items = notes[:limit]
        for i, note in enumerate(items):
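            # Match generate(): keep non-ASCII text as-is and use 2-space nesting
            item_json = json.dumps(self._build_item_object(note), ensure_ascii=False, indent=2)
            # Shift the item four spaces so it sits inside the "items" array
            indented = '\n'.join('    ' + line for line in item_json.split('\n'))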
            yield indented

            if i < len(items) - 1:
                yield ',\n'
            else:
                yield '\n'

        # Close items array and feed
        yield '  ]\n'
        yield '}\n'

    def _build_feed_object(self, notes: List[Note]) -> Dict[str, Any]:
        """Build complete feed object"""
        feed = {
            'version': 'https://jsonfeed.org/version/1.1',
            'title': self.site_name,
            'home_page_url': self.site_url,
            'feed_url': f'{self.site_url}/feed.json',
            'description': self.site_description,
            'items': [self._build_item_object(note) for note in notes]
        }

        # Add optional fields
        if self.author:
            feed['authors'] = [self._clean_author(self.author)]

        feed['language'] = 'en'  # Make configurable

        # Add icon/favicon if configured
        icon_url = self._get_icon_url()
        if icon_url:
            feed['icon'] = icon_url

        favicon_url = self._get_favicon_url()
        if favicon_url:
            feed['favicon'] = favicon_url

        return feed

    def _build_item_object(self, note: Note) -> Dict[str, Any]:
        """Build item object from note"""
        permalink = f'{self.site_url}{note.permalink}'

        item = {
            'id': permalink,
            'url': permalink,
            'title': note.title or self._format_date_title(note.created_at),
            'date_published': self._format_json_date(note.created_at)
        }

        # Add content (prefer HTML)
        if note.html:
            item['content_html'] = note.html
        elif note.content:
            item['content_text'] = note.content

        # Add modified date if set and different from the publication date
        updated_at = getattr(note, 'updated_at', None)
        if updated_at and updated_at != note.created_at:
            item['date_modified'] = self._format_json_date(updated_at)

        # Add summary if available
        if hasattr(note, 'summary') and note.summary:
            item['summary'] = note.summary

        # Add tags if available
        if hasattr(note, 'tags') and note.tags:
            item['tags'] = note.tags

        # Add author if set and different from feed author
        note_author = getattr(note, 'author', None)
        if note_author and note_author != self.author:
            item['authors'] = [self._clean_author(note_author)]

        # Add image if available
        image_url = self._extract_image_url(note)
        if image_url:
            item['image'] = image_url

        # Add custom extensions
        item['_starpunk'] = {
            'permalink_path': note.permalink,
            'word_count': len(note.content.split()) if note.content else 0
        }

        return item

    def _clean_author(self, author: Any) -> Dict[str, str]:
        """Clean author object for JSON"""
        clean = {}

        if isinstance(author, dict):
            if author.get('name'):
                clean['name'] = author['name']
            if author.get('url'):
                clean['url'] = author['url']
            if author.get('avatar'):
                clean['avatar'] = author['avatar']
        elif hasattr(author, 'name'):
            clean['name'] = author.name
            # Skip attributes that are missing or empty
            if getattr(author, 'url', None):
                clean['url'] = author.url
            if getattr(author, 'avatar', None):
                clean['avatar'] = author.avatar
        else:
            clean['name'] = str(author)

        return clean

    def _format_json_date(self, dt: datetime) -> str:
        """Format datetime to RFC 3339 for JSON Feed

        Format: 2024-11-25T12:00:00Z or 2024-11-25T12:00:00-05:00
        """
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=timezone.utc)

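        # Use Z for any zero-offset timezone, not only the timezone.utc singleton
        offset = dt.utcoffset()
        if offset is not None and offset.total_seconds() == 0:
            return dt.strftime('%Y-%m-%dT%H:%M:%SZ')
        else:
            # Drop microseconds so both branches have the same precision
            return dt.isoformat(timespec='seconds')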

    def _extract_image_url(self, note: Note) -> Optional[str]:
        """Extract first image URL from note content"""
        if not note.html:
            return None

        # Simple regex to find the first img tag (single- or double-quoted src)
        import re
        from urllib.parse import urljoin

        match = re.search(r'<img[^>]+src=["\']([^"\']+)["\']', note.html)
        if match:
            # urljoin leaves absolute URLs untouched and resolves relative
            # and protocol-relative ones against the site URL
            return urljoin(f'{self.site_url}/', match.group(1))

        return None
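
The class above calls `_stream_feed_metadata()` without defining it. As a rough sketch of what such a helper could yield, the standalone function below turns a metadata mapping into one line per field, each ending in a comma because the `"items"` array follows. The name `stream_feed_metadata` and the idea of passing the metadata as a dict are assumptions for illustration, not the actual StarPunk implementation.

```python
import json
from typing import Any, Dict, Iterator


def stream_feed_metadata(metadata: Dict[str, Any]) -> Iterator[str]:
    """Yield one '  "key": value,' line per non-empty metadata field.

    Hypothetical stand-in for the undefined _stream_feed_metadata() method.
    """
    for key, value in metadata.items():
        # Skip unset fields so the feed does not contain null or empty values
        if value is None or value == "" or value == []:
            continue
        encoded = json.dumps(value, ensure_ascii=False)
        yield f'  {json.dumps(key)}: {encoded},\n'
```

Inside the class, the equivalent method would build the mapping from `self.site_url`, `self.site_description`, and the author and icon settings, then delegate to logic like this.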

Streaming JSON Generation

For memory efficiency with large feeds:

class StreamingJsonEncoder:
    """Helper for streaming JSON generation"""

    @staticmethod
    def stream_object(obj: Dict[str, Any], indent: int = 0) -> Iterator[str]:
        """Stream a JSON object"""
        indent_str = ' ' * indent
        yield indent_str + '{\n'

        items = list(obj.items())
        for i, (key, value) in enumerate(items):
            # Encode the key with json.dumps so quotes and backslashes are escaped
            yield f'{indent_str}  {json.dumps(key)}: '

            if isinstance(value, dict):
                yield from StreamingJsonEncoder.stream_object(value, indent + 2)
            elif isinstance(value, list):
                yield from StreamingJsonEncoder.stream_array(value, indent + 2)
            else:
                yield json.dumps(value)

            if i < len(items) - 1:
                yield ','
            yield '\n'

        yield indent_str + '}'

    @staticmethod
    def stream_array(arr: List[Any], indent: int = 0) -> Iterator[str]:
        """Stream a JSON array"""
        indent_str = ' ' * indent
        yield '[\n'

        for i, item in enumerate(arr):
            if isinstance(item, dict):
                yield from StreamingJsonEncoder.stream_object(item, indent + 2)
            else:
                yield indent_str + '  ' + json.dumps(item)

            if i < len(arr) - 1:
                yield ','
            yield '\n'

        yield indent_str + ']'
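
As a point of comparison, the standard library already offers chunked encoding through `json.JSONEncoder.iterencode`. The sketch below is an alternative to the hand-written encoder above, not a description of what StarPunk ships; `iter_feed_chunks` is a hypothetical name.

```python
import json
from typing import Any, Dict, Iterator


def iter_feed_chunks(feed: Dict[str, Any]) -> Iterator[str]:
    """Yield the serialized feed piece by piece using the stdlib encoder."""
    encoder = json.JSONEncoder(ensure_ascii=False, indent=2)
    # iterencode() yields string fragments instead of one large string
    yield from encoder.iterencode(feed)


feed = {
    "version": "https://jsonfeed.org/version/1.1",
    "title": "StarPunk Notes",
    "items": [{"id": "https://example.com/note/1", "content_text": "Hello"}],
}
serialized = "".join(iter_feed_chunks(feed))
```

Note that this only avoids building the full output string; the feed dictionary itself still has to be in memory, so it does not help when the item list is the expensive part.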

Complete JSON Feed Example

{
  "version": "https://jsonfeed.org/version/1.1",
  "title": "StarPunk Notes",
  "home_page_url": "https://example.com/",
  "feed_url": "https://example.com/feed.json",
  "description": "Personal notes and thoughts",
  "authors": [
    {
      "name": "John Doe",
      "url": "https://example.com/about",
      "avatar": "https://example.com/avatar.jpg"
    }
  ],
  "language": "en",
  "icon": "https://example.com/icon.png",
  "favicon": "https://example.com/favicon.ico",
  "items": [
    {
      "id": "https://example.com/notes/2024/11/25/first-note",
      "url": "https://example.com/notes/2024/11/25/first-note",
      "title": "My First Note",
      "content_html": "<p>This is my first note with <strong>bold</strong> text.</p>",
      "summary": "Introduction to my notes",
      "image": "https://example.com/images/first.jpg",
      "date_published": "2024-11-25T10:00:00Z",
      "date_modified": "2024-11-25T10:30:00Z",
      "tags": ["personal", "introduction"],
      "_starpunk": {
        "permalink_path": "/notes/2024/11/25/first-note",
        "word_count": 8
      }
    },
    {
      "id": "https://example.com/notes/2024/11/24/another-note",
      "url": "https://example.com/notes/2024/11/24/another-note",
      "title": "Another Note",
      "content_text": "Plain text content for this note.",
      "date_published": "2024-11-24T15:45:00Z",
      "tags": ["thoughts"],
      "_starpunk": {
        "permalink_path": "/notes/2024/11/24/another-note",
        "word_count": 6
      }
    }
  ]
}

Validation

JSON Feed Validator

Validate against the official validator:

Common Validation Issues

  1. Invalid JSON Syntax

    • Proper escaping of quotes
    • Valid UTF-8 encoding
    • No trailing commas
  2. Missing Required Fields

    • version, title, items required
    • Each item needs id
  3. Invalid Date Format

    • Must be RFC 3339
    • Include timezone
  4. Invalid URLs

    • Must be absolute URLs
    • Properly encoded
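
Two of the issues listed above, date format and URL form, can be approximated with standard-library checks. This is a sketch under stated assumptions: `datetime.fromisoformat` is somewhat more lenient than RFC 3339, and both function names are hypothetical.

```python
from datetime import datetime
from urllib.parse import urlparse


def is_rfc3339_with_timezone(value: str) -> bool:
    """Approximate check for an RFC 3339 timestamp that carries a timezone."""
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so normalise it to an explicit offset first.
    if value.endswith(("Z", "z")):
        value = value[:-1] + "+00:00"
    try:
        parsed = datetime.fromisoformat(value)
    except ValueError:
        return False
    return parsed.tzinfo is not None


def is_absolute_http_url(value: str) -> bool:
    """Return True for absolute http(s) URLs, False for relative paths."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```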

Testing Strategy

Unit Tests

class TestJsonFeedGenerator:
    def test_required_fields(self):
        """Test all required fields are present"""
        generator = JsonFeedGenerator(site_url, site_name, site_description)
        feed_json = generator.generate(notes)
        feed = json.loads(feed_json)

        assert feed['version'] == 'https://jsonfeed.org/version/1.1'
        assert 'title' in feed
        assert 'items' in feed

    def test_feed_order_newest_first(self):
        """Test JSON feed shows newest entries first (spec convention)"""
        # Create notes with different timestamps
        old_note = Note(
            title="Old Note",
            created_at=datetime(2024, 11, 20, 10, 0, 0, tzinfo=timezone.utc)
        )
        new_note = Note(
            title="New Note",
            created_at=datetime(2024, 11, 25, 10, 0, 0, tzinfo=timezone.utc)
        )

        # Generate feed with notes in DESC order (as from database)
        generator = JsonFeedGenerator(site_url, site_name, site_description)
        feed_json = generator.generate([new_note, old_note])
        feed = json.loads(feed_json)

        # First item should be newest
        assert feed['items'][0]['title'] == "New Note"
        assert '2024-11-25' in feed['items'][0]['date_published']

        # Second item should be oldest
        assert feed['items'][1]['title'] == "Old Note"
        assert '2024-11-20' in feed['items'][1]['date_published']

    def test_json_validity(self):
        """Test output is valid JSON"""
        generator = JsonFeedGenerator(site_url, site_name, site_description)
        feed_json = generator.generate(notes)

        # Should parse without error
        feed = json.loads(feed_json)
        assert isinstance(feed, dict)

    def test_date_formatting(self):
        """Test RFC 3339 date formatting"""
        generator = JsonFeedGenerator(site_url, site_name, site_description)
        dt = datetime(2024, 11, 25, 12, 0, 0, tzinfo=timezone.utc)
        formatted = generator._format_json_date(dt)

        assert formatted == '2024-11-25T12:00:00Z'

    def test_streaming_generation(self):
        """Test streaming produces valid JSON"""
        generator = JsonFeedGenerator(site_url, site_name, site_description)
        chunks = list(generator.generate_streaming(notes))
        feed_json = ''.join(chunks)

        # Should be valid JSON
        feed = json.loads(feed_json)
        assert feed['version'] == 'https://jsonfeed.org/version/1.1'

    def test_custom_extensions(self):
        """Test custom _starpunk extension"""
        generator = JsonFeedGenerator(site_url, site_name, site_description)
        feed_json = generator.generate([sample_note])
        feed = json.loads(feed_json)

        item = feed['items'][0]
        assert '_starpunk' in item
        assert 'permalink_path' in item['_starpunk']
        assert 'word_count' in item['_starpunk']

Integration Tests

def test_json_feed_endpoint():
    """Test JSON feed endpoint"""
    response = client.get('/feed.json')

    assert response.status_code == 200
    # Compare the bare MIME type so an optional charset parameter does not fail the test
    assert response.mimetype == 'application/feed+json'

    feed = json.loads(response.data)
    assert feed['version'] == 'https://jsonfeed.org/version/1.1'

def test_content_negotiation_json():
    """Test content negotiation prefers JSON"""
    response = client.get('/feed', headers={'Accept': 'application/json'})

    assert response.status_code == 200
    assert 'json' in response.content_type.lower()

def test_feed_reader_compatibility():
    """Test with JSON Feed readers"""
    readers = [
        'Feedbin',
        'Inoreader',
        'NewsBlur',
        'NetNewsWire'
    ]

    for reader in readers:
        assert validate_with_reader(feed_url, reader, format='json')
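
The content negotiation test above implies that the `/feed` endpoint picks a format from the `Accept` header. One possible selection routine is sketched below. The mapping of MIME types to format names, the `rss` default, and the function name `negotiate_feed_format` are all assumptions for illustration.

```python
from typing import Dict

# Assumed mapping from MIME type to feed format name
SUPPORTED: Dict[str, str] = {
    "application/feed+json": "json",
    "application/json": "json",
    "application/atom+xml": "atom",
    "application/rss+xml": "rss",
}


def negotiate_feed_format(accept_header: str, default: str = "rss") -> str:
    """Pick the supported format with the highest q-value in the header."""
    best_format, best_q = default, 0.0
    for part in accept_header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        q = 1.0  # q defaults to 1.0 when not given
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        # Strict comparison: on ties the earlier entry wins
        if media_type in SUPPORTED and q > best_q:
            best_format, best_q = SUPPORTED[media_type], q
    return best_format
```

Wildcards such as `*/*` are deliberately ignored here and fall through to the default format.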

Validation Tests

def test_jsonfeed_validation():
    """Validate against official validator"""
    generator = JsonFeedGenerator(site_url, site_name, site_description)
    feed_json = generator.generate(sample_notes)

    # Submit to validator
    result = validate_json_feed(feed_json)
    assert result['valid'] is True
    assert not result['errors']

Performance Benchmarks

Generation Speed

import time

def benchmark_json_generation():
    """Benchmark JSON feed generation"""
    notes = generate_sample_notes(100)
    generator = JsonFeedGenerator(site_url, site_name, site_description)

    start = time.perf_counter()
    feed_json = generator.generate(notes, limit=50)
    duration = time.perf_counter() - start

    assert duration < 0.05  # Less than 50ms
    assert len(feed_json) > 0
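
A single timed run like the one above is sensitive to noise from other processes. A more stable measurement can take the best of several repeats with `timeit`, as in this sketch. It times plain `json.dumps` on synthetic items rather than the real generator, and `best_seconds_per_call` and `sample_feed` are hypothetical names.

```python
import json
import timeit
from typing import Callable


def best_seconds_per_call(func: Callable[[], object],
                          repeat: int = 5, number: int = 20) -> float:
    """Return the best observed time per call, in seconds."""
    timings = timeit.repeat(func, repeat=repeat, number=number)
    # The minimum is the run least disturbed by other system activity
    return min(timings) / number


# Synthetic 50-item feed standing in for real notes
sample_feed = {
    "version": "https://jsonfeed.org/version/1.1",
    "title": "Benchmark feed",
    "items": [
        {"id": f"https://example.com/note/{i}", "content_text": "word " * 200}
        for i in range(50)
    ],
}

seconds = best_seconds_per_call(lambda: json.dumps(sample_feed, ensure_ascii=False))
print(f"best per call: {seconds * 1000:.3f} ms")
```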

Size Comparison

def test_json_vs_xml_size():
    """Compare JSON feed size to RSS/ATOM"""
    notes = generate_sample_notes(50)

    # Generate all formats
    json_feed = json_generator.generate(notes)
    rss_feed = rss_generator.generate(notes)
    atom_feed = atom_generator.generate(notes)

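    # Compare encoded sizes; len() on a str counts characters, not bytes
    print(f"JSON: {len(json_feed.encode('utf-8'))} bytes")
    print(f"RSS:  {len(rss_feed.encode('utf-8'))} bytes")
    print(f"ATOM: {len(atom_feed.encode('utf-8'))} bytes")

    # Expectation to verify, not a measured result: JSON is typically 20-30% smaller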
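
The size of the JSON output also depends on how it is serialized. The sketch below compares the indented form used by `generate()` with a compact form using tight separators. The sample feed is made up for illustration, and the actual saving depends on the content.

```python
import json

# Made-up sample feed for the comparison
feed = {
    "version": "https://jsonfeed.org/version/1.1",
    "title": "StarPunk Notes",
    "items": [
        {"id": f"https://example.com/note/{i}", "content_text": "Plain text content"}
        for i in range(50)
    ],
}

# Indented output, as produced by generate()
pretty = json.dumps(feed, ensure_ascii=False, indent=2)
# Compact output: no indentation and no spaces after separators
compact = json.dumps(feed, ensure_ascii=False, separators=(",", ":"))

pretty_bytes = len(pretty.encode("utf-8"))
compact_bytes = len(compact.encode("utf-8"))
print(f"indented: {pretty_bytes} bytes, compact: {compact_bytes} bytes")
```

Both strings parse to the same object, so the compact form is a candidate for production responses while the indented form stays useful for debugging.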

Configuration

JSON Feed Settings

# JSON Feed configuration
STARPUNK_FEED_JSON_ENABLED=true
STARPUNK_FEED_JSON_AUTHOR_NAME=John Doe
STARPUNK_FEED_JSON_AUTHOR_URL=https://example.com/about
STARPUNK_FEED_JSON_AUTHOR_AVATAR=https://example.com/avatar.jpg
STARPUNK_FEED_JSON_ICON=https://example.com/icon.png
STARPUNK_FEED_JSON_FAVICON=https://example.com/favicon.ico
STARPUNK_FEED_JSON_LANGUAGE=en
# WebSub hub URL (optional)
STARPUNK_FEED_JSON_HUB_URL=
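
These variables could be read into a plain dictionary at startup. The loader below is a sketch only: the function name `load_json_feed_config`, the returned keys, and the defaults (enabled unless set otherwise, language `en`) are assumptions, not StarPunk's actual configuration code.

```python
import os
from typing import Any, Dict, Mapping, Optional


def _env_bool(value: Optional[str], default: bool) -> bool:
    """Interpret common truthy strings; fall back to default when unset."""
    if value is None or value.strip() == "":
        return default
    return value.strip().lower() in ("1", "true", "yes", "on")


def load_json_feed_config(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Read the STARPUNK_FEED_JSON_* settings from an environment mapping."""
    def optional(name: str) -> Optional[str]:
        # Treat empty or whitespace-only values as "not configured"
        value = env.get(name, "").strip()
        return value or None

    return {
        "enabled": _env_bool(env.get("STARPUNK_FEED_JSON_ENABLED"), True),
        "author_name": optional("STARPUNK_FEED_JSON_AUTHOR_NAME"),
        "author_url": optional("STARPUNK_FEED_JSON_AUTHOR_URL"),
        "author_avatar": optional("STARPUNK_FEED_JSON_AUTHOR_AVATAR"),
        "icon": optional("STARPUNK_FEED_JSON_ICON"),
        "favicon": optional("STARPUNK_FEED_JSON_FAVICON"),
        "language": optional("STARPUNK_FEED_JSON_LANGUAGE") or "en",
        "hub_url": optional("STARPUNK_FEED_JSON_HUB_URL"),
    }
```

Passing the environment as a parameter keeps the loader easy to test with an ordinary dict.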

Security Considerations

  1. JSON Injection Prevention

    • Proper JSON escaping
    • No raw user input
    • Validate all URLs
  2. Content Security

    • HTML content sanitized
    • No script injection
    • Safe JSON encoding
  3. Size Limits

    • Maximum feed size
    • Item count limits
    • Timeout protection
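
The item count and feed size limits above can be enforced before serialization. The following sketch keeps the newest items that fit within both limits. The function name `limit_feed_items` and the default values are assumptions, and the byte count ignores the feed envelope and separators, so it is an approximation.

```python
import json
from typing import Any, Dict, List


def limit_feed_items(items: List[Dict[str, Any]],
                     max_items: int = 50,
                     max_bytes: int = 1_000_000) -> List[Dict[str, Any]]:
    """Return the leading items that fit within the count and size limits.

    Items are expected newest first, so the oldest ones are dropped.
    """
    kept: List[Dict[str, Any]] = []
    total = 0
    for item in items[:max_items]:
        size = len(json.dumps(item, ensure_ascii=False).encode("utf-8"))
        if total + size > max_bytes:
            break  # Stop at the first item that would exceed the budget
        kept.append(item)
        total += size
    return kept
```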

Migration Notes

Adding JSON Feed

  • Runs parallel to RSS/ATOM
  • No changes to existing feeds
  • Shared caching infrastructure
  • Same data source

Advanced Features

WebSub Support (Future)

{
  "hubs": [
    {
      "type": "WebSub",
      "url": "https://example.com/hub"
    }
  ]
}

Pagination

{
  "next_url": "https://example.com/feed.json?page=2"
}
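
A `next_url` value like the one above could be derived from the current page number. This sketch assumes 1-based page numbers and a `page` query parameter, as in the example URL; `build_next_url` is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import urlencode


def build_next_url(feed_url: str, page: int, page_size: int,
                   total_items: int) -> Optional[str]:
    """Return the URL of the following page, or None on the last page."""
    # No further page once this page reaches the end of the items
    if page * page_size >= total_items:
        return None
    return f"{feed_url}?{urlencode({'page': page + 1})}"
```

When the function returns `None`, the `next_url` field should be left out of the feed rather than set to null.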

Attachments

{
  "attachments": [
    {
      "url": "https://example.com/podcast.mp3",
      "mime_type": "audio/mpeg",
      "title": "Podcast Episode",
      "size_in_bytes": 25000000,
      "duration_in_seconds": 1800
    }
  ]
}

Acceptance Criteria

  1. Valid JSON Feed 1.1 generation
  2. All required fields present
  3. RFC 3339 dates correct
  4. Valid JSON syntax
  5. Streaming generation working
  6. Official validator passing
  7. Works with 5+ JSON Feed readers
  8. Performance target met (<50ms)
  9. Custom extensions working
  10. Security review passed