StarPunk/docs/design/v1.1.2/json-feed-specification.md

# JSON Feed Specification - v1.1.2

## Overview

This specification defines the implementation of JSON Feed 1.1 format for StarPunk, providing a modern, developer-friendly syndication format that's easier to parse than XML-based feeds.

## Requirements

### Functional Requirements

1. **JSON Feed 1.1 Compliance**
   - Full conformance to JSON Feed 1.1 spec
   - Valid JSON structure
   - Required fields present
   - Proper date formatting

2. **Rich Content Support**
   - HTML content
   - Plain text content
   - Summary field
   - Image attachments
   - External URLs

3. **Enhanced Metadata**
   - Author objects with avatars
   - Tags array
   - Language specification
   - Custom extensions

4. **Efficient Generation**
   - Streaming JSON output
   - Minimal memory usage
   - Fast serialization

### Non-Functional Requirements

1. **Performance**
   - Generation <50ms for 50 items
   - Compact JSON output
   - Efficient serialization

2. **Compatibility**
   - Valid JSON syntax
   - Works with JSON Feed readers
   - Proper MIME type handling

## JSON Feed Structure

### Top-Level Object

```json
{
  "version": "https://jsonfeed.org/version/1.1",
  "title": "Required: Feed title",
  "items": [],

  "home_page_url": "https://example.com/",
  "feed_url": "https://example.com/feed.json",
  "description": "Feed description",
  "user_comment": "Free-form comment",
  "next_url": "https://example.com/feed.json?page=2",
  "icon": "https://example.com/icon.png",
  "favicon": "https://example.com/favicon.ico",
  "authors": [],
  "language": "en-US",
  "expired": false,
  "hubs": []
}
```

### Required Fields

| Field | Type | Description |
|-------|------|-------------|
| `version` | String | Must be "https://jsonfeed.org/version/1.1" |
| `title` | String | Feed title |
| `items` | Array | Array of item objects |

### Optional Feed Fields

| Field | Type | Description |
|-------|------|-------------|
| `home_page_url` | String | Website URL |
| `feed_url` | String | URL of this feed |
| `description` | String | Feed description |
| `user_comment` | String | Note for people reading the raw JSON; feed readers ignore it |
| `next_url` | String | URL of the next page when the feed is paginated |
| `icon` | String | Square image, 512x512 or larger |
| `favicon` | String | Website favicon |
| `authors` | Array | Feed authors |
| `language` | String | RFC 5646 language tag |
| `expired` | Boolean | Feed no longer updated |
| `hubs` | Array | WebSub hubs |

### Item Object Structure

```json
{
  "id": "Required: unique ID",
  "url": "https://example.com/note/123",
  "external_url": "https://external.com/article",
  "title": "Item title",
  "content_html": "<p>HTML content</p>",
  "content_text": "Plain text content",
  "summary": "Brief summary",
  "image": "https://example.com/image.jpg",
  "banner_image": "https://example.com/banner.jpg",
  "date_published": "2024-11-25T12:00:00Z",
  "date_modified": "2024-11-25T13:00:00Z",
  "authors": [],
  "tags": ["tag1", "tag2"],
  "language": "en",
  "attachments": [],
  "_custom": {}
}
```

### Required Item Fields

| Field | Type | Description |
|-------|------|-------------|
| `id` | String | Unique, stable ID |
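
The required-field rules in the tables above can be checked mechanically before a feed is published. The following is a minimal sketch, not part of StarPunk: `find_missing_fields` and `JSON_FEED_VERSION` are hypothetical names, and items are assumed to be plain dictionaries. It also checks that each item carries `content_html` or `content_text`, which JSON Feed 1.1 expects even though both fields are individually optional.

```python
from typing import Any, Dict, List

JSON_FEED_VERSION = "https://jsonfeed.org/version/1.1"


def find_missing_fields(feed: Dict[str, Any]) -> List[str]:
    """Return human-readable problems for missing or malformed required fields."""
    problems: List[str] = []

    if feed.get("version") != JSON_FEED_VERSION:
        problems.append("version must be " + JSON_FEED_VERSION)
    if not isinstance(feed.get("title"), str):
        problems.append("title must be a string")

    items = feed.get("items")
    if not isinstance(items, list):
        problems.append("items must be an array")
        return problems  # nothing further to inspect

    for index, item in enumerate(items):
        # Every item needs a non-empty string id.
        if not isinstance(item.get("id"), str) or not item["id"]:
            problems.append(f"items[{index}]: id must be a non-empty string")
        # At least one of the two content fields should be present.
        if "content_html" not in item and "content_text" not in item:
            problems.append(f"items[{index}]: needs content_html or content_text")

    return problems
```

An empty list means the feed passed these structural checks; it does not replace a run through the official validator.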

### Optional Item Fields

| Field | Type | Description |
|-------|------|-------------|
| `url` | String | Item permalink |
| `external_url` | String | Link to external content |
| `title` | String | Item title |
| `content_html` | String | HTML content |
| `content_text` | String | Plain text content |
| `summary` | String | Brief summary |
| `image` | String | Main image URL |
| `banner_image` | String | Wide banner image |
| `date_published` | String | RFC 3339 date |
| `date_modified` | String | RFC 3339 date |
| `authors` | Array | Item authors |
| `tags` | Array | String tags |
| `language` | String | RFC 5646 language tag |
| `attachments` | Array | File attachments |

### Author Object

```json
{
  "name": "Author Name",
  "url": "https://example.com/about",
  "avatar": "https://example.com/avatar.jpg"
}
```

### Attachment Object

```json
{
  "url": "https://example.com/file.pdf",
  "mime_type": "application/pdf",
  "title": "Attachment Title",
  "size_in_bytes": 1024000,
  "duration_in_seconds": 300
}
```
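
In JSON Feed 1.1 an attachment must carry `url` and `mime_type`; `title`, `size_in_bytes`, and `duration_in_seconds` are optional. A small helper can keep unset optional fields out of the output instead of emitting `null`. This is a sketch under that assumption, and `build_attachment` is a hypothetical name rather than an existing StarPunk function.

```python
from typing import Any, Dict, Optional


def build_attachment(url: str, mime_type: str,
                     title: Optional[str] = None,
                     size_in_bytes: Optional[int] = None,
                     duration_in_seconds: Optional[int] = None) -> Dict[str, Any]:
    """Build an attachment object, omitting optional fields that are unset."""
    attachment: Dict[str, Any] = {"url": url, "mime_type": mime_type}
    optional = {
        "title": title,
        "size_in_bytes": size_in_bytes,
        "duration_in_seconds": duration_in_seconds,
    }
    # Only keep optional fields the caller actually provided.
    attachment.update({key: value for key, value in optional.items() if value is not None})
    return attachment
```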

## Implementation Design

### JSON Feed Generator Class

```python
import json
from typing import List, Dict, Any, Iterator, Optional
from datetime import datetime, timezone

# Note is StarPunk's note model; import it from the module that defines it.

class JsonFeedGenerator:
    """JSON Feed 1.1 generator with streaming support"""

    def __init__(self, site_url: str, site_name: str, site_description: str,
                 author_name: Optional[str] = None, author_url: Optional[str] = None,
                 author_avatar: Optional[str] = None):
        self.site_url = site_url.rstrip('/')
        self.site_name = site_name
        self.site_description = site_description
        self.author = {
            'name': author_name,
            'url': author_url,
            'avatar': author_avatar
        } if author_name else None

    def generate(self, notes: List[Note], limit: int = 50) -> str:
        """Generate complete JSON feed

        IMPORTANT: Notes are expected to be in DESC order (newest first)
        from the database. This order MUST be preserved in the feed.
        """
        feed = self._build_feed_object(notes[:limit])
        return json.dumps(feed, ensure_ascii=False, indent=2)

    def generate_streaming(self, notes: List[Note], limit: int = 50) -> Iterator[str]:
        """Generate JSON feed as stream of chunks

        IMPORTANT: Notes are expected to be in DESC order (newest first)
        from the database. This order MUST be preserved in the feed.
        """
        # Start feed object
        yield '{\n'
        yield '  "version": "https://jsonfeed.org/version/1.1",\n'
        yield f'  "title": {json.dumps(self.site_name)},\n'

        # Add optional feed metadata
        yield from self._stream_feed_metadata()

        # Start items array
        yield '  "items": [\n'

        # Stream items - maintain DESC order (newest first)
        # DO NOT reverse! Database order is correct
        items = notes[:limit]
        for i, note in enumerate(items):
            # ensure_ascii=False matches generate(), so both paths escape identically
            item_json = json.dumps(self._build_item_object(note), ensure_ascii=False, indent=4)
            # Indent items properly
            indented = '\n'.join('    ' + line for line in item_json.split('\n'))
            yield indented

            if i < len(items) - 1:
                yield ',\n'
            else:
                yield '\n'

        # Close items array and feed
        yield '  ]\n'
        yield '}\n'

    def _build_feed_object(self, notes: List[Note]) -> Dict[str, Any]:
        """Build complete feed object"""
        feed = {
            'version': 'https://jsonfeed.org/version/1.1',
            'title': self.site_name,
            'home_page_url': self.site_url,
            'feed_url': f'{self.site_url}/feed.json',
            'description': self.site_description,
            'items': [self._build_item_object(note) for note in notes]
        }

        # Add optional fields
        if self.author:
            feed['authors'] = [self._clean_author(self.author)]

        feed['language'] = 'en'  # Make configurable

        # Add icon/favicon if configured
        icon_url = self._get_icon_url()
        if icon_url:
            feed['icon'] = icon_url

        favicon_url = self._get_favicon_url()
        if favicon_url:
            feed['favicon'] = favicon_url

        return feed

    def _build_item_object(self, note: Note) -> Dict[str, Any]:
        """Build item object from note"""
        permalink = f'{self.site_url}{note.permalink}'

        item = {
            'id': permalink,
            'url': permalink,
            'title': note.title or self._format_date_title(note.created_at),
            'date_published': self._format_json_date(note.created_at)
        }

        # Add content (prefer HTML)
        if note.html:
            item['content_html'] = note.html
        elif note.content:
            item['content_text'] = note.content

        # Add modified date only when the note has one and it differs
        updated_at = getattr(note, 'updated_at', None)
        if updated_at and updated_at != note.created_at:
            item['date_modified'] = self._format_json_date(updated_at)

        # Add summary if available
        if hasattr(note, 'summary') and note.summary:
            item['summary'] = note.summary

        # Add tags if available
        if hasattr(note, 'tags') and note.tags:
            item['tags'] = note.tags

        # Add author if the note has one and it differs from the feed author
        note_author = getattr(note, 'author', None)
        if note_author and note_author != self.author:
            item['authors'] = [self._clean_author(note_author)]

        # Add image if available
        image_url = self._extract_image_url(note)
        if image_url:
            item['image'] = image_url

        # Add custom extensions
        item['_starpunk'] = {
            'permalink_path': note.permalink,
            'word_count': len(note.content.split()) if note.content else 0
        }

        return item

    def _clean_author(self, author: Any) -> Dict[str, str]:
        """Clean author object for JSON"""
        clean = {}

        if isinstance(author, dict):
            if author.get('name'):
                clean['name'] = author['name']
            if author.get('url'):
                clean['url'] = author['url']
            if author.get('avatar'):
                clean['avatar'] = author['avatar']
        elif hasattr(author, 'name'):
            clean['name'] = author.name
            # Skip url/avatar when unset so null never reaches the feed
            if getattr(author, 'url', None):
                clean['url'] = author.url
            if getattr(author, 'avatar', None):
                clean['avatar'] = author.avatar
        else:
            clean['name'] = str(author)

        return clean

    def _format_json_date(self, dt: datetime) -> str:
        """Format datetime to RFC 3339 for JSON Feed

        Format: 2024-11-25T12:00:00Z or 2024-11-25T12:00:00-05:00
        """
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=timezone.utc)

        # Use Z for any zero-offset timezone, not only the timezone.utc singleton
        if not dt.utcoffset():
            return dt.strftime('%Y-%m-%dT%H:%M:%SZ')
        else:
            return dt.isoformat()

    def _extract_image_url(self, note: Note) -> Optional[str]:
        """Extract first image URL from note content"""
        if not note.html:
            return None

        import re
        from urllib.parse import urljoin

        # Simple regex to find the first img tag (single- or double-quoted src)
        match = re.search(r'<img[^>]+src=["\']([^"\']+)["\']', note.html)
        if match:
            # urljoin leaves absolute URLs untouched and resolves relative ones
            # ("/a.jpg", "a.jpg", "//cdn.example/a.jpg") against the site URL
            return urljoin(f'{self.site_url}/', match.group(1))

        return None
```
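
`generate_streaming` calls `_stream_feed_metadata()`, which is not defined in the class above. One possible shape for it is sketched here as a standalone function so it can be run in isolation. The name `stream_feed_metadata`, its parameters, and the choice of fields are assumptions that mirror `_build_feed_object`; icon and favicon are left out because their lookup helpers are not shown in this document.

```python
import json
from typing import Any, Dict, Iterator, Optional


def stream_feed_metadata(site_url: str, description: str,
                         author: Optional[Dict[str, Any]] = None,
                         language: str = "en") -> Iterator[str]:
    """Yield optional top-level fields, one '  "key": value,' line per field."""
    metadata: Dict[str, Any] = {
        "home_page_url": site_url,
        "feed_url": f"{site_url}/feed.json",
        "description": description,
        "language": language,
    }
    if author:
        # Drop unset author fields rather than emitting null values.
        metadata["authors"] = [{key: value for key, value in author.items() if value}]

    for key, value in metadata.items():
        # json.dumps escapes both key and value. Every line ends with a comma
        # because the caller always emits "items" afterwards.
        yield f"  {json.dumps(key)}: {json.dumps(value, ensure_ascii=False)},\n"
```

Because each line ends with a comma, this only produces valid JSON when another key (here `"items"`) follows, which matches how `generate_streaming` is laid out.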

### Streaming JSON Generation

For memory efficiency with large feeds:

```python
class StreamingJsonEncoder:
    """Helper for streaming JSON generation"""

    @staticmethod
    def stream_object(obj: Dict[str, Any], indent: int = 0) -> Iterator[str]:
        """Stream a JSON object"""
        indent_str = ' ' * indent
        yield indent_str + '{\n'

        items = list(obj.items())
        for i, (key, value) in enumerate(items):
            # json.dumps escapes quotes and backslashes in the key
            yield f'{indent_str}  {json.dumps(key)}: '

            if isinstance(value, dict):
                yield from StreamingJsonEncoder.stream_object(value, indent + 2)
            elif isinstance(value, list):
                yield from StreamingJsonEncoder.stream_array(value, indent + 2)
            else:
                yield json.dumps(value)

            if i < len(items) - 1:
                yield ','
            yield '\n'

        yield indent_str + '}'

    @staticmethod
    def stream_array(arr: List[Any], indent: int = 0) -> Iterator[str]:
        """Stream a JSON array"""
        indent_str = ' ' * indent
        yield '[\n'

        for i, item in enumerate(arr):
            if isinstance(item, dict):
                yield from StreamingJsonEncoder.stream_object(item, indent + 2)
            else:
                yield indent_str + '  ' + json.dumps(item)

            if i < len(arr) - 1:
                yield ','
            yield '\n'

        yield indent_str + ']'
```
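
Streaming only saves memory if the web layer forwards chunks as they are produced. The sketch below wraps any chunk generator, such as `generate_streaming`, in a minimal WSGI application and sets the `application/feed+json` MIME type that the integration tests expect. It uses only the WSGI calling convention from the standard; `make_feed_app` and `chunk_source` are hypothetical names, and how StarPunk actually wires its routes is not covered by this document.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

FEED_MIME_TYPE = "application/feed+json"


def make_feed_app(chunk_source: Callable[[], Iterator[str]]):
    """Wrap a chunk generator in a minimal WSGI application."""

    def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
        headers: List[Tuple[str, str]] = [("Content-Type", FEED_MIME_TYPE)]
        start_response("200 OK", headers)
        # Encode lazily so only one chunk is held in memory at a time.
        return (chunk.encode("utf-8") for chunk in chunk_source())

    return app
```

A caller would pass something like `lambda: generator.generate_streaming(notes)` as `chunk_source`, so a fresh iterator is created for each request.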

## Complete JSON Feed Example

```json
{
  "version": "https://jsonfeed.org/version/1.1",
  "title": "StarPunk Notes",
  "home_page_url": "https://example.com/",
  "feed_url": "https://example.com/feed.json",
  "description": "Personal notes and thoughts",
  "authors": [
    {
      "name": "John Doe",
      "url": "https://example.com/about",
      "avatar": "https://example.com/avatar.jpg"
    }
  ],
  "language": "en",
  "icon": "https://example.com/icon.png",
  "favicon": "https://example.com/favicon.ico",
  "items": [
    {
      "id": "https://example.com/notes/2024/11/25/first-note",
      "url": "https://example.com/notes/2024/11/25/first-note",
      "title": "My First Note",
      "content_html": "<p>This is my first note with <strong>bold</strong> text.</p>",
      "summary": "Introduction to my notes",
      "image": "https://example.com/images/first.jpg",
      "date_published": "2024-11-25T10:00:00Z",
      "date_modified": "2024-11-25T10:30:00Z",
      "tags": ["personal", "introduction"],
      "_starpunk": {
        "permalink_path": "/notes/2024/11/25/first-note",
        "word_count": 8
      }
    },
    {
      "id": "https://example.com/notes/2024/11/24/another-note",
      "url": "https://example.com/notes/2024/11/24/another-note",
      "title": "Another Note",
      "content_text": "Plain text content for this note.",
      "date_published": "2024-11-24T15:45:00Z",
      "tags": ["thoughts"],
      "_starpunk": {
        "permalink_path": "/notes/2024/11/24/another-note",
        "word_count": 6
      }
    }
  ]
}
```

## Validation

### JSON Feed Validator

Validate against the official validator:
- https://validator.jsonfeed.org/

### Common Validation Issues

1. **Invalid JSON Syntax**
   - Quotes and control characters in strings must be escaped
   - Output must be valid UTF-8
   - No trailing commas

2. **Missing Required Fields**
   - `version`, `title`, and `items` are required at the top level
   - Each item needs an `id`

3. **Invalid Date Format**
   - Dates must be RFC 3339
   - Dates must include a timezone offset (or `Z`)

4. **Invalid URLs**
   - URLs must be absolute
   - URLs must be properly encoded
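
The date and URL issues can be caught with two small checks before a feed goes to the validator. This is a sketch with hypothetical helper names (`is_rfc3339`, `is_absolute_http_url`). The date check covers the common RFC 3339 form with an uppercase `T` and either `Z` or a numeric offset, and the URL check assumes feeds only ever link to `http` or `https` resources.

```python
import re
from datetime import datetime
from urllib.parse import urlsplit

# Date-time, optional fractional seconds, then a mandatory Z or +hh:mm / -hh:mm offset.
_RFC3339 = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(\.\d+)?(Z|[+-]\d{2}:\d{2})")


def is_rfc3339(value: str) -> bool:
    """Return True for a well-formed date-time that includes a timezone."""
    match = _RFC3339.fullmatch(value)
    if not match:
        return False
    try:
        # strptime rejects out-of-range fields such as month 13.
        datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    except ValueError:
        return False
    return True


def is_absolute_http_url(value: str) -> bool:
    """Return True when the URL has an http(s) scheme and a host."""
    parts = urlsplit(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```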

## Testing Strategy

### Unit Tests

```python
class TestJsonFeedGenerator:
    def test_required_fields(self):
        """Test all required fields are present"""
        generator = JsonFeedGenerator(site_url, site_name, site_description)
        feed_json = generator.generate(notes)
        feed = json.loads(feed_json)

        assert feed['version'] == 'https://jsonfeed.org/version/1.1'
        assert 'title' in feed
        assert 'items' in feed

    def test_feed_order_newest_first(self):
        """Test JSON feed shows newest entries first (spec convention)"""
        # Create notes with different timestamps
        old_note = Note(
            title="Old Note",
            created_at=datetime(2024, 11, 20, 10, 0, 0, tzinfo=timezone.utc)
        )
        new_note = Note(
            title="New Note",
            created_at=datetime(2024, 11, 25, 10, 0, 0, tzinfo=timezone.utc)
        )

        # Generate feed with notes in DESC order (as from database)
        generator = JsonFeedGenerator(site_url, site_name, site_description)
        feed_json = generator.generate([new_note, old_note])
        feed = json.loads(feed_json)

        # First item should be newest
        assert feed['items'][0]['title'] == "New Note"
        assert '2024-11-25' in feed['items'][0]['date_published']

        # Second item should be oldest
        assert feed['items'][1]['title'] == "Old Note"
        assert '2024-11-20' in feed['items'][1]['date_published']

    def test_json_validity(self):
        """Test output is valid JSON"""
        generator = JsonFeedGenerator(site_url, site_name, site_description)
        feed_json = generator.generate(notes)

        # Should parse without error
        feed = json.loads(feed_json)
        assert isinstance(feed, dict)

    def test_date_formatting(self):
        """Test RFC 3339 date formatting"""
        generator = JsonFeedGenerator(site_url, site_name, site_description)
        dt = datetime(2024, 11, 25, 12, 0, 0, tzinfo=timezone.utc)
        formatted = generator._format_json_date(dt)

        assert formatted == '2024-11-25T12:00:00Z'

    def test_streaming_generation(self):
        """Test streaming produces valid JSON"""
        generator = JsonFeedGenerator(site_url, site_name, site_description)
        chunks = list(generator.generate_streaming(notes))
        feed_json = ''.join(chunks)

        # Should be valid JSON
        feed = json.loads(feed_json)
        assert feed['version'] == 'https://jsonfeed.org/version/1.1'

    def test_custom_extensions(self):
        """Test custom _starpunk extension"""
        generator = JsonFeedGenerator(site_url, site_name, site_description)
        feed_json = generator.generate([sample_note])
        feed = json.loads(feed_json)

        item = feed['items'][0]
        assert '_starpunk' in item
        assert 'permalink_path' in item['_starpunk']
        assert 'word_count' in item['_starpunk']
```

### Integration Tests

```python
def test_json_feed_endpoint():
    """Test JSON feed endpoint"""
    response = client.get('/feed.json')

    assert response.status_code == 200
    assert response.content_type == 'application/feed+json'

    feed = json.loads(response.data)
    assert feed['version'] == 'https://jsonfeed.org/version/1.1'

def test_content_negotiation_json():
    """Test content negotiation prefers JSON"""
    response = client.get('/feed', headers={'Accept': 'application/json'})

    assert response.status_code == 200
    assert 'json' in response.content_type.lower()

def test_feed_reader_compatibility():
    """Test with JSON Feed readers"""
    readers = [
        'Feedbin',
        'Inoreader',
        'NewsBlur',
        'NetNewsWire'
    ]

    for reader in readers:
        assert validate_with_reader(feed_url, reader, format='json')
```

### Validation Tests

```python
def test_jsonfeed_validation():
    """Validate against official validator"""
    generator = JsonFeedGenerator(site_url, site_name, site_description)
    feed_json = generator.generate(sample_notes)

    # Submit to validator
    result = validate_json_feed(feed_json)
    assert result['valid'] is True
    assert not result['errors']
```

## Performance Benchmarks

### Generation Speed

```python
import time


def benchmark_json_generation():
    """Benchmark JSON feed generation"""
    notes = generate_sample_notes(100)
    generator = JsonFeedGenerator(site_url, site_name, site_description)

    start = time.perf_counter()
    feed_json = generator.generate(notes, limit=50)
    duration = time.perf_counter() - start

    assert duration < 0.05  # Less than 50ms
    assert len(feed_json) > 0
```

### Size Comparison

```python
def test_json_vs_xml_size():
    """Compare JSON feed size to RSS/ATOM"""
    notes = generate_sample_notes(50)

    # Generate all formats
    json_feed = json_generator.generate(notes)
    rss_feed = rss_generator.generate(notes)
    atom_feed = atom_generator.generate(notes)

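    # Compare encoded sizes (len() of a str counts characters, not bytes)
    print(f"JSON: {len(json_feed.encode('utf-8'))} bytes")
    print(f"RSS:  {len(rss_feed.encode('utf-8'))} bytes")
    print(f"ATOM: {len(atom_feed.encode('utf-8'))} bytes")

    # Expectation to confirm by measurement: JSON is smaller than the XML formats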
```

## Configuration

### JSON Feed Settings

```ini
# JSON Feed configuration
STARPUNK_FEED_JSON_ENABLED=true
STARPUNK_FEED_JSON_AUTHOR_NAME=John Doe
STARPUNK_FEED_JSON_AUTHOR_URL=https://example.com/about
STARPUNK_FEED_JSON_AUTHOR_AVATAR=https://example.com/avatar.jpg
STARPUNK_FEED_JSON_ICON=https://example.com/icon.png
STARPUNK_FEED_JSON_FAVICON=https://example.com/favicon.ico
STARPUNK_FEED_JSON_LANGUAGE=en
# WebSub hub URL (optional)
STARPUNK_FEED_JSON_HUB_URL=
```
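
These settings can be read into one dictionary at startup. The sketch below assumes they arrive as environment-style string pairs, treats blank values as unset, and defaults `enabled` to true and `language` to `en`. Those defaults and the name `load_json_feed_config` are assumptions for illustration, not documented StarPunk behavior.

```python
from typing import Any, Dict, Mapping, Optional

PREFIX = "STARPUNK_FEED_JSON_"
_TRUE_VALUES = {"1", "true", "yes", "on"}


def load_json_feed_config(env: Mapping[str, str]) -> Dict[str, Any]:
    """Read STARPUNK_FEED_JSON_* settings from a mapping such as os.environ."""

    def get(name: str) -> Optional[str]:
        # Blank or whitespace-only values count as unset.
        value = env.get(PREFIX + name, "").strip()
        return value or None

    return {
        "enabled": (get("ENABLED") or "true").lower() in _TRUE_VALUES,
        "author_name": get("AUTHOR_NAME"),
        "author_url": get("AUTHOR_URL"),
        "author_avatar": get("AUTHOR_AVATAR"),
        "icon": get("ICON"),
        "favicon": get("FAVICON"),
        "language": get("LANGUAGE") or "en",
        "hub_url": get("HUB_URL"),
    }
```

In an application this would typically be called once as `load_json_feed_config(os.environ)`.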

## Security Considerations

1. **JSON Injection Prevention**
   - Proper JSON escaping
   - No raw user input
   - Validate all URLs

2. **Content Security**
   - HTML content sanitized
   - No script injection
   - Safe JSON encoding

3. **Size Limits**
   - Maximum feed size
   - Item count limits
   - Timeout protection
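
The size limits could be enforced at serialization time. The following sketch caps the item count and then drops trailing items until the encoded feed fits a byte budget. Both limits are placeholder values rather than numbers from this specification, and `enforce_limits` is a hypothetical helper. Since the feed is newest-first, trimming from the end removes the oldest items.

```python
import json
from typing import Any, Dict

MAX_ITEMS = 50              # assumed cap, matching the generator's default limit
MAX_FEED_BYTES = 1_000_000  # assumed cap; not a value from this specification


def enforce_limits(feed: Dict[str, Any], max_items: int = MAX_ITEMS,
                   max_bytes: int = MAX_FEED_BYTES) -> str:
    """Serialize the feed, trimming the oldest items until both limits hold."""
    limited = dict(feed)  # shallow copy so the caller's dict is untouched
    items = list(feed.get("items", []))[:max_items]

    while True:
        limited["items"] = items
        payload = json.dumps(limited, ensure_ascii=False)
        # Stop when the encoded feed fits, or when nothing is left to trim.
        if len(payload.encode("utf-8")) <= max_bytes or not items:
            return payload
        items = items[:-1]  # drop the oldest remaining item
```

Re-serializing after every removal is simple but quadratic in the worst case; with a 50-item cap that cost is likely negligible, though a production version might measure items individually instead.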

## Migration Notes

### Adding JSON Feed

- Runs parallel to RSS/ATOM
- No changes to existing feeds
- Shared caching infrastructure
- Same data source

## Advanced Features

### WebSub Support (Future)

```json
{
  "hubs": [
    {
      "type": "WebSub",
      "url": "https://example.com/hub"
    }
  ]
}
```

### Pagination

```json
{
  "next_url": "https://example.com/feed.json?page=2"
}
```
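
`next_url` should only appear when another page actually exists. A minimal sketch of that decision follows, assuming 1-based page numbers and a `page` query parameter as in the example above; `pagination_fields` is a hypothetical helper name.

```python
from typing import Dict
from urllib.parse import urlencode


def pagination_fields(feed_url: str, page: int, per_page: int,
                      total_items: int) -> Dict[str, str]:
    """Return {'next_url': ...} when more items remain, otherwise {}."""
    # Items shown up to and including this page.
    if page * per_page >= total_items:
        return {}
    return {"next_url": f"{feed_url}?{urlencode({'page': page + 1})}"}
```

The returned dictionary can be merged into the feed object with `feed.update(...)`, which leaves the key out entirely on the last page.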

### Attachments

```json
{
  "attachments": [
    {
      "url": "https://example.com/podcast.mp3",
      "mime_type": "audio/mpeg",
      "title": "Podcast Episode",
      "size_in_bytes": 25000000,
      "duration_in_seconds": 1800
    }
  ]
}
```

## Acceptance Criteria

1. ✅ Valid JSON Feed 1.1 generation
2. ✅ All required fields present
3. ✅ RFC 3339 dates correct
4. ✅ Valid JSON syntax
5. ✅ Streaming generation working
6. ✅ Official validator passing
7. ✅ Works with 5+ JSON Feed readers
8. ✅ Performance target met (<50ms)
9. ✅ Custom extensions working
10. ✅ Security review passed