feat: v1.4.0 Phase 3 - Micropub Media Endpoint

Implement W3C Micropub media endpoint for external client uploads.

Changes:
- Add POST /micropub/media endpoint in routes/micropub.py
  - Accept multipart/form-data with 'file' field
  - Require bearer token with 'create' scope
  - Return 201 Created with Location header
  - Validate, optimize, and generate variants via save_media()

- Update q=config response to advertise media-endpoint
  - Include media-endpoint URL in config response
  - Add 'photo' post-type to supported types

- Add photo property support to Micropub create
  - extract_photos() function to parse photo property
  - Handles both simple URL strings and structured objects with alt text
  - _attach_photos_to_note() function to attach photos by URL
  - Only attach photos from our server (by URL match)
  - External URLs logged but ignored (no download)
  - Maximum 4 photos per note (per ADR-057)

- SITE_URL normalization pattern
  - Use .rstrip('/') for consistent URL comparison
  - Applied in media endpoint and photo attachment

Per design document: docs/design/v1.4.0/media-implementation-design.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
commit c64feaea23
parent 501a711050
Date: 2025-12-10 18:32:21 -07:00
5 changed files with 2171 additions and 5 deletions


@@ -0,0 +1,302 @@
# Feed Tags Implementation Design
**Version**: 1.3.1 "Syndicate Tags"
**Status**: Ready for Implementation
**Estimated Effort**: 1-2 hours
## Overview
This document specifies the implementation for adding tags/categories to all three syndication feed formats. Tags were added to the backend in v1.3.0 but are not currently included in feed output.
## Current State Analysis
### Tag Data Structure
Tags are stored as dictionaries with two fields:
- `name`: Normalized, URL-safe identifier (e.g., `machine-learning`)
- `display_name`: Human-readable label (e.g., `Machine Learning`)
The `get_note_tags(note_id)` function returns a list of these dictionaries, ordered alphabetically by display_name.
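For example, a note tagged with two topics would come back as (illustrative values):
```python
>>> get_note_tags(note_id)  # note_id of an existing tagged note
[{'name': 'machine-learning', 'display_name': 'Machine Learning'},
 {'name': 'python', 'display_name': 'Python'}]
```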
### Feed Generation Routes
The `_get_cached_notes()` function in `starpunk/routes/public.py` already attaches media to notes but **does not attach tags**. This is the key change needed to make tags available to feed generators.
### Feed Generator Functions
Each feed module uses a consistent pattern:
- Non-streaming function builds complete feed
- Streaming function yields chunks
- Both accept `notes: list[Note]` where notes may have attached attributes
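Schematically, the shared shape looks like this (a sketch only; the real signatures and chunk contents differ per format):
```python
from typing import Iterator

def generate_example_feed(notes: list) -> str:
    # Non-streaming variant: assemble the complete document in memory
    return "".join(generate_example_feed_streaming(notes))

def generate_example_feed_streaming(notes: list) -> Iterator[str]:
    # Streaming variant: yield the document in chunks; each note may carry
    # attached attributes such as note.media and (after Phase 1) note.tags
    yield "<header/>"
    for note in notes:
        yield f"<item>{note.id}</item>"
    yield "<footer/>"
```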
## Design Decisions
Per user confirmation:
1. **Omit `scheme`/`domain` attributes** - Keep implementation minimal
2. **Omit `tags` field when empty** - Do not output empty array in JSON Feed
## Implementation Specification
### Phase 1: Load Tags in Feed Routes
**File**: `starpunk/routes/public.py`
**Change**: Modify `_get_cached_notes()` to attach tags to each note.
**Current code** (lines 66-69):
```python
# Attach media to each note (v1.2.0 Phase 3)
for note in notes:
    media = get_note_media(note.id)
    object.__setattr__(note, 'media', media)
```
**Required change**: Add tag loading after media loading:
```python
# Attach media to each note (v1.2.0 Phase 3)
for note in notes:
    media = get_note_media(note.id)
    object.__setattr__(note, 'media', media)

    # Attach tags to each note (v1.3.1)
    tags = get_note_tags(note.id)
    object.__setattr__(note, 'tags', tags)
```
**Import needed**: Add `get_note_tags` to imports from `starpunk.tags`.
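That is, near the top of `starpunk/routes/public.py`:
```python
from starpunk.tags import get_note_tags
```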
### Phase 2: RSS 2.0 Categories
**File**: `starpunk/feeds/rss.py`
**Standard**: RSS 2.0 Specification - `<category>` sub-element of `<item>`
**Format**:
```xml
<category>Display Name</category>
```
#### Non-Streaming Function (`generate_rss`)
The feedgen library's `FeedEntry` supports categories via `fe.category()`.
**Location**: After description is set (around line 143), add:
```python
# Add category elements for tags (v1.3.1)
if hasattr(note, 'tags') and note.tags:
    for tag in note.tags:
        fe.category({'term': tag['display_name']})
```
Note: feedgen's category accepts a dict with 'term' key for RSS output.
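A quick standalone way to confirm that behavior (assumes only that the feedgen package is installed; none of this code ships with the change):
```python
from feedgen.feed import FeedGenerator

fg = FeedGenerator()
fg.title("Example feed")
fg.link(href="https://example.com/", rel="alternate")
fg.description("Category output demo")

fe = fg.add_entry()
fe.title("My Post")
fe.link(href="https://example.com/note/my-post")
fe.description("...")
fe.category({"term": "Machine Learning"})

# The rendered <item> should contain <category>Machine Learning</category>
print(fg.rss_str(pretty=True).decode())
```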
#### Streaming Function (`generate_rss_streaming`)
**Location**: After description in the item XML building (around line 293), add category elements:
Insert after the `<description>` CDATA section and before the media elements:
```python
# Add category elements for tags (v1.3.1)
if hasattr(note, 'tags') and note.tags:
    for tag in note.tags:
        item_xml += f"""
        <category>{_escape_xml(tag['display_name'])}</category>"""
```
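The `_escape_xml` helper referenced above is assumed to already exist in the feed modules; if it ever needed to be written from scratch, a minimal version could be (a sketch, not the project's actual implementation):
```python
from xml.sax.saxutils import escape

def _escape_xml(text: str) -> str:
    # Escape &, <, > plus both quote characters so tag names are safe
    # in element text and in attribute values (Atom term/label)
    return escape(text, {'"': "&quot;", "'": "&apos;"})
```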
**Expected output**:
```xml
<item>
    <title>My Post</title>
    <link>https://example.com/note/my-post</link>
    <guid isPermaLink="true">https://example.com/note/my-post</guid>
    <pubDate>Mon, 18 Nov 2024 12:00:00 +0000</pubDate>
    <description><![CDATA[...]]></description>
    <category>Machine Learning</category>
    <category>Python</category>
    ...
</item>
```
### Phase 3: Atom 1.0 Categories
**File**: `starpunk/feeds/atom.py`
**Standard**: RFC 4287 Section 4.2.2 - The `atom:category` Element
**Format**:
```xml
<category term="machine-learning" label="Machine Learning"/>
```
- `term` (REQUIRED): Normalized tag name for machine processing
- `label` (OPTIONAL): Human-readable display name
#### Streaming Function (`generate_atom_streaming`)
Note: `generate_atom()` delegates to streaming, so only one change needed.
**Location**: After the entry link element (around line 179), before content:
```python
# Add category elements for tags (v1.3.1)
if hasattr(note, 'tags') and note.tags:
    for tag in note.tags:
        yield f'    <category term="{_escape_xml(tag["name"])}" label="{_escape_xml(tag["display_name"])}"/>\n'
```
**Expected output**:
```xml
<entry>
    <id>https://example.com/note/my-post</id>
    <title>My Post</title>
    <published>2024-11-25T12:00:00Z</published>
    <updated>2024-11-25T12:00:00Z</updated>
    <link rel="alternate" type="text/html" href="https://example.com/note/my-post"/>
    <category term="machine-learning" label="Machine Learning"/>
    <category term="python" label="Python"/>
    <content type="html">...</content>
```
### Phase 4: JSON Feed 1.1 Tags
**File**: `starpunk/feeds/json_feed.py`
**Standard**: JSON Feed 1.1 Specification - `tags` field
**Format**:
```json
{
"tags": ["Machine Learning", "Python"]
}
```
Per user decision: **Omit `tags` field entirely when no tags** (do not output empty array).
#### Item Builder Function (`_build_item_object`)
**Location**: After attachments section (around line 308), before `_starpunk` extension:
```python
# Add tags array (v1.3.1)
# Per spec: array of plain strings (tags, not categories)
# Omit field when no tags (user decision: no empty array)
if hasattr(note, 'tags') and note.tags:
item["tags"] = [tag['display_name'] for tag in note.tags]
```
**Expected output** (note with tags):
```json
{
"id": "https://example.com/note/my-post",
"url": "https://example.com/note/my-post",
"title": "My Post",
"content_html": "...",
"date_published": "2024-11-25T12:00:00Z",
"tags": ["Machine Learning", "Python"],
"_starpunk": {...}
}
```
**Expected output** (note without tags):
```json
{
"id": "https://example.com/note/my-post",
"url": "https://example.com/note/my-post",
"title": "My Post",
"content_html": "...",
"date_published": "2024-11-25T12:00:00Z",
"_starpunk": {...}
}
```
Note: No `"tags"` field at all when empty.
## Testing Requirements
### Unit Tests
Create test file: `tests/unit/feeds/test_feed_tags.py`
#### RSS Tests
1. `test_rss_note_with_tags_has_category_elements`
2. `test_rss_note_without_tags_has_no_category_elements`
3. `test_rss_multiple_tags_multiple_categories`
4. `test_rss_streaming_tags`
#### Atom Tests
1. `test_atom_note_with_tags_has_category_elements`
2. `test_atom_category_has_term_and_label_attributes`
3. `test_atom_note_without_tags_has_no_category_elements`
4. `test_atom_streaming_tags`
#### JSON Feed Tests
1. `test_json_note_with_tags_has_tags_array`
2. `test_json_note_without_tags_omits_tags_field`
3. `test_json_tags_array_contains_display_names`
4. `test_json_streaming_tags`
### Integration Tests
Add to existing feed integration tests:
1. `test_feed_generation_with_mixed_tagged_notes` - Mix of notes with and without tags
2. `test_feed_tags_ordering` - Tags appear in alphabetical order by display_name
### Test Data Setup
```python
# Test note with tags attached
note = Note(...)
object.__setattr__(note, 'tags', [
    {'name': 'machine-learning', 'display_name': 'Machine Learning'},
    {'name': 'python', 'display_name': 'Python'},
])
# Test note without tags
note_no_tags = Note(...)
object.__setattr__(note_no_tags, 'tags', [])
```
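As an illustration, two of the JSON Feed tests above might look like the following (a sketch: it assumes `_build_item_object(note)` can be called with just a note, and `make_note()` stands in for whatever factory the existing test suite uses):
```python
from starpunk.feeds.json_feed import _build_item_object

def _note_with_tags(tags):
    note = make_note()  # hypothetical helper; use the suite's real factory
    object.__setattr__(note, 'tags', tags)
    return note

def test_json_note_without_tags_omits_tags_field():
    item = _build_item_object(_note_with_tags([]))
    assert "tags" not in item

def test_json_tags_array_contains_display_names():
    item = _build_item_object(_note_with_tags([
        {'name': 'machine-learning', 'display_name': 'Machine Learning'},
        {'name': 'python', 'display_name': 'Python'},
    ]))
    assert item["tags"] == ["Machine Learning", "Python"]
```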
## Implementation Order
1. **Routes change** (`public.py`) - Load tags in `_get_cached_notes()`
2. **JSON Feed** (`json_feed.py`) - Simplest change, good for validation
3. **Atom Feed** (`atom.py`) - Single streaming function
4. **RSS Feed** (`rss.py`) - Both streaming and non-streaming functions
5. **Tests** - Unit and integration tests
## Validation Checklist
- [ ] RSS feed validates against RSS 2.0 spec
- [ ] Atom feed validates against RFC 4287
- [ ] JSON Feed validates against JSON Feed 1.1 spec
- [ ] Notes without tags produce valid feeds (no empty elements/arrays)
- [ ] Special characters in tag names are properly escaped
- [ ] Existing tests continue to pass
- [ ] Feed caching works correctly with tags
## Standards References
- [RSS 2.0 - category element](https://www.rssboard.org/rss-specification#ltcategorygtSubelementOfLtitemgt)
- [RFC 4287 Section 4.2.2 - atom:category](https://datatracker.ietf.org/doc/html/rfc4287#section-4.2.2)
- [JSON Feed 1.1 - tags](https://www.jsonfeed.org/version/1.1/)
## Files to Modify
| File | Change |
|------|--------|
| `starpunk/routes/public.py` | Add tag loading to `_get_cached_notes()` |
| `starpunk/feeds/rss.py` | Add `<category>` elements in both functions |
| `starpunk/feeds/atom.py` | Add `<category term="..." label="..."/>` elements |
| `starpunk/feeds/json_feed.py` | Add `tags` array to `_build_item_object()` |
| `tests/unit/feeds/test_feed_tags.py` | New test file |
## Summary
This is a straightforward feature addition:
- One route change to load tags
- Three feed module changes to render tags
- Follows established patterns in existing code
- No new dependencies required
- Backward compatible (tags are optional in all specs)

File diff suppressed because it is too large.


@@ -36,10 +36,25 @@
- ATOM enclosure links for all media
- See: ADR-059
### POSSE
- Native syndication to social networks
- Supported networks:
  - First iteration:
    - Mastodon (and compatible services)
    - Bluesky
  - Second iteration:
    - TBD
- Solution should include a configuration UI for setup
---
## Medium
### Default slug change
- The default slug should be a date-time stamp
  - YYYYMMDDHHMMSS
- Edge case: if the slug would somehow be a duplicate, append "-x" (e.g., "-1")
### Tag Enhancements (v1.3.0 Follow-up)
- Tag pagination on archive pages (when note count exceeds threshold)
- Tag autocomplete in admin interface


@@ -264,6 +264,106 @@ def extract_published_date(properties: dict) -> Optional[datetime]:
# Action Handlers


def extract_photos(properties: dict) -> list[dict[str, str]]:
    """
    Extract photo URLs and alt text from Micropub properties

    Handles both simple URL strings and structured photo objects with alt text.

    Args:
        properties: Normalized Micropub properties dict

    Returns:
        List of dicts with 'url' and optional 'alt' keys

    Examples:
        >>> # Simple URL
        >>> extract_photos({'photo': ['https://example.com/photo.jpg']})
        [{'url': 'https://example.com/photo.jpg', 'alt': ''}]

        >>> # With alt text
        >>> extract_photos({'photo': [{'value': 'https://example.com/photo.jpg', 'alt': 'Sunset'}]})
        [{'url': 'https://example.com/photo.jpg', 'alt': 'Sunset'}]
    """
    photos = properties.get("photo", [])
    result = []

    for photo in photos:
        if isinstance(photo, str):
            # Simple URL string
            result.append({'url': photo, 'alt': ''})
        elif isinstance(photo, dict):
            # Structured object with value and alt
            url = photo.get('value') or photo.get('url', '')
            alt = photo.get('alt', '')
            if url:
                result.append({'url': url, 'alt': alt})

    return result
def _attach_photos_to_note(note_id: int, photos: list[dict[str, str]]) -> None:
    """
    Attach photos to a note by URL

    Photos must already exist on this server (uploaded via media endpoint).
    External URLs are logged and ignored (no download).

    Args:
        note_id: ID of the note to attach to
        photos: List of dicts with 'url' and 'alt' keys
    """
    from starpunk.database import get_db
    from starpunk.media import attach_media_to_note

    # Normalize SITE_URL by stripping trailing slash for consistent comparison
    site_url = current_app.config.get("SITE_URL", "http://localhost:5000").rstrip('/')
    db = get_db(current_app)

    media_ids = []
    captions = []

    # Log warning if photos are being truncated
    if len(photos) > 4:
        current_app.logger.warning(
            f"Micropub create received {len(photos)} photos, truncating to 4 per ADR-057"
        )

    for photo in photos[:4]:  # Max 4 photos per ADR-057
        url = photo['url']
        alt = photo.get('alt', '')

        # Check if URL is on our server
        if url.startswith(site_url) or url.startswith('/media/'):
            # Extract path from URL
            if url.startswith(site_url):
                path = url[len(site_url):]
            else:
                path = url

            # Remove leading /media/ if present
            if path.startswith('/media/'):
                path = path[7:]

            # Look up media by path
            row = db.execute(
                "SELECT id FROM media WHERE path = ?",
                (path,)
            ).fetchone()

            if row:
                media_ids.append(row[0])
                captions.append(alt)
            else:
                current_app.logger.warning(f"Photo URL not found in media: {url}")
        else:
            # External URL - log but don't fail
            current_app.logger.info(f"External photo URL ignored: {url}")

    if media_ids:
        attach_media_to_note(note_id, media_ids, captions)
def handle_create(data: dict, token_info: dict):
    """
    Handle Micropub create action
@@ -305,6 +405,7 @@ def handle_create(data: dict, token_info: dict):
        title = extract_title(properties)
        tags = extract_tags(properties)
        published_date = extract_published_date(properties)
        photos = extract_photos(properties)  # v1.4.0
    except MicropubValidationError as e:
        raise e
@@ -322,6 +423,10 @@ def handle_create(data: dict, token_info: dict):
        tags=tags if tags else None  # Pass tags to create_note (v1.3.0)
    )

    # Attach photos if present (v1.4.0)
    if photos:
        _attach_photos_to_note(note.id, photos)

    # Build permalink URL
    # Note: SITE_URL is normalized to include trailing slash (for IndieAuth spec compliance)
    site_url = current_app.config.get("SITE_URL", "http://localhost:5000")
@@ -358,11 +463,15 @@ def handle_query(args: dict, token_info: dict):
    q = args.get("q")

    if q == "config":
        # Return server configuration with media endpoint (v1.4.0)
        site_url = current_app.config.get("SITE_URL", "http://localhost:5000").rstrip('/')
        config = {
            "media-endpoint": f"{site_url}/micropub/media",
            "syndicate-to": [],  # No syndication targets in V1
            "post-types": [
                {"type": "note", "name": "Note", "properties": ["content"]},
                {"type": "photo", "name": "Photo", "properties": ["photo"]}
            ],
        }
        return jsonify(config), 200


@@ -19,7 +19,7 @@ References:
- ADR-029: Micropub IndieAuth Integration Strategy
"""

from flask import Blueprint, current_app, request, make_response

from starpunk.micropub import (
    MicropubError,
@@ -28,7 +28,7 @@ from starpunk.micropub import (
    handle_create,
    handle_query,
)
from starpunk.auth_external import verify_external_token, check_scope

# Create blueprint
bp = Blueprint("micropub", __name__)
@@ -119,3 +119,85 @@ def micropub_endpoint():
    except Exception as e:
        current_app.logger.error(f"Micropub action error: {e}")
        return error_response("server_error", "An unexpected error occurred", 500)


@bp.route('/media', methods=['POST'])
def media_endpoint():
    """
    Micropub media endpoint for file uploads

    W3C Micropub Specification compliant media upload.
    Accepts multipart/form-data with single file part named 'file'.

    Returns:
        201 Created with Location header on success
        4xx/5xx error responses per OAuth 2.0 format
    """
    from starpunk.media import save_media

    # Extract and verify token
    token = extract_bearer_token(request)
    if not token:
        return error_response("unauthorized", "No access token provided", 401)

    token_info = verify_external_token(token)
    if not token_info:
        return error_response("unauthorized", "Invalid or expired access token", 401)

    # Check scope (create scope allows media upload)
    if not check_scope("create", token_info.get("scope", "")):
        return error_response(
            "insufficient_scope",
            "Token lacks create scope",
            403
        )

    # Validate content type
    content_type = request.headers.get("Content-Type", "")
    if "multipart/form-data" not in content_type:
        return error_response(
            "invalid_request",
            "Content-Type must be multipart/form-data",
            400
        )

    # Extract file
    if 'file' not in request.files:
        return error_response(
            "invalid_request",
            "No file provided. Use 'file' as the form field name.",
            400
        )

    uploaded_file = request.files['file']
    if not uploaded_file.filename:
        return error_response(
            "invalid_request",
            "No filename provided",
            400
        )

    try:
        # Read file data
        file_data = uploaded_file.read()

        # Save media (validates, optimizes, generates variants)
        media = save_media(file_data, uploaded_file.filename)

        # Build media URL (normalize SITE_URL by removing trailing slash)
        site_url = current_app.config.get("SITE_URL", "http://localhost:5000").rstrip('/')
        media_url = f"{site_url}/media/{media['path']}"

        # Return 201 with Location header (per W3C Micropub spec)
        response = make_response("", 201)
        response.headers["Location"] = media_url
        return response

    except ValueError as e:
        # Validation errors (file too large, invalid format, etc.)
        return error_response("invalid_request", str(e), 400)
    except Exception as e:
        current_app.logger.error(f"Media upload failed: {e}")
        return error_response("server_error", "Failed to process upload", 500)