feat: v1.4.0 Phase 3 - Micropub Media Endpoint

Implement W3C Micropub media endpoint for external client uploads.

Changes:
- Add POST /micropub/media endpoint in routes/micropub.py
  - Accept multipart/form-data with 'file' field
  - Require bearer token with 'create' scope
  - Return 201 Created with Location header
  - Validate, optimize, and generate variants via save_media()

- Update q=config response to advertise media-endpoint
  - Include media-endpoint URL in config response
  - Add 'photo' post-type to supported types

- Add photo property support to Micropub create
  - extract_photos() function to parse photo property
  - Handles both simple URL strings and structured objects with alt text
  - _attach_photos_to_note() function to attach photos by URL
  - Only attach photos from our server (by URL match)
  - External URLs logged but ignored (no download)
  - Maximum 4 photos per note (per ADR-057)

- SITE_URL normalization pattern
  - Use .rstrip('/') for consistent URL comparison
  - Applied in media endpoint and photo attachment

Per design document: docs/design/v1.4.0/media-implementation-design.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
commit c64feaea23
parent 501a711050
Date: 2025-12-10 18:32:21 -07:00
5 changed files with 2171 additions and 5 deletions


@@ -0,0 +1,302 @@
# Feed Tags Implementation Design
**Version**: 1.3.1 "Syndicate Tags"
**Status**: Ready for Implementation
**Estimated Effort**: 1-2 hours
## Overview
This document specifies the implementation for adding tags/categories to all three syndication feed formats. Tags were added to the backend in v1.3.0 but are not currently included in feed output.
## Current State Analysis
### Tag Data Structure
Tags are stored as dictionaries with two fields:
- `name`: Normalized, URL-safe identifier (e.g., `machine-learning`)
- `display_name`: Human-readable label (e.g., `Machine Learning`)
The `get_note_tags(note_id)` function returns a list of these dictionaries, ordered alphabetically by display_name.
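For example, a note tagged with two topics would come back as (illustrative values):
```python
>>> get_note_tags(note_id)  # note_id of an existing tagged note
[{'name': 'machine-learning', 'display_name': 'Machine Learning'},
 {'name': 'python', 'display_name': 'Python'}]
```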
### Feed Generation Routes
The `_get_cached_notes()` function in `starpunk/routes/public.py` already attaches media to notes but **does not attach tags**. This is the key change needed to make tags available to feed generators.
### Feed Generator Functions
Each feed module uses a consistent pattern:
- Non-streaming function builds complete feed
- Streaming function yields chunks
- Both accept `notes: list[Note]` where notes may have attached attributes
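Schematically, the shared shape looks like this (a sketch only; the real signatures and chunk contents differ per format):
```python
from typing import Iterator

def generate_example_feed(notes: list) -> str:
    # Non-streaming variant: assemble the complete document in memory
    return "".join(generate_example_feed_streaming(notes))

def generate_example_feed_streaming(notes: list) -> Iterator[str]:
    # Streaming variant: yield the document in chunks; each note may carry
    # attached attributes such as note.media and (after Phase 1) note.tags
    yield "<header/>"
    for note in notes:
        yield f"<item>{note.id}</item>"
    yield "<footer/>"
```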
## Design Decisions
Per user confirmation:
1. **Omit `scheme`/`domain` attributes** - Keep implementation minimal
2. **Omit `tags` field when empty** - Do not output empty array in JSON Feed
## Implementation Specification
### Phase 1: Load Tags in Feed Routes
**File**: `starpunk/routes/public.py`
**Change**: Modify `_get_cached_notes()` to attach tags to each note.
**Current code** (lines 66-69):
```python
# Attach media to each note (v1.2.0 Phase 3)
for note in notes:
    media = get_note_media(note.id)
    object.__setattr__(note, 'media', media)
```
**Required change**: Add tag loading after media loading:
```python
# Attach media to each note (v1.2.0 Phase 3)
for note in notes:
    media = get_note_media(note.id)
    object.__setattr__(note, 'media', media)

    # Attach tags to each note (v1.3.1)
    tags = get_note_tags(note.id)
    object.__setattr__(note, 'tags', tags)
```
**Import needed**: Add `get_note_tags` to imports from `starpunk.tags`.
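That is, near the top of `starpunk/routes/public.py`:
```python
from starpunk.tags import get_note_tags
```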
### Phase 2: RSS 2.0 Categories
**File**: `starpunk/feeds/rss.py`
**Standard**: RSS 2.0 Specification - `<category>` sub-element of `<item>`
**Format**:
```xml
<category>Display Name</category>
```
#### Non-Streaming Function (`generate_rss`)
The feedgen library's `FeedEntry` supports categories via `fe.category()`.
**Location**: After description is set (around line 143), add:
```python
# Add category elements for tags (v1.3.1)
if hasattr(note, 'tags') and note.tags:
    for tag in note.tags:
        fe.category({'term': tag['display_name']})
```
Note: feedgen's category accepts a dict with 'term' key for RSS output.
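A quick standalone way to confirm that behavior (assumes only that the feedgen package is installed; none of this code ships with the change):
```python
from feedgen.feed import FeedGenerator

fg = FeedGenerator()
fg.title("Example feed")
fg.link(href="https://example.com/", rel="alternate")
fg.description("Category output demo")

fe = fg.add_entry()
fe.title("My Post")
fe.link(href="https://example.com/note/my-post")
fe.description("...")
fe.category({"term": "Machine Learning"})

# The rendered <item> should contain <category>Machine Learning</category>
print(fg.rss_str(pretty=True).decode())
```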
#### Streaming Function (`generate_rss_streaming`)
**Location**: After description in the item XML building (around line 293), add category elements:
Insert after the `<description>` CDATA section and before the media elements:
```python
# Add category elements for tags (v1.3.1)
if hasattr(note, 'tags') and note.tags:
    for tag in note.tags:
        item_xml += f"""
        <category>{_escape_xml(tag['display_name'])}</category>"""
```
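The `_escape_xml` helper referenced above is assumed to already exist in the feed modules; if it ever needed to be written from scratch, a minimal version could be (a sketch, not the project's actual implementation):
```python
from xml.sax.saxutils import escape

def _escape_xml(text: str) -> str:
    # Escape &, <, > plus both quote characters so tag names are safe
    # in element text and in attribute values (Atom term/label)
    return escape(text, {'"': "&quot;", "'": "&apos;"})
```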
**Expected output**:
```xml
<item>
    <title>My Post</title>
    <link>https://example.com/note/my-post</link>
    <guid isPermaLink="true">https://example.com/note/my-post</guid>
    <pubDate>Mon, 18 Nov 2024 12:00:00 +0000</pubDate>
    <description><![CDATA[...]]></description>
    <category>Machine Learning</category>
    <category>Python</category>
    ...
</item>
```
### Phase 3: Atom 1.0 Categories
**File**: `starpunk/feeds/atom.py`
**Standard**: RFC 4287 Section 4.2.2 - The `atom:category` Element
**Format**:
```xml
<category term="machine-learning" label="Machine Learning"/>
```
- `term` (REQUIRED): Normalized tag name for machine processing
- `label` (OPTIONAL): Human-readable display name
#### Streaming Function (`generate_atom_streaming`)
Note: `generate_atom()` delegates to streaming, so only one change needed.
**Location**: After the entry link element (around line 179), before content:
```python
# Add category elements for tags (v1.3.1)
if hasattr(note, 'tags') and note.tags:
    for tag in note.tags:
        yield f'    <category term="{_escape_xml(tag["name"])}" label="{_escape_xml(tag["display_name"])}"/>\n'
```
**Expected output**:
```xml
<entry>
    <id>https://example.com/note/my-post</id>
    <title>My Post</title>
    <published>2024-11-25T12:00:00Z</published>
    <updated>2024-11-25T12:00:00Z</updated>
    <link rel="alternate" type="text/html" href="https://example.com/note/my-post"/>
    <category term="machine-learning" label="Machine Learning"/>
    <category term="python" label="Python"/>
    <content type="html">...</content>
```
### Phase 4: JSON Feed 1.1 Tags
**File**: `starpunk/feeds/json_feed.py`
**Standard**: JSON Feed 1.1 Specification - `tags` field
**Format**:
```json
{
"tags": ["Machine Learning", "Python"]
}
```
Per user decision: **Omit `tags` field entirely when no tags** (do not output empty array).
#### Item Builder Function (`_build_item_object`)
**Location**: After attachments section (around line 308), before `_starpunk` extension:
```python
# Add tags array (v1.3.1)
# Per spec: array of plain strings (tags, not categories)
# Omit field when no tags (user decision: no empty array)
if hasattr(note, 'tags') and note.tags:
item["tags"] = [tag['display_name'] for tag in note.tags]
```
**Expected output** (note with tags):
```json
{
"id": "https://example.com/note/my-post",
"url": "https://example.com/note/my-post",
"title": "My Post",
"content_html": "...",
"date_published": "2024-11-25T12:00:00Z",
"tags": ["Machine Learning", "Python"],
"_starpunk": {...}
}
```
**Expected output** (note without tags):
```json
{
"id": "https://example.com/note/my-post",
"url": "https://example.com/note/my-post",
"title": "My Post",
"content_html": "...",
"date_published": "2024-11-25T12:00:00Z",
"_starpunk": {...}
}
```
Note: No `"tags"` field at all when empty.
## Testing Requirements
### Unit Tests
Create test file: `tests/unit/feeds/test_feed_tags.py`
#### RSS Tests
1. `test_rss_note_with_tags_has_category_elements`
2. `test_rss_note_without_tags_has_no_category_elements`
3. `test_rss_multiple_tags_multiple_categories`
4. `test_rss_streaming_tags`
#### Atom Tests
1. `test_atom_note_with_tags_has_category_elements`
2. `test_atom_category_has_term_and_label_attributes`
3. `test_atom_note_without_tags_has_no_category_elements`
4. `test_atom_streaming_tags`
#### JSON Feed Tests
1. `test_json_note_with_tags_has_tags_array`
2. `test_json_note_without_tags_omits_tags_field`
3. `test_json_tags_array_contains_display_names`
4. `test_json_streaming_tags`
### Integration Tests
Add to existing feed integration tests:
1. `test_feed_generation_with_mixed_tagged_notes` - Mix of notes with and without tags
2. `test_feed_tags_ordering` - Tags appear in alphabetical order by display_name
### Test Data Setup
```python
# Test note with tags attached
note = Note(...)
object.__setattr__(note, 'tags', [
    {'name': 'machine-learning', 'display_name': 'Machine Learning'},
    {'name': 'python', 'display_name': 'Python'},
])
# Test note without tags
note_no_tags = Note(...)
object.__setattr__(note_no_tags, 'tags', [])
```
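As an illustration, two of the JSON Feed tests above might look like the following (a sketch: it assumes `_build_item_object(note)` can be called with just a note, and `make_note()` stands in for whatever factory the existing test suite uses):
```python
from starpunk.feeds.json_feed import _build_item_object

def _note_with_tags(tags):
    note = make_note()  # hypothetical helper; use the suite's real factory
    object.__setattr__(note, 'tags', tags)
    return note

def test_json_note_without_tags_omits_tags_field():
    item = _build_item_object(_note_with_tags([]))
    assert "tags" not in item

def test_json_tags_array_contains_display_names():
    item = _build_item_object(_note_with_tags([
        {'name': 'machine-learning', 'display_name': 'Machine Learning'},
        {'name': 'python', 'display_name': 'Python'},
    ]))
    assert item["tags"] == ["Machine Learning", "Python"]
```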
## Implementation Order
1. **Routes change** (`public.py`) - Load tags in `_get_cached_notes()`
2. **JSON Feed** (`json_feed.py`) - Simplest change, good for validation
3. **Atom Feed** (`atom.py`) - Single streaming function
4. **RSS Feed** (`rss.py`) - Both streaming and non-streaming functions
5. **Tests** - Unit and integration tests
## Validation Checklist
- [ ] RSS feed validates against RSS 2.0 spec
- [ ] Atom feed validates against RFC 4287
- [ ] JSON Feed validates against JSON Feed 1.1 spec
- [ ] Notes without tags produce valid feeds (no empty elements/arrays)
- [ ] Special characters in tag names are properly escaped
- [ ] Existing tests continue to pass
- [ ] Feed caching works correctly with tags
## Standards References
- [RSS 2.0 - category element](https://www.rssboard.org/rss-specification#ltcategorygtSubelementOfLtitemgt)
- [RFC 4287 Section 4.2.2 - atom:category](https://datatracker.ietf.org/doc/html/rfc4287#section-4.2.2)
- [JSON Feed 1.1 - tags](https://www.jsonfeed.org/version/1.1/)
## Files to Modify
| File | Change |
|------|--------|
| `starpunk/routes/public.py` | Add tag loading to `_get_cached_notes()` |
| `starpunk/feeds/rss.py` | Add `<category>` elements in both functions |
| `starpunk/feeds/atom.py` | Add `<category term="..." label="..."/>` elements |
| `starpunk/feeds/json_feed.py` | Add `tags` array to `_build_item_object()` |
| `tests/unit/feeds/test_feed_tags.py` | New test file |
## Summary
This is a straightforward feature addition:
- One route change to load tags
- Three feed module changes to render tags
- Follows established patterns in existing code
- No new dependencies required
- Backward compatible (tags are optional in all specs)

File diff suppressed because it is too large.


@@ -36,10 +36,25 @@
- ATOM enclosure links for all media
- See: ADR-059
### POSSE
- Native syndication to social networks
- Supported networks:
  - First iteration:
    - Mastodon (and compatible services)
    - Bluesky
  - Second iteration:
    - TBD
- Solution should include a configuration UI for setup
---
## Medium
### Default slug change
- The default slug should be a date-time stamp
  - YYYYMMDDHHMMSS
- Edge case: if the slug would somehow be a duplicate, append "-x" (e.g., "-1")
### Tag Enhancements (v1.3.0 Follow-up)
- Tag pagination on archive pages (when note count exceeds threshold)
- Tag autocomplete in admin interface


@@ -264,6 +264,106 @@ def extract_published_date(properties: dict) -> Optional[datetime]:
# Action Handlers


def extract_photos(properties: dict) -> list[dict[str, str]]:
    """
    Extract photo URLs and alt text from Micropub properties

    Handles both simple URL strings and structured photo objects with alt text.

    Args:
        properties: Normalized Micropub properties dict

    Returns:
        List of dicts with 'url' and optional 'alt' keys

    Examples:
        >>> # Simple URL
        >>> extract_photos({'photo': ['https://example.com/photo.jpg']})
        [{'url': 'https://example.com/photo.jpg', 'alt': ''}]

        >>> # With alt text
        >>> extract_photos({'photo': [{'value': 'https://example.com/photo.jpg', 'alt': 'Sunset'}]})
        [{'url': 'https://example.com/photo.jpg', 'alt': 'Sunset'}]
    """
    photos = properties.get("photo", [])
    result = []

    for photo in photos:
        if isinstance(photo, str):
            # Simple URL string
            result.append({'url': photo, 'alt': ''})
        elif isinstance(photo, dict):
            # Structured object with value and alt
            url = photo.get('value') or photo.get('url', '')
            alt = photo.get('alt', '')
            if url:
                result.append({'url': url, 'alt': alt})

    return result
def _attach_photos_to_note(note_id: int, photos: list[dict[str, str]]) -> None:
    """
    Attach photos to a note by URL

    Photos must already exist on this server (uploaded via media endpoint).
    External URLs are logged and ignored (no download).

    Args:
        note_id: ID of the note to attach to
        photos: List of dicts with 'url' and 'alt' keys
    """
    from starpunk.database import get_db
    from starpunk.media import attach_media_to_note

    # Normalize SITE_URL by stripping trailing slash for consistent comparison
    site_url = current_app.config.get("SITE_URL", "http://localhost:5000").rstrip('/')
    db = get_db(current_app)

    media_ids = []
    captions = []

    # Log warning if photos are being truncated
    if len(photos) > 4:
        current_app.logger.warning(
            f"Micropub create received {len(photos)} photos, truncating to 4 per ADR-057"
        )

    for photo in photos[:4]:  # Max 4 photos per ADR-057
        url = photo['url']
        alt = photo.get('alt', '')

        # Check if URL is on our server
        if url.startswith(site_url) or url.startswith('/media/'):
            # Extract path from URL
            if url.startswith(site_url):
                path = url[len(site_url):]
            else:
                path = url

            # Remove leading /media/ if present
            if path.startswith('/media/'):
                path = path[7:]

            # Look up media by path
            row = db.execute(
                "SELECT id FROM media WHERE path = ?",
                (path,)
            ).fetchone()

            if row:
                media_ids.append(row[0])
                captions.append(alt)
            else:
                current_app.logger.warning(f"Photo URL not found in media: {url}")
        else:
            # External URL - log but don't fail
            current_app.logger.info(f"External photo URL ignored: {url}")

    if media_ids:
        attach_media_to_note(note_id, media_ids, captions)
def handle_create(data: dict, token_info: dict):
    """
    Handle Micropub create action
@@ -305,6 +405,7 @@ def handle_create(data: dict, token_info: dict):
        title = extract_title(properties)
        tags = extract_tags(properties)
        published_date = extract_published_date(properties)
        photos = extract_photos(properties)  # v1.4.0
    except MicropubValidationError as e:
        raise e
@@ -322,6 +423,10 @@ def handle_create(data: dict, token_info: dict):
        tags=tags if tags else None  # Pass tags to create_note (v1.3.0)
    )

    # Attach photos if present (v1.4.0)
    if photos:
        _attach_photos_to_note(note.id, photos)

    # Build permalink URL
    # Note: SITE_URL is normalized to include trailing slash (for IndieAuth spec compliance)
    site_url = current_app.config.get("SITE_URL", "http://localhost:5000")
@@ -358,11 +463,15 @@ def handle_query(args: dict, token_info: dict):
    q = args.get("q")

    if q == "config":
        # Return server configuration with media endpoint (v1.4.0)
        site_url = current_app.config.get("SITE_URL", "http://localhost:5000").rstrip('/')
        config = {
            "media-endpoint": f"{site_url}/micropub/media",
            "syndicate-to": [],  # No syndication targets in V1
            "post-types": [
                {"type": "note", "name": "Note", "properties": ["content"]},
                {"type": "photo", "name": "Photo", "properties": ["photo"]}
            ],
        }
        return jsonify(config), 200


@@ -19,7 +19,7 @@ References:
- ADR-029: Micropub IndieAuth Integration Strategy
"""

from flask import Blueprint, current_app, request, make_response

from starpunk.micropub import (
    MicropubError,
@@ -28,7 +28,7 @@ from starpunk.micropub import (
    handle_create,
    handle_query,
)
from starpunk.auth_external import verify_external_token, check_scope

# Create blueprint
bp = Blueprint("micropub", __name__)
@@ -119,3 +119,85 @@ def micropub_endpoint():
    except Exception as e:
        current_app.logger.error(f"Micropub action error: {e}")
        return error_response("server_error", "An unexpected error occurred", 500)


@bp.route('/media', methods=['POST'])
def media_endpoint():
    """
    Micropub media endpoint for file uploads

    W3C Micropub Specification compliant media upload.
    Accepts multipart/form-data with single file part named 'file'.

    Returns:
        201 Created with Location header on success
        4xx/5xx error responses per OAuth 2.0 format
    """
    from starpunk.media import save_media

    # Extract and verify token
    token = extract_bearer_token(request)
    if not token:
        return error_response("unauthorized", "No access token provided", 401)

    token_info = verify_external_token(token)
    if not token_info:
        return error_response("unauthorized", "Invalid or expired access token", 401)

    # Check scope (create scope allows media upload)
    if not check_scope("create", token_info.get("scope", "")):
        return error_response(
            "insufficient_scope",
            "Token lacks create scope",
            403
        )

    # Validate content type
    content_type = request.headers.get("Content-Type", "")
    if "multipart/form-data" not in content_type:
        return error_response(
            "invalid_request",
            "Content-Type must be multipart/form-data",
            400
        )

    # Extract file
    if 'file' not in request.files:
        return error_response(
            "invalid_request",
            "No file provided. Use 'file' as the form field name.",
            400
        )

    uploaded_file = request.files['file']
    if not uploaded_file.filename:
        return error_response(
            "invalid_request",
            "No filename provided",
            400
        )

    try:
        # Read file data
        file_data = uploaded_file.read()

        # Save media (validates, optimizes, generates variants)
        media = save_media(file_data, uploaded_file.filename)

        # Build media URL (normalize SITE_URL by removing trailing slash)
        site_url = current_app.config.get("SITE_URL", "http://localhost:5000").rstrip('/')
        media_url = f"{site_url}/media/{media['path']}"

        # Return 201 with Location header (per W3C Micropub spec)
        response = make_response("", 201)
        response.headers["Location"] = media_url
        return response

    except ValueError as e:
        # Validation errors (file too large, invalid format, etc.)
        return error_response("invalid_request", str(e), 400)
    except Exception as e:
        current_app.logger.error(f"Media upload failed: {e}")
        return error_response("server_error", "Failed to process upload", 500)