# IndieAuth Endpoint Discovery Implementation Analysis

**Date**: 2025-11-24
**Developer**: StarPunk Fullstack Developer
**Status**: Ready for Architect Review
**Target Version**: 1.0.0-rc.5

---

## Executive Summary

I have reviewed the architect's corrected IndieAuth endpoint discovery design (ADR-043) and the W3C IndieAuth specification. The design is fundamentally sound and correctly implements the IndieAuth specification. However, I have **critical questions** about implementation details, particularly around the "chicken-and-egg" problem of determining which endpoint to verify a token with when we don't know the user's identity beforehand.

**Overall Assessment**: The design is architecturally correct, but needs clarification on practical implementation details before coding can begin.

---

## What I Understand

### 1. The Core Problem Fixed

The architect correctly identified that **hardcoding `TOKEN_ENDPOINT=https://tokens.indieauth.com/token` is fundamentally wrong**. This violates IndieAuth's core principle of user sovereignty.

**Correct Approach**:
- Store only `ADMIN_ME=https://admin.example.com/` in configuration
- Discover endpoints dynamically from the user's profile URL at runtime
- Each user can use their own IndieAuth provider

### 2. Endpoint Discovery Flow

Per W3C IndieAuth Section 4.2, I understand the discovery process:

```
1. Fetch user's profile URL (e.g., https://admin.example.com/)
2. Check in priority order:
   a. HTTP Link headers (highest priority)
   b. HTML <link> elements (document order)
   c. IndieAuth metadata endpoint (optional)
3. Parse rel="authorization_endpoint" and rel="token_endpoint"
4. Resolve relative URLs against profile URL base
5. Cache discovered endpoints (with TTL)
```

**Example Discovery**:
```html
GET https://admin.example.com/ HTTP/1.1

HTTP/1.1 200 OK
Link: <https://auth.example.com/token>; rel="token_endpoint"
Content-Type: text/html

<html>
  <head>
    <link rel="authorization_endpoint" href="https://auth.example.com/authorize">
    <link rel="token_endpoint" href="https://auth.example.com/token">
  </head>
```
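
To check my understanding of steps 2b and 4, here is a minimal sketch that extracts `<link>` endpoints using only the standard library. The names `LinkRelParser` and `extract_endpoints_from_html` are hypothetical and do not come from ADR-043. It assumes the first matching element in document order wins and that `rel` values are compared case-insensitively.

```python
from html.parser import HTMLParser
from typing import Dict
from urllib.parse import urljoin

WANTED_RELS = {"authorization_endpoint", "token_endpoint"}


class LinkRelParser(HTMLParser):
    """Collect the first <link> href seen for each wanted rel value."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.endpoints: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "link":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if not href:
            return
        # rel is a space-separated token list; compare tokens in lowercase.
        for rel in (attr_map.get("rel") or "").lower().split():
            if rel in WANTED_RELS and rel not in self.endpoints:
                # Resolve relative hrefs against the profile URL.
                self.endpoints[rel] = urljoin(self.base_url, href)


def extract_endpoints_from_html(html: str, base_url: str) -> Dict[str, str]:
    """Return {rel: absolute_url} for the endpoints found in the HTML."""
    parser = LinkRelParser(base_url)
    parser.feed(html)
    return parser.endpoints
```

Calling `extract_endpoints_from_html(page_html, "https://admin.example.com/")` on the example response above would yield both endpoints as absolute URLs. Whether a stdlib parser is robust enough is part of Question 3.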

### 3. Token Verification Flow

Per W3C IndieAuth Section 6, I understand token verification:

```
1. Receive Bearer token in Authorization header
2. Make GET request to token endpoint with Bearer token
3. Token endpoint returns: {me, client_id, scope}
4. Validate 'me' matches expected identity
5. Check required scopes present
```

**Example Verification**:
```
GET https://auth.example.com/token HTTP/1.1
Authorization: Bearer xyz123
Accept: application/json

HTTP/1.1 200 OK
Content-Type: application/json

{
  "me": "https://admin.example.com/",
  "client_id": "https://quill.p3k.io/",
  "scope": "create update delete"
}
```
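
Steps 3-5 of the flow above can be sketched as a pure function over the decoded JSON response. This is a sketch under my assumptions. `TokenValidationError` and `validate_token_response` are hypothetical names, and the trailing-slash comparison mirrors what the current code does rather than a full URL normalization.

```python
from typing import Any, Dict, Iterable


class TokenValidationError(Exception):
    """Hypothetical error type for a token response that fails validation."""


def validate_token_response(
    token_info: Dict[str, Any],
    expected_me: str,
    required_scopes: Iterable[str],
) -> Dict[str, Any]:
    """Check required fields, the 'me' identity, and the granted scopes."""
    for field in ("me", "client_id", "scope"):
        if not token_info.get(field):
            raise TokenValidationError(f"missing field: {field}")

    # Compare identities ignoring a trailing slash (as auth_external.py does today).
    if token_info["me"].rstrip("/") != expected_me.rstrip("/"):
        raise TokenValidationError("token 'me' does not match expected identity")

    # 'scope' is a space-separated string in the token endpoint response.
    granted = set(token_info["scope"].split())
    missing = set(required_scopes) - granted
    if missing:
        raise TokenValidationError(f"missing scopes: {sorted(missing)}")

    return token_info
```

With the example response above, `validate_token_response(body, "https://admin.example.com", ["create"])` returns the body unchanged, while a different `me` or an ungranted scope raises.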

### 4. Security Considerations

I understand the security model from the architect's docs:

- **HTTPS Required**: Profile URLs and endpoints MUST use HTTPS in production
- **Redirect Limits**: Maximum 5 redirects to prevent loops
- **Cache Integrity**: Validate endpoints before caching
- **URL Validation**: Ensure discovered URLs are well-formed
- **Token Hashing**: Hash tokens before caching (SHA-256)

### 5. Implementation Components

I understand these modules need to be created:

1. **`endpoint_discovery.py`**: Discover endpoints from profile URLs
   - HTTP Link header parsing
   - HTML link element extraction
   - URL resolution (relative to absolute)
   - Error handling

2. **Updated `auth_external.py`**: Token verification with discovery
   - Integrate endpoint discovery
   - Cache discovered endpoints
   - Verify tokens with discovered endpoints
   - Validate responses

3. **`endpoint_cache.py`** (or part of auth_external): Caching layer
   - Endpoint caching (TTL: 3600s)
   - Token verification caching (TTL: 300s)
   - Cache invalidation

### 6. Current Broken Code

From `starpunk/auth_external.py` line 49:
```python
token_endpoint = current_app.config.get("TOKEN_ENDPOINT")
```

This hardcoded approach is the problem we're fixing.

---

## Critical Questions for the Architect

### Question 1: The "Which Endpoint?" Problem ⚠️

**The Problem**: When Micropub receives a token, we need to verify it. But **which endpoint do we use to verify it**?

The W3C spec says:
> "GET request to the token endpoint containing an HTTP Authorization header with the Bearer Token according to [[RFC6750]]"

But it doesn't say **how we know which token endpoint to use** when we receive a token from an unknown source.

**Current Micropub Flow**:
```python
# micropub.py line 74
token_info = verify_external_token(token)
```

The token is an opaque string like `"abc123xyz"`. We have no idea:
- Which user it belongs to
- Which provider issued it
- Which endpoint to verify it with

**ADR-043-CORRECTED suggests (lines 204-258)**:
```
4. Option A: If we have cached token info, use cached 'me' URL
5. Option B: Try verification with last known endpoint for similar tokens
6. Option C: Require 'me' parameter in Micropub request
```

**My Questions**:

**1a)** Which option should I implement? The ADR presents three options but doesn't specify which one.

**1b)** For **Option A** (cached token): How does the first request work? We need to verify a token to cache its 'me' URL, but we need the 'me' URL to know which endpoint to verify with. This is circular.

**1c)** For **Option B** (last known endpoint): How do we handle the first token ever received? What is the "last known endpoint" when the cache is empty?

**1d)** For **Option C** (require 'me' parameter): Does this violate the Micropub spec? The W3C Micropub specification doesn't include a 'me' parameter in requests. Is this a StarPunk-specific extension?

**1e)** **Proposed Solution** (awaiting architect approval):

Since StarPunk is a **single-user CMS**, we KNOW the only valid tokens are for `ADMIN_ME`. Therefore:

```python
from typing import Any, Dict, Optional

import httpx
from flask import current_app

# discover_endpoints, normalize_url and TokenVerificationError are the
# helpers proposed elsewhere in this report; they do not exist yet.


def verify_external_token(token: str) -> Optional[Dict[str, Any]]:
    """Verify token for the admin user"""
    admin_me = current_app.config.get("ADMIN_ME")

    # Discover endpoints from ADMIN_ME
    endpoints = discover_endpoints(admin_me)
    token_endpoint = endpoints['token_endpoint']

    # Verify token with discovered endpoint
    response = httpx.get(
        token_endpoint,
        headers={'Authorization': f'Bearer {token}', 'Accept': 'application/json'},
        timeout=3.0,
    )

    # Do not parse an error body as token info
    if response.status_code != 200:
        raise TokenVerificationError("Token endpoint rejected the token")

    token_info = response.json()

    # Validate token belongs to admin
    if normalize_url(token_info['me']) != normalize_url(admin_me):
        raise TokenVerificationError("Token not for admin user")

    return token_info
```

**Is this the correct approach?** This assumes:
- StarPunk only accepts tokens for `ADMIN_ME`
- We always discover from `ADMIN_ME` profile URL
- Multi-user support is explicitly out of scope for V1

Please confirm this is correct or provide the proper approach.

---

### Question 2: Caching Strategy Details

**ADR-043-CORRECTED suggests** (lines 131-160):
- Endpoint cache TTL: 3600s (1 hour)
- Token verification cache TTL: 300s (5 minutes)

**My Questions**:

**2a)** **Cache Key for Endpoints**: Should the cache key be the profile URL (`admin_me`) or should we maintain a global cache?

For single-user StarPunk, we only have one profile URL (`ADMIN_ME`), so a simple cache like:
```python
self.cached_endpoints = None
self.cached_until = 0
```

Would suffice. Is this acceptable, or should I implement a full `profile_url -> endpoints` dict for future multi-user support?

**2b)** **Cache Key for Tokens**: The migration guide (line 259) suggests hashing tokens:
```python
token_hash = hashlib.sha256(token.encode()).hexdigest()
```

But if tokens are opaque and unpredictable, why hash them? Is this:
- To prevent tokens appearing in logs/debug output?
- To prevent tokens being extracted from memory dumps?
- Because cache keys should be fixed-length?

If it's for security, should I also:
- Use a constant-time comparison for token hash lookups?
- Add HMAC with a secret key instead of plain SHA-256?

**2c)** **Cache Invalidation**: When should I clear the cache?
- On application startup? (cache is in-memory, so yes?)
- On configuration changes? (how do I detect these?)
- On token verification failures? (what if it's a network issue, not a provider change?)
- Manual admin endpoint `/admin/clear-cache`? (should I implement this?)

**2d)** **Cache Storage**: The ADR shows in-memory caching. Should I:
- Use a simple dict with tuples: `cache[key] = (value, expiry)`
- Use `functools.lru_cache` decorator?
- Use `cachetools` library for TTL support?
- Implement custom `EndpointCache` class as shown in ADR?

For V1 simplicity, I propose **custom class with simple dict**, but please confirm.
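
To make the proposal in 2d concrete, here is a minimal sketch of the "custom class with simple dict" option, combined with the SHA-256 key from 2b. `TTLCache` and `token_cache_key` are hypothetical names, not the ADR's `EndpointCache`. The injectable `clock` is my own addition so expiry can be tested without sleeping.

```python
import hashlib
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory cache storing (value, expiry) tuples in a plain dict."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._store: Dict[str, Tuple[Any, float]] = {}

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._store[key] = (value, self._clock() + ttl)

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Drop expired entries lazily on lookup.
            del self._store[key]
            return None
        return value

    def clear(self) -> None:
        self._store.clear()


def token_cache_key(token: str) -> str:
    """Hash the token so the raw value is never used as a cache key."""
    return hashlib.sha256(token.encode()).hexdigest()
```

Usage would look like `cache.set(token_cache_key(token), token_info, ttl=300)` for tokens and `cache.set(admin_me, endpoints, ttl=3600)` for endpoints. The sketch is not thread-safe, which ties into Question 8c.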

---

### Question 3: HTML Parsing Implementation

**From `docs/migration/fix-hardcoded-endpoints.md`** lines 139-159:

```python
from typing import Dict
from urllib.parse import urljoin

from bs4 import BeautifulSoup

def _extract_from_html(self, html: str, base_url: str) -> Dict[str, str]:
    endpoints: Dict[str, str] = {}
    soup = BeautifulSoup(html, 'html.parser')

    auth_link = soup.find('link', rel='authorization_endpoint')
    if auth_link and auth_link.get('href'):
        endpoints['authorization_endpoint'] = urljoin(base_url, auth_link['href'])

    return endpoints
```

**My Questions**:

**3a)** **Dependency**: Do we want to add BeautifulSoup4 as a dependency? Current dependencies (from quick check):
- Flask
- httpx
- Other core libs

BeautifulSoup4 is a new dependency. Alternatives:
- Use Python's built-in `html.parser` (more fragile)
- Use regex (bad for HTML, but endpoints are simple)
- Use `lxml` (faster, but C extension dependency)

**Recommendation**: Add BeautifulSoup4 with html.parser backend (pure Python). Confirm?

**3b)** **HTML Validation**: Should I validate HTML before parsing?
- Malformed HTML could cause parsing errors
- Should I catch and handle `ParserError`?
- What if there's no `<head>` section?
- What if `<link>` elements are in `<body>` (technically invalid but might exist)?

**3c)** **Case Sensitivity**: HTML `rel` attributes are case-insensitive per spec. Should I:
```python
soup.find('link', rel='token_endpoint')  # Exact match
# vs
soup.find('link', rel=lambda x: x.lower() == 'token_endpoint' if x else False)
```

As far as I can tell, BeautifulSoup compares attribute *values* case-sensitively (the parser lowercases tag and attribute names, not values), so the exact-match form would miss `rel="Token_Endpoint"`. Should I use the case-insensitive form, or is exact matching acceptable for V1?

---

### Question 4: HTTP Link Header Parsing

**From `docs/migration/fix-hardcoded-endpoints.md`** lines 126-136:

```python
import re

def _parse_link_header(self, header: str, base_url: str) -> Dict[str, str]:
    pattern = r'<([^>]+)>;\s*rel="([^"]+)"'
    matches = re.findall(pattern, header)
```

**My Questions**:

**4a)** **Regex Robustness**: This regex assumes:
- Double quotes around rel value
- Semicolon separator
- No spaces in weird places

But HTTP Link header format (RFC 8288) is more complex:
```
Link: <url>; rel="value"; param="other"
Link: <url>; rel=value            (unquoted value, allowed by the spec)
Link: <url>;rel="value"           (no space after semicolon)
```

Should I:
- Use a more robust regex?
- Use an existing Link header parser (e.g., httpx exposes parsed links as `Response.links`)?
- Stick with simple regex and document limitations?

**Recommendation**: Use httpx's built-in Link header parsing (`Response.links`) if it covers these cases, otherwise simple regex. Confirm?

**4b)** **Multiple Headers**: RFC 8288 allows multiple Link headers:
```
Link: <https://auth.example.com/authorize>; rel="authorization_endpoint"
Link: <https://auth.example.com/token>; rel="token_endpoint"
```

Or comma-separated in single header:
```
Link: <https://auth.example.com/authorize>; rel="authorization_endpoint", <https://auth.example.com/token>; rel="token_endpoint"
```

My regex with `re.findall()` should handle both. Confirm this is correct?
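
For the "more robust regex" option in 4a, this is a sketch of what I have in mind, using only the standard library. `parse_link_header` and `WANTED_RELS` are hypothetical names. It assumes multiple `Link` headers have already been joined with commas, and it handles quoted and unquoted `rel` values, extra parameters, and a missing space after the semicolon. It is not a complete RFC 8288 parser (for example, a `rel=` appearing inside another parameter's quoted value could confuse it).

```python
import re
from typing import Dict
from urllib.parse import urljoin

WANTED_RELS = {"authorization_endpoint", "token_endpoint"}

# One link-value: <target> followed by zero or more ;name[=value] parameters.
_LINK_VALUE = re.compile(
    r'<([^>]*)>((?:\s*;\s*[^\s;,=]+(?:\s*=\s*(?:"[^"]*"|[^\s;,"]+))?)*)'
)
# The rel parameter, either quoted or as a bare token.
_REL_PARAM = re.compile(r';\s*rel\s*=\s*(?:"([^"]*)"|([^\s;,"]+))', re.IGNORECASE)


def parse_link_header(header: str, base_url: str) -> Dict[str, str]:
    """Return {rel: absolute_url} for wanted rels; first occurrence wins."""
    endpoints: Dict[str, str] = {}
    for match in _LINK_VALUE.finditer(header):
        target, params = match.group(1), match.group(2)
        rel_match = _REL_PARAM.search(params)
        if rel_match is None:
            continue
        quoted, bare = rel_match.group(1), rel_match.group(2)
        rel_value = quoted if quoted is not None else bare
        # rel may carry several space-separated relation types.
        for rel in rel_value.lower().split():
            if rel in WANTED_RELS:
                endpoints.setdefault(rel, urljoin(base_url, target.strip()))
    return endpoints
```

If the architect prefers the library route, this function would simply not be needed.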

**4c)** **Priority Order**: ADR says "HTTP Link headers take precedence over HTML". But what if:
- Link header has `authorization_endpoint` but not `token_endpoint`
- HTML has both

Should I:
```python
# Option A: Once we find in Link header, stop looking
if 'token_endpoint' in link_header_endpoints:
    return link_header_endpoints
else:
    check_html()

# Option B: Merge Link header and HTML, Link header wins for conflicts
endpoints = html_endpoints.copy()
endpoints.update(link_header_endpoints)  # Link header overwrites
```

The W3C spec says "first HTTP Link header takes precedence", which suggests **Option B** (merge and overwrite). Confirm?
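
To show what Option B would return in the scenario described above (Link header with only `authorization_endpoint`, HTML with both), here is a small sketch. `merge_endpoints` is a hypothetical helper name.

```python
from typing import Dict


def merge_endpoints(
    link_header_endpoints: Dict[str, str],
    html_endpoints: Dict[str, str],
) -> Dict[str, str]:
    """Option B: start from HTML values, then let Link header values win."""
    merged = dict(html_endpoints)
    merged.update(link_header_endpoints)
    return merged
```

Under this option the result takes `authorization_endpoint` from the Link header and falls back to the HTML value for `token_endpoint`, so a partial Link header never hides an endpoint that only the HTML declares.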

---

### Question 5: URL Resolution and Validation

**From ADR-043-CORRECTED** line 217:

```python
from urllib.parse import urljoin

endpoints['token_endpoint'] = urljoin(profile_url, href)
```

**My Questions**:

**5a)** **URL Validation**: Should I validate discovered URLs? Checks:
- Must be absolute after resolution
- Must use HTTPS (in production)
- Must be valid URL format
- Hostname must be valid
- No localhost/127.0.0.1 in production (allow in dev?)

Example validation:
```python
from urllib.parse import urlparse

# DiscoveryError is the exception type proposed for the discovery module.


def validate_endpoint_url(url: str, is_production: bool) -> bool:
    parsed = urlparse(url)

    # Reject malformed URLs first so the later checks see a real scheme and host
    if not parsed.scheme or not parsed.netloc:
        raise DiscoveryError("Invalid URL format")

    if is_production and parsed.scheme != 'https':
        raise DiscoveryError("HTTPS required in production")

    if is_production and parsed.hostname in ['localhost', '127.0.0.1', '::1']:
        raise DiscoveryError("localhost not allowed in production")

    return True
```

Is this overkill, or necessary? What validation do you want?

**5b)** **URL Normalization**: Should I normalize URLs before comparing?
```python
def normalize_url(url: str) -> str:
    # Add trailing slash?
    # Convert to lowercase?
    # Remove default ports?
    # Sort query params?
    ...
```

The current code does:
```python
# auth_external.py line 96
token_me = token_info["me"].rstrip("/")
expected_me = admin_me.rstrip("/")
```

Should endpoint URLs also be normalized? Or left as-is?
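
One possible answer to 5b, offered as a sketch rather than a decision: lowercase the scheme and host, drop default ports, and treat an empty path as `/`. All of these choices are my assumptions pending the architect's answer. The sketch deliberately leaves query parameters untouched, drops any fragment and userinfo, and does not strip trailing slashes from non-root paths.

```python
from urllib.parse import urlsplit, urlunsplit

_DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url: str) -> str:
    """Normalize scheme/host case, default ports, and an empty path."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    if ":" in host:
        # IPv6 literals need their brackets restored.
        host = f"[{host}]"

    netloc = host
    if parts.port is not None and parts.port != _DEFAULT_PORTS.get(scheme):
        netloc = f"{host}:{parts.port}"

    # A URL with no path is treated as having the path "/".
    path = parts.path or "/"
    return urlunsplit((scheme, netloc, path, parts.query, ""))
```

With this version, `https://Admin.Example.com`, `https://admin.example.com:443/`, and `https://admin.example.com/` all compare equal, which the current `rstrip("/")` comparison does not guarantee.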

**5c)** **Relative URL Edge Cases**: What should happen with these?

```html
<!-- Relative path -->
<link rel="token_endpoint" href="/auth/token">
Result: https://admin.example.com/auth/token

<!-- Protocol-relative -->
<link rel="token_endpoint" href="//other-domain.com/token">
Result: https://other-domain.com/token (if profile was HTTPS)

<!-- No protocol -->
<link rel="token_endpoint" href="other-domain.com/token">
Result: https://admin.example.com/other-domain.com/token (broken!)
```

Python's `urljoin()` handles the first two correctly. The third is ambiguous. Should I:
- Reject URLs without `://` or leading `/`?
- Try to detect and fix common mistakes?
- Document expected format and let it fail?
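
The three cases above can be reproduced directly with `urljoin()`. The sketch below also includes a hypothetical heuristic, `looks_like_missing_scheme`, for the "detect common mistakes" option. It is only an illustration of that option, and it has false positives (a legitimate relative href such as `token.php` also matches), which is an argument for documenting the expected format instead.

```python
from urllib.parse import urljoin

PROFILE_URL = "https://admin.example.com/"


def resolve_endpoint(profile_url: str, href: str) -> str:
    """Resolve an href from the profile page against the profile URL."""
    return urljoin(profile_url, href)


def looks_like_missing_scheme(href: str) -> bool:
    """Heuristic: the first path segment looks like a hostname (contains a dot)."""
    if "://" in href or href.startswith(("/", ".", "?", "#")):
        return False
    first_segment = href.split("/", 1)[0]
    return "." in first_segment


relative = resolve_endpoint(PROFILE_URL, "/auth/token")
protocol_relative = resolve_endpoint(PROFILE_URL, "//other-domain.com/token")
no_protocol = resolve_endpoint(PROFILE_URL, "other-domain.com/token")
```

Here `relative` and `protocol_relative` resolve as expected, while `no_protocol` becomes a path under the profile host, matching the "broken" result shown above.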

---

### Question 6: Error Handling and Retry Logic

**My Questions**:

**6a)** **Discovery Failures**: When endpoint discovery fails, what should happen?

Scenarios:
1. Profile URL unreachable (DNS failure, network timeout)
2. Profile URL returns 404/500
3. Profile HTML malformed (parsing fails)
4. No endpoints found in profile
5. Endpoints found but invalid URLs

For each scenario, should I:
- Return error immediately?
- Retry with backoff?
- Use cached endpoints if available (even if expired)?
- Fail open (allow access) or fail closed (deny access)?

**Recommendation**: Fail closed (deny access), use cached endpoints if available, no retries for discovery (but retries for token verification?). Confirm?

**6b)** **Token Verification Failures**: When token verification fails, what should happen?

Scenarios:
1. Token endpoint unreachable (timeout)
2. Token endpoint returns 400/401/403 (token invalid)
3. Token endpoint returns 500 (server error)
4. Token response missing required fields
5. Token 'me' doesn't match expected

For scenarios 1 and 3 (network/server errors), should I:
- Retry with backoff?
- Use cached token info if available?
- Fail immediately?

**Recommendation**: Retry up to 3 times with exponential backoff for network and server errors (scenarios 1 and 3). For invalid tokens (scenarios 2, 4, and 5), fail immediately. Confirm?
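
A minimal sketch of the retry recommendation, under my assumptions: "up to 3 times" is read as three attempts in total, delays double from a base of 0.5s, and only errors explicitly marked transient are retried. `TransientError` and `retry_with_backoff` are hypothetical names. The `sleep` parameter is injectable so tests do not actually wait.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (timeouts, 5xx responses)."""


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying TransientError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last failure.
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")
```

Any other exception (for example one representing an invalid token) propagates on the first attempt, which matches "fail immediately" for scenarios 2, 4, and 5.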

**6c)** **Timeout Configuration**: What timeouts should I use?

Suggested:
- Profile URL fetch: 5s (discovery is cached, so can be slow)
- Token verification: 3s (happens on every request, must be fast)
- Cache lookup: <1ms (in-memory)

Are these acceptable? Should they be configurable?

---

### Question 7: Testing Strategy

**My Questions**:

**7a)** **Mock vs Real**: Should tests:
- Mock all HTTP requests (faster, isolated)
- Hit real IndieAuth providers (slow, integration test)
- Both (unit tests mock, integration tests real)?

**Recommendation**: Unit tests mock everything, add one integration test for real IndieAuth.com. Confirm?

**7b)** **Test Fixtures**: Should I create test fixtures like:

```python
# tests/fixtures/profiles.py
PROFILE_WITH_LINK_HEADERS = {
    'url': 'https://user.example.com/',
    'headers': {
        'Link': '<https://auth.example.com/token>; rel="token_endpoint"'
    },
    'expected': {'token_endpoint': 'https://auth.example.com/token'}
}

PROFILE_WITH_HTML_LINKS = {
    'url': 'https://user.example.com/',
    'html': '<link rel="token_endpoint" href="https://auth.example.com/token">',
    'expected': {'token_endpoint': 'https://auth.example.com/token'}
}

# ... more fixtures
```

Or inline test data in test functions? Fixtures would be reusable across tests.

**7c)** **Test Coverage**: What coverage % is acceptable? Current test suite has 501 passing tests. I should aim for:
- 100% coverage of new endpoint discovery code?
- Edge cases covered (malformed HTML, network errors, etc.)?
- Integration tests for full flow?

---

### Question 8: Performance Implications

**My Questions**:

**8a)** **First Request Latency**: Without cached endpoints, first Micropub request will:
1. Fetch profile URL (HTTP GET): ~100-500ms
2. Parse HTML/headers: ~10-50ms
3. Verify token with endpoint: ~100-300ms
4. Total: ~210-850ms

Is this acceptable? User will notice delay on first post. Should I:
- Pre-warm cache on application startup?
- Show "Authenticating..." message to user?
- Accept the delay (only happens once per TTL)?

**8b)** **Cache Hit Rate**: With TTL of 3600s for endpoints and 300s for tokens:
- Endpoints discovered once per hour
- Tokens verified every 5 minutes

For active user posting frequently:
- First post: up to ~850ms (discovery + verification)
- Posts within 5 min: <1ms (cached token)
- Posts after 5 min but within 1 hour: ~150ms (cached endpoint, verify token)
- Posts after 1 hour: up to ~850ms again

Is this acceptable? Or should I increase token cache TTL?

**8c)** **Concurrent Requests**: If two Micropub requests arrive simultaneously with uncached token:
- Both will trigger endpoint discovery
- Race condition in cache update

Should I:
- Add locking around cache updates?
- Accept duplicate discoveries (harmless, just wasteful)?
- Use thread-safe cache implementation?

**Recommendation**: For V1 single-user CMS with low traffic, accept duplicates. Add locking in V2+ if needed.
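
For reference, if locking is wanted after all, this is roughly what the "add locking around cache updates" option could look like for the single-profile case. `LockedEndpointCache` is a hypothetical name and `discover` stands in for whatever discovery function we end up with. The lock is held during discovery, so concurrent callers wait for one discovery instead of each starting their own.

```python
import threading
import time
from typing import Callable, Dict, Optional


class LockedEndpointCache:
    """Single-entry endpoint cache whose refresh is guarded by a lock."""

    def __init__(
        self,
        discover: Callable[[str], Dict[str, str]],
        ttl: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._discover = discover
        self._ttl = ttl
        self._clock = clock
        self._lock = threading.Lock()
        self._endpoints: Optional[Dict[str, str]] = None
        self._expires_at = 0.0

    def get(self, profile_url: str) -> Dict[str, str]:
        with self._lock:
            expired = self._clock() >= self._expires_at
            if self._endpoints is None or expired:
                # Only one thread at a time reaches this discovery call.
                self._endpoints = self._discover(profile_url)
                self._expires_at = self._clock() + self._ttl
            return self._endpoints
```

The cost is that all requests block while discovery is in flight, which seems acceptable for a single-user site but is a trade-off rather than a free win.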

---

### Question 9: Configuration and Deployment

**My Questions**:

**9a)** **Configuration Changes**: Current config has:
```ini
# .env (WRONG - to be removed)
TOKEN_ENDPOINT=https://tokens.indieauth.com/token

# .env (CORRECT - to be kept)
ADMIN_ME=https://admin.example.com/
```

Should I:
- Remove `TOKEN_ENDPOINT` from config.py immediately?
- Add deprecation warning if `TOKEN_ENDPOINT` is set?
- Provide migration instructions in CHANGELOG?

**9b)** **Backward Compatibility**: RC.4 was just released with `TOKEN_ENDPOINT` configuration. RC.5 will remove it. Should I:
- Provide migration script?
- Automatic migration (detect and convert)?
- Just document breaking change in CHANGELOG?

Since we're in RC phase, breaking changes are acceptable, but users might be testing. Recommendation?
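
If the deprecation-warning option from 9a is chosen, it could be as small as the sketch below. `warn_if_token_endpoint_set` is a hypothetical helper and the message wording is only a suggestion. Note that Python hides `DeprecationWarning` by default outside `__main__`, so a log message might be more visible to operators.

```python
import warnings
from typing import Mapping


def warn_if_token_endpoint_set(config: Mapping[str, object]) -> bool:
    """Warn when the obsolete TOKEN_ENDPOINT setting is still present."""
    if config.get("TOKEN_ENDPOINT"):
        warnings.warn(
            "TOKEN_ENDPOINT is deprecated and ignored; endpoints are "
            "discovered from the ADMIN_ME profile URL.",
            DeprecationWarning,
            stacklevel=2,
        )
        return True
    return False
```

This could be called once during application startup with the loaded configuration mapping.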

**9c)** **Health Check**: Should the `/health` endpoint also check:
- Endpoint discovery working (fetch ADMIN_ME profile)?
- Token endpoint reachable?

Or is this too expensive for health checks?

---

### Question 10: Development and Testing Workflow

**My Questions**:

**10a)** **Local Development**: Developers typically use `http://localhost:5000` for SITE_URL. But IndieAuth requires HTTPS. How should developers test?

Options:
1. Allow HTTP in development mode (detect DEV_MODE=true)
2. Require ngrok/localhost.run for HTTPS tunneling
3. Use mock endpoints in dev mode
4. Accept that IndieAuth won't work locally without setup

Current `auth_external.py` doesn't have HTTPS check. Should I add it with dev mode exception?

**10b)** **Testing with Real Providers**: To test against real IndieAuth providers, I need:
- A real profile URL with IndieAuth links
- Valid tokens from that provider

Should I:
- Create test profile for integration tests?
- Document how developers can test?
- Skip real provider tests in CI (only run locally)?

---

## Implementation Readiness Assessment

### What's Clear and Ready to Implement

✅ **HTTP Link Header Parsing**: Clear algorithm, standard format
✅ **HTML Link Element Extraction**: Clear approach with BeautifulSoup4
✅ **URL Resolution**: Standard `urljoin()` from urllib.parse
✅ **Basic Caching**: In-memory dict with TTL expiry
✅ **Token Verification HTTP Request**: Standard GET with Bearer token
✅ **Response Validation**: Check for required fields (me, client_id, scope)

### What Needs Architect Clarification

⚠️ **Critical (blocks implementation)**:
- Q1: Which endpoint to verify tokens with (the "chicken-and-egg" problem)
- Q2a: Cache structure for single-user vs future multi-user
- Q3a: Add BeautifulSoup4 dependency?

⚠️ **Important (affects quality)**:
- Q5a: URL validation requirements
- Q6a: Error handling strategy (fail open vs closed)
- Q6b: Retry logic for network failures
- Q9a: Remove TOKEN_ENDPOINT config or deprecate?

⚠️ **Nice to have (can implement sensibly)**:
- Q2c: Cache invalidation triggers
- Q7a: Test strategy (mock vs real)
- Q8a: First request latency acceptable?

---

## Proposed Implementation Plan

Once questions are answered, here's my implementation approach:

### Phase 1: Core Discovery (Days 1-2)
1. Create `endpoint_discovery.py` module
   - `EndpointDiscovery` class
   - HTTP Link header parsing
   - HTML link element extraction
   - URL resolution and validation
   - Error handling

2. Unit tests for discovery
   - Test Link header parsing
   - Test HTML parsing
   - Test URL resolution
   - Test error cases

### Phase 2: Token Verification Update (Day 3)
1. Update `auth_external.py`
   - Integrate endpoint discovery
   - Add caching layer
   - Update `verify_external_token()`
   - Remove hardcoded TOKEN_ENDPOINT usage

2. Unit tests for updated verification
   - Test with discovered endpoints
   - Test caching behavior
   - Test error handling

### Phase 3: Integration and Testing (Day 4)
1. Integration tests
   - Full Micropub request flow
   - Cache behavior across requests
   - Error scenarios

2. Update existing tests
   - Fix any broken tests
   - Update mocks to use discovery

### Phase 4: Configuration and Documentation (Day 5)
1. Update configuration
   - Remove TOKEN_ENDPOINT from config.py
   - Add deprecation warning if still set
   - Update .env.example

2. Update documentation
   - CHANGELOG entry for rc.5
   - Migration guide if needed
   - API documentation

### Phase 5: Manual Testing and Refinement (Day 6)
1. Test with real IndieAuth provider
2. Performance testing (cache effectiveness)
3. Error handling verification
4. Final refinements

**Estimated Total Time**: 5-7 days

---

## Dependencies to Add

Based on migration guide, I'll need to add:

```toml
# pyproject.toml or requirements.txt
beautifulsoup4>=4.12.0  # HTML parsing for link extraction
```

`httpx` is already a dependency (used in current auth_external.py).

---

## Risks and Concerns

### Risk 1: Breaking Change Timing
- **Issue**: RC.4 just shipped with TOKEN_ENDPOINT config
- **Impact**: Users testing RC.4 will need to reconfigure for RC.5
- **Mitigation**: Clear migration notes in CHANGELOG, consider grace period

### Risk 2: Performance Degradation
- **Issue**: First request will be slower (up to ~850ms uncached vs <1ms with a cached token)
- **Impact**: User experience on first post after restart/cache expiry
- **Mitigation**: Document expected behavior, consider pre-warming cache

### Risk 3: External Dependency
- **Issue**: StarPunk now depends on external profile URL availability
- **Impact**: If profile URL is down, Micropub stops working
- **Mitigation**: Cache endpoints for longer TTL, fail gracefully with clear errors

### Risk 4: Testing Complexity
- **Issue**: More moving parts to test (HTTP, HTML parsing, caching)
- **Impact**: More test code, more mocking, more edge cases
- **Mitigation**: Good test fixtures, clear test organization

---

## Recommended Next Steps

1. **Architect reviews this report** and answers questions
2. **I create test fixtures** based on ADR examples
3. **I implement Phase 1** (core discovery) with tests
4. **Checkpoint review** - verify discovery working correctly
5. **I implement Phase 2** (integration with token verification)
6. **Checkpoint review** - verify end-to-end flow
7. **I implement Phases 3-5** (tests, config, docs)
8. **Final review** before merge

---

## Questions Summary (Quick Reference)

**Critical** (must answer before coding):
1. Q1: Which endpoint to verify tokens with? Proposed: Use ADMIN_ME profile for single-user StarPunk
2. Q2a: Cache structure for single-user vs multi-user?
3. Q3a: Add BeautifulSoup4 dependency?

**Important** (affects implementation quality):
4. Q5a: URL validation requirements?
5. Q6a: Error handling strategy (fail open/closed)?
6. Q6b: Retry logic for network failures?
7. Q9a: Remove or deprecate TOKEN_ENDPOINT config?

**Can implement sensibly** (but prefer guidance):
8. Q2c: Cache invalidation triggers?
9. Q7a: Test strategy (mock vs real)?
10. Q8a: First request latency acceptable?

---

## Conclusion

The architect's corrected design is sound and properly implements IndieAuth endpoint discovery per the W3C specification. The primary blocker is clarifying the "which endpoint?" question for token verification in a single-user CMS context.

My proposed solution (always use ADMIN_ME profile for endpoint discovery) seems correct for StarPunk's single-user model, but I need architect confirmation before proceeding.

Once questions are answered, I'm ready to implement with high confidence. The code will be clean, tested, and follow the specifications exactly.

**Status**: ⏸️ **Waiting for Architect Review**

---

**Document Version**: 1.0
**Created**: 2025-11-24
**Author**: StarPunk Fullstack Developer
**Next Review**: After architect responds to questions