# IndieAuth Endpoint Discovery Architecture

## Overview

This document describes the CORRECT implementation of IndieAuth endpoint discovery for StarPunk. It replaces an earlier design, based on a fundamental misunderstanding, in which endpoints were hardcoded instead of being discovered dynamically.

## Core Principle

**Endpoints are NEVER hardcoded. They are ALWAYS discovered from the user's profile URL.**

## Discovery Process

### Step 1: Profile URL Fetching

When discovering endpoints for a user (e.g., `https://alice.example.com/`), fetch the profile URL:

```
GET / HTTP/1.1
Host: alice.example.com
Accept: text/html
User-Agent: StarPunk/1.0
```

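The request above could be built with the standard library roughly as follows. This is a minimal sketch, not StarPunk's actual code: the function names `build_profile_request` and `fetch_profile` and the 5-second timeout are assumptions made for illustration.

```python
import urllib.request

USER_AGENT = "StarPunk/1.0"  # same value as in the request shown above


def build_profile_request(profile_url: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for a profile URL."""
    return urllib.request.Request(
        profile_url,
        headers={"Accept": "text/html", "User-Agent": USER_AGENT},
        method="GET",
    )


def fetch_profile(profile_url: str, timeout: float = 5.0):
    """Send the request and return (final_url, headers, body_text).

    The timeout is an assumed value; the final URL is returned because
    redirects may change the base for later URL resolution.
    """
    request = build_profile_request(profile_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        body = response.read().decode(charset, errors="replace")
        return response.geturl(), response.headers, body
```

Keeping request construction separate from sending makes the header logic testable without network access.
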
### Step 2: Endpoint Extraction

Check the following sources in priority order:

#### 2.1 HTTP Link Headers (Highest Priority)

```
Link: <https://auth.example.com/authorize>; rel="authorization_endpoint",
      <https://auth.example.com/token>; rel="token_endpoint"
```

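A Link header like the one above could be parsed along these lines. This is a simplified sketch: the regular expressions do not handle commas or semicolons inside quoted parameter values, and the result is keyed by the `rel` value as it appears in the header, which is an assumption about the desired return shape.

```python
import re
from typing import Dict

WANTED_RELS = {"authorization_endpoint", "token_endpoint", "indieauth-metadata"}

# One link-value: <target> followed by its ";param" list.
_LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')
# The rel parameter, either quoted (possibly several rels) or a bare token.
_REL_RE = re.compile(r';\s*rel\s*=\s*(?:"([^"]*)"|([^\s;,"]+))', re.IGNORECASE)


def parse_link_header(header: str) -> Dict[str, str]:
    """Return {rel: target} for the IndieAuth rels found in a Link header."""
    endpoints: Dict[str, str] = {}
    for target, params in _LINK_RE.findall(header):
        match = _REL_RE.search(params)
        if not match:
            continue
        rels = (match.group(1) or match.group(2) or "").lower().split()
        for rel in rels:
            if rel in WANTED_RELS:
                # First occurrence wins if a rel is repeated.
                endpoints.setdefault(rel, target)
    return endpoints
```

The targets are returned unresolved; relative targets still need to be resolved against the profile URL.
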
#### 2.2 HTML Link Elements

```html
<link rel="authorization_endpoint" href="https://auth.example.com/authorize">
<link rel="token_endpoint" href="https://auth.example.com/token">
```

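Extraction from HTML can be sketched with the standard library's `html.parser`. The helper class `_LinkCollector` is hypothetical, and the sketch ignores any `<base href>` element in the page, which a complete implementation would need to consider.

```python
from html.parser import HTMLParser
from typing import Dict
from urllib.parse import urljoin

WANTED_RELS = {"authorization_endpoint", "token_endpoint", "indieauth-metadata"}


class _LinkCollector(HTMLParser):
    """Collects href values of <link> elements with an IndieAuth rel."""

    def __init__(self) -> None:
        super().__init__()
        self.found: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # rel may hold several space-separated values
        for rel in (attributes.get("rel") or "").lower().split():
            if rel in WANTED_RELS:
                self.found.setdefault(rel, href)  # first occurrence wins


def extract_from_html(html: str, base_url: str) -> Dict[str, str]:
    """Return {rel: absolute_url} for IndieAuth link elements in the page."""
    collector = _LinkCollector()
    collector.feed(html)
    return {rel: urljoin(base_url, href) for rel, href in collector.found.items()}
```
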
#### 2.3 IndieAuth Metadata (Optional)

```html
<link rel="indieauth-metadata" href="https://auth.example.com/.well-known/indieauth-metadata">
```

### Step 3: URL Resolution

All discovered URLs must be resolved relative to the profile URL:

- Absolute URL: Use as-is
- Relative URL: Resolve against the profile URL
- Protocol-relative URL: Inherit the profile URL's scheme

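All three cases are covered by `urllib.parse.urljoin` from the standard library. The sketch below assumes a `resolve_url(url, base)` helper with the same argument order as the discovery module's method; the example URLs are illustrative.

```python
from urllib.parse import urljoin


def resolve_url(url: str, base: str) -> str:
    """Resolve a discovered URL against the profile URL it was found on."""
    return urljoin(base, url)


profile = "https://bob.example.org/"

# Absolute URL: used as-is
print(resolve_url("https://auth.example.com/token", profile))
# -> https://auth.example.com/token

# Relative URL: resolved against the profile URL
print(resolve_url("/auth/token", profile))
# -> https://bob.example.org/auth/token

# Protocol-relative URL: inherits the profile URL's scheme
print(resolve_url("//auth.example.com/token", profile))
# -> https://auth.example.com/token
```
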
## Token Verification Architecture

### The Problem

When the Micropub endpoint receives a token, it needs to verify it. But with which token endpoint?

### The Solution

```
┌─────────────────┐
│ Micropub Request│
│ Bearer: xxxxx   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Extract Token   │
└────────┬────────┘
         │
         ▼
┌─────────────────────────┐
│ Determine User Identity │
│ (from token or cache)   │
└────────┬────────────────┘
         │
         ▼
┌──────────────────────┐
│ Discover Endpoints   │
│ from User Profile    │
└────────┬─────────────┘
         │
         ▼
┌──────────────────────┐
│ Verify with          │
│ Discovered Endpoint  │
└────────┬─────────────┘
         │
         ▼
┌──────────────────────┐
│ Validate Response    │
│ - Check 'me' URL     │
│ - Check scopes       │
└──────────────────────┘
```

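The flow in the diagram can be sketched as a single function with the discovery and introspection steps passed in as callables. Everything here is illustrative: `verify_token`, `VerificationError`, and the trailing-slash-only comparison of `me` URLs are assumptions, and the user identity is taken as an argument rather than derived from the token.

```python
from typing import Callable, Dict


class VerificationError(Exception):
    """Hypothetical error type used only in this sketch."""


def _same_identity(a: str, b: str) -> bool:
    # Simplified comparison: ignore only a trailing slash.
    return a.rstrip("/") == b.rstrip("/")


def verify_token(
    token: str,
    profile_url: str,
    required_scope: str,
    discover: Callable[[str], Dict[str, str]],
    introspect: Callable[[str, str], dict],
) -> dict:
    """Discover the token endpoint, verify the token there, validate the reply."""
    # Discover endpoints from the user profile
    endpoints = discover(profile_url)
    token_endpoint = endpoints.get("token_endpoint")
    if not token_endpoint:
        raise VerificationError("no token endpoint found for user")

    # Verify with the discovered endpoint
    info = introspect(token, token_endpoint)

    # Validate response: check 'me' URL, then scopes
    if not _same_identity(info.get("me", ""), profile_url):
        raise VerificationError("token 'me' does not match expected identity")
    if required_scope not in info.get("scope", "").split():
        raise VerificationError("token lacks required scope")
    return info
```

Passing `discover` and `introspect` as parameters keeps the flow testable with fakes and leaves room for a caching wrapper around either step.
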
## Implementation Components

### 1. Endpoint Discovery Module

```python
from typing import Dict


class EndpointDiscovery:
    """
    Discovers IndieAuth endpoints from profile URLs
    """

    def discover(self, profile_url: str) -> Dict[str, str]:
        """
        Discover endpoints from a profile URL

        Returns:
            {
                'authorization_endpoint': 'https://...',
                'token_endpoint': 'https://...',
                'indieauth_metadata': 'https://...'  # optional
            }
        """

    def parse_link_header(self, header: str) -> Dict[str, str]:
        """Parse HTTP Link header for endpoints"""

    def extract_from_html(self, html: str, base_url: str) -> Dict[str, str]:
        """Extract endpoints from HTML link elements"""

    def resolve_url(self, url: str, base: str) -> str:
        """Resolve potentially relative URL against base"""
```

### 2. Token Verification Module

```python
from typing import Optional


class TokenVerifier:
    """
    Verifies tokens using discovered endpoints
    """

    def __init__(self, discovery: EndpointDiscovery, cache: EndpointCache):
        self.discovery = discovery
        self.cache = cache

    def verify(self, token: str, expected_me: Optional[str] = None) -> TokenInfo:
        """
        Verify a token using endpoint discovery

        Args:
            token: The bearer token to verify
            expected_me: Optional expected 'me' URL

        Returns:
            TokenInfo with 'me', 'scope', 'client_id', etc.
        """

    def introspect_token(self, token: str, endpoint: str) -> dict:
        """Call token endpoint to verify token"""
```

### 3. Caching Layer

```python
from typing import Dict, Optional


class EndpointCache:
    """
    Caches discovered endpoints for performance
    """

    def __init__(self, ttl: int = 3600):
        self.endpoint_cache = {}  # profile_url -> (endpoints, expiry)
        self.token_cache = {}     # token_hash -> (info, expiry)
        self.ttl = ttl

    def get_endpoints(self, profile_url: str) -> Optional[Dict[str, str]]:
        """Get cached endpoints if still valid"""

    def store_endpoints(self, profile_url: str, endpoints: Dict[str, str]):
        """Cache discovered endpoints"""

    def get_token_info(self, token_hash: str) -> Optional[TokenInfo]:
        """Get cached token verification if still valid"""

    def store_token_info(self, token_hash: str, info: TokenInfo):
        """Cache token verification result"""
```

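The expiry behaviour behind both caches can be illustrated with a small generic store. This is a sketch under assumptions: the `TTLCache` class and its injectable `clock` parameter are hypothetical, and it is neither thread-safe nor size-limited.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Minimal expiring key/value store mapping key -> (value, expiry)."""

    def __init__(self, ttl: int = 3600, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so tests can control time
        self._items: Dict[str, Tuple[object, float]] = {}

    def get(self, key: str) -> Optional[object]:
        """Return the cached value, or None if missing or expired."""
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expiry = entry
        if self.clock() >= expiry:
            del self._items[key]  # expired: drop the entry and report a miss
            return None
        return value

    def store(self, key: str, value: object) -> None:
        """Cache a value until now + ttl."""
        self._items[key] = (value, self.clock() + self.ttl)
```

An `EndpointCache` along the lines above could hold two such stores, one keyed by profile URL and one keyed by token hash, each with its own TTL.
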
## Error Handling

### Discovery Failures

| Error | Cause | Response |
|-------|-------|----------|
| ProfileUnreachableError | Can't fetch profile URL | 503 Service Unavailable |
| NoEndpointsFoundError | No endpoints in profile | 400 Bad Request |
| InvalidEndpointError | Malformed endpoint URL | 500 Internal Server Error |
| TimeoutError | Discovery timeout | 504 Gateway Timeout |

### Verification Failures

| Error | Cause | Response |
|-------|-------|----------|
| TokenInvalidError | Token rejected by endpoint | 403 Forbidden |
| EndpointUnreachableError | Can't reach token endpoint | 503 Service Unavailable |
| ScopeMismatchError | Token lacks required scope | 403 Forbidden |
| MeMismatchError | Token 'me' doesn't match expected | 403 Forbidden |

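One way to apply the two tables is a lookup from error class name to HTTP status. The `status_for` helper and the fallback to 500 for unlisted errors are assumptions of this sketch; the status values themselves come from the tables.

```python
from typing import Dict, Tuple

# Error class name -> (status code, reason phrase), as listed in the tables
HTTP_STATUS: Dict[str, Tuple[int, str]] = {
    "ProfileUnreachableError": (503, "Service Unavailable"),
    "NoEndpointsFoundError": (400, "Bad Request"),
    "InvalidEndpointError": (500, "Internal Server Error"),
    "TimeoutError": (504, "Gateway Timeout"),
    "TokenInvalidError": (403, "Forbidden"),
    "EndpointUnreachableError": (503, "Service Unavailable"),
    "ScopeMismatchError": (403, "Forbidden"),
    "MeMismatchError": (403, "Forbidden"),
}


def status_for(error: Exception) -> Tuple[int, str]:
    """Look up the HTTP response for an error; unknown errors map to 500 (assumed)."""
    return HTTP_STATUS.get(type(error).__name__, (500, "Internal Server Error"))
```
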
## Security Considerations

### 1. HTTPS Enforcement

- Profile URLs SHOULD use HTTPS
- Discovered endpoints MUST use HTTPS
- Reject non-HTTPS endpoints in production

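The endpoint rule can be sketched as a small validator. The function name `validate_endpoint` and the `production` flag (allowing plain HTTP only outside production, e.g. for local development) are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def validate_endpoint(url: str, production: bool = True) -> str:
    """Return the URL if it is acceptable, otherwise raise ValueError."""
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError(f"endpoint URL has no host: {url!r}")
    if parts.scheme == "https":
        return url
    if parts.scheme == "http" and not production:
        # Assumed relaxation for development setups only
        return url
    raise ValueError(f"endpoint must use HTTPS: {url!r}")
```
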
### 2. Redirect Limits

- Maximum 5 redirects when fetching profiles
- Prevent redirect loops
- Log suspicious redirect patterns

### 3. Cache Poisoning Prevention

- Validate discovered URLs are well-formed
- Don't cache error responses
- Clear cache on configuration changes

### 4. Token Security

- Never log tokens in plaintext
- Hash tokens before caching
- Use constant-time comparison for token hashes

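These three rules map onto `hashlib` and `hmac` from the standard library. The sketch assumes SHA-256 as the hash; the helper names and the `sha256:` log label format are hypothetical.

```python
import hashlib
import hmac


def hash_token(token: str) -> str:
    """Hash a bearer token so the raw value never enters the cache."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def hashes_match(a: str, b: str) -> bool:
    """Compare two token hashes in constant time."""
    return hmac.compare_digest(a, b)


def redact(token: str) -> str:
    """Short label derived from the hash, usable in log lines instead of the token."""
    return "sha256:" + hash_token(token)[:8]
```
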
## Performance Optimization

### Caching Strategy

```
┌─────────────────────────────────────┐
│ First Request                       │
│ Discovery: ~500ms                   │
│ Verification: ~200ms                │
│ Total: ~700ms                       │
└─────────────────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────┐
│ Subsequent Requests                 │
│ Cached Endpoints: ~1ms              │
│ Cached Token: ~1ms                  │
│ Total: ~2ms                         │
└─────────────────────────────────────┘
```

### Cache Configuration

```ini
# Endpoint cache (user rarely changes provider)
ENDPOINT_CACHE_TTL=3600  # 1 hour

# Token cache (balance security and performance)
TOKEN_CACHE_TTL=300  # 5 minutes

# Cache sizes
MAX_ENDPOINT_CACHE_SIZE=1000
MAX_TOKEN_CACHE_SIZE=10000
```

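If these settings are supplied as environment variables, they could be loaded as shown below. This is a sketch only: the `CacheConfig` dataclass and `load_cache_config` function are hypothetical, and the defaults simply repeat the values listed above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class CacheConfig:
    endpoint_cache_ttl: int = 3600       # seconds
    token_cache_ttl: int = 300           # seconds
    max_endpoint_cache_size: int = 1000  # entries
    max_token_cache_size: int = 10000    # entries


def load_cache_config(env: Mapping[str, str] = os.environ) -> CacheConfig:
    """Read cache settings from a mapping, falling back to the defaults."""

    def read_int(name: str, default: int) -> int:
        value = int(env.get(name, default))
        if value <= 0:
            raise ValueError(f"{name} must be a positive integer, got {value}")
        return value

    return CacheConfig(
        endpoint_cache_ttl=read_int("ENDPOINT_CACHE_TTL", 3600),
        token_cache_ttl=read_int("TOKEN_CACHE_TTL", 300),
        max_endpoint_cache_size=read_int("MAX_ENDPOINT_CACHE_SIZE", 1000),
        max_token_cache_size=read_int("MAX_TOKEN_CACHE_SIZE", 10000),
    )
```
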
## Migration Path

### From Incorrect Hardcoded Implementation

1. Remove hardcoded endpoint configuration
2. Implement discovery module
3. Update token verification to use discovery
4. Add caching layer
5. Update documentation

### Configuration Changes

Before (WRONG):

```ini
TOKEN_ENDPOINT=https://tokens.indieauth.com/token
AUTHORIZATION_ENDPOINT=https://indieauth.com/auth
```

After (CORRECT):

```ini
ADMIN_ME=https://admin.example.com/
# Endpoints discovered automatically from ADMIN_ME
```

## Testing Strategy

### Unit Tests

1. **Discovery Tests**
   - Parse various Link header formats
   - Extract from different HTML structures
   - Handle malformed responses
   - URL resolution edge cases

2. **Cache Tests**
   - TTL expiration
   - Cache invalidation
   - Size limits
   - Concurrent access

3. **Security Tests**
   - HTTPS enforcement
   - Redirect limit enforcement
   - Cache poisoning attempts

### Integration Tests

1. **Real Provider Tests**
   - Test against indieauth.com
   - Test against indie-auth.com
   - Test against self-hosted providers

2. **Network Condition Tests**
   - Slow responses
   - Timeouts
   - Connection failures
   - Partial responses

### End-to-End Tests

1. **Full Flow Tests**
   - Discovery → Verification → Caching
   - Multiple users with different providers
   - Provider switching scenarios

## Monitoring and Debugging

### Metrics to Track

- Discovery success/failure rate
- Average discovery latency
- Cache hit ratio
- Token verification latency
- Endpoint availability

### Debug Logging

```
# Discovery
DEBUG: Fetching profile URL: https://alice.example.com/
DEBUG: Found Link header: <https://auth.alice.net/token>; rel="token_endpoint"
DEBUG: Discovered token endpoint: https://auth.alice.net/token

# Verification
DEBUG: Verifying token for claimed identity: https://alice.example.com/
DEBUG: Using cached endpoint: https://auth.alice.net/token
DEBUG: Token verification successful, scopes: ['create', 'update']

# Caching
DEBUG: Caching endpoints for https://alice.example.com/ (TTL: 3600s)
DEBUG: Token verification cached (TTL: 300s)
```

## Common Issues and Solutions

### Issue 1: No Endpoints Found

**Symptom**: "No token endpoint found for user"

**Causes**:
- User hasn't set up IndieAuth on their profile
- Profile URL returns wrong Content-Type
- Link elements have typos

**Solution**:
- Provide clear error message
- Link to IndieAuth setup documentation
- Log details for debugging

### Issue 2: Verification Timeouts

**Symptom**: "Authorization server is unreachable"

**Causes**:
- Auth server is down
- Network issues
- Firewall blocking requests

**Solution**:
- Implement retries with backoff
- Cache successful verifications
- Provide status page for auth server health

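Retries with backoff can be sketched as a small wrapper around any network call. The helper name, the default of 3 attempts, the 0.5-second base delay, and the choice to retry on `OSError` (which covers connection failures and socket timeouts) are all assumptions for illustration.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
) -> T:
    """Run call(), retrying on the given errors with exponentially growing delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so tests can record the delays instead of waiting.
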
### Issue 3: Cache Invalidation

**Symptom**: User changed provider but old one still used

**Causes**:
- Endpoints still cached
- TTL too long

**Solution**:
- Provide manual cache clear option
- Reduce TTL if needed
- Clear cache on errors

## Appendix: Example Discoveries

### Example 1: IndieAuth.com User

```html
<!-- https://user.example.com/ -->
<link rel="authorization_endpoint" href="https://indieauth.com/auth">
<link rel="token_endpoint" href="https://tokens.indieauth.com/token">
```

### Example 2: Self-Hosted

```html
<!-- https://alice.example.com/ -->
<link rel="authorization_endpoint" href="https://alice.example.com/auth">
<link rel="token_endpoint" href="https://alice.example.com/token">
```

### Example 3: Link Headers

```
HTTP/1.1 200 OK
Link: <https://auth.provider.com/authorize>; rel="authorization_endpoint",
      <https://auth.provider.com/token>; rel="token_endpoint"
Content-Type: text/html

<!-- No link elements needed in HTML -->
```

### Example 4: Relative URLs

```html
<!-- https://bob.example.org/ -->
<link rel="authorization_endpoint" href="/auth/authorize">
<link rel="token_endpoint" href="/auth/token">
<!-- Resolves to https://bob.example.org/auth/authorize -->
<!-- Resolves to https://bob.example.org/auth/token -->
```

---

**Document Version**: 1.0
**Created**: 2024-11-24
**Purpose**: Correct implementation of IndieAuth endpoint discovery
**Status**: Authoritative guide for implementation