# IndieAuth Endpoint Discovery Architecture
## Overview
This document specifies the CORRECT implementation of IndieAuth endpoint discovery for StarPunk. It replaces an earlier design, based on a fundamental misunderstanding, in which endpoints were hardcoded instead of being discovered dynamically.
## Core Principle
**Endpoints are NEVER hardcoded. They are ALWAYS discovered from the user's profile URL.**
## Discovery Process
### Step 1: Profile URL Fetching
When discovering endpoints for a user (e.g., `https://alice.example.com/`):
```
GET https://alice.example.com/ HTTP/1.1
Accept: text/html
User-Agent: StarPunk/1.0
```
### Step 2: Endpoint Extraction
Check in priority order:
#### 2.1 HTTP Link Headers (Highest Priority)
```
Link: <https://auth.example.com/authorize>; rel="authorization_endpoint",
      <https://auth.example.com/token>; rel="token_endpoint"
```
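
As a rough illustration of how such a header might be parsed, the sketch below pulls the IndieAuth relations out of a `Link` header value using only the standard library. The function name mirrors `parse_link_header` from the module outline in this document, but the regular expressions and the `WANTED_RELS` set are assumptions made for this example. Quoted parameter values that contain commas or semicolons are not handled.

```python
import re
from typing import Dict

# Assumption: a simplified grammar, one <target> followed by ;param pairs.
_LINK_RE = re.compile(r'<([^>]*)>\s*((?:;\s*[^;,]+)*)')
_REL_RE = re.compile(r'rel\s*=\s*"?([^";,]+)"?', re.IGNORECASE)

# Relation types this sketch cares about (hypothetical constant).
WANTED_RELS = {"authorization_endpoint", "token_endpoint", "indieauth-metadata"}


def parse_link_header(header: str) -> Dict[str, str]:
    """Return {rel: target} for the IndieAuth rels found in a Link header."""
    found: Dict[str, str] = {}
    for target, params in _LINK_RE.findall(header):
        rel_match = _REL_RE.search(params)
        if not rel_match:
            continue
        # A single rel value may list several space-separated relation types.
        for rel in rel_match.group(1).split():
            rel = rel.lower()
            if rel in WANTED_RELS and rel not in found:
                found[rel] = target  # the first occurrence wins
    return found
```

The returned targets are still unresolved, so a caller would resolve them against the profile URL before use.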
#### 2.2 HTML Link Elements
```html
<link rel="authorization_endpoint" href="https://auth.example.com/authorize">
<link rel="token_endpoint" href="https://auth.example.com/token">
```
#### 2.3 IndieAuth Metadata (Optional)
```html
<link rel="indieauth-metadata" href="https://auth.example.com/.well-known/indieauth-metadata">
```
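
One possible way to extract these `<link>` elements is the standard library's `html.parser`, sketched below. The helper class `_LinkRelParser` is a hypothetical name, and the sketch assumes the first occurrence of each relation should win. It does not honor a `<base>` element.

```python
from html.parser import HTMLParser
from typing import Dict
from urllib.parse import urljoin


class _LinkRelParser(HTMLParser):
    """Collects the first href seen for each rel value on <link> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.links: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href is None:
            return
        # rel may hold several space-separated relation types.
        for rel in (attr.get("rel") or "").lower().split():
            self.links.setdefault(rel, href)


def extract_from_html(html: str, base_url: str) -> Dict[str, str]:
    """Return resolved IndieAuth endpoint URLs found in the HTML."""
    parser = _LinkRelParser()
    parser.feed(html)
    wanted = ("authorization_endpoint", "token_endpoint", "indieauth-metadata")
    return {
        rel: urljoin(base_url, parser.links[rel])
        for rel in wanted
        if rel in parser.links
    }
```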
### Step 3: URL Resolution
Every discovered URL must be resolved against the profile URL:
- Absolute URL: use as-is
- Relative URL: resolve against the profile URL
- Protocol-relative URL (`//host/path`): inherit the profile URL's scheme
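
The three cases above map onto `urllib.parse.urljoin` from the standard library. The following is a minimal sketch, not the definitive StarPunk implementation. Rejecting results that are not absolute HTTP(S) URLs is an assumption added for this example.

```python
from urllib.parse import urljoin, urlparse


def resolve_url(url: str, base: str) -> str:
    """Resolve a possibly relative endpoint URL against the profile URL."""
    resolved = urljoin(base, url)
    parts = urlparse(resolved)
    # Assumption: only absolute http/https URLs are usable as endpoints.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"Not an absolute HTTP(S) URL: {resolved!r}")
    return resolved


# Absolute, relative, and protocol-relative inputs:
print(resolve_url("https://auth.example.com/token", "https://alice.example.com/"))
print(resolve_url("/auth/token", "https://bob.example.org/"))
print(resolve_url("//auth.example.com/token", "https://alice.example.com/"))
```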
## Token Verification Architecture
### The Problem
When the Micropub endpoint receives a token, it must verify it. But against which token endpoint?
### The Solution
```
┌─────────────────┐
│ Micropub Request│
│ Bearer: xxxxx   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Extract Token   │
└────────┬────────┘
         │
         ▼
┌─────────────────────────┐
│ Determine User Identity │
│ (from token or cache)   │
└────────┬────────────────┘
         │
         ▼
┌──────────────────────┐
│ Discover Endpoints   │
│ from User Profile    │
└────────┬─────────────┘
         │
         ▼
┌──────────────────────┐
│ Verify with          │
│ Discovered Endpoint  │
└────────┬─────────────┘
         │
         ▼
┌──────────────────────┐
│ Validate Response    │
│ - Check 'me' URL     │
│ - Check scopes       │
└──────────────────────┘
```
## Implementation Components
### 1. Endpoint Discovery Module
```python
from typing import Dict


class EndpointDiscovery:
    """
    Discovers IndieAuth endpoints from profile URLs
    """

    def discover(self, profile_url: str) -> Dict[str, str]:
        """
        Discover endpoints from a profile URL

        Returns:
            {
                'authorization_endpoint': 'https://...',
                'token_endpoint': 'https://...',
                'indieauth_metadata': 'https://...'  # optional
            }
        """

    def parse_link_header(self, header: str) -> Dict[str, str]:
        """Parse HTTP Link header for endpoints"""

    def extract_from_html(self, html: str, base_url: str) -> Dict[str, str]:
        """Extract endpoints from HTML link elements"""

    def resolve_url(self, url: str, base: str) -> str:
        """Resolve potentially relative URL against base"""
```
### 2. Token Verification Module
```python
from __future__ import annotations  # EndpointDiscovery, EndpointCache, TokenInfo are defined elsewhere

from typing import Optional


class TokenVerifier:
    """
    Verifies tokens using discovered endpoints
    """

    def __init__(self, discovery: EndpointDiscovery, cache: EndpointCache):
        self.discovery = discovery
        self.cache = cache

    def verify(self, token: str, expected_me: Optional[str] = None) -> TokenInfo:
        """
        Verify a token using endpoint discovery

        Args:
            token: The bearer token to verify
            expected_me: Optional expected 'me' URL

        Returns:
            TokenInfo with 'me', 'scope', 'client_id', etc.
        """

    def introspect_token(self, token: str, endpoint: str) -> dict:
        """Call token endpoint to verify token"""
```
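
To make the flow concrete, here is a hedged sketch of the verify step as a plain function. The `discover` and `introspect` callables are injected, so no network access is needed. The exception names come from the error tables in this document. The `TokenInfo` fields, the `_normalize` helper, and the assumption that the token endpoint's JSON response carries `me`, `client_id`, and a space-separated `scope` string are choices made for this example only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List
from urllib.parse import urlparse


class TokenInvalidError(Exception): ...
class MeMismatchError(Exception): ...
class ScopeMismatchError(Exception): ...


@dataclass
class TokenInfo:
    me: str
    scope: List[str]
    client_id: str


def _normalize(url: str) -> str:
    """Lowercase scheme and host, and treat an empty path as '/'."""
    p = urlparse(url)
    return f"{p.scheme.lower()}://{p.netloc.lower()}{p.path or '/'}"


def verify_token(
    token: str,
    expected_me: str,
    required_scope: str,
    discover: Callable[[str], Dict[str, str]],
    introspect: Callable[[str, str], dict],
) -> TokenInfo:
    # Discover the endpoint from the profile, never from configuration.
    endpoints = discover(expected_me)
    response = introspect(token, endpoints["token_endpoint"])
    if "me" not in response:
        raise TokenInvalidError("token endpoint did not return 'me'")
    if _normalize(response["me"]) != _normalize(expected_me):
        raise MeMismatchError(response["me"])
    scopes = response.get("scope", "").split()
    if required_scope not in scopes:
        raise ScopeMismatchError(required_scope)
    return TokenInfo(response["me"], scopes, response.get("client_id", ""))
```

Injecting the two callables keeps the validation logic testable without a live authorization server.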
### 3. Caching Layer
```python
from __future__ import annotations  # TokenInfo is defined elsewhere

from typing import Dict, Optional


class EndpointCache:
    """
    Caches discovered endpoints for performance
    """

    def __init__(self, ttl: int = 3600):
        self.endpoint_cache = {}  # profile_url -> (endpoints, expiry)
        self.token_cache = {}  # token_hash -> (info, expiry)
        self.ttl = ttl

    def get_endpoints(self, profile_url: str) -> Optional[Dict[str, str]]:
        """Get cached endpoints if still valid"""

    def store_endpoints(self, profile_url: str, endpoints: Dict[str, str]):
        """Cache discovered endpoints"""

    def get_token_info(self, token_hash: str) -> Optional[TokenInfo]:
        """Get cached token verification if still valid"""

    def store_token_info(self, token_hash: str, info: TokenInfo):
        """Cache token verification result"""
```
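
The expiry logic behind both caches could look something like the sketch below. `TTLCache` is a hypothetical helper, not part of the outline above, and the injectable `clock` parameter is an assumption added so that expiry can be tested without sleeping.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Stores values with an expiry time and drops them once they expire."""

    def __init__(self, ttl: float = 3600, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._items: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expiry)

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expiry = entry
        if self._clock() >= expiry:
            # Expired: evict so stale endpoints are never returned.
            del self._items[key]
            return None
        return value

    def store(self, key: str, value: Any) -> None:
        self._items[key] = (value, self._clock() + self._ttl)

    def clear(self) -> None:
        self._items.clear()
```

An `EndpointCache` could hold two such instances, one keyed by profile URL and one keyed by token hash, each with its own TTL.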
## Error Handling
### Discovery Failures
| Error | Cause | Response |
|-------|-------|----------|
| ProfileUnreachableError | Can't fetch profile URL | 503 Service Unavailable |
| NoEndpointsFoundError | No endpoints in profile | 400 Bad Request |
| InvalidEndpointError | Malformed endpoint URL | 500 Internal Server Error |
| TimeoutError | Discovery timeout | 504 Gateway Timeout |
### Verification Failures
| Error | Cause | Response |
|-------|-------|----------|
| TokenInvalidError | Token rejected by endpoint | 403 Forbidden |
| EndpointUnreachableError | Can't reach token endpoint | 503 Service Unavailable |
| ScopeMismatchError | Token lacks required scope | 403 Forbidden |
| MeMismatchError | Token 'me' doesn't match expected | 403 Forbidden |
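
One way to keep the two tables above in a single place is a lookup from exception class to HTTP status, sketched below. The exception names and status codes are taken from the tables. `STATUS_BY_ERROR`, `status_for`, and the fallback of 500 for unknown errors are assumptions for this example. `TimeoutError` is Python's built-in exception.

```python
class ProfileUnreachableError(Exception): ...
class NoEndpointsFoundError(Exception): ...
class InvalidEndpointError(Exception): ...
class TokenInvalidError(Exception): ...
class EndpointUnreachableError(Exception): ...
class ScopeMismatchError(Exception): ...
class MeMismatchError(Exception): ...


# Exception class -> HTTP status, mirroring the tables above.
STATUS_BY_ERROR = {
    ProfileUnreachableError: 503,
    NoEndpointsFoundError: 400,
    InvalidEndpointError: 500,
    TimeoutError: 504,
    TokenInvalidError: 403,
    EndpointUnreachableError: 503,
    ScopeMismatchError: 403,
    MeMismatchError: 403,
}


def status_for(exc: Exception) -> int:
    """Return the HTTP status for an error, defaulting to 500."""
    for error_class, status in STATUS_BY_ERROR.items():
        if isinstance(exc, error_class):
            return status
    return 500
```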
## Security Considerations
### 1. HTTPS Enforcement
- Profile URLs SHOULD use HTTPS
- Discovered endpoints MUST use HTTPS
- Reject non-HTTPS endpoints in production
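
A check along these lines might be applied to each discovered endpoint before it is used or cached. The `allow_insecure_localhost` flag is a hypothetical escape hatch for development, not something this document specifies.

```python
from urllib.parse import urlparse

_LOCAL_HOSTS = ("localhost", "127.0.0.1", "::1")


def require_https(endpoint: str, allow_insecure_localhost: bool = False) -> str:
    """Return the endpoint unchanged if acceptable, otherwise raise ValueError."""
    parts = urlparse(endpoint)
    if parts.scheme == "https" and parts.hostname:
        return endpoint
    # Assumption: plain HTTP is tolerated only for local development hosts.
    if (
        allow_insecure_localhost
        and parts.scheme == "http"
        and parts.hostname in _LOCAL_HOSTS
    ):
        return endpoint
    raise ValueError(f"Endpoint must use HTTPS: {endpoint!r}")
```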
### 2. Redirect Limits
- Maximum 5 redirects when fetching profiles
- Prevent redirect loops
- Log suspicious redirect patterns
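
The redirect rules above can be sketched independently of any HTTP library by injecting a `fetch` callable that returns a status code and an optional `Location` value. The function name, the callable's shape, and the use of `RuntimeError` are all assumptions for this example.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin

MAX_REDIRECTS = 5
_REDIRECT_CODES = (301, 302, 303, 307, 308)


def follow_redirects(
    url: str,
    fetch: Callable[[str], Tuple[int, Optional[str]]],
    max_redirects: int = MAX_REDIRECTS,
) -> str:
    """Follow redirects manually and return the final URL."""
    seen = {url}
    for _ in range(max_redirects + 1):
        status, location = fetch(url)
        if status not in _REDIRECT_CODES or not location:
            return url
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
        if url in seen:
            raise RuntimeError(f"Redirect loop detected at {url}")
        seen.add(url)
    raise RuntimeError(f"More than {max_redirects} redirects")
```

With a real HTTP client, automatic redirect following would need to be disabled so that this loop sees each hop.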
### 3. Cache Poisoning Prevention
- Validate discovered URLs are well-formed
- Don't cache error responses
- Clear cache on configuration changes
### 4. Token Security
- Never log tokens in plaintext
- Hash tokens before caching
- Use constant-time comparison for token hashes
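
These three rules can be met with the standard library alone. The sketch below uses SHA-256 for the cache key and `hmac.compare_digest` for comparison. The helper names, and the choice to log a short hash prefix instead of the token, are assumptions for this example.

```python
import hashlib
import hmac


def hash_token(token: str) -> str:
    """Return a hex SHA-256 digest suitable for use as a cache key."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def hashes_match(a: str, b: str) -> bool:
    """Compare two hex digests in constant time."""
    return hmac.compare_digest(a, b)


def redact(token: str) -> str:
    """Return a log-safe label that never contains the token itself."""
    return f"<token sha256:{hash_token(token)[:8]}>"
```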
## Performance Optimization
### Caching Strategy
```
┌─────────────────────────────────────┐
│ First Request                       │
│ Discovery: ~500ms                   │
│ Verification: ~200ms                │
│ Total: ~700ms                       │
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│ Subsequent Requests                 │
│ Cached Endpoints: ~1ms              │
│ Cached Token: ~1ms                  │
│ Total: ~2ms                         │
└─────────────────────────────────────┘
```
### Cache Configuration
```ini
# Endpoint cache (user rarely changes provider)
ENDPOINT_CACHE_TTL=3600 # 1 hour
# Token cache (balance security and performance)
TOKEN_CACHE_TTL=300 # 5 minutes
# Cache sizes
MAX_ENDPOINT_CACHE_SIZE=1000
MAX_TOKEN_CACHE_SIZE=10000
```
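
A loader for these settings might read them from the environment and fall back to the defaults shown above. This is only a sketch: `DEFAULTS`, `load_cache_config`, and the rule that every value must be a positive integer are assumptions, not confirmed StarPunk behavior.

```python
import os
from typing import Dict, Mapping

# Defaults copied from the configuration block above.
DEFAULTS: Dict[str, int] = {
    "ENDPOINT_CACHE_TTL": 3600,
    "TOKEN_CACHE_TTL": 300,
    "MAX_ENDPOINT_CACHE_SIZE": 1000,
    "MAX_TOKEN_CACHE_SIZE": 10000,
}


def load_cache_config(env: Mapping[str, str] = os.environ) -> Dict[str, int]:
    """Read cache settings from a mapping, applying defaults for missing keys."""
    config: Dict[str, int] = {}
    for name, default in DEFAULTS.items():
        raw = env.get(name)
        value = default if raw is None else int(raw)
        if value <= 0:
            raise ValueError(f"{name} must be a positive integer, got {value}")
        config[name] = value
    return config
```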
## Migration Path
### From Incorrect Hardcoded Implementation
1. Remove hardcoded endpoint configuration
2. Implement discovery module
3. Update token verification to use discovery
4. Add caching layer
5. Update documentation
### Configuration Changes
Before (WRONG):
```ini
TOKEN_ENDPOINT=https://tokens.indieauth.com/token
AUTHORIZATION_ENDPOINT=https://indieauth.com/auth
```
After (CORRECT):
```ini
ADMIN_ME=https://admin.example.com/
# Endpoints discovered automatically from ADMIN_ME
```
## Testing Strategy
### Unit Tests
1. **Discovery Tests**
   - Parse various Link header formats
   - Extract from different HTML structures
   - Handle malformed responses
   - URL resolution edge cases
2. **Cache Tests**
   - TTL expiration
   - Cache invalidation
   - Size limits
   - Concurrent access
3. **Security Tests**
   - HTTPS enforcement
   - Redirect limit enforcement
   - Cache poisoning attempts
### Integration Tests
1. **Real Provider Tests**
   - Test against indieauth.com
   - Test against indie-auth.com
   - Test against self-hosted providers
2. **Network Condition Tests**
   - Slow responses
   - Timeouts
   - Connection failures
   - Partial responses
### End-to-End Tests
1. **Full Flow Tests**
   - Discovery → Verification → Caching
   - Multiple users with different providers
   - Provider switching scenarios
## Monitoring and Debugging
### Metrics to Track
- Discovery success/failure rate
- Average discovery latency
- Cache hit ratio
- Token verification latency
- Endpoint availability
### Debug Logging
```
# Discovery
DEBUG: Fetching profile URL: https://alice.example.com/
DEBUG: Found Link header: <https://auth.alice.net/token>; rel="token_endpoint"
DEBUG: Discovered token endpoint: https://auth.alice.net/token
# Verification
DEBUG: Verifying token for claimed identity: https://alice.example.com/
DEBUG: Using cached endpoint: https://auth.alice.net/token
DEBUG: Token verification successful, scopes: ['create', 'update']
# Caching
DEBUG: Caching endpoints for https://alice.example.com/ (TTL: 3600s)
DEBUG: Token verification cached (TTL: 300s)
```
## Common Issues and Solutions
### Issue 1: No Endpoints Found
**Symptom**: "No token endpoint found for user"

**Causes**:
- User hasn't set up IndieAuth on their profile
- Profile URL returns wrong Content-Type
- Link elements have typos

**Solution**:
- Provide clear error message
- Link to IndieAuth setup documentation
- Log details for debugging
### Issue 2: Verification Timeouts
**Symptom**: "Authorization server is unreachable"

**Causes**:
- Auth server is down
- Network issues
- Firewall blocking requests

**Solution**:
- Implement retries with backoff
- Cache successful verifications
- Provide status page for auth server health
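
Retries with backoff could be as small as the sketch below, which doubles the delay after each failed attempt. The function name, the default attempt count and delay, and the choice to retry only on `ConnectionError` and `TimeoutError` are assumptions for this example. The `sleep` parameter is injectable so the behavior can be tested without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of attempts, surface the last error
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```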
### Issue 3: Cache Invalidation
**Symptom**: User changed provider but old one still used

**Causes**:
- Endpoints still cached
- TTL too long

**Solution**:
- Provide manual cache clear option
- Reduce TTL if needed
- Clear cache on errors
## Appendix: Example Discoveries
### Example 1: IndieAuth.com User
```html
<!-- https://user.example.com/ -->
<link rel="authorization_endpoint" href="https://indieauth.com/auth">
<link rel="token_endpoint" href="https://tokens.indieauth.com/token">
```
### Example 2: Self-Hosted
```html
<!-- https://alice.example.com/ -->
<link rel="authorization_endpoint" href="https://alice.example.com/auth">
<link rel="token_endpoint" href="https://alice.example.com/token">
```
### Example 3: Link Headers
```
HTTP/1.1 200 OK
Link: <https://auth.provider.com/authorize>; rel="authorization_endpoint",
      <https://auth.provider.com/token>; rel="token_endpoint"
Content-Type: text/html

<!-- No link elements needed in HTML -->
```
### Example 4: Relative URLs
```html
<!-- https://bob.example.org/ -->
<link rel="authorization_endpoint" href="/auth/authorize">
<link rel="token_endpoint" href="/auth/token">
<!-- Resolves to https://bob.example.org/auth/authorize -->
<!-- Resolves to https://bob.example.org/auth/token -->
```
---
**Document Version**: 1.0
**Created**: 2024-11-24
**Purpose**: Correct implementation of IndieAuth endpoint discovery
**Status**: Authoritative guide for implementation