feat(tags): Add database schema and tags module (v1.3.0 Phase 1)
Implements tag/category system backend following microformats2 p-category specification.

Database changes:
- Migration 008: Add tags and note_tags tables
- Normalized tag storage (case-insensitive lookup, display name preserved)
- Indexes for performance

New module:
- starpunk/tags.py: Tag management functions
  - normalize_tag: Normalize tag strings
  - get_or_create_tag: Get or create tag records
  - add_tags_to_note: Associate tags with notes (replaces existing)
  - get_note_tags: Retrieve note tags (alphabetically ordered)
  - get_tag_by_name: Lookup tag by normalized name
  - get_notes_by_tag: Get all notes with specific tag
  - parse_tag_input: Parse comma-separated tag input

Model updates:
- Note.tags property (lazy-loaded, prefer pre-loading in routes)
- Note.to_dict() add include_tags parameter

CRUD updates:
- create_note() accepts tags parameter
- update_note() accepts tags parameter (None = no change, [] = remove all)

Micropub integration:
- Pass tags to create_note() (tags already extracted by extract_tags())
- Return tags in q=source response

Per design doc: docs/design/v1.3.0/microformats-tags-design.md

Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
397
docs/design/v1.0.0/indieauth-endpoint-discovery-security.md
Normal file
@@ -0,0 +1,397 @@
# IndieAuth Endpoint Discovery Security Analysis

## Executive Summary

This document analyzes the security implications of implementing IndieAuth endpoint discovery correctly, contrasting it with the fundamentally flawed approach of hardcoding endpoints.

## The Critical Error: Hardcoded Endpoints

### What Was Wrong

```ini
# FATALLY FLAWED - Breaks IndieAuth completely
TOKEN_ENDPOINT=https://tokens.indieauth.com/token
```

### Why It's a Security Disaster

1. **Single Point of Failure**: If the hardcoded endpoint is compromised, ALL users are affected
2. **No User Control**: Users cannot change providers if security issues arise
3. **Trust Concentration**: Forces all users to trust a single provider
4. **Not IndieAuth**: This isn't IndieAuth at all - it's just OAuth with extra steps
5. **Violates User Sovereignty**: Users don't control their own authentication

## The Correct Approach: Dynamic Discovery

### Security Model

```
User Identity URL → Endpoint Discovery → Provider Verification
 (User Controls)        (Dynamic)          (User's Choice)
```

### Security Benefits

1. **Distributed Trust**: No single provider compromise affects all users
2. **User Control**: Users can switch providers whenever they need to
3. **Provider Independence**: Each user's security is independent
4. **Fast Revocation**: Users can revoke a provider by changing the links on their profile page; the change takes effect as soon as any cached endpoints expire
5. **True Decentralization**: No central authority

## Threat Analysis

### Threat 1: Profile URL Hijacking

**Attack Vector**: Attacker gains control of user's profile URL

**Impact**: Can redirect authentication to attacker's endpoints

**Mitigations**:
- Profile URL must use HTTPS
- Verify SSL certificates
- Monitor for unexpected endpoint changes
- Cache endpoints with reasonable TTL
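
The last two mitigations can be combined: an expiring cache entry forces periodic re-discovery, and comparing each fresh result with the previously seen one flags unexpected changes. The following is a minimal sketch under stated assumptions. The class `EndpointWatcher`, its method names, the one-hour default TTL, and the injectable `clock` are all hypothetical and not part of StarPunk.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class EndpointWatcher:
    """TTL cache that also reports endpoint changes (hypothetical helper)."""

    def __init__(self, ttl_seconds: float = 3600,
                 clock: Callable[[], float] = time.time):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        # profile_url -> (stored_at, endpoints); entries expire after the TTL
        self._cache: Dict[str, Tuple[float, Dict[str, str]]] = {}
        # profile_url -> endpoints; kept past expiry so changes can be detected
        self._last_seen: Dict[str, Dict[str, str]] = {}

    def get(self, profile_url: str) -> Optional[Dict[str, str]]:
        """Return cached endpoints, or None if missing or older than the TTL."""
        entry = self._cache.get(profile_url)
        if entry is None:
            return None
        stored_at, endpoints = entry
        if self.clock() - stored_at > self.ttl_seconds:
            del self._cache[profile_url]
            return None
        return endpoints

    def record(self, profile_url: str, endpoints: Dict[str, str]) -> bool:
        """Store fresh endpoints; return True if they differ from the last known set."""
        previous = self._last_seen.get(profile_url)
        changed = previous is not None and previous != endpoints
        self._last_seen[profile_url] = dict(endpoints)
        self._cache[profile_url] = (self.clock(), dict(endpoints))
        return changed
```

A caller would log or alert when `record()` returns `True`. Whether a change should also block authentication until a human confirms it is a policy decision this sketch does not make.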

### Threat 2: Endpoint Discovery Manipulation

**Attack Vector**: MITM attack during endpoint discovery

**Impact**: Could redirect to malicious endpoints

**Mitigations**:
```python
import requests


class SecurityError(Exception):
    """Raised when a security check fails."""


def discover_endpoints(profile_url: str) -> dict:
    # CRITICAL: Enforce HTTPS
    if not profile_url.startswith('https://'):
        raise SecurityError("Profile URL must use HTTPS")

    # Verify SSL certificates
    response = requests.get(
        profile_url,
        verify=True,  # Enforce certificate validation
        timeout=5
    )

    # Validate discovered endpoints
    # (extract_endpoints is defined elsewhere and returns a {rel: url} dict)
    endpoints = extract_endpoints(response)
    for endpoint_url in endpoints.values():
        if not endpoint_url.startswith('https://'):
            raise SecurityError(f"Endpoint must use HTTPS: {endpoint_url}")

    return endpoints
```
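
The document does not show `extract_endpoints`. As an illustration of the HTML half of discovery, the sketch below collects `<link rel="authorization_endpoint">` and `<link rel="token_endpoint">` elements with the standard-library parser. The function name `extract_endpoints_from_html` and its `(html, base_url)` signature are assumptions made for this example. HTTP `Link` headers and metadata-endpoint discovery are deliberately left out.

```python
from html.parser import HTMLParser
from typing import Dict
from urllib.parse import urljoin

# rel values this sketch looks for
WANTED_RELS = {"authorization_endpoint", "token_endpoint"}


class _LinkRelParser(HTMLParser):
    """Collects the first <link> href for each wanted rel value."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.endpoints: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href is None:
            return
        # A rel attribute may carry several space-separated values
        for rel in (attr_map.get("rel") or "").split():
            if rel in WANTED_RELS and rel not in self.endpoints:
                # First occurrence wins; relative hrefs resolve against the profile URL
                self.endpoints[rel] = urljoin(self.base_url, href)


def extract_endpoints_from_html(html: str, base_url: str) -> Dict[str, str]:
    """Hypothetical helper: return {rel: absolute_url} found in the HTML."""
    parser = _LinkRelParser(base_url)
    parser.feed(html)
    return parser.endpoints
```

Because relative `href` values are resolved against the profile URL before being returned, the HTTPS check in `discover_endpoints` always sees absolute URLs.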

### Threat 3: Cache Poisoning

**Attack Vector**: Attacker poisons endpoint cache with malicious URLs

**Impact**: Subsequent requests use attacker's endpoints

**Mitigations**:
```python
import time
from typing import Optional


class SecureEndpointCache:
    # _validate_profile_url, _validate_endpoints and _calculate_checksum
    # are helper methods assumed to be defined elsewhere on this class.

    def __init__(self):
        self.cache = {}

    def store_endpoints(self, profile_url: str, endpoints: dict):
        # Validate before caching
        self._validate_profile_url(profile_url)
        self._validate_endpoints(endpoints)

        # Store with integrity check
        cache_entry = {
            'endpoints': endpoints,
            'stored_at': time.time(),
            'checksum': self._calculate_checksum(endpoints)
        }
        self.cache[profile_url] = cache_entry

    def get_endpoints(self, profile_url: str) -> Optional[dict]:
        entry = self.cache.get(profile_url)
        if entry:
            # Verify integrity
            if self._calculate_checksum(entry['endpoints']) != entry['checksum']:
                # Cache corruption detected
                del self.cache[profile_url]
                raise SecurityError("Cache integrity check failed")
            return entry['endpoints']
        return None  # Cache miss
```
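
The cache above relies on a `_calculate_checksum` helper that the document does not show. One possible implementation, sketched here as an assumption, is a keyed HMAC-SHA256 over a canonical JSON encoding. A plain hash stored next to the data can be recomputed by anyone able to write to the cache, whereas a keyed digest also requires the secret. The function names and the `key` parameter are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Dict


def calculate_checksum(endpoints: Dict[str, str], key: bytes) -> str:
    """Keyed digest over a canonical (sorted, compact) JSON encoding."""
    canonical = json.dumps(
        endpoints, sort_keys=True, separators=(",", ":")
    ).encode("utf-8")
    return hmac.new(key, canonical, hashlib.sha256).hexdigest()


def checksum_matches(endpoints: Dict[str, str], key: bytes, expected: str) -> bool:
    """Compare digests in constant time to avoid leaking how many characters match."""
    return hmac.compare_digest(calculate_checksum(endpoints, key), expected)
```

Sorting the keys makes the digest independent of dictionary insertion order, so two equal endpoint sets always produce the same checksum.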

### Threat 4: Redirect Attacks

**Attack Vector**: Malicious redirects during discovery

**Impact**: Could redirect to attacker-controlled endpoints

**Mitigations**:
```python
from urllib.parse import urljoin

import requests


def fetch_with_redirect_limit(url: str, max_redirects: int = 5):
    visited = set()

    # One initial request plus at most max_redirects follow-ups
    for _ in range(max_redirects + 1):
        if url in visited:
            raise SecurityError("Redirect loop detected")
        visited.add(url)

        response = requests.get(url, allow_redirects=False, timeout=5)

        if response.status_code not in (301, 302, 303, 307, 308):
            return response

        location = response.headers.get('Location')
        if not location:
            raise SecurityError("Redirect without Location header")

        # Resolve relative redirects against the current URL
        redirect_url = urljoin(url, location)

        # Validate redirect target
        if not redirect_url.startswith('https://'):
            raise SecurityError("Redirect to non-HTTPS URL blocked")

        url = redirect_url

    raise SecurityError("Too many redirects")
```

### Threat 5: Token Replay Attacks

**Attack Vector**: Intercepted token reused

**Impact**: Unauthorized access

**Mitigations**:
- Always use HTTPS for token transmission
- Implement token expiration
- Cache token verification results briefly
- Use nonce/timestamp validation
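
Brief caching of verification results can be sketched as follows. The cache is keyed by a SHA-256 hash of the token, so raw tokens are never held in the cache, and entries expire after a short TTL. The class `TokenVerificationCache`, its method names, the five-minute default, and the injectable `clock` are assumptions for illustration only.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple


class TokenVerificationCache:
    """Short-lived cache of token verification results (hypothetical helper)."""

    def __init__(self, ttl_seconds: float = 300,
                 clock: Callable[[], float] = time.time):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        # token hash -> (expires_at, verification result)
        self._entries: Dict[str, Tuple[float, dict]] = {}

    @staticmethod
    def _key(token: str) -> str:
        # Store only a hash so a cache dump does not reveal usable tokens
        return hashlib.sha256(token.encode("utf-8")).hexdigest()

    def store(self, token: str, token_info: dict) -> None:
        self._entries[self._key(token)] = (self.clock() + self.ttl_seconds, token_info)

    def lookup(self, token: str) -> Optional[dict]:
        """Return the cached result, or None if absent or expired."""
        key = self._key(token)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, token_info = entry
        if self.clock() >= expires_at:
            del self._entries[key]
            return None
        return token_info
```

The TTL bounds how long a token revoked at the provider can still be accepted locally, which is the reason to keep it short.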

## Security Requirements

### 1. HTTPS Enforcement

```python
import logging
from urllib.parse import urlparse

logger = logging.getLogger(__name__)


class HTTPSEnforcer:
    def __init__(self, development_mode: bool = False):
        self.development_mode = development_mode

    def validate_url(self, url: str, context: str):
        """Enforce HTTPS for all security-critical URLs"""

        parsed = urlparse(url)

        # Development exception (with warning)
        if self.development_mode and parsed.hostname in ['localhost', '127.0.0.1']:
            logger.warning(f"Allowing HTTP in development for {context}: {url}")
            return

        # Production: HTTPS required
        # (SecurityError is the exception class used throughout this document)
        if parsed.scheme != 'https':
            raise SecurityError(f"HTTPS required for {context}: {url}")
```

### 2. Certificate Validation

```python
import httpx


def create_secure_http_client():
    """Create HTTP client with proper security settings"""

    return httpx.Client(
        verify=True,  # Always verify SSL certificates
        follow_redirects=False,  # Handle redirects manually
        timeout=httpx.Timeout(
            connect=5.0,
            read=10.0,
            write=10.0,
            pool=10.0
        ),
        limits=httpx.Limits(
            max_connections=100,
            max_keepalive_connections=20
        ),
        headers={
            'User-Agent': 'StarPunk/1.0 (+https://starpunk.example.com/)'
        }
    )
```

### 3. Input Validation

```python
# ValidationError and normalize_url are assumed to be defined elsewhere.
def validate_endpoint_response(response: dict, expected_me: str):
    """Validate token verification response"""

    # Required fields
    if 'me' not in response:
        raise ValidationError("Missing 'me' field in response")

    # URL normalization and comparison
    normalized_me = normalize_url(response['me'])
    normalized_expected = normalize_url(expected_me)

    if normalized_me != normalized_expected:
        raise ValidationError(
            f"Token 'me' mismatch: expected {normalized_expected}, "
            f"got {normalized_me}"
        )

    # Scope validation (treat a missing or null scope as empty)
    scopes = (response.get('scope') or '').split()
    if 'create' not in scopes:
        raise ValidationError("Token missing required 'create' scope")

    return True
```
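
The comparison above depends on `normalize_url`, which the document does not define. A minimal sketch of one plausible normalization follows: lowercase the scheme and host, drop the default port, use `/` for an empty path, and discard the fragment. These rules are an assumption about what the project intends. Userinfo is dropped and IPv6 literal hosts are not handled.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url: str) -> str:
    """Hypothetical normalizer for comparing profile URLs."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    if not scheme or not host:
        raise ValueError(f"Not an absolute URL: {url!r}")

    # Keep the port only when it is not the scheme's default
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(scheme):
        netloc = host
    else:
        netloc = f"{host}:{port}"

    # An empty path and "/" identify the same resource
    path = parts.path or "/"

    # The fragment never reaches the server, so it is dropped
    return urlunsplit((scheme, netloc, path, parts.query, ""))
```

Path case is preserved on purpose, because paths are case-sensitive on most servers and lowering them could make two different profiles compare equal.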

### 4. Rate Limiting

```python
import time
from collections import defaultdict


class RateLimitError(Exception):
    """Raised when a profile exceeds its discovery rate limit."""


class DiscoveryRateLimiter:
    """Prevent discovery abuse"""

    def __init__(self, max_per_minute: int = 60):
        self.requests = defaultdict(list)
        self.max_per_minute = max_per_minute

    def check_rate_limit(self, profile_url: str):
        now = time.time()
        minute_ago = now - 60

        # Clean old entries
        self.requests[profile_url] = [
            t for t in self.requests[profile_url]
            if t > minute_ago
        ]

        # Check limit
        if len(self.requests[profile_url]) >= self.max_per_minute:
            raise RateLimitError(f"Too many discovery requests for {profile_url}")

        # Record request
        self.requests[profile_url].append(now)
```

## Implementation Checklist

### Discovery Security

- [ ] Enforce HTTPS for profile URLs
- [ ] Validate SSL certificates
- [ ] Limit redirect chains to 5
- [ ] Detect redirect loops
- [ ] Validate discovered endpoint URLs
- [ ] Implement discovery rate limiting
- [ ] Log all discovery attempts
- [ ] Handle timeouts gracefully

### Token Verification Security

- [ ] Use HTTPS for all token endpoints
- [ ] Validate token endpoint responses
- [ ] Check 'me' field matches expected
- [ ] Verify required scopes present
- [ ] Hash tokens before caching
- [ ] Implement cache expiration
- [ ] Use constant-time comparisons
- [ ] Log verification failures

### Cache Security

- [ ] Validate data before caching
- [ ] Implement cache size limits
- [ ] Use TTL for all cache entries
- [ ] Clear cache on configuration changes
- [ ] Protect against cache poisoning
- [ ] Monitor cache hit/miss rates
- [ ] Implement cache integrity checks

### Error Handling

- [ ] Never expose internal errors
- [ ] Log security events
- [ ] Rate limit error responses
- [ ] Implement proper timeouts
- [ ] Handle network failures gracefully
- [ ] Provide clear user messages

## Security Testing

### Test Scenarios

1. **HTTPS Downgrade Attack**
   - Try to use HTTP endpoints
   - Verify rejection

2. **Invalid Certificates**
   - Test with self-signed certs
   - Test with expired certs
   - Verify rejection

3. **Redirect Attacks**
   - Test redirect loops
   - Test excessive redirects
   - Test HTTP redirects
   - Verify proper handling

4. **Cache Poisoning**
   - Attempt to inject invalid data
   - Verify cache validation

5. **Token Manipulation**
   - Modify token before verification
   - Test expired tokens
   - Test tokens with wrong 'me'
   - Verify proper rejection
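
As an illustration of how the first scenario might be turned into automated tests, here is a small `unittest` sketch. To stay self-contained it tests a stand-in validator, `require_https`, defined in the block. The stand-in and the test names are hypothetical, and a real suite would import the project's own validation code instead.

```python
import unittest
from urllib.parse import urlparse


class SecurityError(Exception):
    """Stand-in for the project's security exception."""


def require_https(url: str) -> None:
    """Stand-in validator so the tests below are self-contained."""
    if urlparse(url).scheme != "https":
        raise SecurityError(f"HTTPS required: {url}")


class HTTPSDowngradeTests(unittest.TestCase):
    def test_http_endpoint_rejected(self):
        with self.assertRaises(SecurityError):
            require_https("http://tokens.example/token")

    def test_https_endpoint_accepted(self):
        # Must not raise
        require_https("https://tokens.example/token")

    def test_scheme_lookalike_rejected(self):
        # A scheme that merely starts with "https" is not HTTPS
        with self.assertRaises(SecurityError):
            require_https("httpsx://tokens.example/token")
```

The certificate and redirect scenarios need a local test server or a mocked HTTP layer, so they are not sketched here.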

## Monitoring and Alerting

### Security Metrics

```python
# collections.Counter is assumed here; substitute your metrics library's counter type
from collections import Counter

# Track these metrics
security_metrics = {
    'discovery_failures': Counter(),
    'https_violations': Counter(),
    'certificate_errors': Counter(),
    'redirect_limit_exceeded': Counter(),
    'cache_poisoning_attempts': Counter(),
    'token_verification_failures': Counter(),
    'rate_limit_violations': Counter()
}
```

### Alert Conditions

- Multiple discovery failures for same profile
- Sudden increase in HTTPS violations
- Certificate validation failures
- Cache poisoning attempts detected
- Unusual token verification patterns
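
The first alert condition, multiple discovery failures for the same profile, can be expressed as a sliding-window threshold. The sketch below is one way to do that. The class name `FailureAlert`, the threshold of three failures, and the five-minute window are illustrative assumptions, not values taken from this document.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class FailureAlert:
    """Signals when one profile fails discovery too often (hypothetical helper)."""

    def __init__(self, threshold: int = 3, window_seconds: float = 300,
                 clock: Callable[[], float] = time.time):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.clock = clock
        # profile_url -> timestamps of recent failures, oldest first
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def record_failure(self, profile_url: str) -> bool:
        """Record a failure; return True if the alert threshold is reached."""
        now = self.clock()
        events = self._events[profile_url]
        events.append(now)
        # Drop failures that have fallen out of the window
        while events and events[0] <= now - self.window_seconds:
            events.popleft()
        return len(events) >= self.threshold
```

Failures are tracked per profile, so one misconfigured site cannot trigger alerts for others.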

## Incident Response

### If Endpoint Compromise Suspected

1. Clear endpoint cache immediately
2. Force re-discovery of all endpoints
3. Alert affected users
4. Review logs for suspicious patterns
5. Document incident

### If Cache Poisoning Detected

1. Clear entire cache
2. Review cache validation logic
3. Identify attack vector
4. Implement additional validation
5. Monitor for recurrence

## Conclusion

Dynamic endpoint discovery is not just correct according to the IndieAuth specification - it's also more secure than hardcoded endpoints. By allowing users to control their authentication infrastructure, we:

1. Eliminate single points of failure
2. Enable immediate provider switching
3. Distribute security responsibility
4. Maintain true decentralization
5. Respect user sovereignty

The complexity of proper implementation is justified by the security and flexibility benefits. This is what IndieAuth is designed to provide, and we must implement it correctly.

---

**Document Version**: 1.0
**Created**: 2024-11-24
**Classification**: Security Architecture
**Review Schedule**: Quarterly