
# IndieAuth Endpoint Discovery Security Analysis

## Executive Summary

This document analyzes the security implications of implementing IndieAuth endpoint discovery correctly, contrasting it with the fundamentally flawed approach of hardcoding endpoints.

## The Critical Error: Hardcoded Endpoints

### What Was Wrong

```ini
# FATALLY FLAWED - Breaks IndieAuth completely
TOKEN_ENDPOINT=https://tokens.indieauth.com/token
```

### Why It's a Security Disaster

1. **Single Point of Failure**: If the hardcoded endpoint is compromised, ALL users are affected
2. **No User Control**: Users cannot change providers if security issues arise
3. **Trust Concentration**: Forces all users to trust a single provider
4. **Not IndieAuth**: This isn't IndieAuth at all; it's just OAuth with extra steps
5. **Violates User Sovereignty**: Users don't control their own authentication

## The Correct Approach: Dynamic Discovery

### Security Model

```
User Identity URL → Endpoint Discovery → Provider Verification
     (User Controls)     (Dynamic)        (User's Choice)
```

### Security Benefits

1. **Distributed Trust**: No single provider compromise affects all users
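2. **User Control**: Users can switch providers by updating their profile links, without waiting on StarPunk
3. **Provider Independence**: Each user's security is independent
4. **Fast Revocation**: Users can cut off a provider by changing their profile links; the change takes effect once cached endpoints and verification results expire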
5. **True Decentralization**: No central authority

## Threat Analysis

### Threat 1: Profile URL Hijacking

**Attack Vector**: Attacker gains control of user's profile URL

**Impact**: Can redirect authentication to attacker's endpoints

**Mitigations**:
- Profile URL must use HTTPS
- Verify TLS certificates
- Monitor for unexpected endpoint changes
- Cache endpoints with a reasonable TTL
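Monitoring for unexpected endpoint changes can start with a comparison between the cached endpoints and a fresh discovery result. The sketch below uses a hypothetical helper, `detect_endpoint_changes()`; it only logs, because a change is not necessarily an attack (users switch providers legitimately), and how to respond is a policy decision.

```python
import logging

logger = logging.getLogger(__name__)


def detect_endpoint_changes(profile_url: str, cached: dict, discovered: dict) -> dict:
    """Return {rel: (old_url, new_url)} for every endpoint that changed, logging each.

    Hypothetical helper: `cached` and `discovered` map rel names such as
    'token_endpoint' to URLs. A rel missing on one side shows up as None.
    """
    changes = {
        rel: (cached.get(rel), discovered.get(rel))
        for rel in cached.keys() | discovered.keys()
        if cached.get(rel) != discovered.get(rel)
    }
    for rel, (old, new) in changes.items():
        logger.warning("Endpoint change for %s: %s %s -> %s", profile_url, rel, old, new)
    return changes
```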

### Threat 2: Endpoint Discovery Manipulation

**Attack Vector**: MITM attack during endpoint discovery

**Impact**: Could redirect to malicious endpoints

**Mitigations**:
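```python
from urllib.parse import urlparse

import requests


class SecurityError(Exception):
    """Raised when a discovery step violates a security requirement."""


def discover_endpoints(profile_url: str) -> dict:
    # CRITICAL: Enforce HTTPS (parse the URL rather than matching a prefix)
    if urlparse(profile_url).scheme != 'https':
        raise SecurityError("Profile URL must use HTTPS")

    # Verify TLS certificates
    response = requests.get(
        profile_url,
        verify=True,  # Enforce certificate validation
        timeout=5
    )
    response.raise_for_status()

    # requests follows redirects by default: reject the result if any hop
    # (including the final URL) left HTTPS
    for hop in (*response.history, response):
        if urlparse(hop.url).scheme != 'https':
            raise SecurityError(f"Discovery left HTTPS via redirect: {hop.url}")

    # Validate discovered endpoints. extract_endpoints() parses the Link header
    # and <link> elements; it must resolve relative URLs against response.url
    endpoints = extract_endpoints(response)
    for endpoint_url in endpoints.values():
        if urlparse(endpoint_url).scheme != 'https':
            raise SecurityError(f"Endpoint must use HTTPS: {endpoint_url}")

    return endpoints
```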
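The discovery function relies on `extract_endpoints()`, which this document does not define. Below is a minimal sketch of the parsing it needs, written as a hypothetical `parse_endpoint_links()` over raw inputs so it can be tested offline; `extract_endpoints(response)` could delegate to it with `response.headers.get('Link', '')`, `response.text` and `response.url`. It assumes the HTTP `Link` header takes precedence over HTML `<link>` elements and only handles the two rels this document uses (newer IndieAuth revisions also define a `rel="indieauth-metadata"` document). The `Link` header regex is deliberately simplified.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

ENDPOINT_RELS = ('authorization_endpoint', 'token_endpoint')
# Simplified Link header pattern: handles <url>; rel="a b" and <url>; rel=a,
# but only when rel is the first parameter
LINK_HEADER_RE = re.compile(r'<([^>]*)>\s*;\s*rel="?([^";,]*)"?')


class _LinkElementCollector(HTMLParser):
    """Collects (rel values, href) from <link> elements in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'link' and attrs.get('rel') and attrs.get('href'):
            self.links.append((attrs['rel'].lower().split(), attrs['href']))


def parse_endpoint_links(link_header: str, html: str, base_url: str) -> dict:
    """Return {rel: absolute URL}; Link header entries win over HTML <link>s."""
    candidates = [(rel.lower().split(), href)
                  for href, rel in LINK_HEADER_RE.findall(link_header)]
    collector = _LinkElementCollector()
    collector.feed(html)
    candidates += collector.links

    endpoints = {}
    for rels, href in candidates:
        for rel in rels:
            if rel in ENDPOINT_RELS and rel not in endpoints:
                # Relative hrefs resolve against the (final) profile URL
                endpoints[rel] = urljoin(base_url, href)
    return endpoints
```

The HTTPS check in `discover_endpoints` still applies to whatever this returns: a profile can legitimately publish a relative `href`, so resolution must happen before the scheme is validated.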

### Threat 3: Cache Poisoning

**Attack Vector**: Attacker poisons endpoint cache with malicious URLs

**Impact**: Subsequent requests use attacker's endpoints

**Mitigations**:
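```python
import hashlib
import hmac
import json
import secrets
import time
from urllib.parse import urlparse

# SecurityError: the project's security exception (an Exception subclass)


class SecureEndpointCache:
    def __init__(self, ttl_seconds: int = 3600):
        self.cache = {}
        self.ttl_seconds = ttl_seconds
        # Per-process secret key: unlike a plain hash, an HMAC cannot be
        # recomputed by code that can only tamper with stored entries
        self._key = secrets.token_bytes(32)

    def _calculate_checksum(self, endpoints: dict) -> str:
        payload = json.dumps(endpoints, sort_keys=True).encode()
        return hmac.new(self._key, payload, hashlib.sha256).hexdigest()

    def store_endpoints(self, profile_url: str, endpoints: dict):
        # Validate before caching: the profile and every endpoint must be HTTPS
        for url in (profile_url, *endpoints.values()):
            if urlparse(url).scheme != 'https':
                raise SecurityError(f"Refusing to cache non-HTTPS URL: {url}")
        # Store with timestamp and integrity check
        self.cache[profile_url] = {
            'endpoints': endpoints,
            'stored_at': time.time(),
            'checksum': self._calculate_checksum(endpoints)
        }

    def get_endpoints(self, profile_url: str):
        """Return cached endpoints, or None on a miss or an expired entry."""
        entry = self.cache.get(profile_url)
        if entry is None:
            return None
        # Expire stale entries so endpoint changes on the profile are picked up
        if time.time() - entry['stored_at'] > self.ttl_seconds:
            del self.cache[profile_url]
            return None
        # Verify integrity with a constant-time comparison
        expected = self._calculate_checksum(entry['endpoints'])
        if not hmac.compare_digest(expected, entry['checksum']):
            del self.cache[profile_url]  # Cache corruption detected
            raise SecurityError("Cache integrity check failed")
        return entry['endpoints']
```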

### Threat 4: Redirect Attacks

**Attack Vector**: Malicious redirects during discovery

**Impact**: Could redirect to attacker-controlled endpoints

**Mitigations**:
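```python
from urllib.parse import urljoin, urlparse

import requests

# SecurityError: the project's security exception (an Exception subclass)


def fetch_with_redirect_limit(url: str, max_redirects: int = 5) -> requests.Response:
    redirect_count = 0
    visited = set()

    # "<=" allows exactly max_redirects redirects to be followed
    while redirect_count <= max_redirects:
        if url in visited:
            raise SecurityError("Redirect loop detected")
        visited.add(url)

        response = requests.get(url, allow_redirects=False, verify=True, timeout=5)

        if response.status_code in (301, 302, 303, 307, 308):
            location = response.headers.get('Location')
            if not location:
                raise SecurityError("Redirect without Location header")

            # Resolve relative Location values, then validate the target
            redirect_url = urljoin(url, location)
            if urlparse(redirect_url).scheme != 'https':
                raise SecurityError("Redirect to non-HTTPS URL blocked")

            url = redirect_url
            redirect_count += 1
        else:
            return response

    raise SecurityError("Too many redirects")
```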

### Threat 5: Token Replay Attacks

**Attack Vector**: Intercepted token reused

**Impact**: Unauthorized access

**Mitigations**:
- Always use HTTPS for token transmission
- Implement token expiration
- Cache token verification results only briefly, so revoked tokens stop being accepted quickly
- Use nonce/timestamp validation
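Brief caching and hashed storage (both also on the checklist below) fit together in one small structure. A minimal sketch, assuming a hypothetical `TokenVerificationCache` that stores only the SHA-256 digest of each token and an illustrative 120-second TTL; on a miss, the caller would verify the token with the discovered token endpoint and `put()` the result.

```python
import hashlib
import time
from typing import Optional


class TokenVerificationCache:
    """Briefly caches successful token verifications, keyed by token hash (hypothetical)."""

    def __init__(self, ttl_seconds: float = 120.0):
        self.ttl_seconds = ttl_seconds
        self._entries = {}  # SHA-256 hex digest -> (verification result, expires_at)

    @staticmethod
    def _key(token: str) -> str:
        # Store a hash, never the bearer token itself
        return hashlib.sha256(token.encode()).hexdigest()

    def put(self, token: str, result: dict, now: Optional[float] = None):
        now = time.monotonic() if now is None else now
        self._entries[self._key(token)] = (result, now + self.ttl_seconds)

    def get(self, token: str, now: Optional[float] = None) -> Optional[dict]:
        """Return the cached result, or None if absent or expired."""
        now = time.monotonic() if now is None else now
        key = self._key(token)
        entry = self._entries.get(key)
        if entry is None:
            return None
        result, expires_at = entry
        if now >= expires_at:
            del self._entries[key]  # Expired: force a fresh check with the token endpoint
            return None
        return result
```

The TTL bounds how long a token revoked at the provider can still be accepted, so it trades endpoint load against revocation latency.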

## Security Requirements

### 1. HTTPS Enforcement

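```python
import logging
from urllib.parse import urlparse

logger = logging.getLogger(__name__)

# SecurityError: the project's security exception (an Exception subclass)


class HTTPSEnforcer:
    def __init__(self, development_mode: bool = False):
        self.development_mode = development_mode

    def validate_url(self, url: str, context: str):
        """Enforce HTTPS for all security-critical URLs"""

        parsed = urlparse(url)

        # Development exception (with warning): plain HTTP to loopback only
        if (self.development_mode and parsed.scheme == 'http'
                and parsed.hostname in ('localhost', '127.0.0.1', '::1')):
            logger.warning(f"Allowing HTTP in development for {context}: {url}")
            return

        # Production: HTTPS required
        if parsed.scheme != 'https':
            raise SecurityError(f"HTTPS required for {context}: {url}")
```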

### 2. Certificate Validation

```python
import httpx


def create_secure_http_client():
    """Create HTTP client with proper security settings"""

    return httpx.Client(
        verify=True,  # Always verify TLS certificates
        follow_redirects=False,  # Handle redirects manually
        timeout=httpx.Timeout(
            connect=5.0,
            read=10.0,
            write=10.0,
            pool=10.0
        ),
        limits=httpx.Limits(
            max_connections=100,
            max_keepalive_connections=20
        ),
        headers={
            'User-Agent': 'StarPunk/1.0 (+https://starpunk.example.com/)'
        }
    )
```

### 3. Input Validation

```python
def validate_endpoint_response(response: dict, expected_me: str):
    """Validate token verification response"""

    # Required fields
    if 'me' not in response:
        raise ValidationError("Missing 'me' field in response")

    # URL normalization and comparison
    normalized_me = normalize_url(response['me'])
    normalized_expected = normalize_url(expected_me)

    if normalized_me != normalized_expected:
        raise ValidationError(
            f"Token 'me' mismatch: expected {normalized_expected}, "
            f"got {normalized_me}"
        )

    # Scope validation
    scopes = response.get('scope', '').split()
    if 'create' not in scopes:
        raise ValidationError("Token missing required 'create' scope")

    return True
```
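`normalize_url()` is not defined above. A minimal sketch of such a helper, roughly following IndieAuth's URL canonicalization rules (scheme and host compared case-insensitively, an empty path treated as `/`); dropping the fragment is an added assumption, since profile URLs are not supposed to carry one.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Canonicalize a profile URL for comparison (hypothetical helper)."""
    parts = urlsplit(url.strip())
    return urlunsplit((
        parts.scheme.lower(),  # scheme is case-insensitive
        parts.netloc.lower(),  # so is the host
        parts.path or '/',     # an empty path is treated as "/"
        parts.query,
        '',                    # fragments are dropped
    ))
```

The path keeps its case, so `https://example.com/Blog` and `https://example.com/blog` remain different identities.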

### 4. Rate Limiting

```python
import time
from collections import defaultdict


class RateLimitError(Exception):
    """Raised when discovery for a profile URL exceeds the limit."""


class DiscoveryRateLimiter:
    """Prevent discovery abuse (sliding one-minute window per profile URL)"""

    def __init__(self, max_per_minute: int = 60):
        self.requests = defaultdict(list)
        self.max_per_minute = max_per_minute

    def check_rate_limit(self, profile_url: str):
        now = time.monotonic()  # Unaffected by wall-clock changes
        minute_ago = now - 60

        # Clean old entries
        self.requests[profile_url] = [
            t for t in self.requests[profile_url]
            if t > minute_ago
        ]

        # Check limit
        if len(self.requests[profile_url]) >= self.max_per_minute:
            raise RateLimitError(f"Too many discovery requests for {profile_url}")

        # Record request
        self.requests[profile_url].append(now)
```

## Implementation Checklist

### Discovery Security

- [ ] Enforce HTTPS for profile URLs
- [ ] Validate TLS certificates
- [ ] Limit redirect chains to 5
- [ ] Detect redirect loops
- [ ] Validate discovered endpoint URLs
- [ ] Implement discovery rate limiting
- [ ] Log all discovery attempts
- [ ] Handle timeouts gracefully

### Token Verification Security

- [ ] Use HTTPS for all token endpoints
- [ ] Validate token endpoint responses
- [ ] Check 'me' field matches expected
- [ ] Verify required scopes present
- [ ] Hash tokens before caching
- [ ] Implement cache expiration
- [ ] Use constant-time comparisons
- [ ] Log verification failures
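For "Use constant-time comparisons", a minimal sketch of how a presented secret could be checked against a stored SHA-256 hash; `hash_token()` and `token_matches()` are hypothetical names, and where StarPunk actually performs such local comparisons is an assumption.

```python
import hashlib
import hmac


def hash_token(token: str) -> str:
    """SHA-256 hex digest of a token, stored instead of the token itself."""
    return hashlib.sha256(token.encode()).hexdigest()


def token_matches(presented: str, stored_hash: str) -> bool:
    # compare_digest avoids short-circuiting on the first differing character,
    # so response timing doesn't reveal how much of the value matched
    return hmac.compare_digest(hash_token(presented), stored_hash)
```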

### Cache Security

- [ ] Validate data before caching
- [ ] Implement cache size limits
- [ ] Use TTL for all cache entries
- [ ] Clear cache on configuration changes
- [ ] Protect against cache poisoning
- [ ] Monitor cache hit/miss rates
- [ ] Implement cache integrity checks

### Error Handling

- [ ] Never expose internal error details to users
- [ ] Log security events
- [ ] Rate limit error responses
- [ ] Implement proper timeouts
- [ ] Handle network failures gracefully
- [ ] Provide clear user messages

## Security Testing

### Test Scenarios

1. **HTTPS Downgrade Attack**
   - Try to use HTTP endpoints
   - Verify rejection

2. **Invalid Certificates**
   - Test with self-signed certs
   - Test with expired certs
   - Verify rejection

3. **Redirect Attacks**
   - Test redirect loops
   - Test excessive redirects
   - Test HTTP redirects
   - Verify proper handling

4. **Cache Poisoning**
   - Attempt to inject invalid data
   - Verify cache validation

5. **Token Manipulation**
   - Modify token before verification
   - Test expired tokens
   - Test tokens with wrong 'me'
   - Verify proper rejection
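The redirect and downgrade scenarios can be tested without a network if the fetch step is injectable. The sketch below assumes a hypothetical `resolve_redirects()` that applies the same checks as `fetch_with_redirect_limit()` but takes a `fetch(url) -> (status_code, location)` function; certificate and token scenarios need real TLS fixtures or a token-endpoint stub and are not covered here.

```python
from urllib.parse import urljoin, urlparse


class SecurityError(Exception):
    pass


def resolve_redirects(url, fetch, max_redirects=5):
    """Same checks as fetch_with_redirect_limit(), with fetch(url) injected."""
    visited = set()
    for _ in range(max_redirects + 1):  # at most max_redirects redirects followed
        if url in visited:
            raise SecurityError("Redirect loop detected")
        visited.add(url)
        status, location = fetch(url)
        if status not in (301, 302, 303, 307, 308):
            return url
        if not location:
            raise SecurityError("Redirect without Location header")
        target = urljoin(url, location)
        if urlparse(target).scheme != 'https':
            raise SecurityError("Redirect to non-HTTPS URL blocked")
        url = target
    raise SecurityError("Too many redirects")


def fake_fetch(redirects):
    """URLs in `redirects` answer 302 to the mapped Location; others answer 200."""
    return lambda url: (302, redirects[url]) if url in redirects else (200, None)


def assert_blocked(url, redirects, message):
    try:
        resolve_redirects(url, fake_fetch(redirects))
    except SecurityError as exc:
        assert message in str(exc), str(exc)
    else:
        raise AssertionError(f"expected SecurityError containing {message!r}")


def chain(n):
    """n relative redirects: /0 -> /1 -> ... -> /n"""
    return {f'https://example.com/{i}': f'/{i + 1}' for i in range(n)}


# Redirect loop, excessive redirects, and an HTTPS-to-HTTP downgrade are blocked
assert_blocked('https://a.example/', {'https://a.example/': 'https://b.example/',
                                      'https://b.example/': 'https://a.example/'}, 'loop')
assert_blocked('https://example.com/0', chain(6), 'Too many')
assert_blocked('https://example.com/', {'https://example.com/': 'http://example.com/'},
               'non-HTTPS')
# ...while a chain of exactly five redirects is still followed
assert resolve_redirects('https://example.com/0', fake_fetch(chain(5))) == 'https://example.com/5'
```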

## Monitoring and Alerting

### Security Metrics

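```python
from collections import Counter

# Track these metrics; each Counter tallies events per key, for example
# security_metrics['discovery_failures'][profile_url] += 1
security_metrics = {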
    'discovery_failures': Counter(),
    'https_violations': Counter(),
    'certificate_errors': Counter(),
    'redirect_limit_exceeded': Counter(),
    'cache_poisoning_attempts': Counter(),
    'token_verification_failures': Counter(),
    'rate_limit_violations': Counter()
}
```

### Alert Conditions

- Multiple discovery failures for same profile
- Sudden increase in HTTPS violations
- Certificate validation failures
- Cache poisoning attempts detected
- Unusual token verification patterns
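The first alert condition needs a notion of "multiple failures" over some period. A minimal sketch, assuming a hypothetical `FailureAlerter` with a per-profile sliding window; the threshold and window values are placeholders, not tuned defaults.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class FailureAlerter:
    """Flags a profile URL once it has `threshold` discovery failures within
    `window_seconds` (hypothetical; both values are placeholders)."""

    def __init__(self, threshold: int = 5, window_seconds: float = 300.0):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._failures = defaultdict(deque)  # profile URL -> failure timestamps

    def record_failure(self, profile_url: str, now: Optional[float] = None) -> bool:
        """Record one failure; return True if this profile should trigger an alert."""
        now = time.monotonic() if now is None else now
        failures = self._failures[profile_url]
        failures.append(now)
        # Drop failures that fell out of the window
        while failures and failures[0] <= now - self.window_seconds:
            failures.popleft()
        return len(failures) >= self.threshold
```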

## Incident Response

### If Endpoint Compromise Suspected

1. Clear endpoint cache immediately
2. Force re-discovery of all endpoints
3. Alert affected users
4. Review logs for suspicious patterns
5. Document incident

### If Cache Poisoning Detected

1. Clear entire cache
2. Review cache validation logic
3. Identify attack vector
4. Implement additional validation
5. Monitor for recurrence

## Conclusion

Dynamic endpoint discovery is not only what the IndieAuth specification requires; it is also more secure than hardcoded endpoints. By allowing users to control their authentication infrastructure, we:

1. Eliminate single points of failure
2. Enable immediate provider switching
3. Distribute security responsibility
4. Maintain true decentralization
5. Respect user sovereignty

The complexity of proper implementation is justified by the security and flexibility benefits. This is what IndieAuth is designed to provide, and we must implement it correctly.

---

**Document Version**: 1.0
**Created**: 2024-11-24
**Classification**: Security Architecture
**Review Schedule**: Quarterly