# Security Architecture

## Security Philosophy

Gondulf follows a defense-in-depth security model with these core principles:

1. **Secure by Default**: Security features enabled out of the box
2. **Fail Securely**: Errors default to denying access, not granting it
3. **Least Privilege**: Collect and store minimum necessary data
4. **Transparency**: Security decisions documented and auditable
5. **Standards Compliance**: Follow OAuth 2.0 and IndieAuth security best practices

## Threat Model

### Assets to Protect

**Primary Assets**:
- User domain identities (the `me` parameter)
- Access tokens (prove user identity to clients)
- Authorization codes (short-lived, exchange for tokens)

**Secondary Assets**:
- Email verification codes (prove email ownership)
- Domain verification status (cached TXT record checks)
- Client metadata (cached application information)

**Explicitly NOT Protected** (by design):
- Passwords (none stored)
- Personal user data beyond domain (privacy principle)
- Client secrets (OAuth 2.0 public clients)

### Threat Actors

**External Attackers**:
- Phishing attempts (fake clients)
- Token theft (network interception)
- Open redirect exploitation
- CSRF attacks
- Brute force attacks (code guessing)

**Compromised Clients**:
- Malicious client applications
- Client impersonation
- Redirect URI manipulation

**System Compromise**:
- Database access (SQLite file theft)
- Server memory access (in-memory code theft)
- Log file access (token exposure)

### Out of Scope (v1.0.0)

- DDoS attacks (handled by infrastructure)
- Zero-day vulnerabilities in dependencies
- Physical access to server
- Social engineering attacks on users
- DNS hijacking (external to application)

## Authentication Security

### Email-Based Verification (v1.0.0)

**Mechanism**: Users prove domain ownership by receiving a verification code at an email address on that domain.

#### Threat: Email Interception

**Risk**: Attacker intercepts email containing verification code.

**Mitigations**:
1. **Short Code Lifetime**: 15-minute expiration
2. **Single Use**: Code invalidated after verification
3. **Rate Limiting**: Max 3 code requests per email per hour
4. **TLS Email Delivery**: Require STARTTLS for SMTP
5. **Display Warning**: "Only request code if you initiated this login"

**Residual Risk**: Acceptable for v1.0.0 given short lifetime and single-use.
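
The STARTTLS requirement can be enforced at send time rather than left to server configuration. The following is a minimal sketch using the standard-library `smtplib`; the function names, parameters, and message wording are illustrative assumptions, not Gondulf's actual API.

```python
import smtplib
import ssl
from email.message import EmailMessage


def build_tls_context() -> ssl.SSLContext:
    """TLS context that verifies certificates and refuses anything below TLS 1.2."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def send_verification_email(host: str, port: int, username: str, password: str,
                            sender: str, recipient: str, code: str) -> None:
    """Send the code over SMTP, aborting if the server cannot upgrade to TLS."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = "Your verification code"
    message.set_content(
        f"Your verification code is {code}. It expires in 15 minutes.\n"
        "Only use this code if you initiated this login."
    )

    with smtplib.SMTP(host, port, timeout=10) as smtp:
        # starttls() raises if the server does not offer STARTTLS,
        # so the code is never sent in cleartext (fail securely).
        smtp.starttls(context=build_tls_context())
        smtp.login(username, password)
        smtp.send_message(message)
```

Because `starttls()` raises instead of silently continuing, a misconfigured mail server results in a failed send rather than an unencrypted one.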

#### Threat: Code Brute Force

**Risk**: Attacker guesses 6-digit verification code.

**Mitigations**:
1. **Sufficient Entropy**: 1,000,000 possible codes (6 digits)
2. **Attempt Limiting**: Max 3 attempts per email
3. **Short Lifetime**: 15-minute window
4. **Rate Limiting**: Max 10 attempts per IP per hour
5. **Exponential Backoff**: Delay starts at 5 seconds and grows after each failed attempt

**Math**:
- 3 attempts ÷ 1,000,000 codes = 0.0003% success probability
- 15-minute window limits attack time
- Rate limiting prevents distributed guessing

**Residual Risk**: Very low, acceptable for v1.0.0.
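
The code lifetime, single-use rule, and attempt limit described above could be combined roughly as follows. This is a sketch under stated assumptions: the in-memory `pending_codes` store and the function names are hypothetical, and the current time is passed in explicitly to keep the logic easy to test.

```python
import secrets
from datetime import datetime, timedelta

CODE_TTL = timedelta(minutes=15)
MAX_ATTEMPTS = 3

# Hypothetical in-memory store: email -> pending verification record
pending_codes: dict[str, dict] = {}


def issue_code(email: str, now: datetime) -> str:
    """Create a 6-digit code from a CSPRNG and remember it for 15 minutes."""
    code = f"{secrets.randbelow(10**6):06d}"
    pending_codes[email] = {"code": code, "expires_at": now + CODE_TTL, "attempts": 0}
    return code


def check_code(email: str, submitted: str, now: datetime) -> bool:
    """Return True only for a correct, unexpired code within the attempt budget."""
    record = pending_codes.get(email)
    if record is None:
        return False
    if now > record["expires_at"] or record["attempts"] >= MAX_ATTEMPTS:
        del pending_codes[email]  # expired or exhausted: fail securely
        return False
    record["attempts"] += 1
    if secrets.compare_digest(submitted.encode(), record["code"].encode()):
        del pending_codes[email]  # single use
        return True
    return False


# Chance of guessing correctly within the attempt budget: 3 / 1,000,000
guess_probability = MAX_ATTEMPTS / 10**6
```

Once the budget is spent, the record is deleted, so even the correct code is rejected and the user must request a new one.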

#### Threat: Email Address Enumeration

**Risk**: Attacker discovers which domains are registered by requesting codes.

**Mitigations**:
1. **Consistent Response**: Always say "If email exists, code sent"
2. **No Error Differentiation**: Same message for valid/invalid emails
3. **Rate Limiting**: Prevent bulk enumeration

**Residual Risk**: Minimal, domain names are public anyway (DNS).

### Domain Ownership Verification

#### TXT Record Validation (Preferred)

**Mechanism**: Admin adds DNS TXT record `_gondulf.example.com` = `verified`.

**Security Properties**:
- Requires DNS control (stronger than email)
- Verifiable without user interaction
- Cacheable for performance
- Re-verifiable periodically

**Threat: DNS Spoofing**

**Mitigations**:
1. **DNSSEC**: Validate DNSSEC signatures if available
2. **Multiple Resolvers**: Query 2+ DNS servers, require consensus
3. **Caching**: Cache valid results, re-verify daily
4. **Logging**: Log all DNS verification attempts

**Implementation**:
```python
import logging

import dns.resolver

logger = logging.getLogger(__name__)

def verify_txt_record(domain: str) -> bool:
    """
    Verify _gondulf.{domain} TXT record exists with value 'verified'.
    Any DNS error or disagreement between resolvers returns False.
    """
    try:
        # Use Google and Cloudflare DNS for redundancy
        resolvers = ['8.8.8.8', '1.1.1.1']
        confirmations = 0

        for resolver_ip in resolvers:
            resolver = dns.resolver.Resolver()
            resolver.nameservers = [resolver_ip]
            resolver.timeout = 5
            resolver.lifetime = 5

            answers = resolver.resolve(f'_gondulf.{domain}', 'TXT')
            for rdata in answers:
                txt_value = rdata.to_text().strip('"')
                if txt_value == 'verified':
                    confirmations += 1
                    break

        # Require consensus from every resolver queried
        return confirmations == len(resolvers)

    except Exception as e:
        logger.warning(f"DNS verification failed for {domain}: {e}")
        return False
```

**Residual Risk**: Low, DNS is foundational internet infrastructure.

## Authorization Security

### Authorization Code Security

**Properties**:
- **Length**: 32 bytes (256 bits of entropy)
- **Generation**: `secrets.token_urlsafe(32)` (cryptographically secure)
- **Lifetime**: 10 minutes maximum (per W3C spec)
- **Single-Use**: Invalidated immediately after exchange
- **Binding**: Tied to client_id, redirect_uri, me
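
To illustrate how these properties fit together, here is a minimal sketch of issuing a code and checking its binding at redemption. The `auth_codes` dictionary and both function names are assumptions made for the example, not Gondulf's storage layer.

```python
import secrets
from datetime import datetime, timedelta
from typing import Optional

AUTH_CODE_TTL = timedelta(minutes=10)

# Hypothetical in-memory store: code -> bound request data
auth_codes: dict[str, dict] = {}


def issue_auth_code(client_id: str, redirect_uri: str, me: str, now: datetime) -> str:
    """Create a code bound to the exact client_id, redirect_uri and me."""
    code = secrets.token_urlsafe(32)  # 32 random bytes, 43 URL-safe characters
    auth_codes[code] = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "me": me,
        "expires_at": now + AUTH_CODE_TTL,
    }
    return code


def redeem_auth_code(code: str, client_id: str, redirect_uri: str,
                     now: datetime) -> Optional[str]:
    """Return the bound `me` once; None on unknown, expired, reused or mismatched codes."""
    record = auth_codes.pop(code, None)  # pop makes the code single-use
    if record is None or now > record["expires_at"]:
        return None
    if record["client_id"] != client_id or record["redirect_uri"] != redirect_uri:
        return None
    return record["me"]
```

Removing the record on first presentation is the simplest way to get single-use behavior; keeping a `used` flag instead, as in the replay-attack implementation later in this section, additionally allows reuse attempts to be detected and logged.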

#### Threat: Authorization Code Interception

**Risk**: Attacker intercepts code from redirect URL.

**Mitigations (v1.0.0)**:
1. **HTTPS Only**: Enforce TLS for all communications
2. **Short Lifetime**: 10-minute expiration
3. **Single Use**: Code invalidated after first use
4. **State Binding**: Client validates state parameter (CSRF protection)

**Mitigations (Future - PKCE)**:
1. **Code Challenge**: Client sends hash of secret with auth request
2. **Code Verifier**: Client proves knowledge of secret on token exchange
3. **No Interception Value**: Code useless without original secret

**ADR-003 Decision**: PKCE deferred to v1.1.0 to maintain MVP simplicity.

**Residual Risk**: Low with HTTPS + short lifetime, minimal with PKCE (future).

#### Threat: Code Replay Attack

**Risk**: Attacker reuses previously valid authorization code.

**Mitigations**:
1. **Single-Use Enforcement**: Mark code as used in storage
2. **Immediate Invalidation**: Delete code after exchange
3. **Concurrent Use Detection**: Log warning if used code presented again

**Implementation**:
```python
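import hashlib
from typing import Optional

# code_storage and logger are assumed to be defined elsewhere in the module

def exchange_code(code: str) -> Optional[str]:
    """
    Exchange authorization code for an access token.
    Returns None if code invalid, expired, or already used.
    """
    # Retrieve code data
    code_data = code_storage.get(code)
    if not code_data:
        logger.warning("Code not found or expired")
        return None

    # Check if already used
    if code_data.get('used'):
        # Log a hash prefix, never the code itself
        code_ref = hashlib.sha256(code.encode()).hexdigest()[:8]
        logger.error(f"Code replay attack detected: {code_ref}...")
        # SECURITY: Potential replay attack, alert admin
        return None

    # Mark as used IMMEDIATELY (before token generation)
    # NOTE: the get/set pair must be atomic in storage to block concurrent exchanges
    code_data['used'] = True
    code_storage.set(code, code_data)

    # Generate token for the identity and client the code was bound to
    return generate_token(code_data['me'], code_data['client_id'])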
```

**Residual Risk**: Negligible.

### Access Token Security

**Properties**:
- **Format**: Opaque tokens (v1.0.0), not JWT
- **Length**: 32 bytes (256 bits of entropy)
- **Generation**: `secrets.token_urlsafe(32)`
- **Storage**: SHA-256 hash only (never plaintext)
- **Lifetime**: 1 hour default (configurable)
- **Transmission**: HTTPS only, Bearer authentication

#### Threat: Token Theft

**Risk**: Attacker steals access token from storage or transmission.

**Mitigations**:
1. **TLS Enforcement**: HTTPS only in production
2. **Hashed Storage**: Store SHA-256 hash, not plaintext
3. **Short Lifetime**: 1-hour expiration (configurable)
4. **Revocation**: Admin can revoke tokens (future)
5. **Secure Headers**: Set Cache-Control: no-store, Pragma: no-cache

**Token Storage**:
```python
import hashlib
import secrets
from datetime import datetime, timedelta

TOKEN_LIFETIME = timedelta(hours=1)  # 1 hour default (configurable)

def generate_token(me: str, client_id: str) -> str:
    """
    Generate access token and store hash in database.
    """
    # Generate token (returned to client, never stored)
    token = secrets.token_urlsafe(32)

    # Store only hash (irreversible)
    token_hash = hashlib.sha256(token.encode()).hexdigest()

    # Expiry is computed from the issue time
    issued_at = datetime.utcnow()
    expires_at = issued_at + TOKEN_LIFETIME

    db.execute('''
        INSERT INTO tokens (token_hash, me, client_id, scope, issued_at, expires_at)
        VALUES (?, ?, ?, ?, ?, ?)
    ''', (token_hash, me, client_id, "", issued_at, expires_at))

    return token
```

**Residual Risk**: Low. A stolen database yields only SHA-256 hashes, which cannot be reversed into usable tokens given 256 bits of token entropy.

#### Threat: Timing Attacks on Token Verification

**Risk**: Attacker uses timing differences to guess valid tokens character-by-character.

**Mitigations**:
1. **Constant-Time Comparison**: Use `secrets.compare_digest()`
2. **Hash Comparison**: Compare hashes, not tokens
3. **Random Delays**: Add a random delay to failed validation responses

**Implementation**:
```python
import hashlib
import secrets
from datetime import datetime
from typing import Optional

def verify_token(provided_token: str) -> Optional[dict]:
    """
    Verify access token by hash lookup plus constant-time comparison.
    """
    # Hash provided token
    provided_hash = hashlib.sha256(provided_token.encode()).hexdigest()

    # Lookup in database
    token_data = db.query_one('''
        SELECT token_hash, me, client_id, scope, expires_at, revoked
        FROM tokens
        WHERE token_hash = ?
    ''', (provided_hash,))

    if not token_data:
        return None

    # Defense in depth: re-check the stored hash in constant time.
    # The SQL lookup already matched on the hash, so any timing difference
    # relates to the SHA-256 digest, not to the token itself.
    if not secrets.compare_digest(provided_hash, token_data['token_hash']):
        return None

    # Check expiration
    if datetime.utcnow() > token_data['expires_at']:
        return None

    # Check revocation
    if token_data.get('revoked'):
        return None

    return token_data
```

**Residual Risk**: Negligible.

## Input Validation

### URL Validation Security

**Critical**: Improper URL validation enables phishing and open redirect attacks.

#### Threat: Open Redirect via redirect_uri

**Risk**: Attacker tricks user into authorizing malicious redirect_uri, steals authorization code.

**Mitigations**:
1. **Domain Matching**: Require redirect_uri domain match client_id domain
2. **Subdomain Validation**: Allow subdomains of client_id domain
3. **Registered URIs**: Future feature to pre-register alternate domains
4. **User Warning**: Display warning if domains differ
5. **HTTPS Enforcement**: Require HTTPS for non-localhost

**Validation Logic**:
```python
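from urllib.parse import urlparse

def validate_redirect_uri(redirect_uri: str, client_id: str, registered_uris: list) -> tuple[bool, str]:
    """
    Validate redirect_uri against client_id.
    Returns (is_valid, warning_message).
    """
    redirect_parsed = urlparse(redirect_uri)
    client_parsed = urlparse(client_id)

    # Reject URLs with no hostname (relative paths, "javascript:" URIs, etc.)
    if not redirect_parsed.hostname or not client_parsed.hostname:
        return False, "redirect_uri and client_id must be absolute URLs"

    redirect_domain = redirect_parsed.hostname.lower()
    client_domain = client_parsed.hostname.lower()

    # Must be HTTPS (plain HTTP allowed for localhost only)
    is_local_http = redirect_domain == 'localhost' and redirect_parsed.scheme == 'http'
    if redirect_parsed.scheme != 'https' and not is_local_http:
        return False, "redirect_uri must use HTTPS"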

    # Exact match: OK
    if redirect_domain == client_domain:
        return True, ""

    # Subdomain: OK
    if redirect_domain.endswith('.' + client_domain):
        return True, ""

    # Registered URI: OK (future)
    if redirect_uri in registered_uris:
        return True, ""

    # Different domain: WARNING
    warning = f"Warning: Redirect to different domain ({redirect_domain})"
    return True, warning  # Allow but warn user
```

**Residual Risk**: Low, user must approve redirect with warning.

#### Threat: Phishing via Malicious client_id

**Risk**: Attacker uses client_id of legitimate-looking domain (typosquatting).

**Mitigations**:
1. **Display Full URL**: Show complete client_id to user, not just app name
2. **Fetch Verification**: Verify client_id is fetchable (real domain)
3. **Subdomain Check**: Warn if client_id is subdomain of well-known domain
4. **Certificate Validation**: Verify SSL certificate validity
5. **User Education**: Inform users to verify client_id carefully

**UI Display**:
```
Sign in to:
  Application Name (if available)
  https://client.example.com  ← Full URL always displayed

Redirect to:
  https://client.example.com/callback
```

**Residual Risk**: Moderate, requires user vigilance.

#### Threat: URL Parameter Injection

**Risk**: Attacker injects malicious parameters via crafted URLs.

**Mitigations**:
1. **Pydantic Validation**: Use Pydantic models for all parameters
2. **Type Enforcement**: Strict type checking (str, not any)
3. **Allowlist Validation**: Only accept expected parameters
4. **SQL Parameterization**: Use parameterized queries (prevent SQL injection)
5. **HTML Encoding**: Encode all user input in HTML responses

**Pydantic Models**:
```python
from typing import Literal

from pydantic import BaseModel, HttpUrl, Field

class AuthorizeRequest(BaseModel):
    me: HttpUrl
    client_id: HttpUrl
    redirect_uri: HttpUrl
    state: str = Field(min_length=1, max_length=512)
    response_type: Literal["code"]
    scope: str = ""  # Optional, ignored in v1.0.0

    class Config:
        extra = "forbid"  # Reject unknown parameters
```

**Residual Risk**: Minimal, Pydantic provides strong validation.

### Email Validation

#### Threat: Email Injection Attacks

**Risk**: Attacker injects SMTP commands via email address field.

**Mitigations**:
1. **Format Validation**: Strict email regex (a conservative subset of RFC 5322)
2. **Domain Matching**: Require email domain match `me` domain
3. **SMTP Library**: Use well-tested library (smtplib)
4. **Content Encoding**: Encode email content properly
5. **Rate Limiting**: Prevent abuse

**Validation**:
```python
import re
from email.utils import parseaddr

def validate_email(email: str, required_domain: str) -> tuple[bool, str]:
    """
    Validate email address and domain match.
    """
    # Parse email into (display name, address)
    name, addr = parseaddr(email)

    # Basic format check; fullmatch rejects a trailing newline that `$` would allow
    email_regex = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
    if not re.fullmatch(email_regex, addr):
        return False, "Invalid email format"

    # Extract domain
    email_domain = addr.split('@')[1].lower()
    required_domain = required_domain.lower()

    # Domain must match
    if email_domain != required_domain:
        return False, f"Email must be at {required_domain}"

    return True, ""
```

**Residual Risk**: Low, standard validation patterns.

## Network Security

### TLS/HTTPS Enforcement

**Production Requirements**:
- All endpoints MUST use HTTPS
- Minimum TLS 1.2 (prefer TLS 1.3)
- Strong cipher suites only
- Valid SSL certificate (not self-signed)

**Configuration**:
```python
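# In production configuration
from starlette.middleware.httpsredirect import HTTPSRedirectMiddleware

if not DEBUG:
    # Enforce HTTPS
    app.add_middleware(HTTPSRedirectMiddleware)

    # Add security headers (SecureHeadersMiddleware is assumed to be defined by the project)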
    app.add_middleware(
        SecureHeadersMiddleware,
        hsts="max-age=31536000; includeSubDomains",
        content_security_policy="default-src 'self'",
        x_frame_options="DENY",
        x_content_type_options="nosniff"
    )
```

**Development Exception**:
- HTTP allowed for `localhost` only
- Never in production

**Residual Risk**: Negligible if properly configured.

### Security Headers

**Required Headers**:

```http
# Prevent clickjacking
X-Frame-Options: DENY

# Prevent MIME sniffing
X-Content-Type-Options: nosniff

# XSS protection (legacy browsers)
X-XSS-Protection: 1; mode=block

# HSTS (HTTPS enforcement)
Strict-Transport-Security: max-age=31536000; includeSubDomains

# CSP (limit resource loading)
Content-Security-Policy: default-src 'self'; style-src 'self' 'unsafe-inline'

# Referrer policy (privacy)
Referrer-Policy: strict-origin-when-cross-origin
```

**Implementation**:
```python
from starlette.requests import Request

@app.middleware("http")
async def add_security_headers(request: Request, call_next):
    response = await call_next(request)
    response.headers["X-Frame-Options"] = "DENY"
    response.headers["X-Content-Type-Options"] = "nosniff"
    response.headers["X-XSS-Protection"] = "1; mode=block"
    if not DEBUG:
        response.headers["Strict-Transport-Security"] = "max-age=31536000; includeSubDomains"
    response.headers["Content-Security-Policy"] = "default-src 'self'; style-src 'self' 'unsafe-inline'"
    response.headers["Referrer-Policy"] = "strict-origin-when-cross-origin"
    return response
```

## Data Security

### Data Minimization (Privacy)

**Principle**: Collect and store ONLY essential data.

**Stored Data**:
- ✅ Domain name (user identity, required)
- ✅ Token hashes (security, required)
- ✅ Client IDs (protocol, required)
- ✅ Timestamps (auditing, required)

**Never Stored**:
- ❌ Email addresses (after verification)
- ❌ Plaintext tokens
- ❌ User-Agent strings
- ❌ IP addresses (except rate limiting, temporary)
- ❌ Browsing history
- ❌ Personal information

**Email Handling**:
```python
# Email stored ONLY during verification (in-memory, 15-min TTL)
verification_codes[code_id] = {
    "email": email,  # ← Exists ONLY here, NEVER in database
    "code": code,
    "expires_at": datetime.utcnow() + timedelta(minutes=15)
}

# After verification: email is deleted, only domain stored
db.execute('''
    INSERT INTO domains (domain, verification_method, verified_at)
    VALUES (?, 'email', ?)
''', (domain, datetime.utcnow()))
# Note: NO email address in database
```

### Database Security

**SQLite Security**:
1. **File Permissions**: 600 (owner read/write only)
2. **Encryption at Rest**: Use encrypted filesystem (LUKS, dm-crypt)
3. **Backup Encryption**: Encrypt backup files (GPG)
4. **SQL Injection Prevention**: Parameterized queries only

**Parameterized Queries**:
```python
# GOOD: Parameterized (safe)
db.execute(
    "SELECT * FROM tokens WHERE token_hash = ?",
    (token_hash,)
)

# BAD: String interpolation (vulnerable)
db.execute(
    f"SELECT * FROM tokens WHERE token_hash = '{token_hash}'"
)  # ← NEVER DO THIS
```

**File Permissions**:
```bash
# Set restrictive permissions
chmod 600 /data/gondulf.db
chown gondulf:gondulf /data/gondulf.db
```
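
The same permission requirement can also be checked from application code, for example at startup, so that a misconfigured deployment fails securely. A minimal sketch for POSIX systems; the function name is hypothetical:

```python
import os
import stat


def database_permissions_ok(path: str) -> bool:
    """True when no group or other permission bits are set on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # Any bit in 0o077 means the file is accessible beyond its owner.
    return mode & 0o077 == 0
```

A caller might refuse to start when `database_permissions_ok("/data/gondulf.db")` returns `False`.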

### Logging Security

**Principle**: Log security events, NEVER log sensitive data.

**Log Security Events**:
- ✅ Failed authentication attempts
- ✅ Authorization grants (domain + client_id)
- ✅ Token generation (hash prefix only)
- ✅ Email verification attempts
- ✅ DNS verification results
- ✅ Error conditions

**Never Log**:
- ❌ Email addresses (PII)
- ❌ Full access tokens
- ❌ Verification codes
- ❌ Authorization codes
- ❌ IP addresses (production)

**Safe Logging Examples**:
```python
# GOOD: Domain only (public information)
logger.info(f"Authorization granted for {domain} to {client_id}")

# GOOD: Token hash prefix for correlation (never the token itself)
logger.debug(f"Token generated: {token_hash[:8]}...")

# GOOD: Error without sensitive data
logger.error(f"Email send failed for domain {domain}")

# BAD: Email address (PII)
logger.info(f"Verification sent to {email}")  # ← NEVER

# BAD: Full token (security)
logger.debug(f"Token: {token}")  # ← NEVER
```
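
One way to follow the "hash prefix only" rule consistently is a small helper that derives a log-safe reference from any secret value. The helper name `token_ref` is an assumption made for this sketch:

```python
import hashlib


def token_ref(token: str, length: int = 8) -> str:
    """Short, non-reversible identifier for correlating log lines about one token."""
    return hashlib.sha256(token.encode()).hexdigest()[:length]
```

Log statements can then use `token_ref(token)` wherever a token needs to be identified, so the secret itself never reaches the log file.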

## Dependency Security

### Dependency Management

**Principles**:
1. **Minimal Dependencies**: Prefer standard library
2. **Vetted Libraries**: Only well-maintained, popular libraries
3. **Version Pinning**: Pin exact versions in requirements.txt
4. **Security Scanning**: Regular vulnerability scanning
5. **Update Strategy**: Security patches applied promptly

**Security Scanning**:
```bash
# Scan for known vulnerabilities
uv run pip-audit

# Alternative: safety check
uv run safety check
```

**Update Policy**:
- **Security patches**: Apply within 24 hours (critical), 7 days (high)
- **Minor versions**: Review and test before updating
- **Major versions**: Evaluate breaking changes, test thoroughly

### Secrets Management

**Environment Variables** (v1.0.0):
```bash
# Required secrets
GONDULF_SECRET_KEY=<256-bit random value>
GONDULF_SMTP_PASSWORD=<SMTP password>

# Optional secrets
GONDULF_DATABASE_ENCRYPTION_KEY=<for encrypted backups>
```

**Secret Generation**:
```bash
# Generate SECRET_KEY (256 bits)
python -c "import secrets; print(secrets.token_urlsafe(32))"
```

**Storage**:
- Development: `.env` file (not committed)
- Production: Docker secrets or environment variables
- Never hardcode secrets in code
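
Reading these secrets at startup can follow the fail-securely principle: refuse to run when a required value is missing instead of falling back to a default. A minimal sketch, where `load_secret` and `MissingSecretError` are hypothetical names:

```python
import os


class MissingSecretError(RuntimeError):
    """Raised at startup when a required secret is absent."""


def load_secret(name: str, environ=os.environ) -> str:
    """Return the named secret, or raise without ever echoing a value."""
    value = environ.get(name, "")
    if not value:
        raise MissingSecretError(f"{name} is not set")  # name only, never the value
    return value
```

For example, `load_secret("GONDULF_SECRET_KEY")` would be called once during configuration loading.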

**Future**: Integrate with HashiCorp Vault or AWS Secrets Manager.

## Rate Limiting (Future)

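**v1.0.0**: Not implemented (acceptable for small deployments). The rate limits listed as mitigations under Authentication Security take effect only once this work lands.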

**Future Implementation**:

| Endpoint | Limit | Window | Key |
|----------|-------|--------|-----|
| /authorize | 10 requests | 1 minute | IP |
| /token | 30 requests | 1 minute | client_id |
| Email verification | 3 codes | 1 hour | email |
| Code submission | 3 attempts | 15 minutes | session |

**Implementation Strategy**:
- Use Redis for distributed rate limiting
- Token bucket algorithm
- Exponential backoff on failures
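
The token bucket algorithm named above can be sketched in a few lines. This version is single-process and in-memory, purely to illustrate the algorithm; the class, the `buckets` registry, and the default limits (taken from the `/authorize` row of the table) are assumptions, and a Redis-backed version would replace the dictionary.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    capacity: float        # maximum burst size
    refill_per_sec: float  # sustained rate
    tokens: float          # tokens currently available
    updated_at: float      # timestamp of the last refill, in seconds

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then spend one token if available."""
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Hypothetical per-key registry (key = IP, client_id, email, or session)
buckets: dict[str, TokenBucket] = {}


def allow_request(key: str, now: float, capacity: float = 10,
                  window_sec: float = 60) -> bool:
    """Allow up to `capacity` requests per `window_sec` for each key."""
    bucket = buckets.get(key)
    if bucket is None:
        bucket = buckets[key] = TokenBucket(capacity, capacity / window_sec, capacity, now)
    return bucket.allow(now)
```

Each key starts with a full bucket, so short bursts are tolerated while the sustained rate stays bounded by the refill rate.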

## Security Testing

### Required Security Tests

1. **Input Validation**:
   - Malformed URLs (me, client_id, redirect_uri)
   - SQL injection attempts
   - XSS attempts
   - Email injection

2. **Authentication**:
   - Expired code rejection
   - Used code rejection
   - Invalid code rejection
   - Brute force resistance

3. **Authorization**:
   - State parameter validation
   - Redirect URI validation
   - Open redirect prevention

4. **Token Security**:
   - Timing attack resistance
   - Token theft scenarios
   - Expiration enforcement

5. **TLS/HTTPS**:
   - HTTP rejection in production
   - Security headers presence
   - Certificate validation

### Security Scanning Tools

**Required Tools**:
- `bandit`: Python security linter
- `pip-audit`: Dependency vulnerability scanner
- `pytest`: Security-focused test cases

**CI/CD Integration**:
```yaml
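# GitHub Actions example (assumes uv is already installed on the runner)
jobs:
  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run Bandit
        run: uv run bandit -r src/gondulf

      - name: Scan Dependencies
        run: uv run pip-audit

      - name: Run Security Tests
        run: uv run pytest tests/security/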
```

## Incident Response

### Security Event Monitoring

**Monitor For**:
1. Multiple failed authentication attempts
2. Authorization code reuse attempts
3. Invalid token presentation
4. Unusual DNS verification failures
5. Email send failures (potential abuse)

**Alerting** (future):
- Admin email on critical events
- Webhook integration (Slack, Discord)
- Metrics dashboard (Grafana)

### Breach Response Plan (Future)

**If Access Tokens Compromised**:
1. Revoke all active tokens
2. Force re-authentication
3. Notify affected users (via domain)
4. Rotate SECRET_KEY
5. Audit logs for suspicious activity

**If Database Compromised**:
1. Assess data exposure (only hashes + domains)
2. Rotate all tokens
3. Review access logs
4. Notify users if domains exposed
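
The "revoke all active tokens" step could be a single statement against the `tokens` table. The sketch below assumes the `revoked` column is an integer flag (0 or 1), which this document does not specify, and uses the standard-library `sqlite3` module directly rather than Gondulf's database layer.

```python
import sqlite3


def revoke_all_tokens(conn: sqlite3.Connection) -> int:
    """Mark every token not yet revoked as revoked; returns the number of rows changed."""
    with conn:  # commits on success, rolls back on error
        # IS NOT also matches rows where revoked is NULL
        cursor = conn.execute("UPDATE tokens SET revoked = 1 WHERE revoked IS NOT 1")
    return cursor.rowcount
```

Because only hashes are stored, revocation invalidates tokens without the server ever needing to know their plaintext values.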

## Compliance Considerations

### GDPR Compliance

**Personal Data Stored**:
- Domain names (considered PII in some jurisdictions)
- Timestamps (associated with domains)

**GDPR Rights**:
- **Right to Access**: Admin can query database
- **Right to Erasure**: Admin can delete domain records
- **Right to Portability**: Data export feature (future)

**Privacy Policy** (required):
- Document what data is collected (domains, timestamps)
- Document how data is used (authentication)
- Document retention policy (indefinite unless deleted)
- Provide contact for data requests

### Security Disclosure

**Security Policy** (future):
- Responsible disclosure process
- Security contact (security@domain)
- GPG key for encrypted reports
- Acknowledgments for researchers

## Security Roadmap

### v1.0.0 (MVP)
- ✅ Email-based authentication
- ✅ TLS/HTTPS enforcement
- ✅ Secure token generation (opaque, hashed)
- ✅ URL validation (open redirect prevention)
- ✅ Input validation (Pydantic)
- ✅ Security headers
- ✅ Minimal data collection

### v1.1.0
- PKCE support (code challenge/verifier)
- Rate limiting (Redis-based)
- Token revocation endpoint
- Enhanced logging

### v1.2.0
- WebAuthn support (passwordless)
- Hardware security key support
- Admin dashboard (audit logs)
- Security metrics

### v2.0.0
- Multi-factor authentication
- Federated identity providers
- Advanced threat detection
- SOC 2 compliance preparation

## References

- OWASP Top 10: https://owasp.org/www-project-top-ten/
- OAuth 2.0 Security Best Practices: https://datatracker.ietf.org/doc/html/draft-ietf-oauth-security-topics
- NIST Cybersecurity Framework: https://www.nist.gov/cyberframework
- CWE Top 25: https://cwe.mitre.org/top25/