# Security Architecture

## Security Philosophy

Gondulf follows a defense-in-depth security model with these core principles:

1. **Secure by Default**: Security features enabled out of the box
2. **Fail Securely**: Errors default to denying access, not granting it
3. **Least Privilege**: Collect and store minimum necessary data
4. **Transparency**: Security decisions documented and auditable
5. **Standards Compliance**: Follow OAuth 2.0 and IndieAuth security best practices

## Threat Model

### Assets to Protect

**Primary Assets**:
- User domain identities (the `me` parameter)
- Access tokens (prove user identity to clients)
- Authorization codes (short-lived, exchange for tokens)

**Secondary Assets**:
- Email verification codes (prove email ownership)
- Domain verification status (cached TXT record checks)
- Client metadata (cached application information)

**Explicitly NOT Protected** (by design):
- Passwords (none stored)
- Personal user data beyond domain (privacy principle)
- Client secrets (OAuth 2.0 public clients)

### Threat Actors

**External Attackers**:
- Phishing attempts (fake clients)
- Token theft (network interception)
- Open redirect exploitation
- CSRF attacks
- Brute force attacks (code guessing)

**Compromised Clients**:
- Malicious client applications
- Client impersonation
- Redirect URI manipulation

**System Compromise**:
- Database access (SQLite file theft)
- Server memory access (in-memory code theft)
- Log file access (token exposure)

### Out of Scope (v1.0.0)

- DDoS attacks (handled by infrastructure)
- Zero-day vulnerabilities in dependencies
- Physical access to server
- Social engineering attacks on users
- DNS hijacking (external to application)

## Authentication Security

### Two-Factor Domain Verification (v1.0.0)

**Mechanism**: Users prove domain ownership through TWO independent factors:
1. **DNS TXT Record**: Proves DNS control (`_gondulf.{domain}` = `verified`)
2. **Email via rel="me"**: Proves email control (discovered from site's rel="me" link)

**Security Model**: An attacker must compromise BOTH factors to authenticate fraudulently. This is significantly stronger than single-factor verification.

#### Threat: Email Interception

**Risk**: Attacker intercepts email containing verification code.

**Mitigations**:
1. **Two-Factor Requirement**: Email alone is insufficient (DNS also required)
2. **Short Code Lifetime**: 15-minute expiration
3. **Single Use**: Code invalidated after verification
4. **Rate Limiting**: Max 3 code requests per domain per hour
5. **TLS Email Delivery**: Require STARTTLS for SMTP
6. **Display Warning**: "Only request code if you initiated this login"

**Residual Risk**: Low. Even with email interception, attacker still needs DNS control.
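
The lifetime, single-use, and attempt-limit mitigations above can be sketched as follows. This is a minimal in-memory illustration under stated assumptions: `issue_code`, `check_code`, and the `_codes` store are hypothetical names chosen for this example, not Gondulf's actual API, and a real deployment would also need the per-domain issuance limit.

```python
import hmac
import secrets
import time
from typing import Optional

CODE_TTL_SECONDS = 15 * 60   # 15-minute lifetime
MAX_ATTEMPTS = 3             # attempts allowed per issued code

_codes: dict[str, dict] = {}  # hypothetical in-memory store keyed by domain


def issue_code(domain: str, now: Optional[float] = None) -> str:
    """Create a 6-digit code for a domain, replacing any earlier code."""
    now = time.time() if now is None else now
    code = f"{secrets.randbelow(1_000_000):06d}"
    _codes[domain] = {"code": code, "expires_at": now + CODE_TTL_SECONDS, "attempts": 0}
    return code


def check_code(domain: str, submitted: str, now: Optional[float] = None) -> bool:
    """Return True once for a correct, unexpired code; deny in every other case."""
    now = time.time() if now is None else now
    entry = _codes.get(domain)
    if entry is None:
        return False
    if now > entry["expires_at"] or entry["attempts"] >= MAX_ATTEMPTS:
        del _codes[domain]  # expired or exhausted: fail securely
        return False
    entry["attempts"] += 1
    # Constant-time comparison on bytes avoids leaking how many digits matched
    if hmac.compare_digest(entry["code"].encode(), submitted.encode()):
        del _codes[domain]  # single use
        return True
    return False
```

Every failure path returns `False` and a successful check deletes the entry, so a code can never be accepted twice.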

#### Threat: Code Brute Force

**Risk**: Attacker guesses 6-digit verification code.

**Mitigations**:
1. **Two-Factor Requirement**: Code alone is insufficient (DNS also required)
2. **Sufficient Entropy**: 1,000,000 possible codes (6 digits)
3. **Attempt Limiting**: Max 3 attempts per email
4. **Short Lifetime**: 15-minute window
5. **Rate Limiting**: Max 3 codes per domain per hour
6. **Single-Use**: Code invalidated after use

**Math**:
- 3 attempts ÷ 1,000,000 codes = 0.0003% success probability per issued code
- 15-minute window limits attack time
- Even if guessed, attacker still needs DNS control
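
The arithmetic can be checked directly. The sketch below assumes the limits listed in the mitigations (3 attempts per code, 3 codes per domain per hour) and a uniformly random code; the function names are illustrative only.

```python
from fractions import Fraction

CODE_SPACE = 10 ** 6      # 6-digit codes: 000000 to 999999
ATTEMPTS_PER_CODE = 3     # attempt limit per issued code
CODES_PER_HOUR = 3        # issuance limit per domain


def guess_probability(attempts: int, code_space: int = CODE_SPACE) -> Fraction:
    """Chance that one of `attempts` distinct guesses hits a uniformly random code."""
    return Fraction(min(attempts, code_space), code_space)


def hourly_probability() -> Fraction:
    """Chance of at least one hit across every code issued in one hour."""
    miss_one_code = 1 - guess_probability(ATTEMPTS_PER_CODE)
    return 1 - miss_one_code ** CODES_PER_HOUR


print(f"{float(guess_probability(ATTEMPTS_PER_CODE)):.4%}")  # 0.0003%
print(f"{float(hourly_probability()):.4%}")                  # 0.0009%
```

Even an attacker who exhausts every code issued in an hour succeeds with a probability of roughly 9 in a million, and still lacks the DNS factor.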

**Residual Risk**: Very low. Two-factor requirement makes brute force insufficient.

#### Threat: DNS TXT Record Spoofing

**Risk**: Attacker attempts to spoof DNS responses.

**Mitigations**:
1. **Multiple Resolvers**: Query 2+ independent DNS servers (Google, Cloudflare)
2. **Consensus Required**: Require agreement from at least 2 resolvers
3. **DNSSEC Support**: Validate DNSSEC signatures when available (future)
4. **Timeout Handling**: Fail securely if DNS unavailable
5. **Logging**: Log all DNS verification attempts

**Residual Risk**: Low. Spoofing multiple independent resolvers is difficult.

#### Threat: rel="me" Link Spoofing

**Risk**: Attacker compromises user's website to add malicious rel="me" link.

**Mitigations**:
1. **Two-Factor Requirement**: Website compromise alone insufficient (DNS also required)
2. **HTTPS Required**: Fetch site over TLS (prevents MITM)
3. **Certificate Validation**: Verify SSL certificate
4. **Email Domain Matching**: Email should match site domain (warning if not)
5. **User Education**: Inform users to secure their website

**Residual Risk**: Moderate. If attacker compromises both DNS and website, they can authenticate. This is acceptable as it represents full domain compromise.

#### Threat: Email Address Enumeration

**Risk**: Attacker discovers email addresses by triggering rel="me" discovery.

**Mitigations**:
1. **Public Information**: rel="me" links are intentionally public
2. **User Awareness**: Users know they're publishing email on their site
3. **Rate Limiting**: Prevent bulk scanning
4. **Robots.txt**: Users can restrict crawler access if desired

**Residual Risk**: Minimal. Email addresses are intentionally published by users on their own sites.

### Domain Ownership Verification (Two-Factor)

**Mechanism**: v1.0.0 requires BOTH verification methods:

#### 1. TXT Record Validation (Required)

**Mechanism**: Admin adds DNS TXT record `_gondulf.{domain}` = `verified`.

**Security Properties**:
- Proves DNS control (first factor)
- Verifiable without user interaction
- Cacheable for performance
- Re-verifiable periodically

**Implementation**:
```python
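import logging

import dns.resolver

logger = logging.getLogger(__name__)

def verify_txt_record(domain: str) -> bool:
    """
    Verify _gondulf.{domain} TXT record exists with value 'verified'.
    Requires consensus from multiple independent resolvers.
    """
    try:
        # Use Google and Cloudflare DNS for redundancy
        resolvers = ['8.8.8.8', '1.1.1.1']
        verified_count = 0

        for resolver_ip in resolvers:
            # configure=False ignores the host's resolver config, so only
            # the server listed below is queried
            resolver = dns.resolver.Resolver(configure=False)
            resolver.nameservers = [resolver_ip]
            resolver.timeout = 5    # per-attempt timeout (seconds)
            resolver.lifetime = 5   # total time budget for the query (seconds)

            answers = resolver.resolve(f'_gondulf.{domain}', 'TXT')
            for rdata in answers:
                # A TXT record may be split into several strings; join before comparing
                txt_value = b''.join(rdata.strings).decode('utf-8', errors='replace')
                if txt_value == 'verified':
                    verified_count += 1
                    break

        # Require consensus from at least 2 resolvers
        return verified_count >= 2

    except Exception as e:
        # Any resolver error (timeout, NXDOMAIN, ...) denies verification
        logger.warning(f"DNS verification failed for {domain}: {e}")
        return False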
```

#### 2. Email Verification via rel="me" (Required)

**Mechanism**: Email discovered from site's `<link rel="me" href="mailto:...">`, then verified with code.

**Security Properties**:
- Proves website control (can modify HTML)
- Proves email control (receives and enters code)
- Follows IndieWeb standards (rel="me")
- Self-documenting (user declares email publicly)

**Implementation**:
```python
import logging
from typing import Optional

import requests
from bs4 import BeautifulSoup

logger = logging.getLogger(__name__)

def discover_email_from_site(domain: str) -> Optional[str]:
    """
    Fetch site and discover email from rel="me" link.
    """
    try:
        response = requests.get(f"https://{domain}", timeout=10, allow_redirects=True)
        response.raise_for_status()

        soup = BeautifulSoup(response.content, 'html.parser')
        me_links = soup.find_all('link', rel='me') + soup.find_all('a', rel='me')

        for link in me_links:
            href = link.get('href', '')
            if href.lower().startswith('mailto:'):
                # Drop the scheme and any ?subject=... query part
                email = href[len('mailto:'):].split('?', 1)[0].strip()
                # validate_email_format is defined under "Email Validation"
                if validate_email_format(email):
                    return email

        return None

    except Exception as e:
        logger.error(f"Failed to discover email for {domain}: {e}")
        return None
```

**Combined Residual Risk**: Low. Attacker must compromise DNS, website, and email account to authenticate fraudulently.

## Authorization Security

### Authorization Code Security

**Properties**:
- **Length**: 32 bytes (256 bits of entropy)
- **Generation**: `secrets.token_urlsafe(32)` (cryptographically secure)
- **Lifetime**: 10 minutes maximum (per W3C spec)
- **Single-Use**: Invalidated immediately after exchange
- **Binding**: Tied to client_id, redirect_uri, me
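
A minimal sketch of how these properties might fit together is shown below. The in-memory `_auth_codes` store and the function names `issue_auth_code` and `redeem_auth_code` are assumptions made for illustration, not Gondulf's actual storage layer.

```python
import hmac
import secrets
import time
from typing import Optional

AUTH_CODE_TTL_SECONDS = 10 * 60  # 10-minute maximum lifetime

_auth_codes: dict[str, dict] = {}  # hypothetical in-memory store


def issue_auth_code(client_id: str, redirect_uri: str, me: str,
                    now: Optional[float] = None) -> str:
    """Create a code bound to the client, redirect URI, and identity."""
    now = time.time() if now is None else now
    code = secrets.token_urlsafe(32)  # 32 random bytes, 43 URL-safe characters
    _auth_codes[code] = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "me": me,
        "expires_at": now + AUTH_CODE_TTL_SECONDS,
    }
    return code


def redeem_auth_code(code: str, client_id: str, redirect_uri: str,
                     now: Optional[float] = None) -> Optional[str]:
    """Return the bound `me` value, or None if anything fails to match."""
    now = time.time() if now is None else now
    # pop() removes the code on the first presentation, even a failed one
    entry = _auth_codes.pop(code, None)
    if entry is None or now > entry["expires_at"]:
        return None
    same_client = hmac.compare_digest(entry["client_id"].encode(), client_id.encode())
    same_redirect = hmac.compare_digest(entry["redirect_uri"].encode(), redirect_uri.encode())
    if not (same_client and same_redirect):
        return None
    return entry["me"]
```

Removing the code on any redemption attempt is a deliberately strict choice in this sketch: a mismatched request burns the code rather than leaving it available for a second try.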

#### Threat: Authorization Code Interception

**Risk**: Attacker intercepts code from redirect URL.

**Mitigations (v1.0.0)**:
1. **HTTPS Only**: Enforce TLS for all communications
2. **Short Lifetime**: 10-minute expiration
3. **Single Use**: Code invalidated after first use
4. **State Binding**: Client validates state parameter (CSRF protection)

**Mitigations (Future - PKCE)**:
1. **Code Challenge**: Client sends hash of secret with auth request
2. **Code Verifier**: Client proves knowledge of secret on token exchange
3. **No Interception Value**: Code useless without original secret

**ADR-003 Decision**: PKCE deferred to v1.1.0 to maintain MVP simplicity.

**Residual Risk**: Low with HTTPS + short lifetime, minimal with PKCE (future).
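
Although PKCE is deferred, the mechanism itself is small. The sketch below illustrates the S256 method from RFC 7636 using only the standard library; the function names are hypothetical and this is not the planned Gondulf implementation.

```python
import base64
import hashlib
import hmac
import secrets


def make_code_verifier() -> str:
    """Client side: random secret kept until the token request."""
    return secrets.token_urlsafe(32)  # 43 URL-safe characters


def make_code_challenge(verifier: str) -> str:
    """Client side: BASE64URL(SHA-256(verifier)) without '=' padding (method S256)."""
    digest = hashlib.sha256(verifier.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def verify_pkce(verifier: str, stored_challenge: str) -> bool:
    """Server side: recompute the challenge and compare in constant time."""
    expected = make_code_challenge(verifier).encode("ascii")
    return hmac.compare_digest(expected, stored_challenge.encode("utf-8"))
```

The server would store the challenge alongside the authorization code and call `verify_pkce` during the token exchange, so an intercepted code is useless without the verifier.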

#### Threat: Code Replay Attack

**Risk**: Attacker reuses previously valid authorization code.

**Mitigations**:
1. **Single-Use Enforcement**: Mark code as used in storage
2. **Immediate Invalidation**: Delete code after exchange
3. **Concurrent Use Detection**: Log warning if used code presented again

**Implementation**:
```python
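import hashlib
from typing import Optional

def exchange_code(code: str) -> Optional[str]:
    """
    Exchange authorization code for an access token.
    Returns None if code invalid, expired, or already used.
    """
    # Retrieve code data
    code_data = code_storage.get(code)
    if not code_data:
        logger.warning("Code not found or expired")
        return None

    # Check if already used
    if code_data.get('used'):
        # Log a hash prefix for correlation; never log the code itself
        code_ref = hashlib.sha256(code.encode()).hexdigest()[:8]
        logger.error(f"Code replay attack detected (code hash {code_ref}...)")
        # SECURITY: Potential replay attack, alert admin
        return None

    # Mark as used IMMEDIATELY (before token generation).
    # NOTE: this check-then-set must be atomic in the storage layer, or two
    # concurrent requests could both pass the check above.
    code_data['used'] = True
    code_storage.set(code, code_data)

    # Generate token for the identity and client the code was issued to
    return generate_token(code_data['me'], code_data['client_id'])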
```

**Residual Risk**: Negligible.

### Access Token Security

**Properties**:
- **Format**: Opaque tokens (v1.0.0), not JWT
- **Length**: 32 bytes (256 bits of entropy)
- **Generation**: `secrets.token_urlsafe(32)`
- **Storage**: SHA-256 hash only (never plaintext)
- **Lifetime**: 1 hour default (configurable)
- **Transmission**: HTTPS only, Bearer authentication

#### Threat: Token Theft

**Risk**: Attacker steals access token from storage or transmission.

**Mitigations**:
1. **TLS Enforcement**: HTTPS only in production
2. **Hashed Storage**: Store SHA-256 hash, not plaintext
3. **Short Lifetime**: 1-hour expiration (configurable)
4. **Revocation**: Admin can revoke tokens (future)
5. **Secure Headers**: Set Cache-Control: no-store, Pragma: no-cache

**Token Storage**:
```python
import hashlib
import secrets
from datetime import datetime, timedelta

TOKEN_LIFETIME = timedelta(hours=1)  # 1 hour default (configurable)

def generate_token(me: str, client_id: str) -> str:
    """
    Generate access token and store hash in database.
    """
    # Generate token (returned to client, never stored)
    token = secrets.token_urlsafe(32)

    # Store only hash (irreversible)
    token_hash = hashlib.sha256(token.encode()).hexdigest()

    issued_at = datetime.utcnow()
    expires_at = issued_at + TOKEN_LIFETIME

    db.execute('''
        INSERT INTO tokens (token_hash, me, client_id, scope, issued_at, expires_at)
        VALUES (?, ?, ?, ?, ?, ?)
    ''', (token_hash, me, client_id, "", issued_at, expires_at))

    return token
```

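**Residual Risk**: Low. A stolen database yields only SHA-256 hashes, which cannot be reversed into usable tokens. A token intercepted in transit remains valid until it expires, which is why TLS and the short lifetime are both required.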

#### Threat: Timing Attacks on Token Verification

**Risk**: Attacker uses timing differences to guess valid tokens character-by-character.

**Mitigations**:
1. **Constant-Time Comparison**: Use `secrets.compare_digest()`
2. **Hash Comparison**: Compare hashes, not tokens
3. **Logging Delays**: Random delay on failed validation

**Implementation**:
```python
import hashlib
import secrets
from datetime import datetime
from typing import Optional

def verify_token(provided_token: str) -> Optional[dict]:
    """
    Verify access token using constant-time comparison.
    """
    # Hash provided token
    provided_hash = hashlib.sha256(provided_token.encode()).hexdigest()

    # Lookup in database
    token_data = db.query_one('''
        SELECT token_hash, me, client_id, scope, expires_at, revoked
        FROM tokens
        WHERE token_hash = ?
    ''', (provided_hash,))

    if not token_data:
        return None

    # Defense in depth: re-check the stored hash in constant time.
    # The SQL lookup already matched on the hash; because only SHA-256 digests
    # are compared, timing differences reveal nothing about the token itself.
    if not secrets.compare_digest(provided_hash, token_data['token_hash']):
        return None

    # Check expiration
    if datetime.utcnow() > token_data['expires_at']:
        return None

    # Check revocation
    if token_data.get('revoked'):
        return None

    return token_data
```

**Residual Risk**: Negligible.

## Input Validation

### URL Validation Security

**Critical**: Improper URL validation enables phishing and open redirect attacks.

#### Threat: Open Redirect via redirect_uri

**Risk**: Attacker tricks user into authorizing malicious redirect_uri, steals authorization code.

**Mitigations**:
1. **Domain Matching**: Require redirect_uri domain match client_id domain
2. **Subdomain Validation**: Allow subdomains of client_id domain
3. **Registered URIs**: Future feature to pre-register alternate domains
4. **User Warning**: Display warning if domains differ
5. **HTTPS Enforcement**: Require HTTPS for non-localhost

**Validation Logic**:
```python
from urllib.parse import urlparse

def validate_redirect_uri(redirect_uri: str, client_id: str, registered_uris: list) -> tuple[bool, str]:
    """
    Validate redirect_uri against client_id.
    Returns (is_valid, warning_message).
    """
    redirect_parsed = urlparse(redirect_uri)
    client_parsed = urlparse(client_id)

    # Reject URLs without a host (hostname is None for relative or malformed URLs)
    if not redirect_parsed.hostname or not client_parsed.hostname:
        return False, "redirect_uri and client_id must be absolute URLs"

    # Must be HTTPS (except localhost)
    if redirect_parsed.hostname != 'localhost':
        if redirect_parsed.scheme != 'https':
            return False, "redirect_uri must use HTTPS"

    redirect_domain = redirect_parsed.hostname.lower()
    client_domain = client_parsed.hostname.lower()

    # Exact match: OK
    if redirect_domain == client_domain:
        return True, ""

    # Subdomain: OK
    if redirect_domain.endswith('.' + client_domain):
        return True, ""

    # Registered URI: OK (future)
    if redirect_uri in registered_uris:
        return True, ""

    # Different domain: WARNING
    warning = f"Warning: Redirect to different domain ({redirect_domain})"
    return True, warning  # Allow but warn user
```

**Residual Risk**: Low, user must approve redirect with warning.

#### Threat: Phishing via Malicious client_id

**Risk**: Attacker uses client_id of legitimate-looking domain (typosquatting).

**Mitigations**:
1. **Display Full URL**: Show complete client_id to user, not just app name
2. **Fetch Verification**: Verify client_id is fetchable (real domain)
3. **Subdomain Check**: Warn if client_id is subdomain of well-known domain
4. **Certificate Validation**: Verify SSL certificate validity
5. **User Education**: Inform users to verify client_id carefully

**UI Display**:
```
Sign in to:
  Application Name (if available)
  https://client.example.com  ← Full URL always displayed

Redirect to:
  https://client.example.com/callback
```

**Residual Risk**: Moderate, requires user vigilance.

#### Threat: URL Parameter Injection

**Risk**: Attacker injects malicious parameters via crafted URLs.

**Mitigations**:
1. **Pydantic Validation**: Use Pydantic models for all parameters
2. **Type Enforcement**: Strict type checking (str, not any)
3. **Allowlist Validation**: Only accept expected parameters
4. **SQL Parameterization**: Use parameterized queries (prevent SQL injection)
5. **HTML Encoding**: Encode all user input in HTML responses

**Pydantic Models**:
```python
from typing import Literal

from pydantic import BaseModel, HttpUrl, Field

class AuthorizeRequest(BaseModel):
    me: HttpUrl
    client_id: HttpUrl
    redirect_uri: HttpUrl
    state: str = Field(min_length=1, max_length=512)
    response_type: Literal["code"]
    scope: str = ""  # Optional, ignored in v1.0.0

    class Config:
        extra = "forbid"  # Reject unknown parameters
```

**Residual Risk**: Minimal, Pydantic provides strong validation.

### HTML Parsing Security (rel="me" Discovery)

#### Threat: Malicious HTML Injection

**Risk**: Attacker's site contains malicious HTML to exploit parser.

**Mitigations**:
1. **Robust Parser**: Use BeautifulSoup (handles malformed HTML safely)
2. **Link Extraction Only**: Only extract href attributes, no script execution
3. **Timeout**: 10-second timeout for HTTP requests
4. **Size Limit**: Limit response size (prevent memory exhaustion)
5. **HTTPS Required**: Fetch over TLS only
6. **Certificate Validation**: Verify SSL certificates

**Implementation**:
```python
import logging
from typing import Optional

import requests
from bs4 import BeautifulSoup

logger = logging.getLogger(__name__)

def discover_email_from_site(domain: str) -> Optional[str]:
    """
    Safely discover email from rel="me" link.
    """
    try:
        # requests.get() has no max_redirects argument; the limit is set on a Session
        with requests.Session() as session:
            session.max_redirects = 5
            # Fetch with safety limits
            response = session.get(
                f"https://{domain}",
                timeout=10,
                allow_redirects=True,
                stream=True  # Don't load entire response into memory
            )
            response.raise_for_status()

            # Limit response size (prevent memory exhaustion)
            MAX_SIZE = 5 * 1024 * 1024  # 5MB
            # decode_content=True undoes gzip/deflate content encoding
            content = response.raw.read(MAX_SIZE, decode_content=True)

        # Parse HTML (BeautifulSoup handles malformed HTML safely)
        soup = BeautifulSoup(content, 'html.parser')

        # Find rel="me" links (no script execution)
        me_links = soup.find_all('link', rel='me') + soup.find_all('a', rel='me')

        # Extract mailto: links only
        for link in me_links:
            href = link.get('href', '')
            if href.lower().startswith('mailto:'):
                # Drop the scheme and any ?subject=... query part
                email = href[len('mailto:'):].split('?', 1)[0].strip()
                # Validate email format before returning
                if validate_email_format(email):
                    return email

        return None

    except requests.exceptions.SSLError as e:
        logger.error(f"SSL certificate validation failed for {domain}: {e}")
        return None
    except Exception as e:
        logger.error(f"Failed to discover email for {domain}: {e}")
        return None
```

**Residual Risk**: Very low. BeautifulSoup is designed for untrusted HTML.

### Email Validation

#### Threat: Email Injection Attacks

**Risk**: Attacker crafts malicious email address in rel="me" link.

**Mitigations**:
1. **Format Validation**: Strict email regex (RFC 5322)
2. **No User Input**: Email discovered from site (not user-provided)
3. **SMTP Library**: Use well-tested library (smtplib)
4. **Content Encoding**: Encode email content properly
5. **Rate Limiting**: Prevent abuse

**Validation**:
```python
import re

def validate_email_format(email: str) -> bool:
    """
    Validate email address format.
    """
    # Basic format check (RFC 5322 simplified)
    email_regex = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
    # fullmatch() rejects a trailing newline, which match() with '$' would accept
    if not re.fullmatch(email_regex, email):
        return False

    # Sanity checks
    if len(email) > 254:  # RFC 5321 maximum
        return False
    if email.count('@') != 1:
        return False

    return True
```

**Note**: Domain matching is NOT enforced in v1.0.0. User may have email at different domain than their identity site (e.g., phil@gmail.com for phil.example.com). This is acceptable as user explicitly publishes the email on their site.

**Residual Risk**: Low, standard validation patterns.

## Network Security

### TLS/HTTPS Enforcement

**Production Requirements**:
- All endpoints MUST use HTTPS
- Minimum TLS 1.2 (prefer TLS 1.3)
- Strong cipher suites only
- Valid SSL certificate (not self-signed)

**Configuration**:
```python
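# In production configuration
from starlette.middleware.httpsredirect import HTTPSRedirectMiddleware
# SecureHeadersMiddleware is not part of Starlette; it is assumed here to be
# a project-defined middleware that sets the headers listed below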
if not DEBUG:
    # Enforce HTTPS
    app.add_middleware(HTTPSRedirectMiddleware)

    # Add security headers
    app.add_middleware(
        SecureHeadersMiddleware,
        hsts="max-age=31536000; includeSubDomains",
        content_security_policy="default-src 'self'",
        x_frame_options="DENY",
        x_content_type_options="nosniff"
    )
```

**Development Exception**:
- HTTP allowed for `localhost` only
- Never in production

**Residual Risk**: Negligible if properly configured.

### Security Headers

**Required Headers**:

```http
# Prevent clickjacking
X-Frame-Options: DENY

# Prevent MIME sniffing
X-Content-Type-Options: nosniff

# XSS protection (legacy browsers)
X-XSS-Protection: 1; mode=block

# HSTS (HTTPS enforcement)
Strict-Transport-Security: max-age=31536000; includeSubDomains

# CSP (limit resource loading)
Content-Security-Policy: default-src 'self'; style-src 'self' 'unsafe-inline'

# Referrer policy (privacy)
Referrer-Policy: strict-origin-when-cross-origin
```

**Implementation**:
```python
@app.middleware("http")
async def add_security_headers(request: Request, call_next):
    response = await call_next(request)
    response.headers["X-Frame-Options"] = "DENY"
    response.headers["X-Content-Type-Options"] = "nosniff"
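    response.headers["X-XSS-Protection"] = "1; mode=block"
    response.headers["Content-Security-Policy"] = "default-src 'self'; style-src 'self' 'unsafe-inline'"
    if not DEBUG:
        # HSTS only in production, where HTTPS is enforced
        response.headers["Strict-Transport-Security"] = "max-age=31536000; includeSubDomains"
    response.headers["Referrer-Policy"] = "strict-origin-when-cross-origin"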
    return response
```

## Data Security

### Data Minimization (Privacy)

**Principle**: Collect and store ONLY essential data.

**Stored Data**:
- ✅ Domain name (user identity, required)
- ✅ Token hashes (security, required)
- ✅ Client IDs (protocol, required)
- ✅ Timestamps (auditing, required)

**Never Stored**:
- ❌ Email addresses (after verification)
- ❌ Plaintext tokens
- ❌ User-Agent strings
- ❌ IP addresses (except rate limiting, temporary)
- ❌ Browsing history
- ❌ Personal information

**Email Handling**:
```python
# Email discovered from rel="me" link (not user-provided)
# Stored ONLY during verification (in-memory, 15-min TTL)
verification_codes[code_id] = {
    "email": email,  # ← Discovered from site, exists ONLY here, NEVER in database
    "code": code,
    "domain": domain,
    "expires_at": datetime.utcnow() + timedelta(minutes=15)
}

# After verification: email is deleted, only domain + timestamp stored
db.execute('''
    INSERT INTO domains (domain, verification_method, verified_at, last_email_check)
    VALUES (?, 'two_factor', ?, ?)
''', (domain, datetime.utcnow(), datetime.utcnow()))
# Note: NO email address in database, only verification timestamp
```

**rel="me" Discovery**:
- Email addresses are public (user publishes on their site)
- Server fetches email from user's site (not user input)
- Reduces social engineering risk (can't claim arbitrary email)
- Follows IndieWeb standards for identity

### Database Security

**SQLite Security**:
1. **File Permissions**: 600 (owner read/write only)
2. **Encryption at Rest**: Use encrypted filesystem (LUKS, dm-crypt)
3. **Backup Encryption**: Encrypt backup files (GPG)
4. **SQL Injection Prevention**: Parameterized queries only

**Parameterized Queries**:
```python
# GOOD: Parameterized (safe)
db.execute(
    "SELECT * FROM tokens WHERE token_hash = ?",
    (token_hash,)
)

# BAD: String interpolation (vulnerable)
db.execute(
    f"SELECT * FROM tokens WHERE token_hash = '{token_hash}'"
)  # ← NEVER DO THIS
```

**File Permissions**:
```bash
# Set restrictive permissions
chmod 600 /data/gondulf.db
chown gondulf:gondulf /data/gondulf.db
```
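
The same requirement can be checked at application startup so that a misconfigured deployment fails early. This is a sketch only; `mode_is_owner_only` and `check_db_permissions` are hypothetical helper names, and the check assumes a POSIX filesystem.

```python
import os
import stat


def mode_is_owner_only(mode: int) -> bool:
    """True when no group or other permission bits are set (e.g. 0o600)."""
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def check_db_permissions(path: str) -> None:
    """Raise PermissionError if the database file is accessible by group or others."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if not mode_is_owner_only(mode):
        raise PermissionError(f"{path} has mode {oct(mode)}; expected 0o600")
```

Calling `check_db_permissions("/data/gondulf.db")` before opening the database follows the "Fail Securely" principle: the service refuses to start rather than run with a world-readable file.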

### Logging Security

**Principle**: Log security events, NEVER log sensitive data.

**Log Security Events**:
- ✅ Failed authentication attempts
- ✅ Authorization grants (domain + client_id)
- ✅ Token generation (hash prefix only)
- ✅ Email verification attempts
- ✅ DNS verification results
- ✅ Error conditions

**Never Log**:
- ❌ Email addresses (PII)
- ❌ Full access tokens
- ❌ Verification codes
- ❌ Authorization codes
- ❌ IP addresses (production)

**Safe Logging Examples**:
```python
# GOOD: Domain only (public information)
logger.info(f"Authorization granted for {domain} to {client_id}")

# GOOD: Token hash prefix for correlation (never any part of the token itself)
logger.debug(f"Token generated: hash {token_hash[:8]}...")

# GOOD: Error without sensitive data
logger.error(f"Email send failed for domain {domain}")

# BAD: Email address (PII)
logger.info(f"Verification sent to {email}")  # ← NEVER

# BAD: Full token (security)
logger.debug(f"Token: {token}")  # ← NEVER
```
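
As a safety net for mistakes like the BAD examples, a logging filter can redact sensitive values before they reach any handler. The sketch below is one possible approach, not an existing Gondulf component: `redact` and `RedactingFilter` are hypothetical names, and the patterns are heuristics that assume secrets are URL-safe strings of 40 or more characters.

```python
import logging
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# secrets.token_urlsafe(32) yields 43 URL-safe characters; treat any run of 40+ as a secret
TOKEN_RE = re.compile(r"[A-Za-z0-9_-]{40,}")


def redact(text: str) -> str:
    """Replace email addresses and long token-like strings with placeholders."""
    text = EMAIL_RE.sub("[email redacted]", text)
    return TOKEN_RE.sub("[secret redacted]", text)


class RedactingFilter(logging.Filter):
    """Rewrite each record's message so handlers only see redacted text."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())
        record.args = ()  # arguments are already merged into msg
        return True
```

Attaching it with `logger.addFilter(RedactingFilter())` complements, rather than replaces, the rule of never passing sensitive values to the logger. Short hash prefixes pass through unchanged, so correlation still works.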

## Dependency Security

### Dependency Management

**Principles**:
1. **Minimal Dependencies**: Prefer standard library
2. **Vetted Libraries**: Only well-maintained, popular libraries
3. **Version Pinning**: Pin exact versions in requirements.txt
4. **Security Scanning**: Regular vulnerability scanning
5. **Update Strategy**: Security patches applied promptly

**Security Scanning**:
```bash
# Scan for known vulnerabilities
uv run pip-audit

# Alternative: safety check
uv run safety check
```

**Update Policy**:
- **Security patches**: Apply within 24 hours (critical), 7 days (high)
- **Minor versions**: Review and test before updating
- **Major versions**: Evaluate breaking changes, test thoroughly

### Secrets Management

**Environment Variables** (v1.0.0):
```bash
# Required secrets
GONDULF_SECRET_KEY=<256-bit random value>
GONDULF_SMTP_PASSWORD=<SMTP password>

# Optional secrets
GONDULF_DATABASE_ENCRYPTION_KEY=<for encrypted backups>
```

**Secret Generation**:
```bash
# Generate SECRET_KEY (256 bits)
python -c "import secrets; print(secrets.token_urlsafe(32))"
```

**Storage**:
- Development: `.env` file (not committed)
- Production: Docker secrets or environment variables
- Never hardcode secrets in code
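
Loading the required secrets at startup might look like the sketch below. The variable names come from the list above; `load_secrets` is a hypothetical helper, and raising on a missing value is one way to apply the "Fail Securely" principle.

```python
import os
from typing import Mapping

REQUIRED_SECRETS = ("GONDULF_SECRET_KEY", "GONDULF_SMTP_PASSWORD")


def load_secrets(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the required secrets, raising if any is missing or empty."""
    missing = [name for name in REQUIRED_SECRETS if not env.get(name)]
    if missing:
        # Report variable names only; never echo secret values
        raise RuntimeError("Missing required secrets: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_SECRETS}
```

Passing the environment as a parameter keeps the function testable without modifying the real process environment.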

**Future**: Integrate with HashiCorp Vault or AWS Secrets Manager.

## Rate Limiting (Future)

**v1.0.0**: Not implemented (acceptable for small deployments).

**Future Implementation**:

| Endpoint | Limit | Window | Key |
|----------|-------|--------|-----|
| /authorize | 10 requests | 1 minute | IP |
| /token | 30 requests | 1 minute | client_id |
| Email verification | 3 codes | 1 hour | email |
| Code submission | 3 attempts | 15 minutes | session |

**Implementation Strategy**:
- Use Redis for distributed rate limiting
- Token bucket algorithm
- Exponential backoff on failures
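
The token bucket algorithm named above can be illustrated with a single-process sketch. It is an assumption-laden example rather than the planned design: the `TokenBucket` class is hypothetical, state lives in memory instead of Redis, and exponential backoff is not shown.

```python
import time
from typing import Optional


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `refill_per_second`."""

    def __init__(self, capacity: int, refill_per_second: float,
                 now: Optional[float] = None):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)  # start with a full bucket
        self.updated_at = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        """Consume one token if available; return False when the bucket is empty."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated_at)
        # Refill in proportion to elapsed time, never above capacity
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

For the `/authorize` row in the table, one bucket per client IP created as `TokenBucket(10, 10 / 60)` would permit 10 requests in a burst and an average of 10 per minute afterwards.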

## Security Testing

### Required Security Tests

1. **Input Validation**:
   - Malformed URLs (me, client_id, redirect_uri)
   - SQL injection attempts
   - XSS attempts
   - Email injection

2. **Authentication**:
   - Expired code rejection
   - Used code rejection
   - Invalid code rejection
   - Brute force resistance

3. **Authorization**:
   - State parameter validation
   - Redirect URI validation
   - Open redirect prevention

4. **Token Security**:
   - Timing attack resistance
   - Token theft scenarios
   - Expiration enforcement

5. **TLS/HTTPS**:
   - HTTP rejection in production
   - Security headers presence
   - Certificate validation

### Security Scanning Tools

**Required Tools**:
- `bandit`: Python security linter
- `pip-audit`: Dependency vulnerability scanner
- `pytest`: Security-focused test cases

**CI/CD Integration**:
```yaml
# GitHub Actions example
security:
  - name: Run Bandit
    run: uv run bandit -r src/gondulf

  - name: Scan Dependencies
    run: uv run pip-audit

  - name: Run Security Tests
    run: uv run pytest tests/security/
```

## Incident Response

### Security Event Monitoring

**Monitor For**:
1. Multiple failed authentication attempts
2. Authorization code reuse attempts
3. Invalid token presentation
4. Unusual DNS verification failures
5. Email send failures (potential abuse)

**Alerting** (future):
- Admin email on critical events
- Webhook integration (Slack, Discord)
- Metrics dashboard (Grafana)

### Breach Response Plan (Future)

**If Access Tokens Compromised**:
1. Revoke all active tokens
2. Force re-authentication
3. Notify affected users (via domain)
4. Rotate SECRET_KEY
5. Audit logs for suspicious activity

**If Database Compromised**:
1. Assess data exposure (only hashes + domains)
2. Rotate all tokens
3. Review access logs
4. Notify users if domains exposed

## Compliance Considerations

### GDPR Compliance

**Personal Data Stored**:
- Domain names (considered PII in some jurisdictions)
- Timestamps (associated with domains)

**GDPR Rights**:
- **Right to Access**: Admin can query database
- **Right to Erasure**: Admin can delete domain records
- **Right to Portability**: Data export feature (future)

**Privacy Policy** (required):
- Document what data is collected (domains, timestamps)
- Document how data is used (authentication)
- Document retention policy (indefinite unless deleted)
- Provide contact for data requests

### Security Disclosure

**Security Policy** (future):
- Responsible disclosure process
- Security contact (security@domain)
- GPG key for encrypted reports
- Acknowledgments for researchers

## Security Roadmap

### v1.0.0 (MVP)
- ✅ Two-factor domain verification (DNS TXT + Email via rel="me")
- ✅ rel="me" email discovery (IndieWeb standard)
- ✅ HTML parsing security (BeautifulSoup)
- ✅ TLS/HTTPS enforcement
- ✅ Secure token generation (opaque, hashed)
- ✅ URL validation (open redirect prevention)
- ✅ Input validation (Pydantic)
- ✅ Security headers
- ✅ Minimal data collection (no email storage)

### v1.1.0
- PKCE support (code challenge/verifier)
- Rate limiting (Redis-based)
- Token revocation endpoint
- Enhanced logging

### v1.2.0
- WebAuthn support (passwordless)
- Hardware security key support
- Admin dashboard (audit logs)
- Security metrics

### v2.0.0
- Multi-factor authentication
- Federated identity providers
- Advanced threat detection
- SOC 2 compliance preparation

## References

- OWASP Top 10: https://owasp.org/www-project-top-ten/
- OAuth 2.0 Security Best Practices: https://datatracker.ietf.org/doc/html/draft-ietf-oauth-security-topics
- NIST Cybersecurity Framework: https://www.nist.gov/cyberframework
- CWE Top 25: https://cwe.mitre.org/top25/