Gondulf/docs/architecture/security.md
Phil Skentelbery 6f06aebf40 docs: add Phase 2 domain verification design and clarifications
Add comprehensive Phase 2 documentation:
- Complete design document for two-factor domain verification
- Implementation guide with code examples
- ADR for implementation decisions (ADR-0004)
- ADR for rel="me" email discovery (ADR-008)
- Phase 1 impact assessment
- All 23 clarification questions answered
- Updated architecture docs (indieauth-protocol, security)
- Updated ADR-005 with rel="me" approach
- Updated backlog with technical debt items

Design ready for Phase 2 implementation.

Generated with Claude Code https://claude.com/claude-code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 13:05:09 -07:00

# Security Architecture
## Security Philosophy
Gondulf follows a defense-in-depth security model with these core principles:
1. **Secure by Default**: Security features enabled out of the box
2. **Fail Securely**: Errors default to denying access, not granting it
3. **Least Privilege**: Collect and store minimum necessary data
4. **Transparency**: Security decisions documented and auditable
5. **Standards Compliance**: Follow OAuth 2.0 and IndieAuth security best practices
## Threat Model
### Assets to Protect
**Primary Assets**:
- User domain identities (the `me` parameter)
- Access tokens (prove user identity to clients)
- Authorization codes (short-lived, exchange for tokens)
**Secondary Assets**:
- Email verification codes (prove email ownership)
- Domain verification status (cached TXT record checks)
- Client metadata (cached application information)
**Explicitly NOT Protected** (by design):
- Passwords (none stored)
- Personal user data beyond domain (privacy principle)
- Client secrets (OAuth 2.0 public clients)
### Threat Actors
**External Attackers**:
- Phishing attempts (fake clients)
- Token theft (network interception)
- Open redirect exploitation
- CSRF attacks
- Brute force attacks (code guessing)
**Compromised Clients**:
- Malicious client applications
- Client impersonation
- Redirect URI manipulation
**System Compromise**:
- Database access (SQLite file theft)
- Server memory access (in-memory code theft)
- Log file access (token exposure)
### Out of Scope (v1.0.0)
- DDoS attacks (handled by infrastructure)
- Zero-day vulnerabilities in dependencies
- Physical access to server
- Social engineering attacks on users
- DNS hijacking (external to application)
## Authentication Security
### Two-Factor Domain Verification (v1.0.0)
**Mechanism**: Users prove domain ownership through TWO independent factors:
1. **DNS TXT Record**: Proves DNS control (`_gondulf.{domain}` = `verified`)
2. **Email via rel="me"**: Proves email control (discovered from site's rel="me" link)
**Security Model**: An attacker must compromise BOTH factors to authenticate fraudulently. This is significantly stronger than single-factor verification.
#### Threat: Email Interception
**Risk**: Attacker intercepts email containing verification code.
**Mitigations**:
1. **Two-Factor Requirement**: Email alone is insufficient (DNS also required)
2. **Short Code Lifetime**: 15-minute expiration
3. **Single Use**: Code invalidated after verification
4. **Rate Limiting**: Max 3 code requests per domain per hour
5. **TLS Email Delivery**: Require STARTTLS for SMTP
6. **Display Warning**: "Only request code if you initiated this login"
**Residual Risk**: Low. Even with email interception, attacker still needs DNS control.
#### Threat: Code Brute Force
**Risk**: Attacker guesses 6-digit verification code.
**Mitigations**:
1. **Two-Factor Requirement**: Code alone is insufficient (DNS also required)
2. **Sufficient Entropy**: 1,000,000 possible codes (6 digits)
3. **Attempt Limiting**: Max 3 attempts per email
4. **Short Lifetime**: 15-minute window
5. **Rate Limiting**: Max 3 codes per domain per hour
6. **Single-Use**: Code invalidated after use
**Math**:
- 3 attempts ÷ 1,000,000 codes = 0.0003% success probability per issued code
- 15-minute window limits attack time
- Even if guessed, attacker still needs DNS control
**Residual Risk**: Very low. Two-factor requirement makes brute force insufficient.
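The mitigations above (15-minute lifetime, single use, three attempts) can be sketched as a small in-memory store. This is an illustrative sketch, not the project's implementation: `VerificationCodeStore`, `PendingCode`, and `guess_probability` are hypothetical names, and the injectable `clock` exists only to make expiry testable.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Callable, Dict

CODE_TTL_SECONDS = 15 * 60  # 15-minute lifetime
MAX_ATTEMPTS = 3            # guesses allowed per issued code


@dataclass
class PendingCode:
    code: str
    expires_at: float
    attempts: int = 0


class VerificationCodeStore:
    """Hypothetical in-memory store for email verification codes."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._pending: Dict[str, PendingCode] = {}

    def issue(self, domain: str) -> str:
        # secrets.randbelow draws uniformly from 0..999999 using the OS CSPRNG
        code = f"{secrets.randbelow(10**6):06d}"
        self._pending[domain] = PendingCode(code, self._clock() + CODE_TTL_SECONDS)
        return code

    def verify(self, domain: str, submitted: str) -> bool:
        entry = self._pending.get(domain)
        if entry is None:
            return False
        if self._clock() >= entry.expires_at:
            del self._pending[domain]  # expired
            return False
        entry.attempts += 1
        # Compare as bytes so non-ASCII input cannot raise TypeError
        ok = secrets.compare_digest(entry.code.encode(), submitted.encode())
        if ok or entry.attempts >= MAX_ATTEMPTS:
            del self._pending[domain]  # single use, or attempts exhausted
        return ok


def guess_probability(attempts: int = MAX_ATTEMPTS, digits: int = 6) -> float:
    """Chance that `attempts` distinct guesses hit one uniformly random code."""
    return attempts / 10**digits
```

With the defaults, `guess_probability()` evaluates to 3 / 1,000,000, which matches the 0.0003% figure in the math above.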
#### Threat: DNS TXT Record Spoofing
**Risk**: Attacker attempts to spoof DNS responses.
**Mitigations**:
1. **Multiple Resolvers**: Query 2+ independent DNS servers (Google, Cloudflare)
2. **Consensus Required**: Require agreement from at least 2 resolvers
3. **DNSSEC Support**: Validate DNSSEC signatures when available (future)
4. **Timeout Handling**: Fail securely if DNS unavailable
5. **Logging**: Log all DNS verification attempts
**Residual Risk**: Low. Spoofing multiple independent resolvers is difficult.
#### Threat: rel="me" Link Spoofing
**Risk**: Attacker compromises user's website to add malicious rel="me" link.
**Mitigations**:
1. **Two-Factor Requirement**: Website compromise alone insufficient (DNS also required)
2. **HTTPS Required**: Fetch site over TLS (prevents MITM)
3. **Certificate Validation**: Verify SSL certificate
4. **Email Domain Matching**: Email should match site domain (warning if not)
5. **User Education**: Inform users to secure their website
**Residual Risk**: Moderate. If attacker compromises both DNS and website, they can authenticate. This is acceptable as it represents full domain compromise.
#### Threat: Email Address Enumeration
**Risk**: Attacker discovers email addresses by triggering rel="me" discovery.
**Mitigations**:
1. **Public Information**: rel="me" links are intentionally public
2. **User Awareness**: Users know they're publishing email on their site
3. **Rate Limiting**: Prevent bulk scanning
4. **Robots.txt**: Users can restrict crawler access if desired
**Residual Risk**: Minimal. Email addresses are intentionally published by users on their own sites.
### Domain Ownership Verification (Two-Factor)
**Mechanism**: v1.0.0 requires BOTH verification methods:
#### 1. TXT Record Validation (Required)
**Mechanism**: Admin adds DNS TXT record `_gondulf.{domain}` = `verified`.
**Security Properties**:
- Proves DNS control (first factor)
- Verifiable without user interaction
- Cacheable for performance
- Re-verifiable periodically
**Implementation**:
```python
import logging

import dns.resolver

logger = logging.getLogger(__name__)


def verify_txt_record(domain: str) -> bool:
    """
    Verify _gondulf.{domain} TXT record exists with value 'verified'.
    Requires consensus from multiple independent resolvers.
    """
    try:
        # Use Google and Cloudflare DNS for redundancy
        resolvers = ['8.8.8.8', '1.1.1.1']
        verified_count = 0
        for resolver_ip in resolvers:
            resolver = dns.resolver.Resolver()
            resolver.nameservers = [resolver_ip]
            resolver.timeout = 5   # per-nameserver wait
            resolver.lifetime = 5  # total time allowed for the query
            answers = resolver.resolve(f'_gondulf.{domain}', 'TXT')
            for rdata in answers:
                txt_value = rdata.to_text().strip('"')
                if txt_value == 'verified':
                    verified_count += 1
                    break
        # Require consensus from at least 2 resolvers
        return verified_count >= 2
    except Exception as e:
        # Any resolver failure (timeout, NXDOMAIN, no answer) fails securely
        logger.warning(f"DNS verification failed for {domain}: {e}")
        return False
```
#### 2. Email Verification via rel="me" (Required)
**Mechanism**: Email discovered from site's `<link rel="me" href="mailto:...">`, then verified with code.
**Security Properties**:
- Proves website control (can modify HTML)
- Proves email control (receives and enters code)
- Follows IndieWeb standards (rel="me")
- Self-documenting (user declares email publicly)
**Implementation**:
```python
import logging
from typing import Optional

import requests
from bs4 import BeautifulSoup

logger = logging.getLogger(__name__)


def discover_email_from_site(domain: str) -> Optional[str]:
    """
    Fetch site and discover email from rel="me" link.
    """
    try:
        response = requests.get(f"https://{domain}", timeout=10, allow_redirects=True)
        response.raise_for_status()
        soup = BeautifulSoup(response.content, 'html.parser')
        me_links = soup.find_all('link', rel='me') + soup.find_all('a', rel='me')
        for link in me_links:
            href = link.get('href', '')
            if href.startswith('mailto:'):
                # Drop the scheme and any ?subject=... query part
                email = href[len('mailto:'):].split('?', 1)[0].strip()
                # validate_email_format is defined under "Email Validation"
                if validate_email_format(email):
                    return email
        return None
    except Exception as e:
        logger.error(f"Failed to discover email for {domain}: {e}")
        return None
```
**Combined Residual Risk**: Low. Attacker must compromise DNS, website, and email account to authenticate fraudulently.
## Authorization Security
### Authorization Code Security
**Properties**:
- **Length**: 32 bytes (256 bits of entropy)
- **Generation**: `secrets.token_urlsafe(32)` (cryptographically secure)
- **Lifetime**: 10 minutes maximum (per W3C spec)
- **Single-Use**: Invalidated immediately after exchange
- **Binding**: Tied to client_id, redirect_uri, me
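The properties above can be illustrated with a minimal sketch of issuing a code and redeeming it against its bound parameters. The function names and the module-level `_codes` dictionary are assumptions made for illustration; a real deployment would use the project's own code storage.

```python
import secrets
import time
from typing import Dict, Optional

AUTH_CODE_TTL_SECONDS = 10 * 60  # 10-minute maximum lifetime

# Hypothetical in-memory store: code -> parameters the code is bound to
_codes: Dict[str, dict] = {}


def issue_authorization_code(
    client_id: str, redirect_uri: str, me: str, now: Optional[float] = None
) -> str:
    issued = time.time() if now is None else now
    code = secrets.token_urlsafe(32)  # 32 random bytes -> 43 URL-safe characters
    _codes[code] = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "me": me,
        "expires_at": issued + AUTH_CODE_TTL_SECONDS,
    }
    return code


def redeem_authorization_code(
    code: str, client_id: str, redirect_uri: str, now: Optional[float] = None
) -> Optional[str]:
    """Return the bound `me` if the code is valid for this client, else None."""
    current = time.time() if now is None else now
    # pop() removes the code on the first attempt, so even a failed
    # redemption burns it (single use, fail securely)
    data = _codes.pop(code, None)
    if data is None:
        return None
    if current >= data["expires_at"]:
        return None
    if data["client_id"] != client_id or data["redirect_uri"] != redirect_uri:
        return None
    return data["me"]
```

Removing the code on any redemption attempt is a deliberate choice in this sketch: a mismatched `client_id` or `redirect_uri` invalidates the code rather than leaving it available for another try.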
#### Threat: Authorization Code Interception
**Risk**: Attacker intercepts code from redirect URL.
**Mitigations (v1.0.0)**:
1. **HTTPS Only**: Enforce TLS for all communications
2. **Short Lifetime**: 10-minute expiration
3. **Single Use**: Code invalidated after first use
4. **State Binding**: Client validates state parameter (CSRF protection)
**Mitigations (Future - PKCE)**:
1. **Code Challenge**: Client sends hash of secret with auth request
2. **Code Verifier**: Client proves knowledge of secret on token exchange
3. **No Interception Value**: Code useless without original secret
**ADR-003 Decision**: PKCE deferred to v1.1.0 to maintain MVP simplicity.
**Residual Risk**: Low with HTTPS + short lifetime, minimal with PKCE (future).
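For reference, the future PKCE mitigation could look roughly like the following sketch of the `S256` method from RFC 7636. It is not part of v1.0.0, and the function names are hypothetical; only the transformation itself (base64url of the SHA-256 digest, without padding) comes from the RFC.

```python
import base64
import hashlib
import secrets


def make_code_verifier() -> str:
    # 32 random bytes -> 43-character URL-safe string (RFC 7636 allows 43-128)
    return secrets.token_urlsafe(32)


def code_challenge_s256(verifier: str) -> str:
    # code_challenge = BASE64URL(SHA256(ASCII(code_verifier))), padding stripped
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def verify_pkce(verifier: str, stored_challenge: str) -> bool:
    """Check the verifier sent to the token endpoint against the stored challenge."""
    try:
        computed = code_challenge_s256(verifier)
    except UnicodeEncodeError:
        return False  # verifier contains characters outside ASCII
    return secrets.compare_digest(computed.encode(), stored_challenge.encode())
```

The client would send `code_challenge_s256(verifier)` with the authorization request and the raw verifier with the token request; an intercepted authorization code is then useless without the verifier.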
#### Threat: Code Replay Attack
**Risk**: Attacker reuses previously valid authorization code.
**Mitigations**:
1. **Single-Use Enforcement**: Mark code as used in storage
2. **Immediate Invalidation**: Delete code after exchange
3. **Concurrent Use Detection**: Log warning if used code presented again
**Implementation**:
```python
from typing import Optional


def exchange_code(code: str) -> Optional[str]:
    """
    Exchange authorization code for token.
    Returns None if code invalid, expired, or already used.
    """
    # Retrieve code data (code_storage and logger are defined elsewhere)
    code_data = code_storage.get(code)
    if not code_data:
        logger.warning("Code not found or expired")
        return None
    # Check if already used
    if code_data.get('used'):
        logger.error(f"Code replay attack detected: {code[:8]}...")
        # SECURITY: Potential replay attack, alert admin
        return None
    # Mark as used IMMEDIATELY (before token generation)
    # NOTE: this get-then-set is not atomic; concurrent requests need a lock
    # or an atomic storage operation to guarantee single use
    code_data['used'] = True
    code_storage.set(code, code_data)
    # Generate token for the identity and client bound to this code
    return generate_token(code_data['me'], code_data['client_id'])
```
**Residual Risk**: Negligible.
### Access Token Security
**Properties**:
- **Format**: Opaque tokens (v1.0.0), not JWT
- **Length**: 32 bytes (256 bits of entropy)
- **Generation**: `secrets.token_urlsafe(32)`
- **Storage**: SHA-256 hash only (never plaintext)
- **Lifetime**: 1 hour default (configurable)
- **Transmission**: HTTPS only, Bearer authentication
#### Threat: Token Theft
**Risk**: Attacker steals access token from storage or transmission.
**Mitigations**:
1. **TLS Enforcement**: HTTPS only in production
2. **Hashed Storage**: Store SHA-256 hash, not plaintext
3. **Short Lifetime**: 1-hour expiration (configurable)
4. **Revocation**: Admin can revoke tokens (future)
5. **Secure Headers**: Set Cache-Control: no-store, Pragma: no-cache
**Token Storage**:
```python
import hashlib
import secrets
from datetime import datetime, timedelta

TOKEN_LIFETIME = timedelta(hours=1)  # 1 hour default (configurable)


def generate_token(me: str, client_id: str) -> str:
    """
    Generate access token and store hash in database.
    """
    # Generate token (returned to client, never stored)
    token = secrets.token_urlsafe(32)
    # Store only hash (irreversible)
    token_hash = hashlib.sha256(token.encode()).hexdigest()
    issued_at = datetime.utcnow()
    expires_at = issued_at + TOKEN_LIFETIME
    db.execute('''
        INSERT INTO tokens (token_hash, me, client_id, scope, issued_at, expires_at)
        VALUES (?, ?, ?, ?, ?, ?)
    ''', (token_hash, me, client_id, "", issued_at, expires_at))
    return token
```
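**Residual Risk**: Low. A stolen database yields only SHA-256 hashes, which cannot be presented as bearer tokens; a token intercepted in transit remains usable until it expires.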
#### Threat: Timing Attacks on Token Verification
**Risk**: Attacker uses timing differences to guess valid tokens character-by-character.
**Mitigations**:
1. **Constant-Time Comparison**: Use `secrets.compare_digest()`
2. **Hash Comparison**: Compare hashes, not tokens
3. **Logging Delays**: Random delay on failed validation
**Implementation**:
```python
import hashlib
import secrets
from datetime import datetime
from typing import Optional


def verify_token(provided_token: str) -> Optional[dict]:
    """
    Verify access token using constant-time comparison.
    """
    # Hash provided token
    provided_hash = hashlib.sha256(provided_token.encode()).hexdigest()
    # Lookup in database
    token_data = db.query_one('''
        SELECT token_hash, me, client_id, scope, expires_at, revoked
        FROM tokens
        WHERE token_hash = ?
    ''', (provided_hash,))
    if not token_data:
        return None
    # Defense in depth: the SQL lookup already matched on the hash, but
    # re-check the stored hash with a constant-time comparison
    if not secrets.compare_digest(token_data['token_hash'], provided_hash):
        return None
    # Check expiration
    if datetime.utcnow() > token_data['expires_at']:
        return None
    # Check revocation
    if token_data.get('revoked'):
        return None
    return token_data
```
**Residual Risk**: Negligible.
## Input Validation
### URL Validation Security
**Critical**: Improper URL validation enables phishing and open redirect attacks.
#### Threat: Open Redirect via redirect_uri
**Risk**: Attacker tricks user into authorizing malicious redirect_uri, steals authorization code.
**Mitigations**:
1. **Domain Matching**: Require redirect_uri domain match client_id domain
2. **Subdomain Validation**: Allow subdomains of client_id domain
3. **Registered URIs**: Future feature to pre-register alternate domains
4. **User Warning**: Display warning if domains differ
5. **HTTPS Enforcement**: Require HTTPS for non-localhost
**Validation Logic**:
```python
from urllib.parse import urlparse


def validate_redirect_uri(redirect_uri: str, client_id: str, registered_uris: list) -> tuple[bool, str]:
    """
    Validate redirect_uri against client_id.
    Returns (is_valid, warning_message).
    """
    redirect_parsed = urlparse(redirect_uri)
    client_parsed = urlparse(client_id)
    # Reject URLs without a hostname (fail securely instead of raising)
    if not redirect_parsed.hostname or not client_parsed.hostname:
        return False, "redirect_uri and client_id must include a hostname"
    redirect_domain = redirect_parsed.hostname.lower()
    client_domain = client_parsed.hostname.lower()
    # Must be HTTPS (except localhost)
    if redirect_domain != 'localhost' and redirect_parsed.scheme != 'https':
        return False, "redirect_uri must use HTTPS"
    # Exact match: OK
    if redirect_domain == client_domain:
        return True, ""
    # Subdomain: OK
    if redirect_domain.endswith('.' + client_domain):
        return True, ""
    # Registered URI: OK (future)
    if redirect_uri in registered_uris:
        return True, ""
    # Different domain: WARNING
    warning = f"Warning: Redirect to different domain ({redirect_domain})"
    return True, warning  # Allow but warn user
```
**Residual Risk**: Low, user must approve redirect with warning.
#### Threat: Phishing via Malicious client_id
**Risk**: Attacker uses client_id of legitimate-looking domain (typosquatting).
**Mitigations**:
1. **Display Full URL**: Show complete client_id to user, not just app name
2. **Fetch Verification**: Verify client_id is fetchable (real domain)
3. **Subdomain Check**: Warn if client_id is subdomain of well-known domain
4. **Certificate Validation**: Verify SSL certificate validity
5. **User Education**: Inform users to verify client_id carefully
**UI Display**:
```
Sign in to:
  Application Name (if available)
  https://client.example.com  ← Full URL always displayed
Redirect to:
  https://client.example.com/callback
```
**Residual Risk**: Moderate, requires user vigilance.
#### Threat: URL Parameter Injection
**Risk**: Attacker injects malicious parameters via crafted URLs.
**Mitigations**:
1. **Pydantic Validation**: Use Pydantic models for all parameters
2. **Type Enforcement**: Strict type checking (str, not any)
3. **Allowlist Validation**: Only accept expected parameters
4. **SQL Parameterization**: Use parameterized queries (prevent SQL injection)
5. **HTML Encoding**: Encode all user input in HTML responses
**Pydantic Models**:
```python
from typing import Literal

from pydantic import BaseModel, HttpUrl, Field


class AuthorizeRequest(BaseModel):
    me: HttpUrl
    client_id: HttpUrl
    redirect_uri: HttpUrl
    state: str = Field(min_length=1, max_length=512)
    response_type: Literal["code"]
    scope: str = ""  # Optional, ignored in v1.0.0

    class Config:
        extra = "forbid"  # Reject unknown parameters
```
**Residual Risk**: Minimal, Pydantic provides strong validation.
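The HTML-encoding mitigation can be illustrated with the standard library. This is a minimal sketch assuming the consent page is assembled by hand; `render_client_prompt` is a hypothetical helper, and a template engine with auto-escaping enabled would serve the same purpose.

```python
import html


def render_client_prompt(client_id: str, app_name: str = "") -> str:
    """Build the consent prompt with every user-influenced value escaped."""
    # quote=True also escapes " and ', so values are safe inside attributes
    safe_id = html.escape(client_id, quote=True)
    safe_name = html.escape(app_name, quote=True)
    return f"<p>Sign in to: <strong>{safe_name}</strong> <code>{safe_id}</code></p>"
```

A `client_id` that carries markup is rendered as inert text, for example `<script>` becomes `&lt;script&gt;`.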
### HTML Parsing Security (rel="me" Discovery)
#### Threat: Malicious HTML Injection
**Risk**: Attacker's site contains malicious HTML to exploit parser.
**Mitigations**:
1. **Robust Parser**: Use BeautifulSoup (handles malformed HTML safely)
2. **Link Extraction Only**: Only extract href attributes, no script execution
3. **Timeout**: 10-second timeout for HTTP requests
4. **Size Limit**: Limit response size (prevent memory exhaustion)
5. **HTTPS Required**: Fetch over TLS only
6. **Certificate Validation**: Verify SSL certificates
**Implementation**:
```python
import logging
from typing import Optional

import requests
from bs4 import BeautifulSoup

logger = logging.getLogger(__name__)

MAX_SIZE = 5 * 1024 * 1024  # 5MB


def discover_email_from_site(domain: str) -> Optional[str]:
    """
    Safely discover email from rel="me" link.
    """
    try:
        # The redirect limit is a Session attribute; requests.get() does not
        # accept a max_redirects argument
        with requests.Session() as session:
            session.max_redirects = 5
            # Fetch with safety limits
            response = session.get(
                f"https://{domain}",
                timeout=10,
                allow_redirects=True,
                stream=True,  # Don't load entire response into memory
            )
            response.raise_for_status()
            # Limit response size (prevent memory exhaustion);
            # decode_content=True undoes gzip/deflate transfer encoding
            content = response.raw.read(MAX_SIZE, decode_content=True)
        # Parse HTML (BeautifulSoup handles malformed HTML safely)
        soup = BeautifulSoup(content, 'html.parser')
        # Find rel="me" links (no script execution)
        me_links = soup.find_all('link', rel='me') + soup.find_all('a', rel='me')
        # Extract mailto: links only
        for link in me_links:
            href = link.get('href', '')
            if href.startswith('mailto:'):
                # Drop the scheme and any ?subject=... query part
                email = href[len('mailto:'):].split('?', 1)[0].strip()
                # Validate email format before returning
                if validate_email_format(email):
                    return email
        return None
    except requests.exceptions.SSLError as e:
        logger.error(f"SSL certificate validation failed for {domain}: {e}")
        return None
    except Exception as e:
        logger.error(f"Failed to discover email for {domain}: {e}")
        return None
```
**Residual Risk**: Very low. BeautifulSoup is designed for untrusted HTML.
### Email Validation
#### Threat: Email Injection Attacks
**Risk**: Attacker crafts malicious email address in rel="me" link.
**Mitigations**:
1. **Format Validation**: Strict email regex (RFC 5322)
2. **No User Input**: Email discovered from site (not user-provided)
3. **SMTP Library**: Use well-tested library (smtplib)
4. **Content Encoding**: Encode email content properly
5. **Rate Limiting**: Prevent abuse
**Validation**:
```python
import re


def validate_email_format(email: str) -> bool:
    """
    Validate email address format.
    """
    # Basic format check (RFC 5322 simplified)
    email_regex = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
    # fullmatch rejects a trailing newline, which match() with '$' would accept
    if not re.fullmatch(email_regex, email):
        return False
    # Sanity checks
    if len(email) > 254:  # RFC 5321 maximum
        return False
    if email.count('@') != 1:
        return False
    return True
```
**Note**: Domain matching is NOT enforced in v1.0.0. User may have email at different domain than their identity site (e.g., phil@gmail.com for phil.example.com). This is acceptable as user explicitly publishes the email on their site.
**Residual Risk**: Low, standard validation patterns.
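To illustrate the content-encoding mitigation, the sketch below builds the verification message with the standard library's `EmailMessage` and rejects header values containing CR or LF before they reach SMTP. The function name, subject line, and sender address are assumptions, not the project's actual values.

```python
from email.message import EmailMessage


def build_verification_email(sender: str, recipient: str, code: str) -> EmailMessage:
    """Compose the verification email, refusing header-injection attempts."""
    for value in (sender, recipient):
        # A CR or LF in an address could smuggle extra headers such as Bcc
        if "\r" in value or "\n" in value:
            raise ValueError("header values must not contain CR or LF")
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Your Gondulf verification code"  # hypothetical subject line
    msg.set_content(f"Your verification code is {code}. It expires in 15 minutes.")
    return msg
```

The resulting object can be handed to `smtplib.SMTP.send_message()` after `starttls()` has been negotiated.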
## Network Security
### TLS/HTTPS Enforcement
**Production Requirements**:
- All endpoints MUST use HTTPS
- Minimum TLS 1.2 (prefer TLS 1.3)
- Strong cipher suites only
- Valid SSL certificate (not self-signed)
**Configuration**:
```python
# In production configuration
if not DEBUG:
    # Enforce HTTPS
    app.add_middleware(HTTPSRedirectMiddleware)
    # Add security headers
    app.add_middleware(
        SecureHeadersMiddleware,
        hsts="max-age=31536000; includeSubDomains",
        content_security_policy="default-src 'self'",
        x_frame_options="DENY",
        x_content_type_options="nosniff"
    )
```
**Development Exception**:
- HTTP allowed for `localhost` only
- Never in production
**Residual Risk**: Negligible if properly configured.
### Security Headers
**Required Headers**:
```http
# Prevent clickjacking
X-Frame-Options: DENY
# Prevent MIME sniffing
X-Content-Type-Options: nosniff
# XSS protection (legacy browsers)
X-XSS-Protection: 1; mode=block
# HSTS (HTTPS enforcement)
Strict-Transport-Security: max-age=31536000; includeSubDomains
# CSP (limit resource loading)
Content-Security-Policy: default-src 'self'; style-src 'self' 'unsafe-inline'
# Referrer policy (privacy)
Referrer-Policy: strict-origin-when-cross-origin
```
**Implementation**:
```python
@app.middleware("http")
async def add_security_headers(request: Request, call_next):
    response = await call_next(request)
    response.headers["X-Frame-Options"] = "DENY"
    response.headers["X-Content-Type-Options"] = "nosniff"
    response.headers["X-XSS-Protection"] = "1; mode=block"
    response.headers["Content-Security-Policy"] = "default-src 'self'; style-src 'self' 'unsafe-inline'"
    if not DEBUG:
        # HSTS only makes sense when the site is served over HTTPS
        response.headers["Strict-Transport-Security"] = "max-age=31536000; includeSubDomains"
    response.headers["Referrer-Policy"] = "strict-origin-when-cross-origin"
    return response
```
## Data Security
### Data Minimization (Privacy)
**Principle**: Collect and store ONLY essential data.
**Stored Data**:
- ✅ Domain name (user identity, required)
- ✅ Token hashes (security, required)
- ✅ Client IDs (protocol, required)
- ✅ Timestamps (auditing, required)
**Never Stored**:
- ❌ Email addresses (after verification)
- ❌ Plaintext tokens
- ❌ User-Agent strings
- ❌ IP addresses (except rate limiting, temporary)
- ❌ Browsing history
- ❌ Personal information
**Email Handling**:
```python
# Email discovered from rel="me" link (not user-provided)
# Stored ONLY during verification (in-memory, 15-min TTL)
verification_codes[code_id] = {
    "email": email,  # ← Discovered from site, exists ONLY here, NEVER in database
    "code": code,
    "domain": domain,
    "expires_at": datetime.utcnow() + timedelta(minutes=15)
}
# After verification: email is deleted, only domain + timestamp stored
db.execute('''
    INSERT INTO domains (domain, verification_method, verified_at, last_email_check)
    VALUES (?, 'two_factor', ?, ?)
''', (domain, datetime.utcnow(), datetime.utcnow()))
# Note: NO email address in database, only verification timestamp
```
**rel="me" Discovery**:
- Email addresses are public (user publishes on their site)
- Server fetches email from user's site (not user input)
- Reduces social engineering risk (can't claim arbitrary email)
- Follows IndieWeb standards for identity
### Database Security
**SQLite Security**:
1. **File Permissions**: 600 (owner read/write only)
2. **Encryption at Rest**: Use encrypted filesystem (LUKS, dm-crypt)
3. **Backup Encryption**: Encrypt backup files (GPG)
4. **SQL Injection Prevention**: Parameterized queries only
**Parameterized Queries**:
```python
# GOOD: Parameterized (safe)
db.execute(
    "SELECT * FROM tokens WHERE token_hash = ?",
    (token_hash,)
)
# BAD: String interpolation (vulnerable)
db.execute(
    f"SELECT * FROM tokens WHERE token_hash = '{token_hash}'"
)  # ← NEVER DO THIS
```
**File Permissions**:
```bash
# Set restrictive permissions
chmod 600 /data/gondulf.db
chown gondulf:gondulf /data/gondulf.db
```
### Logging Security
**Principle**: Log security events, NEVER log sensitive data.
**Log Security Events**:
- ✅ Failed authentication attempts
- ✅ Authorization grants (domain + client_id)
- ✅ Token generation (hash prefix only)
- ✅ Email verification attempts
- ✅ DNS verification results
- ✅ Error conditions
**Never Log**:
- ❌ Email addresses (PII)
- ❌ Full access tokens
- ❌ Verification codes
- ❌ Authorization codes
- ❌ IP addresses (production)
**Safe Logging Examples**:
```python
# GOOD: Domain only (public information)
logger.info(f"Authorization granted for {domain} to {client_id}")
# GOOD: Token hash prefix for correlation (never the token itself)
logger.debug(f"Token generated: {token_hash[:8]}...")
# GOOD: Error without sensitive data
logger.error(f"Email send failed for domain {domain}")
# BAD: Email address (PII)
logger.info(f"Verification sent to {email}") # ← NEVER
# BAD: Full token (security)
logger.debug(f"Token: {token}") # ← NEVER
```
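As a safety net for the "never log email addresses" rule, a logging filter can redact anything that looks like an address before a record is emitted. The sketch below is one possible approach; `RedactEmailFilter` and the placeholder text are hypothetical, and the pattern is deliberately simple.

```python
import logging
import re

# Simplified address pattern, in the spirit of validate_email_format
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class RedactEmailFilter(logging.Filter):
    """Replace email-like substrings in log messages with a placeholder."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render %-style arguments first so addresses passed as args are caught
        message = record.getMessage()
        record.msg = EMAIL_PATTERN.sub("[redacted-email]", message)
        record.args = ()
        return True  # keep the record, only its text is changed
```

Attaching the filter to each handler, for example `handler.addFilter(RedactEmailFilter())`, applies it to every record that handler emits.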
## Dependency Security
### Dependency Management
**Principles**:
1. **Minimal Dependencies**: Prefer standard library
2. **Vetted Libraries**: Only well-maintained, popular libraries
3. **Version Pinning**: Pin exact versions in requirements.txt
4. **Security Scanning**: Regular vulnerability scanning
5. **Update Strategy**: Security patches applied promptly
**Security Scanning**:
```bash
# Scan for known vulnerabilities
uv run pip-audit
# Alternative: safety check
uv run safety check
```
**Update Policy**:
- **Security patches**: Apply within 24 hours (critical), 7 days (high)
- **Minor versions**: Review and test before updating
- **Major versions**: Evaluate breaking changes, test thoroughly
### Secrets Management
**Environment Variables** (v1.0.0):
```bash
# Required secrets
GONDULF_SECRET_KEY=<256-bit random value>
GONDULF_SMTP_PASSWORD=<SMTP password>
# Optional secrets
GONDULF_DATABASE_ENCRYPTION_KEY=<for encrypted backups>
```
**Secret Generation**:
```bash
# Generate SECRET_KEY (256 bits)
python -c "import secrets; print(secrets.token_urlsafe(32))"
```
**Storage**:
- Development: `.env` file (not committed)
- Production: Docker secrets or environment variables
- Never hardcode secrets in code
**Future**: Integrate with HashiCorp Vault or AWS Secrets Manager.
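Reading these secrets at startup can follow the "fail securely" principle: refuse to start when a required value is absent or too short. The sketch below assumes plain environment variables; `load_secret` and `MissingSecretError` are hypothetical names.

```python
import os
from typing import Mapping, Optional


class MissingSecretError(RuntimeError):
    """Raised when a required secret is absent or too short."""


def load_secret(
    name: str, env: Optional[Mapping[str, str]] = None, min_length: int = 1
) -> str:
    source = os.environ if env is None else env
    value = source.get(name, "")
    if len(value) < min_length:
        # Report only the variable name, never the value
        raise MissingSecretError(
            f"{name} is missing or shorter than {min_length} characters"
        )
    return value
```

For example, `load_secret("GONDULF_SECRET_KEY", min_length=43)` accepts the 43-character output of `secrets.token_urlsafe(32)` and rejects anything shorter.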
## Rate Limiting (Future)
**v1.0.0**: Not implemented (acceptable for small deployments).
**Future Implementation**:
| Endpoint | Limit | Window | Key |
|----------|-------|--------|-----|
| /authorize | 10 requests | 1 minute | IP |
| /token | 30 requests | 1 minute | client_id |
| Email verification | 3 codes | 1 hour | email |
| Code submission | 3 attempts | 15 minutes | session |
**Implementation Strategy**:
- Use Redis for distributed rate limiting
- Token bucket algorithm
- Exponential backoff on failures
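The token bucket algorithm named above can be sketched in a few lines. This version keeps state in process memory rather than Redis, so it only illustrates the algorithm; `TokenBucketLimiter` and its parameters are hypothetical, and the injectable `clock` is there for testing.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """Per-key token bucket: `capacity` burst size, steady refill rate."""

    def __init__(
        self,
        capacity: int,
        refill_per_second: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        # key -> (tokens remaining, time of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, key: str) -> bool:
        now = self._clock()
        tokens, last = self._buckets.get(key, (float(self.capacity), now))
        # Refill for the elapsed time, never exceeding capacity
        tokens = min(float(self.capacity), tokens + (now - last) * self.refill_per_second)
        if tokens >= 1.0:
            self._buckets[key] = (tokens - 1.0, now)
            return True
        self._buckets[key] = (tokens, now)
        return False
```

The `/authorize` row of the table would correspond roughly to `TokenBucketLimiter(capacity=10, refill_per_second=10 / 60)` keyed by client IP.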
## Security Testing
### Required Security Tests
1. **Input Validation**:
   - Malformed URLs (me, client_id, redirect_uri)
   - SQL injection attempts
   - XSS attempts
   - Email injection
2. **Authentication**:
   - Expired code rejection
   - Used code rejection
   - Invalid code rejection
   - Brute force resistance
3. **Authorization**:
   - State parameter validation
   - Redirect URI validation
   - Open redirect prevention
4. **Token Security**:
   - Timing attack resistance
   - Token theft scenarios
   - Expiration enforcement
5. **TLS/HTTPS**:
   - HTTP rejection in production
   - Security headers presence
   - Certificate validation
### Security Scanning Tools
**Required Tools**:
- `bandit`: Python security linter
- `pip-audit`: Dependency vulnerability scanner
- `pytest`: Security-focused test cases
**CI/CD Integration**:
```yaml
# GitHub Actions example (job definition, placed under `jobs:`)
security:
  runs-on: ubuntu-latest
  steps:
    - name: Run Bandit
      run: uv run bandit -r src/gondulf
    - name: Scan Dependencies
      run: uv run pip-audit
    - name: Run Security Tests
      run: uv run pytest tests/security/
```
## Incident Response
### Security Event Monitoring
**Monitor For**:
1. Multiple failed authentication attempts
2. Authorization code reuse attempts
3. Invalid token presentation
4. Unusual DNS verification failures
5. Email send failures (potential abuse)
**Alerting** (future):
- Admin email on critical events
- Webhook integration (Slack, Discord)
- Metrics dashboard (Grafana)
### Breach Response Plan (Future)
**If Access Tokens Compromised**:
1. Revoke all active tokens
2. Force re-authentication
3. Notify affected users (via domain)
4. Rotate SECRET_KEY
5. Audit logs for suspicious activity
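Step 1 could be as small as a single statement against the `tokens` table. The sketch below assumes the column names used in the examples elsewhere in this document, including an integer `revoked` flag, and uses the standard `sqlite3` module; `revoke_all_tokens` is a hypothetical helper.

```python
import sqlite3


def revoke_all_tokens(conn: sqlite3.Connection) -> int:
    """Mark every active token as revoked; return the number of rows changed."""
    with conn:  # commits on success, rolls back if the statement fails
        cursor = conn.execute("UPDATE tokens SET revoked = 1 WHERE revoked = 0")
    return cursor.rowcount
```

Because `verify_token` rejects revoked rows, every client is forced back through the authorization flow on its next request.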
**If Database Compromised**:
1. Assess data exposure (only hashes + domains)
2. Rotate all tokens
3. Review access logs
4. Notify users if domains exposed
## Compliance Considerations
### GDPR Compliance
**Personal Data Stored**:
- Domain names (considered PII in some jurisdictions)
- Timestamps (associated with domains)
**GDPR Rights**:
- **Right to Access**: Admin can query database
- **Right to Erasure**: Admin can delete domain records
- **Right to Portability**: Data export feature (future)
**Privacy Policy** (required):
- Document what data is collected (domains, timestamps)
- Document how data is used (authentication)
- Document retention policy (indefinite unless deleted)
- Provide contact for data requests
### Security Disclosure
**Security Policy** (future):
- Responsible disclosure process
- Security contact (security@domain)
- GPG key for encrypted reports
- Acknowledgments for researchers
## Security Roadmap
### v1.0.0 (MVP)
- ✅ Two-factor domain verification (DNS TXT + Email via rel="me")
- ✅ rel="me" email discovery (IndieWeb standard)
- ✅ HTML parsing security (BeautifulSoup)
- ✅ TLS/HTTPS enforcement
- ✅ Secure token generation (opaque, hashed)
- ✅ URL validation (open redirect prevention)
- ✅ Input validation (Pydantic)
- ✅ Security headers
- ✅ Minimal data collection (no email storage)
### v1.1.0
- PKCE support (code challenge/verifier)
- Rate limiting (Redis-based)
- Token revocation endpoint
- Enhanced logging
### v1.2.0
- WebAuthn support (passwordless)
- Hardware security key support
- Admin dashboard (audit logs)
- Security metrics
### v2.0.0
- Multi-factor authentication
- Federated identity providers
- Advanced threat detection
- SOC 2 compliance preparation
## References
- OWASP Top 10: https://owasp.org/www-project-top-ten/
- OAuth 2.0 Security Best Practices: https://datatracker.ietf.org/doc/html/draft-ietf-oauth-security-topics
- NIST Cybersecurity Framework: https://www.nist.gov/cyberframework
- CWE Top 25: https://cwe.mitre.org/top25/