docs: add Phase 2 domain verification design and clarifications

Add comprehensive Phase 2 documentation:
- Complete design document for two-factor domain verification
- Implementation guide with code examples
- ADR for implementation decisions (ADR-0004)
- ADR for rel="me" email discovery (ADR-008)
- Phase 1 impact assessment
- All 23 clarification questions answered
- Updated architecture docs (indieauth-protocol, security)
- Updated ADR-005 with rel="me" approach
- Updated backlog with technical debt items

Design ready for Phase 2 implementation.

Generated with Claude Code https://claude.com/claude-code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 13:05:09 -07:00
parent bebd47955f
commit 6f06aebf40
10 changed files with 5605 additions and 410 deletions

@@ -58,108 +58,174 @@ Gondulf follows a defense-in-depth security model with these core principles:
## Authentication Security
### Email-Based Verification (v1.0.0)
### Two-Factor Domain Verification (v1.0.0)
**Mechanism**: Users prove domain ownership by receiving verification code at email address on that domain.
**Mechanism**: Users prove domain ownership through TWO independent factors:
1. **DNS TXT Record**: Proves DNS control (`_gondulf.{domain}` = `verified`)
2. **Email via rel="me"**: Proves email control (discovered from site's rel="me" link)
**Security Model**: An attacker must compromise BOTH factors to authenticate fraudulently. This is significantly stronger than single-factor verification.
#### Threat: Email Interception
**Risk**: Attacker intercepts email containing verification code.
**Mitigations**:
1. **Short Code Lifetime**: 15-minute expiration
2. **Single Use**: Code invalidated after verification
3. **Rate Limiting**: Max 3 code requests per email per hour
4. **TLS Email Delivery**: Require STARTTLS for SMTP
5. **Display Warning**: "Only request code if you initiated this login"
1. **Two-Factor Requirement**: Email alone is insufficient (DNS also required)
2. **Short Code Lifetime**: 15-minute expiration
3. **Single Use**: Code invalidated after verification
4. **Rate Limiting**: Max 3 code requests per domain per hour
5. **TLS Email Delivery**: Require STARTTLS for SMTP
6. **Display Warning**: "Only request code if you initiated this login"
**Residual Risk**: Acceptable for v1.0.0 given short lifetime and single-use.
**Residual Risk**: Low. Even with email interception, attacker still needs DNS control.
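The short-lifetime and single-use mitigations above can be sketched as a small in-memory store. This is only an illustration under stated assumptions: `CodeStore`, its method names, and the per-domain keying are hypothetical and not taken from the Gondulf codebase; only the 15-minute lifetime and the 6-digit code come from the text above.

```python
import hmac
import secrets
import time
from typing import Dict, Optional, Tuple

CODE_TTL_SECONDS = 15 * 60  # 15-minute expiration, as listed in the mitigations


class CodeStore:
    """Hypothetical in-memory store for single-use verification codes."""

    def __init__(self, ttl: int = CODE_TTL_SECONDS) -> None:
        self._ttl = ttl
        # domain -> (code, expiry timestamp)
        self._codes: Dict[str, Tuple[str, float]] = {}

    def issue(self, domain: str, now: Optional[float] = None) -> str:
        """Create a fresh 6-digit code for the domain, replacing any older one."""
        now = time.time() if now is None else now
        code = f"{secrets.randbelow(1_000_000):06d}"  # CSPRNG, zero-padded
        self._codes[domain] = (code, now + self._ttl)
        return code

    def verify(self, domain: str, submitted: str, now: Optional[float] = None) -> bool:
        """Return True once for a correct, unexpired code; False otherwise."""
        now = time.time() if now is None else now
        entry = self._codes.get(domain)
        if entry is None:
            return False
        code, expires_at = entry
        if now >= expires_at:
            del self._codes[domain]  # expired codes are discarded
            return False
        # Constant-time comparison avoids leaking how many digits matched
        if hmac.compare_digest(code.encode(), submitted.encode()):
            del self._codes[domain]  # single use: invalidate on success
            return True
        return False
```

Attempt limiting (max 3 attempts) is deliberately left out of this sketch; a real implementation would also count failed calls to `verify` and discard the code once the limit is reached.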
#### Threat: Code Brute Force
**Risk**: Attacker guesses 6-digit verification code.
**Mitigations**:
1. **Sufficient Entropy**: 1,000,000 possible codes (6 digits)
2. **Attempt Limiting**: Max 3 attempts per email
3. **Short Lifetime**: 15-minute window
4. **Rate Limiting**: Max 10 attempts per IP per hour
5. **Exponential Backoff**: 5-second delay after each failed attempt
1. **Two-Factor Requirement**: Code alone is insufficient (DNS also required)
2. **Sufficient Entropy**: 1,000,000 possible codes (6 digits)
3. **Attempt Limiting**: Max 3 attempts per email
4. **Short Lifetime**: 15-minute window
5. **Rate Limiting**: Max 3 codes per domain per hour
6. **Single-Use**: Code invalidated after use
**Math**:
- 3 attempts ÷ 1,000,000 codes = 0.0003% success probability per issued code
- 15-minute window limits attack time
- Rate limiting prevents distributed guessing
- Even if guessed, attacker still needs DNS control
**Residual Risk**: Very low, acceptable for v1.0.0.
**Residual Risk**: Very low. Two-factor requirement makes brute force insufficient.
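The arithmetic behind the 0.0003% figure can be checked with a few lines of Python. The helper names below are hypothetical; the inputs (6-digit codes, 3 attempts per code, 3 codes per domain per hour) are the limits listed above, and the calculation assumes codes are uniformly random and each guess is distinct.

```python
from fractions import Fraction

CODE_SPACE = 10 ** 6   # 6 decimal digits -> 1,000,000 possible codes
MAX_ATTEMPTS = 3       # attempt limit per issued code
CODES_PER_HOUR = 3     # rate limit on code requests per domain


def guess_success_probability(attempts: int, code_space: int) -> Fraction:
    """Chance that one of `attempts` distinct guesses hits a uniform random code."""
    attempts = min(attempts, code_space)
    return Fraction(attempts, code_space)


def success_over_codes(codes: int, attempts: int, code_space: int) -> float:
    """Chance of at least one hit when `codes` independent codes are attacked."""
    p = guess_success_probability(attempts, code_space)
    return float(1 - (1 - p) ** codes)


p_single = guess_success_probability(MAX_ATTEMPTS, CODE_SPACE)
print(f"per code: {float(p_single):.4%}")  # per code: 0.0003%
print(f"per hour: {success_over_codes(CODES_PER_HOUR, MAX_ATTEMPTS, CODE_SPACE):.6%}")
```

Even at the maximum request rate, the hourly success probability stays below 0.001%, and a correct guess still leaves the DNS factor unsatisfied.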
#### Threat: DNS TXT Record Spoofing
**Risk**: Attacker attempts to spoof DNS responses.
**Mitigations**:
1. **Multiple Resolvers**: Query 2+ independent DNS servers (Google, Cloudflare)
2. **Consensus Required**: Require agreement from at least 2 resolvers
3. **DNSSEC Support**: Validate DNSSEC signatures when available (future)
4. **Timeout Handling**: Fail securely if DNS unavailable
5. **Logging**: Log all DNS verification attempts
**Residual Risk**: Low. Spoofing multiple independent resolvers is difficult.
#### Threat: rel="me" Link Spoofing
**Risk**: Attacker compromises user's website to add malicious rel="me" link.
**Mitigations**:
1. **Two-Factor Requirement**: Website compromise alone insufficient (DNS also required)
2. **HTTPS Required**: Fetch site over TLS (prevents MITM)
3. **Certificate Validation**: Verify SSL certificate
4. **Email Domain Matching**: Email should match site domain (warning if not)
5. **User Education**: Inform users to secure their website
**Residual Risk**: Moderate. If attacker compromises both DNS and website, they can authenticate. This is acceptable as it represents full domain compromise.
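The "Email Domain Matching" mitigation is advisory (a warning, not a rejection). A minimal sketch of such a check follows; the function name, the warning text, and the rule that a parent or subdomain counts as a match are all assumptions made for illustration, not behavior confirmed by the design.

```python
from typing import Optional


def email_domain_warning(email: str, site_domain: str) -> Optional[str]:
    """Return a warning string when the email's domain is unrelated to the
    site's domain, or None when they match.

    Assumption: an email at the same domain, a parent domain, or a subdomain
    of the site counts as matching.
    """
    email_domain = email.rsplit("@", 1)[-1].lower().rstrip(".")
    site = site_domain.lower().rstrip(".")

    related = (
        email_domain == site
        or site.endswith("." + email_domain)   # e.g. phil.example.com vs example.com
        or email_domain.endswith("." + site)   # e.g. example.com vs mail.example.com
    )
    if related:
        return None
    return (
        f"Email domain '{email_domain}' differs from site domain '{site}'; "
        "verification will proceed, but confirm this address is yours."
    )
```

For example, `email_domain_warning("phil@gmail.com", "phil.example.com")` yields a warning, while `email_domain_warning("phil@example.com", "phil.example.com")` returns `None`.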
#### Threat: Email Address Enumeration
**Risk**: Attacker discovers which domains are registered by requesting codes.
**Risk**: Attacker discovers email addresses by triggering rel="me" discovery.
**Mitigations**:
1. **Consistent Response**: Always say "If email exists, code sent"
2. **No Error Differentiation**: Same message for valid/invalid emails
3. **Rate Limiting**: Prevent bulk enumeration
1. **Public Information**: rel="me" links are intentionally public
2. **User Awareness**: Users know they're publishing email on their site
3. **Rate Limiting**: Prevent bulk scanning
4. **Robots.txt**: Users can restrict crawler access if desired
**Residual Risk**: Minimal, domain names are public anyway (DNS).
**Residual Risk**: Minimal. Email addresses are intentionally published by users on their own sites.
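Rate limiting appears in several of the mitigation lists above (for example, max 3 code requests per domain per hour). One way to express that limit is a sliding-window counter keyed by domain. The class below is a hypothetical sketch, not the project's limiter, and it keeps state in process memory only.

```python
import time
from collections import defaultdict, deque
from typing import DefaultDict, Deque, Optional


class SlidingWindowLimiter:
    """Hypothetical per-key limiter: at most `max_events` within `window_seconds`."""

    def __init__(self, max_events: int = 3, window_seconds: float = 3600.0) -> None:
        self._max = max_events
        self._window = window_seconds
        self._events: DefaultDict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        """Record an event for `key` and return True if it is within the limit."""
        now = time.monotonic() if now is None else now
        events = self._events[key]
        # Drop timestamps that have aged out of the window
        while events and now - events[0] >= self._window:
            events.popleft()
        if len(events) >= self._max:
            return False  # over the limit: caller should refuse the request
        events.append(now)
        return True
```

A caller would check `limiter.allow(domain)` before triggering rel="me" discovery or sending a code, and return the same generic response whether or not the request was allowed.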
### Domain Ownership Verification
### Domain Ownership Verification (Two-Factor)
#### TXT Record Validation (Preferred)
**Mechanism**: v1.0.0 requires BOTH verification methods:
**Mechanism**: Admin adds DNS TXT record `_gondulf.example.com` = `verified`.
#### 1. TXT Record Validation (Required)
**Mechanism**: Admin adds DNS TXT record `_gondulf.{domain}` = `verified`.
**Security Properties**:
- Requires DNS control (stronger than email)
- Proves DNS control (first factor)
- Verifiable without user interaction
- Cacheable for performance
- Re-verifiable periodically
**Threat: DNS Spoofing**
**Mitigations**:
1. **DNSSEC**: Validate DNSSEC signatures if available
2. **Multiple Resolvers**: Query 2+ DNS servers, require consensus
3. **Caching**: Cache valid results, re-verify daily
4. **Logging**: Log all DNS verification attempts
**Implementation**:
```python
import dns.resolver
import dns.dnssec
def verify_txt_record(domain: str) -> bool:
"""
Verify _gondulf.{domain} TXT record exists with value 'verified'.
Requires consensus from multiple independent resolvers.
"""
try:
# Use Google and Cloudflare DNS for redundancy
resolvers = ['8.8.8.8', '1.1.1.1']
results = []
verified_count = 0
for resolver_ip in resolvers:
resolver = dns.resolver.Resolver()
resolver.nameservers = [resolver_ip]
resolver.timeout = 5
resolver.lifetime = 5
answers = resolver.resolve(f'_gondulf.{domain}', 'TXT')
for rdata in answers:
txt_value = rdata.to_text().strip('"')
if txt_value == 'verified':
results.append(True)
verified_count += 1
break
# Require consensus from both resolvers
return len(results) >= 2
# Require consensus from at least 2 resolvers
return verified_count >= 2
except Exception as e:
logger.warning(f"DNS verification failed for {domain}: {e}")
return False
```
**Residual Risk**: Low, DNS is foundational internet infrastructure.
#### 2. Email Verification via rel="me" (Required)
**Mechanism**: Email discovered from site's `<link rel="me" href="mailto:...">`, then verified with code.
**Security Properties**:
- Proves website control (can modify HTML)
- Proves email control (receives and enters code)
- Follows IndieWeb standards (rel="me")
- Self-documenting (user declares email publicly)
**Implementation**:
```python
import logging
from typing import Optional

import requests
from bs4 import BeautifulSoup

logger = logging.getLogger(__name__)


def discover_email_from_site(domain: str) -> Optional[str]:
    """
    Fetch site and discover email from rel="me" link.
    """
    try:
        response = requests.get(f"https://{domain}", timeout=10, allow_redirects=True)
        response.raise_for_status()

        soup = BeautifulSoup(response.content, 'html.parser')
        # rel="me" may appear on <link> or <a> elements
        me_links = soup.find_all('link', rel='me') + soup.find_all('a', rel='me')

        for link in me_links:
            href = link.get('href', '')
            if href.startswith('mailto:'):
                # Strip only the leading scheme, not later occurrences
                email = href[len('mailto:'):].strip()
                # validate_email_format is defined in the Email Validation section
                if validate_email_format(email):
                    return email

        return None
    except Exception as e:
        logger.error(f"Failed to discover email for {domain}: {e}")
        return None
```
**Combined Residual Risk**: Low. Attacker must compromise DNS and, in addition, either the website (to change the rel="me" address) or the email account to authenticate fraudulently.
## Authorization Security
@@ -431,15 +497,80 @@ class AuthorizeRequest(BaseModel):
**Residual Risk**: Minimal, Pydantic provides strong validation.
### HTML Parsing Security (rel="me" Discovery)
#### Threat: Malicious HTML Injection
**Risk**: Attacker's site contains malicious HTML to exploit parser.
**Mitigations**:
1. **Robust Parser**: Use BeautifulSoup (handles malformed HTML safely)
2. **Link Extraction Only**: Only extract href attributes, no script execution
3. **Timeout**: 10-second timeout for HTTP requests
4. **Size Limit**: Limit response size (prevent memory exhaustion)
5. **HTTPS Required**: Fetch over TLS only
6. **Certificate Validation**: Verify SSL certificates
**Implementation**:
```python
import logging
from typing import Optional

import requests
from bs4 import BeautifulSoup

logger = logging.getLogger(__name__)

MAX_SIZE = 5 * 1024 * 1024  # 5MB


def discover_email_from_site(domain: str) -> Optional[str]:
    """
    Safely discover email from rel="me" link.
    """
    try:
        # requests.get() accepts no max_redirects argument;
        # the redirect cap is an attribute of the Session
        with requests.Session() as session:
            session.max_redirects = 5

            # Fetch with safety limits
            response = session.get(
                f"https://{domain}",
                timeout=10,
                allow_redirects=True,
                stream=True,  # Don't load entire response into memory
            )
            response.raise_for_status()

            # Limit response size (prevent memory exhaustion);
            # decode_content=True undoes gzip/deflate transfer encoding
            content = response.raw.read(MAX_SIZE, decode_content=True)

        # Parse HTML (BeautifulSoup handles malformed HTML safely)
        soup = BeautifulSoup(content, 'html.parser')

        # Find rel="me" links (no script execution)
        me_links = soup.find_all('link', rel='me') + soup.find_all('a', rel='me')

        # Extract mailto: links only
        for link in me_links:
            href = link.get('href', '')
            if href.startswith('mailto:'):
                email = href[len('mailto:'):].strip()
                # Validate email format before returning
                # (validate_email_format is defined in the Email Validation section)
                if validate_email_format(email):
                    return email

        return None

    except requests.exceptions.SSLError as e:
        logger.error(f"SSL certificate validation failed for {domain}: {e}")
        return None
    except Exception as e:
        logger.error(f"Failed to discover email for {domain}: {e}")
        return None
```
**Residual Risk**: Very low. BeautifulSoup is designed for untrusted HTML.
### Email Validation
#### Threat: Email Injection Attacks
**Risk**: Attacker injects SMTP commands via email address field.
**Risk**: Attacker crafts malicious email address in rel="me" link.
**Mitigations**:
1. **Format Validation**: Strict email regex (RFC 5322)
2. **Domain Matching**: Require email domain match `me` domain
2. **No User Input**: Email discovered from site (not user-provided)
3. **SMTP Library**: Use well-tested library (smtplib)
4. **Content Encoding**: Encode email content properly
5. **Rate Limiting**: Prevent abuse
@@ -447,31 +578,27 @@ class AuthorizeRequest(BaseModel):
**Validation**:
```python
import re
from email.utils import parseaddr
def validate_email(email: str, required_domain: str) -> tuple[bool, str]:
def validate_email_format(email: str) -> bool:
"""
Validate email address and domain match.
Validate email address format.
"""
# Parse email (RFC 5322 compliant)
name, addr = parseaddr(email)
# Basic format check
# Basic format check (RFC 5322 simplified)
email_regex = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
if not re.match(email_regex, addr):
return False, "Invalid email format"
if not re.match(email_regex, email):
return False
# Extract domain
email_domain = addr.split('@')[1].lower()
required_domain = required_domain.lower()
# Sanity checks
if len(email) > 254: # RFC 5321 maximum
return False
if email.count('@') != 1:
return False
# Domain must match
if email_domain != required_domain:
return False, f"Email must be at {required_domain}"
return True, ""
return True
```
**Note**: Domain matching is NOT enforced in v1.0.0. User may have email at different domain than their identity site (e.g., phil@gmail.com for phil.example.com). This is acceptable as user explicitly publishes the email on their site.
**Residual Risk**: Low, standard validation patterns.
## Network Security
@@ -567,21 +694,29 @@ async def add_security_headers(request: Request, call_next):
**Email Handling**:
```python
# Email stored ONLY during verification (in-memory, 15-min TTL)
# Email discovered from rel="me" link (not user-provided)
# Stored ONLY during verification (in-memory, 15-min TTL)
verification_codes[code_id] = {
"email": email, # ← Exists ONLY here, NEVER in database
"email": email, # ← Discovered from site, exists ONLY here, NEVER in database
"code": code,
"domain": domain,
"expires_at": datetime.utcnow() + timedelta(minutes=15)
}
# After verification: email is deleted, only domain stored
# After verification: email is deleted, only domain + timestamp stored
db.execute('''
INSERT INTO domains (domain, verification_method, verified_at)
VALUES (?, 'email', ?)
''', (domain, datetime.utcnow()))
# Note: NO email address in database
INSERT INTO domains (domain, verification_method, verified_at, last_email_check)
VALUES (?, 'two_factor', ?, ?)
''', (domain, datetime.utcnow(), datetime.utcnow()))
# Note: NO email address in database, only verification timestamp
```
**rel="me" Discovery**:
- Email addresses are public (user publishes on their site)
- Server fetches email from user's site (not user input)
- Reduces social engineering risk (can't claim arbitrary email)
- Follows IndieWeb standards for identity
### Database Security
**SQLite Security**:
@@ -829,13 +964,15 @@ security:
## Security Roadmap
### v1.0.0 (MVP)
- ✅ Email-based authentication
- ✅ Two-factor domain verification (DNS TXT + Email via rel="me")
- ✅ rel="me" email discovery (IndieWeb standard)
- ✅ HTML parsing security (BeautifulSoup)
- ✅ TLS/HTTPS enforcement
- ✅ Secure token generation (opaque, hashed)
- ✅ URL validation (open redirect prevention)
- ✅ Input validation (Pydantic)
- ✅ Security headers
- ✅ Minimal data collection
- ✅ Minimal data collection (no email storage)
### v1.1.0
- PKCE support (code challenge/verifier)