docs: add Phase 2 domain verification design and clarifications

Add comprehensive Phase 2 documentation:
- Complete design document for two-factor domain verification
- Implementation guide with code examples
- ADR for implementation decisions (ADR-0004)
- ADR for rel="me" email discovery (ADR-008)
- Phase 1 impact assessment
- All 23 clarification questions answered
- Updated architecture docs (indieauth-protocol, security)
- Updated ADR-005 with rel="me" approach
- Updated backlog with technical debt items

Design ready for Phase 2 implementation.

Generated with Claude Code https://claude.com/claude-code

Co-Authored-By: Claude <noreply@anthropic.com>
@@ -58,108 +58,174 @@ Gondulf follows a defense-in-depth security model with these core principles:

## Authentication Security

### Email-Based Verification (v1.0.0)
### Two-Factor Domain Verification (v1.0.0)

**Mechanism**: Users prove domain ownership by receiving verification code at email address on that domain.
**Mechanism**: Users prove domain ownership through TWO independent factors:
1. **DNS TXT Record**: Proves DNS control (`_gondulf.{domain}` = `verified`)
2. **Email via rel="me"**: Proves email control (discovered from site's rel="me" link)

**Security Model**: An attacker must compromise BOTH factors to authenticate fraudulently. This is significantly stronger than single-factor verification.

#### Threat: Email Interception

**Risk**: Attacker intercepts email containing verification code.

**Mitigations**:
1. **Short Code Lifetime**: 15-minute expiration
2. **Single Use**: Code invalidated after verification
3. **Rate Limiting**: Max 3 code requests per email per hour
4. **TLS Email Delivery**: Require STARTTLS for SMTP
5. **Display Warning**: "Only request code if you initiated this login"
1. **Two-Factor Requirement**: Email alone is insufficient (DNS also required)
2. **Short Code Lifetime**: 15-minute expiration
3. **Single Use**: Code invalidated after verification
4. **Rate Limiting**: Max 3 code requests per domain per hour
5. **TLS Email Delivery**: Require STARTTLS for SMTP
6. **Display Warning**: "Only request code if you initiated this login"

**Residual Risk**: Acceptable for v1.0.0 given short lifetime and single-use.
**Residual Risk**: Low. Even with email interception, attacker still needs DNS control.
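
The code-lifetime, single-use, and attempt-limit mitigations listed above can be sketched as a small in-memory store. This is only an illustration under stated assumptions: `CodeStore`, `PendingCode`, and the injectable `clock` parameter are hypothetical names, not part of Gondulf, and a real deployment would also need the per-domain rate limit and email delivery.

```python
import secrets
import time
from dataclasses import dataclass

CODE_TTL_SECONDS = 15 * 60  # 15-minute lifetime, taken from the mitigation list
MAX_ATTEMPTS = 3            # attempt limit, taken from the mitigation list


@dataclass
class PendingCode:
    code: str
    expires_at: float
    attempts: int = 0


class CodeStore:
    """Hypothetical in-memory store for pending verification codes."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be tested
        self._pending: dict[str, PendingCode] = {}

    def issue(self, domain: str) -> str:
        # secrets uses the OS CSPRNG; zero-pad so the code is always 6 digits.
        code = f"{secrets.randbelow(1_000_000):06d}"
        self._pending[domain] = PendingCode(code, self._clock() + CODE_TTL_SECONDS)
        return code

    def verify(self, domain: str, submitted: str) -> bool:
        entry = self._pending.get(domain)
        if entry is None:
            return False
        if self._clock() >= entry.expires_at or entry.attempts >= MAX_ATTEMPTS:
            del self._pending[domain]  # expired or exhausted: fail closed
            return False
        entry.attempts += 1
        # Constant-time comparison avoids leaking how many digits matched.
        if secrets.compare_digest(entry.code.encode(), submitted.encode()):
            del self._pending[domain]  # single use
            return True
        return False
```

Passing a fake clock (for example `CodeStore(clock=lambda: now[0])`) makes the expiry path easy to exercise in tests without sleeping.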

#### Threat: Code Brute Force

**Risk**: Attacker guesses 6-digit verification code.

**Mitigations**:
1. **Sufficient Entropy**: 1,000,000 possible codes (6 digits)
2. **Attempt Limiting**: Max 3 attempts per email
3. **Short Lifetime**: 15-minute window
4. **Rate Limiting**: Max 10 attempts per IP per hour
5. **Exponential Backoff**: 5-second delay after each failed attempt
1. **Two-Factor Requirement**: Code alone is insufficient (DNS also required)
2. **Sufficient Entropy**: 1,000,000 possible codes (6 digits)
3. **Attempt Limiting**: Max 3 attempts per email
4. **Short Lifetime**: 15-minute window
5. **Rate Limiting**: Max 3 codes per domain per hour
6. **Single-Use**: Code invalidated after use

**Math**:
- 3 attempts ÷ 1,000,000 codes = 0.0003% success probability
- 15-minute window limits attack time
- Rate limiting prevents distributed guessing
- Even if guessed, attacker still needs DNS control

**Residual Risk**: Very low, acceptable for v1.0.0.
**Residual Risk**: Very low. Two-factor requirement makes brute force insufficient.
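
The brute-force figure can be checked with a few lines of arithmetic. The sketch below assumes the attacker makes distinct guesses against each code and can request the maximum of 3 codes per domain per hour; the per-hour figure is an extrapolation from those limits, not a number stated in the document.

```python
from fractions import Fraction

CODE_SPACE = 10 ** 6   # 6-digit codes: 000000 through 999999
MAX_ATTEMPTS = 3       # guesses allowed per code
CODES_PER_HOUR = 3     # codes issued per domain per hour

# k distinct guesses against one uniformly random code out of N: k / N.
p_per_code = Fraction(MAX_ATTEMPTS, CODE_SPACE)

# Each newly issued code is independent, so combine the per-code failures.
p_per_hour = 1 - (1 - p_per_code) ** CODES_PER_HOUR

print(f"per code: {float(p_per_code):.4%}")  # per code: 0.0003%
print(f"per hour: {float(p_per_hour):.4%}")  # per hour: 0.0009%
```

Either way the attacker still needs the DNS factor, so these probabilities bound only the email half of the verification.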

#### Threat: DNS TXT Record Spoofing

**Risk**: Attacker attempts to spoof DNS responses.

**Mitigations**:
1. **Multiple Resolvers**: Query 2+ independent DNS servers (Google, Cloudflare)
2. **Consensus Required**: Require agreement from at least 2 resolvers
3. **DNSSEC Support**: Validate DNSSEC signatures when available (future)
4. **Timeout Handling**: Fail securely if DNS unavailable
5. **Logging**: Log all DNS verification attempts

**Residual Risk**: Low. Spoofing multiple independent resolvers is difficult.

#### Threat: rel="me" Link Spoofing

**Risk**: Attacker compromises user's website to add malicious rel="me" link.

**Mitigations**:
1. **Two-Factor Requirement**: Website compromise alone insufficient (DNS also required)
2. **HTTPS Required**: Fetch site over TLS (prevents MITM)
3. **Certificate Validation**: Verify SSL certificate
4. **Email Domain Matching**: Email should match site domain (warning if not)
5. **User Education**: Inform users to secure their website

**Residual Risk**: Moderate. If attacker compromises both DNS and website, they can authenticate. This is acceptable as it represents full domain compromise.
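
The "Email Domain Matching" mitigation (warn, do not reject) could look roughly like the following. The function names are hypothetical, and the matching rule is an assumption: the email domain counts as matching when it equals the site domain or is a subdomain of it. Other policies, such as accepting a parent domain, are equally plausible.

```python
import logging

logger = logging.getLogger(__name__)


def email_matches_site(email: str, site_domain: str) -> bool:
    """True when the email's domain equals the site domain or is a subdomain of it."""
    email_domain = email.rsplit("@", 1)[-1].lower().rstrip(".")
    site = site_domain.lower().rstrip(".")
    return email_domain == site or email_domain.endswith("." + site)


def warn_on_domain_mismatch(email: str, site_domain: str) -> bool:
    """Log a warning on mismatch; verification still proceeds either way."""
    matches = email_matches_site(email, site_domain)
    if not matches:
        # Log only the domains, never the full address.
        logger.warning(
            "rel=me email domain %s differs from site domain %s",
            email.rsplit("@", 1)[-1].lower(),
            site_domain.lower(),
        )
    return matches
```

For example, `phil@example.com` matches `example.com`, while `phil@gmail.com` discovered on `phil.example.com` only triggers the warning.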

#### Threat: Email Address Enumeration

**Risk**: Attacker discovers which domains are registered by requesting codes.
**Risk**: Attacker discovers email addresses by triggering rel="me" discovery.

**Mitigations**:
1. **Consistent Response**: Always say "If email exists, code sent"
2. **No Error Differentiation**: Same message for valid/invalid emails
3. **Rate Limiting**: Prevent bulk enumeration
1. **Public Information**: rel="me" links are intentionally public
2. **User Awareness**: Users know they're publishing email on their site
3. **Rate Limiting**: Prevent bulk scanning
4. **Robots.txt**: Users can restrict crawler access if desired

**Residual Risk**: Minimal, domain names are public anyway (DNS).
**Residual Risk**: Minimal. Email addresses are intentionally published by users on their own sites.
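
Rate limiting appears in several of the mitigation lists above (for example, max 3 code requests per domain per hour). One common way to implement such a limit is a sliding-window counter. The class below is a hypothetical in-memory sketch; the document does not say which algorithm or storage Gondulf uses.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_events per key within the trailing window."""

    def __init__(self, max_events: int = 3, window_seconds: float = 3600,
                 clock=time.monotonic):
        self._max = max_events
        self._window = window_seconds
        self._clock = clock  # injectable for tests
        self._events: dict[str, deque] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self._clock()
        events = self._events[key]
        # Drop timestamps that have aged out of the window.
        while events and now - events[0] >= self._window:
            events.popleft()
        if len(events) >= self._max:
            return False
        events.append(now)
        return True
```

Keying the limiter by domain gives the per-domain code limit; keying a second instance by client IP would bound bulk scanning.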

### Domain Ownership Verification
### Domain Ownership Verification (Two-Factor)

#### TXT Record Validation (Preferred)
**Mechanism**: v1.0.0 requires BOTH verification methods:

**Mechanism**: Admin adds DNS TXT record `_gondulf.example.com` = `verified`.
#### 1. TXT Record Validation (Required)

**Mechanism**: Admin adds DNS TXT record `_gondulf.{domain}` = `verified`.

**Security Properties**:
- Requires DNS control (stronger than email)
- Proves DNS control (first factor)
- Verifiable without user interaction
- Cacheable for performance
- Re-verifiable periodically

**Threat: DNS Spoofing**

**Mitigations**:
1. **DNSSEC**: Validate DNSSEC signatures if available
2. **Multiple Resolvers**: Query 2+ DNS servers, require consensus
3. **Caching**: Cache valid results, re-verify daily
4. **Logging**: Log all DNS verification attempts

**Implementation**:
```python
import dns.resolver
import dns.dnssec

def verify_txt_record(domain: str) -> bool:
    """
    Verify _gondulf.{domain} TXT record exists with value 'verified'.
    Requires consensus from multiple independent resolvers.
    """
    try:
        # Use Google and Cloudflare DNS for redundancy
        resolvers = ['8.8.8.8', '1.1.1.1']
        results = []
        verified_count = 0

        for resolver_ip in resolvers:
            resolver = dns.resolver.Resolver()
            resolver.nameservers = [resolver_ip]
            resolver.timeout = 5
            resolver.lifetime = 5

            answers = resolver.resolve(f'_gondulf.{domain}', 'TXT')
            for rdata in answers:
                txt_value = rdata.to_text().strip('"')
                if txt_value == 'verified':
                    results.append(True)
                    verified_count += 1
                    break

        # Require consensus from both resolvers
        return len(results) >= 2
        # Require consensus from at least 2 resolvers
        return verified_count >= 2

    except Exception as e:
        logger.warning(f"DNS verification failed for {domain}: {e}")
        return False
```

**Residual Risk**: Low, DNS is foundational internet infrastructure.
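
The "Cache valid results, re-verify daily" mitigation can be sketched as a thin wrapper around any verification callable. `VerificationCache` is a hypothetical name, and caching only positive results is an assumption made here so that a failed or spoofed lookup is never remembered.

```python
import time
from typing import Callable

REVERIFY_AFTER_SECONDS = 24 * 60 * 60  # re-verify daily


class VerificationCache:
    """Cache successful domain verifications for one day."""

    def __init__(self, verify: Callable[[str], bool], clock=time.monotonic):
        self._verify = verify  # e.g. a DNS TXT check returning True/False
        self._clock = clock
        self._verified_at: dict[str, float] = {}

    def is_verified(self, domain: str) -> bool:
        now = self._clock()
        cached = self._verified_at.get(domain)
        if cached is not None and now - cached < REVERIFY_AFTER_SECONDS:
            return True  # still fresh, skip the lookup
        if self._verify(domain):
            self._verified_at[domain] = now
            return True
        # Failure (or expiry followed by failure): forget the domain.
        self._verified_at.pop(domain, None)
        return False
```

Wrapping the DNS check this way keeps repeated logins from hitting the resolvers on every request while still noticing within a day when the TXT record is removed.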
#### 2. Email Verification via rel="me" (Required)

**Mechanism**: Email discovered from site's `<link rel="me" href="mailto:...">`, then verified with code.

**Security Properties**:
- Proves website control (can modify HTML)
- Proves email control (receives and enters code)
- Follows IndieWeb standards (rel="me")
- Self-documenting (user declares email publicly)

**Implementation**:
```python
from typing import Optional

from bs4 import BeautifulSoup
import requests

def discover_email_from_site(domain: str) -> Optional[str]:
    """
    Fetch site and discover email from rel="me" link.
    """
    try:
        response = requests.get(f"https://{domain}", timeout=10, allow_redirects=True)
        response.raise_for_status()

        soup = BeautifulSoup(response.content, 'html.parser')
        me_links = soup.find_all('link', rel='me') + soup.find_all('a', rel='me')

        for link in me_links:
            href = link.get('href', '')
            if href.startswith('mailto:'):
                email = href.replace('mailto:', '').strip()
                if validate_email_format(email):
                    return email

        return None

    except Exception as e:
        logger.error(f"Failed to discover email for {domain}: {e}")
        return None
```

**Combined Residual Risk**: Low. Attacker must compromise DNS, website, and email account to authenticate fraudulently.

## Authorization Security

@@ -431,15 +497,80 @@ class AuthorizeRequest(BaseModel):

**Residual Risk**: Minimal, Pydantic provides strong validation.

### HTML Parsing Security (rel="me" Discovery)

#### Threat: Malicious HTML Injection

**Risk**: Attacker's site contains malicious HTML to exploit parser.

**Mitigations**:
1. **Robust Parser**: Use BeautifulSoup (handles malformed HTML safely)
2. **Link Extraction Only**: Only extract href attributes, no script execution
3. **Timeout**: 10-second timeout for HTTP requests
4. **Size Limit**: Limit response size (prevent memory exhaustion)
5. **HTTPS Required**: Fetch over TLS only
6. **Certificate Validation**: Verify SSL certificates

**Implementation**:
```python
import logging
from typing import Optional

from bs4 import BeautifulSoup
import requests

logger = logging.getLogger(__name__)

def discover_email_from_site(domain: str) -> Optional[str]:
    """
    Safely discover email from rel="me" link.
    """
    try:
        # Fetch with safety limits. requests.get() accepts no max_redirects
        # argument; the redirect cap is an attribute of the Session.
        session = requests.Session()
        session.max_redirects = 5
        response = session.get(
            f"https://{domain}",
            timeout=10,
            allow_redirects=True,
            stream=True  # Don't load entire response into memory
        )
        response.raise_for_status()

        # Limit response size (prevent memory exhaustion)
        MAX_SIZE = 5 * 1024 * 1024  # 5MB
        # decode_content=True undoes gzip/deflate so the parser sees HTML
        content = response.raw.read(MAX_SIZE, decode_content=True)

        # Parse HTML (BeautifulSoup handles malformed HTML safely)
        soup = BeautifulSoup(content, 'html.parser')

        # Find rel="me" links (no script execution)
        me_links = soup.find_all('link', rel='me') + soup.find_all('a', rel='me')

        # Extract mailto: links only
        for link in me_links:
            href = link.get('href', '')
            if href.startswith('mailto:'):
                email = href.replace('mailto:', '').strip()
                # Validate email format before returning
                if validate_email_format(email):
                    return email

        return None

    except requests.exceptions.SSLError as e:
        logger.error(f"SSL certificate validation failed for {domain}: {e}")
        return None
    except Exception as e:
        logger.error(f"Failed to discover email for {domain}: {e}")
        return None
```

**Residual Risk**: Very low. BeautifulSoup is designed for untrusted HTML.
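
To make the "link extraction only" idea concrete without any third-party dependency, here is a sketch that uses the standard library's `html.parser` instead of BeautifulSoup. This is a swapped-in technique for illustration, not the parser the design names; `RelMeMailtoParser` and `extract_rel_me_emails` are hypothetical helpers. The parser only inspects tag attributes and never executes or evaluates page content.

```python
from html.parser import HTMLParser


class RelMeMailtoParser(HTMLParser):
    """Collect mailto: targets from <a>/<link> tags whose rel includes 'me'."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.emails: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr = dict(attrs)  # attribute values may be None for bare attributes
        rel_tokens = (attr.get("rel") or "").lower().split()
        href = (attr.get("href") or "").strip()
        if "me" in rel_tokens and href.lower().startswith("mailto:"):
            self.emails.append(href[len("mailto:"):])


def extract_rel_me_emails(html: str) -> list[str]:
    parser = RelMeMailtoParser()
    parser.feed(html)
    parser.close()
    return parser.emails
```

Candidates returned by this helper would still need format validation before any code is sent to them.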

### Email Validation

#### Threat: Email Injection Attacks

**Risk**: Attacker injects SMTP commands via email address field.
**Risk**: Attacker crafts malicious email address in rel="me" link.

**Mitigations**:
1. **Format Validation**: Strict email regex (RFC 5322)
2. **Domain Matching**: Require email domain match `me` domain
2. **No User Input**: Email discovered from site (not user-provided)
3. **SMTP Library**: Use well-tested library (smtplib)
4. **Content Encoding**: Encode email content properly
5. **Rate Limiting**: Prevent abuse
@@ -447,31 +578,27 @@ class AuthorizeRequest(BaseModel):
**Validation**:
```python
import re
from email.utils import parseaddr

def validate_email(email: str, required_domain: str) -> tuple[bool, str]:
def validate_email_format(email: str) -> bool:
    """
    Validate email address and domain match.
    Validate email address format.
    """
    # Parse email (RFC 5322 compliant)
    name, addr = parseaddr(email)

    # Basic format check
    # Basic format check (RFC 5322 simplified)
    email_regex = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    if not re.match(email_regex, addr):
        return False, "Invalid email format"
    if not re.match(email_regex, email):
        return False

    # Extract domain
    email_domain = addr.split('@')[1].lower()
    required_domain = required_domain.lower()
    # Sanity checks
    if len(email) > 254:  # RFC 5321 maximum
        return False
    if email.count('@') != 1:
        return False

    # Domain must match
    if email_domain != required_domain:
        return False, f"Email must be at {required_domain}"

    return True, ""
    return True
```

**Note**: Domain matching is NOT enforced in v1.0.0. User may have email at different domain than their identity site (e.g., phil@gmail.com for phil.example.com). This is acceptable as user explicitly publishes the email on their site.

**Residual Risk**: Low, standard validation patterns.
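
Because the address comes from an `href` rather than a form field, it may carry a `?subject=...` query or percent-encoding that a plain string replace would pass straight into validation. The helper below is a hypothetical sketch of normalizing a `mailto:` URI with the standard library before format validation; rejecting multi-recipient links is an assumption made here, not a stated requirement.

```python
from typing import Optional
from urllib.parse import unquote, urlsplit


def email_from_mailto(href: str) -> Optional[str]:
    """Return the single address in a mailto: URI, or None if unusable."""
    parts = urlsplit(href.strip())
    if parts.scheme != "mailto":  # urlsplit lowercases the scheme
        return None
    # The address lives in .path; ?subject=... and similar land in .query.
    address = unquote(parts.path).strip()
    if not address or "," in address:
        return None  # empty, or several recipients in one link
    return address
```

The returned string is still untrusted and should go through the same format validation as any other candidate address.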

## Network Security
@@ -567,21 +694,29 @@ async def add_security_headers(request: Request, call_next):

**Email Handling**:
```python
# Email stored ONLY during verification (in-memory, 15-min TTL)
# Email discovered from rel="me" link (not user-provided)
# Stored ONLY during verification (in-memory, 15-min TTL)
verification_codes[code_id] = {
    "email": email,  # ← Exists ONLY here, NEVER in database
    "email": email,  # ← Discovered from site, exists ONLY here, NEVER in database
    "code": code,
    "domain": domain,
    "expires_at": datetime.utcnow() + timedelta(minutes=15)
}

# After verification: email is deleted, only domain stored
# After verification: email is deleted, only domain + timestamp stored
db.execute('''
    INSERT INTO domains (domain, verification_method, verified_at)
    VALUES (?, 'email', ?)
''', (domain, datetime.utcnow()))
# Note: NO email address in database
    INSERT INTO domains (domain, verification_method, verified_at, last_email_check)
    VALUES (?, 'two_factor', ?, ?)
''', (domain, datetime.utcnow(), datetime.utcnow()))
# Note: NO email address in database, only verification timestamp
```

**rel="me" Discovery**:
- Email addresses are public (user publishes on their site)
- Server fetches email from user's site (not user input)
- Reduces social engineering risk (can't claim arbitrary email)
- Follows IndieWeb standards for identity

### Database Security

**SQLite Security**:
@@ -829,13 +964,15 @@ security:
## Security Roadmap

### v1.0.0 (MVP)
- ✅ Email-based authentication
- ✅ Two-factor domain verification (DNS TXT + Email via rel="me")
- ✅ rel="me" email discovery (IndieWeb standard)
- ✅ HTML parsing security (BeautifulSoup)
- ✅ TLS/HTTPS enforcement
- ✅ Secure token generation (opaque, hashed)
- ✅ URL validation (open redirect prevention)
- ✅ Input validation (Pydantic)
- ✅ Security headers
- ✅ Minimal data collection
- ✅ Minimal data collection (no email storage)

### v1.1.0
- PKCE support (code challenge/verifier)
