docs: add Phase 2 domain verification design and clarifications

Add comprehensive Phase 2 documentation:
- Complete design document for two-factor domain verification
- Implementation guide with code examples
- ADR for implementation decisions (ADR-0004)
- ADR for rel="me" email discovery (ADR-008)
- Phase 1 impact assessment
- All 23 clarification questions answered
- Updated architecture docs (indieauth-protocol, security)
- Updated ADR-005 with rel="me" approach
- Updated backlog with technical debt items

Design ready for Phase 2 implementation.

Generated with Claude Code https://claude.com/claude-code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 13:05:09 -07:00
parent bebd47955f
commit 6f06aebf40
10 changed files with 5605 additions and 410 deletions


@@ -0,0 +1,98 @@
# 0004. Phase 2 Implementation Decisions
Date: 2025-11-20
## Status
Accepted
## Context
The Developer has raised 8 categories of implementation questions for Phase 2 that require architectural decisions. These decisions need to balance simplicity with functionality while providing clear direction for implementation.
## Decisions
### 1. Rate Limiting Implementation
**Decision**: Implement actual rate limiting with in-memory storage in Phase 2.
**Rationale**: Security features should be real from the start, not stubs. In-memory is simplest.
**Implementation**:
- Use a simple dictionary with domain as key, list of timestamps as value
- Clean up old timestamps on each check (older than 1 hour)
- Store in `RateLimiter` service as instance variable
- No persistence needed - resets on restart is acceptable
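Below is a minimal sketch of the in-memory limiter described above. It assumes a one-hour window and a hypothetical default of 3 requests per domain, borrowed from the `MAX_CODES` value in ADR-005. `RateLimiter` is the service name used above, but the `check_and_record` method and the injectable `clock` are illustrative assumptions, not an existing Gondulf API.

```python
import time
from typing import Callable, Dict, List, Optional


class RateLimiter:
    """In-memory rate limiter keyed by domain (state is lost on restart)."""

    def __init__(self, max_requests: int = 3, window_seconds: int = 3600,
                 clock: Optional[Callable[[], float]] = None):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock or time.time  # injectable for tests (hypothetical)
        # domain -> list of request timestamps (epoch seconds)
        self._requests: Dict[str, List[float]] = {}

    def _prune(self, domain: str, now: float) -> List[float]:
        """Drop timestamps older than the window and return the ones that remain."""
        cutoff = now - self.window_seconds
        recent = [t for t in self._requests.get(domain, []) if t > cutoff]
        if recent:
            self._requests[domain] = recent
        else:
            self._requests.pop(domain, None)  # keep the dict from growing forever
        return recent

    def check_and_record(self, domain: str) -> bool:
        """Return True and record the request if under the limit, otherwise False."""
        now = self._clock()
        recent = self._prune(domain, now)
        if len(recent) >= self.max_requests:
            return False
        self._requests.setdefault(domain, []).append(now)
        return True
```

Cleanup happens on every check, as the decision specifies, so no background task is needed. Rejected requests are not recorded, which means a blocked client cannot extend its own lockout.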
### 2. Authorization Code Metadata Structure
**Decision**: Use Phase 1's `CodeStorage` service with complete structure from the start.
**Rationale**: Reuse existing infrastructure, avoid future migrations.
**Implementation**:
- Include `used` field (boolean, default False) even though Phase 3 consumes it
- Store epoch integers for timestamps (simpler than datetime objects)
- Use same `CodeStorage` from Phase 1 with authorization code keys
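For illustration, a complete metadata record built this way might look as follows. Only `used` and the epoch-integer timestamps come from the decision above. The other field names are assumptions based on the usual IndieAuth authorization request parameters, the 600-second lifetime is a placeholder, and the helper function is hypothetical.

```python
import secrets
import time
from typing import Optional


def build_authorization_code_metadata(client_id: str, redirect_uri: str, me: str,
                                      state: str, scope: str = "",
                                      lifetime_seconds: int = 600,
                                      now: Optional[int] = None) -> dict:
    """Build the full metadata dict for a new authorization code (hypothetical field set)."""
    created_at = int(time.time()) if now is None else now
    return {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "me": me,
        "state": state,
        "scope": scope,
        "created_at": created_at,                     # epoch seconds (int)
        "expires_at": created_at + lifetime_seconds,  # epoch seconds (int)
        "used": False,                                # set to True when Phase 3 redeems it
    }


# Opaque authorization code, used as the CodeStorage key
code = secrets.token_urlsafe(32)
```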
### 3. HTML Template Implementation
**Decision**: Use Jinja2 templates with separate template files.
**Rationale**: Jinja2 is standard, maintainable, and allows for future template customization.
**Implementation**:
- Templates in `src/gondulf/templates/`
- Create `base.html` for shared layout
- Individual templates: `verify_email.html`, `verify_totp.html`, `authorize.html`, `error.html`
- Pass minimal context to templates
### 4. Database Migration Timing
**Decision**: Apply migration 002 immediately as part of Phase 2 setup.
**Rationale**: Keep database schema current with code expectations.
**Implementation**:
- Run migration before any Phase 2 code execution
- New code assumes 'two_factor' column exists
- Migration updates existing rows (if any) to have 'two_factor' = false
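A minimal sketch of migration 002, assuming SQLite through Python's `sqlite3` and a hypothetical table name `domains`. The actual schema is not shown in this ADR. Adding the column with a `DEFAULT` backfills existing rows, and the column check lets the migration run safely on every startup.

```python
import sqlite3


def apply_migration_002(conn: sqlite3.Connection) -> None:
    """Add the two_factor column to the (hypothetical) domains table, defaulting to false."""
    columns = {row[1] for row in conn.execute("PRAGMA table_info(domains)")}
    if "two_factor" in columns:
        return  # already applied
    with conn:  # commits on success, rolls back on error
        # SQLite stores booleans as integers; DEFAULT 0 also fills existing rows.
        conn.execute(
            "ALTER TABLE domains ADD COLUMN two_factor BOOLEAN NOT NULL DEFAULT 0"
        )
```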
### 5. Client Validation Helper Functions
**Decision**: Implement as standalone functions in a shared utility module.
**Rationale**: Functions over classes when no state is needed. Simpler to test and understand.
**Implementation**:
- Create `src/gondulf/utils/validation.py`
- Functions: `mask_email()`, `validate_redirect_uri()`, `normalize_client_id()`
- Full subdomain validation now (not a stub) - security should be complete
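A sketch of the three helpers as plain functions. It assumes a "subdomain validation" rule under which a redirect URI must use the client's scheme, have no fragment, and point at the `client_id` host or one of its subdomains. That policy is an assumption, not something this ADR spells out.

```python
from urllib.parse import urlsplit, urlunsplit


def mask_email(email: str) -> str:
    """Mask the local part for display: 'user@example.com' -> 'u***@example.com'."""
    local, sep, domain = email.partition("@")
    if not sep or not local:
        return "***"
    return f"{local[0]}***@{domain}"


def normalize_client_id(client_id: str) -> str:
    """Lowercase scheme and host, keep an explicit port, default an empty path to '/'."""
    parts = urlsplit(client_id.strip())
    if parts.scheme.lower() not in ("https", "http") or not parts.hostname:
        raise ValueError(f"invalid client_id: {client_id!r}")
    netloc = parts.hostname  # urllib already lowercases .hostname
    if parts.port:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme.lower(), netloc, parts.path or "/", parts.query, ""))


def validate_redirect_uri(redirect_uri: str, client_id: str) -> bool:
    """Accept the client's own host or a subdomain of it (assumed policy)."""
    client = urlsplit(normalize_client_id(client_id))
    redirect = urlsplit(redirect_uri)
    if redirect.scheme.lower() != client.scheme or not redirect.hostname:
        return False
    if redirect.fragment:
        return False  # OAuth 2.0 redirect URIs must not contain a fragment
    host, client_host = redirect.hostname, client.hostname
    # The leading dot stops 'evilapp.example.com' from matching 'app.example.com'
    return host == client_host or host.endswith("." + client_host)
```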
### 6. Error Response Format Consistency
**Decision**: Use format appropriate to the endpoint type.
**Rationale**: Follow OAuth 2.0 patterns and user experience expectations.
**Implementation**:
- Verification endpoints (`/verify/email`, `/verify/totp`): JSON responses, always 200 OK
- Authorization endpoint errors before user interaction: HTML error page
- Authorization endpoint errors after client validation: OAuth redirect with error
- Token endpoint (Phase 3): Always JSON
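The split between HTML errors and redirect errors exists because redirecting is only safe once `redirect_uri` has been validated against `client_id`. Redirecting to an unvalidated URI would make the server an open redirector. A minimal sketch of the redirect case follows. It appends the RFC 6749 §4.1.2.1 error parameters, and `build_error_redirect` is a hypothetical helper name.

```python
from typing import Optional
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_error_redirect(redirect_uri: str, error: str, state: Optional[str] = None,
                         description: Optional[str] = None) -> str:
    """Append OAuth 2.0 error parameters to an already-validated redirect_uri."""
    params = {"error": error}
    if description:
        params["error_description"] = description
    if state is not None:
        params["state"] = state  # echo state so the client can match the response
    parts = urlsplit(redirect_uri)
    # Preserve any query string the client registered
    query = f"{parts.query}&{urlencode(params)}" if parts.query else urlencode(params)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))
```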
### 7. Dependency Injection Pattern
**Decision**: Create `dependencies.py` with singleton services instantiated at startup.
**Rationale**: Simpler than per-request instantiation, consistent with Phase 1 pattern.
**Implementation**:
- All services instantiated once in `dependencies.py`
- Services read configuration at instantiation
- FastAPI dependency injection provides same instance to all requests
- Pattern: `get_code_storage()`, `get_rate_limiter()`, etc.
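A minimal sketch of `dependencies.py` under this pattern. `Config`, `CodeStorage`, and `RateLimiter` here are hypothetical stand-ins for the real services. The point is that instances are created once, at import time, and the getters only return them.

```python
class Config:
    """Hypothetical stand-in for Gondulf's settings (e.g. read from environment variables)."""

    def __init__(self) -> None:
        self.rate_limit_per_hour = 3


class CodeStorage:
    """Hypothetical stand-in for the Phase 1 CodeStorage service."""

    def __init__(self) -> None:
        self._data: dict = {}


class RateLimiter:
    """Hypothetical service that reads its configuration once, at instantiation."""

    def __init__(self, config: Config) -> None:
        self.max_requests = config.rate_limit_per_hour


# Module-level singletons: created once, when dependencies.py is first imported at startup.
_config = Config()
_code_storage = CodeStorage()
_rate_limiter = RateLimiter(_config)


def get_code_storage() -> CodeStorage:
    """Dependency getter: always returns the same CodeStorage instance."""
    return _code_storage


def get_rate_limiter() -> RateLimiter:
    """Dependency getter: always returns the same RateLimiter instance."""
    return _rate_limiter
```

A route would then declare `limiter: RateLimiter = Depends(get_rate_limiter)`. FastAPI calls the getter for every request, but because the getter only returns the module-level object, every request sees the same instance. Tests can substitute a fake through `app.dependency_overrides[get_rate_limiter]`.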
### 8. Test Organization for Authorization Endpoint
**Decision**: Separate test files per major endpoint with shared fixtures module.
**Rationale**: Easier to navigate and maintain as tests grow.
**Implementation**:
- `tests/test_verification_endpoints.py` - email and TOTP verification
- `tests/test_authorization_endpoint.py` - authorization flow
- `tests/conftest.py` - shared fixtures for common scenarios
- Test complete flows, not sub-endpoints in isolation
## Consequences
### Positive
- Clear, consistent patterns across the codebase
- Real security from the start (no stubs)
- Reuse of existing Phase 1 infrastructure
- Standard, maintainable template approach
- Simple service architecture
### Negative
- Slightly more upfront work than stub implementations
- In-memory rate limiting loses state on restart
- Templates add a dependency (Jinja2)
### Neutral
- Following established patterns from other web frameworks
- Committing to specific implementation choices early


@@ -1,9 +1,10 @@
# ADR-005: Two-Factor Domain Verification for v1.0.0 (DNS + Email via rel="me")
Date: 2025-11-20
Last Updated: 2025-11-20
## Status
Accepted (Updated)
## Context
@@ -65,143 +66,289 @@ From project brief:
## Decision
**Gondulf v1.0.0 will require BOTH DNS TXT record verification AND email verification using the IndieWeb rel="me" pattern. Both verifications must succeed for authentication to complete.**
### Implementation Approach
**Two-Factor Verification (Both Required)**:
1. **DNS TXT Record Verification (Required)**:
- Check for `_gondulf.{domain}` TXT record = `verified`
- If found: Proceed to email verification
- If not found: Authentication fails with instructions to add TXT record
- Proves: User controls DNS for the domain
2. **Email Discovery via rel="me" (Required)**:
- Fetch user's domain homepage (e.g., https://example.com)
- Parse HTML for `<link rel="me" href="mailto:user@example.com">`
- Extract email address from rel="me" link
- If not found: Authentication fails with instructions to add rel="me" link
- Proves: User has published email relationship on their site
3. **Email Verification Code (Required)**:
- Server generates 6-digit verification code
- Server sends code to discovered email address via SMTP
- User enters code (15-minute expiration)
- Verification code must be correct to complete authentication
- Proves: User controls the email account
**Why All Three?**:
- **DNS TXT**: Proves domain DNS control (strong ownership signal)
- **rel="me"**: Follows IndieWeb standard for identity claims
- **Email Code**: Proves active control of the email account (not just DNS/HTML)
- **Combined**: Two-factor verification provides stronger security than either alone
### Rationale
**Enhanced Security Model**:
- Two-factor verification: DNS control + Email control
- Prevents attacks where only one factor is compromised
- DNS TXT proves domain ownership
- Email code proves active account control
- rel="me" follows IndieWeb standards for identity
**Follows IndieWeb Standards**:
- rel="me" is standard practice for identity claims (see: https://thesatelliteoflove.com)
- Aligns with IndieAuth ecosystem expectations
- Users likely already have rel="me" links for other purposes
- Email discovery is self-documenting (user's site declares their email)
**No User-Provided Email Input**:
- Server discovers email from user's site (no manual entry)
- Prevents typos and social engineering
- Email is self-attested by user on their own domain
- Reduces attack surface (can't claim arbitrary email)
**Stronger Than Single-Factor**:
- Attacker needs DNS control AND email access
- Compromised DNS alone: insufficient
- Compromised email alone: insufficient
- Requires control of both infrastructure and communication
**Simplicity Maintained**:
- Two verification checks, but both straightforward
- DNS TXT: standard practice
- rel="me": standard HTML link
- Email code: familiar pattern
- Total setup time: < 5 minutes for technical users
## Consequences
### Positive Consequences
1. **Enhanced Security**:
- Two-factor verification (DNS + Email)
- Stronger ownership proof than single factor
- Prevents single-point-of-compromise attacks
- Aligns with security best practices
2. **IndieWeb Standard Compliance**:
- Follows rel="me" pattern from IndieWeb community
- Interoperability with other IndieWeb tools
- Users may already have rel="me" configured
- Self-documenting identity claims
3. **Reduced Attack Surface**:
- No user-provided email input (prevents typos/social engineering)
- Email discovered from user's own site
- Can't claim arbitrary email addresses
- User controls all verification requirements
4. **Implementation Simplicity**:
- HTML parsing for rel="me" (standard libraries)
- DNS queries (dnspython)
- SMTP email sending (smtplib)
- No external API dependencies
5. **Privacy**:
- Email addresses NOT stored after verification
- No data shared with third parties
- No tracking by external providers
- Minimal data collection
6. **Transparency**:
- User explicitly declares email on their site
- No hidden verification methods
- User controls both DNS and HTML
- Clear requirements for setup
### Negative Consequences
1. **Higher Setup Complexity**:
- Users must configure TWO things (DNS TXT + rel="me" link)
- More steps than single-factor approaches
- Requires basic HTML editing skills
- May deter non-technical users
2. **Email Dependency**:
- Requires functioning SMTP configuration
- Email delivery not guaranteed (spam filters)
- Users must have email access during authentication
- Email account compromise still a risk (mitigated by DNS requirement)
3. **User Experience**:
- More setup steps vs. simpler alternatives
- Requires checking email inbox during login
- Potential delay (email delivery time)
- Code expiration can frustrate users
- Both verifications must succeed (no fallback)
4. **HTML Parsing Complexity**:
- Must parse potentially malformed HTML
- Multiple possible HTML formats for rel="me"
- Case sensitivity issues
- Must handle various link formats (mailto: vs https://)
5. **Failure Points**:
- DNS lookup failure blocks authentication
- Site unavailable blocks authentication
- Email send failure blocks authentication
- No fallback mechanism (both required)
### Mitigation Strategies
**Clear Setup Instructions**:
```markdown
## Domain Verification Setup
Gondulf requires two verifications to prove domain ownership:
### Step 1: Add DNS TXT Record
Add this DNS record to your domain:
- Type: TXT
- Name: _gondulf.example.com
- Value: verified
This proves you control DNS for your domain.
### Step 2: Add rel="me" Link to Your Homepage
Add this HTML to your homepage (e.g., https://example.com/index.html):
<link rel="me" href="mailto:your-email@example.com">
This declares your email address publicly on your site.
### Step 3: Verify Email Access
During login:
- We'll discover your email from the rel="me" link
- We'll send a verification code to that email
- Enter the code to complete authentication
Setup time: ~5 minutes
```
**Robust HTML Parsing**:
```python
import logging
from typing import Optional

import requests
from bs4 import BeautifulSoup

logger = logging.getLogger(__name__)


def discover_email_from_site(domain_url: str) -> Optional[str]:
    """
    Fetch site and discover email from rel="me" link.
    Returns: email address or None if not found
    """
    try:
        # Fetch homepage
        response = requests.get(domain_url, timeout=10, allow_redirects=True)
        response.raise_for_status()

        # Parse HTML (handle malformed HTML gracefully)
        soup = BeautifulSoup(response.content, 'html.parser')

        # Find all rel="me" links
        me_links = soup.find_all('link', rel='me') + soup.find_all('a', rel='me')

        # Look for mailto: links
        for link in me_links:
            href = link.get('href', '').strip()
            if href.lower().startswith('mailto:'):
                # Drop the scheme and any ?subject=... query part
                email = href[len('mailto:'):].split('?', 1)[0].strip()
                # Basic email format check (non-empty local part, dotted domain)
                local, _, host = email.partition('@')
                if local and '.' in host:
                    logger.info(f"Discovered email via rel='me' for {domain_url}")
                    return email

        logger.warning(f"No rel='me' mailto: link found for {domain_url}")
        return None

    except Exception as e:
        logger.error(f"Failed to discover email for {domain_url}: {e}")
        return None
```
**DNS Verification**:
```python
import logging

import dns.resolver

logger = logging.getLogger(__name__)


def verify_dns_txt(domain: str) -> bool:
    """
    Verify _gondulf.{domain} TXT record exists.
    Returns: True if verified, False otherwise
    """
    # Query multiple resolvers for redundancy
    resolvers = ['8.8.8.8', '1.1.1.1']
    verified_count = 0

    for resolver_ip in resolvers:
        try:
            resolver = dns.resolver.Resolver()
            resolver.nameservers = [resolver_ip]
            resolver.timeout = 5
            answers = resolver.resolve(f'_gondulf.{domain}', 'TXT')
        except Exception as e:
            # NXDOMAIN, NoAnswer or timeout: this resolver does not confirm the record
            logger.warning(f"DNS verification via {resolver_ip} failed for {domain}: {e}")
            continue

        for rdata in answers:
            if rdata.to_text().strip('"') == 'verified':
                verified_count += 1
                break

    # Require consensus from multiple resolvers
    return verified_count >= 2
```
**Helpful Error Messages**:
```python
# ErrorResponse is assumed to be the application's error-page response type.

# DNS TXT not found
if not dns_verified:
    return ErrorResponse(f"""
DNS verification failed.
Please add this TXT record to your domain:
- Type: TXT
- Name: _gondulf.{domain}
- Value: verified
DNS changes may take up to 24 hours to propagate.
""")

# rel="me" not found
if not email_discovered:
    return ErrorResponse("""
Could not find rel="me" link on your site.
Please add this to your homepage:
<link rel="me" href="mailto:your-email@example.com">
See: https://indieweb.org/rel-me for more information.
""")

# Email send failure
if not email_sent:
    return ErrorResponse(f"""
Failed to send verification code to {email}.
Please check:
- Email address is correct in your rel="me" link
- Email server is accepting mail
- Your spam/junk folder
""")
```
**Code Security** (unchanged):
```python
import secrets
from datetime import timedelta

from fastapi import HTTPException

# Sufficient entropy
code = ''.join(secrets.choice('0123456789') for _ in range(6))
@@ -209,107 +356,182 @@ code = ''.join(secrets.choice('0123456789') for _ in range(6))
# Rate limiting
MAX_ATTEMPTS = 3  # Per email
MAX_CODES = 3  # Per hour per domain

# Expiration
CODE_LIFETIME = timedelta(minutes=15)

# Attempt tracking
attempts = code_storage.get_attempts(email)
if attempts >= MAX_ATTEMPTS:
    raise HTTPException(429, "Too many attempts. Try again in 15 minutes.")

# Single-use enforcement
code_storage.mark_used(code_id)
```
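As a rough bound on online guessing under the limits in the Code Security block, assume a uniformly random 6-digit code, at most 3 guesses per code, and at most 3 codes per domain per hour. The attacker's success probability is then:

```python
from fractions import Fraction

CODE_SPACE = 10 ** 6      # 6 decimal digits
MAX_ATTEMPTS = 3          # guesses allowed per code
MAX_CODES_PER_HOUR = 3    # codes issued per domain per hour

# Probability that MAX_ATTEMPTS distinct guesses hit one random code
p_per_code = Fraction(MAX_ATTEMPTS, CODE_SPACE)

# Union bound over every code an attacker can trigger in one hour
p_per_hour_bound = MAX_CODES_PER_HOUR * p_per_code

print(float(p_per_code))        # 3e-06
print(float(p_per_hour_bound))  # 9e-06
```

Even with an attacker triggering codes continuously for a full day, the bound is 24 × 9×10⁻⁶ ≈ 2.2×10⁻⁴. That attacker must also already control the domain's DNS, because the TXT check runs before any code is sent.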
## Implementation
### Complete Authentication Flow (v1.0.0)
```python
import logging
import secrets
import smtplib
from datetime import datetime, timedelta
from email.message import EmailMessage
from typing import Optional, Tuple

import dns.resolver
import requests
from bs4 import BeautifulSoup

logger = logging.getLogger(__name__)


class DomainVerificationService:
    """
    Two-factor domain verification: DNS TXT + Email via rel="me"
    """

    def __init__(self, smtp_config: dict):
        self.smtp = smtp_config
        self.codes = {}  # In-memory storage for verification codes

    def verify_domain_ownership(self, domain: str) -> Tuple[bool, Optional[str], Optional[str]]:
        """
        Perform two-factor domain verification.

        Returns: (success, email_discovered, error_message)

        Steps:
        1. Verify DNS TXT record
        2. Discover email from rel="me" link
        3. Send verification code to email
        4. User enters code (handled separately)
        """
        # Step 1: Verify DNS TXT record
        dns_verified = self._verify_dns_txt(domain)
        if not dns_verified:
            return False, None, f"DNS TXT record not found. Please add _gondulf.{domain} = verified"

        # Step 2: Discover email from site's rel="me" link
        email = self._discover_email_from_site(f"https://{domain}")
        if not email:
            return False, None, 'No rel="me" mailto: link found on homepage. Please add <link rel="me" href="mailto:you@example.com">'

        # Step 3: Generate and send verification code
        code_sent = self._send_verification_code(email, domain)
        if not code_sent:
            return False, email, f"Failed to send verification code to {email}"

        # Return success with discovered email
        return True, email, None

    def verify_code(self, email: str, submitted_code: str) -> Tuple[bool, str]:
        """
        Verify submitted code.
        Returns: (success, domain or error_message)
        """
        code_data = self.codes.get(email)
        if not code_data:
            return False, "No verification code found. Please request a new code."

        # Check expiration
        if datetime.utcnow() > code_data['expires_at']:
            del self.codes[email]
            return False, "Code expired. Please request a new code."

        # Check attempts
        code_data['attempts'] += 1
        if code_data['attempts'] > 3:
            del self.codes[email]
            return False, "Too many attempts. Please restart authentication."

        # Verify code (constant-time comparison; bytes also accept non-ASCII input)
        if not secrets.compare_digest(submitted_code.encode(), code_data['code'].encode()):
            return False, "Invalid code. Please try again."

        # Success: Clean up and return domain
        domain = code_data['domain']
        del self.codes[email]  # Single-use code
        logger.info(f"Domain verified: {domain} (DNS + Email)")
        return True, domain

    def _verify_dns_txt(self, domain: str) -> bool:
        """
        Verify _gondulf.{domain} TXT record exists with value 'verified'.
        Returns: True if verified, False otherwise
        """
        record_name = f'_gondulf.{domain}'

        # Use multiple resolvers for redundancy
        resolvers = ['8.8.8.8', '1.1.1.1']
        verified_count = 0

        for resolver_ip in resolvers:
            try:
                resolver = dns.resolver.Resolver()
                resolver.nameservers = [resolver_ip]
                resolver.timeout = 5
                answers = resolver.resolve(record_name, 'TXT')
                for rdata in answers:
                    if rdata.to_text().strip('"') == 'verified':
                        verified_count += 1
                        break
            except Exception as e:
                logger.debug(f"DNS query failed (resolver {resolver_ip}): {e}")
                continue

        # Require consensus from at least 2 resolvers
        if verified_count >= 2:
            logger.info(f"DNS TXT verified: {domain}")
            return True

        logger.warning(f"DNS TXT verification failed: {domain}")
        return False

    def _discover_email_from_site(self, domain_url: str) -> Optional[str]:
        """
        Fetch domain homepage and discover email from rel="me" link.
        Returns: email address or None if not found
        """
        try:
            # Fetch homepage
            response = requests.get(domain_url, timeout=10, allow_redirects=True)
            response.raise_for_status()

            # Parse HTML (BeautifulSoup handles malformed HTML)
            soup = BeautifulSoup(response.content, 'html.parser')

            # Find all rel="me" links (both <link> and <a>)
            me_links = soup.find_all('link', rel='me') + soup.find_all('a', rel='me')

            # Look for mailto: links
            for link in me_links:
                href = link.get('href', '').strip()
                if href.lower().startswith('mailto:'):
                    # Drop the scheme and any ?subject=... query part
                    email = href[len('mailto:'):].split('?', 1)[0].strip()
                    # Basic email validation
                    if '@' in email and '.' in email.split('@')[1]:
                        logger.info(f"Discovered email via rel='me': {domain_url}")
                        return email

            logger.warning(f"No rel='me' mailto: link found: {domain_url}")
            return None

        except Exception as e:
            logger.error(f"Failed to discover email for {domain_url}: {e}")
            return None

    def _send_verification_code(self, email: str, domain: str) -> bool:
        """
        Generate and send verification code to email.
        Returns: True if sent successfully, False otherwise
        """
        # Check rate limit
        if self._is_rate_limited(domain):
            logger.warning(f"Rate limit exceeded for domain: {domain}")
            return False

        # Generate 6-digit code
        code = ''.join(secrets.choice('0123456789') for _ in range(6))
@@ -323,56 +545,14 @@ class EmailVerificationService:
            'attempts': 0,
        }

        # Send email via SMTP
        try:
            msg = EmailMessage()
            msg['From'] = self.smtp['from_email']
            msg['To'] = email
            msg['Subject'] = 'Gondulf Verification Code'
            msg.set_content(f"""
Your Gondulf verification code is:
{code}
@@ -381,96 +561,34 @@ This code expires in 15 minutes.
Only enter this code if you initiated this login.
If you did not request this code, ignore this email.
""")

            with smtplib.SMTP(self.smtp['host'], self.smtp['port'], timeout=10) as smtp:
                smtp.starttls()
                smtp.login(self.smtp['username'], self.smtp['password'])
                smtp.send_message(msg)

            logger.info(f"Verification code sent to {email[:3]}***@{email.split('@')[1]}")
            return True

        except Exception as e:
            # Log only the domain part to avoid writing full addresses to logs
            logger.error(f"Failed to send email to {email.split('@')[1]}: {e}")
            return False

    def _is_rate_limited(self, domain: str) -> bool:
        """
        Check if domain is rate limited (max 3 codes per hour).
        Returns: True if rate limited, False otherwise
        """
        recent_codes = [
            code for code in self.codes.values()
            if code.get('domain') == domain
            and datetime.utcnow() - code['created_at'] < timedelta(hours=1)
        ]
        return len(recent_codes) >= 3
```
## Future Enhancements
### v1.1.0+: Additional Authentication Methods
@@ -561,13 +679,22 @@ These will be additive (user chooses method), not replacing email.
## References
- IndieWeb rel="me": https://indieweb.org/rel-me
- Example Implementation: https://thesatelliteoflove.com (Phil Skents' identity page)
- SMTP Protocol (RFC 5321): https://datatracker.ietf.org/doc/html/rfc5321
- Email Security (STARTTLS): https://datatracker.ietf.org/doc/html/rfc3207
- DNS TXT Records (RFC 1035): https://datatracker.ietf.org/doc/html/rfc1035
- HTML Link Relations: https://www.w3.org/TR/html5/links.html#linkTypes
- BeautifulSoup (HTML parsing): https://www.crummy.com/software/BeautifulSoup/
- WebAuthn (W3C): https://www.w3.org/TR/webauthn/ (future)
## Decision History
- 2025-11-20: Proposed (Architect) - Email primary, DNS optional
- 2025-11-20: Accepted (Architect) - Email primary, DNS optional
- 2025-11-20: **UPDATED** (Architect) - BOTH required (DNS + Email via rel="me")
- Changed from single-factor (email OR DNS) to two-factor (email AND DNS)
- Added rel="me" email discovery (IndieWeb standard)
- Removed user-provided email input (security improvement)
- Enhanced security model with dual verification
- TBD: Review after v1.0.0 deployment (gather user feedback)


@@ -0,0 +1,516 @@
# ADR-008: rel="me" Email Discovery Pattern
Date: 2025-11-20
## Status
Accepted
## Context
Gondulf's authentication flow requires email verification as part of two-factor domain verification (see ADR-005). This raises the question: How do we obtain the user's email address?
### Email Acquisition Methods Evaluated
**1. User-Provided Email Input**
- User manually enters their email address
- Server validates email domain matches identity domain
- Simple UX pattern (familiar from many sites)
**2. DNS TXT Record**
- Email address stored in DNS: `_email.example.com` TXT `user@example.com`
- Server queries DNS to discover email
- Requires DNS configuration
**3. rel="me" Link Discovery (IndieWeb Standard)**
- User publishes email on their site: `<link rel="me" href="mailto:user@example.com">`
- Server fetches site and parses HTML for rel="me" links
- Follows IndieWeb standards for identity claims
**4. WebFinger Protocol**
- Server queries `/.well-known/webfinger?resource={domain}`
- Standard protocol for identity discovery
- Requires additional endpoint implementation
### Requirements
From the user requirements and the IndieAuth ecosystem:
- **Security**: Prevent social engineering and email spoofing
- **Simplicity**: Keep v1.0.0 implementation straightforward
- **Standards**: Align with IndieWeb/IndieAuth community practices
- **Self-Documenting**: Users should understand what they're publishing
### IndieWeb Context
The IndieWeb community uses `rel="me"` as a standard way to assert identity relationships:
- Users publish rel="me" links on their homepage to various profiles (GitHub, Twitter, email, etc.)
- Other tools can discover these relationships by parsing the page
- Well-established pattern in the IndieWeb ecosystem
- Reference implementation: https://thesatelliteoflove.com
## Decision
**Gondulf v1.0.0 will discover email addresses from rel="me" links published on the user's homepage, following the IndieWeb standard.**
### Implementation Approach
1. **Fetch User's Homepage**
   - When the user initiates authentication with a domain (e.g., `https://example.com`)
   - Server fetches the homepage over HTTPS
   - Timeout: 10 seconds
   - Follow redirects (max 5)
   - Verify SSL certificate
2. **Parse HTML for rel="me" Links**
   - Use BeautifulSoup for robust HTML parsing (handles malformed HTML)
   - Search for `<link rel="me" href="mailto:...">` tags
   - Also check `<a rel="me" href="mailto:...">` tags
   - Extract the first matching `mailto:` link
   - Match the rel value case-insensitively (rel is a space-separated token list, so `rel="Me"` and `rel="me nofollow"` both count)
3. **Validate Email Format**
   - Basic format validation (a simplified subset of the RFC 5322 address syntax)
   - Length check (max 254 characters, derived from the RFC 5321 path limit)
   - Format: `user@domain.tld`
4. **Use Discovered Email**
   - Send the verification code to the discovered email
   - Display a partially masked email to the user: `u***@example.com`
   - User cannot modify the email (it is discovered automatically)
5. **Error Handling**
   - If no rel="me" link is found: display setup instructions
   - If multiple `mailto:` links exist: use the first one
   - If the site is unreachable: display an error with a retry option
   - If SSL verification fails: reject (security)
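Steps 3 and 4 can be sketched with the standard library alone. This is a minimal illustration, assuming the masking rule is "first character of the local part, then `***`, then the full domain", which is inferred from the `u***@example.com` example rather than specified anywhere else; `is_valid_email` and `mask_email` are hypothetical helper names.

```python
import re

# Simplified subset of the RFC 5322 addr-spec; not a full grammar
EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")
MAX_EMAIL_LENGTH = 254  # limit derived from the RFC 5321 path length


def is_valid_email(email: str) -> bool:
    """Step 3: length check plus a basic user@domain.tld shape check."""
    if len(email) > MAX_EMAIL_LENGTH:
        return False
    # fullmatch, so trailing newlines or extra text are rejected
    return EMAIL_RE.fullmatch(email) is not None


def mask_email(email: str) -> str:
    """Step 4 (assumed rule): keep the first local-part character and the domain."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}"


print(mask_email("user@example.com"))  # u***@example.com
```

Masking only the local part keeps the domain visible, which lets the user confirm the address belongs to the domain they are signing in with.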
### Example HTML
User adds this to their homepage:
```html
<!DOCTYPE html>
<html>
<head>
<title>Phil Skents</title>
<!-- rel="me" link for email -->
<link rel="me" href="mailto:phil@example.com">
<!-- Other rel="me" links (optional) -->
<link rel="me" href="https://github.com/philskents">
<link rel="me" href="https://twitter.com/philskents">
</head>
<body>
<h1>Phil Skents</h1>
<p>This is my personal website.</p>
</body>
</html>
```
Or visible link:
```html
<a rel="me" href="mailto:phil@example.com">Email me</a>
```
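The discovery rule applied to markup like the examples above can be sketched without third-party packages. This sketch swaps in the standard library's `html.parser` for BeautifulSoup purely so it runs with no dependencies; the ADR's chosen parser remains BeautifulSoup. `RelMeMailtoParser` and `first_rel_me_email` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import unquote


class RelMeMailtoParser(HTMLParser):
    """Collect mailto: hrefs from <link>/<a> tags whose rel includes "me"."""

    def __init__(self) -> None:
        super().__init__()
        self.mailtos: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values
        if tag not in ("link", "a"):
            return
        attr_map = dict(attrs)
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        href = (attr_map.get("href") or "").strip()
        if "me" in rel_tokens and href.lower().startswith("mailto:"):
            # Drop the scheme and any ?subject=... query, then percent-decode
            self.mailtos.append(unquote(href[len("mailto:"):].split("?", 1)[0]))

    # Self-closing <link ... /> goes through handle_startendtag, whose default
    # implementation calls handle_starttag, so no extra override is needed.


def first_rel_me_email(html: str) -> Optional[str]:
    """Return the first rel="me" mailto address in document order, or None."""
    parser = RelMeMailtoParser()
    parser.feed(html)
    parser.close()
    return parser.mailtos[0] if parser.mailtos else None


EXAMPLE_HTML = """<!DOCTYPE html>
<html>
<head>
<title>Phil Skents</title>
<link rel="me" href="mailto:phil@example.com">
<link rel="me" href="https://github.com/philskents">
</head>
<body><a rel="me" href="mailto:phil@example.com">Email me</a></body>
</html>"""

print(first_rel_me_email(EXAMPLE_HTML))  # phil@example.com
```

Non-mailto rel="me" links (GitHub, Twitter) are ignored, so users can keep their existing identity links alongside the email link.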
## Rationale
### Follows IndieWeb Standards
**IndieWeb Alignment**:
- rel="me" is the standard way to assert identity in IndieWeb
- Users familiar with IndieAuth likely already have rel="me" configured
- Interoperability with other IndieWeb tools
- Well-documented pattern: https://indieweb.org/rel-me
**Community Expectations**:
- IndieAuth ecosystem uses rel="me" extensively
- Users understand the pattern
- Existing tutorials and documentation available
- Aligns with decentralized identity principles
### Security Benefits
**Prevents Social Engineering**:
- User cannot claim arbitrary email addresses
- Email must be published on the user's own site
- Attacker cannot trick user into entering wrong email
- Self-attested identity (user declares on their domain)
**Reduces Attack Surface**:
- No user input field for email (no typos, no XSS)
- No email enumeration via guessing
- Email discovery transparent and auditable
- User controls what email is published
**Transparency**:
- User explicitly publishes email on their site
- Public declaration of email relationship
- User aware they're making email public
- No hidden or implicit email collection
### Implementation Simplicity
**Standard Libraries**:
- BeautifulSoup: Robust HTML parsing (handles malformed HTML)
- requests: HTTP client (widely used, well-tested)
- No custom protocols or complex parsing
- Python standard library (`re`) for email format validation
**Error Handling**:
- Clear error messages with setup instructions
- Graceful degradation (site unavailable, etc.)
- Standard HTTP status codes
- No complex state management
**Testing**:
- Easy to mock HTTP responses
- Straightforward unit tests
- BeautifulSoup handles edge cases (malformed HTML)
- No external service dependencies
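One way to keep such tests free of network access is to pass the fetcher and parser in as arguments, so plain `unittest.mock.Mock` objects can stand in for them. This is a hypothetical restructuring: `discover_email` and `FetchError` are illustrative names, not part of the `RelMeEmailDiscovery` class described in this ADR.

```python
from typing import Callable, Optional
from unittest.mock import Mock


class FetchError(Exception):
    """Hypothetical error raised by a fetcher on timeout, SSL or HTTP failure."""


def discover_email(
    domain: str,
    fetch_html: Callable[[str], str],
    extract_email: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Hypothetical discovery flow with network access and parsing injected."""
    try:
        html = fetch_html(f"https://{domain}")
    except FetchError:
        return None  # Site unreachable: the caller shows the retry message
    return extract_email(html)


# Unit test: no network, no real parser
fetch = Mock(return_value='<link rel="me" href="mailto:phil@example.com">')
extract = Mock(return_value="phil@example.com")
assert discover_email("example.com", fetch, extract) == "phil@example.com"
fetch.assert_called_once_with("https://example.com")

# Unreachable site: the fetcher raises, discovery returns None
assert discover_email("example.com", Mock(side_effect=FetchError("timeout")), extract) is None
```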
### User Experience
**Self-Documenting**:
- User adds one HTML tag to their site
- Clear relationship between domain and email
- User understands what they're publishing
- No hidden configuration
**Familiar Pattern**:
- Similar to verifying site ownership (Google Search Console, etc.)
- Adding `<link>` or `<meta>` tags to a page is common web practice
- Many users already have rel="me" for other purposes
- Works with static sites (no backend required)
**Setup Time**:
- ~1 minute to add link tag
- No waiting (unlike DNS propagation)
- Immediate verification possible
- Can be combined with other rel="me" links
## Consequences
### Positive Consequences
1. **IndieWeb Standard Compliance**:
   - Follows established rel="me" pattern
   - Interoperability with IndieWeb tools
   - Community-vetted approach
   - Well-documented standard
2. **Enhanced Security**:
   - No user-provided email input (prevents social engineering)
   - Email explicitly published by user
   - Transparent and auditable
   - Reduces phishing risk
3. **Implementation Simplicity**:
   - Standard libraries (BeautifulSoup, requests)
   - No complex protocols
   - Easy to test and maintain
   - Handles malformed HTML gracefully
4. **User Control**:
   - User explicitly declares email on their site
   - Can change email by updating HTML
   - No hidden email collection
   - User aware of public email
5. **Flexibility**:
   - Works with static sites (no backend needed)
   - Can use any email provider
   - Email can be at a different domain (e.g., Gmail)
   - Supports multiple rel="me" links
### Negative Consequences
1. **Public Email Requirement**:
   - User must publish email publicly on their site
   - Not suitable for users who want a private email
   - Email harvesters can discover the address
   - Spam risk (mitigated: users can use spam filters)
2. **HTML Parsing Complexity**:
   - Must handle various HTML formats
   - Malformed HTML can cause issues (mitigated: BeautifulSoup)
   - Case sensitivity considerations
   - Multiple possible HTML structures
3. **Website Dependency**:
   - User's site must be available during authentication
   - Site downtime blocks authentication
   - No fallback if the site is unreachable
   - Requires HTTPS (not all sites have valid certificates)
4. **Discovery Failures**:
   - User may not have rel="me" configured
   - Link may be in the wrong format
   - Email may be in an invalid format
   - Clear error messages required
5. **Privacy Considerations**:
   - Email addresses visible to anyone
   - Cannot use email verification without public disclosure
   - Users must accept a public email
   - May deter privacy-conscious users
### Mitigation Strategies
**For Public Email Concern**:
- Document clearly that email will be public
- Suggest using dedicated email for IndieAuth
- Recommend spam filtering
- Note: Email is user's choice (they publish it)
**For HTML Parsing**:
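```python
from bs4 import BeautifulSoup

html_content = '<link rel="Me" href="mailto:user@example.com">'

# BeautifulSoup handles malformed HTML gracefully
soup = BeautifulSoup(html_content, 'html.parser')

def has_rel_me(tag) -> bool:
    # rel is a token list (bs4 returns a list); compare case-insensitively
    rel = tag.get('rel') or []
    rel = rel.split() if isinstance(rel, str) else rel
    return any(token.lower() == 'me' for token in rel)

# Multiple link formats supported, in document order:
# <link rel="me" href="mailto:user@example.com">
# <a rel="me" href="mailto:user@example.com">Email</a>
me_links = [t for t in soup.find_all(['link', 'a'], href=True) if has_rel_me(t)]
```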
**For Website Dependency**:
- Clear error messages with retry option
- Suggest checking site availability
- Timeout limits (10 seconds)
- Log errors for debugging
**For Discovery Failures**:
```markdown
Error: No rel="me" email link found
Please add this to your homepage:
<link rel="me" href="mailto:your-email@example.com">
See: https://indieweb.org/rel-me for more information.
```
## Implementation
### Email Discovery Service
```python
import logging
import re
from typing import Optional
from urllib.parse import unquote

import requests
from bs4 import BeautifulSoup

logger = logging.getLogger(__name__)


class RelMeEmailDiscovery:
    """
    Discover email addresses from rel="me" links on user's homepage.
    """

    def discover_email(self, domain: str) -> Optional[str]:
        """
        Fetch domain homepage and discover email from rel="me" link.

        Args:
            domain: User's domain (e.g., "example.com")

        Returns:
            Email address or None if not found
        """
        url = f"https://{domain}"
        try:
            # Fetch homepage with safety limits. requests.get() has no
            # max_redirects argument; the limit is a Session attribute.
            with requests.Session() as session:
                session.max_redirects = 5
                response = session.get(
                    url,
                    timeout=10,  # applies to connect and each read, not the total
                    allow_redirects=True,
                    verify=True,  # Verify SSL certificate
                )
            response.raise_for_status()

            # Parse HTML (handles malformed HTML)
            soup = BeautifulSoup(response.content, 'html.parser')

            # Walk <link> and <a> tags in document order; bs4 returns rel as
            # a list of tokens, so match "me" case-insensitively per token
            for link in soup.find_all(['link', 'a'], href=True):
                rel = link.get('rel') or []
                if isinstance(rel, str):
                    rel = rel.split()
                if not any(token.lower() == 'me' for token in rel):
                    continue

                href = link['href'].strip()
                # The scheme is case-insensitive; drop any "?subject=..." query
                if href.lower().startswith('mailto:'):
                    email = unquote(href[len('mailto:'):].split('?', 1)[0]).strip()
                    # Validate email format
                    if self._validate_email_format(email):
                        logger.info(f"Discovered email via rel='me' for {domain}")
                        return email

            logger.warning(f"No rel='me' mailto: link found on {domain}")
            return None

        except requests.exceptions.SSLError as e:
            logger.error(f"SSL verification failed for {domain}: {e}")
            return None
        except requests.exceptions.Timeout:
            logger.error(f"Timeout fetching {domain}")
            return None
        except requests.exceptions.TooManyRedirects:
            logger.error(f"Too many redirects fetching {domain}")
            return None
        except requests.exceptions.HTTPError as e:
            logger.error(f"HTTP error fetching {domain}: {e}")
            return None
        except Exception as e:
            logger.error(f"Failed to discover email for {domain}: {e}")
            return None

    def _validate_email_format(self, email: str) -> bool:
        """
        Validate email address format.

        Args:
            email: Email address to validate

        Returns:
            True if valid format, False otherwise
        """
        # Length check (254 characters, from the RFC 5321 path limit)
        if len(email) > 254:
            return False

        # Simplified RFC 5322 shape check. fullmatch() rejects trailing
        # newlines that re.match() with '$' would accept, and the character
        # classes exclude '@', so exactly one '@' is guaranteed.
        email_regex = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
        return re.fullmatch(email_regex, email) is not None
```
### Error Messages
```python
# Message templates; fill placeholders with .format(domain=..., email=...)

# DNS TXT found, but no rel="me" link
NO_REL_ME_ERROR = """
Domain verified via DNS, but no email found on your site.
Please add this to your homepage:
<link rel="me" href="mailto:your-email@example.com">
This allows us to discover your email address automatically.
Learn more: https://indieweb.org/rel-me
"""

# Site unreachable
SITE_UNREACHABLE_ERROR = """
Could not fetch your site at https://{domain}
Please check:
- Site is accessible via HTTPS
- SSL certificate is valid
- No firewall blocking requests
Try again once your site is accessible.
"""

# Invalid email format in rel="me"
INVALID_EMAIL_ERROR = """
Found rel="me" link, but email format is invalid: {email}
Please check your rel="me" link uses valid email format:
<link rel="me" href="mailto:valid-email@example.com">
"""
```
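Because `{email}` in the invalid-format template is copied verbatim from the user's page, it is untrusted input. A minimal sketch, assuming these messages are shown in an HTML page that is rendered without an auto-escaping template engine: fill the template, then escape the whole string so both the injected value and the example `<link>` tag display as text. `render_for_html` is a hypothetical helper.

```python
import html

INVALID_EMAIL_ERROR = """\
Found rel="me" link, but email format is invalid: {email}
Please check your rel="me" link uses valid email format:
<link rel="me" href="mailto:valid-email@example.com">
"""


def render_for_html(template: str, **values: str) -> str:
    """Fill a plain-text error template, then escape it for an HTML page."""
    # Values such as {email} come from the user's site: treat them as untrusted
    return html.escape(template.format(**values))


print(render_for_html(INVALID_EMAIL_ERROR, email="not-an-email"))
```

If the pages are rendered with an auto-escaping engine such as Jinja2 with autoescape enabled, the same protection applies without the explicit `html.escape` call.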
## Alternatives Considered
### Alternative 1: User-Provided Email Input
**Pros**:
- Simpler implementation (no HTTP fetch, no parsing)
- Works even if site is down
- User can use private email (not public)
- Immediate (no HTTP round-trip)
**Cons**:
- Social engineering risk (attacker tricks user into entering wrong email)
- Typo risk (user enters incorrect email)
- No self-attestation (email not on user's site)
- Not aligned with IndieWeb standards
**Rejected**: Security risks outweigh simplicity benefits. rel="me" provides self-attestation and prevents social engineering.
---
### Alternative 2: DNS TXT Record for Email
**Pros**:
- Stronger proof of domain control (DNS)
- No website dependency
- Machine-readable format
- Fast lookups (DNS cache)
**Cons**:
- Requires DNS configuration (more complex than HTML)
- DNS propagation delays (can be hours)
- Not user-friendly for non-technical users
- Not standard IndieWeb practice
**Rejected**: DNS configuration is more complex than adding HTML tag. rel="me" is more aligned with IndieWeb standards.
---
### Alternative 3: WebFinger Protocol
**Pros**:
- Standard protocol (RFC 7033)
- Machine-readable format (JSON)
- Supports multiple identities
- Well-defined spec
**Cons**:
- Requires server-side endpoint (not for static sites)
- More complex implementation
- Not common in IndieWeb ecosystem
- Overkill for email discovery
**Rejected**: Too complex for v1.0.0 MVP. Doesn't work with static sites. rel="me" is simpler and more aligned with IndieWeb.
---
### Alternative 4: Well-Known URI
**Pros**:
- Uses the standard well-known URI mechanism (RFC 8615), e.g. a `/.well-known/email` file
- Simple file-based implementation
- No HTML parsing required
- Fast lookups
**Cons**:
- Not an established standard for email
- Requires server configuration
- Not aligned with IndieWeb practices
- Duplicate effort (rel="me" already exists)
**Rejected**: Not standard practice. rel="me" is already established in IndieWeb ecosystem.
## References
- IndieWeb rel="me": https://indieweb.org/rel-me
- Example Implementation: https://thesatelliteoflove.com (Phil Skents' identity page)
- HTML Link Relations (W3C): https://www.w3.org/TR/html5/links.html#linkTypes
- BeautifulSoup Documentation: https://www.crummy.com/software/BeautifulSoup/
- RFC 5322 (Email Format): https://datatracker.ietf.org/doc/html/rfc5322
- RFC 5321 (SMTP): https://datatracker.ietf.org/doc/html/rfc5321
- WebFinger (RFC 7033): https://datatracker.ietf.org/doc/html/rfc7033 (alternative considered)
## Decision History
- 2025-11-20: Proposed (Architect)
- 2025-11-20: Accepted (Architect)
- Related to ADR-005 (Two-Factor Domain Verification)