docs: add Phase 2 domain verification design and clarifications

Add comprehensive Phase 2 documentation:
- Complete design document for two-factor domain verification
- Implementation guide with code examples
- ADR for implementation decisions (ADR-0004)
- ADR for rel="me" email discovery (ADR-008)
- Phase 1 impact assessment
- All 23 clarification questions answered
- Updated architecture docs (indieauth-protocol, security)
- Updated ADR-005 with rel="me" approach
- Updated backlog with technical debt items

Design ready for Phase 2 implementation.

Generated with Claude Code https://claude.com/claude-code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 13:05:09 -07:00
parent bebd47955f
commit 6f06aebf40
10 changed files with 5605 additions and 410 deletions


@@ -162,26 +162,34 @@ Accept: text/html
- Reject non-200 responses
- Log client_id fetch failures
#### Authentication Flow (v1.0.0: Email-based)
#### Authentication Flow (v1.0.0: Two-Factor Domain Verification)
1. **Domain Ownership Check**
- Check if `me` domain has verified TXT record: `_gondulf.example.com` = `verified`
- If found and cached, skip email verification
- If not found, proceed to email verification
1. **DNS TXT Record Verification (Required)**
- Check if `me` domain has TXT record: `_gondulf.{domain}` = `verified`
- Query multiple DNS resolvers (Google 8.8.8.8, Cloudflare 1.1.1.1)
- Require consensus from at least 2 resolvers
- If not found: Display error with instructions to add TXT record
- If found: Proceed to email discovery
- Proves: User controls DNS for the domain
2. **Email Verification**
- Display form requesting email address
- Validate email is at `me` domain (e.g., `admin@example.com` for `https://example.com`)
2. **Email Discovery via rel="me" (Required)**
- Fetch user's domain homepage (e.g., https://example.com)
- Parse HTML for `<link rel="me" href="mailto:user@example.com">` or `<a rel="me" href="mailto:user@example.com">`
- Extract email address from first matching mailto: link
- If not found: Display error with instructions to add rel="me" link
- If found: Proceed to email verification
- Proves: User has published email relationship on their site
- Reference: https://indieweb.org/rel-me
3. **Email Verification Code (Required)**
- Generate 6-digit verification code (cryptographically random)
- Store code in memory with 15-minute TTL
- Send code via SMTP
- Display code entry form
3. **Code Verification**
- Send code to discovered email address via SMTP
- Display code entry form showing discovered email (partially masked)
- User enters 6-digit code
- Validate code matches and hasn't expired
- Validate code matches and hasn't expired (max 3 attempts)
- Proves: User controls the email account
- Mark domain as verified (store in database)
- Proceed to authorization
4. **User Consent**
- Display authorization prompt:
@@ -208,6 +216,8 @@ Accept: text/html
Location: {redirect_uri}?code={code}&state={state}
```
**Security Model**: Two-factor verification requires BOTH DNS control AND email control. An attacker would need to compromise both to authenticate fraudulently.
#### Error Responses
Return error via redirect when possible:
@@ -404,18 +414,19 @@ Future implementation per RFC 7009.
```python
{
"email": "admin@example.com",
"email": "admin@example.com", # Discovered from rel="me", not user-provided
"code": "123456", # 6-digit string
"domain": "example.com",
"created_at": datetime,
"expires_at": datetime, # created_at + 15 minutes
"attempts": 0 # Rate limiting
"attempts": 0 # Rate limiting (max 3 attempts)
}
```
**Storage**: Python dict with TTL management
**Email Source**: Discovered from site's rel="me" link (not user input)
**Expiration**: 15 minutes
**Rate Limiting**: Max 3 attempts per email
**Rate Limiting**: Max 3 attempts per email, max 3 codes per domain per hour
**Cleanup**: Automatic expiration via TTL
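The record layout and limits above can be sketched as a small in-memory store. This is only an illustration under stated assumptions: `PendingCodeStore`, `PendingCode`, and the injectable `clock` are hypothetical names, not the Phase 1 `CodeStorage` API.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional

MAX_ATTEMPTS = 3            # "max 3 attempts" from the notes above
CODE_TTL_SECONDS = 15 * 60  # 15-minute expiration


@dataclass
class PendingCode:
    email: str       # discovered from rel="me", not user-provided
    code: str        # 6-digit string
    domain: str
    expires_at: float
    attempts: int = 0


class PendingCodeStore:
    """Hypothetical in-memory store keyed by email (illustration only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._records: Dict[str, PendingCode] = {}

    def issue(self, email: str, domain: str) -> str:
        # secrets.randbelow gives a uniformly random value in [0, 10**6)
        code = f"{secrets.randbelow(10**6):06d}"
        expires_at = self._clock() + CODE_TTL_SECONDS
        self._records[email] = PendingCode(email, code, domain, expires_at)
        return code

    def verify(self, email: str, submitted: str) -> Optional[str]:
        """Return the verified domain on success, otherwise None."""
        record = self._records.get(email)
        if record is None:
            return None
        if self._clock() >= record.expires_at:
            del self._records[email]  # expired: clean up lazily
            return None
        record.attempts += 1
        # Compare as bytes so non-ASCII input cannot raise TypeError
        if secrets.compare_digest(submitted.encode(), record.code.encode()):
            del self._records[email]  # single use
            return record.domain
        if record.attempts >= MAX_ATTEMPTS:
            del self._records[email]  # lock out after the third failure
        return None
```

Passing the clock in keeps expiry testable without sleeping. A production store would also need the per-domain hourly limit, which is not shown here.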
### Access Token (SQLite)
@@ -448,18 +459,21 @@ CREATE TABLE tokens (
CREATE TABLE domains (
id INTEGER PRIMARY KEY AUTOINCREMENT,
domain TEXT NOT NULL UNIQUE,
verification_method TEXT NOT NULL, -- 'txt_record' or 'email'
verification_method TEXT NOT NULL, -- 'two_factor' (DNS + Email)
verified_at TIMESTAMP NOT NULL,
last_checked TIMESTAMP,
txt_record_valid BOOLEAN DEFAULT 0,
last_dns_check TIMESTAMP,
dns_txt_valid BOOLEAN DEFAULT 0,
last_email_check TIMESTAMP
);
CREATE INDEX idx_domain ON domains (domain); -- SQLite needs a separate statement; inline INDEX clauses are not valid
```
**Purpose**: Cache domain ownership verification
**TXT Record**: Re-verified periodically (daily)
**Email Verification**: Permanent unless admin deletes
**Verification Method**: Always 'two_factor' in v1.0.0 (DNS TXT + Email via rel="me")
**DNS TXT**: Re-verified periodically (daily check)
**Email**: NOT stored (only verification timestamp recorded)
**Re-verification**: DNS checked periodically, email re-verified on each login
**Cleanup**: Optional (admin decision)
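A minimal sketch of how the cache described above might be written and consulted, using only the standard-library `sqlite3` module. The helper names `record_verification` and `dns_recheck_due` are hypothetical. The schema is trimmed to the updated columns. Storing timestamps as ISO 8601 strings is an assumption.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS domains (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    domain TEXT NOT NULL UNIQUE,
    verification_method TEXT NOT NULL,
    verified_at TIMESTAMP NOT NULL,
    last_dns_check TIMESTAMP,
    dns_txt_valid BOOLEAN DEFAULT 0,
    last_email_check TIMESTAMP
);
"""


def record_verification(conn: sqlite3.Connection, domain: str, now: datetime) -> None:
    """Insert or refresh the row for a domain that just passed both factors.

    Note that no email address is stored, only timestamps.
    """
    ts = now.isoformat()
    updated = conn.execute(
        "UPDATE domains SET verified_at = ?, last_dns_check = ?, "
        "dns_txt_valid = 1, last_email_check = ? WHERE domain = ?",
        (ts, ts, ts, domain),
    ).rowcount
    if updated == 0:
        conn.execute(
            "INSERT INTO domains (domain, verification_method, verified_at, "
            "last_dns_check, dns_txt_valid, last_email_check) "
            "VALUES (?, 'two_factor', ?, ?, 1, ?)",
            (domain, ts, ts, ts),
        )
    conn.commit()


def dns_recheck_due(
    conn: sqlite3.Connection,
    domain: str,
    now: datetime,
    max_age: timedelta = timedelta(days=1),
) -> bool:
    """True when the domain is unknown or its DNS check is at least max_age old."""
    row = conn.execute(
        "SELECT last_dns_check FROM domains WHERE domain = ?", (domain,)
    ).fetchone()
    if row is None or row[0] is None:
        return True
    return now - datetime.fromisoformat(row[0]) >= max_age
```

The one-day `max_age` default mirrors the "daily check" stated above. Email is re-verified on each login, so no equivalent freshness check is sketched for it.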
## Security Considerations


@@ -0,0 +1,809 @@
# Phase 1 Impact Assessment: Authentication Flow Change
**Date**: 2025-11-20
**Architect**: Claude (Architect Agent)
**Related ADRs**: ADR-005 (updated), ADR-008 (new)
**Related Report**: /docs/reports/2025-11-20-phase-1-foundation.md
## Summary
The authentication design has been updated to require BOTH DNS TXT verification AND email verification via rel="me" discovery. This change impacts Phase 1 implementation and defines new requirements for Phase 2.
## Authentication Flow Change
### Original Design (ADR-005 v1)
- **Primary**: Email verification (user provides email)
- **Optional**: DNS TXT verification (fast-path to skip email)
- **Flow**: DNS check → if not found, request email → send code → verify code
### Updated Design (ADR-005 v2 + ADR-008)
- **Required Factor 1**: DNS TXT verification (`_gondulf.{domain}` = `verified`)
- **Required Factor 2**: Email verification via rel="me" discovery
- **Flow**: DNS check → rel="me" discovery → send code to discovered email → verify code
### Key Differences
| Aspect | Original | Updated |
|--------|----------|---------|
| DNS TXT | Optional (fast-path) | Required (first factor) |
| Email Discovery | User input | rel="me" link parsing |
| Email Verification | Optional (fallback) | Required (second factor) |
| Security Model | Single-factor | Two-factor |
| Attack Resistance | Moderate | High (requires DNS + email control) |
| Setup Complexity | Lower (email alone suffices) | Higher (both required) |
## Phase 1 Implementation Impact
### What Phase 1 Implemented
Phase 1 successfully implemented:
- ✅ Configuration management (GONDULF_* environment variables)
- ✅ Database layer with migrations (SQLite, SQLAlchemy Core)
- ✅ In-memory code storage (TTL-based expiration)
- ✅ Email service (SMTP with STARTTLS support)
- ✅ DNS service (TXT record querying with fallback resolvers)
- ✅ Structured logging
- ✅ FastAPI application with health check endpoint
- ✅ 94.16% test coverage (96 tests passing)
### Does Phase 1 Need Changes?
**Answer: NO. Phase 1 implementation remains valid.**
#### Analysis
**Email Service** (`src/gondulf/email.py`):
- Current: Generic email sending service
- Change Impact: **None**
- Reason: The email service sends codes to any address. Whether the email is user-provided or discovered via rel="me" does not affect it.
- Status: **No changes needed**
**DNS Service** (`src/gondulf/dns.py`):
- Current: TXT record verification with fallback resolvers
- Change Impact: **None**
- Reason: DNS service already implements TXT record verification as designed. Changing from "optional" to "required" is a business logic change, not a DNS service change.
- Status: **No changes needed**
**In-Memory Storage** (`src/gondulf/storage.py`):
- Current: TTL-based code storage
- Change Impact: **None**
- Reason: Storage mechanism is independent of how email is discovered or whether DNS is optional/required.
- Status: **No changes needed**
**Database Schema** (`001_initial_schema.sql`):
- Current: `domains` table with `domain`, `verification_method`, `verified_at`
- Change Impact: **Minor update needed in Phase 2**
- Reason: Schema already supports storing verification method. Will need to update from `'txt_record'` or `'email'` to `'two_factor'` when storing records.
- Status: **Schema structure OK, values will change in Phase 2**
**Configuration** (`src/gondulf/config.py`):
- Current: SMTP configuration, DNS configuration, timeouts
- Change Impact: **None immediately, optional addition in Phase 2**
- Reason: Current configuration supports both email and DNS. May want to add timeout for HTML fetching in Phase 2.
- Status: **No changes needed now**
### Phase 1 Status: APPROVED
Phase 1 implementation remains valid and does NOT require any revisions due to the authentication flow change. All Phase 1 components are foundational services that work regardless of how they're orchestrated in the authentication flow.
## Phase 2 Requirements: New Implementation Needs
Phase 2 must now implement the updated authentication flow. Here's what needs to be built:
### 1. HTML Fetching Service (NEW)
**Purpose**: Fetch user's homepage to discover rel="me" links
**Implementation**:
```python
# src/gondulf/html_fetcher.py
import logging
from typing import Optional

import requests

logger = logging.getLogger(__name__)


class HTMLFetcherService:
    """
    Fetch user's homepage over HTTPS.
    """

    def __init__(self, timeout: int = 10):
        self.timeout = timeout
        self.max_redirects = 5
        self.max_size = 5 * 1024 * 1024  # 5MB

    def fetch_site(self, domain: str) -> Optional[str]:
        """
        Fetch site HTML content.

        Args:
            domain: Domain to fetch (e.g., "example.com")

        Returns:
            HTML content as string, or None if fetch fails
        """
        url = f"https://{domain}"
        try:
            # requests.get() does not accept max_redirects; the limit is a
            # Session attribute, and exceeding it raises TooManyRedirects.
            with requests.Session() as session:
                session.max_redirects = self.max_redirects
                response = session.get(
                    url,
                    timeout=self.timeout,
                    allow_redirects=True,
                    verify=True,  # Enforce SSL verification
                )
            response.raise_for_status()

            # Check content size
            if len(response.content) > self.max_size:
                raise ValueError(f"Response too large: {len(response.content)} bytes")

            return response.text
        except requests.exceptions.SSLError as e:
            logger.error(f"SSL verification failed for {domain}: {e}")
            return None
        except requests.exceptions.Timeout:
            logger.error(f"Timeout fetching {domain}")
            return None
        except Exception as e:
            logger.error(f"Failed to fetch {domain}: {e}")
            return None
```
**Dependencies**:
- `requests` library (already in pyproject.toml)
- Timeout configuration (add to Config if needed)
**Tests Required**:
- Successful HTTPS fetch
- SSL verification failure
- Timeout handling
- HTTP error codes (404, 500, etc.)
- Redirect following
- Size limit enforcement
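The fetcher above checks the size only after the whole body has been downloaded. A possible refinement, shown here as a dependency-free sketch, is to enforce the cap while reading chunks. `read_capped` and `ResponseTooLarge` are hypothetical names. With `requests`, the chunks could come from `response.iter_content(chunk_size=...)` on a request made with `stream=True`.

```python
from typing import Iterable


class ResponseTooLarge(ValueError):
    """Raised when a streamed body exceeds the configured limit."""


def read_capped(chunks: Iterable[bytes], max_size: int) -> bytes:
    """Accumulate byte chunks, aborting as soon as max_size is exceeded."""
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)
        if len(buffer) > max_size:
            # Stop before pulling further chunks from the source
            raise ResponseTooLarge(f"Response exceeded {max_size} bytes")
    return bytes(buffer)
```

Because the function stops at the first chunk that crosses the limit, an oversized response costs at most `max_size` plus one chunk of memory.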
---
### 2. rel="me" Email Discovery Service (NEW)
**Purpose**: Parse HTML to discover email from rel="me" links
**Implementation**:
```python
# src/gondulf/relme.py
import logging
import re
from typing import Optional

from bs4 import BeautifulSoup

logger = logging.getLogger(__name__)


class RelMeDiscoveryService:
    """
    Discover email addresses from rel="me" links in HTML.
    """

    def discover_email(self, html_content: str) -> Optional[str]:
        """
        Parse HTML and discover email from rel="me" link.

        Args:
            html_content: HTML content as string

        Returns:
            Email address or None if not found
        """
        try:
            # Parse HTML (BeautifulSoup handles malformed HTML)
            soup = BeautifulSoup(html_content, 'html.parser')

            # Find all rel="me" links (<link> and <a> tags)
            me_links = soup.find_all('link', rel='me') + soup.find_all('a', rel='me')

            # Look for mailto: links
            for link in me_links:
                href = link.get('href', '').strip()
                if href.lower().startswith('mailto:'):
                    # Strip only the scheme prefix and any ?subject=... query
                    email = href[len('mailto:'):].split('?', 1)[0].strip()

                    # Validate email format
                    if self._validate_email_format(email):
                        logger.info(f"Discovered email via rel='me': {email[:3]}***")
                        return email

            logger.warning("No rel='me' mailto: link found in HTML")
            return None
        except Exception as e:
            logger.error(f"Failed to parse HTML: {e}")
            return None

    def _validate_email_format(self, email: str) -> bool:
        """Validate email address format (RFC 5322 simplified)."""
        email_regex = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
        if not re.match(email_regex, email):
            return False
        if len(email) > 254:  # RFC 5321 maximum
            return False
        if email.count('@') != 1:
            return False
        return True
```
**Dependencies**:
- `beautifulsoup4` library (add to pyproject.toml)
- `html.parser` (Python standard library)
**Tests Required**:
- Discovery from `<link rel="me">` tags
- Discovery from `<a rel="me">` tags
- Multiple rel="me" links (select first mailto)
- Malformed HTML handling
- Missing rel="me" links
- Invalid email format in link
- Edge cases (empty href, non-mailto links, etc.)
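Several of the cases listed above can be exercised without the BeautifulSoup dependency. The sketch below swaps in the standard-library `html.parser` purely for illustration and is not the service described in this section. `discover_relme_email` is a hypothetical name. Unlike the service, which checks `<link>` tags before `<a>` tags, it returns the first match in document order.

```python
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import unquote


class _RelMeMailtoParser(HTMLParser):
    """Collect mailto: addresses from <link>/<a> tags whose rel includes "me"."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.emails: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("link", "a"):
            return
        # Attribute values may be None (e.g. <a rel>), so normalise to ""
        attr = {name: (value or "") for name, value in attrs}
        rel_tokens = attr.get("rel", "").lower().split()  # rel is space-separated
        href = attr.get("href", "").strip()
        if "me" in rel_tokens and href.lower().startswith("mailto:"):
            # Drop the scheme and any ?subject=... query, then percent-decode
            address = href[len("mailto:"):].split("?", 1)[0]
            self.emails.append(unquote(address).strip())


def discover_relme_email(html: str) -> Optional[str]:
    """Return the first rel="me" mailto address in document order, if any."""
    parser = _RelMeMailtoParser()
    parser.feed(html)
    parser.close()
    return parser.emails[0] if parser.emails else None
```

Format validation (the `_validate_email_format` step) is deliberately left out here. A caller would still need to apply it before sending any mail.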
---
### 3. Domain Verification Service (UPDATED)
**Purpose**: Orchestrate two-factor verification (DNS + Email)
**Implementation**:
```python
# src/gondulf/domain_verification.py
import secrets
from typing import Optional, Tuple

from .dns import DNSService
from .email import EmailService
from .html_fetcher import HTMLFetcherService
from .relme import RelMeDiscoveryService
from .storage import CodeStorage


class DomainVerificationService:
    """
    Two-factor domain verification service.

    Verifies domain ownership through:
    1. DNS TXT record verification
    2. Email verification via rel="me" discovery
    """

    def __init__(
        self,
        dns_service: DNSService,
        html_fetcher: HTMLFetcherService,
        relme_discovery: RelMeDiscoveryService,
        email_service: EmailService,
        code_storage: CodeStorage
    ):
        self.dns = dns_service
        self.html_fetcher = html_fetcher
        self.relme = relme_discovery
        self.email = email_service
        self.code_storage = code_storage

    def start_verification(self, domain: str) -> Tuple[bool, Optional[str], Optional[str]]:
        """
        Start domain verification process.

        Returns: (success, discovered_email, error_message)
        Failures are reported through error_message; nothing is raised here.
        """
        # Step 1: Verify DNS TXT record
        dns_verified = self.dns.verify_txt_record(domain, "verified")
        if not dns_verified:
            error = f"DNS TXT record not found for {domain}. Please add: _gondulf.{domain} TXT verified"
            return False, None, error

        # Step 2: Fetch site and discover email
        html = self.html_fetcher.fetch_site(domain)
        if html is None:
            error = f"Could not fetch site at https://{domain}. Please ensure site is accessible via HTTPS."
            return False, None, error

        # Step 3: Discover email from rel="me"
        email = self.relme.discover_email(html)
        if email is None:
            error = 'No rel="me" mailto: link found. Please add: <link rel="me" href="mailto:you@example.com">'
            return False, None, error

        # Step 4: Generate and send verification code
        code = self._generate_code()
        # Keep the domain with the code so verify_code() can return it.
        # Assumption: CodeStorage round-trips an opaque string value.
        self.code_storage.store(email, f"{code}:{domain}", ttl=900)  # 15 minutes
        email_sent = self.email.send_verification_email(email, code)
        if not email_sent:
            error = f"Failed to send verification email to {email}. Please try again."
            return False, email, error

        # Success: code sent to discovered email
        return True, email, None

    def verify_code(self, email: str, submitted_code: str) -> Tuple[bool, str]:
        """
        Verify submitted code.

        Returns: (success, domain_or_error_message)
        """
        stored_data = self.code_storage.get(email)
        if stored_data is None:
            return False, "No verification code found. Please restart verification."

        code, domain = stored_data.split(":", 1)

        # Verify code (constant-time comparison; bytes avoid a TypeError on non-ASCII input)
        if not secrets.compare_digest(submitted_code.encode(), code.encode()):
            return False, "Invalid code. Please try again."

        # Success: mark code as used
        self.code_storage.delete(email)
        return True, domain

    def _generate_code(self) -> str:
        """Generate 6-digit verification code."""
        return ''.join(secrets.choice('0123456789') for _ in range(6))
```
**Dependencies**:
- All Phase 1 services (DNS, Email, Storage)
- New HTML fetcher service
- New rel="me" discovery service
**Tests Required**:
- Full verification flow (DNS → rel="me" → email → code)
- DNS verification failure
- Site fetch failure
- rel="me" discovery failure
- Email send failure
- Code verification success/failure
- Multiple attempts tracking
- Code expiration
---
### 4. Domain Verification UI Endpoints (NEW)
**Purpose**: HTTP endpoints for user interaction
**Implementation**:
```python
# src/gondulf/routers/verification.py
from typing import Optional

from fastapi import APIRouter, HTTPException
from pydantic import BaseModel

router = APIRouter(prefix="/verify", tags=["verification"])

# NOTE: domain_verification_service is assumed to be a DomainVerificationService
# instance created at application startup (wiring not shown here).


class VerificationStartRequest(BaseModel):
    domain: str


class VerificationStartResponse(BaseModel):
    success: bool
    email_masked: Optional[str]  # e.g., "u***@example.com"
    error: Optional[str]


class VerificationCodeRequest(BaseModel):
    email: str
    code: str


class VerificationCodeResponse(BaseModel):
    success: bool
    domain: Optional[str]
    error: Optional[str]


@router.post("/start", response_model=VerificationStartResponse)
async def start_verification(request: VerificationStartRequest):
    """
    Start domain verification process.

    Steps:
    1. Verify DNS TXT record
    2. Discover email from rel="me"
    3. Send verification code to email
    """
    success, email, error = domain_verification_service.start_verification(request.domain)
    if not success:
        return VerificationStartResponse(success=False, email_masked=None, error=error)

    # Mask email for display: u***@example.com
    masked_email = f"{email[0]}***@{email.split('@')[1]}"
    return VerificationStartResponse(
        success=True,
        email_masked=masked_email,
        error=None
    )


@router.post("/code", response_model=VerificationCodeResponse)
async def verify_code(request: VerificationCodeRequest):
    """
    Verify submitted code.

    Returns domain if code is valid.
    """
    success, result = domain_verification_service.verify_code(request.email, request.code)
    if not success:
        return VerificationCodeResponse(success=False, domain=None, error=result)
    return VerificationCodeResponse(success=True, domain=result, error=None)
```
**Dependencies**:
- FastAPI router
- Pydantic models
- Domain verification service
**Tests Required**:
- POST /verify/start success case
- POST /verify/start with DNS failure
- POST /verify/start with rel="me" failure
- POST /verify/start with email send failure
- POST /verify/code success case
- POST /verify/code with invalid code
- POST /verify/code with expired code
- POST /verify/code with missing code
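The masking done inline in `/verify/start` could be pulled into a small helper so its edge cases are testable on their own. This is a sketch only, and `mask_email` is a hypothetical name.

```python
def mask_email(email: str) -> str:
    """Mask the local part for display, e.g. "user@example.com" -> "u***@example.com"."""
    # rpartition splits on the last "@", so the domain is always the final segment
    local, sep, domain = email.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("not an email address")
    # A fixed "***" avoids revealing the length of the local part
    return f"{local[0]}***@{domain}"
```

Rejecting input without a usable local part or domain keeps the endpoint from raising an `IndexError` on an unexpected value.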
---
### 5. Authorization Endpoint Integration (UPDATED)
**Changes to Authorization Flow**:
**Before** (original design):
```
1. User enters domain (me parameter)
2. Display form: "Enter your email at {domain}"
3. User enters email manually
4. Send code, user enters code
5. Display consent screen
```
**After** (updated design):
```
1. User enters domain (me parameter)
2. Server performs two-factor verification:
a. Verify DNS TXT record
b. Discover email from rel="me"
c. Send code to discovered email
3. Display code entry form (show discovered email masked)
4. User enters code
5. Display consent screen
```
**Implementation Changes**:
- Call `DomainVerificationService.start_verification()` instead of requesting email from user
- Update UI to show "Sending code to u***@example.com" instead of email input form
- Handle new error cases (DNS not found, rel="me" not found, site unreachable)
---
## Phase 2 Feature Breakdown
### New Dependencies to Add
**pyproject.toml additions**:
```toml
[project]
dependencies = [
# ... existing dependencies
"beautifulsoup4>=4.12.0", # HTML parsing for rel="me" discovery
]
```
### New Source Files
1. `src/gondulf/html_fetcher.py` - HTML fetching service
2. `src/gondulf/relme.py` - rel="me" email discovery service
3. `src/gondulf/domain_verification.py` - Two-factor verification orchestration
4. `src/gondulf/routers/verification.py` - Verification endpoints (if implemented separately from authorization)
### Updated Files
1. `src/gondulf/main.py` - Register new routers, initialize new services
2. `src/gondulf/config.py` - Optional: add HTML fetch timeout config
3. Database migration (002_update_verification_method.sql) - Change domains.verification_method values
### New Test Files
1. `tests/unit/test_html_fetcher.py` - HTML fetching tests
2. `tests/unit/test_relme.py` - rel="me" discovery tests
3. `tests/unit/test_domain_verification.py` - Verification service tests
4. `tests/integration/test_verification_endpoints.py` - Verification endpoint tests
### Estimated Effort
**New Components**:
- HTML Fetcher Service: 0.5 days
- rel="me" Discovery Service: 0.5 days
- Domain Verification Service: 1 day
- Verification Endpoints: 0.5 days
- Tests (all new components): 1 day
**Total New Work**: ~3.5 days
**Authorization Endpoint** (already planned):
- Original estimate: 3-5 days
- Updated estimate: 3-5 days (same - just uses DomainVerificationService)
## Database Schema Updates
### Migration: 002_update_verification_method.sql
```sql
-- Update verification_method values from single-factor to two-factor
-- This is a data migration, not schema change
UPDATE domains
SET verification_method = 'two_factor'
WHERE verification_method IN ('txt_record', 'email');
-- No schema changes needed - 'verification_method' column already exists
```
**When to Apply**: Phase 2, before authorization endpoint implementation
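Because the statement only touches rows that still hold an old value, running it twice should be harmless. A minimal sketch that checks this against an in-memory SQLite database. The `apply_migration` helper and the two-column table are assumptions for illustration, not the project's migration runner.

```python
import sqlite3

MIGRATION = """
UPDATE domains
SET verification_method = 'two_factor'
WHERE verification_method IN ('txt_record', 'email')
"""


def apply_migration(conn: sqlite3.Connection) -> int:
    """Run the data migration and return the number of rows changed."""
    cursor = conn.execute(MIGRATION)
    conn.commit()
    return cursor.rowcount


def demo() -> tuple:
    """Apply the migration twice to a throwaway database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE domains (domain TEXT NOT NULL UNIQUE, verification_method TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO domains VALUES (?, ?)",
        [("a.example", "txt_record"), ("b.example", "email"), ("c.example", "two_factor")],
    )
    first = apply_migration(conn)   # rewrites the two legacy rows
    second = apply_migration(conn)  # nothing left to rewrite
    methods = {row[0] for row in conn.execute("SELECT verification_method FROM domains")}
    return first, second, methods
```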
## Error Message Updates
### DNS TXT Not Found
```
DNS Verification Failed
Please add this TXT record to your domain's DNS:
Type: TXT
Name: _gondulf.example.com
Value: verified
DNS changes may take up to 24 hours to propagate.
Need help? See: https://docs.gondulf.example.com/setup/dns
```
### rel="me" Not Found
```
Email Discovery Failed
Could not find a rel="me" email link on your homepage.
Please add this to your homepage (https://example.com):
<link rel="me" href="mailto:your-email@example.com">
This declares your email address for IndieAuth verification.
Learn more: https://indieweb.org/rel-me
```
### Site Unreachable
```
Site Fetch Failed
Could not fetch your site at https://example.com
Please check:
- Site is accessible via HTTPS
- SSL certificate is valid
- No firewall blocking requests
Try again once your site is accessible.
```
### Email Send Failure
```
Email Delivery Failed
Failed to send verification code to u***@example.com
Please check:
- Email address is correct in your rel="me" link
- Email server is accepting mail
- Check spam/junk folder
Try again, or contact support if the issue persists.
```
## Documentation Updates Needed
### User Documentation (Phase 2)
1. **Setup Guide**: `/docs/user/setup.md`
- Step 1: Add DNS TXT record
- Step 2: Add rel="me" link to homepage
- Step 3: Test verification
2. **Troubleshooting**: `/docs/user/troubleshooting.md`
- DNS verification failures
- rel="me" discovery issues
- Email delivery problems
3. **Examples**: `/docs/user/examples.md`
- Example HTML with rel="me" link
- Example DNS configuration (various providers)
### Developer Documentation (Phase 2)
1. **API Reference**: `/docs/api/verification.md`
- POST /verify/start endpoint
- POST /verify/code endpoint
- Error codes and responses
2. **Architecture**: `/docs/architecture/domain-verification.md`
- Two-factor verification flow diagram
- Service interaction diagram
- Error handling flowchart
## Security Considerations for Phase 2
### New Attack Surfaces
1. **HTML Parsing**:
- Risk: Malicious HTML exploiting parser
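- Mitigation: BeautifulSoup with `html.parser` only parses markup (it never executes scripts or loads external resources) and tolerates malformed input; the 5MB size limit bounds parsing cost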
- Test: Fuzzing with malformed HTML
2. **HTTPS Fetching**:
- Risk: SSL verification bypass
- Mitigation: Enforce `verify=True` in requests
- Test: Attempt to fetch site with invalid certificate (must fail)
3. **rel="me" Spoofing**:
- Risk: Attacker adds rel="me" to compromised site
- Mitigation: Two-factor requirement (also need DNS control)
- Test: Verify DNS check happens BEFORE rel="me" discovery
### Security Testing Required
1. **Input Validation**:
- Malformed domain names
- Oversized HTML responses (>5MB)
- Invalid email formats in rel="me" links
2. **TLS Enforcement**:
- Verify HTTPS-only fetching
- Verify SSL certificate validation
- Reject sites with invalid certificates
3. **Rate Limiting** (future):
- Prevent bulk rel="me" discovery
- Limit verification attempts per domain
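The per-domain limit listed above as future work could take the form of a sliding-window counter. The sketch below assumes the "3 codes per domain per hour" figure used elsewhere in these docs. `SlidingWindowLimiter` is a hypothetical name and keeps its state in process memory only.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most max_events per key within any window_seconds span."""

    def __init__(
        self,
        max_events: int = 3,
        window_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max = max_events
        self._window = window_seconds
        self._clock = clock
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        """Record an event for key and return True, or return False if over the limit."""
        now = self._clock()
        events = self._events[key]
        # Forget events that have aged out of the window
        while events and now - events[0] >= self._window:
            events.popleft()
        if len(events) >= self._max:
            return False
        events.append(now)
        return True
```

A caller would check `limiter.allow(domain)` before generating a new code. Rejected requests are not recorded, so a blocked client cannot extend its own lockout.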
## Configuration Updates
### Optional New Config
```python
# src/gondulf/config.py
class Config:
# ... existing config
# HTML Fetching (optional, has sensible defaults)
HTML_FETCH_TIMEOUT: int = 10 # seconds
HTML_MAX_SIZE: int = 5 * 1024 * 1024 # 5MB
HTML_MAX_REDIRECTS: int = 5
```
### Environment Variables
```bash
# .env.example additions (optional)
# HTML Fetching Configuration (optional - has defaults)
GONDULF_HTML_FETCH_TIMEOUT=10 # Timeout for fetching user's site (seconds)
GONDULF_HTML_MAX_SIZE=5242880 # Maximum HTML size (bytes, default 5MB)
GONDULF_HTML_MAX_REDIRECTS=5 # Maximum redirects to follow
```
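One way these optional variables might be read, falling back to the documented defaults. This is a sketch under assumptions: `HTMLFetchConfig` and `load_html_fetch_config` are hypothetical names and do not reflect the existing `Config` class.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class HTMLFetchConfig:
    timeout: int = 10                # seconds
    max_size: int = 5 * 1024 * 1024  # 5MB
    max_redirects: int = 5


def _positive_int(env: Mapping[str, str], name: str, default: int) -> int:
    """Read a positive integer from env, using default when unset or blank."""
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    value = int(raw)  # raises ValueError on non-numeric input
    if value <= 0:
        raise ValueError(f"{name} must be a positive integer, got {raw!r}")
    return value


def load_html_fetch_config(env: Mapping[str, str] = os.environ) -> HTMLFetchConfig:
    """Build the HTML fetch settings from GONDULF_HTML_* variables."""
    defaults = HTMLFetchConfig()
    return HTMLFetchConfig(
        timeout=_positive_int(env, "GONDULF_HTML_FETCH_TIMEOUT", defaults.timeout),
        max_size=_positive_int(env, "GONDULF_HTML_MAX_SIZE", defaults.max_size),
        max_redirects=_positive_int(env, "GONDULF_HTML_MAX_REDIRECTS", defaults.max_redirects),
    )
```

Failing fast on a zero or non-numeric value surfaces a misconfiguration at startup rather than during a login attempt.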
## Testing Strategy for Phase 2
### Unit Tests
**HTML Fetcher**:
- Mock successful HTTPS response
- Mock SSL verification failure
- Mock timeout
- Mock HTTP errors (404, 500, etc.)
- Mock size limit exceeded
- Mock redirect following
**rel="me" Discovery**:
- Parse `<link rel="me" href="mailto:...">`
- Parse `<a rel="me" href="mailto:...">`
- Handle malformed HTML
- Handle missing rel="me" links
- Handle invalid email in link
- Handle multiple rel="me" links (select first)
**Domain Verification Service**:
- Full two-factor flow success
- DNS verification failure
- Site fetch failure
- rel="me" discovery failure
- Email send failure
- Code verification success/failure
### Integration Tests
**Verification Endpoints**:
- POST /verify/start with valid domain (mock services)
- POST /verify/start with DNS failure
- POST /verify/start with rel="me" failure
- POST /verify/code with valid code
- POST /verify/code with invalid code
### End-to-End Tests (Future)
- Complete verification flow with real HTML
- Authorization flow integration
- Token issuance after successful verification
## Acceptance Criteria for Phase 2
Phase 2 will be considered complete when:
1. ✅ HTML fetcher service implemented and tested
2. ✅ rel="me" discovery service implemented and tested
3. ✅ Domain verification service orchestrates two-factor verification
4. ✅ Verification endpoints return correct responses for all cases
5. ✅ Error messages are clear and actionable
6. ✅ All new tests passing (unit + integration)
7. ✅ Test coverage remains >80% overall
8. ✅ Security testing complete (HTML parsing, TLS enforcement)
9. ✅ Documentation updated (user setup guide, API reference)
10. ✅ Database migration applied successfully
## Timeline Estimate
**Phase 2 Components**:
- HTML Fetcher: 0.5 days
- rel="me" Discovery: 0.5 days
- Domain Verification Service: 1 day
- Verification Endpoints: 0.5 days
- Testing: 1 day
- Documentation: 0.5 days
**Total New Work**: ~4 days
**Authorization Endpoint** (already planned):
- Original estimate: 3-5 days
- Updated estimate: 3-5 days (uses DomainVerificationService)
**Phase 2 Total**: ~7-9 days (vs. original estimate of 3-5 days)
**Impact**: +4 days of work due to authentication flow change
## Recommendation
**Phase 1**: APPROVED as-is. No changes needed.
**Phase 2**: Proceed with implementation of:
1. HTML fetching service
2. rel="me" discovery service
3. Domain verification service (two-factor orchestration)
4. Verification endpoints
5. Updated authorization endpoint to use domain verification service
The additional work (HTML fetching + rel="me" discovery) adds ~4 days to Phase 2, bringing total Phase 2 estimate to 7-9 days instead of original 3-5 days.
## Sign-off
**Assessment Status**: Complete
**Phase 1 Impact**: None - Phase 1 approved as-is
**Phase 2 Impact**: Additional 4 days of work for new services
**Risk Level**: Low - All new work is well-scoped and testable
**Ready to Proceed**: Yes
---
**Assessment completed**: 2025-11-20
**Architect**: Claude (Architect Agent)


@@ -58,108 +58,174 @@ Gondulf follows a defense-in-depth security model with these core principles:
## Authentication Security
### Email-Based Verification (v1.0.0)
### Two-Factor Domain Verification (v1.0.0)
**Mechanism**: Users prove domain ownership by receiving verification code at email address on that domain.
**Mechanism**: Users prove domain ownership through TWO independent factors:
1. **DNS TXT Record**: Proves DNS control (`_gondulf.{domain}` = `verified`)
2. **Email via rel="me"**: Proves email control (discovered from site's rel="me" link)
**Security Model**: An attacker must compromise BOTH factors to authenticate fraudulently. This is significantly stronger than single-factor verification.
#### Threat: Email Interception
**Risk**: Attacker intercepts email containing verification code.
**Mitigations**:
1. **Short Code Lifetime**: 15-minute expiration
2. **Single Use**: Code invalidated after verification
3. **Rate Limiting**: Max 3 code requests per email per hour
4. **TLS Email Delivery**: Require STARTTLS for SMTP
5. **Display Warning**: "Only request code if you initiated this login"
1. **Two-Factor Requirement**: Email alone is insufficient (DNS also required)
2. **Short Code Lifetime**: 15-minute expiration
3. **Single Use**: Code invalidated after verification
4. **Rate Limiting**: Max 3 code requests per domain per hour
5. **TLS Email Delivery**: Require STARTTLS for SMTP
6. **Display Warning**: "Only request code if you initiated this login"
**Residual Risk**: Acceptable for v1.0.0 given short lifetime and single-use.
**Residual Risk**: Low. Even with email interception, attacker still needs DNS control.
#### Threat: Code Brute Force
**Risk**: Attacker guesses 6-digit verification code.
**Mitigations**:
1. **Sufficient Entropy**: 1,000,000 possible codes (6 digits)
2. **Attempt Limiting**: Max 3 attempts per email
3. **Short Lifetime**: 15-minute window
4. **Rate Limiting**: Max 10 attempts per IP per hour
5. **Exponential Backoff**: 5-second delay after each failed attempt
1. **Two-Factor Requirement**: Code alone is insufficient (DNS also required)
2. **Sufficient Entropy**: 1,000,000 possible codes (6 digits)
3. **Attempt Limiting**: Max 3 attempts per email
4. **Short Lifetime**: 15-minute window
5. **Rate Limiting**: Max 3 codes per domain per hour
6. **Single-Use**: Code invalidated after use
**Math**:
- 3 attempts ÷ 1,000,000 codes = 0.0003% success probability
- 15-minute window limits attack time
- Rate limiting prevents distributed guessing
- Even if guessed, attacker still needs DNS control
**Residual Risk**: Very low, acceptable for v1.0.0.
**Residual Risk**: Very low. Two-factor requirement makes brute force insufficient.
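The brute-force figures can be checked with exact arithmetic. The sketch below assumes uniformly random codes, 3 attempts per code, and 3 codes per domain per hour, as stated in the mitigations. The function and constant names are illustrative only.

```python
from fractions import Fraction

CODE_SPACE = 10 ** 6       # six decimal digits
ATTEMPTS_PER_CODE = 3
CODES_PER_HOUR = 3


def guess_probability(attempts: int, code_space: int = CODE_SPACE) -> Fraction:
    """Chance that one of `attempts` distinct guesses hits a uniformly random code."""
    return Fraction(min(attempts, code_space), code_space)


# One code: 3 guesses out of 1,000,000, i.e. 0.0003%
per_code = guess_probability(ATTEMPTS_PER_CODE)

# One hour: each fresh code is independent, so combine the misses
per_hour = 1 - (1 - per_code) ** CODES_PER_HOUR

print(f"per code: {float(per_code):.6%}")
print(f"per hour: {float(per_hour):.6%}")
```

Under these assumptions the hourly success chance stays just under 9 in 1,000,000, and a correct guess still leaves the attacker without the DNS factor.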
#### Threat: DNS TXT Record Spoofing
**Risk**: Attacker attempts to spoof DNS responses.
**Mitigations**:
1. **Multiple Resolvers**: Query 2+ independent DNS servers (Google, Cloudflare)
2. **Consensus Required**: Require agreement from at least 2 resolvers
3. **DNSSEC Support**: Validate DNSSEC signatures when available (future)
4. **Timeout Handling**: Fail securely if DNS unavailable
5. **Logging**: Log all DNS verification attempts
**Residual Risk**: Low. Spoofing multiple independent resolvers is difficult.
#### Threat: rel="me" Link Spoofing
**Risk**: Attacker compromises user's website to add malicious rel="me" link.
**Mitigations**:
1. **Two-Factor Requirement**: Website compromise alone insufficient (DNS also required)
2. **HTTPS Required**: Fetch site over TLS (prevents MITM)
3. **Certificate Validation**: Verify SSL certificate
4. **Email Domain Matching**: Email should match site domain (warning if not)
5. **User Education**: Inform users to secure their website
**Residual Risk**: Moderate. If attacker compromises both DNS and website, they can authenticate. This is acceptable as it represents full domain compromise.
#### Threat: Email Address Enumeration
**Risk**: Attacker discovers which domains are registered by requesting codes.
**Risk**: Attacker discovers email addresses by triggering rel="me" discovery.
**Mitigations**:
1. **Consistent Response**: Always say "If email exists, code sent"
2. **No Error Differentiation**: Same message for valid/invalid emails
3. **Rate Limiting**: Prevent bulk enumeration
1. **Public Information**: rel="me" links are intentionally public
2. **User Awareness**: Users know they're publishing email on their site
3. **Rate Limiting**: Prevent bulk scanning
4. **Robots.txt**: Users can restrict crawler access if desired
**Residual Risk**: Minimal, domain names are public anyway (DNS).
**Residual Risk**: Minimal. Email addresses are intentionally published by users on their own sites.
### Domain Ownership Verification
### Domain Ownership Verification (Two-Factor)
#### TXT Record Validation (Preferred)
**Mechanism**: v1.0.0 requires BOTH verification methods:
**Mechanism**: Admin adds DNS TXT record `_gondulf.example.com` = `verified`.
#### 1. TXT Record Validation (Required)
**Mechanism**: Admin adds DNS TXT record `_gondulf.{domain}` = `verified`.
**Security Properties**:
- Requires DNS control (stronger than email)
- Proves DNS control (first factor)
- Verifiable without user interaction
- Cacheable for performance
- Re-verifiable periodically
**Threat: DNS Spoofing**
**Mitigations**:
1. **DNSSEC**: Validate DNSSEC signatures if available
2. **Multiple Resolvers**: Query 2+ DNS servers, require consensus
3. **Caching**: Cache valid results, re-verify daily
4. **Logging**: Log all DNS verification attempts
**Implementation**:
```python
import logging

import dns.resolver

logger = logging.getLogger(__name__)

def verify_txt_record(domain: str) -> bool:
    """
    Verify _gondulf.{domain} TXT record exists with value 'verified'.
    Requires consensus from multiple independent resolvers.
    """
    try:
        # Use Google and Cloudflare DNS for redundancy
        resolvers = ['8.8.8.8', '1.1.1.1']
        verified_count = 0

        for resolver_ip in resolvers:
            resolver = dns.resolver.Resolver()
            resolver.nameservers = [resolver_ip]
            resolver.timeout = 5
            resolver.lifetime = 5

            answers = resolver.resolve(f'_gondulf.{domain}', 'TXT')
            for rdata in answers:
                txt_value = rdata.to_text().strip('"')
                if txt_value == 'verified':
                    verified_count += 1
                    break

        # Require consensus from at least 2 resolvers
        return verified_count >= 2
    except Exception as e:
        # Any resolver error (timeout, NXDOMAIN, no answer) fails the whole check securely
        logger.warning(f"DNS verification failed for {domain}: {e}")
        return False
```
**Residual Risk**: Low. An attacker would need to control the domain's DNS zone or spoof answers from both resolvers.
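
The "cacheable" and "re-verifiable periodically" properties can be sketched as a small wrapper around any verification function. This is an assumption-laden illustration: `VerificationCache` is a hypothetical name, the 24-hour default is borrowed from the "re-verify daily" mitigation, and caching only positive results is a design choice made here, not one stated in the document.

```python
import time
from typing import Callable, Dict

REVERIFY_AFTER_SECONDS = 24 * 60 * 60  # "re-verify daily"


class VerificationCache:
    """Cache successful domain verifications and re-check after a TTL."""

    def __init__(self, verify: Callable[[str], bool],
                 ttl_seconds: float = REVERIFY_AFTER_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._verify = verify
        self._ttl = ttl_seconds
        self._clock = clock
        self._verified_at: Dict[str, float] = {}

    def is_verified(self, domain: str) -> bool:
        now = self._clock()
        cached_at = self._verified_at.get(domain)
        if cached_at is not None and now - cached_at < self._ttl:
            return True  # still fresh, skip the DNS round trip
        if self._verify(domain):
            self._verified_at[domain] = now
            return True
        # Failed (or expired and failed): drop any stale entry
        self._verified_at.pop(domain, None)
        return False
```

Negative results are not cached, so a user who has just added the TXT record is not locked out until the TTL expires.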
#### 2. Email Verification via rel="me" (Required)
**Mechanism**: Email discovered from site's `<link rel="me" href="mailto:...">`, then verified with code.
**Security Properties**:
- Proves website control (can modify HTML)
- Proves email control (receives and enters code)
- Follows IndieWeb standards (rel="me")
- Self-documenting (user declares email publicly)
**Implementation**:
```python
import logging
from typing import Optional

from bs4 import BeautifulSoup
import requests

logger = logging.getLogger(__name__)

def discover_email_from_site(domain: str) -> Optional[str]:
    """
    Fetch site and discover email from rel="me" link.
    """
    try:
        response = requests.get(f"https://{domain}", timeout=10, allow_redirects=True)
        response.raise_for_status()

        soup = BeautifulSoup(response.content, 'html.parser')
        me_links = soup.find_all('link', rel='me') + soup.find_all('a', rel='me')

        for link in me_links:
            href = link.get('href', '')
            if href.startswith('mailto:'):
                # Strip only the leading scheme, not later occurrences
                email = href[len('mailto:'):].strip()
                if validate_email_format(email):
                    return email

        return None
    except Exception as e:
        logger.error(f"Failed to discover email for {domain}: {e}")
        return None
```
**Combined Residual Risk**: Low. Neither factor suffices alone: fraudulent authentication requires passing both the DNS TXT check and the rel="me" email code check for the same domain.
## Authorization Security
@@ -431,15 +497,80 @@ class AuthorizeRequest(BaseModel):
**Residual Risk**: Minimal, Pydantic provides strong validation.
### HTML Parsing Security (rel="me" Discovery)
#### Threat: Malicious HTML Injection
**Risk**: Attacker's site contains malicious HTML to exploit parser.
**Mitigations**:
1. **Robust Parser**: Use BeautifulSoup (handles malformed HTML safely)
2. **Link Extraction Only**: Only extract href attributes, no script execution
3. **Timeout**: 10-second timeout for HTTP requests
4. **Size Limit**: Limit response size (prevent memory exhaustion)
5. **HTTPS Required**: Fetch over TLS only
6. **Certificate Validation**: Verify SSL certificates
**Implementation**:
```python
import logging
from typing import Optional

from bs4 import BeautifulSoup
import requests

logger = logging.getLogger(__name__)

MAX_SIZE = 5 * 1024 * 1024  # 5MB

def discover_email_from_site(domain: str) -> Optional[str]:
    """
    Safely discover email from rel="me" link.
    """
    try:
        # The redirect limit is a Session setting; requests.get() has no max_redirects argument
        with requests.Session() as session:
            session.max_redirects = 5
            # Fetch with safety limits
            response = session.get(
                f"https://{domain}",
                timeout=10,
                allow_redirects=True,
                stream=True,  # Don't load entire response into memory
            )
            response.raise_for_status()

            # Limit response size (prevent memory exhaustion); bytes past MAX_SIZE are ignored
            content = response.raw.read(MAX_SIZE, decode_content=True)

        # Parse HTML (BeautifulSoup handles malformed HTML safely)
        soup = BeautifulSoup(content, 'html.parser')

        # Find rel="me" links (no script execution)
        me_links = soup.find_all('link', rel='me') + soup.find_all('a', rel='me')

        # Extract mailto: links only
        for link in me_links:
            href = link.get('href', '')
            if href.startswith('mailto:'):
                email = href[len('mailto:'):].strip()
                # Validate email format before returning
                if validate_email_format(email):
                    return email

        return None
    except requests.exceptions.SSLError as e:
        logger.error(f"SSL certificate validation failed for {domain}: {e}")
        return None
    except Exception as e:
        logger.error(f"Failed to discover email for {domain}: {e}")
        return None
```
**Residual Risk**: Very low. BeautifulSoup tolerates malformed markup and never executes scripts, and the timeout and size limits bound resource use.
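
The "link extraction only" idea can also be illustrated with the standard library, which makes the parsing rules easy to test in isolation. The sketch below is not the documented BeautifulSoup implementation: `RelMeMailtoParser` and `first_rel_me_email` are hypothetical names, it returns matches in document order across both tag types, and it additionally strips `?subject=...` style parameters and percent-encoding from the mailto: URL.

```python
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import unquote, urlsplit


class RelMeMailtoParser(HTMLParser):
    """Collect addresses from <a>/<link> tags with rel="me" and a mailto: href."""

    def __init__(self) -> None:
        super().__init__()
        self.emails: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr_map = dict(attrs)
        # rel is a space-separated token list, e.g. rel="me nofollow"
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "me" not in rel_tokens:
            return
        href = (attr_map.get("href") or "").strip()
        try:
            parts = urlsplit(href)
        except ValueError:
            return  # malformed URL, ignore this link
        if parts.scheme == "mailto" and parts.path:
            # parts.path excludes any ?query; unquote handles %40 and similar
            self.emails.append(unquote(parts.path))


def first_rel_me_email(html: str) -> Optional[str]:
    parser = RelMeMailtoParser()
    parser.feed(html)
    return parser.emails[0] if parser.emails else None
```

The returned string is still untrusted and would need to pass format validation before any email is sent to it.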
### Email Validation
#### Threat: Email Injection Attacks
**Risk**: Attacker injects SMTP commands via email address field.
**Risk**: Attacker crafts malicious email address in rel="me" link.
**Mitigations**:
1. **Format Validation**: Conservative email regex (simplified subset of RFC 5322)
2. **Domain Matching**: Require email domain match `me` domain
2. **No Form Input**: Email is discovered from the site rather than typed into a form, but is still validated as untrusted data
3. **SMTP Library**: Use well-tested library (smtplib)
4. **Content Encoding**: Encode email content properly
5. **Rate Limiting**: Prevent abuse
@@ -447,31 +578,27 @@ class AuthorizeRequest(BaseModel):
**Validation**:
```python
import re

def validate_email_format(email: str) -> bool:
    """
    Validate email address format.
    """
    # Basic format check (RFC 5322 simplified)
    # fullmatch rejects a trailing newline, which re.match with '$' would accept
    email_regex = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
    if not re.fullmatch(email_regex, email):
        return False

    # Sanity checks
    if len(email) > 254:  # RFC 5321 maximum
        return False
    if email.count('@') != 1:
        return False

    return True
```
**Note**: Domain matching is NOT enforced in v1.0.0. A user may have an email address at a different domain than their identity site (e.g., phil@gmail.com for phil.example.com). This is acceptable because the user explicitly publishes the email on their site.
**Residual Risk**: Low, standard validation patterns.
## Network Security
@@ -567,21 +694,29 @@ async def add_security_headers(request: Request, call_next):
**Email Handling**:
```python
# Email discovered from rel="me" link (not user-provided)
# Stored ONLY during verification (in-memory, 15-min TTL)
verification_codes[code_id] = {
    "email": email,  # ← Discovered from site, exists ONLY here, NEVER in database
    "code": code,
    "domain": domain,
    "expires_at": datetime.utcnow() + timedelta(minutes=15)
}

# After verification: email is deleted, only domain + timestamp stored
db.execute('''
    INSERT INTO domains (domain, verification_method, verified_at, last_email_check)
    VALUES (?, 'two_factor', ?, ?)
''', (domain, datetime.utcnow(), datetime.utcnow()))
# Note: NO email address in database, only verification timestamp
```
**rel="me" Discovery**:
- Email addresses are public (user publishes on their site)
- Server fetches email from user's site (not user input)
- Reduces social engineering risk (can't claim arbitrary email)
- Follows IndieWeb standards for identity
### Database Security
**SQLite Security**:
@@ -829,13 +964,15 @@ security:
## Security Roadmap
### v1.0.0 (MVP)
- ✅ Email-based authentication
- ✅ Two-factor domain verification (DNS TXT + Email via rel="me")
- ✅ rel="me" email discovery (IndieWeb standard)
- ✅ HTML parsing security (BeautifulSoup)
- ✅ TLS/HTTPS enforcement
- ✅ Secure token generation (opaque, hashed)
- ✅ URL validation (open redirect prevention)
- ✅ Input validation (Pydantic)
- ✅ Security headers
- ✅ Minimal data collection
- ✅ Minimal data collection (no email storage)
### v1.1.0
- PKCE support (code challenge/verifier)