docs: add Phase 2 domain verification design and clarifications

Add comprehensive Phase 2 documentation:
- Complete design document for two-factor domain verification
- Implementation guide with code examples
- ADR for implementation decisions (ADR-0004)
- ADR for rel="me" email discovery (ADR-008)
- Phase 1 impact assessment
- All 23 clarification questions answered
- Updated architecture docs (indieauth-protocol, security)
- Updated ADR-005 with rel="me" approach
- Updated backlog with technical debt items

Design ready for Phase 2 implementation.

Generated with Claude Code https://claude.com/claude-code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 13:05:09 -07:00
parent bebd47955f
commit 6f06aebf40
10 changed files with 5605 additions and 410 deletions


@@ -0,0 +1,135 @@
# Phase 2 Clarifications - Architect's Responses
**Date**: 2025-11-20
**Status**: All 23 questions answered
**Developer Action**: Proceed with implementation
## Overview
The Architect has provided complete answers to all 8 categories (23 specific questions) raised by the Developer. This document provides a quick reference to the decisions made.
**Full Details**: See `/docs/designs/phase-2-implementation-guide.md` for complete implementation specifications.
**Architectural Decision Record**: See `/docs/decisions/0004-phase-2-implementation-decisions.md` for rationale and consequences.
## Quick Reference Answers
### 1. Rate Limiting Implementation
**Q: Should actual rate limiting be implemented or leave as stubs?**
- A: Implement actual rate limiting with in-memory storage
**Q: Should metadata storage use CodeStorage?**
- A: No, use simple dictionary in RateLimiter service instance
**Q: Should "Max 3 codes per domain per hour" be implemented?**
- A: Yes, with timestamp list tracking and automatic cleanup
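The per-domain limit above could be sketched roughly as follows. This is only an illustration: the method name `check_and_record` and the constructor parameters are assumptions, not the interface defined in the implementation guide.

```python
import time
from typing import Dict, List, Optional


class RateLimiter:
    """In-memory limiter: at most `max_codes` codes per domain per window."""

    def __init__(self, max_codes: int = 3, window_seconds: int = 3600):
        self.max_codes = max_codes
        self.window_seconds = window_seconds
        # domain -> epoch timestamps of recently issued codes
        self._issued: Dict[str, List[float]] = {}

    def check_and_record(self, domain: str, now: Optional[float] = None) -> bool:
        """Return True and record the request if the domain is under its limit."""
        now = time.time() if now is None else now
        cutoff = now - self.window_seconds
        # Automatic cleanup: drop timestamps that have left the window
        recent = [t for t in self._issued.get(domain, []) if t > cutoff]
        if len(recent) >= self.max_codes:
            self._issued[domain] = recent
            return False
        recent.append(now)
        self._issued[domain] = recent
        return True
```

Passing `now` explicitly keeps the limiter testable without patching the clock.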
### 2. Authorization Code Metadata Structure
**Q: Should storage include 'used' field in Phase 2?**
- A: Yes, include now (set to False, consume in Phase 3)
**Q: Use Phase 1's CodeStorage or separate storage?**
- A: Reuse Phase 1's CodeStorage with key prefix `authz:`
**Q: Store datetime objects or epoch integers?**
- A: Epoch integers (simpler)
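As a rough sketch of these three decisions together, the record below uses the `authz:` key prefix, epoch integers, and a `used` flag. The helper name `build_authz_entry` and the 600-second default TTL are assumptions made for illustration; the real field set is defined in the implementation guide.

```python
import time
from typing import Any, Dict, Optional, Tuple

AUTHZ_PREFIX = "authz:"  # key prefix from the decision above


def build_authz_entry(
    code: str, ttl_seconds: int = 600, now: Optional[int] = None
) -> Tuple[str, Dict[str, Any]]:
    """Return the (storage key, metadata) pair for an authorization code.

    The 600-second default TTL is an assumption, not a documented value.
    """
    created_at = int(time.time()) if now is None else now
    metadata = {
        "created_at": created_at,                # epoch integer, not datetime
        "expires_at": created_at + ttl_seconds,  # epoch integer
        "used": False,                           # set now, consumed in Phase 3
    }
    return f"{AUTHZ_PREFIX}{code}", metadata
```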
### 3. HTML Template Implementation
**Q: Use Jinja2 or plain Python f-strings?**
- A: Use Jinja2 templates
**Q: Where should template files be located?**
- A: `src/gondulf/templates/`
**Q: Reusable layout templates or self-contained?**
- A: Reusable `base.html` with template inheritance
**Q: Template files vs inline HTML?**
- A: Separate template files
### 4. Database Migration Timing
**Q: Apply migration 002 as part of Phase 2?**
- A: Yes, apply immediately before Phase 2 implementation
**Q: Is migration necessary since Phase 1 doesn't write to domains?**
- A: Yes, keeps schema current with code expectations
**Q: Should new code use 'two_factor' immediately?**
- A: Yes, assume column exists (migration handles it)
### 5. Client Validation Helper Functions
**Q: Implement as standalone functions or methods on helper class?**
- A: Standalone functions in `src/gondulf/utils/validation.py`
**Q: Create shared utility module?**
- A: Yes, `gondulf.utils.validation` module
**Q: Full subdomain validation now or stub for Phase 3?**
- A: Full validation now (security should be complete)
### 6. Error Response Format Consistency
**Q: Should verification endpoints return JSON (200 OK with success:false)?**
- A: Yes, always JSON with 200 OK
**Q: Should authorization endpoint errors return HTML or redirects?**
- A: Depends on validation stage:
  - Pre-client validation: HTML error page
  - Post-client validation: OAuth redirect with error
**Q: When to use HTML vs OAuth redirect errors?**
- A: See decision tree in implementation guide
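For the post-client-validation case, the OAuth error redirect could be built along these lines. This is a sketch only: `build_error_redirect` is a hypothetical helper name, and it assumes the `redirect_uri` has already been validated against the client.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_error_redirect(
    redirect_uri: str,
    error: str,
    state: Optional[str] = None,
    description: Optional[str] = None,
) -> str:
    """Append OAuth 2.0 error parameters to an already-validated redirect_uri."""
    parts = urlsplit(redirect_uri)
    # Preserve any query parameters the client registered on its redirect_uri
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("error", error))
    if description:
        query.append(("error_description", description))
    if state is not None:
        query.append(("state", state))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Before the client is validated, the `redirect_uri` cannot be trusted, which is why that stage renders an HTML error page instead of redirecting.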
### 7. Dependency Injection Pattern
**Q: Create dependencies.py module?**
- A: Yes, `src/gondulf/dependencies.py`
**Q: Services instantiated at startup (singleton) or per-request?**
- A: Singleton at startup using `@lru_cache()`
**Q: Configuration passed at instantiation or read each time?**
- A: Read at instantiation (services configured once)
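A minimal sketch of the `@lru_cache()` singleton pattern, assuming a stand-in `ExampleService` class and a hypothetical `GONDULF_EXAMPLE_TIMEOUT` variable (neither exists in the project):

```python
import os
from functools import lru_cache


class ExampleService:
    """Stand-in for a real service class (hypothetical)."""

    def __init__(self, timeout: int):
        self.timeout = timeout


@lru_cache()
def get_example_service() -> ExampleService:
    """Build the service on first call and return the same instance afterwards."""
    # Configuration is read once, when the instance is created
    timeout = int(os.environ.get("GONDULF_EXAMPLE_TIMEOUT", "10"))
    return ExampleService(timeout=timeout)
```

Note that `lru_cache` creates the instance lazily on the first call rather than at import time, so later changes to the environment do not affect a service that has already been built.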
### 8. Test Organization for Authorization Endpoint
**Q: Separate test files per router?**
- A: Yes:
  - `test_verification_endpoints.py`
  - `test_authorization_endpoint.py`
**Q: Test sub-endpoints separately or as part of full flow?**
- A: Test complete flows (black box testing)
**Q: Shared fixtures for common scenarios?**
- A: Yes, use `tests/conftest.py` for shared fixtures
## Implementation Priority
All decisions are final and ready for implementation. The Developer should:
1. **Read** `/docs/designs/phase-2-implementation-guide.md` thoroughly
2. **Review** code examples and patterns provided
3. **Apply** migration 002 before starting implementation
4. **Implement** following the exact patterns specified
5. **Ask** additional questions ONLY if new ambiguities arise
## Architect's Guiding Principles
Every decision made reflects these core values:
- **Simplicity**: Real implementations using simple patterns
- **Reuse**: Leverage Phase 1 infrastructure where possible
- **Standards**: Use established tools (Jinja2, FastAPI patterns)
- **Clarity**: Explicit structures over implicit behavior
- **Security**: Complete security features, not stubs
## Status
**DESIGN READY: Phase 2 Implementation - All clarifications resolved**
Developer: Please proceed with implementation following the patterns in the implementation guide.


@@ -162,26 +162,34 @@ Accept: text/html
 - Reject non-200 responses
 - Log client_id fetch failures
-#### Authentication Flow (v1.0.0: Email-based)
+#### Authentication Flow (v1.0.0: Two-Factor Domain Verification)
-1. **Domain Ownership Check**
-   - Check if `me` domain has verified TXT record: `_gondulf.example.com` = `verified`
-   - If found and cached, skip email verification
-   - If not found, proceed to email verification
+1. **DNS TXT Record Verification (Required)**
+   - Check if `me` domain has TXT record: `_gondulf.{domain}` = `verified`
+   - Query multiple DNS resolvers (Google 8.8.8.8, Cloudflare 1.1.1.1)
+   - Require consensus from at least 2 resolvers
+   - If not found: Display error with instructions to add TXT record
+   - If found: Proceed to email discovery
+   - Proves: User controls DNS for the domain
-2. **Email Verification**
-   - Display form requesting email address
-   - Validate email is at `me` domain (e.g., `admin@example.com` for `https://example.com`)
+2. **Email Discovery via rel="me" (Required)**
+   - Fetch user's domain homepage (e.g., https://example.com)
+   - Parse HTML for `<link rel="me" href="mailto:user@example.com">` or `<a rel="me" href="mailto:user@example.com">`
+   - Extract email address from first matching mailto: link
+   - If not found: Display error with instructions to add rel="me" link
+   - If found: Proceed to email verification
+   - Proves: User has published email relationship on their site
+   - Reference: https://indieweb.org/rel-me
+3. **Email Verification Code (Required)**
    - Generate 6-digit verification code (cryptographically random)
    - Store code in memory with 15-minute TTL
-   - Send code via SMTP
-   - Display code entry form
+   - Send code to discovered email address via SMTP
+   - Display code entry form showing discovered email (partially masked)
-3. **Code Verification**
    - User enters 6-digit code
-   - Validate code matches and hasn't expired
+   - Validate code matches and hasn't expired (max 3 attempts)
+   - Proves: User controls the email account
    - Mark domain as verified (store in database)
+   - Proceed to authorization
 4. **User Consent**
    - Display authorization prompt:
@@ -208,6 +216,8 @@ Accept: text/html
 Location: {redirect_uri}?code={code}&state={state}
 ```
+**Security Model**: Two-factor verification requires BOTH DNS control AND email control. An attacker would need to compromise both to authenticate fraudulently.
 #### Error Responses
 Return error via redirect when possible:
@@ -404,18 +414,19 @@ Future implementation per RFC 7009.
 ```python
 {
-    "email": "admin@example.com",
+    "email": "admin@example.com",  # Discovered from rel="me", not user-provided
     "code": "123456",  # 6-digit string
     "domain": "example.com",
     "created_at": datetime,
     "expires_at": datetime,  # created_at + 15 minutes
-    "attempts": 0  # Rate limiting
+    "attempts": 0  # Rate limiting (max 3 attempts)
 }
 ```
 **Storage**: Python dict with TTL management
+**Email Source**: Discovered from site's rel="me" link (not user input)
 **Expiration**: 15 minutes
-**Rate Limiting**: Max 3 attempts per email
+**Rate Limiting**: Max 3 attempts per email, max 3 codes per domain per hour
 **Cleanup**: Automatic expiration via TTL
 ### Access Token (SQLite)
@@ -448,18 +459,21 @@ CREATE TABLE tokens (
 CREATE TABLE domains (
     id INTEGER PRIMARY KEY AUTOINCREMENT,
     domain TEXT NOT NULL UNIQUE,
-    verification_method TEXT NOT NULL, -- 'txt_record' or 'email'
+    verification_method TEXT NOT NULL, -- 'two_factor' (DNS + Email)
     verified_at TIMESTAMP NOT NULL,
-    last_checked TIMESTAMP,
-    txt_record_valid BOOLEAN DEFAULT 0,
+    last_dns_check TIMESTAMP,
+    dns_txt_valid BOOLEAN DEFAULT 0,
+    last_email_check TIMESTAMP,
     INDEX idx_domain (domain)
 );
 ```
 **Purpose**: Cache domain ownership verification
-**TXT Record**: Re-verified periodically (daily)
-**Email Verification**: Permanent unless admin deletes
+**Verification Method**: Always 'two_factor' in v1.0.0 (DNS TXT + Email via rel="me")
+**DNS TXT**: Re-verified periodically (daily check)
+**Email**: NOT stored (only verification timestamp recorded)
+**Re-verification**: DNS checked periodically, email re-verified on each login
 **Cleanup**: Optional (admin decision)
 ## Security Considerations


@@ -0,0 +1,809 @@
# Phase 1 Impact Assessment: Authentication Flow Change
**Date**: 2025-11-20
**Architect**: Claude (Architect Agent)
**Related ADRs**: ADR-005 (updated), ADR-008 (new)
**Related Report**: /docs/reports/2025-11-20-phase-1-foundation.md
## Summary
The authentication design has been updated to require BOTH DNS TXT verification AND email verification via rel="me" discovery. This change impacts Phase 1 implementation and defines new requirements for Phase 2.
## Authentication Flow Change
### Original Design (ADR-005 v1)
- **Primary**: Email verification (user provides email)
- **Optional**: DNS TXT verification (fast-path to skip email)
- **Flow**: DNS check → if not found, request email → send code → verify code
### Updated Design (ADR-005 v2 + ADR-008)
- **Required Factor 1**: DNS TXT verification (`_gondulf.{domain}` = `verified`)
- **Required Factor 2**: Email verification via rel="me" discovery
- **Flow**: DNS check → rel="me" discovery → send code to discovered email → verify code
### Key Differences
| Aspect | Original | Updated |
|--------|----------|---------|
| DNS TXT | Optional (fast-path) | Required (first factor) |
| Email Discovery | User input | rel="me" link parsing |
| Email Verification | Optional (fallback) | Required (second factor) |
| Security Model | Single-factor | Two-factor |
| Attack Resistance | Moderate | High (requires DNS + email control) |
| Setup Complexity | Lower (email alone suffices) | Higher (both required) |
## Phase 1 Implementation Impact
### What Phase 1 Implemented
Phase 1 successfully implemented:
- ✅ Configuration management (GONDULF_* environment variables)
- ✅ Database layer with migrations (SQLite, SQLAlchemy Core)
- ✅ In-memory code storage (TTL-based expiration)
- ✅ Email service (SMTP with STARTTLS support)
- ✅ DNS service (TXT record querying with fallback resolvers)
- ✅ Structured logging
- ✅ FastAPI application with health check endpoint
- ✅ 94.16% test coverage (96 tests passing)
### Does Phase 1 Need Changes?
**Answer: NO. Phase 1 implementation remains valid.**
#### Analysis
**Email Service** (`src/gondulf/email.py`):
- Current: Generic email sending service
- Change Impact: **None**
- Reason: Email service sends codes to any email address. Whether email is user-provided or rel="me"-discovered doesn't affect this service.
- Status: **No changes needed**
**DNS Service** (`src/gondulf/dns.py`):
- Current: TXT record verification with fallback resolvers
- Change Impact: **None**
- Reason: DNS service already implements TXT record verification as designed. Changing from "optional" to "required" is a business logic change, not a DNS service change.
- Status: **No changes needed**
**In-Memory Storage** (`src/gondulf/storage.py`):
- Current: TTL-based code storage
- Change Impact: **None**
- Reason: Storage mechanism is independent of how email is discovered or whether DNS is optional/required.
- Status: **No changes needed**
**Database Schema** (`001_initial_schema.sql`):
- Current: `domains` table with `domain`, `verification_method`, `verified_at`
- Change Impact: **Minor update needed in Phase 2**
- Reason: Schema already supports storing verification method. Will need to update from `'txt_record'` or `'email'` to `'two_factor'` when storing records.
- Status: **Schema structure OK, values will change in Phase 2**
**Configuration** (`src/gondulf/config.py`):
- Current: SMTP configuration, DNS configuration, timeouts
- Change Impact: **None immediately, optional addition in Phase 2**
- Reason: Current configuration supports both email and DNS. May want to add timeout for HTML fetching in Phase 2.
- Status: **No changes needed now**
### Phase 1 Status: APPROVED
Phase 1 implementation remains valid and does NOT require any revisions due to the authentication flow change. All Phase 1 components are foundational services that work regardless of how they're orchestrated in the authentication flow.
## Phase 2 Requirements: New Implementation Needs
Phase 2 must now implement the updated authentication flow. Here's what needs to be built:
### 1. HTML Fetching Service (NEW)
**Purpose**: Fetch user's homepage to discover rel="me" links
**Implementation**:
```python
# src/gondulf/html_fetcher.py
import logging
from typing import Optional

import requests

logger = logging.getLogger(__name__)


class HTMLFetcherService:
    """
    Fetch user's homepage over HTTPS.
    """

    def __init__(self, timeout: int = 10):
        self.timeout = timeout
        self.max_redirects = 5
        self.max_size = 5 * 1024 * 1024  # 5MB

    def fetch_site(self, domain: str) -> Optional[str]:
        """
        Fetch site HTML content.

        Args:
            domain: Domain to fetch (e.g., "example.com")

        Returns:
            HTML content as string, or None if fetch fails
        """
        url = f"https://{domain}"
        try:
            # requests.get() has no max_redirects argument; the redirect
            # limit is configured on a Session instead.
            with requests.Session() as session:
                session.max_redirects = self.max_redirects
                response = session.get(
                    url,
                    timeout=self.timeout,
                    allow_redirects=True,
                    verify=True,  # Enforce SSL verification
                )
            response.raise_for_status()

            # Check content size
            if len(response.content) > self.max_size:
                raise ValueError(f"Response too large: {len(response.content)} bytes")

            return response.text
        except requests.exceptions.SSLError as e:
            logger.error(f"SSL verification failed for {domain}: {e}")
            return None
        except requests.exceptions.Timeout:
            logger.error(f"Timeout fetching {domain}")
            return None
        except Exception as e:
            # Includes HTTP errors, too many redirects, and oversized responses
            logger.error(f"Failed to fetch {domain}: {e}")
            return None
```
**Dependencies**:
- `requests` library (already in pyproject.toml)
- Timeout configuration (add to Config if needed)
**Tests Required**:
- Successful HTTPS fetch
- SSL verification failure
- Timeout handling
- HTTP error codes (404, 500, etc.)
- Redirect following
- Size limit enforcement
---
### 2. rel="me" Email Discovery Service (NEW)
**Purpose**: Parse HTML to discover email from rel="me" links
**Implementation**:
```python
# src/gondulf/relme.py
import logging
import re
from typing import Optional

from bs4 import BeautifulSoup

logger = logging.getLogger(__name__)


class RelMeDiscoveryService:
    """
    Discover email addresses from rel="me" links in HTML.
    """

    def discover_email(self, html_content: str) -> Optional[str]:
        """
        Parse HTML and discover email from rel="me" link.

        Args:
            html_content: HTML content as string

        Returns:
            Email address or None if not found
        """
        try:
            # Parse HTML (BeautifulSoup handles malformed HTML)
            soup = BeautifulSoup(html_content, 'html.parser')

            # Find all rel="me" links (<link> and <a> tags)
            me_links = soup.find_all('link', rel='me') + soup.find_all('a', rel='me')

            # Look for mailto: links
            for link in me_links:
                href = link.get('href', '')
                if href.startswith('mailto:'):
                    # Strip only the leading scheme, not later occurrences
                    email = href[len('mailto:'):].strip()

                    # Validate email format
                    if self._validate_email_format(email):
                        logger.info(f"Discovered email via rel='me': {email[:3]}***")
                        return email

            logger.warning("No rel='me' mailto: link found in HTML")
            return None
        except Exception as e:
            logger.error(f"Failed to parse HTML: {e}")
            return None

    def _validate_email_format(self, email: str) -> bool:
        """Validate email address format (RFC 5322 simplified)."""
        email_regex = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
        if not re.match(email_regex, email):
            return False
        if len(email) > 254:  # RFC 5321 maximum
            return False
        if email.count('@') != 1:
            return False
        return True
```
**Dependencies**:
- `beautifulsoup4` library (add to pyproject.toml)
- `html.parser` (Python standard library)
**Tests Required**:
- Discovery from `<link rel="me">` tags
- Discovery from `<a rel="me">` tags
- Multiple rel="me" links (select first mailto)
- Malformed HTML handling
- Missing rel="me" links
- Invalid email format in link
- Edge cases (empty href, non-mailto links, etc.)
---
### 3. Domain Verification Service (UPDATED)
**Purpose**: Orchestrate two-factor verification (DNS + Email)
**Implementation**:
```python
# src/gondulf/domain_verification.py
import secrets
from typing import Tuple, Optional

from .dns import DNSService
from .html_fetcher import HTMLFetcherService
from .relme import RelMeDiscoveryService
from .email import EmailService
from .storage import CodeStorage


class DomainVerificationService:
    """
    Two-factor domain verification service.

    Verifies domain ownership through:
    1. DNS TXT record verification
    2. Email verification via rel="me" discovery
    """

    def __init__(
        self,
        dns_service: DNSService,
        html_fetcher: HTMLFetcherService,
        relme_discovery: RelMeDiscoveryService,
        email_service: EmailService,
        code_storage: CodeStorage
    ):
        self.dns = dns_service
        self.html_fetcher = html_fetcher
        self.relme = relme_discovery
        self.email = email_service
        self.code_storage = code_storage

    def start_verification(self, domain: str) -> Tuple[bool, Optional[str], Optional[str]]:
        """
        Start domain verification process.

        Returns: (success, discovered_email, error_message)
        Failures are reported through the returned tuple, not by raising.
        """
        # Step 1: Verify DNS TXT record
        dns_verified = self.dns.verify_txt_record(domain, "verified")
        if not dns_verified:
            error = f"DNS TXT record not found for {domain}. Please add: _gondulf.{domain} TXT verified"
            return False, None, error

        # Step 2: Fetch site and discover email
        html = self.html_fetcher.fetch_site(domain)
        if html is None:
            error = f"Could not fetch site at https://{domain}. Please ensure site is accessible via HTTPS."
            return False, None, error

        # Step 3: Discover email from rel="me"
        email = self.relme.discover_email(html)
        if email is None:
            error = 'No rel="me" mailto: link found. Please add: <link rel="me" href="mailto:you@example.com">'
            return False, None, error

        # Step 4: Generate and send verification code
        code = self._generate_code()
        # Store the domain with the code so verify_code() can return it.
        # Assumes CodeStorage accepts a (code, domain) tuple as the value.
        self.code_storage.store(email, (code, domain), ttl=900)  # 15 minutes

        email_sent = self.email.send_verification_email(email, code)
        if not email_sent:
            error = f"Failed to send verification email to {email}. Please try again."
            return False, email, error

        # Success: code sent to discovered email
        return True, email, None

    def verify_code(self, email: str, submitted_code: str) -> Tuple[bool, str]:
        """
        Verify submitted code.

        Returns: (success, domain_or_error_message)
        """
        stored_data = self.code_storage.get(email)
        if stored_data is None:
            return False, "No verification code found. Please restart verification."

        code, domain = stored_data

        # Verify code (constant-time comparison). Compare bytes, because
        # compare_digest() raises TypeError on non-ASCII str input.
        if not secrets.compare_digest(submitted_code.encode(), code.encode()):
            return False, "Invalid code. Please try again."

        # Success: mark code as used
        self.code_storage.delete(email)
        return True, domain

    def _generate_code(self) -> str:
        """Generate 6-digit verification code."""
        return ''.join(secrets.choice('0123456789') for _ in range(6))
```
**Dependencies**:
- All Phase 1 services (DNS, Email, Storage)
- New HTML fetcher service
- New rel="me" discovery service
**Tests Required**:
- Full verification flow (DNS → rel="me" → email → code)
- DNS verification failure
- Site fetch failure
- rel="me" discovery failure
- Email send failure
- Code verification success/failure
- Multiple attempts tracking
- Code expiration
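The "multiple attempts tracking" item could be covered by a small helper along these lines. This is a sketch under assumptions: `AttemptTracker` and its `check` method are hypothetical names, and the limit of 3 comes from the "max 3 attempts" rule in the protocol document.

```python
import secrets
from typing import Dict, Tuple

MAX_ATTEMPTS = 3  # "max 3 attempts" per the protocol document


class AttemptTracker:
    """Count failed code submissions per email and lock out at the limit."""

    def __init__(self, max_attempts: int = MAX_ATTEMPTS):
        self.max_attempts = max_attempts
        self._failures: Dict[str, int] = {}

    def check(self, email: str, submitted: str, expected: str) -> Tuple[bool, str]:
        """Return (success, error_message) for one submission."""
        if self._failures.get(email, 0) >= self.max_attempts:
            return False, "Too many attempts. Please restart verification."
        # Compare bytes: compare_digest() rejects non-ASCII str input
        if secrets.compare_digest(submitted.encode(), expected.encode()):
            self._failures.pop(email, None)
            return True, ""
        self._failures[email] = self._failures.get(email, 0) + 1
        return False, "Invalid code. Please try again."
```

Once the limit is reached, even a correct code is rejected, so the user has to restart verification and receive a fresh code.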
---
### 4. Domain Verification UI Endpoints (NEW)
**Purpose**: HTTP endpoints for user interaction
**Implementation**:
```python
# src/gondulf/routers/verification.py
from typing import Optional

from fastapi import APIRouter, Depends
from pydantic import BaseModel

from ..domain_verification import DomainVerificationService

router = APIRouter(prefix="/verify", tags=["verification"])


def get_domain_verification_service() -> DomainVerificationService:
    """
    Provide the shared DomainVerificationService.

    Placeholder (assumption): the real instance is created at startup
    (see main.py) and supplied here, e.g. via app.dependency_overrides.
    """
    raise NotImplementedError("DomainVerificationService is not wired up yet")


class VerificationStartRequest(BaseModel):
    domain: str


class VerificationStartResponse(BaseModel):
    success: bool
    email_masked: Optional[str] = None  # e.g., "u***@example.com"
    error: Optional[str] = None


class VerificationCodeRequest(BaseModel):
    email: str
    code: str


class VerificationCodeResponse(BaseModel):
    success: bool
    domain: Optional[str] = None
    error: Optional[str] = None


@router.post("/start", response_model=VerificationStartResponse)
async def start_verification(
    request: VerificationStartRequest,
    service: DomainVerificationService = Depends(get_domain_verification_service),
):
    """
    Start domain verification process.

    Steps:
    1. Verify DNS TXT record
    2. Discover email from rel="me"
    3. Send verification code to email
    """
    success, email, error = service.start_verification(request.domain)
    if not success:
        return VerificationStartResponse(success=False, email_masked=None, error=error)

    # Mask email for display: u***@example.com
    masked_email = f"{email[0]}***@{email.split('@')[1]}"
    return VerificationStartResponse(
        success=True,
        email_masked=masked_email,
        error=None
    )


@router.post("/code", response_model=VerificationCodeResponse)
async def verify_code(
    request: VerificationCodeRequest,
    service: DomainVerificationService = Depends(get_domain_verification_service),
):
    """
    Verify submitted code.

    Returns domain if code is valid.
    """
    success, result = service.verify_code(request.email, request.code)
    if not success:
        return VerificationCodeResponse(success=False, domain=None, error=result)
    return VerificationCodeResponse(success=True, domain=result, error=None)
```
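The inline masking expression could also be pulled into a small helper so it can be tested on its own. A minimal sketch, assuming a hypothetical function name `mask_email`:

```python
def mask_email(email: str) -> str:
    """Mask the local part of an email for display, e.g. "u***@example.com"."""
    local, sep, domain = email.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("not a valid email address")
    # Keep only the first character of the local part
    return f"{local[0]}***@{domain}"
```

Raising on malformed input avoids an `IndexError` if an unvalidated address ever reaches the display layer.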
**Dependencies**:
- FastAPI router
- Pydantic models
- Domain verification service
**Tests Required**:
- POST /verify/start success case
- POST /verify/start with DNS failure
- POST /verify/start with rel="me" failure
- POST /verify/start with email send failure
- POST /verify/code success case
- POST /verify/code with invalid code
- POST /verify/code with expired code
- POST /verify/code with missing code
---
### 5. Authorization Endpoint Integration (UPDATED)
**Changes to Authorization Flow**:
**Before** (original design):
```
1. User enters domain (me parameter)
2. Display form: "Enter your email at {domain}"
3. User enters email manually
4. Send code, user enters code
5. Display consent screen
```
**After** (updated design):
```
1. User enters domain (me parameter)
2. Server performs two-factor verification:
   a. Verify DNS TXT record
   b. Discover email from rel="me"
   c. Send code to discovered email
3. Display code entry form (show discovered email masked)
4. User enters code
5. Display consent screen
```
**Implementation Changes**:
- Call `DomainVerificationService.start_verification()` instead of requesting email from user
- Update UI to show "Sending code to u***@example.com" instead of email input form
- Handle new error cases (DNS not found, rel="me" not found, site unreachable)
---
## Phase 2 Feature Breakdown
### New Dependencies to Add
**pyproject.toml additions**:
```toml
[project]
dependencies = [
# ... existing dependencies
"beautifulsoup4>=4.12.0", # HTML parsing for rel="me" discovery
]
```
### New Source Files
1. `src/gondulf/html_fetcher.py` - HTML fetching service
2. `src/gondulf/relme.py` - rel="me" email discovery service
3. `src/gondulf/domain_verification.py` - Two-factor verification orchestration
4. `src/gondulf/routers/verification.py` - Verification endpoints (if implemented separately from authorization)
### Updated Files
1. `src/gondulf/main.py` - Register new routers, initialize new services
2. `src/gondulf/config.py` - Optional: add HTML fetch timeout config
3. Database migration (002_update_verification_method.sql) - Change `domains.verification_method` values
### New Test Files
1. `tests/unit/test_html_fetcher.py` - HTML fetching tests
2. `tests/unit/test_relme.py` - rel="me" discovery tests
3. `tests/unit/test_domain_verification.py` - Verification service tests
4. `tests/integration/test_verification_endpoints.py` - Verification endpoint tests
### Estimated Effort
**New Components**:
- HTML Fetcher Service: 0.5 days
- rel="me" Discovery Service: 0.5 days
- Domain Verification Service: 1 day
- Verification Endpoints: 0.5 days
- Tests (all new components): 1 day
**Total New Work**: ~3.5 days
**Authorization Endpoint** (already planned):
- Original estimate: 3-5 days
- Updated estimate: 3-5 days (same - just uses DomainVerificationService)
## Database Schema Updates
### Migration: 002_update_verification_method.sql
```sql
-- Update verification_method values from single-factor to two-factor
-- This is a data migration, not schema change
UPDATE domains
SET verification_method = 'two_factor'
WHERE verification_method IN ('txt_record', 'email');
-- No schema changes needed - 'verification_method' column already exists
```
**When to Apply**: Phase 2, before authorization endpoint implementation
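To illustrate that the migration is a pure data update and safe to re-run, here is a sketch against an in-memory SQLite database. The function name `apply_migration_002` is hypothetical, and the project's actual migration runner may work differently.

```python
import sqlite3

MIGRATION_002 = """
UPDATE domains
SET verification_method = 'two_factor'
WHERE verification_method IN ('txt_record', 'email');
"""


def apply_migration_002(conn: sqlite3.Connection) -> int:
    """Run the data migration and return the number of rows updated."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(MIGRATION_002)
    return cursor.rowcount
```

Running it a second time updates zero rows, because no legacy values remain.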
## Error Message Updates
### DNS TXT Not Found
```
DNS Verification Failed
Please add this TXT record to your domain's DNS:
Type: TXT
Name: _gondulf.example.com
Value: verified
DNS changes may take up to 24 hours to propagate.
Need help? See: https://docs.gondulf.example.com/setup/dns
```
### rel="me" Not Found
```
Email Discovery Failed
Could not find a rel="me" email link on your homepage.
Please add this to your homepage (https://example.com):
<link rel="me" href="mailto:your-email@example.com">
This declares your email address for IndieAuth verification.
Learn more: https://indieweb.org/rel-me
```
### Site Unreachable
```
Site Fetch Failed
Could not fetch your site at https://example.com
Please check:
- Site is accessible via HTTPS
- SSL certificate is valid
- No firewall blocking requests
Try again once your site is accessible.
```
### Email Send Failure
```
Email Delivery Failed
Failed to send verification code to u***@example.com
Please check:
- Email address is correct in your rel="me" link
- Email server is accepting mail
- Check spam/junk folder
Try again, or contact support if the issue persists.
```
## Documentation Updates Needed
### User Documentation (Phase 2)
1. **Setup Guide**: `/docs/user/setup.md`
- Step 1: Add DNS TXT record
- Step 2: Add rel="me" link to homepage
- Step 3: Test verification
2. **Troubleshooting**: `/docs/user/troubleshooting.md`
- DNS verification failures
- rel="me" discovery issues
- Email delivery problems
3. **Examples**: `/docs/user/examples.md`
- Example HTML with rel="me" link
- Example DNS configuration (various providers)
### Developer Documentation (Phase 2)
1. **API Reference**: `/docs/api/verification.md`
- POST /verify/start endpoint
- POST /verify/code endpoint
- Error codes and responses
2. **Architecture**: `/docs/architecture/domain-verification.md`
- Two-factor verification flow diagram
- Service interaction diagram
- Error handling flowchart
## Security Considerations for Phase 2
### New Attack Surfaces
1. **HTML Parsing**:
- Risk: Malicious HTML exploiting parser
- Mitigation: BeautifulSoup handles untrusted HTML safely
- Test: Fuzzing with malformed HTML
2. **HTTPS Fetching**:
- Risk: SSL verification bypass
- Mitigation: Enforce `verify=True` in requests
- Test: Attempt to fetch site with invalid certificate (must fail)
3. **rel="me" Spoofing**:
- Risk: Attacker adds rel="me" to compromised site
- Mitigation: Two-factor requirement (also need DNS control)
- Test: Verify DNS check happens BEFORE rel="me" discovery
### Security Testing Required
1. **Input Validation**:
- Malformed domain names
- Oversized HTML responses (>5MB)
- Invalid email formats in rel="me" links
2. **TLS Enforcement**:
- Verify HTTPS-only fetching
- Verify SSL certificate validation
- Reject sites with invalid certificates
3. **Rate Limiting** (future):
- Prevent bulk rel="me" discovery
- Limit verification attempts per domain
## Configuration Updates
### Optional New Config
```python
# src/gondulf/config.py
class Config:
    # ... existing config

    # HTML Fetching (optional, has sensible defaults)
    HTML_FETCH_TIMEOUT: int = 10  # seconds
    HTML_MAX_SIZE: int = 5 * 1024 * 1024  # 5MB
    HTML_MAX_REDIRECTS: int = 5
```
### Environment Variables
```bash
# .env.example additions (optional)
# HTML Fetching Configuration (optional - has defaults)
GONDULF_HTML_FETCH_TIMEOUT=10 # Timeout for fetching user's site (seconds)
GONDULF_HTML_MAX_SIZE=5242880 # Maximum HTML size (bytes, default 5MB)
GONDULF_HTML_MAX_REDIRECTS=5 # Maximum redirects to follow
```
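One way these optional variables might be read with their defaults is sketched below. `HTMLFetchConfig` and `load_html_fetch_config` are hypothetical names; the existing `Config` class may load its values differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class HTMLFetchConfig:
    timeout: int = 10                # seconds
    max_size: int = 5 * 1024 * 1024  # bytes (5MB)
    max_redirects: int = 5


def load_html_fetch_config(env: Mapping[str, str] = os.environ) -> HTMLFetchConfig:
    """Read the optional GONDULF_HTML_* variables, falling back to defaults."""
    defaults = HTMLFetchConfig()
    return HTMLFetchConfig(
        timeout=int(env.get("GONDULF_HTML_FETCH_TIMEOUT", defaults.timeout)),
        max_size=int(env.get("GONDULF_HTML_MAX_SIZE", defaults.max_size)),
        max_redirects=int(env.get("GONDULF_HTML_MAX_REDIRECTS", defaults.max_redirects)),
    )
```

Taking the environment as a parameter lets tests pass a plain dict instead of mutating `os.environ`.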
## Testing Strategy for Phase 2
### Unit Tests
**HTML Fetcher**:
- Mock successful HTTPS response
- Mock SSL verification failure
- Mock timeout
- Mock HTTP errors (404, 500, etc.)
- Mock size limit exceeded
- Mock redirect following
**rel="me" Discovery**:
- Parse `<link rel="me" href="mailto:...">`
- Parse `<a rel="me" href="mailto:...">`
- Handle malformed HTML
- Handle missing rel="me" links
- Handle invalid email in link
- Handle multiple rel="me" links (select first)
**Domain Verification Service**:
- Full two-factor flow success
- DNS verification failure
- Site fetch failure
- rel="me" discovery failure
- Email send failure
- Code verification success/failure
### Integration Tests
**Verification Endpoints**:
- POST /verify/start with valid domain (mock services)
- POST /verify/start with DNS failure
- POST /verify/start with rel="me" failure
- POST /verify/code with valid code
- POST /verify/code with invalid code
### End-to-End Tests (Future)
- Complete verification flow with real HTML
- Authorization flow integration
- Token issuance after successful verification
## Acceptance Criteria for Phase 2
Phase 2 will be considered complete when:
1. ✅ HTML fetcher service implemented and tested
2. ✅ rel="me" discovery service implemented and tested
3. ✅ Domain verification service orchestrates two-factor verification
4. ✅ Verification endpoints return correct responses for all cases
5. ✅ Error messages are clear and actionable
6. ✅ All new tests passing (unit + integration)
7. ✅ Test coverage remains >80% overall
8. ✅ Security testing complete (HTML parsing, TLS enforcement)
9. ✅ Documentation updated (user setup guide, API reference)
10. ✅ Database migration applied successfully
## Timeline Estimate
**Phase 2 Components**:
- HTML Fetcher: 0.5 days
- rel="me" Discovery: 0.5 days
- Domain Verification Service: 1 day
- Verification Endpoints: 0.5 days
- Testing: 1 day
- Documentation: 0.5 days
**Total New Work**: ~4 days
**Authorization Endpoint** (already planned):
- Original estimate: 3-5 days
- Updated estimate: 3-5 days (uses DomainVerificationService)
**Phase 2 Total**: ~7-9 days (vs. original estimate of 3-5 days)
**Impact**: +4 days of work due to authentication flow change
## Recommendation
**Phase 1**: APPROVED as-is. No changes needed.
**Phase 2**: Proceed with implementation of:
1. HTML fetching service
2. rel="me" discovery service
3. Domain verification service (two-factor orchestration)
4. Verification endpoints
5. Updated authorization endpoint to use domain verification service
The additional work (HTML fetching + rel="me" discovery) adds ~4 days to Phase 2, bringing the total Phase 2 estimate to 7-9 days instead of the original 3-5 days.
## Sign-off
**Assessment Status**: Complete
**Phase 1 Impact**: None - Phase 1 approved as-is
**Phase 2 Impact**: Additional 4 days of work for new services
**Risk Level**: Low - All new work is well-scoped and testable
**Ready to Proceed**: Yes
---
**Assessment completed**: 2025-11-20
**Architect**: Claude (Architect Agent)


@@ -58,108 +58,174 @@ Gondulf follows a defense-in-depth security model with these core principles:
## Authentication Security ## Authentication Security
### Email-Based Verification (v1.0.0) ### Two-Factor Domain Verification (v1.0.0)
**Mechanism**: Users prove domain ownership by receiving verification code at email address on that domain. **Mechanism**: Users prove domain ownership through TWO independent factors:
1. **DNS TXT Record**: Proves DNS control (`_gondulf.{domain}` = `verified`)
2. **Email via rel="me"**: Proves email control (discovered from site's rel="me" link)
**Security Model**: An attacker must compromise BOTH factors to authenticate fraudulently. This is significantly stronger than single-factor verification.
#### Threat: Email Interception #### Threat: Email Interception
**Risk**: Attacker intercepts email containing verification code. **Risk**: Attacker intercepts email containing verification code.
**Mitigations**: **Mitigations**:
1. **Short Code Lifetime**: 15-minute expiration 1. **Two-Factor Requirement**: Email alone is insufficient (DNS also required)
2. **Single Use**: Code invalidated after verification 2. **Short Code Lifetime**: 15-minute expiration
3. **Rate Limiting**: Max 3 code requests per email per hour 3. **Single Use**: Code invalidated after verification
4. **TLS Email Delivery**: Require STARTTLS for SMTP 4. **Rate Limiting**: Max 3 code requests per domain per hour
5. **Display Warning**: "Only request code if you initiated this login" 5. **TLS Email Delivery**: Require STARTTLS for SMTP
6. **Display Warning**: "Only request code if you initiated this login"
**Residual Risk**: Acceptable for v1.0.0 given short lifetime and single-use. **Residual Risk**: Low. Even with email interception, attacker still needs DNS control.
#### Threat: Code Brute Force #### Threat: Code Brute Force
**Risk**: Attacker guesses 6-digit verification code. **Risk**: Attacker guesses 6-digit verification code.
**Mitigations**: **Mitigations**:
1. **Sufficient Entropy**: 1,000,000 possible codes (6 digits) 1. **Two-Factor Requirement**: Code alone is insufficient (DNS also required)
2. **Attempt Limiting**: Max 3 attempts per email 2. **Sufficient Entropy**: 1,000,000 possible codes (6 digits)
3. **Short Lifetime**: 15-minute window 3. **Attempt Limiting**: Max 3 attempts per email
4. **Rate Limiting**: Max 10 attempts per IP per hour 4. **Short Lifetime**: 15-minute window
5. **Exponential Backoff**: 5-second delay after each failed attempt 5. **Rate Limiting**: Max 3 codes per domain per hour
6. **Single-Use**: Code invalidated after use
**Math**: **Math**:
- 3 attempts ÷ 1,000,000 codes = 0.0003% success probability - 3 attempts ÷ 1,000,000 codes = 0.0003% success probability
- 15-minute window limits attack time - 15-minute window limits attack time
- Rate limiting prevents distributed guessing - Even if guessed, attacker still needs DNS control
**Residual Risk**: Very low, acceptable for v1.0.0. **Residual Risk**: Very low. Two-factor requirement makes brute force insufficient.
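The success probability quoted in the math above can be checked directly. This is a small worked sketch that assumes the code is drawn uniformly at random and that the attacker's three guesses are distinct; the constant names are illustrative only.

```python
from fractions import Fraction

CODE_SPACE = 10 ** 6   # 6 decimal digits
MAX_ATTEMPTS = 3       # attempt limit stated in the mitigations

# With distinct guesses against a uniformly random code, the chance that
# one of them is correct is attempts / code space.
p_success = Fraction(MAX_ATTEMPTS, CODE_SPACE)

print(f"{float(p_success):.6%}")  # prints 0.000300%
```

The result only bounds the email-code factor. Under the two-factor model a correct guess is still not enough without the DNS TXT record.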
#### Threat: DNS TXT Record Spoofing
**Risk**: Attacker attempts to spoof DNS responses.
**Mitigations**:
1. **Multiple Resolvers**: Query 2+ independent DNS servers (Google, Cloudflare)
2. **Consensus Required**: Require agreement from at least 2 resolvers
3. **DNSSEC Support**: Validate DNSSEC signatures when available (future)
4. **Timeout Handling**: Fail securely if DNS unavailable
5. **Logging**: Log all DNS verification attempts
**Residual Risk**: Low. Spoofing multiple independent resolvers is difficult.
#### Threat: rel="me" Link Spoofing
**Risk**: Attacker compromises user's website to add malicious rel="me" link.
**Mitigations**:
1. **Two-Factor Requirement**: Website compromise alone insufficient (DNS also required)
2. **HTTPS Required**: Fetch site over TLS (prevents MITM)
3. **Certificate Validation**: Verify SSL certificate
4. **Email Domain Matching**: Email should match site domain (warning if not)
5. **User Education**: Inform users to secure their website
**Residual Risk**: Moderate. If attacker compromises both DNS and website, they can authenticate. This is acceptable as it represents full domain compromise.
#### Threat: Email Address Enumeration #### Threat: Email Address Enumeration
**Risk**: Attacker discovers which domains are registered by requesting codes. **Risk**: Attacker discovers email addresses by triggering rel="me" discovery.
**Mitigations**: **Mitigations**:
1. **Consistent Response**: Always say "If email exists, code sent" 1. **Public Information**: rel="me" links are intentionally public
2. **No Error Differentiation**: Same message for valid/invalid emails 2. **User Awareness**: Users know they're publishing email on their site
3. **Rate Limiting**: Prevent bulk enumeration 3. **Rate Limiting**: Prevent bulk scanning
4. **Robots.txt**: Users can restrict crawler access if desired
**Residual Risk**: Minimal, domain names are public anyway (DNS). **Residual Risk**: Minimal. Email addresses are intentionally published by users on their own sites.
### Domain Ownership Verification ### Domain Ownership Verification (Two-Factor)
#### TXT Record Validation (Preferred) **Mechanism**: v1.0.0 requires BOTH verification methods:
**Mechanism**: Admin adds DNS TXT record `_gondulf.example.com` = `verified`. #### 1. TXT Record Validation (Required)
**Mechanism**: Admin adds DNS TXT record `_gondulf.{domain}` = `verified`.
**Security Properties**: **Security Properties**:
- Requires DNS control (stronger than email) - Proves DNS control (first factor)
- Verifiable without user interaction - Verifiable without user interaction
- Cacheable for performance - Cacheable for performance
- Re-verifiable periodically - Re-verifiable periodically
**Threat: DNS Spoofing**
**Mitigations**:
1. **DNSSEC**: Validate DNSSEC signatures if available
2. **Multiple Resolvers**: Query 2+ DNS servers, require consensus
3. **Caching**: Cache valid results, re-verify daily
4. **Logging**: Log all DNS verification attempts
**Implementation**: **Implementation**:
```python ```python
import dns.resolver import dns.resolver
import dns.dnssec
def verify_txt_record(domain: str) -> bool: def verify_txt_record(domain: str) -> bool:
""" """
Verify _gondulf.{domain} TXT record exists with value 'verified'. Verify _gondulf.{domain} TXT record exists with value 'verified'.
Requires consensus from multiple independent resolvers.
""" """
try: try:
# Use Google and Cloudflare DNS for redundancy # Use Google and Cloudflare DNS for redundancy
resolvers = ['8.8.8.8', '1.1.1.1'] resolvers = ['8.8.8.8', '1.1.1.1']
results = [] verified_count = 0
for resolver_ip in resolvers: for resolver_ip in resolvers:
resolver = dns.resolver.Resolver() resolver = dns.resolver.Resolver()
resolver.nameservers = [resolver_ip] resolver.nameservers = [resolver_ip]
resolver.timeout = 5 resolver.timeout = 5
resolver.lifetime = 5
answers = resolver.resolve(f'_gondulf.{domain}', 'TXT') answers = resolver.resolve(f'_gondulf.{domain}', 'TXT')
for rdata in answers: for rdata in answers:
txt_value = rdata.to_text().strip('"') txt_value = rdata.to_text().strip('"')
if txt_value == 'verified': if txt_value == 'verified':
results.append(True) verified_count += 1
break break
# Require consensus from both resolvers # Require consensus from at least 2 resolvers
return len(results) >= 2 return verified_count >= 2
except Exception as e: except Exception as e:
logger.warning(f"DNS verification failed for {domain}: {e}") logger.warning(f"DNS verification failed for {domain}: {e}")
return False return False
``` ```
**Residual Risk**: Low, DNS is foundational internet infrastructure. #### 2. Email Verification via rel="me" (Required)
**Mechanism**: Email discovered from site's `<link rel="me" href="mailto:...">`, then verified with code.
**Security Properties**:
- Proves website control (can modify HTML)
- Proves email control (receives and enters code)
- Follows IndieWeb standards (rel="me")
- Self-documenting (user declares email publicly)
**Implementation**:
```python
from typing import Optional

from bs4 import BeautifulSoup
import requests

# validate_email_format() and logger are defined elsewhere in this document.


def discover_email_from_site(domain: str) -> Optional[str]:
    """
    Fetch site and discover email from rel="me" link.
    """
    try:
        response = requests.get(f"https://{domain}", timeout=10, allow_redirects=True)
        response.raise_for_status()

        soup = BeautifulSoup(response.content, 'html.parser')
        me_links = soup.find_all('link', rel='me') + soup.find_all('a', rel='me')

        for link in me_links:
            href = link.get('href', '')
            if href.startswith('mailto:'):
                email = href.replace('mailto:', '').strip()
                if validate_email_format(email):
                    return email

        return None
    except Exception as e:
        logger.error(f"Failed to discover email for {domain}: {e}")
        return None
```
**Combined Residual Risk**: Low. Attacker must compromise DNS, website, and email account to authenticate fraudulently.
## Authorization Security ## Authorization Security
@@ -431,15 +497,80 @@ class AuthorizeRequest(BaseModel):
**Residual Risk**: Minimal, Pydantic provides strong validation. **Residual Risk**: Minimal, Pydantic provides strong validation.
### HTML Parsing Security (rel="me" Discovery)
#### Threat: Malicious HTML Injection
**Risk**: Attacker's site contains malicious HTML to exploit parser.
**Mitigations**:
1. **Robust Parser**: Use BeautifulSoup (handles malformed HTML safely)
2. **Link Extraction Only**: Only extract href attributes, no script execution
3. **Timeout**: 10-second timeout for HTTP requests
4. **Size Limit**: Limit response size (prevent memory exhaustion)
5. **HTTPS Required**: Fetch over TLS only
6. **Certificate Validation**: Verify SSL certificates
**Implementation**:
```python
from typing import Optional

from bs4 import BeautifulSoup
import requests

MAX_SIZE = 5 * 1024 * 1024  # 5MB cap on the response body

# validate_email_format() and logger are defined elsewhere in this document.


def discover_email_from_site(domain: str) -> Optional[str]:
    """
    Safely discover email from rel="me" link.
    """
    try:
        # requests.get() has no max_redirects argument; the redirect cap
        # is an attribute of the Session.
        with requests.Session() as session:
            session.max_redirects = 5

            # Fetch with safety limits
            response = session.get(
                f"https://{domain}",
                timeout=10,
                allow_redirects=True,
                stream=True  # Don't load entire response into memory
            )
            response.raise_for_status()

            # Limit response size (prevent memory exhaustion)
            content = response.raw.read(MAX_SIZE, decode_content=True)

        # Parse HTML (BeautifulSoup handles malformed HTML safely)
        soup = BeautifulSoup(content, 'html.parser')

        # Find rel="me" links (no script execution)
        me_links = soup.find_all('link', rel='me') + soup.find_all('a', rel='me')

        # Extract mailto: links only
        for link in me_links:
            href = link.get('href', '')
            if href.startswith('mailto:'):
                email = href.replace('mailto:', '').strip()
                # Validate email format before returning
                if validate_email_format(email):
                    return email

        return None

    except requests.exceptions.SSLError as e:
        logger.error(f"SSL certificate validation failed for {domain}: {e}")
        return None
    except Exception as e:
        logger.error(f"Failed to discover email for {domain}: {e}")
        return None
```
**Residual Risk**: Very low. BeautifulSoup is designed for untrusted HTML.
### Email Validation ### Email Validation
#### Threat: Email Injection Attacks #### Threat: Email Injection Attacks
**Risk**: Attacker injects SMTP commands via email address field. **Risk**: Attacker crafts malicious email address in rel="me" link.
**Mitigations**: **Mitigations**:
1. **Format Validation**: Strict email regex (RFC 5322) 1. **Format Validation**: Strict email regex (RFC 5322)
2. **Domain Matching**: Require email domain match `me` domain 2. **No User Input**: Email discovered from site (not user-provided)
3. **SMTP Library**: Use well-tested library (smtplib) 3. **SMTP Library**: Use well-tested library (smtplib)
4. **Content Encoding**: Encode email content properly 4. **Content Encoding**: Encode email content properly
5. **Rate Limiting**: Prevent abuse 5. **Rate Limiting**: Prevent abuse
@@ -447,31 +578,27 @@ class AuthorizeRequest(BaseModel):
**Validation**: **Validation**:
```python ```python
import re import re
from email.utils import parseaddr
def validate_email(email: str, required_domain: str) -> tuple[bool, str]: def validate_email_format(email: str) -> bool:
""" """
Validate email address and domain match. Validate email address format.
""" """
# Parse email (RFC 5322 compliant) # Basic format check (RFC 5322 simplified)
name, addr = parseaddr(email)
# Basic format check
email_regex = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$' email_regex = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
if not re.match(email_regex, addr): if not re.match(email_regex, email):
return False, "Invalid email format" return False
# Extract domain # Sanity checks
email_domain = addr.split('@')[1].lower() if len(email) > 254: # RFC 5321 maximum
required_domain = required_domain.lower() return False
if email.count('@') != 1:
return False
# Domain must match return True
if email_domain != required_domain:
return False, f"Email must be at {required_domain}"
return True, ""
``` ```
**Note**: Domain matching is NOT enforced in v1.0.0. User may have email at different domain than their identity site (e.g., phil@gmail.com for phil.example.com). This is acceptable as user explicitly publishes the email on their site.
**Residual Risk**: Low, standard validation patterns. **Residual Risk**: Low, standard validation patterns.
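To make the format-only behavior concrete, here is a self-contained sketch of the new `validate_email_format` with a few sample inputs. It is a restatement under one stated assumption: it uses `re.fullmatch` instead of `re.match` with `$`, because `$` also matches just before a trailing newline, so the stricter call is a deliberate deviation from the snippet above.

```python
import re

# Same character classes as the pattern in this section; fullmatch makes
# the ^ and $ anchors unnecessary.
EMAIL_RE = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')


def validate_email_format(email: str) -> bool:
    """Format-only check; no domain matching is performed."""
    if len(email) > 254:          # RFC 5321 maximum
        return False
    if email.count('@') != 1:
        return False
    return EMAIL_RE.fullmatch(email) is not None


# An address on a different domain than the identity site is accepted,
# matching the note that domain matching is not enforced in v1.0.0.
print(validate_email_format("phil@gmail.com"))      # True
print(validate_email_format("phil@@example.com"))   # False
print(validate_email_format("phil@example.com\n"))  # False
```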
## Network Security ## Network Security
@@ -567,21 +694,29 @@ async def add_security_headers(request: Request, call_next):
**Email Handling**: **Email Handling**:
```python ```python
# Email stored ONLY during verification (in-memory, 15-min TTL) # Email discovered from rel="me" link (not user-provided)
# Stored ONLY during verification (in-memory, 15-min TTL)
verification_codes[code_id] = { verification_codes[code_id] = {
"email": email, # ← Exists ONLY here, NEVER in database "email": email, # ← Discovered from site, exists ONLY here, NEVER in database
"code": code, "code": code,
"domain": domain,
"expires_at": datetime.utcnow() + timedelta(minutes=15) "expires_at": datetime.utcnow() + timedelta(minutes=15)
} }
# After verification: email is deleted, only domain stored # After verification: email is deleted, only domain + timestamp stored
db.execute(''' db.execute('''
INSERT INTO domains (domain, verification_method, verified_at) INSERT INTO domains (domain, verification_method, verified_at, last_email_check)
VALUES (?, 'email', ?) VALUES (?, 'two_factor', ?, ?)
''', (domain, datetime.utcnow())) ''', (domain, datetime.utcnow(), datetime.utcnow()))
# Note: NO email address in database # Note: NO email address in database, only verification timestamp
``` ```
**rel="me" Discovery**:
- Email addresses are public (user publishes on their site)
- Server fetches email from user's site (not user input)
- Reduces social engineering risk (can't claim arbitrary email)
- Follows IndieWeb standards for identity
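The in-memory handling described above (15-minute TTL, single use, email discarded after verification) could be sketched as follows. This is an illustrative assumption, not the project's storage layer: the class name `VerificationCodeStore` is hypothetical, and entries are keyed by domain here, whereas the snippet above keys them by `code_id`.

```python
import secrets
import time
from typing import Optional

CODE_TTL_SECONDS = 15 * 60  # 15-minute lifetime from the design


class VerificationCodeStore:
    """Holds the discovered email and code in memory only, until used or expired."""

    def __init__(self, ttl_seconds: int = CODE_TTL_SECONDS) -> None:
        self._ttl = ttl_seconds
        self._entries: dict[str, dict] = {}

    def issue(self, domain: str, email: str, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        code = "".join(secrets.choice("0123456789") for _ in range(6))
        self._entries[domain] = {
            "email": email,
            "code": code,
            "expires_at": now + self._ttl,
        }
        return code

    def verify(self, domain: str, submitted: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        entry = self._entries.get(domain)
        if entry is None:
            return False
        if now >= entry["expires_at"]:
            del self._entries[domain]  # expired: drop the code and the email
            return False
        # Constant-time comparison; bytes avoid errors on non-ASCII input
        if not secrets.compare_digest(entry["code"].encode(), submitted.encode()):
            return False
        del self._entries[domain]  # single use: the email is gone after success
        return True
```

Because the entry is deleted on success and on expiry, the email address never outlives the verification attempt, which is the property the database insert above relies on.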
### Database Security ### Database Security
**SQLite Security**: **SQLite Security**:
@@ -829,13 +964,15 @@ security:
## Security Roadmap ## Security Roadmap
### v1.0.0 (MVP) ### v1.0.0 (MVP)
- ✅ Email-based authentication - ✅ Two-factor domain verification (DNS TXT + Email via rel="me")
- ✅ rel="me" email discovery (IndieWeb standard)
- ✅ HTML parsing security (BeautifulSoup)
- ✅ TLS/HTTPS enforcement - ✅ TLS/HTTPS enforcement
- ✅ Secure token generation (opaque, hashed) - ✅ Secure token generation (opaque, hashed)
- ✅ URL validation (open redirect prevention) - ✅ URL validation (open redirect prevention)
- ✅ Input validation (Pydantic) - ✅ Input validation (Pydantic)
- ✅ Security headers - ✅ Security headers
- ✅ Minimal data collection - ✅ Minimal data collection (no email storage)
### v1.1.0 ### v1.1.0
- PKCE support (code challenge/verifier) - PKCE support (code challenge/verifier)


@@ -0,0 +1,98 @@
# 0004. Phase 2 Implementation Decisions
Date: 2025-11-20
## Status
Accepted
## Context
The Developer has raised 8 categories of implementation questions for Phase 2 that require architectural decisions. These decisions need to balance simplicity with functionality while providing clear direction for implementation.
## Decisions
### 1. Rate Limiting Implementation
**Decision**: Implement actual rate limiting with in-memory storage in Phase 2.
**Rationale**: Security features should be real from the start, not stubs. In-memory is simplest.
**Implementation**:
- Use a simple dictionary with domain as key, list of timestamps as value
- Clean up old timestamps on each check (older than 1 hour)
- Store in `RateLimiter` service as instance variable
- No persistence needed - resets on restart is acceptable
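The implementation notes above map onto a small sliding-window limiter. This is a minimal sketch under stated assumptions: the method names `check` and `record`, the default of 3 per hour (taken from the "Max 3 codes per domain per hour" rule), and the injectable `now` parameter for testing are all illustrative rather than part of the decision.

```python
import time
from typing import Optional


class RateLimiter:
    """In-memory limiter: domain -> list of attempt timestamps (epoch seconds)."""

    def __init__(self, max_attempts: int = 3, window_seconds: int = 3600) -> None:
        self._max = max_attempts
        self._window = window_seconds
        self._attempts: dict[str, list[float]] = {}

    def _prune(self, domain: str, now: float) -> list[float]:
        # Clean up timestamps older than the window on every check
        cutoff = now - self._window
        recent = [t for t in self._attempts.get(domain, []) if t > cutoff]
        if recent:
            self._attempts[domain] = recent
        else:
            self._attempts.pop(domain, None)
        return recent

    def check(self, domain: str, now: Optional[float] = None) -> bool:
        """Return True if another attempt is allowed for this domain."""
        now = time.time() if now is None else now
        return len(self._prune(domain, now)) < self._max

    def record(self, domain: str, now: Optional[float] = None) -> None:
        """Register one attempt for this domain."""
        now = time.time() if now is None else now
        self._prune(domain, now)
        self._attempts.setdefault(domain, []).append(now)
```

State lives only in the instance, so a restart clears all counters, which the decision explicitly accepts.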
### 2. Authorization Code Metadata Structure
**Decision**: Use Phase 1's `CodeStorage` service with complete structure from the start.
**Rationale**: Reuse existing infrastructure, avoid future migrations.
**Implementation**:
- Include `used` field (boolean, default False) even though Phase 3 consumes it
- Store epoch integers for timestamps (simpler than datetime objects)
- Use same `CodeStorage` from Phase 1 with authorization code keys
### 3. HTML Template Implementation
**Decision**: Use Jinja2 templates with separate template files.
**Rationale**: Jinja2 is standard, maintainable, and allows for future template customization.
**Implementation**:
- Templates in `src/gondulf/templates/`
- Create `base.html` for shared layout
- Individual templates: `verify_email.html`, `verify_totp.html`, `authorize.html`, `error.html`
- Pass minimal context to templates
### 4. Database Migration Timing
**Decision**: Apply migration 002 immediately as part of Phase 2 setup.
**Rationale**: Keep database schema current with code expectations.
**Implementation**:
- Run migration before any Phase 2 code execution
- New code assumes 'two_factor' column exists
- Migration updates existing rows (if any) to have 'two_factor' = false
### 5. Client Validation Helper Functions
**Decision**: Implement as standalone functions in a shared utility module.
**Rationale**: Functions over classes when no state is needed. Simpler to test and understand.
**Implementation**:
- Create `src/gondulf/utils/validation.py`
- Functions: `mask_email()`, `validate_redirect_uri()`, `normalize_client_id()`
- Full subdomain validation now (not a stub) - security should be complete
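As an illustration of the standalone-function style, `mask_email()` might look like the sketch below. The masking format (first character of the local part, then `***`) is an assumption; the ADR names the function but does not specify its output.

```python
def mask_email(email: str) -> str:
    """Hide most of the local part so the address can be shown in the UI."""
    local, sep, domain = email.partition("@")
    if not sep or not local or not domain:
        # Not a usable address; reveal nothing
        return "***"
    return f"{local[0]}***@{domain}"


print(mask_email("phil@example.com"))  # p***@example.com
```

Being a pure function with no state, it can be tested with plain assertions and no fixtures.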
### 6. Error Response Format Consistency
**Decision**: Use format appropriate to the endpoint type.
**Rationale**: Follow OAuth 2.0 patterns and user experience expectations.
**Implementation**:
- Verification endpoints (`/verify/email`, `/verify/totp`): JSON responses, always 200 OK
- Authorization endpoint errors before user interaction: HTML error page
- Authorization endpoint errors after client validation: OAuth redirect with error
- Token endpoint (Phase 3): Always JSON
### 7. Dependency Injection Pattern
**Decision**: Create `dependencies.py` with singleton services instantiated at startup.
**Rationale**: Simpler than per-request instantiation, consistent with Phase 1 pattern.
**Implementation**:
- All services instantiated once in `dependencies.py`
- Services read configuration at instantiation
- FastAPI dependency injection provides same instance to all requests
- Pattern: `get_code_storage()`, `get_rate_limiter()`, etc.
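One way to get the single-instance behavior is a cached getter per service. The sketch below is an assumption about how `dependencies.py` could look: `CodeStorage` and `RateLimiter` are empty stand-ins for the real services, and `functools.lru_cache` creates each instance lazily on first call rather than strictly at startup.

```python
from functools import lru_cache


class CodeStorage:
    """Stand-in for the Phase 1 CodeStorage service."""


class RateLimiter:
    """Stand-in for the Phase 2 RateLimiter service."""


@lru_cache(maxsize=None)
def get_code_storage() -> CodeStorage:
    # Built once on first call; later calls return the cached instance
    return CodeStorage()


@lru_cache(maxsize=None)
def get_rate_limiter() -> RateLimiter:
    return RateLimiter()
```

In a FastAPI route these getters would typically be passed to `Depends(...)`, so every request receives the same instance.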
### 8. Test Organization for Authorization Endpoint
**Decision**: Separate test files per major endpoint with shared fixtures module.
**Rationale**: Easier to navigate and maintain as tests grow.
**Implementation**:
- `tests/test_verification_endpoints.py` - email and TOTP verification
- `tests/test_authorization_endpoint.py` - authorization flow
- `tests/conftest.py` - shared fixtures for common scenarios
- Test complete flows, not sub-endpoints in isolation
## Consequences
### Positive
- Clear, consistent patterns across the codebase
- Real security from the start (no stubs)
- Reuse of existing Phase 1 infrastructure
- Standard, maintainable template approach
- Simple service architecture
### Negative
- Slightly more upfront work than stub implementations
- In-memory rate limiting loses state on restart
- Templates add a dependency (Jinja2)
### Neutral
- Following established patterns from other web frameworks
- Committing to specific implementation choices early


@@ -1,9 +1,10 @@
# ADR-005: Email-Based Authentication for v1.0.0 # ADR-005: Two-Factor Domain Verification for v1.0.0 (DNS + Email via rel="me")
Date: 2025-11-20 Date: 2025-11-20
Last Updated: 2025-11-20
## Status ## Status
Accepted Accepted (Updated)
## Context ## Context
@@ -65,143 +66,289 @@ From project brief:
## Decision ## Decision
**Gondulf v1.0.0 will use email-based verification as the PRIMARY authentication method, with DNS TXT record verification as an OPTIONAL fast-path.** **Gondulf v1.0.0 will require BOTH DNS TXT record verification AND email verification using the IndieWeb rel="me" pattern. Both verifications must succeed for authentication to complete.**
### Implementation Approach ### Implementation Approach
**Two-Tier Verification**: **Two-Factor Verification (Both Required)**:
1. **DNS TXT Record (Preferred, Optional)**: 1. **DNS TXT Record Verification (Required)**:
- Check for `_gondulf.{domain}` TXT record = `verified` - Check for `_gondulf.{domain}` TXT record = `verified`
- If found: Skip email verification, use cached result - If found: Proceed to email verification
- If not found: Fall back to email verification - If not found: Authentication fails with instructions to add TXT record
- Result cached in database for future use - Proves: User controls DNS for the domain
2. **Email Verification (Required Fallback)**: 2. **Email Discovery via rel="me" (Required)**:
- User provides email address at their domain - Fetch user's domain homepage (e.g., https://example.com)
- Parse HTML for `<link rel="me" href="mailto:user@example.com">`
- Extract email address from rel="me" link
- If not found: Authentication fails with instructions to add rel="me" link
- Proves: User has published email relationship on their site
3. **Email Verification Code (Required)**:
- Server generates 6-digit verification code - Server generates 6-digit verification code
- Server sends code via SMTP - Server sends code to discovered email address via SMTP
- User enters code (15-minute expiration) - User enters code (15-minute expiration)
- Domain marked as verified in database - Verification code must be correct to complete authentication
- Proves: User controls the email account
**Why Both?**: **Why All Three?**:
- DNS provides fast path for tech-savvy users - **DNS TXT**: Proves domain DNS control (strong ownership signal)
- Email provides accessible path for all users - **rel="me"**: Follows IndieWeb standard for identity claims
- DNS requires upfront setup but smoother repeat authentication - **Email Code**: Proves active control of the email account (not just DNS/HTML)
- Email requires no setup but requires email access each time - **Combined**: Two-factor verification provides stronger security than either alone
### Rationale ### Rationale
**Meets User Requirements**: **Enhanced Security Model**:
- Email-based authentication as specified - Two-factor verification: DNS control + Email control
- No external identity providers (GitHub, GitLab) in v1.0.0 - Prevents attacks where only one factor is compromised
- Simple to understand and implement - DNS TXT proves domain ownership
- Familiar UX pattern - Email code proves active account control
- rel="me" follows IndieWeb standards for identity
**Simplicity**: **Follows IndieWeb Standards**:
- Email verification is well-understood - rel="me" is standard practice for identity claims (see: https://thesatelliteoflove.com)
- Standard library SMTP support (smtplib) - Aligns with IndieAuth ecosystem expectations
- No OAuth 2.0 client implementation needed - Users likely already have rel="me" links for other purposes
- No external API dependencies - Email discovery is self-documenting (user's site declares their email)
**Security Sufficient for MVP**: **No User-Provided Email Input**:
- Email access typically indicates domain control - Server discovers email from user's site (no manual entry)
- 6-digit codes provide 1,000,000 combinations - Prevents typos and social engineering
- 15-minute expiration limits brute-force window - Email is self-attested by user on their own domain
- Rate limiting prevents abuse - Reduces attack surface (can't claim arbitrary email)
- TLS for email delivery (STARTTLS)
**Operational Simplicity**: **Stronger Than Single-Factor**:
- Requires only SMTP configuration (widely available) - Attacker needs DNS control AND email access
- No API keys or provider accounts needed - Compromised DNS alone: insufficient
- No rate limits from external providers - Compromised email alone: insufficient
- Full control over verification flow - Requires control of both infrastructure and communication
**DNS TXT as Enhancement**: **Simplicity Maintained**:
- Provides better UX for repeat authentication - Two verification checks, but both straightforward
- Demonstrates domain control more directly - DNS TXT: standard practice
- Optional (users not forced to configure DNS) - rel="me": standard HTML link
- Cached result eliminates email requirement - Email code: familiar pattern
- Total setup time: < 5 minutes for technical users
## Consequences ## Consequences
### Positive Consequences ### Positive Consequences
1. **User Simplicity**: 1. **Enhanced Security**:
- Familiar email verification pattern - Two-factor verification (DNS + Email)
- No need to create accounts on external services - Stronger ownership proof than single factor
- Works with any email provider - Prevents single-point-of-compromise attacks
- Aligns with security best practices
2. **Implementation Simplicity**: 2. **IndieWeb Standard Compliance**:
- Standard library support (smtplib, email) - Follows rel="me" pattern from IndieWeb community
- No external API integration - Interoperability with other IndieWeb tools
- Straightforward testing (mock SMTP) - Users may already have rel="me" configured
- Self-documenting identity claims
3. **Operational Simplicity**: 3. **Reduced Attack Surface**:
- Single external dependency (SMTP server) - No user-provided email input (prevents typos/social engineering)
- No API rate limits to manage - Email discovered from user's own site
- No provider outages to worry about - Can't claim arbitrary email addresses
- Admin controls email templates - User controls all verification requirements
4. **Privacy**: 4. **Implementation Simplicity**:
- Email addresses NOT stored (deleted after verification) - HTML parsing for rel="me" (standard libraries)
- DNS queries (dnspython)
- SMTP email sending (smtplib)
- No external API dependencies
5. **Privacy**:
- Email addresses NOT stored after verification
- No data shared with third parties - No data shared with third parties
- No tracking by external providers - No tracking by external providers
- Minimal data collection
5. **Flexibility**: 6. **Transparency**:
- DNS TXT provides fast-path for power users - User explicitly declares email on their site
- Email fallback ensures accessibility - No hidden verification methods
- No user locked out if DNS unavailable - User controls both DNS and HTML
- Clear requirements for setup
### Negative Consequences ### Negative Consequences
1. **Email Dependency**: 1. **Higher Setup Complexity**:
- Users must configure TWO things (DNS TXT + rel="me" link)
- More steps than single-factor approaches
- Requires basic HTML editing skills
- May deter non-technical users
2. **Email Dependency**:
- Requires functioning SMTP configuration - Requires functioning SMTP configuration
- Email delivery not guaranteed (spam filters) - Email delivery not guaranteed (spam filters)
- Users must have email access during authentication - Users must have email access during authentication
- Email account compromise = domain compromise - Email account compromise still a risk (mitigated by DNS requirement)
2. **User Experience**: 3. **User Experience**:
- Extra step vs. provider OAuth (more clicks) - More setup steps vs. simpler alternatives
- Requires checking email inbox - Requires checking email inbox during login
- Potential delay (email delivery time) - Potential delay (email delivery time)
- Code expiration can frustrate users - Code expiration can frustrate users
- Both verifications must succeed (no fallback)
3. **Security Limitations**: 4. **HTML Parsing Complexity**:
- Email interception risk (mitigated by TLS) - Must parse potentially malformed HTML
- Email account compromise risk (user responsibility) - Multiple possible HTML formats for rel="me"
- Weaker than hardware-based auth (WebAuthn) - Case sensitivity issues
- Must handle various link formats (mailto: vs https://)
4. **Scalability Concerns**: 5. **Failure Points**:
- Email delivery at scale (future concern) - DNS lookup failure blocks authentication
- SMTP rate limits (future concern) - Site unavailable blocks authentication
- Email provider blocking (spam prevention) - Email send failure blocks authentication
- No fallback mechanism (both required)
### Mitigation Strategies ### Mitigation Strategies
**Email Delivery Reliability**: **Clear Setup Instructions**:
```python ```markdown
# Robust SMTP configuration ## Domain Verification Setup
SMTP_CONFIG = {
'host': os.environ['SMTP_HOST'],
'port': int(os.environ.get('SMTP_PORT', '587')),
'use_tls': True, # STARTTLS required
'username': os.environ['SMTP_USERNAME'],
'password': os.environ['SMTP_PASSWORD'],
'from_email': os.environ['SMTP_FROM'],
'timeout': 10, # Fail fast
}
# Comprehensive error handling Gondulf requires two verifications to prove domain ownership:
try:
send_email(to=email, code=code) ### Step 1: Add DNS TXT Record
except SMTPException as e: Add this DNS record to your domain:
logger.error(f"Email send failed: {e}") - Type: TXT
# Display user-friendly error - Name: _gondulf.example.com
raise HTTPException(500, "Email delivery failed. Try again or contact admin.") - Value: verified
This proves you control DNS for your domain.
### Step 2: Add rel="me" Link to Your Homepage
Add this HTML to your homepage (e.g., https://example.com/index.html):
<link rel="me" href="mailto:your-email@example.com">
This declares your email address publicly on your site.
### Step 3: Verify Email Access
During login:
- We'll discover your email from the rel="me" link
- We'll send a verification code to that email
- Enter the code to complete authentication
Setup time: ~5 minutes
``` ```
**Code Security**: **Robust HTML Parsing**:
```python
from typing import Optional

from bs4 import BeautifulSoup
import requests

# validate_email_format() and logger are defined elsewhere in the project.


def discover_email_from_site(domain_url: str) -> Optional[str]:
    """
    Fetch site and discover email from rel="me" link.

    Returns: email address or None if not found
    """
    try:
        # Fetch homepage
        response = requests.get(domain_url, timeout=10, allow_redirects=True)
        response.raise_for_status()

        # Parse HTML (handle malformed HTML gracefully)
        soup = BeautifulSoup(response.content, 'html.parser')

        # Find all rel="me" links
        me_links = soup.find_all('link', rel='me') + soup.find_all('a', rel='me')

        # Look for mailto: links
        for link in me_links:
            href = link.get('href', '')
            if href.startswith('mailto:'):
                email = href.replace('mailto:', '').strip()
                # Validate email format
                if validate_email_format(email):
                    logger.info(f"Discovered email via rel='me' for {domain_url}")
                    return email

        logger.warning(f"No rel='me' mailto: link found for {domain_url}")
        return None

    except Exception as e:
        logger.error(f"Failed to discover email for {domain_url}: {e}")
        return None
```
**DNS Verification**:
```python
import logging

import dns.resolver

logger = logging.getLogger(__name__)


def verify_dns_txt(domain: str) -> bool:
    """
    Verify _gondulf.{domain} TXT record exists.

    Returns: True if verified, False otherwise
    """
    try:
        # Query multiple resolvers for redundancy
        resolvers = ['8.8.8.8', '1.1.1.1']
        verified_count = 0

        for resolver_ip in resolvers:
            resolver = dns.resolver.Resolver()
            resolver.nameservers = [resolver_ip]
            resolver.timeout = 5

            answers = resolver.resolve(f'_gondulf.{domain}', 'TXT')
            for rdata in answers:
                if rdata.to_text().strip('"') == 'verified':
                    verified_count += 1
                    break

        # Require consensus from multiple resolvers
        return verified_count >= 2

    except Exception as e:
        logger.warning(f"DNS verification failed for {domain}: {e}")
        return False
```
**Helpful Error Messages**:
```python
# DNS TXT not found
if not dns_verified:
    # f-string so that {domain} is interpolated into the message
    return ErrorResponse(f"""
    DNS verification failed.

    Please add this TXT record to your domain:
    - Type: TXT
    - Name: _gondulf.{domain}
    - Value: verified

    DNS changes may take up to 24 hours to propagate.
    """)

# rel="me" not found
if not email_discovered:
    return ErrorResponse("""
    Could not find rel="me" link on your site.

    Please add this to your homepage:
    <link rel="me" href="mailto:your-email@example.com">

    See: https://indieweb.org/rel-me for more information.
    """)

# Email send failure
if not email_sent:
    # f-string so that {email} is interpolated into the message
    return ErrorResponse(f"""
    Failed to send verification code to {email}.

    Please check:
    - Email address is correct in your rel="me" link
    - Email server is accepting mail
    - Check spam/junk folder
    """)
```
**Code Security** (unchanged):
```python ```python
# Sufficient entropy # Sufficient entropy
code = ''.join(secrets.choice('0123456789') for _ in range(6)) code = ''.join(secrets.choice('0123456789') for _ in range(6))
@@ -209,107 +356,182 @@ code = ''.join(secrets.choice('0123456789') for _ in range(6))
# Rate limiting # Rate limiting
MAX_ATTEMPTS = 3 # Per email MAX_ATTEMPTS = 3 # Per email
MAX_CODES = 3 # Per hour per email MAX_CODES = 3 # Per hour per domain
# Expiration # Expiration
CODE_LIFETIME = timedelta(minutes=15) CODE_LIFETIME = timedelta(minutes=15)
# Attempt tracking # Single-use enforcement
attempts = code_storage.get_attempts(email) code_storage.mark_used(code_id)
if attempts >= MAX_ATTEMPTS:
raise HTTPException(429, "Too many attempts. Try again in 15 minutes.")
```
**Email Interception**:
```python
# Require TLS for email delivery
smtp.starttls()
# Clear warning to users
"""
We've sent a verification code to your email.
Only enter this code if you initiated this login.
The code expires in 15 minutes.
"""
# Log suspicious activity
if time_between_send_and_verify < timedelta(seconds=1):
logger.warning(f"Suspiciously fast verification: {domain}")
```
**DNS TXT Fast-Path**:
```python
# Check DNS first, skip email if verified
txt_record = dns.query(f'_gondulf.{domain}', 'TXT')
if txt_record == 'verified':
logger.info(f"DNS verification successful: {domain}")
# Use cached verification, skip email
return verified_domain(domain)
# Fall back to email
logger.info(f"DNS verification not found, using email: {domain}")
return email_verification_flow(domain)
```
**User Education**:
```markdown
## Domain Verification
Gondulf offers two ways to verify domain ownership:
### Option 1: DNS TXT Record (Recommended)
Add this DNS record to skip email verification:
- Type: TXT
- Name: _gondulf.example.com
- Value: verified
Benefits:
- Faster authentication (no email required)
- Verify once, use forever
- More secure (DNS control = domain control)
### Option 2: Email Verification
- Enter an email address at your domain
- We'll send a 6-digit code
- Enter the code to verify
Benefits:
- No DNS configuration needed
- Works immediately
- Familiar process
``` ```
## Implementation ## Implementation
### Email Verification Flow ### Complete Authentication Flow (v1.0.0)
```python ```python
from datetime import datetime, timedelta from datetime import datetime, timedelta
import secrets import secrets
import smtplib import smtplib
import requests
import dns.resolver
from email.message import EmailMessage from email.message import EmailMessage
from bs4 import BeautifulSoup
from typing import Optional, Tuple
class EmailVerificationService: class DomainVerificationService:
"""
Two-factor domain verification: DNS TXT + Email via rel="me"
"""
def __init__(self, smtp_config: dict): def __init__(self, smtp_config: dict):
self.smtp = smtp_config self.smtp = smtp_config
self.codes = {} # In-memory storage (short-lived) self.codes = {} # In-memory storage for verification codes
def request_code(self, email: str, domain: str) -> None: def verify_domain_ownership(self, domain: str) -> Tuple[bool, Optional[str], Optional[str]]:
""" """
Generate and send verification code. Perform two-factor domain verification.
Raises: Returns: (success, email_discovered, error_message)
ValueError: If email domain doesn't match requested domain
HTTPException: If rate limit exceeded or email send fails Steps:
1. Verify DNS TXT record
2. Discover email from rel="me" link
3. Send verification code to email
4. User enters code (handled separately)
""" """
# Validate email matches domain # Step 1: Verify DNS TXT record
email_domain = email.split('@')[1].lower() dns_verified = self._verify_dns_txt(domain)
if email_domain != domain.lower(): if not dns_verified:
raise ValueError(f"Email must be at {domain}") return False, None, f"DNS TXT record not found. Please add _gondulf.{domain} = verified"
# Step 2: Discover email from site's rel="me" link
email = self._discover_email_from_site(f"https://{domain}")
if not email:
return False, None, 'No rel="me" mailto: link found on homepage. Please add <link rel="me" href="mailto:you@example.com">'
# Step 3: Generate and send verification code
code_sent = self._send_verification_code(email, domain)
if not code_sent:
return False, email, f"Failed to send verification code to {email}"
# Return success with discovered email
return True, email, None
def verify_code(self, email: str, submitted_code: str) -> Tuple[bool, str]:
"""
Verify submitted code.
Returns: (success, domain or error_message)
"""
code_data = self.codes.get(email)
if not code_data:
return False, "No verification code found. Please request a new code."
# Check expiration
if datetime.utcnow() > code_data['expires_at']:
del self.codes[email]
return False, "Code expired. Please request a new code."
# Check attempts
code_data['attempts'] += 1
if code_data['attempts'] > 3:
del self.codes[email]
return False, "Too many attempts. Please restart authentication."
# Verify code (constant-time comparison)
if not secrets.compare_digest(submitted_code, code_data['code']):
return False, "Invalid code. Please try again."
# Success: Clean up and return domain
domain = code_data['domain']
del self.codes[email] # Single-use code
logger.info(f"Domain verified: {domain} (DNS + Email)")
return True, domain
def _verify_dns_txt(self, domain: str) -> bool:
"""
Verify _gondulf.{domain} TXT record exists with value 'verified'.
Returns: True if verified, False otherwise
"""
record_name = f'_gondulf.{domain}'
# Use multiple resolvers for redundancy
resolvers = ['8.8.8.8', '1.1.1.1']
verified_count = 0
for resolver_ip in resolvers:
try:
resolver = dns.resolver.Resolver()
resolver.nameservers = [resolver_ip]
resolver.timeout = 5
answers = resolver.resolve(record_name, 'TXT')
for rdata in answers:
if rdata.to_text().strip('"') == 'verified':
verified_count += 1
break
except Exception as e:
logger.debug(f"DNS query failed (resolver {resolver_ip}): {e}")
continue
# Require consensus from at least 2 resolvers
if verified_count >= 2:
logger.info(f"DNS TXT verified: {domain}")
return True
logger.warning(f"DNS TXT verification failed: {domain}")
return False
def _discover_email_from_site(self, domain_url: str) -> Optional[str]:
"""
Fetch domain homepage and discover email from rel="me" link.
Returns: email address or None if not found
"""
try:
# Fetch homepage
response = requests.get(domain_url, timeout=10, allow_redirects=True)
response.raise_for_status()
# Parse HTML (BeautifulSoup handles malformed HTML)
soup = BeautifulSoup(response.content, 'html.parser')
# Find all rel="me" links (both <link> and <a>)
me_links = soup.find_all('link', rel='me') + soup.find_all('a', rel='me')
# Look for mailto: links
for link in me_links:
href = link.get('href', '')
if href.startswith('mailto:'):
email = href.replace('mailto:', '').strip()
# Basic email validation
if '@' in email and '.' in email.split('@')[1]:
logger.info(f"Discovered email via rel='me': {domain_url}")
return email
logger.warning(f"No rel='me' mailto: link found: {domain_url}")
return None
except Exception as e:
logger.error(f"Failed to discover email for {domain_url}: {e}")
return None
def _send_verification_code(self, email: str, domain: str) -> bool:
"""
Generate and send verification code to email.
Returns: True if sent successfully, False otherwise
"""
# Check rate limit # Check rate limit
if self._is_rate_limited(email): if self._is_rate_limited(domain):
raise HTTPException(429, "Too many requests. Try again in 1 hour.") logger.warning(f"Rate limit exceeded for domain: {domain}")
return False
# Generate 6-digit code # Generate 6-digit code
code = ''.join(secrets.choice('0123456789') for _ in range(6)) code = ''.join(secrets.choice('0123456789') for _ in range(6))
@@ -323,56 +545,14 @@ class EmailVerificationService:
'attempts': 0, 'attempts': 0,
} }
# Send email # Send email via SMTP
try: try:
self._send_code_email(email, code) msg = EmailMessage()
logger.info(f"Verification code sent to {email[:3]}***@{email_domain}") msg['From'] = self.smtp['from_email']
except Exception as e: msg['To'] = email
logger.error(f"Failed to send email to {email_domain}: {e}") msg['Subject'] = 'Gondulf Verification Code'
raise HTTPException(500, "Email delivery failed")
def verify_code(self, email: str, submitted_code: str) -> str: msg.set_content(f"""
"""
Verify submitted code.
Returns: domain if valid
Raises: HTTPException if invalid/expired
"""
code_data = self.codes.get(email)
if not code_data:
raise HTTPException(400, "No verification code found")
# Check expiration
if datetime.utcnow() > code_data['expires_at']:
del self.codes[email]
raise HTTPException(400, "Code expired. Request a new one.")
# Check attempts
code_data['attempts'] += 1
if code_data['attempts'] > 3:
del self.codes[email]
raise HTTPException(429, "Too many attempts")
# Verify code (constant-time comparison)
if not secrets.compare_digest(submitted_code, code_data['code']):
raise HTTPException(400, "Invalid code")
# Success: Clean up and return domain
domain = code_data['domain']
del self.codes[email] # Single-use code
logger.info(f"Domain verified via email: {domain}")
return domain
def _send_code_email(self, to: str, code: str) -> None:
"""Send verification code via SMTP."""
msg = EmailMessage()
msg['From'] = self.smtp['from_email']
msg['To'] = to
msg['Subject'] = 'Gondulf Verification Code'
msg.set_content(f"""
Your Gondulf verification code is: Your Gondulf verification code is:
{code} {code}
@@ -381,96 +561,34 @@ This code expires in 15 minutes.
Only enter this code if you initiated this login. Only enter this code if you initiated this login.
If you did not request this code, ignore this email. If you did not request this code, ignore this email.
""") """)
with smtplib.SMTP(self.smtp['host'], self.smtp['port'], timeout=10) as smtp: with smtplib.SMTP(self.smtp['host'], self.smtp['port'], timeout=10) as smtp:
smtp.starttls() smtp.starttls()
smtp.login(self.smtp['username'], self.smtp['password']) smtp.login(self.smtp['username'], self.smtp['password'])
smtp.send_message(msg) smtp.send_message(msg)
def _is_rate_limited(self, email: str) -> bool: logger.info(f"Verification code sent to {email[:3]}***@{email.split('@')[1]}")
"""Check if email is rate limited.""" return True
# Simple in-memory tracking (for v1.0.0)
# Future: Redis-based rate limiting except Exception as e:
logger.error(f"Failed to send email to {email}: {e}")
return False
def _is_rate_limited(self, domain: str) -> bool:
"""
Check if domain is rate limited (max 3 codes per hour).
Returns: True if rate limited, False otherwise
"""
recent_codes = [ recent_codes = [
code for code in self.codes.values() code for code in self.codes.values()
if code.get('email') == email if code.get('domain') == domain
and datetime.utcnow() - code['created_at'] < timedelta(hours=1) and datetime.utcnow() - code['created_at'] < timedelta(hours=1)
] ]
return len(recent_codes) >= 3 return len(recent_codes) >= 3
``` ```
### DNS TXT Record Verification
```python
import logging
from datetime import datetime

import dns.resolver

logger = logging.getLogger(__name__)


class DNSVerificationService:
    def __init__(self, cache_storage):
        self.cache = cache_storage

    def verify_domain(self, domain: str) -> bool:
        """
        Check if domain has valid DNS TXT record.

        Returns: True if verified, False otherwise
        """
        # Check cache first
        cached = self.cache.get(domain)
        if cached and cached['verified']:
            logger.info(f"Using cached DNS verification: {domain}")
            return True

        # Query DNS
        try:
            verified = self._query_txt_record(domain)

            # Cache result
            self.cache.set(domain, {
                'verified': verified,
                'verified_at': datetime.utcnow(),
                'method': 'txt_record'
            })

            return verified

        except Exception as e:
            logger.warning(f"DNS verification failed for {domain}: {e}")
            return False

    def _query_txt_record(self, domain: str) -> bool:
        """
        Query _gondulf.{domain} TXT record.

        Returns: True if record exists with value 'verified'
        """
        record_name = f'_gondulf.{domain}'

        # Use multiple resolvers for redundancy
        resolvers = ['8.8.8.8', '1.1.1.1']

        for resolver_ip in resolvers:
            try:
                resolver = dns.resolver.Resolver()
                resolver.nameservers = [resolver_ip]
                resolver.timeout = 5
                resolver.lifetime = 5

                answers = resolver.resolve(record_name, 'TXT')

                for rdata in answers:
                    txt_value = rdata.to_text().strip('"')
                    if txt_value == 'verified':
                        logger.info(f"DNS TXT verified: {domain} (resolver: {resolver_ip})")
                        return True

            except Exception as e:
                logger.debug(f"DNS query failed (resolver {resolver_ip}): {e}")
                continue

        return False
```
## Future Enhancements ## Future Enhancements
### v1.1.0+: Additional Authentication Methods ### v1.1.0+: Additional Authentication Methods
@@ -561,13 +679,22 @@ These will be additive (user chooses method), not replacing email.
## References ## References
- IndieWeb rel="me": https://indieweb.org/rel-me
- Example Implementation: https://thesatelliteoflove.com (Phil Skents' identity page)
- SMTP Protocol (RFC 5321): https://datatracker.ietf.org/doc/html/rfc5321 - SMTP Protocol (RFC 5321): https://datatracker.ietf.org/doc/html/rfc5321
- Email Security (STARTTLS): https://datatracker.ietf.org/doc/html/rfc3207 - Email Security (STARTTLS): https://datatracker.ietf.org/doc/html/rfc3207
- DNS TXT Records (RFC 1035): https://datatracker.ietf.org/doc/html/rfc1035 - DNS TXT Records (RFC 1035): https://datatracker.ietf.org/doc/html/rfc1035
- HTML Link Relations: https://www.w3.org/TR/html5/links.html#linkTypes
- BeautifulSoup (HTML parsing): https://www.crummy.com/software/BeautifulSoup/
- WebAuthn (W3C): https://www.w3.org/TR/webauthn/ (future) - WebAuthn (W3C): https://www.w3.org/TR/webauthn/ (future)
## Decision History ## Decision History
- 2025-11-20: Proposed (Architect) - 2025-11-20: Proposed (Architect) - Email primary, DNS optional
- 2025-11-20: Accepted (Architect) - 2025-11-20: Accepted (Architect) - Email primary, DNS optional
- 2025-11-20: **UPDATED** (Architect) - BOTH required (DNS + Email via rel="me")
- Changed from single-factor (email OR DNS) to two-factor (email AND DNS)
- Added rel="me" email discovery (IndieWeb standard)
- Removed user-provided email input (security improvement)
- Enhanced security model with dual verification
- TBD: Review after v1.0.0 deployment (gather user feedback) - TBD: Review after v1.0.0 deployment (gather user feedback)


@@ -0,0 +1,516 @@
# ADR-008: rel="me" Email Discovery Pattern
Date: 2025-11-20
## Status
Accepted
## Context
Gondulf's authentication flow requires email verification as part of two-factor domain verification (see ADR-005). This raises the question: How do we obtain the user's email address?
### Email Acquisition Methods Evaluated
**1. User-Provided Email Input**
- User manually enters their email address
- Server validates email domain matches identity domain
- Simple UX pattern (familiar from many sites)
**2. DNS TXT Record**
- Email address stored in DNS: `_email.example.com` TXT `user@example.com`
- Server queries DNS to discover email
- Requires DNS configuration
**3. rel="me" Link Discovery (IndieWeb Standard)**
- User publishes email on their site: `<link rel="me" href="mailto:user@example.com">`
- Server fetches site and parses HTML for rel="me" links
- Follows IndieWeb standards for identity claims
**4. WebFinger Protocol**
- Server queries `/.well-known/webfinger?resource={domain}`
- Standard protocol for identity discovery
- Requires additional endpoint implementation
### Requirements
Derived from the user requirements and the IndieAuth ecosystem:
- **Security**: Prevent social engineering and email spoofing
- **Simplicity**: Keep v1.0.0 implementation straightforward
- **Standards**: Align with IndieWeb/IndieAuth community practices
- **Self-Documenting**: Users should understand what they're publishing
### IndieWeb Context
The IndieWeb community uses `rel="me"` as a standard way to assert identity relationships:
- Users publish rel="me" links on their homepage to various profiles (GitHub, Twitter, email, etc.)
- Other tools can discover these relationships by parsing the page
- Well-established pattern in the IndieWeb ecosystem
- Reference implementation: https://thesatelliteoflove.com
## Decision
**Gondulf v1.0.0 will discover email addresses from rel="me" links published on the user's homepage, following the IndieWeb standard.**
### Implementation Approach
1. **Fetch User's Homepage**
- When user initiates authentication with domain (e.g., `https://example.com`)
- Server fetches the homepage over HTTPS
- Timeout: 10 seconds
- Follow redirects (max 5)
- Verify SSL certificate
2. **Parse HTML for rel="me" Links**
- Use BeautifulSoup for robust HTML parsing (handles malformed HTML)
- Search for `<link rel="me" href="mailto:...">` tags
- Also check `<a rel="me" href="mailto:...">` tags
- Extract first matching mailto: link
- Case-insensitive rel attribute matching
3. **Validate Email Format**
- Basic RFC 5322 format validation
- Length checks (max 254 characters per RFC 5321)
- Format: `user@domain.tld`
4. **Use Discovered Email**
- Send verification code to discovered email
- Display partially masked email to user: `u***@example.com`
- User cannot modify email (discovered automatically)
5. **Error Handling**
- If no rel="me" link found: Display setup instructions
- If multiple mailto: links: Use first one
- If site unreachable: Display error with retry option
- If SSL verification fails: Reject (security)
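The masking described in step 4 could be implemented along these lines. This is a minimal sketch: the helper name `mask_email` is hypothetical (it does not appear in the design), and it assumes the address has already passed format validation.

```python
def mask_email(email: str) -> str:
    """Hide all but the first character of the local part.

    Hypothetical helper for illustration; not defined in the design documents.
    """
    local, sep, domain = email.partition("@")
    if not sep or not local or not domain:
        raise ValueError("not a valid email address")
    return f"{local[0]}***@{domain}"


print(mask_email("user@example.com"))  # u***@example.com
```

Keeping the domain visible lets the user recognise which mailbox to check without the server echoing the full address back.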
### Example HTML
User adds this to their homepage:
```html
<!DOCTYPE html>
<html>
<head>
<title>Phil Skents</title>
<!-- rel="me" link for email -->
<link rel="me" href="mailto:phil@example.com">
<!-- Other rel="me" links (optional) -->
<link rel="me" href="https://github.com/philskents">
<link rel="me" href="https://twitter.com/philskents">
</head>
<body>
<h1>Phil Skents</h1>
<p>This is my personal website.</p>
</body>
</html>
```
Or visible link:
```html
<a rel="me" href="mailto:phil@example.com">Email me</a>
```
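Both markup forms above can be recognised with a small parser. The following is a dependency-free sketch using Python's standard `html.parser` rather than BeautifulSoup, which is what the design specifies. The names `RelMeMailtoParser` and `first_rel_me_email` are hypothetical. It compares `rel` tokens case-insensitively and returns the first match in document order, which is one possible reading of "extract first matching mailto: link".

```python
from html.parser import HTMLParser
from typing import List, Optional


class RelMeMailtoParser(HTMLParser):
    """Collect mailto: targets of <link>/<a> tags whose rel contains "me"."""

    def __init__(self) -> None:
        super().__init__()
        self.emails: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("link", "a"):
            return
        attr = {name: (value or "") for name, value in attrs}
        # rel is a space-separated token list; compare tokens case-insensitively
        rel_tokens = attr.get("rel", "").lower().split()
        href = attr.get("href", "").strip()
        if "me" in rel_tokens and href.lower().startswith("mailto:"):
            # Drop the scheme and any ?subject=... query part
            address = href[len("mailto:"):].split("?", 1)[0].strip()
            if address:
                self.emails.append(address)


def first_rel_me_email(html: str) -> Optional[str]:
    """Return the first rel="me" mailto address in the page, or None."""
    parser = RelMeMailtoParser()
    parser.feed(html)
    return parser.emails[0] if parser.emails else None


page = '<head><link rel="me" href="mailto:phil@example.com"></head>'
print(first_rel_me_email(page))
```

A sketch like this is mainly useful for unit tests that should not depend on third-party parsers; format validation of the returned address would still follow as a separate step.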
## Rationale
### Follows IndieWeb Standards
**IndieWeb Alignment**:
- rel="me" is the standard way to assert identity in IndieWeb
- Users familiar with IndieAuth likely already have rel="me" configured
- Interoperability with other IndieWeb tools
- Well-documented pattern: https://indieweb.org/rel-me
**Community Expectations**:
- IndieAuth ecosystem uses rel="me" extensively
- Users understand the pattern
- Existing tutorials and documentation available
- Aligns with decentralized identity principles
### Security Benefits
**Prevents Social Engineering**:
- User cannot claim arbitrary email addresses
- Email must be published on the user's own site
- Attacker cannot trick user into entering wrong email
- Self-attested identity (user declares on their domain)
**Reduces Attack Surface**:
- No user input field for email (no typos, no XSS)
- No email enumeration via guessing
- Email discovery transparent and auditable
- User controls what email is published
**Transparency**:
- User explicitly publishes email on their site
- Public declaration of email relationship
- User aware they're making email public
- No hidden or implicit email collection
### Implementation Simplicity
**Standard Libraries**:
- BeautifulSoup: Robust HTML parsing (handles malformed HTML)
- requests: HTTP client (widely used, well-tested)
- No custom protocols or complex parsing
- Python standard library for email validation
**Error Handling**:
- Clear error messages with setup instructions
- Graceful degradation (site unavailable, etc.)
- Standard HTTP status codes
- No complex state management
**Testing**:
- Easy to mock HTTP responses
- Straightforward unit tests
- BeautifulSoup handles edge cases (malformed HTML)
- No external service dependencies
### User Experience
**Self-Documenting**:
- User adds one HTML tag to their site
- Clear relationship between domain and email
- User understands what they're publishing
- No hidden configuration
**Familiar Pattern**:
- Similar to verifying site ownership (Google Search Console, etc.)
- Adding meta tags is common web practice
- Many users already have rel="me" for other purposes
- Works with static sites (no backend required)
**Setup Time**:
- ~1 minute to add link tag
- No waiting (unlike DNS propagation)
- Immediate verification possible
- Can be combined with other rel="me" links
## Consequences
### Positive Consequences
1. **IndieWeb Standard Compliance**:
- Follows established rel="me" pattern
- Interoperability with IndieWeb tools
- Community-vetted approach
- Well-documented standard
2. **Enhanced Security**:
- No user-provided email input (prevents social engineering)
- Email explicitly published by user
- Transparent and auditable
- Reduces phishing risk
3. **Implementation Simplicity**:
- Standard libraries (BeautifulSoup, requests)
- No complex protocols
- Easy to test and maintain
- Handles malformed HTML gracefully
4. **User Control**:
- User explicitly declares email on their site
- Can change email by updating HTML
- No hidden email collection
- User aware of public email
5. **Flexibility**:
- Works with static sites (no backend needed)
- Can use any email provider
- Email can be at different domain (e.g., Gmail)
- Supports multiple rel="me" links
### Negative Consequences
1. **Public Email Requirement**:
- User must publish email publicly on their site
- Not suitable for users who want private email
- Email harvesters can discover address
- Spam risk (mitigated: users can use spam filters)
2. **HTML Parsing Complexity**:
- Must handle various HTML formats
- Malformed HTML can cause issues (mitigated: BeautifulSoup)
- Case sensitivity considerations
- Multiple possible HTML structures
3. **Website Dependency**:
- User's site must be available during authentication
- Site downtime blocks authentication
- No fallback if site unreachable
- Requires HTTPS (not all sites have valid certificates)
4. **Discovery Failures**:
- User may not have rel="me" configured
- Link may be in wrong format
- Email may be invalid format
- Clear error messages required
5. **Privacy Considerations**:
- Email addresses visible to anyone
- Cannot use email verification without public disclosure
- Users must accept public email
- May deter privacy-conscious users
### Mitigation Strategies
**For Public Email Concern**:
- Document clearly that email will be public
- Suggest using dedicated email for IndieAuth
- Recommend spam filtering
- Note: Email is user's choice (they publish it)
**For HTML Parsing**:
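```python
from bs4 import BeautifulSoup

# BeautifulSoup handles malformed HTML gracefully
soup = BeautifulSoup(html_content, 'html.parser')

# rel is a multi-valued attribute; compare each token case-insensitively
# (find_all(..., rel='me') alone matches the value case-sensitively)
def has_rel_me(tag):
    return 'me' in (value.lower() for value in tag.get('rel') or [])

candidates = soup.find_all('link') + soup.find_all('a')
me_links = [tag for tag in candidates if has_rel_me(tag)]

# Multiple link formats supported
# <link rel="me" href="mailto:user@example.com">
# <a rel="me" href="mailto:user@example.com">Email</a>
```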
**For Website Dependency**:
- Clear error messages with retry option
- Suggest checking site availability
- Timeout limits (10 seconds)
- Log errors for debugging
**For Discovery Failures**:
```markdown
Error: No rel="me" email link found
Please add this to your homepage:
<link rel="me" href="mailto:your-email@example.com">
See: https://indieweb.org/rel-me for more information.
```
## Implementation
### Email Discovery Service
```python
import logging
import re
from typing import Optional

import requests
from bs4 import BeautifulSoup

logger = logging.getLogger(__name__)


class RelMeEmailDiscovery:
    """
    Discover email addresses from rel="me" links on user's homepage.
    """

    def discover_email(self, domain: str) -> Optional[str]:
        """
        Fetch domain homepage and discover email from rel="me" link.

        Args:
            domain: User's domain (e.g., "example.com")

        Returns:
            Email address or None if not found
        """
        url = f"https://{domain}"

        try:
            # Fetch homepage with safety limits.
            # max_redirects is a Session attribute, not a requests.get() argument.
            with requests.Session() as session:
                session.max_redirects = 5
                response = session.get(
                    url,
                    timeout=10,
                    allow_redirects=True,
                    verify=True  # Verify SSL certificate
                )
                response.raise_for_status()

            # Parse HTML (handles malformed HTML)
            soup = BeautifulSoup(response.content, 'html.parser')

            # Find all rel="me" links
            # Both <link> and <a> tags supported
            me_links = soup.find_all('link', rel='me') + soup.find_all('a', rel='me')

            # Look for mailto: links
            for link in me_links:
                href = link.get('href', '')
                if href.startswith('mailto:'):
                    email = href.replace('mailto:', '').strip()

                    # Validate email format
                    if self._validate_email_format(email):
                        logger.info(f"Discovered email via rel='me' for {domain}")
                        return email

            logger.warning(f"No rel='me' mailto: link found on {domain}")
            return None

        except requests.exceptions.SSLError as e:
            logger.error(f"SSL verification failed for {domain}: {e}")
            return None

        except requests.exceptions.Timeout:
            logger.error(f"Timeout fetching {domain}")
            return None

        except requests.exceptions.HTTPError as e:
            logger.error(f"HTTP error fetching {domain}: {e}")
            return None

        except Exception as e:
            logger.error(f"Failed to discover email for {domain}: {e}")
            return None

    def _validate_email_format(self, email: str) -> bool:
        """
        Validate email address format.

        Args:
            email: Email address to validate

        Returns:
            True if valid format, False otherwise
        """
        # Basic RFC 5322 format check
        email_regex = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
        if not re.match(email_regex, email):
            return False

        # Length check (RFC 5321)
        if len(email) > 254:
            return False

        # Must have exactly one @
        if email.count('@') != 1:
            return False

        return True
```
### Error Messages
```python
# DNS TXT found, but no rel="me" link
error_message = """
Domain verified via DNS, but no email found on your site.
Please add this to your homepage:
<link rel="me" href="mailto:your-email@example.com">
This allows us to discover your email address automatically.
Learn more: https://indieweb.org/rel-me
"""
# Site unreachable
error_message = """
Could not fetch your site at https://{domain}
Please check:
- Site is accessible via HTTPS
- SSL certificate is valid
- No firewall blocking requests
Try again once your site is accessible.
"""
# Invalid email format in rel="me"
error_message = """
Found rel="me" link, but email format is invalid: {email}
Please check your rel="me" link uses valid email format:
<link rel="me" href="mailto:valid-email@example.com">
"""
```
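The messages above use `{domain}` and `{email}` placeholders in plain string literals, so they need to be filled in before display. A minimal sketch of one way to do that with `str.format` follows. The constant names and `render_error` are hypothetical, and the message texts are abridged from the ones above.

```python
# Abridged templates; placeholder names match the messages in this ADR
SITE_UNREACHABLE = (
    "Could not fetch your site at https://{domain}\n"
    "Try again once your site is accessible."
)
INVALID_EMAIL = 'Found rel="me" link, but email format is invalid: {email}'


def render_error(template: str, **values: str) -> str:
    """Fill the named placeholders of an error template (hypothetical helper)."""
    return template.format(**values)


print(render_error(SITE_UNREACHABLE, domain="example.com"))
```

Keeping the templates as constants and formatting at the call site avoids accidentally showing a literal `{domain}` to the user.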
## Alternatives Considered
### Alternative 1: User-Provided Email Input
**Pros**:
- Simpler implementation (no HTTP fetch, no parsing)
- Works even if site is down
- User can use private email (not public)
- Immediate (no HTTP round-trip)
**Cons**:
- Social engineering risk (attacker tricks user into entering wrong email)
- Typo risk (user enters incorrect email)
- No self-attestation (email not on user's site)
- Not aligned with IndieWeb standards
**Rejected**: Security risks outweigh simplicity benefits. rel="me" provides self-attestation and prevents social engineering.
---
### Alternative 2: DNS TXT Record for Email
**Pros**:
- Stronger proof of domain control (DNS)
- No website dependency
- Machine-readable format
- Fast lookups (DNS cache)
**Cons**:
- Requires DNS configuration (more complex than HTML)
- DNS propagation delays (can be hours)
- Not user-friendly for non-technical users
- Not standard IndieWeb practice
**Rejected**: DNS configuration is more complex than adding HTML tag. rel="me" is more aligned with IndieWeb standards.
---
### Alternative 3: WebFinger Protocol
**Pros**:
- Standard protocol (RFC 7033)
- Machine-readable format (JSON)
- Supports multiple identities
- Well-defined spec
**Cons**:
- Requires server-side endpoint (not for static sites)
- More complex implementation
- Not common in IndieWeb ecosystem
- Overkill for email discovery
**Rejected**: Too complex for v1.0.0 MVP. Doesn't work with static sites. rel="me" is simpler and more aligned with IndieWeb.
---
### Alternative 4: Well-Known URI
**Pros**:
- Standard approach (`/.well-known/email`)
- Simple file-based implementation
- No HTML parsing required
- Fast lookups
**Cons**:
- Not an established standard for email
- Requires server configuration
- Not aligned with IndieWeb practices
- Duplicate effort (rel="me" already exists)
**Rejected**: Not standard practice. rel="me" is already established in IndieWeb ecosystem.
## References
- IndieWeb rel="me": https://indieweb.org/rel-me
- Example Implementation: https://thesatelliteoflove.com (Phil Skents' identity page)
- HTML Link Relations (W3C): https://www.w3.org/TR/html5/links.html#linkTypes
- BeautifulSoup Documentation: https://www.crummy.com/software/BeautifulSoup/
- RFC 5322 (Email Format): https://datatracker.ietf.org/doc/html/rfc5322
- RFC 5321 (SMTP): https://datatracker.ietf.org/doc/html/rfc5321
- WebFinger (RFC 7033): https://datatracker.ietf.org/doc/html/rfc7033 (alternative considered)
## Decision History
- 2025-11-20: Proposed (Architect)
- 2025-11-20: Accepted (Architect)
- Related to ADR-005 (Two-Factor Domain Verification)

File diff suppressed because it is too large


@@ -0,0 +1,739 @@
# Phase 2 Implementation Guide - Specific Details
**Date**: 2025-11-20
**Architect**: Claude (Architect Agent)
**Status**: Supplementary to Phase 2 Design
**Purpose**: Provide specific implementation details for Developer clarification questions
This document supplements `/docs/designs/phase-2-domain-verification.md` with specific implementation decisions from ADR-0004.
## 1. Rate Limiting Implementation
### Approach
Implement actual in-memory rate limiting with timestamp tracking.
### Implementation Specifications
**Service Structure**:
```python
# src/gondulf/rate_limiter.py
from typing import Dict, List
import time


class RateLimiter:
    """In-memory rate limiter for domain verification attempts."""

    def __init__(self, max_attempts: int = 3, window_hours: int = 1):
        """
        Args:
            max_attempts: Maximum attempts per domain in time window (default: 3)
            window_hours: Time window in hours (default: 1)
        """
        self.max_attempts = max_attempts
        self.window_seconds = window_hours * 3600
        self._attempts: Dict[str, List[int]] = {}  # domain -> [timestamp1, timestamp2, ...]

    def check_rate_limit(self, domain: str) -> bool:
        """
        Check if domain has exceeded rate limit.

        Args:
            domain: Domain to check

        Returns:
            True if within rate limit, False if exceeded
        """
        # Clean old timestamps first
        self._clean_old_attempts(domain)

        # Check current count
        if domain not in self._attempts:
            return True

        return len(self._attempts[domain]) < self.max_attempts

    def record_attempt(self, domain: str) -> None:
        """Record a verification attempt for domain."""
        now = int(time.time())
        if domain not in self._attempts:
            self._attempts[domain] = []
        self._attempts[domain].append(now)

    def _clean_old_attempts(self, domain: str) -> None:
        """Remove timestamps older than window."""
        if domain not in self._attempts:
            return

        now = int(time.time())
        cutoff = now - self.window_seconds
        self._attempts[domain] = [ts for ts in self._attempts[domain] if ts > cutoff]

        # Remove domain entirely if no recent attempts
        if not self._attempts[domain]:
            del self._attempts[domain]
```
**Usage in Endpoints**:
```python
# In verification endpoint
rate_limiter = get_rate_limiter()

if not rate_limiter.check_rate_limit(domain):
    return {"success": False, "error": "rate_limit_exceeded"}

rate_limiter.record_attempt(domain)
# ... proceed with verification
```
**Consequences**:
- State lost on restart (acceptable trade-off for simplicity)
- No persistence needed
- Simple dictionary-based implementation
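
The sliding-window behavior described above can be checked without waiting an hour by injecting a clock. The following is a minimal sketch, not the guide's `RateLimiter`: the `SlidingWindowLimiter` name, the `clock` parameter, and the combined `allow()` method are assumptions introduced only for illustration.

```python
import time
from typing import Callable, Dict, List


class SlidingWindowLimiter:
    """Illustrative variant of the limiter with a swappable clock (hypothetical)."""

    def __init__(self, max_attempts: int = 3, window_seconds: int = 3600,
                 clock: Callable[[], float] = time.time):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._clock = clock  # assumption: injected so tests can control time
        self._attempts: Dict[str, List[float]] = {}

    def allow(self, key: str) -> bool:
        """Check the limit and, if allowed, record the attempt in one step."""
        now = self._clock()
        cutoff = now - self.window_seconds
        # Keep only timestamps that are still inside the window
        recent = [ts for ts in self._attempts.get(key, []) if ts > cutoff]
        if len(recent) >= self.max_attempts:
            self._attempts[key] = recent
            return False
        recent.append(now)
        self._attempts[key] = recent
        return True


# Drive the limiter with a fake clock: three attempts pass, the fourth fails,
# and a new attempt passes again once the window has elapsed.
fake_now = [1_700_000_000.0]
limiter = SlidingWindowLimiter(clock=lambda: fake_now[0])
results = [limiter.allow("example.com") for _ in range(4)]
fake_now[0] += 3601
after_window = limiter.allow("example.com")
```

Combining the check and the record into one call also removes the possibility of a caller checking the limit and forgetting to record the attempt.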
## 2. Authorization Code Metadata Structure
### Approach
Use Phase 1's `CodeStorage` service with complete metadata structure from the start.
### Data Structure Specification
**Authorization Code Metadata**:
```python
{
    "client_id": "https://client.example.com/",
    "redirect_uri": "https://client.example.com/callback",
    "state": "client_state_value",
    "code_challenge": "base64url_encoded_challenge",
    "code_challenge_method": "S256",
    "scope": "profile email",
    "me": "https://user.example.com/",
    "created_at": 1700000000,  # epoch integer
    "expires_at": 1700000600,  # epoch integer (created_at + 600)
    "used": False              # Include now, consume in Phase 3
}
```
**Storage Implementation**:
```python
import time

# Use Phase 1's CodeStorage
code_storage = get_code_storage()

authorization_code = generate_random_code()
now = int(time.time())  # read the clock once so expires_at is exactly created_at + 600
metadata = {
    "client_id": client_id,
    "redirect_uri": redirect_uri,
    "state": state,
    "code_challenge": code_challenge,
    "code_challenge_method": code_challenge_method,
    "scope": scope,
    "me": me,
    "created_at": now,
    "expires_at": now + 600,
    "used": False
}

code_storage.store(f"authz:{authorization_code}", metadata, ttl=600)
```
**Rationale**:
- Epoch integers simpler than datetime objects
- Include `used` field now (Phase 3 will check/update it)
- Reuse existing `CodeStorage` infrastructure
- Key prefix `authz:` distinguishes from verification codes
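
To make the role of `code_challenge`, `expires_at`, and `used` concrete, here is a minimal sketch of how a later phase might evaluate stored metadata. The S256 transform is the standard PKCE one (base64url of the SHA-256 digest, padding stripped); the function names `s256_challenge` and `is_redeemable` are hypothetical and not part of the Phase 2 design.

```python
import base64
import hashlib
import hmac


def s256_challenge(code_verifier: str) -> str:
    """PKCE S256: base64url(SHA-256(verifier)) without '=' padding."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def is_redeemable(metadata: dict, code_verifier: str, now: int) -> bool:
    """Hypothetical check: code unused, not expired, and verifier matches."""
    if metadata["used"] or now >= metadata["expires_at"]:
        return False
    expected = metadata["code_challenge"]
    # Constant-time comparison of the recomputed and stored challenges
    return hmac.compare_digest(s256_challenge(code_verifier), expected)


# Example metadata shaped like the structure above (values are illustrative)
verifier = "example-verifier-string-with-enough-entropy-0123456789"
metadata = {
    "code_challenge": s256_challenge(verifier),
    "code_challenge_method": "S256",
    "created_at": 1700000000,
    "expires_at": 1700000600,
    "used": False,
}
```

Because both timestamps are plain epoch integers, the expiry test is a single integer comparison with no timezone handling.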
## 3. HTML Template Implementation
### Approach
Use Jinja2 templates with separate template files.
### Directory Structure
```
src/gondulf/templates/
├── base.html # Shared layout
├── verify_email.html # Email verification form
├── verify_totp.html # TOTP verification form (future)
├── authorize.html # Authorization consent page
└── error.html # Generic error page
```
### Base Template
```html
<!-- src/gondulf/templates/base.html -->
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>{% block title %}Gondulf IndieAuth{% endblock %}</title>
<style>
body {
font-family: system-ui, -apple-system, sans-serif;
max-width: 600px;
margin: 50px auto;
padding: 20px;
line-height: 1.6;
}
.error { color: #d32f2f; }
.success { color: #388e3c; }
form { margin-top: 20px; }
input, button { font-size: 16px; padding: 8px; }
button { background: #1976d2; color: white; border: none; cursor: pointer; }
button:hover { background: #1565c0; }
</style>
</head>
<body>
{% block content %}{% endblock %}
</body>
</html>
```
### Email Verification Template
```html
<!-- src/gondulf/templates/verify_email.html -->
{% extends "base.html" %}
{% block title %}Verify Email - Gondulf{% endblock %}
{% block content %}
<h1>Verify Your Email</h1>
<p>A verification code has been sent to <strong>{{ masked_email }}</strong></p>
<p>Please enter the 6-digit code to complete verification:</p>
{% if error %}
<p class="error">{{ error }}</p>
{% endif %}
<form method="POST" action="/verify/email">
<input type="hidden" name="domain" value="{{ domain }}">
<input type="text" name="code" placeholder="000000" maxlength="6" required autofocus>
<button type="submit">Verify</button>
</form>
{% endblock %}
```
### FastAPI Integration
```python
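from fastapi import FastAPI, Request
from fastapi.templating import Jinja2Templates

from gondulf.utils.validation import mask_email

templates = Jinja2Templates(directory="src/gondulf/templates")


# `app` is the application's FastAPI instance.
@app.get("/verify/email")
async def verify_email_page(request: Request, domain: str):
    # NOTE: `discovered_email` must be resolved for `domain` before this point
    # (the rel="me" discovery result); that lookup is omitted from this snippet.
    masked = mask_email(discovered_email)
    return templates.TemplateResponse("verify_email.html", {
        "request": request,
        "domain": domain,
        "masked_email": masked
    })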
```
**Dependencies**:
- Add to `pyproject.toml`: `jinja2 = "^3.1.0"`
## 4. Database Migration Timing
### Approach
Apply migration 002 immediately as part of Phase 2 setup.
### Execution Order
1. Developer runs migration: `alembic upgrade head`
2. Migration 002 adds `two_factor` column with default value `false`
3. All Phase 2 code assumes column exists
4. New domains inserted with explicit `two_factor` value
### Migration File (if not already created)
```python
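# migrations/versions/002_add_two_factor_column.py
"""Add two_factor column to domains table

Revision ID: 002
Revises: 001
Create Date: 2024-11-20
"""
from alembic import op
import sqlalchemy as sa

# Revision identifiers used by Alembic (must match the docstring above)
revision = '002'
down_revision = '001'


def upgrade():
    # server_default backfills existing rows so NOT NULL can be applied safely
    op.add_column(
        'domains',
        sa.Column('two_factor', sa.Boolean(), nullable=False, server_default=sa.false())
    )


def downgrade():
    op.drop_column('domains', 'two_factor')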
```
**Rationale**:
- Keep database schema current with code expectations
- No conditional logic needed in Phase 2 code
- Clean separation: migration handles existing data, new code uses new schema
## 5. Client Validation Helper Functions
### Approach
Standalone utility functions in shared module.
### Module Structure
```python
# src/gondulf/utils/validation.py
"""Client validation and utility functions."""
from urllib.parse import urlparse


def mask_email(email: str) -> str:
    """
    Mask email for display: user@example.com -> u***@example.com

    Args:
        email: Email address to mask

    Returns:
        Masked email string
    """
    if '@' not in email:
        return email

    local, domain = email.split('@', 1)
    if len(local) <= 1:
        return email

    masked_local = local[0] + '***'
    return f"{masked_local}@{domain}"


def normalize_client_id(client_id: str) -> str:
    """
    Normalize client_id URL to canonical form.

    Rules:
    - Require https:// scheme (raises ValueError otherwise)
    - Remove default port (443)
    - Preserve path

    Args:
        client_id: Client ID URL

    Returns:
        Normalized client_id
    """
    parsed = urlparse(client_id)

    # Require https
    if parsed.scheme != 'https':
        raise ValueError("client_id must use https scheme")

    # Remove default HTTPS port
    netloc = parsed.netloc
    if netloc.endswith(':443'):
        netloc = netloc[:-4]

    # Reconstruct
    normalized = f"https://{netloc}{parsed.path}"
    if parsed.query:
        normalized += f"?{parsed.query}"
    if parsed.fragment:
        normalized += f"#{parsed.fragment}"

    return normalized


def validate_redirect_uri(redirect_uri: str, client_id: str) -> bool:
    """
    Validate redirect_uri against client_id per IndieAuth spec.

    Rules:
    - Must use https scheme (except localhost)
    - Must share same origin as client_id OR
    - Must be subdomain of client_id domain

    Args:
        redirect_uri: Redirect URI to validate
        client_id: Client ID for comparison

    Returns:
        True if valid, False otherwise
    """
    try:
        redirect_parsed = urlparse(redirect_uri)
        client_parsed = urlparse(client_id)

        # Check scheme (allow http for localhost only)
        if redirect_parsed.scheme != 'https':
            if redirect_parsed.hostname not in ('localhost', '127.0.0.1'):
                return False

        # Same origin check
        if (redirect_parsed.scheme == client_parsed.scheme and
                redirect_parsed.netloc == client_parsed.netloc):
            return True

        # Subdomain check
        redirect_host = redirect_parsed.hostname or ''
        client_host = client_parsed.hostname or ''

        # Must end with .{client_host}; an empty client_host would reduce the
        # suffix to "." and match unrelated hosts, so reject it explicitly
        if client_host and redirect_host.endswith(f".{client_host}"):
            return True

        return False
    except Exception:
        return False
```
**Usage**:
```python
from gondulf.utils.validation import mask_email, validate_redirect_uri, normalize_client_id

# In verification endpoint
masked = mask_email(discovered_email)

# In authorization endpoint
normalized_client = normalize_client_id(client_id)
if not validate_redirect_uri(redirect_uri, normalized_client):
    return error_response("invalid_redirect_uri")
```
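
The subdomain rule in `validate_redirect_uri` depends on the leading dot in the suffix comparison. A small sketch of that check in isolation shows why; the `is_subdomain` helper is hypothetical and only illustrates the comparison, it is not part of the guide's module.

```python
def is_subdomain(host: str, parent: str) -> bool:
    """Return True only if host is a strict subdomain of parent (illustrative)."""
    host = host.lower().rstrip(".")
    parent = parent.lower().rstrip(".")
    if not host or not parent:
        return False
    # The "." prefix prevents "evilclient.example.com" from matching
    # "client.example.com" on a bare string suffix.
    return host.endswith("." + parent)


legit = is_subdomain("app.client.example.com", "client.example.com")
lookalike = is_subdomain("evilclient.example.com", "client.example.com")
same_host = is_subdomain("client.example.com", "client.example.com")
empty_parent = is_subdomain("evil.com", "")
```

The identical-host case returns `False` here because, in the guide's function, it is already covered by the same-origin check that runs first.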
## 6. Error Response Format Consistency
### Approach
Use format appropriate to endpoint type.
### Format Rules by Endpoint Type
**Verification Endpoints** (`/verify/email`, `/verify/totp`):
```python
# Always return 200 OK with JSON
return JSONResponse(
status_code=200,
content={"success": False, "error": "invalid_code"}
)
```
**Authorization Endpoint - Pre-Client Validation**:
```python
# Return HTML error page if client_id not yet validated
return templates.TemplateResponse("error.html", {
"request": request,
"error": "Missing required parameter: client_id",
"error_code": "invalid_request"
}, status_code=400)
```
**Authorization Endpoint - Post-Client Validation**:
```python
# Return OAuth redirect with error parameter
from urllib.parse import urlencode
error_params = {
"error": "invalid_request",
"error_description": "Missing code_challenge parameter",
"state": request.query_params.get("state", "")
}
redirect_url = f"{redirect_uri}?{urlencode(error_params)}"
return RedirectResponse(url=redirect_url, status_code=302)
```
**Token Endpoint** (Phase 3):
```python
# Always return JSON with appropriate status code
return JSONResponse(
status_code=400,
content={
"error": "invalid_grant",
"error_description": "Authorization code has expired"
}
)
```
### Error Flow Decision Tree
```
Is this a verification endpoint?
  YES -> Return JSON (200 OK) with success:false
  NO  -> Continue
    Has client_id been validated yet?
      NO  -> Return HTML error page
      YES -> Continue
        Is redirect_uri valid?
          NO  -> Return HTML error page (can't redirect safely)
          YES -> Return OAuth redirect with error
```
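
The decision tree above can be expressed as a single pure function, which makes the rules easy to unit test independently of FastAPI. This is a sketch only; the `ErrorFormat` enum and `choose_error_format` function are hypothetical names, not part of the design.

```python
from enum import Enum


class ErrorFormat(Enum):
    JSON_200 = "json_200"              # verification endpoints
    HTML_PAGE = "html_page"            # cannot redirect safely
    OAUTH_REDIRECT = "oauth_redirect"  # redirect back to client with error


def choose_error_format(is_verification_endpoint: bool,
                        client_id_validated: bool,
                        redirect_uri_valid: bool) -> ErrorFormat:
    """Mirror the error flow decision tree, top to bottom."""
    if is_verification_endpoint:
        return ErrorFormat.JSON_200
    if not client_id_validated:
        return ErrorFormat.HTML_PAGE
    if not redirect_uri_valid:
        return ErrorFormat.HTML_PAGE
    return ErrorFormat.OAUTH_REDIRECT


verification = choose_error_format(True, False, False)
unvalidated_client = choose_error_format(False, False, True)
bad_redirect = choose_error_format(False, True, False)
safe_redirect = choose_error_format(False, True, True)
```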
## 7. Dependency Injection Pattern
### Approach
Singleton services instantiated at startup in `dependencies.py`.
### Implementation Structure
**Dependencies Module**:
```python
# src/gondulf/dependencies.py
"""FastAPI dependency injection for services."""
from functools import lru_cache

from gondulf.config import get_config
from gondulf.database import DatabaseService
from gondulf.code_storage import CodeStorage
from gondulf.email_service import EmailService
from gondulf.dns_service import DNSService
from gondulf.html_fetcher import HTMLFetcherService
from gondulf.relme_parser import RelMeParser
from gondulf.verification_service import DomainVerificationService
from gondulf.rate_limiter import RateLimiter


# Configuration
@lru_cache()
def get_config_singleton():
    """Get singleton configuration instance."""
    return get_config()


# Phase 1 Services
@lru_cache()
def get_database():
    """Get singleton database service."""
    config = get_config_singleton()
    return DatabaseService(config.database_url)


@lru_cache()
def get_code_storage():
    """Get singleton code storage service."""
    return CodeStorage()


@lru_cache()
def get_email_service():
    """Get singleton email service."""
    config = get_config_singleton()
    return EmailService(
        smtp_host=config.smtp_host,
        smtp_port=config.smtp_port,
        smtp_username=config.smtp_username,
        smtp_password=config.smtp_password,
        from_address=config.smtp_from_address
    )


@lru_cache()
def get_dns_service():
    """Get singleton DNS service."""
    config = get_config_singleton()
    return DNSService(nameservers=config.dns_nameservers)


# Phase 2 Services
@lru_cache()
def get_html_fetcher():
    """Get singleton HTML fetcher service."""
    return HTMLFetcherService()


@lru_cache()
def get_relme_parser():
    """Get singleton rel=me parser service."""
    return RelMeParser()


@lru_cache()
def get_rate_limiter():
    """Get singleton rate limiter service."""
    return RateLimiter(max_attempts=3, window_hours=1)


@lru_cache()
def get_verification_service():
    """Get singleton domain verification service."""
    return DomainVerificationService(
        dns_service=get_dns_service(),
        email_service=get_email_service(),
        code_storage=get_code_storage(),
        html_fetcher=get_html_fetcher(),
        relme_parser=get_relme_parser()
    )
```
**Usage in Endpoints**:
```python
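from fastapi import Depends, Form

from gondulf.dependencies import get_verification_service, get_rate_limiter
from gondulf.rate_limiter import RateLimiter
from gondulf.verification_service import DomainVerificationService


@app.post("/verify/email")
async def verify_email(
    # Form(...) because verify_email.html submits these as form fields;
    # FastAPI needs the python-multipart package to parse form bodies.
    domain: str = Form(...),
    code: str = Form(...),
    verification_service: DomainVerificationService = Depends(get_verification_service),
    rate_limiter: RateLimiter = Depends(get_rate_limiter)
):
    # Use injected services
    if not rate_limiter.check_rate_limit(domain):
        return {"success": False, "error": "rate_limit_exceeded"}
    # Record the attempt; without this the limit is never reached
    rate_limiter.record_attempt(domain)

    result = verification_service.verify_email_code(domain, code)
    return {"success": result}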
```
**Rationale**:
- `@lru_cache()` ensures single instance per function
- Services configured once at startup
- Consistent with Phase 1 pattern
- Simple to test (can override dependencies in tests)
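
The singleton behavior of `@lru_cache()` on a zero-argument function can be seen in isolation with the standard library alone. In this sketch `FakeService` and `get_service` are placeholder names; the point is that repeated calls return the same object, and that `cache_clear()` (a standard method on `lru_cache`-wrapped functions) forces a fresh instance, which can help keep tests isolated.

```python
from functools import lru_cache


class FakeService:
    """Stand-in for any service built by a dependency function (placeholder)."""


@lru_cache()
def get_service() -> FakeService:
    # With no arguments, lru_cache stores exactly one result
    return FakeService()


first = get_service()
second = get_service()   # served from the cache: same object as `first`

get_service.cache_clear()
third = get_service()    # cache was cleared, so a new instance is built
```

Because the cached instance lives for the life of the process, any state it holds (such as the rate limiter's attempt dictionary) is shared across requests and across tests unless the dependency is overridden or the cache is cleared.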
## 8. Test Organization for Authorization Endpoint
### Approach
Separate test files per major endpoint with shared fixtures.
### File Structure
```
tests/
├── conftest.py # Shared fixtures and configuration
├── test_verification_endpoints.py # Email/TOTP verification tests
└── test_authorization_endpoint.py # Authorization flow tests
```
### Shared Fixtures Module
```python
# tests/conftest.py
import pytest
from fastapi.testclient import TestClient

from gondulf.main import app
from gondulf.dependencies import get_database, get_code_storage, get_rate_limiter


@pytest.fixture
def client():
    """FastAPI test client."""
    return TestClient(app)


@pytest.fixture
def mock_database():
    """Mock database service for testing."""
    # Create in-memory test database
    from gondulf.database import DatabaseService
    db = DatabaseService("sqlite:///:memory:")
    db.initialize()
    return db


@pytest.fixture
def mock_code_storage():
    """Mock code storage for testing."""
    from gondulf.code_storage import CodeStorage
    return CodeStorage()


@pytest.fixture
def mock_rate_limiter():
    """Mock rate limiter with clean state."""
    from gondulf.rate_limiter import RateLimiter
    return RateLimiter()


@pytest.fixture
def verified_domain(mock_database):
    """Fixture providing a pre-verified domain."""
    domain = "example.com"
    mock_database.store_verified_domain(
        domain=domain,
        email="user@example.com",
        two_factor=True
    )
    return domain


@pytest.fixture
def override_dependencies(mock_database, mock_code_storage, mock_rate_limiter):
    """Override FastAPI dependencies with test mocks."""
    app.dependency_overrides[get_database] = lambda: mock_database
    app.dependency_overrides[get_code_storage] = lambda: mock_code_storage
    app.dependency_overrides[get_rate_limiter] = lambda: mock_rate_limiter
    yield
    app.dependency_overrides.clear()
```
### Verification Endpoints Tests
```python
# tests/test_verification_endpoints.py
import pytest


class TestEmailVerification:
    """Tests for /verify/email endpoint."""

    def test_email_verification_success(self, client, override_dependencies):
        """Test successful email verification."""
        # Test implementation
        pass

    def test_email_verification_invalid_code(self, client, override_dependencies):
        """Test email verification with invalid code."""
        pass

    def test_email_verification_rate_limit(self, client, override_dependencies):
        """Test rate limiting on email verification."""
        pass


class TestTOTPVerification:
    """Tests for /verify/totp endpoint (future)."""
    pass
```
### Authorization Endpoint Tests
```python
# tests/test_authorization_endpoint.py
import pytest
from urllib.parse import parse_qs, urlparse


class TestAuthorizationEndpoint:
    """Tests for /authorize endpoint."""

    def test_authorize_missing_client_id(self, client, override_dependencies):
        """Test authorization with missing client_id parameter."""
        response = client.get("/authorize")
        assert response.status_code == 400
        assert "client_id" in response.text

    def test_authorize_invalid_redirect_uri(self, client, override_dependencies):
        """Test authorization with mismatched redirect_uri."""
        params = {
            "client_id": "https://client.example.com/",
            "redirect_uri": "https://evil.com/callback",
            "response_type": "code",
            "state": "test_state"
        }
        response = client.get("/authorize", params=params)
        assert response.status_code == 400

    def test_authorize_success_flow(self, client, override_dependencies, verified_domain):
        """Test complete successful authorization flow."""
        # Full flow test with verified domain
        params = {
            "client_id": "https://client.example.com/",
            "redirect_uri": "https://client.example.com/callback",
            "response_type": "code",
            "state": "test_state",
            "code_challenge": "test_challenge",
            "code_challenge_method": "S256",
            "me": f"https://{verified_domain}/"
        }
        # The httpx-based TestClient uses follow_redirects (allow_redirects is the legacy name)
        response = client.get("/authorize", params=params, follow_redirects=False)
        assert response.status_code == 302

        # Verify redirect contains authorization code
        redirect_url = response.headers["location"]
        parsed = urlparse(redirect_url)
        query_params = parse_qs(parsed.query)
        assert "code" in query_params
        assert query_params["state"][0] == "test_state"
```
### Test Organization Rules
1. **One test class per major functionality** (email verification, authorization flow)
2. **Test complete flows, not internal methods** (black box testing)
3. **Use shared fixtures** for common setup (verified domains, mock services)
4. **Test both success and error paths**
5. **Test security boundaries** (rate limiting, invalid inputs, unauthorized access)
## Summary
These implementation decisions provide the Developer with unambiguous direction for Phase 2 implementation. All decisions prioritize simplicity while maintaining security and specification compliance.
**Key Principles Applied**:
- Real implementations over stubs (rate limiting, validation)
- Reuse existing infrastructure (CodeStorage, dependency pattern)
- Standard tools over custom solutions (Jinja2 templates)
- Simple data structures (epoch integers, dictionaries)
- Clear separation of concerns (utility functions, test organization)
**Next Steps for Developer**:
1. Review this guide alongside Phase 2 design document
2. Implement in the order specified by Phase 2 design
3. Follow patterns and structures defined here
4. Ask clarification questions if any ambiguity remains before implementation
All architectural decisions are now documented and ready for implementation.


@@ -568,9 +568,86 @@ These features are REQUIRED for the first production-ready release.
Technical debt items are tracked here with a DEBT: prefix. Per project standards, each release must allocate at least 10% of effort to technical debt reduction.
### DEBT: TD-001 - FastAPI Lifespan Migration (XS)
**Created**: 2025-11-20 (Phase 1 review)
**Priority**: P2
**Component**: Core Infrastructure
**Issue**: Using deprecated `@app.on_event()` decorators instead of lifespan context manager.
**Impact**:
- Deprecation warnings in FastAPI 0.109+
- Will break in future FastAPI version
- Not following current best practices
**Current Mitigation**: Still works in current FastAPI version.
**Effort to Fix**: < 1 day
- Replace `@app.on_event("startup")` with lifespan context manager
- Update database initialization to use lifespan
- Update tests if needed
**Plan**: Address in v1.1.0 or during FastAPI upgrade.
**References**: FastAPI lifespan documentation
---
### DEBT: TD-002 - Database Migration Rollback Safety (S)
**Created**: 2025-11-20 (Phase 1 review)
**Priority**: P2
**Component**: Database Layer
**Issue**: No migration rollback capability. Migrations are one-way only.
**Impact**:
- Cannot easily roll back schema changes
- Requires manual SQL to undo migrations
- Risk during production deployments
**Current Mitigation**: Simple schema, manual SQL backups acceptable for v1.0.0.
**Effort to Fix**: 1-2 days
- Integrate Alembic for migration management
- Create rollback scripts for existing migrations
- Update deployment documentation
**Plan**: Address before v1.1.0 when schema changes become more frequent.
**References**: Alembic documentation
---
### DEBT: TD-003 - Async Email Support (S)
**Created**: 2025-11-20 (Phase 1 review)
**Priority**: P2
**Component**: Email Service
**Issue**: Synchronous SMTP blocks request thread during email sending.
**Impact**:
- Email sending delays response to user (1-5 seconds)
- Thread blocked during SMTP operation
- Poor UX during slow email delivery
**Current Mitigation**: Acceptable for low-volume v1.0.0. Timeout limits (10s) prevent long blocks.
**Effort to Fix**: 1-2 days
- Implement background task queue (FastAPI BackgroundTasks or Celery)
- Make email sending non-blocking
- Update UX to show "Sending email..." message
- Add retry logic for failed sends
**Plan**: Address in v1.1.0 when user volume increases or when UX feedback indicates issue.
**Alternative**: Use async SMTP library (aiosmtplib)
---
### DEBT: TD-004 - Add Redis for Session Storage (M)
**Created**: 2025-11-20 (architectural decision)
**Priority**: P2
**Component**: Storage Layer
**Issue**: In-memory storage doesn't survive restarts.
@@ -584,22 +661,6 @@ Technical debt items are tracked here with a DEBT: prefix. Per project standards
---
### DEBT: Implement schema migrations (S)
**Created**: 2025-11-20 (architectural decision)
**Priority**: P2
**Issue**: No formal migration system, using raw SQL files.
**Impact**: Schema changes require manual intervention.
**Mitigation (current)**: Simple schema, infrequent changes acceptable for v1.0.0.
**Effort to Fix**: 1-2 days (Alembic integration)
**Plan**: Address before v1.1.0 when schema changes become more frequent.
---
## Backlog Management
### Adding New Features