From 6f06aebf40b5513f6ab7cc94c0d26e936875e68e Mon Sep 17 00:00:00 2001 From: Phil Skentelbery Date: Thu, 20 Nov 2025 13:05:09 -0700 Subject: [PATCH] docs: add Phase 2 domain verification design and clarifications Add comprehensive Phase 2 documentation: - Complete design document for two-factor domain verification - Implementation guide with code examples - ADR for implementation decisions (ADR-0004) - ADR for rel="me" email discovery (ADR-008) - Phase 1 impact assessment - All 23 clarification questions answered - Updated architecture docs (indieauth-protocol, security) - Updated ADR-005 with rel="me" approach - Updated backlog with technical debt items Design ready for Phase 2 implementation. Generated with Claude Code https://claude.com/claude-code Co-Authored-By: Claude --- docs/CLARIFICATIONS-ANSWERED.md | 135 + docs/architecture/indieauth-protocol.md | 58 +- .../architecture/phase-1-impact-assessment.md | 809 ++++++ docs/architecture/security.md | 271 +- .../0004-phase-2-implementation-decisions.md | 98 + ...R-005-email-based-authentication-v1-0-0.md | 735 +++-- .../ADR-008-rel-me-email-discovery.md | 516 ++++ docs/designs/phase-2-domain-verification.md | 2559 +++++++++++++++++ docs/designs/phase-2-implementation-guide.md | 739 +++++ docs/roadmap/backlog.md | 95 +- 10 files changed, 5605 insertions(+), 410 deletions(-) create mode 100644 docs/CLARIFICATIONS-ANSWERED.md create mode 100644 docs/architecture/phase-1-impact-assessment.md create mode 100644 docs/decisions/0004-phase-2-implementation-decisions.md create mode 100644 docs/decisions/ADR-008-rel-me-email-discovery.md create mode 100644 docs/designs/phase-2-domain-verification.md create mode 100644 docs/designs/phase-2-implementation-guide.md diff --git a/docs/CLARIFICATIONS-ANSWERED.md b/docs/CLARIFICATIONS-ANSWERED.md new file mode 100644 index 0000000..c584454 --- /dev/null +++ b/docs/CLARIFICATIONS-ANSWERED.md @@ -0,0 +1,135 @@ +# Phase 2 Clarifications - Architect's Responses + +**Date**: 
2025-11-20 +**Status**: All 23 questions answered +**Developer Action**: Proceed with implementation + +## Overview + +The Architect has provided complete answers to all 8 categories (23 specific questions) raised by the Developer. This document provides a quick reference to the decisions made. + +**Full Details**: See `/docs/designs/phase-2-implementation-guide.md` for complete implementation specifications. + +**Architectural Decision Record**: See `/docs/decisions/0004-phase-2-implementation-decisions.md` for rationale and consequences. + +## Quick Reference Answers + +### 1. Rate Limiting Implementation + +**Q: Should actual rate limiting be implemented or left as stubs?** +- A: Implement actual rate limiting with in-memory storage + +**Q: Should metadata storage use CodeStorage?** +- A: No, use simple dictionary in RateLimiter service instance + +**Q: Should "Max 3 codes per domain per hour" be implemented?** +- A: Yes, with timestamp list tracking and automatic cleanup + +### 2. Authorization Code Metadata Structure + +**Q: Should storage include 'used' field in Phase 2?** +- A: Yes, include now (set to False, consume in Phase 3) + +**Q: Use Phase 1's CodeStorage or separate storage?** +- A: Reuse Phase 1's CodeStorage with key prefix `authz:` + +**Q: Store datetime objects or epoch integers?** +- A: Epoch integers (simpler) + +### 3. HTML Template Implementation + +**Q: Use Jinja2 or plain Python f-strings?** +- A: Use Jinja2 templates + +**Q: Where should template files be located?** +- A: `src/gondulf/templates/` + +**Q: Reusable layout templates or self-contained?** +- A: Reusable `base.html` with template inheritance + +**Q: Template files vs inline HTML?** +- A: Separate template files + +### 4. 
Database Migration Timing + +**Q: Apply migration 002 as part of Phase 2?** +- A: Yes, apply immediately before Phase 2 implementation + +**Q: Is migration necessary since Phase 1 doesn't write to domains?** +- A: Yes, keeps schema current with code expectations + +**Q: Should new code use 'two_factor' immediately?** +- A: Yes, assume column exists (migration handles it) + +### 5. Client Validation Helper Functions + +**Q: Implement as standalone functions or methods on helper class?** +- A: Standalone functions in `src/gondulf/utils/validation.py` + +**Q: Create shared utility module?** +- A: Yes, `gondulf.utils.validation` module + +**Q: Full subdomain validation now or stub for Phase 3?** +- A: Full validation now (security should be complete) + +### 6. Error Response Format Consistency + +**Q: Should verification endpoints return JSON (200 OK with success:false)?** +- A: Yes, always JSON with 200 OK + +**Q: Should authorization endpoint errors return HTML or redirects?** +- A: Depends on validation stage: + - Pre-client validation: HTML error page + - Post-client validation: OAuth redirect with error + +**Q: When to use HTML vs OAuth redirect errors?** +- A: See decision tree in implementation guide + +### 7. Dependency Injection Pattern + +**Q: Create dependencies.py module?** +- A: Yes, `src/gondulf/dependencies.py` + +**Q: Services instantiated at startup (singleton) or per-request?** +- A: Singleton at startup using `@lru_cache()` + +**Q: Configuration passed at instantiation or read each time?** +- A: Read at instantiation (services configured once) + +### 8. 
Test Organization for Authorization Endpoint + +**Q: Separate test files per router?** +- A: Yes: + - `test_verification_endpoints.py` + - `test_authorization_endpoint.py` + +**Q: Test sub-endpoints separately or as part of full flow?** +- A: Test complete flows (black box testing) + +**Q: Shared fixtures for common scenarios?** +- A: Yes, use `tests/conftest.py` for shared fixtures + +## Implementation Priority + +All decisions are final and ready for implementation. The Developer should: + +1. **Read** `/docs/designs/phase-2-implementation-guide.md` thoroughly +2. **Review** code examples and patterns provided +3. **Apply** migration 002 before starting implementation +4. **Implement** following the exact patterns specified +5. **Ask** additional questions ONLY if new ambiguities arise + +## Architect's Guiding Principles + +Every decision made reflects these core values: +- **Simplicity**: Real implementations using simple patterns +- **Reuse**: Leverage Phase 1 infrastructure where possible +- **Standards**: Use established tools (Jinja2, FastAPI patterns) +- **Clarity**: Explicit structures over implicit behavior +- **Security**: Complete security features, not stubs + +## Status + +**DESIGN READY: Phase 2 Implementation - All clarifications resolved** + +Developer: Please proceed with implementation following the patterns in the implementation guide. diff --git a/docs/architecture/indieauth-protocol.md b/docs/architecture/indieauth-protocol.md index 5767ecc..d346c31 100644 --- a/docs/architecture/indieauth-protocol.md +++ b/docs/architecture/indieauth-protocol.md @@ -162,26 +162,34 @@ Accept: text/html - Reject non-200 responses - Log client_id fetch failures -#### Authentication Flow (v1.0.0: Email-based) +#### Authentication Flow (v1.0.0: Two-Factor Domain Verification) -1. 
**Domain Ownership Check** - - Check if `me` domain has verified TXT record: `_gondulf.example.com` = `verified` - - If found and cached, skip email verification - - If not found, proceed to email verification +1. **DNS TXT Record Verification (Required)** + - Check if `me` domain has TXT record: `_gondulf.{domain}` = `verified` + - Query multiple DNS resolvers (Google 8.8.8.8, Cloudflare 1.1.1.1) + - Require consensus from at least 2 resolvers + - If not found: Display error with instructions to add TXT record + - If found: Proceed to email discovery + - Proves: User controls DNS for the domain -2. **Email Verification** - - Display form requesting email address - - Validate email is at `me` domain (e.g., `admin@example.com` for `https://example.com`) +2. **Email Discovery via rel="me" (Required)** + - Fetch user's domain homepage (e.g., https://example.com) + - Parse HTML for `<link rel="me" href="mailto:...">` or `<a rel="me" href="mailto:...">` + - Extract email address from first matching mailto: link + - If not found: Display error with instructions to add rel="me" link + - If found: Proceed to email verification + - Proves: User has published email relationship on their site + - Reference: https://indieweb.org/rel-me + +3. **Email Verification Code (Required)** - Generate 6-digit verification code (cryptographically random) - Store code in memory with 15-minute TTL - - Send code via SMTP - - Display code entry form - -3. **Code Verification** + - Send code to discovered email address via SMTP + - Display code entry form showing discovered email (partially masked) - User enters 6-digit code - - Validate code matches and hasn't expired + - Validate code matches and hasn't expired (max 3 attempts) + - Proves: User controls the email account - Mark domain as verified (store in database) - - Proceed to authorization 4. 
**User Consent** - Display authorization prompt: @@ -208,6 +216,8 @@ Accept: text/html Location: {redirect_uri}?code={code}&state={state} ``` +**Security Model**: Two-factor verification requires BOTH DNS control AND email control. An attacker would need to compromise both to authenticate fraudulently. + #### Error Responses Return error via redirect when possible: @@ -404,18 +414,19 @@ Future implementation per RFC 7009. ```python { - "email": "admin@example.com", + "email": "admin@example.com", # Discovered from rel="me", not user-provided "code": "123456", # 6-digit string "domain": "example.com", "created_at": datetime, "expires_at": datetime, # created_at + 15 minutes - "attempts": 0 # Rate limiting + "attempts": 0 # Rate limiting (max 3 attempts) } ``` **Storage**: Python dict with TTL management +**Email Source**: Discovered from site's rel="me" link (not user input) **Expiration**: 15 minutes -**Rate Limiting**: Max 3 attempts per email +**Rate Limiting**: Max 3 attempts per email, max 3 codes per domain per hour **Cleanup**: Automatic expiration via TTL ### Access Token (SQLite) @@ -448,18 +459,21 @@ CREATE TABLE tokens ( CREATE TABLE domains ( id INTEGER PRIMARY KEY AUTOINCREMENT, domain TEXT NOT NULL UNIQUE, - verification_method TEXT NOT NULL, -- 'txt_record' or 'email' + verification_method TEXT NOT NULL, -- 'two_factor' (DNS + Email) verified_at TIMESTAMP NOT NULL, - last_checked TIMESTAMP, - txt_record_valid BOOLEAN DEFAULT 0, + last_dns_check TIMESTAMP, + dns_txt_valid BOOLEAN DEFAULT 0, + last_email_check TIMESTAMP, INDEX idx_domain (domain) ); ``` **Purpose**: Cache domain ownership verification -**TXT Record**: Re-verified periodically (daily) -**Email Verification**: Permanent unless admin deletes +**Verification Method**: Always 'two_factor' in v1.0.0 (DNS TXT + Email via rel="me") +**DNS TXT**: Re-verified periodically (daily check) +**Email**: NOT stored (only verification timestamp recorded) +**Re-verification**: DNS checked periodically, 
email re-verified on each login **Cleanup**: Optional (admin decision) ## Security Considerations diff --git a/docs/architecture/phase-1-impact-assessment.md b/docs/architecture/phase-1-impact-assessment.md new file mode 100644 index 0000000..64db09e --- /dev/null +++ b/docs/architecture/phase-1-impact-assessment.md @@ -0,0 +1,809 @@ +# Phase 1 Impact Assessment: Authentication Flow Change + +**Date**: 2025-11-20 +**Architect**: Claude (Architect Agent) +**Related ADRs**: ADR-005 (updated), ADR-008 (new) +**Related Report**: /docs/reports/2025-11-20-phase-1-foundation.md + +## Summary + +The authentication design has been updated to require BOTH DNS TXT verification AND email verification via rel="me" discovery. This change impacts Phase 1 implementation and defines new requirements for Phase 2. + +## Authentication Flow Change + +### Original Design (ADR-005 v1) +- **Primary**: Email verification (user provides email) +- **Optional**: DNS TXT verification (fast-path to skip email) +- **Flow**: DNS check → if not found, request email → send code → verify code + +### Updated Design (ADR-005 v2 + ADR-008) +- **Required Factor 1**: DNS TXT verification (`_gondulf.{domain}` = `verified`) +- **Required Factor 2**: Email verification via rel="me" discovery +- **Flow**: DNS check → rel="me" discovery → send code to discovered email → verify code + +### Key Differences + +| Aspect | Original | Updated | +|--------|----------|---------| +| DNS TXT | Optional (fast-path) | Required (first factor) | +| Email Discovery | User input | rel="me" link parsing | +| Email Verification | Optional (fallback) | Required (second factor) | +| Security Model | Single-factor | Two-factor | +| Attack Resistance | Moderate | High (requires DNS + email control) | +| Setup Complexity | Lower (email only works) | Higher (both required) | + +## Phase 1 Implementation Impact + +### What Phase 1 Implemented + +Phase 1 successfully implemented: +- ✅ Configuration management (GONDULF_* environment 
variables) +- ✅ Database layer with migrations (SQLite, SQLAlchemy Core) +- ✅ In-memory code storage (TTL-based expiration) +- ✅ Email service (SMTP with STARTTLS support) +- ✅ DNS service (TXT record querying with fallback resolvers) +- ✅ Structured logging +- ✅ FastAPI application with health check endpoint +- ✅ 94.16% test coverage (96 tests passing) + +### Does Phase 1 Need Changes? + +**Answer: NO. Phase 1 implementation remains valid.** + +#### Analysis + +**Email Service** (`src/gondulf/email.py`): +- Current: Generic email sending service +- Change Impact: **None** +- Reason: Email service sends codes to any email address. Whether email is user-provided or rel="me"-discovered doesn't affect this service. +- Status: **No changes needed** + +**DNS Service** (`src/gondulf/dns.py`): +- Current: TXT record verification with fallback resolvers +- Change Impact: **None** +- Reason: DNS service already implements TXT record verification as designed. Changing from "optional" to "required" is a business logic change, not a DNS service change. +- Status: **No changes needed** + +**In-Memory Storage** (`src/gondulf/storage.py`): +- Current: TTL-based code storage +- Change Impact: **None** +- Reason: Storage mechanism is independent of how email is discovered or whether DNS is optional/required. +- Status: **No changes needed** + +**Database Schema** (`001_initial_schema.sql`): +- Current: `domains` table with `domain`, `verification_method`, `verified_at` +- Change Impact: **Minor update needed in Phase 2** +- Reason: Schema already supports storing verification method. Will need to update from `'txt_record'` or `'email'` to `'two_factor'` when storing records. +- Status: **Schema structure OK, values will change in Phase 2** + +**Configuration** (`src/gondulf/config.py`): +- Current: SMTP configuration, DNS configuration, timeouts +- Change Impact: **None immediately, optional addition in Phase 2** +- Reason: Current configuration supports both email and DNS. 
May want to add timeout for HTML fetching in Phase 2. +- Status: **No changes needed now** + +### Phase 1 Status: APPROVED + +Phase 1 implementation remains valid and does NOT require any revisions due to the authentication flow change. All Phase 1 components are foundational services that work regardless of how they're orchestrated in the authentication flow. + +## Phase 2 Requirements: New Implementation Needs + +Phase 2 must now implement the updated authentication flow. Here's what needs to be built: + +### 1. HTML Fetching Service (NEW) + +**Purpose**: Fetch user's homepage to discover rel="me" links + +**Implementation**: +```python +# src/gondulf/html_fetcher.py + +import logging +from typing import Optional + +import requests + +logger = logging.getLogger(__name__) + +class HTMLFetcherService: + """ + Fetch user's homepage over HTTPS. + """ + def __init__(self, timeout: int = 10): + self.timeout = timeout + self.max_redirects = 5 + self.max_size = 5 * 1024 * 1024 # 5MB + + def fetch_site(self, domain: str) -> Optional[str]: + """ + Fetch site HTML content. 
+ + Args: + domain: Domain to fetch (e.g., "example.com") + + Returns: + HTML content as string, or None if fetch fails + """ + url = f"https://{domain}" + + try: + # requests.get() accepts no max_redirects argument; the redirect limit is a Session attribute + session = requests.Session() + session.max_redirects = self.max_redirects + response = session.get( + url, + timeout=self.timeout, + allow_redirects=True, + verify=True # Enforce SSL verification + ) + response.raise_for_status() + + # Check content size + if len(response.content) > self.max_size: + raise ValueError(f"Response too large: {len(response.content)} bytes") + + return response.text + + except requests.exceptions.SSLError as e: + logger.error(f"SSL verification failed for {domain}: {e}") + return None + except requests.exceptions.Timeout: + logger.error(f"Timeout fetching {domain}") + return None + except Exception as e: + logger.error(f"Failed to fetch {domain}: {e}") + return None +``` + +**Dependencies**: +- `requests` library (already in pyproject.toml) +- Timeout configuration (add to Config if needed) + +**Tests Required**: +- Successful HTTPS fetch +- SSL verification failure +- Timeout handling +- HTTP error codes (404, 500, etc.) +- Redirect following +- Size limit enforcement + +--- + +### 2. rel="me" Email Discovery Service (NEW) + +**Purpose**: Parse HTML to discover email from rel="me" links + +**Implementation**: +```python +# src/gondulf/relme.py + +import logging +import re +from typing import Optional + +from bs4 import BeautifulSoup + +logger = logging.getLogger(__name__) + +class RelMeDiscoveryService: + """ + Discover email addresses from rel="me" links in HTML. + """ + + def discover_email(self, html_content: str) -> Optional[str]: + """ + Parse HTML and discover email from rel="me" link. 
+ + Args: + html_content: HTML content as string + + Returns: + Email address or None if not found + """ + try: + # Parse HTML (BeautifulSoup handles malformed HTML) + soup = BeautifulSoup(html_content, 'html.parser') + + # Find all rel="me" links (<link> and <a> tags) + me_links = soup.find_all('link', rel='me') + soup.find_all('a', rel='me') + + # Look for mailto: links + for link in me_links: + href = link.get('href', '') + if href.startswith('mailto:'): + email = href.replace('mailto:', '').strip() + + # Validate email format + if self._validate_email_format(email): + logger.info(f"Discovered email via rel='me': {email[:3]}***") + return email + + logger.warning("No rel='me' mailto: link found in HTML") + return None + + except Exception as e: + logger.error(f"Failed to parse HTML: {e}") + return None + + def _validate_email_format(self, email: str) -> bool: + """Validate email address format (RFC 5322 simplified).""" + email_regex = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$' + + if not re.match(email_regex, email): + return False + + if len(email) > 254: # RFC 5321 maximum + return False + + if email.count('@') != 1: + return False + + return True +``` + +**Dependencies**: +- `beautifulsoup4` library (add to pyproject.toml) +- `html.parser` (Python standard library) + +**Tests Required**: +- Discovery from `<link>` tags +- Discovery from `<a>` tags +- Multiple rel="me" links (select first mailto) +- Malformed HTML handling +- Missing rel="me" links +- Invalid email format in link +- Edge cases (empty href, non-mailto links, etc.) + +--- + +### 3. 
Domain Verification Service (UPDATED) + +**Purpose**: Orchestrate two-factor verification (DNS + Email) + +**Implementation**: +```python +# src/gondulf/domain_verification.py + +import secrets +from typing import Tuple, Optional +from .dns import DNSService +from .html_fetcher import HTMLFetcherService +from .relme import RelMeDiscoveryService +from .email import EmailService +from .storage import CodeStorage + +class DomainVerificationService: + """ + Two-factor domain verification service. + + Verifies domain ownership through: + 1. DNS TXT record verification + 2. Email verification via rel="me" discovery + """ + + def __init__( + self, + dns_service: DNSService, + html_fetcher: HTMLFetcherService, + relme_discovery: RelMeDiscoveryService, + email_service: EmailService, + code_storage: CodeStorage + ): + self.dns = dns_service + self.html_fetcher = html_fetcher + self.relme = relme_discovery + self.email = email_service + self.code_storage = code_storage + + def start_verification(self, domain: str) -> Tuple[bool, Optional[str], Optional[str]]: + """ + Start domain verification process. + + Returns: (success, discovered_email, error_message) + + Failures are reported through error_message; no exception is raised. + """ + # Step 1: Verify DNS TXT record + dns_verified = self.dns.verify_txt_record(domain, "verified") + if not dns_verified: + error = f"DNS TXT record not found for {domain}. Please add: _gondulf.{domain} TXT verified" + return False, None, error + + # Step 2: Fetch site and discover email + html = self.html_fetcher.fetch_site(domain) + if html is None: + error = f"Could not fetch site at https://{domain}. Please ensure site is accessible via HTTPS." + return False, None, error + + # Step 3: Discover email from rel="me" + email = self.relme.discover_email(html) + if email is None: + error = 'No rel="me" mailto: link found. 
Please add: <link rel="me" href="mailto:you@example.com">' + return False, None, error + + # Step 4: Generate and send verification code + code = self._generate_code() + self.code_storage.store(email, code, ttl=900) # 15 minutes + + email_sent = self.email.send_verification_email(email, code) + if not email_sent: + error = f"Failed to send verification email to {email}. Please try again." + return False, email, error + + # Success: code sent to discovered email + return True, email, None + + def verify_code(self, email: str, submitted_code: str) -> Tuple[bool, str]: + """ + Verify submitted code. + + Returns: (success, domain_or_error_message) + """ + stored_data = self.code_storage.get(email) + + if stored_data is None: + return False, "No verification code found. Please restart verification." + + code, domain = stored_data + + # Verify code (constant-time comparison) + if not secrets.compare_digest(submitted_code, code): + return False, "Invalid code. Please try again." + + # Success: mark code as used + self.code_storage.delete(email) + + return True, domain + + def _generate_code(self) -> str: + """Generate 6-digit verification code.""" + return ''.join(secrets.choice('0123456789') for _ in range(6)) +``` + +**Dependencies**: +- All Phase 1 services (DNS, Email, Storage) +- New HTML fetcher service +- New rel="me" discovery service + +**Tests Required**: +- Full verification flow (DNS → rel="me" → email → code) +- DNS verification failure +- Site fetch failure +- rel="me" discovery failure +- Email send failure +- Code verification success/failure +- Multiple attempts tracking +- Code expiration + +--- + +### 4. 
Domain Verification UI Endpoints (NEW) + +**Purpose**: HTTP endpoints for user interaction + +**Implementation**: +```python +# src/gondulf/routers/verification.py + +from fastapi import APIRouter, HTTPException +from pydantic import BaseModel + +router = APIRouter(prefix="/verify", tags=["verification"]) + +class VerificationStartRequest(BaseModel): + domain: str + +class VerificationStartResponse(BaseModel): + success: bool + email_masked: Optional[str] # e.g., "u***@example.com" + error: Optional[str] + +class VerificationCodeRequest(BaseModel): + email: str + code: str + +class VerificationCodeResponse(BaseModel): + success: bool + domain: Optional[str] + error: Optional[str] + +@router.post("/start", response_model=VerificationStartResponse) +async def start_verification(request: VerificationStartRequest): + """ + Start domain verification process. + + Steps: + 1. Verify DNS TXT record + 2. Discover email from rel="me" + 3. Send verification code to email + """ + success, email, error = domain_verification_service.start_verification(request.domain) + + if not success: + return VerificationStartResponse(success=False, email_masked=None, error=error) + + # Mask email for display: u***@example.com + masked_email = f"{email[0]}***@{email.split('@')[1]}" + + return VerificationStartResponse( + success=True, + email_masked=masked_email, + error=None + ) + +@router.post("/code", response_model=VerificationCodeResponse) +async def verify_code(request: VerificationCodeRequest): + """ + Verify submitted code. + + Returns domain if code is valid. 
+ """ + success, result = domain_verification_service.verify_code(request.email, request.code) + + if not success: + return VerificationCodeResponse(success=False, domain=None, error=result) + + return VerificationCodeResponse(success=True, domain=result, error=None) +``` + +**Dependencies**: +- FastAPI router +- Pydantic models +- Domain verification service + +**Tests Required**: +- POST /verify/start success case +- POST /verify/start with DNS failure +- POST /verify/start with rel="me" failure +- POST /verify/start with email send failure +- POST /verify/code success case +- POST /verify/code with invalid code +- POST /verify/code with expired code +- POST /verify/code with missing code + +--- + +### 5. Authorization Endpoint Integration (UPDATED) + +**Changes to Authorization Flow**: + +**Before** (original design): +``` +1. User enters domain (me parameter) +2. Display form: "Enter your email at {domain}" +3. User enters email manually +4. Send code, user enters code +5. Display consent screen +``` + +**After** (updated design): +``` +1. User enters domain (me parameter) +2. Server performs two-factor verification: + a. Verify DNS TXT record + b. Discover email from rel="me" + c. Send code to discovered email +3. Display code entry form (show discovered email masked) +4. User enters code +5. Display consent screen +``` + +**Implementation Changes**: +- Call `DomainVerificationService.start_verification()` instead of requesting email from user +- Update UI to show "Sending code to u***@example.com" instead of email input form +- Handle new error cases (DNS not found, rel="me" not found, site unreachable) + +--- + +## Phase 2 Feature Breakdown + +### New Dependencies to Add + +**pyproject.toml additions**: +```toml +[project] +dependencies = [ + # ... existing dependencies + "beautifulsoup4>=4.12.0", # HTML parsing for rel="me" discovery +] +``` + +### New Source Files + +1. `src/gondulf/html_fetcher.py` - HTML fetching service +2. 
`src/gondulf/relme.py` - rel="me" email discovery service +3. `src/gondulf/domain_verification.py` - Two-factor verification orchestration +4. `src/gondulf/routers/verification.py` - Verification endpoints (if implemented separately from authorization) + +### Updated Files + +1. `src/gondulf/main.py` - Register new routers, initialize new services +2. `src/gondulf/config.py` - Optional: add HTML fetch timeout config +3. Database migration (002_update_verification_method.sql) - Change domain.verification_method values + +### New Test Files + +1. `tests/unit/test_html_fetcher.py` - HTML fetching tests +2. `tests/unit/test_relme.py` - rel="me" discovery tests +3. `tests/unit/test_domain_verification.py` - Verification service tests +4. `tests/integration/test_verification_endpoints.py` - Verification endpoint tests + +### Estimated Effort + +**New Components**: +- HTML Fetcher Service: 0.5 days +- rel="me" Discovery Service: 0.5 days +- Domain Verification Service: 1 day +- Verification Endpoints: 0.5 days +- Tests (all new components): 1 day + +**Total New Work**: ~3.5 days + +**Authorization Endpoint** (already planned): +- Original estimate: 3-5 days +- Updated estimate: 3-5 days (same - just uses DomainVerificationService) + +## Database Schema Updates + +### Migration: 002_update_verification_method.sql + +```sql +-- Update verification_method values from single-factor to two-factor +-- This is a data migration, not schema change + +UPDATE domains +SET verification_method = 'two_factor' +WHERE verification_method IN ('txt_record', 'email'); + +-- No schema changes needed - 'verification_method' column already exists +``` + +**When to Apply**: Phase 2, before authorization endpoint implementation + +## Error Message Updates + +### DNS TXT Not Found + +``` +DNS Verification Failed + +Please add this TXT record to your domain's DNS: + +Type: TXT +Name: _gondulf.example.com +Value: verified + +DNS changes may take up to 24 hours to propagate. + +Need help? 
See: https://docs.gondulf.example.com/setup/dns +``` + +### rel="me" Not Found + +``` +Email Discovery Failed + +Could not find a rel="me" email link on your homepage. + +Please add this to your homepage (https://example.com): + +<link rel="me" href="mailto:you@example.com"> + +This declares your email address for IndieAuth verification. + +Learn more: https://indieweb.org/rel-me +``` + +### Site Unreachable + +``` +Site Fetch Failed + +Could not fetch your site at https://example.com + +Please check: +- Site is accessible via HTTPS +- SSL certificate is valid +- No firewall blocking requests + +Try again once your site is accessible. +``` + +### Email Send Failure + +``` +Email Delivery Failed + +Failed to send verification code to u***@example.com + +Please check: +- Email address is correct in your rel="me" link +- Email server is accepting mail +- Check spam/junk folder + +Try again, or contact support if the issue persists. +``` + +## Documentation Updates Needed + +### User Documentation (Phase 2) + +1. **Setup Guide**: `/docs/user/setup.md` + - Step 1: Add DNS TXT record + - Step 2: Add rel="me" link to homepage + - Step 3: Test verification + +2. **Troubleshooting**: `/docs/user/troubleshooting.md` + - DNS verification failures + - rel="me" discovery issues + - Email delivery problems + +3. **Examples**: `/docs/user/examples.md` + - Example HTML with rel="me" link + - Example DNS configuration (various providers) + +### Developer Documentation (Phase 2) + +1. **API Reference**: `/docs/api/verification.md` + - POST /verify/start endpoint + - POST /verify/code endpoint + - Error codes and responses + +2. **Architecture**: `/docs/architecture/domain-verification.md` + - Two-factor verification flow diagram + - Service interaction diagram + - Error handling flowchart + +## Security Considerations for Phase 2 + +### New Attack Surfaces + +1. 
**HTML Parsing**: + - Risk: Malicious HTML exploiting parser + - Mitigation: BeautifulSoup handles untrusted HTML safely + - Test: Fuzzing with malformed HTML + +2. 
**HTTPS Fetching**: + - Risk: SSL verification bypass + - Mitigation: Enforce `verify=True` in requests + - Test: Attempt to fetch site with invalid certificate (must fail) + +3. **rel="me" Spoofing**: + - Risk: Attacker adds rel="me" to compromised site + - Mitigation: Two-factor requirement (also need DNS control) + - Test: Verify DNS check happens BEFORE rel="me" discovery + +### Security Testing Required + +1. **Input Validation**: + - Malformed domain names + - Oversized HTML responses (>5MB) + - Invalid email formats in rel="me" links + +2. **TLS Enforcement**: + - Verify HTTPS-only fetching + - Verify SSL certificate validation + - Reject sites with invalid certificates + +3. **Rate Limiting** (future): + - Prevent bulk rel="me" discovery + - Limit verification attempts per domain + +## Configuration Updates + +### Optional New Config + +```python +# src/gondulf/config.py + +class Config: + # ... existing config + + # HTML Fetching (optional, has sensible defaults) + HTML_FETCH_TIMEOUT: int = 10 # seconds + HTML_MAX_SIZE: int = 5 * 1024 * 1024 # 5MB + HTML_MAX_REDIRECTS: int = 5 +``` + +### Environment Variables + +```bash +# .env.example additions (optional) + +# HTML Fetching Configuration (optional - has defaults) +GONDULF_HTML_FETCH_TIMEOUT=10 # Timeout for fetching user's site (seconds) +GONDULF_HTML_MAX_SIZE=5242880 # Maximum HTML size (bytes, default 5MB) +GONDULF_HTML_MAX_REDIRECTS=5 # Maximum redirects to follow +``` + +## Testing Strategy for Phase 2 + +### Unit Tests + +**HTML Fetcher**: +- Mock successful HTTPS response +- Mock SSL verification failure +- Mock timeout +- Mock HTTP errors (404, 500, etc.) 
+- Mock size limit exceeded +- Mock redirect following + +**rel="me" Discovery**: +- Parse `<link rel="me">` +- Parse `<a rel="me">` +- Handle malformed HTML +- Handle missing rel="me" links +- Handle invalid email in link +- Handle multiple rel="me" links (select first) + +**Domain Verification Service**: +- Full two-factor flow success +- DNS verification failure +- Site fetch failure +- rel="me" discovery failure +- Email send failure +- Code verification success/failure + +### Integration Tests + +**Verification Endpoints**: +- POST /verify/start with valid domain (mock services) +- POST /verify/start with DNS failure +- POST /verify/start with rel="me" failure +- POST /verify/code with valid code +- POST /verify/code with invalid code + +### End-to-End Tests (Future) + +- Complete verification flow with real HTML +- Authorization flow integration +- Token issuance after successful verification + +## Acceptance Criteria for Phase 2 + +Phase 2 will be considered complete when: + +1. ✅ HTML fetcher service implemented and tested +2. ✅ rel="me" discovery service implemented and tested +3. ✅ Domain verification service orchestrates two-factor verification +4. ✅ Verification endpoints return correct responses for all cases +5. ✅ Error messages are clear and actionable +6. ✅ All new tests passing (unit + integration) +7. ✅ Test coverage remains >80% overall +8. ✅ Security testing complete (HTML parsing, TLS enforcement) +9. ✅ Documentation updated (user setup guide, API reference) +10. ✅ Database migration applied successfully + +## Timeline Estimate + +**Phase 2 Components**: +- HTML Fetcher: 0.5 days +- rel="me" Discovery: 0.5 days +- Domain Verification Service: 1 day +- Verification Endpoints: 0.5 days +- Testing: 1 day +- Documentation: 0.5 days + +**Total New Work**: ~4 days + +**Authorization Endpoint** (already planned): +- Original estimate: 3-5 days +- Updated estimate: 3-5 days (uses DomainVerificationService) + +**Phase 2 Total**: ~7-9 days (vs. 
original estimate of 3-5 days) + +**Impact**: +4 days of work due to authentication flow change + +## Recommendation + +**Phase 1**: APPROVED as-is. No changes needed. + +**Phase 2**: Proceed with implementation of: +1. HTML fetching service +2. rel="me" discovery service +3. Domain verification service (two-factor orchestration) +4. Verification endpoints +5. Updated authorization endpoint to use domain verification service + +The additional work (HTML fetching + rel="me" discovery) adds ~4 days to Phase 2, bringing total Phase 2 estimate to 7-9 days instead of original 3-5 days. + +## Sign-off + +**Assessment Status**: Complete +**Phase 1 Impact**: None - Phase 1 approved as-is +**Phase 2 Impact**: Additional 4 days of work for new services +**Risk Level**: Low - All new work is well-scoped and testable +**Ready to Proceed**: Yes + +--- + +**Assessment completed**: 2025-11-20 +**Architect**: Claude (Architect Agent) diff --git a/docs/architecture/security.md b/docs/architecture/security.md index f5692a5..0c33ea0 100644 --- a/docs/architecture/security.md +++ b/docs/architecture/security.md @@ -58,108 +58,174 @@ Gondulf follows a defense-in-depth security model with these core principles: ## Authentication Security -### Email-Based Verification (v1.0.0) +### Two-Factor Domain Verification (v1.0.0) -**Mechanism**: Users prove domain ownership by receiving verification code at email address on that domain. +**Mechanism**: Users prove domain ownership through TWO independent factors: +1. **DNS TXT Record**: Proves DNS control (`_gondulf.{domain}` = `verified`) +2. **Email via rel="me"**: Proves email control (discovered from site's rel="me" link) + +**Security Model**: An attacker must compromise BOTH factors to authenticate fraudulently. This is significantly stronger than single-factor verification. #### Threat: Email Interception **Risk**: Attacker intercepts email containing verification code. **Mitigations**: -1. **Short Code Lifetime**: 15-minute expiration -2. 
**Single Use**: Code invalidated after verification -3. **Rate Limiting**: Max 3 code requests per email per hour -4. **TLS Email Delivery**: Require STARTTLS for SMTP -5. **Display Warning**: "Only request code if you initiated this login" +1. **Two-Factor Requirement**: Email alone is insufficient (DNS also required) +2. **Short Code Lifetime**: 15-minute expiration +3. **Single Use**: Code invalidated after verification +4. **Rate Limiting**: Max 3 code requests per domain per hour +5. **TLS Email Delivery**: Require STARTTLS for SMTP +6. **Display Warning**: "Only request code if you initiated this login" -**Residual Risk**: Acceptable for v1.0.0 given short lifetime and single-use. +**Residual Risk**: Low. Even with email interception, attacker still needs DNS control. #### Threat: Code Brute Force **Risk**: Attacker guesses 6-digit verification code. **Mitigations**: -1. **Sufficient Entropy**: 1,000,000 possible codes (6 digits) -2. **Attempt Limiting**: Max 3 attempts per email -3. **Short Lifetime**: 15-minute window -4. **Rate Limiting**: Max 10 attempts per IP per hour -5. **Exponential Backoff**: 5-second delay after each failed attempt +1. **Two-Factor Requirement**: Code alone is insufficient (DNS also required) +2. **Sufficient Entropy**: 1,000,000 possible codes (6 digits) +3. **Attempt Limiting**: Max 3 attempts per email +4. **Short Lifetime**: 15-minute window +5. **Rate Limiting**: Max 3 codes per domain per hour +6. **Single-Use**: Code invalidated after use **Math**: - 3 attempts × 1,000,000 codes = 0.0003% success probability - 15-minute window limits attack time -- Rate limiting prevents distributed guessing +- Even if guessed, attacker still needs DNS control -**Residual Risk**: Very low, acceptable for v1.0.0. +**Residual Risk**: Very low. Two-factor requirement makes brute force insufficient. + +#### Threat: DNS TXT Record Spoofing + +**Risk**: Attacker attempts to spoof DNS responses. + +**Mitigations**: +1. 
**Multiple Resolvers**: Query 2+ independent DNS servers (Google, Cloudflare) +2. **Consensus Required**: Require agreement from at least 2 resolvers +3. **DNSSEC Support**: Validate DNSSEC signatures when available (future) +4. **Timeout Handling**: Fail securely if DNS unavailable +5. **Logging**: Log all DNS verification attempts + +**Residual Risk**: Low. Spoofing multiple independent resolvers is difficult. + +#### Threat: rel="me" Link Spoofing + +**Risk**: Attacker compromises user's website to add malicious rel="me" link. + +**Mitigations**: +1. **Two-Factor Requirement**: Website compromise alone insufficient (DNS also required) +2. **HTTPS Required**: Fetch site over TLS (prevents MITM) +3. **Certificate Validation**: Verify SSL certificate +4. **Email Domain Matching**: Email should match site domain (warning if not) +5. **User Education**: Inform users to secure their website + +**Residual Risk**: Moderate. If attacker compromises both DNS and website, they can authenticate. This is acceptable as it represents full domain compromise. #### Threat: Email Address Enumeration -**Risk**: Attacker discovers which domains are registered by requesting codes. +**Risk**: Attacker discovers email addresses by triggering rel="me" discovery. **Mitigations**: -1. **Consistent Response**: Always say "If email exists, code sent" -2. **No Error Differentiation**: Same message for valid/invalid emails -3. **Rate Limiting**: Prevent bulk enumeration +1. **Public Information**: rel="me" links are intentionally public +2. **User Awareness**: Users know they're publishing email on their site +3. **Rate Limiting**: Prevent bulk scanning +4. **Robots.txt**: Users can restrict crawler access if desired -**Residual Risk**: Minimal, domain names are public anyway (DNS). +**Residual Risk**: Minimal. Email addresses are intentionally published by users on their own sites. 
-### Domain Ownership Verification +### Domain Ownership Verification (Two-Factor) -#### TXT Record Validation (Preferred) +**Mechanism**: v1.0.0 requires BOTH verification methods: -**Mechanism**: Admin adds DNS TXT record `_gondulf.example.com` = `verified`. +#### 1. TXT Record Validation (Required) + +**Mechanism**: Admin adds DNS TXT record `_gondulf.{domain}` = `verified`. **Security Properties**: -- Requires DNS control (stronger than email) +- Proves DNS control (first factor) - Verifiable without user interaction - Cacheable for performance - Re-verifiable periodically -**Threat: DNS Spoofing** - -**Mitigations**: -1. **DNSSEC**: Validate DNSSEC signatures if available -2. **Multiple Resolvers**: Query 2+ DNS servers, require consensus -3. **Caching**: Cache valid results, re-verify daily -4. **Logging**: Log all DNS verification attempts - **Implementation**: ```python import dns.resolver -import dns.dnssec def verify_txt_record(domain: str) -> bool: """ Verify _gondulf.{domain} TXT record exists with value 'verified'. + Requires consensus from multiple independent resolvers. """ try: # Use Google and Cloudflare DNS for redundancy resolvers = ['8.8.8.8', '1.1.1.1'] - results = [] + verified_count = 0 for resolver_ip in resolvers: resolver = dns.resolver.Resolver() resolver.nameservers = [resolver_ip] resolver.timeout = 5 - resolver.lifetime = 5 answers = resolver.resolve(f'_gondulf.{domain}', 'TXT') for rdata in answers: txt_value = rdata.to_text().strip('"') if txt_value == 'verified': - results.append(True) + verified_count += 1 break - # Require consensus from both resolvers - return len(results) >= 2 + # Require consensus from at least 2 resolvers + return verified_count >= 2 except Exception as e: logger.warning(f"DNS verification failed for {domain}: {e}") return False ``` -**Residual Risk**: Low, DNS is foundational internet infrastructure. +#### 2. 
Email Verification via rel="me" (Required) + +**Mechanism**: Email discovered from site's ``, then verified with code. + +**Security Properties**: +- Proves website control (can modify HTML) +- Proves email control (receives and enters code) +- Follows IndieWeb standards (rel="me") +- Self-documenting (user declares email publicly) + +**Implementation**: +```python +from bs4 import BeautifulSoup +import requests + +def discover_email_from_site(domain: str) -> Optional[str]: + """ + Fetch site and discover email from rel="me" link. + """ + try: + response = requests.get(f"https://{domain}", timeout=10, allow_redirects=True) + response.raise_for_status() + + soup = BeautifulSoup(response.content, 'html.parser') + me_links = soup.find_all('link', rel='me') + soup.find_all('a', rel='me') + + for link in me_links: + href = link.get('href', '') + if href.startswith('mailto:'): + email = href.replace('mailto:', '').strip() + if validate_email_format(email): + return email + + return None + + except Exception as e: + logger.error(f"Failed to discover email for {domain}: {e}") + return None +``` + +**Combined Residual Risk**: Low. Attacker must compromise DNS, website, and email account to authenticate fraudulently. ## Authorization Security @@ -431,15 +497,80 @@ class AuthorizeRequest(BaseModel): **Residual Risk**: Minimal, Pydantic provides strong validation. +### HTML Parsing Security (rel="me" Discovery) + +#### Threat: Malicious HTML Injection + +**Risk**: Attacker's site contains malicious HTML to exploit parser. + +**Mitigations**: +1. **Robust Parser**: Use BeautifulSoup (handles malformed HTML safely) +2. **Link Extraction Only**: Only extract href attributes, no script execution +3. **Timeout**: 10-second timeout for HTTP requests +4. **Size Limit**: Limit response size (prevent memory exhaustion) +5. **HTTPS Required**: Fetch over TLS only +6. 
**Certificate Validation**: Verify SSL certificates + +**Implementation**: +```python +from typing import Optional +from bs4 import BeautifulSoup +import requests + +def discover_email_from_site(domain: str) -> Optional[str]: + """ + Safely discover email from rel="me" link. + """ + try: + session = requests.Session() # Fetch with safety limits + session.max_redirects = 5 # requests.get() accepts no max_redirects argument + response = session.get( + f"https://{domain}", + timeout=10, + stream=True # Don't load entire response into memory + ) + response.raise_for_status() + + # Limit response size (prevent memory exhaustion) + MAX_SIZE = 5 * 1024 * 1024 # 5MB + content = response.raw.read(MAX_SIZE, decode_content=True) + + # Parse HTML (BeautifulSoup handles malformed HTML safely) + soup = BeautifulSoup(content, 'html.parser') + + # Find rel="me" links (no script execution) + me_links = soup.find_all('link', rel='me') + soup.find_all('a', rel='me') + + # Extract mailto: links only + for link in me_links: + href = link.get('href', '') + if href.startswith('mailto:'): + email = href.replace('mailto:', '').strip() + # Validate email format before returning + if validate_email_format(email): + return email + + return None + + except requests.exceptions.SSLError as e: + logger.error(f"SSL certificate validation failed for {domain}: {e}") + return None + except Exception as e: + logger.error(f"Failed to discover email for {domain}: {e}") + return None +``` + +**Residual Risk**: Very low. BeautifulSoup is designed for untrusted HTML. + ### Email Validation #### Threat: Email Injection Attacks -**Risk**: Attacker injects SMTP commands via email address field. +**Risk**: Attacker crafts malicious email address in rel="me" link. **Mitigations**: 1. **Format Validation**: Strict email regex (RFC 5322) -2. **Domain Matching**: Require email domain match `me` domain +2. **No User Input**: Email discovered from site (not user-provided) 3. **SMTP Library**: Use well-tested library (smtplib) 4. **Content Encoding**: Encode email content properly 5. 
**Rate Limiting**: Prevent abuse @@ -447,31 +578,27 @@ class AuthorizeRequest(BaseModel): **Validation**: ```python import re -from email.utils import parseaddr -def validate_email(email: str, required_domain: str) -> tuple[bool, str]: +def validate_email_format(email: str) -> bool: """ - Validate email address and domain match. + Validate email address format. """ - # Parse email (RFC 5322 compliant) - name, addr = parseaddr(email) - - # Basic format check + # Basic format check (RFC 5322 simplified) email_regex = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$' - if not re.match(email_regex, addr): - return False, "Invalid email format" + if not re.match(email_regex, email): + return False - # Extract domain - email_domain = addr.split('@')[1].lower() - required_domain = required_domain.lower() + # Sanity checks + if len(email) > 254: # RFC 5321 maximum + return False + if email.count('@') != 1: + return False - # Domain must match - if email_domain != required_domain: - return False, f"Email must be at {required_domain}" - - return True, "" + return True ``` +**Note**: Domain matching is NOT enforced in v1.0.0. User may have email at different domain than their identity site (e.g., phil@gmail.com for phil.example.com). This is acceptable as user explicitly publishes the email on their site. + **Residual Risk**: Low, standard validation patterns. 
## Network Security @@ -567,21 +694,29 @@ async def add_security_headers(request: Request, call_next): **Email Handling**: ```python -# Email stored ONLY during verification (in-memory, 15-min TTL) +# Email discovered from rel="me" link (not user-provided) +# Stored ONLY during verification (in-memory, 15-min TTL) verification_codes[code_id] = { - "email": email, # ← Exists ONLY here, NEVER in database + "email": email, # ← Discovered from site, exists ONLY here, NEVER in database "code": code, + "domain": domain, "expires_at": datetime.utcnow() + timedelta(minutes=15) } -# After verification: email is deleted, only domain stored +# After verification: email is deleted, only domain + timestamp stored db.execute(''' - INSERT INTO domains (domain, verification_method, verified_at) - VALUES (?, 'email', ?) -''', (domain, datetime.utcnow())) -# Note: NO email address in database + INSERT INTO domains (domain, verification_method, verified_at, last_email_check) + VALUES (?, 'two_factor', ?, ?) 
+''', (domain, datetime.utcnow(), datetime.utcnow())) +# Note: NO email address in database, only verification timestamp ``` +**rel="me" Discovery**: +- Email addresses are public (user publishes on their site) +- Server fetches email from user's site (not user input) +- Reduces social engineering risk (can't claim arbitrary email) +- Follows IndieWeb standards for identity + ### Database Security **SQLite Security**: @@ -829,13 +964,15 @@ security: ## Security Roadmap ### v1.0.0 (MVP) -- ✅ Email-based authentication +- ✅ Two-factor domain verification (DNS TXT + Email via rel="me") +- ✅ rel="me" email discovery (IndieWeb standard) +- ✅ HTML parsing security (BeautifulSoup) - ✅ TLS/HTTPS enforcement - ✅ Secure token generation (opaque, hashed) - ✅ URL validation (open redirect prevention) - ✅ Input validation (Pydantic) - ✅ Security headers -- ✅ Minimal data collection +- ✅ Minimal data collection (no email storage) ### v1.1.0 - PKCE support (code challenge/verifier) diff --git a/docs/decisions/0004-phase-2-implementation-decisions.md b/docs/decisions/0004-phase-2-implementation-decisions.md new file mode 100644 index 0000000..cf1872c --- /dev/null +++ b/docs/decisions/0004-phase-2-implementation-decisions.md @@ -0,0 +1,98 @@ +# 0004. Phase 2 Implementation Decisions + +Date: 2025-11-20 + +## Status +Accepted + +## Context +The Developer has raised 8 categories of implementation questions for Phase 2 that require architectural decisions. These decisions need to balance simplicity with functionality while providing clear direction for implementation. + +## Decisions + +### 1. Rate Limiting Implementation +**Decision**: Implement actual rate limiting with in-memory storage in Phase 2. +**Rationale**: Security features should be real from the start, not stubs. In-memory is simplest. 
+**Implementation**: +- Use a simple dictionary with domain as key, list of timestamps as value +- Clean up old timestamps on each check (older than 1 hour) +- Store in `RateLimiter` service as instance variable +- No persistence needed - resets on restart is acceptable + +### 2. Authorization Code Metadata Structure +**Decision**: Use Phase 1's `CodeStorage` service with complete structure from the start. +**Rationale**: Reuse existing infrastructure, avoid future migrations. +**Implementation**: +- Include `used` field (boolean, default False) even though Phase 3 consumes it +- Store epoch integers for timestamps (simpler than datetime objects) +- Use same `CodeStorage` from Phase 1 with authorization code keys + +### 3. HTML Template Implementation +**Decision**: Use Jinja2 templates with separate template files. +**Rationale**: Jinja2 is standard, maintainable, and allows for future template customization. +**Implementation**: +- Templates in `src/gondulf/templates/` +- Create `base.html` for shared layout +- Individual templates: `verify_email.html`, `verify_totp.html`, `authorize.html`, `error.html` +- Pass minimal context to templates + +### 4. Database Migration Timing +**Decision**: Apply migration 002 immediately as part of Phase 2 setup. +**Rationale**: Keep database schema current with code expectations. +**Implementation**: +- Run migration before any Phase 2 code execution +- New code assumes 'two_factor' column exists +- Migration updates existing rows (if any) to have 'two_factor' = false + +### 5. Client Validation Helper Functions +**Decision**: Implement as standalone functions in a shared utility module. +**Rationale**: Functions over classes when no state is needed. Simpler to test and understand. +**Implementation**: +- Create `src/gondulf/utils/validation.py` +- Functions: `mask_email()`, `validate_redirect_uri()`, `normalize_client_id()` +- Full subdomain validation now (not a stub) - security should be complete + +### 6. 
Error Response Format Consistency +**Decision**: Use format appropriate to the endpoint type. +**Rationale**: Follow OAuth 2.0 patterns and user experience expectations. +**Implementation**: +- Verification endpoints (`/verify/email`, `/verify/totp`): JSON responses, always 200 OK +- Authorization endpoint errors before user interaction: HTML error page +- Authorization endpoint errors after client validation: OAuth redirect with error +- Token endpoint (Phase 3): Always JSON + +### 7. Dependency Injection Pattern +**Decision**: Create `dependencies.py` with singleton services instantiated at startup. +**Rationale**: Simpler than per-request instantiation, consistent with Phase 1 pattern. +**Implementation**: +- All services instantiated once in `dependencies.py` +- Services read configuration at instantiation +- FastAPI dependency injection provides same instance to all requests +- Pattern: `get_code_storage()`, `get_rate_limiter()`, etc. + +### 8. Test Organization for Authorization Endpoint +**Decision**: Separate test files per major endpoint with shared fixtures module. +**Rationale**: Easier to navigate and maintain as tests grow. 
+**Implementation**: +- `tests/test_verification_endpoints.py` - email and TOTP verification +- `tests/test_authorization_endpoint.py` - authorization flow +- `tests/conftest.py` - shared fixtures for common scenarios +- Test complete flows, not sub-endpoints in isolation + +## Consequences + +### Positive +- Clear, consistent patterns across the codebase +- Real security from the start (no stubs) +- Reuse of existing Phase 1 infrastructure +- Standard, maintainable template approach +- Simple service architecture + +### Negative +- Slightly more upfront work than stub implementations +- In-memory rate limiting loses state on restart +- Templates add a dependency (Jinja2) + +### Neutral +- Following established patterns from other web frameworks +- Committing to specific implementation choices early \ No newline at end of file diff --git a/docs/decisions/ADR-005-email-based-authentication-v1-0-0.md b/docs/decisions/ADR-005-email-based-authentication-v1-0-0.md index 09b1c24..eb850fb 100644 --- a/docs/decisions/ADR-005-email-based-authentication-v1-0-0.md +++ b/docs/decisions/ADR-005-email-based-authentication-v1-0-0.md @@ -1,9 +1,10 @@ -# ADR-005: Email-Based Authentication for v1.0.0 +# ADR-005: Two-Factor Domain Verification for v1.0.0 (DNS + Email via rel="me") Date: 2025-11-20 +Last Updated: 2025-11-20 ## Status -Accepted +Accepted (Updated) ## Context @@ -65,143 +66,289 @@ From project brief: ## Decision -**Gondulf v1.0.0 will use email-based verification as the PRIMARY authentication method, with DNS TXT record verification as an OPTIONAL fast-path.** +**Gondulf v1.0.0 will require BOTH DNS TXT record verification AND email verification using the IndieWeb rel="me" pattern. Both verifications must succeed for authentication to complete.** ### Implementation Approach -**Two-Tier Verification**: +**Two-Factor Verification (Both Required)**: -1. **DNS TXT Record (Preferred, Optional)**: +1. 
**DNS TXT Record Verification (Required)**: - Check for `_gondulf.{domain}` TXT record = `verified` - - If found: Skip email verification, use cached result - - If not found: Fall back to email verification - - Result cached in database for future use + - If found: Proceed to email verification + - If not found: Authentication fails with instructions to add TXT record + - Proves: User controls DNS for the domain -2. **Email Verification (Required Fallback)**: - - User provides email address at their domain +2. **Email Discovery via rel="me" (Required)**: + - Fetch user's domain homepage (e.g., https://example.com) + - Parse HTML for `` + - Extract email address from rel="me" link + - If not found: Authentication fails with instructions to add rel="me" link + - Proves: User has published email relationship on their site + +3. **Email Verification Code (Required)**: - Server generates 6-digit verification code - - Server sends code via SMTP + - Server sends code to discovered email address via SMTP - User enters code (15-minute expiration) - - Domain marked as verified in database + - Verification code must be correct to complete authentication + - Proves: User controls the email account -**Why Both?**: -- DNS provides fast path for tech-savvy users -- Email provides accessible path for all users -- DNS requires upfront setup but smoother repeat authentication -- Email requires no setup but requires email access each time +**Why All Three?**: +- **DNS TXT**: Proves domain DNS control (strong ownership signal) +- **rel="me"**: Follows IndieWeb standard for identity claims +- **Email Code**: Proves active control of the email account (not just DNS/HTML) +- **Combined**: Two-factor verification provides stronger security than either alone ### Rationale -**Meets User Requirements**: -- Email-based authentication as specified -- No external identity providers (GitHub, GitLab) in v1.0.0 -- Simple to understand and implement -- Familiar UX pattern +**Enhanced Security 
Model**: +- Two-factor verification: DNS control + Email control +- Prevents attacks where only one factor is compromised +- DNS TXT proves domain ownership +- Email code proves active account control +- rel="me" follows IndieWeb standards for identity -**Simplicity**: -- Email verification is well-understood -- Standard library SMTP support (smtplib) -- No OAuth 2.0 client implementation needed -- No external API dependencies +**Follows IndieWeb Standards**: +- rel="me" is standard practice for identity claims (see: https://thesatelliteoflove.com) +- Aligns with IndieAuth ecosystem expectations +- Users likely already have rel="me" links for other purposes +- Email discovery is self-documenting (user's site declares their email) -**Security Sufficient for MVP**: -- Email access typically indicates domain control -- 6-digit codes provide 1,000,000 combinations -- 15-minute expiration limits brute-force window -- Rate limiting prevents abuse -- TLS for email delivery (STARTTLS) +**No User-Provided Email Input**: +- Server discovers email from user's site (no manual entry) +- Prevents typos and social engineering +- Email is self-attested by user on their own domain +- Reduces attack surface (can't claim arbitrary email) -**Operational Simplicity**: -- Requires only SMTP configuration (widely available) -- No API keys or provider accounts needed -- No rate limits from external providers -- Full control over verification flow +**Stronger Than Single-Factor**: +- Attacker needs DNS control AND email access +- Compromised DNS alone: insufficient +- Compromised email alone: insufficient +- Requires control of both infrastructure and communication -**DNS TXT as Enhancement**: -- Provides better UX for repeat authentication -- Demonstrates domain control more directly -- Optional (users not forced to configure DNS) -- Cached result eliminates email requirement +**Simplicity Maintained**: +- Two verification checks, but both straightforward +- DNS TXT: standard practice +- 
rel="me": standard HTML link +- Email code: familiar pattern +- Total setup time: < 5 minutes for technical users ## Consequences ### Positive Consequences -1. **User Simplicity**: - - Familiar email verification pattern - - No need to create accounts on external services - - Works with any email provider +1. **Enhanced Security**: + - Two-factor verification (DNS + Email) + - Stronger ownership proof than single factor + - Prevents single-point-of-compromise attacks + - Aligns with security best practices -2. **Implementation Simplicity**: - - Standard library support (smtplib, email) - - No external API integration - - Straightforward testing (mock SMTP) +2. **IndieWeb Standard Compliance**: + - Follows rel="me" pattern from IndieWeb community + - Interoperability with other IndieWeb tools + - Users may already have rel="me" configured + - Self-documenting identity claims -3. **Operational Simplicity**: - - Single external dependency (SMTP server) - - No API rate limits to manage - - No provider outages to worry about - - Admin controls email templates +3. **Reduced Attack Surface**: + - No user-provided email input (prevents typos/social engineering) + - Email discovered from user's own site + - Can't claim arbitrary email addresses + - User controls all verification requirements -4. **Privacy**: - - Email addresses NOT stored (deleted after verification) +4. **Implementation Simplicity**: + - HTML parsing for rel="me" (standard libraries) + - DNS queries (dnspython) + - SMTP email sending (smtplib) + - No external API dependencies + +5. **Privacy**: + - Email addresses NOT stored after verification - No data shared with third parties - No tracking by external providers + - Minimal data collection -5. **Flexibility**: - - DNS TXT provides fast-path for power users - - Email fallback ensures accessibility - - No user locked out if DNS unavailable +6. 
**Transparency**: + - User explicitly declares email on their site + - No hidden verification methods + - User controls both DNS and HTML + - Clear requirements for setup ### Negative Consequences -1. **Email Dependency**: +1. **Higher Setup Complexity**: + - Users must configure TWO things (DNS TXT + rel="me" link) + - More steps than single-factor approaches + - Requires basic HTML editing skills + - May deter non-technical users + +2. **Email Dependency**: - Requires functioning SMTP configuration - Email delivery not guaranteed (spam filters) - Users must have email access during authentication - - Email account compromise = domain compromise + - Email account compromise still a risk (mitigated by DNS requirement) -2. **User Experience**: - - Extra step vs. provider OAuth (more clicks) - - Requires checking email inbox +3. **User Experience**: + - More setup steps vs. simpler alternatives + - Requires checking email inbox during login - Potential delay (email delivery time) - Code expiration can frustrate users + - Both verifications must succeed (no fallback) -3. **Security Limitations**: - - Email interception risk (mitigated by TLS) - - Email account compromise risk (user responsibility) - - Weaker than hardware-based auth (WebAuthn) +4. **HTML Parsing Complexity**: + - Must parse potentially malformed HTML + - Multiple possible HTML formats for rel="me" + - Case sensitivity issues + - Must handle various link formats (mailto: vs https://) -4. **Scalability Concerns**: - - Email delivery at scale (future concern) - - SMTP rate limits (future concern) - - Email provider blocking (spam prevention) +5. 
**Failure Points**: + - DNS lookup failure blocks authentication + - Site unavailable blocks authentication + - Email send failure blocks authentication + - No fallback mechanism (both required) ### Mitigation Strategies -**Email Delivery Reliability**: -```python -# Robust SMTP configuration -SMTP_CONFIG = { - 'host': os.environ['SMTP_HOST'], - 'port': int(os.environ.get('SMTP_PORT', '587')), - 'use_tls': True, # STARTTLS required - 'username': os.environ['SMTP_USERNAME'], - 'password': os.environ['SMTP_PASSWORD'], - 'from_email': os.environ['SMTP_FROM'], - 'timeout': 10, # Fail fast -} +**Clear Setup Instructions**: +```markdown +## Domain Verification Setup -# Comprehensive error handling -try: - send_email(to=email, code=code) -except SMTPException as e: - logger.error(f"Email send failed: {e}") - # Display user-friendly error - raise HTTPException(500, "Email delivery failed. Try again or contact admin.") +Gondulf requires two verifications to prove domain ownership: + +### Step 1: Add DNS TXT Record +Add this DNS record to your domain: +- Type: TXT +- Name: _gondulf.example.com +- Value: verified + +This proves you control DNS for your domain. + +### Step 2: Add rel="me" Link to Your Homepage +Add this HTML to your homepage (e.g., https://example.com/index.html): + + +This declares your email address publicly on your site. + +### Step 3: Verify Email Access +During login: +- We'll discover your email from the rel="me" link +- We'll send a verification code to that email +- Enter the code to complete authentication + +Setup time: ~5 minutes ``` -**Code Security**: +**Robust HTML Parsing**: +```python +from bs4 import BeautifulSoup +import requests + +def discover_email_from_site(domain_url: str) -> Optional[str]: + """ + Fetch site and discover email from rel="me" link. 
+ + Returns: email address or None if not found + """ + try: + # Fetch homepage + response = requests.get(domain_url, timeout=10, allow_redirects=True) + response.raise_for_status() + + # Parse HTML (handle malformed HTML gracefully) + soup = BeautifulSoup(response.content, 'html.parser') + + # Find all rel="me" links + me_links = soup.find_all('link', rel='me') + soup.find_all('a', rel='me') + + # Look for mailto: links + for link in me_links: + href = link.get('href', '') + if href.startswith('mailto:'): + email = href.replace('mailto:', '').strip() + # Validate email format + if validate_email_format(email): + logger.info(f"Discovered email via rel='me' for {domain_url}") + return email + + logger.warning(f"No rel='me' mailto: link found for {domain_url}") + return None + + except Exception as e: + logger.error(f"Failed to discover email for {domain_url}: {e}") + return None +``` + +**DNS Verification**: +```python +def verify_dns_txt(domain: str) -> bool: + """ + Verify _gondulf.{domain} TXT record exists. + + Returns: True if verified, False otherwise + """ + try: + import dns.resolver + + # Query multiple resolvers for redundancy + resolvers = ['8.8.8.8', '1.1.1.1'] + verified_count = 0 + + for resolver_ip in resolvers: + resolver = dns.resolver.Resolver() + resolver.nameservers = [resolver_ip] + resolver.timeout = 5 + + answers = resolver.resolve(f'_gondulf.{domain}', 'TXT') + for rdata in answers: + if rdata.to_text().strip('"') == 'verified': + verified_count += 1 + break + + # Require consensus from multiple resolvers + return verified_count >= 2 + + except Exception as e: + logger.warning(f"DNS verification failed for {domain}: {e}") + return False +``` + +**Helpful Error Messages**: +```python +# DNS TXT not found +if not dns_verified: + return ErrorResponse(""" + DNS verification failed. + + Please add this TXT record to your domain: + - Type: TXT + - Name: _gondulf.{domain} + - Value: verified + + DNS changes may take up to 24 hours to propagate. 
+ """) + +# rel="me" not found +if not email_discovered: + return ErrorResponse(""" + Could not find rel="me" link on your site. + + Please add this to your homepage: + + + See: https://indieweb.org/rel-me for more information. + """) + +# Email send failure +if not email_sent: + return ErrorResponse(""" + Failed to send verification code to {email}. + + Please check: + - Email address is correct in your rel="me" link + - Email server is accepting mail + - Check spam/junk folder + """) +``` + +**Code Security** (unchanged): ```python # Sufficient entropy code = ''.join(secrets.choice('0123456789') for _ in range(6)) @@ -209,107 +356,182 @@ code = ''.join(secrets.choice('0123456789') for _ in range(6)) # Rate limiting MAX_ATTEMPTS = 3 # Per email -MAX_CODES = 3 # Per hour per email +MAX_CODES = 3 # Per hour per domain # Expiration CODE_LIFETIME = timedelta(minutes=15) -# Attempt tracking -attempts = code_storage.get_attempts(email) -if attempts >= MAX_ATTEMPTS: - raise HTTPException(429, "Too many attempts. Try again in 15 minutes.") -``` - -**Email Interception**: -```python -# Require TLS for email delivery -smtp.starttls() - -# Clear warning to users -""" -We've sent a verification code to your email. -Only enter this code if you initiated this login. -The code expires in 15 minutes. 
-""" - -# Log suspicious activity -if time_between_send_and_verify < 1_second: - logger.warning(f"Suspiciously fast verification: {domain}") -``` - -**DNS TXT Fast-Path**: -```python -# Check DNS first, skip email if verified -txt_record = dns.query(f'_gondulf.{domain}', 'TXT') -if txt_record == 'verified': - logger.info(f"DNS verification successful: {domain}") - # Use cached verification, skip email - return verified_domain(domain) - -# Fall back to email -logger.info(f"DNS verification not found, using email: {domain}") -return email_verification_flow(domain) -``` - -**User Education**: -```markdown -## Domain Verification - -Gondulf offers two ways to verify domain ownership: - -### Option 1: DNS TXT Record (Recommended) -Add this DNS record to skip email verification: -- Type: TXT -- Name: _gondulf.example.com -- Value: verified - -Benefits: -- Faster authentication (no email required) -- Verify once, use forever -- More secure (DNS control = domain control) - -### Option 2: Email Verification -- Enter an email address at your domain -- We'll send a 6-digit code -- Enter the code to verify - -Benefits: -- No DNS configuration needed -- Works immediately -- Familiar process +# Single-use enforcement +code_storage.mark_used(code_id) ``` ## Implementation -### Email Verification Flow +### Complete Authentication Flow (v1.0.0) ```python from datetime import datetime, timedelta import secrets import smtplib +import requests +import dns.resolver from email.message import EmailMessage +from bs4 import BeautifulSoup +from typing import Optional, Tuple -class EmailVerificationService: +class DomainVerificationService: + """ + Two-factor domain verification: DNS TXT + Email via rel="me" + """ def __init__(self, smtp_config: dict): self.smtp = smtp_config - self.codes = {} # In-memory storage (short-lived) + self.codes = {} # In-memory storage for verification codes - def request_code(self, email: str, domain: str) -> None: + def verify_domain_ownership(self, domain: 
str) -> Tuple[bool, Optional[str], Optional[str]]: """ - Generate and send verification code. + Perform two-factor domain verification. - Raises: - ValueError: If email domain doesn't match requested domain - HTTPException: If rate limit exceeded or email send fails + Returns: (success, email_discovered, error_message) + + Steps: + 1. Verify DNS TXT record + 2. Discover email from rel="me" link + 3. Send verification code to email + 4. User enters code (handled separately) """ - # Validate email matches domain - email_domain = email.split('@')[1].lower() - if email_domain != domain.lower(): - raise ValueError(f"Email must be at {domain}") + # Step 1: Verify DNS TXT record + dns_verified = self._verify_dns_txt(domain) + if not dns_verified: + # f-string so the actual domain is interpolated into the message + return False, None, f"DNS TXT record not found. Please add _gondulf.{domain} = verified" + # Step 2: Discover email from site's rel="me" link + email = self._discover_email_from_site(f"https://{domain}") + if not email: + return False, None, 'No rel="me" mailto: link found on homepage. Please add a rel="me" link with a mailto: href.' + + # Step 3: Generate and send verification code + code_sent = self._send_verification_code(email, domain) + if not code_sent: + return False, email, f"Failed to send verification code to {email}" + + # Return success with discovered email + return True, email, None + + def verify_code(self, email: str, submitted_code: str) -> Tuple[bool, str]: + """ + Verify submitted code. + + Returns: (success, domain or error_message) + """ + code_data = self.codes.get(email) + + if not code_data: + return False, "No verification code found. Please request a new code." + + # Check expiration + if datetime.utcnow() > code_data['expires_at']: + del self.codes[email] + return False, "Code expired. Please request a new code." + + # Check attempts + code_data['attempts'] += 1 + if code_data['attempts'] > 3: + del self.codes[email] + return False, "Too many attempts. Please restart authentication."
+ + # Verify code (constant-time comparison) + if not secrets.compare_digest(submitted_code, code_data['code']): + return False, "Invalid code. Please try again." + + # Success: Clean up and return domain + domain = code_data['domain'] + del self.codes[email] # Single-use code + + logger.info(f"Domain verified: {domain} (DNS + Email)") + return True, domain + + def _verify_dns_txt(self, domain: str) -> bool: + """ + Verify _gondulf.{domain} TXT record exists with value 'verified'. + + Returns: True if verified, False otherwise + """ + record_name = f'_gondulf.{domain}' + + # Use multiple resolvers for redundancy + resolvers = ['8.8.8.8', '1.1.1.1'] + verified_count = 0 + + for resolver_ip in resolvers: + try: + resolver = dns.resolver.Resolver() + resolver.nameservers = [resolver_ip] + resolver.timeout = 5 + + answers = resolver.resolve(record_name, 'TXT') + + for rdata in answers: + if rdata.to_text().strip('"') == 'verified': + verified_count += 1 + break + + except Exception as e: + logger.debug(f"DNS query failed (resolver {resolver_ip}): {e}") + continue + + # Require consensus from at least 2 resolvers + if verified_count >= 2: + logger.info(f"DNS TXT verified: {domain}") + return True + + logger.warning(f"DNS TXT verification failed: {domain}") + return False + + def _discover_email_from_site(self, domain_url: str) -> Optional[str]: + """ + Fetch domain homepage and discover email from rel="me" link. 
+ + Returns: email address or None if not found + """ + try: + # Fetch homepage + response = requests.get(domain_url, timeout=10, allow_redirects=True) + response.raise_for_status() + + # Parse HTML (BeautifulSoup handles malformed HTML) + soup = BeautifulSoup(response.content, 'html.parser') + + # Find all rel="me" links (both <link> and <a>) + me_links = soup.find_all('link', rel='me') + soup.find_all('a', rel='me') + + # Look for mailto: links + for link in me_links: + href = link.get('href', '') + if href.startswith('mailto:'): + email = href.replace('mailto:', '').strip() + + # Basic email validation + if '@' in email and '.' in email.split('@')[1]: + logger.info(f"Discovered email via rel='me': {domain_url}") + return email + + logger.warning(f"No rel='me' mailto: link found: {domain_url}") + return None + + except Exception as e: + logger.error(f"Failed to discover email for {domain_url}: {e}") + return None + + def _send_verification_code(self, email: str, domain: str) -> bool: + """ + Generate and send verification code to email. + + Returns: True if sent successfully, False otherwise + """ # Check rate limit - if self._is_rate_limited(email): - raise HTTPException(429, "Too many requests.
Try again in 1 hour.") + if self._is_rate_limited(domain): + logger.warning(f"Rate limit exceeded for domain: {domain}") + return False # Generate 6-digit code code = ''.join(secrets.choice('0123456789') for _ in range(6)) @@ -323,56 +545,14 @@ class EmailVerificationService: 'attempts': 0, } - # Send email + # Send email via SMTP try: - self._send_code_email(email, code) - logger.info(f"Verification code sent to {email[:3]}***@{email_domain}") - except Exception as e: - logger.error(f"Failed to send email to {email_domain}: {e}") - raise HTTPException(500, "Email delivery failed") + msg = EmailMessage() + msg['From'] = self.smtp['from_email'] + msg['To'] = email + msg['Subject'] = 'Gondulf Verification Code' - def verify_code(self, email: str, submitted_code: str) -> str: - """ - Verify submitted code. - - Returns: domain if valid - Raises: HTTPException if invalid/expired - """ - code_data = self.codes.get(email) - - if not code_data: - raise HTTPException(400, "No verification code found") - - # Check expiration - if datetime.utcnow() > code_data['expires_at']: - del self.codes[email] - raise HTTPException(400, "Code expired. 
Request a new one.") - - # Check attempts - code_data['attempts'] += 1 - if code_data['attempts'] > 3: - del self.codes[email] - raise HTTPException(429, "Too many attempts") - - # Verify code (constant-time comparison) - if not secrets.compare_digest(submitted_code, code_data['code']): - raise HTTPException(400, "Invalid code") - - # Success: Clean up and return domain - domain = code_data['domain'] - del self.codes[email] # Single-use code - - logger.info(f"Domain verified via email: {domain}") - return domain - - def _send_code_email(self, to: str, code: str) -> None: - """Send verification code via SMTP.""" - msg = EmailMessage() - msg['From'] = self.smtp['from_email'] - msg['To'] = to - msg['Subject'] = 'Gondulf Verification Code' - - msg.set_content(f""" + msg.set_content(f""" Your Gondulf verification code is: {code} @@ -381,96 +561,34 @@ This code expires in 15 minutes. Only enter this code if you initiated this login. If you did not request this code, ignore this email. - """) + """) - with smtplib.SMTP(self.smtp['host'], self.smtp['port'], timeout=10) as smtp: - smtp.starttls() - smtp.login(self.smtp['username'], self.smtp['password']) - smtp.send_message(msg) + with smtplib.SMTP(self.smtp['host'], self.smtp['port'], timeout=10) as smtp: + smtp.starttls() + smtp.login(self.smtp['username'], self.smtp['password']) + smtp.send_message(msg) - def _is_rate_limited(self, email: str) -> bool: - """Check if email is rate limited.""" - # Simple in-memory tracking (for v1.0.0) - # Future: Redis-based rate limiting + logger.info(f"Verification code sent to {email[:3]}***@{email.split('@')[1]}") + return True + + except Exception as e: + logger.error(f"Failed to send email to {email}: {e}") + return False + + def _is_rate_limited(self, domain: str) -> bool: + """ + Check if domain is rate limited (max 3 codes per hour). 
+ + Returns: True if rate limited, False otherwise + """ recent_codes = [ code for code in self.codes.values() - if code.get('email') == email + if code.get('domain') == domain and datetime.utcnow() - code['created_at'] < timedelta(hours=1) ] return len(recent_codes) >= 3 ``` -### DNS TXT Record Verification - -```python -import dns.resolver - -class DNSVerificationService: - def __init__(self, cache_storage): - self.cache = cache_storage - - def verify_domain(self, domain: str) -> bool: - """ - Check if domain has valid DNS TXT record. - - Returns: True if verified, False otherwise - """ - # Check cache first - cached = self.cache.get(domain) - if cached and cached['verified']: - logger.info(f"Using cached DNS verification: {domain}") - return True - - # Query DNS - try: - verified = self._query_txt_record(domain) - - # Cache result - self.cache.set(domain, { - 'verified': verified, - 'verified_at': datetime.utcnow(), - 'method': 'txt_record' - }) - - return verified - - except Exception as e: - logger.warning(f"DNS verification failed for {domain}: {e}") - return False - - def _query_txt_record(self, domain: str) -> bool: - """ - Query _gondulf.{domain} TXT record. 
- - Returns: True if record exists with value 'verified' - """ - record_name = f'_gondulf.{domain}' - - # Use multiple resolvers for redundancy - resolvers = ['8.8.8.8', '1.1.1.1'] - - for resolver_ip in resolvers: - try: - resolver = dns.resolver.Resolver() - resolver.nameservers = [resolver_ip] - resolver.timeout = 5 - resolver.lifetime = 5 - - answers = resolver.resolve(record_name, 'TXT') - - for rdata in answers: - txt_value = rdata.to_text().strip('"') - if txt_value == 'verified': - logger.info(f"DNS TXT verified: {domain} (resolver: {resolver_ip})") - return True - - except Exception as e: - logger.debug(f"DNS query failed (resolver {resolver_ip}): {e}") - continue - - return False -``` - ## Future Enhancements ### v1.1.0+: Additional Authentication Methods @@ -561,13 +679,22 @@ These will be additive (user chooses method), not replacing email. ## References +- IndieWeb rel="me": https://indieweb.org/rel-me +- Example Implementation: https://thesatelliteoflove.com (Phil Skents' identity page) - SMTP Protocol (RFC 5321): https://datatracker.ietf.org/doc/html/rfc5321 - Email Security (STARTTLS): https://datatracker.ietf.org/doc/html/rfc3207 - DNS TXT Records (RFC 1035): https://datatracker.ietf.org/doc/html/rfc1035 +- HTML Link Relations: https://www.w3.org/TR/html5/links.html#linkTypes +- BeautifulSoup (HTML parsing): https://www.crummy.com/software/BeautifulSoup/ - WebAuthn (W3C): https://www.w3.org/TR/webauthn/ (future) ## Decision History -- 2025-11-20: Proposed (Architect) -- 2025-11-20: Accepted (Architect) +- 2025-11-20: Proposed (Architect) - Email primary, DNS optional +- 2025-11-20: Accepted (Architect) - Email primary, DNS optional +- 2025-11-20: **UPDATED** (Architect) - BOTH required (DNS + Email via rel="me") + - Changed from single-factor (email OR DNS) to two-factor (email AND DNS) + - Added rel="me" email discovery (IndieWeb standard) + - Removed user-provided email input (security improvement) + - Enhanced security model with dual 
verification - TBD: Review after v1.0.0 deployment (gather user feedback) diff --git a/docs/decisions/ADR-008-rel-me-email-discovery.md b/docs/decisions/ADR-008-rel-me-email-discovery.md new file mode 100644 index 0000000..5c2cf6b --- /dev/null +++ b/docs/decisions/ADR-008-rel-me-email-discovery.md @@ -0,0 +1,516 @@ +# ADR-008: rel="me" Email Discovery Pattern + +Date: 2025-11-20 + +## Status +Accepted + +## Context + +Gondulf's authentication flow requires email verification as part of two-factor domain verification (see ADR-005). This raises the question: How do we obtain the user's email address? + +### Email Acquisition Methods Evaluated + +**1. User-Provided Email Input** +- User manually enters their email address +- Server validates email domain matches identity domain +- Simple UX pattern (familiar from many sites) + +**2. DNS TXT Record** +- Email address stored in DNS: `_email.example.com` TXT `user@example.com` +- Server queries DNS to discover email +- Requires DNS configuration + +**3. rel="me" Link Discovery (IndieWeb Standard)** +- User publishes email on their site: a `<link rel="me">` tag whose `href` is a `mailto:` URL +- Server fetches site and parses HTML for rel="me" links +- Follows IndieWeb standards for identity claims + +**4. WebFinger Protocol** +- Server queries `/.well-known/webfinger?resource={domain}` +- Standard protocol for identity discovery +- Requires additional endpoint implementation + +### Requirements + +From the user requirements and the IndieAuth ecosystem: +- **Security**: Prevent social engineering and email spoofing +- **Simplicity**: Keep v1.0.0 implementation straightforward +- **Standards**: Align with IndieWeb/IndieAuth community practices +- **Self-Documenting**: Users should understand what they're publishing + +### IndieWeb Context + +The IndieWeb community uses `rel="me"` as a standard way to assert identity relationships: +- Users publish rel="me" links on their homepage to various profiles (GitHub, Twitter, email, etc.)
+- Other tools can discover these relationships by parsing the page +- Well-established pattern in the IndieWeb ecosystem +- Reference implementation: https://thesatelliteoflove.com + +## Decision + +**Gondulf v1.0.0 will discover email addresses from rel="me" links published on the user's homepage, following the IndieWeb standard.** + +### Implementation Approach + +1. **Fetch User's Homepage** + - When user initiates authentication with domain (e.g., `https://example.com`) + - Server fetches the homepage over HTTPS + - Timeout: 10 seconds + - Follow redirects (max 5) + - Verify SSL certificate + +2. **Parse HTML for rel="me" Links** + - Use BeautifulSoup for robust HTML parsing (handles malformed HTML) + - Search for `<link rel="me">` tags + - Also check `<a rel="me">` tags + - Extract first matching mailto: link + - Case-insensitive rel attribute matching + +3. **Validate Email Format** + - Basic RFC 5322 format validation + - Length checks (max 254 characters per RFC 5321) + - Format: `user@domain.tld` + +4. **Use Discovered Email** + - Send verification code to discovered email + - Display partially masked email to user: `u***@example.com` + - User cannot modify email (discovered automatically) + +5. **Error Handling** + - If no rel="me" link found: Display setup instructions + - If multiple mailto: links: Use first one + - If site unreachable: Display error with retry option + - If SSL verification fails: Reject (security) + +### Example HTML + +User adds this to their homepage: + +```html + + + + Phil Skents + + + + + + + + +

Phil Skents

+

This is my personal website.

+ + +``` + +Or visible link: + +```html +
Email me +``` + +## Rationale + +### Follows IndieWeb Standards + +**IndieWeb Alignment**: +- rel="me" is the standard way to assert identity in IndieWeb +- Users familiar with IndieAuth likely already have rel="me" configured +- Interoperability with other IndieWeb tools +- Well-documented pattern: https://indieweb.org/rel-me + +**Community Expectations**: +- IndieAuth ecosystem uses rel="me" extensively +- Users understand the pattern +- Existing tutorials and documentation available +- Aligns with decentralized identity principles + +### Security Benefits + +**Prevents Social Engineering**: +- User cannot claim arbitrary email addresses +- Email must be published on the user's own site +- Attacker cannot trick user into entering wrong email +- Self-attested identity (user declares on their domain) + +**Reduces Attack Surface**: +- No user input field for email (no typos, no XSS) +- No email enumeration via guessing +- Email discovery transparent and auditable +- User controls what email is published + +**Transparency**: +- User explicitly publishes email on their site +- Public declaration of email relationship +- User aware they're making email public +- No hidden or implicit email collection + +### Implementation Simplicity + +**Standard Libraries**: +- BeautifulSoup: Robust HTML parsing (handles malformed HTML) +- requests: HTTP client (widely used, well-tested) +- No custom protocols or complex parsing +- Python standard library for email validation + +**Error Handling**: +- Clear error messages with setup instructions +- Graceful degradation (site unavailable, etc.) 
+- Standard HTTP status codes +- No complex state management + +**Testing**: +- Easy to mock HTTP responses +- Straightforward unit tests +- BeautifulSoup handles edge cases (malformed HTML) +- No external service dependencies + +### User Experience + +**Self-Documenting**: +- User adds one HTML tag to their site +- Clear relationship between domain and email +- User understands what they're publishing +- No hidden configuration + +**Familiar Pattern**: +- Similar to verifying site ownership (Google Search Console, etc.) +- Adding meta tags is common web practice +- Many users already have rel="me" for other purposes +- Works with static sites (no backend required) + +**Setup Time**: +- ~1 minute to add link tag +- No waiting (unlike DNS propagation) +- Immediate verification possible +- Can be combined with other rel="me" links + +## Consequences + +### Positive Consequences + +1. **IndieWeb Standard Compliance**: + - Follows established rel="me" pattern + - Interoperability with IndieWeb tools + - Community-vetted approach + - Well-documented standard + +2. **Enhanced Security**: + - No user-provided email input (prevents social engineering) + - Email explicitly published by user + - Transparent and auditable + - Reduces phishing risk + +3. **Implementation Simplicity**: + - Standard libraries (BeautifulSoup, requests) + - No complex protocols + - Easy to test and maintain + - Handles malformed HTML gracefully + +4. **User Control**: + - User explicitly declares email on their site + - Can change email by updating HTML + - No hidden email collection + - User aware of public email + +5. **Flexibility**: + - Works with static sites (no backend needed) + - Can use any email provider + - Email can be at different domain (e.g., Gmail) + - Supports multiple rel="me" links + +### Negative Consequences + +1. 
**Public Email Requirement**: + - User must publish email publicly on their site + - Not suitable for users who want private email + - Email harvesters can discover address + - Spam risk (mitigated: users can use spam filters) + +2. **HTML Parsing Complexity**: + - Must handle various HTML formats + - Malformed HTML can cause issues (mitigated: BeautifulSoup) + - Case sensitivity considerations + - Multiple possible HTML structures + +3. **Website Dependency**: + - User's site must be available during authentication + - Site downtime blocks authentication + - No fallback if site unreachable + - Requires HTTPS (not all sites have valid certificates) + +4. **Discovery Failures**: + - User may not have rel="me" configured + - Link may be in wrong format + - Email may be invalid format + - Clear error messages required + +5. **Privacy Considerations**: + - Email addresses visible to anyone + - Cannot use email verification without public disclosure + - Users must accept public email + - May deter privacy-conscious users + +### Mitigation Strategies + +**For Public Email Concern**: +- Document clearly that email will be public +- Suggest using dedicated email for IndieAuth +- Recommend spam filtering +- Note: Email is user's choice (they publish it) + +**For HTML Parsing**: +```python +from bs4 import BeautifulSoup + +# BeautifulSoup handles malformed HTML gracefully +soup = BeautifulSoup(html_content, 'html.parser') + +# Case-insensitive attribute matching +me_links = soup.find_all('link', rel='me') + soup.find_all('a', rel='me') + +# Multiple link formats supported +# +# Email +``` + +**For Website Dependency**: +- Clear error messages with retry option +- Suggest checking site availability +- Timeout limits (10 seconds) +- Log errors for debugging + +**For Discovery Failures**: +```markdown +Error: No rel="me" email link found + +Please add this to your homepage: + + +See: https://indieweb.org/rel-me for more information. 
+``` + +## Implementation + +### Email Discovery Service + +```python +from bs4 import BeautifulSoup +import logging +import requests +from typing import Optional +import re + +logger = logging.getLogger(__name__) + +class RelMeEmailDiscovery: + """ + Discover email addresses from rel="me" links on user's homepage. + """ + + def discover_email(self, domain: str) -> Optional[str]: + """ + Fetch domain homepage and discover email from rel="me" link. + + Args: + domain: User's domain (e.g., "example.com") + + Returns: + Email address or None if not found + """ + url = f"https://{domain}" + + try: + # Fetch homepage with safety limits. + # max_redirects is a Session attribute, not a requests.get() argument. + with requests.Session() as session: + session.max_redirects = 5 + response = session.get( + url, + timeout=10, + allow_redirects=True, + verify=True # Verify SSL certificate + ) + response.raise_for_status() + + # Parse HTML (handles malformed HTML) + soup = BeautifulSoup(response.content, 'html.parser') + + # Find all rel="me" links + # Both <link> and <a> tags supported + me_links = soup.find_all('link', rel='me') + soup.find_all('a', rel='me') + + # Look for mailto: links + for link in me_links: + href = link.get('href', '') + if href.startswith('mailto:'): + email = href.replace('mailto:', '').strip() + + # Validate email format + if self._validate_email_format(email): + logger.info(f"Discovered email via rel='me' for {domain}") + return email + + logger.warning(f"No rel='me' mailto: link found on {domain}") + return None + + except requests.exceptions.SSLError as e: + logger.error(f"SSL verification failed for {domain}: {e}") + return None + except requests.exceptions.Timeout: + logger.error(f"Timeout fetching {domain}") + return None + except requests.exceptions.HTTPError as e: + logger.error(f"HTTP error fetching {domain}: {e}") + return None + except Exception as e: + logger.error(f"Failed to discover email for {domain}: {e}") + return None + + def _validate_email_format(self, email: str) -> bool: + """ + Validate email address format.
+ + Args: + email: Email address to validate + + Returns: + True if valid format, False otherwise + """ + # Basic RFC 5322 format check + email_regex = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$' + if not re.match(email_regex, email): + return False + + # Length check (RFC 5321) + if len(email) > 254: + return False + + # Must have exactly one @ + if email.count('@') != 1: + return False + + return True +``` + +### Error Messages + +```python +# DNS TXT found, but no rel="me" link +error_message = """ +Domain verified via DNS, but no email found on your site. + +Please add this to your homepage: + + +This allows us to discover your email address automatically. + +Learn more: https://indieweb.org/rel-me +""" + +# Site unreachable +error_message = """ +Could not fetch your site at https://{domain} + +Please check: +- Site is accessible via HTTPS +- SSL certificate is valid +- No firewall blocking requests + +Try again once your site is accessible. +""" + +# Invalid email format in rel="me" +error_message = """ +Found rel="me" link, but email format is invalid: {email} + +Please check your rel="me" link uses valid email format: + +""" +``` + +## Alternatives Considered + +### Alternative 1: User-Provided Email Input + +**Pros**: +- Simpler implementation (no HTTP fetch, no parsing) +- Works even if site is down +- User can use private email (not public) +- Immediate (no HTTP round-trip) + +**Cons**: +- Social engineering risk (attacker tricks user into entering wrong email) +- Typo risk (user enters incorrect email) +- No self-attestation (email not on user's site) +- Not aligned with IndieWeb standards + +**Rejected**: Security risks outweigh simplicity benefits. rel="me" provides self-attestation and prevents social engineering. 
+ +--- + +### Alternative 2: DNS TXT Record for Email + +**Pros**: +- Stronger proof of domain control (DNS) +- No website dependency +- Machine-readable format +- Fast lookups (DNS cache) + +**Cons**: +- Requires DNS configuration (more complex than HTML) +- DNS propagation delays (can be hours) +- Not user-friendly for non-technical users +- Not standard IndieWeb practice + +**Rejected**: DNS configuration is more complex than adding HTML tag. rel="me" is more aligned with IndieWeb standards. + +--- + +### Alternative 3: WebFinger Protocol + +**Pros**: +- Standard protocol (RFC 7033) +- Machine-readable format (JSON) +- Supports multiple identities +- Well-defined spec + +**Cons**: +- Requires server-side endpoint (not for static sites) +- More complex implementation +- Not common in IndieWeb ecosystem +- Overkill for email discovery + +**Rejected**: Too complex for v1.0.0 MVP. Doesn't work with static sites. rel="me" is simpler and more aligned with IndieWeb. + +--- + +### Alternative 4: Well-Known URI + +**Pros**: +- Standard approach (`/.well-known/email`) +- Simple file-based implementation +- No HTML parsing required +- Fast lookups + +**Cons**: +- Not an established standard for email +- Requires server configuration +- Not aligned with IndieWeb practices +- Duplicate effort (rel="me" already exists) + +**Rejected**: Not standard practice. rel="me" is already established in IndieWeb ecosystem. 
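For comparison with the alternatives above, the parsing step of the chosen rel="me" approach is small. The sketch below is not the BeautifulSoup-based service described in this ADR: it swaps in the standard library's `html.parser` so it runs without third-party packages. The names `RelMeMailtoParser` and `first_rel_me_email` and the `example.com` addresses are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import unquote, urlsplit


class RelMeMailtoParser(HTMLParser):
    """Collect mailto: targets from <link> and <a> tags carrying rel="me"."""

    def __init__(self) -> None:
        super().__init__()
        self.emails: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag not in ("link", "a"):
            return
        attr = {name: (value or "") for name, value in attrs}
        # rel is a space-separated token list; compare tokens case-insensitively.
        rel_tokens = attr.get("rel", "").lower().split()
        href = attr.get("href", "").strip()
        if "me" in rel_tokens and href.lower().startswith("mailto:"):
            # urlsplit separates the address from any ?subject=... query.
            address = unquote(urlsplit(href).path).strip()
            if address:
                self.emails.append(address)


def first_rel_me_email(html: str) -> Optional[str]:
    """Return the first rel="me" mailto: address in the page, or None."""
    parser = RelMeMailtoParser()
    parser.feed(html)
    return parser.emails[0] if parser.emails else None
```

Format validation and the HTTPS fetch are deliberately left out; in the ADR those belong to the discovery service itself.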
+ +## References + +- IndieWeb rel="me": https://indieweb.org/rel-me +- Example Implementation: https://thesatelliteoflove.com (Phil Skents' identity page) +- HTML Link Relations (W3C): https://www.w3.org/TR/html5/links.html#linkTypes +- BeautifulSoup Documentation: https://www.crummy.com/software/BeautifulSoup/ +- RFC 5322 (Email Format): https://datatracker.ietf.org/doc/html/rfc5322 +- RFC 5321 (SMTP): https://datatracker.ietf.org/doc/html/rfc5321 +- WebFinger (RFC 7033): https://datatracker.ietf.org/doc/html/rfc7033 (alternative considered) + +## Decision History + +- 2025-11-20: Proposed (Architect) +- 2025-11-20: Accepted (Architect) +- Related to ADR-005 (Two-Factor Domain Verification) diff --git a/docs/designs/phase-2-domain-verification.md b/docs/designs/phase-2-domain-verification.md new file mode 100644 index 0000000..71cd7f3 --- /dev/null +++ b/docs/designs/phase-2-domain-verification.md @@ -0,0 +1,2559 @@ +# Phase 2 Design: Domain Verification & Authorization Endpoint + +**Date**: 2025-11-20 +**Architect**: Claude (Architect Agent) +**Status**: Ready for Implementation +**Design Version**: 1.0 + +## Overview + +### What Phase 2 Builds + +Phase 2 implements the complete two-factor domain verification flow and the IndieAuth authorization endpoint, building on Phase 1's foundational services. + +**Core Functionality**: +1. HTML fetching service to retrieve user's homepage +2. rel="me" email discovery service to parse HTML for email links +3. Domain verification service to orchestrate two-factor verification (DNS TXT + Email) +4. HTTP endpoints for verification flow +5. Authorization endpoint to start IndieAuth authentication flow + +**Connection to IndieAuth Protocol**: Phase 2 implements steps 1-7 of the IndieAuth authorization flow (see `/docs/architecture/indieauth-protocol.md` lines 165-174), completing the domain verification and authorization code generation. 
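The order in which the core pieces listed above could be combined is sketched here. This is a minimal illustration under stated assumptions, not the design's actual service: `VerificationStart`, `start_verification`, and the four injected callables are hypothetical names standing in for the DNS service, the HTML fetcher, the rel="me" discovery service, and the email service.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VerificationStart:
    success: bool
    email: Optional[str] = None
    error: Optional[str] = None


def start_verification(
    domain: str,
    check_dns: Callable[[str], bool],
    fetch_html: Callable[[str], Optional[str]],
    discover_email: Callable[[str], Optional[str]],
    send_code: Callable[[str, str], bool],
) -> VerificationStart:
    # Factor 1: the DNS TXT record must verify before anything else happens.
    if not check_dns(domain):
        return VerificationStart(False, error=f"DNS TXT record _gondulf.{domain} not verified")

    # Factor 2, part one: fetch the homepage and look for a rel="me" mailto: link.
    html = fetch_html(domain)
    if html is None:
        return VerificationStart(False, error=f"Could not fetch https://{domain}")

    email = discover_email(html)
    if email is None:
        return VerificationStart(False, error='No rel="me" mailto: link found')

    # Factor 2, part two: send the code; the user submits it in a later request.
    if not send_code(email, domain):
        return VerificationStart(False, email=email, error="Failed to send verification code")

    return VerificationStart(True, email=email)
```

Passing the services in as callables keeps the ordering testable without network access, DNS, or SMTP.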
+ +**Connection to Phase 1**: Phase 2 uses all Phase 1 services: +- Configuration (SMTP, DNS, database settings) +- Database (to store verified domains) +- In-memory storage (for authorization codes) +- Email service (to send verification codes) +- DNS service (to verify TXT records) +- Logging (structured logging throughout) + +### Authentication Security Model + +Per ADR-005 and ADR-008, Phase 2 implements two-factor domain verification: + +**Factor 1: DNS TXT Record** (proves DNS control) +- Required: `_gondulf.{domain}` TXT record = `verified` +- Verified via Phase 1 DNS service +- Consensus from multiple resolvers + +**Factor 2: Email Verification via rel="me"** (proves email control) +- Discover email from `<link rel="me">` on user's site +- Send 6-digit code to discovered email +- User enters code to complete verification + +**Combined Security**: Attacker must compromise BOTH DNS and email to authenticate fraudulently. + +## Components + +### 1. HTML Fetching Service + +**File**: `src/gondulf/html_fetcher.py` + +**Purpose**: Fetch user's homepage over HTTPS to discover rel="me" links. + +**Public Interface**: + +```python +from typing import Optional +import requests + +class HTMLFetcherService: + """ + Fetch user's homepage over HTTPS with security safeguards. + """ + + def __init__( + self, + timeout: int = 10, + max_redirects: int = 5, + max_size: int = 5 * 1024 * 1024 # 5MB + ): + """ + Initialize HTML fetcher service. + + Args: + timeout: HTTP request timeout in seconds (default: 10) + max_redirects: Maximum redirects to follow (default: 5) + max_size: Maximum response size in bytes (default: 5MB) + """ + self.timeout = timeout + self.max_redirects = max_redirects + self.max_size = max_size + + def fetch_site(self, domain: str) -> Optional[str]: + """ + Fetch site HTML content over HTTPS.
+ + Args: + domain: Domain to fetch (e.g., "example.com") + + Returns: + HTML content as string, or None if fetch fails + + Raises: + No exceptions raised - all errors logged and None returned + """ +``` + +**Implementation Details**: + +```python +def fetch_site(self, domain: str) -> Optional[str]: + """Fetch site HTML content over HTTPS.""" + url = f"https://{domain}" + + try: + # Fetch with security limits. max_redirects is a Session attribute, + # not a requests.get() argument, so a Session is used here. + with requests.Session() as session: + session.max_redirects = self.max_redirects + response = session.get( + url, + timeout=self.timeout, + allow_redirects=True, + verify=True, # SECURITY: Enforce SSL certificate verification + stream=True, # Defer the body download so its size can be capped + headers={ + 'User-Agent': 'Gondulf/1.0.0 IndieAuth (+https://github.com/yourusername/gondulf)' + } + ) + response.raise_for_status() + + # SECURITY: Reject early when the declared size exceeds the limit + content_length = int(response.headers.get('Content-Length', 0)) + if content_length > self.max_size: + logger.warning(f"Response too large for {domain}: {content_length} bytes") + return None + + # Content-Length may be absent or wrong, so read the body in bounded chunks + body = b'' + for chunk in response.iter_content(chunk_size=8192): + body += chunk + if len(body) > self.max_size: + logger.warning(f"Response content too large for {domain}: more than {self.max_size} bytes") + return None + + logger.info(f"Successfully fetched {domain}: {len(body)} bytes") + return body.decode(response.encoding or 'utf-8', errors='replace') + + except requests.exceptions.SSLError as e: + logger.error(f"SSL verification failed for {domain}: {e}") + return None + except requests.exceptions.Timeout: + logger.error(f"Timeout fetching {domain} after {self.timeout}s") + return None + except requests.exceptions.TooManyRedirects: + logger.error(f"Too many redirects for {domain}") + return None + except requests.exceptions.HTTPError as e: + logger.error(f"HTTP error fetching {domain}: {e}") + return None + except Exception as e: + logger.error(f"Unexpected error fetching {domain}: {e}") + return None +``` + +**Dependencies**: +- `requests` library (already in pyproject.toml) +- Python standard library: typing +- Phase 1 logging
configuration + +**Error Handling**: +- SSL verification failure: Log error, return None (security: reject invalid certificates) +- Timeout: Log error, return None (configurable timeout via __init__) +- HTTP errors (404, 500, etc.): Log error with status code, return None +- Size limit exceeded: Log warning, return None (prevent DoS) +- Too many redirects: Log error, return None (prevent redirect loops) +- Generic exceptions: Log error, return None (fail-safe) + +**Security Considerations**: +- HTTPS only (hardcoded in URL) +- SSL certificate verification enforced (verify=True, cannot be disabled) +- Response size limit (5MB default, configurable) +- Timeout to prevent hanging (10s default, configurable) +- Redirect limit (5 max, configurable) +- User-Agent header identifies Gondulf for server logs + +**Testing Requirements**: +- ✅ Successful HTTPS fetch returns HTML content +- ✅ SSL verification failure returns None +- ✅ Timeout returns None +- ✅ HTTP error codes (404, 500) return None +- ✅ Redirects followed (up to max_redirects) +- ✅ Too many redirects returns None +- ✅ Content-Length exceeds max_size returns None +- ✅ Actual content exceeds max_size returns None +- ✅ Custom User-Agent sent in request + +--- + +### 2. rel="me" Email Discovery Service + +**File**: `src/gondulf/relme.py` + +**Purpose**: Parse HTML to discover email addresses from rel="me" links following IndieWeb standards. + +**Public Interface**: + +```python +from typing import Optional +from bs4 import BeautifulSoup +import re + +class RelMeDiscoveryService: + """ + Discover email addresses from rel="me" links in HTML. + + Follows IndieWeb rel="me" standard: https://indieweb.org/rel-me + """ + + def discover_email(self, html_content: str) -> Optional[str]: + """ + Parse HTML and discover email from rel="me" link. 
+ + Args: + html_content: HTML content as string + + Returns: + Email address or None if not found + + Raises: + No exceptions raised - all errors logged and None returned + """ + + def validate_email_format(self, email: str) -> bool: + """ + Validate email address format (RFC 5322 simplified). + + Args: + email: Email address to validate + + Returns: + True if valid format, False otherwise + """ +``` + +**Implementation Details**: + +```python +def discover_email(self, html_content: str) -> Optional[str]: + """Parse HTML and discover email from rel='me' link.""" + try: + # Parse HTML (BeautifulSoup handles malformed HTML gracefully) + soup = BeautifulSoup(html_content, 'html.parser') + + # Find all rel="me" links - both <link> and <a> tags + # rel is treated as a multi-valued attribute, so rel="me nofollow" also matches + me_links = soup.find_all('link', rel='me') + soup.find_all('a', rel='me') + + # Look for mailto: links + for link in me_links: + href = link.get('href', '') + if href.startswith('mailto:'): + # Extract email from mailto: URL (strip only the leading scheme) + email = href[len('mailto:'):].strip() + + # Remove query parameters if present (e.g., mailto:user@example.com?subject=Hello) + if '?' 
in email: + email = email.split('?')[0] + + # Validate email format + if self.validate_email_format(email): + logger.info(f"Discovered email via rel='me': {email[:3]}***@{email.split('@')[1]}") + return email + else: + logger.warning(f"Found rel='me' mailto link with invalid email format: {email}") + + logger.warning("No rel='me' mailto: link found in HTML") + return None + + except Exception as e: + logger.error(f"Failed to parse HTML for rel='me' links: {e}") + return None + +def validate_email_format(self, email: str) -> bool: + """Validate email address format (RFC 5322 simplified).""" + # Basic format validation + email_regex = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$' + + if not re.match(email_regex, email): + return False + + # Length check (RFC 5321 maximum) + if len(email) > 254: + return False + + # Must have exactly one @ + if email.count('@') != 1: + return False + + # Domain must have at least one dot + local, domain = email.split('@') + if '.' 
not in domain: + return False + + return True +``` + +**Dependencies**: +- `beautifulsoup4>=4.12.0` (NEW - add to pyproject.toml) +- `html.parser` (Python standard library, used by BeautifulSoup) +- `re` (Python standard library) +- Phase 1 logging configuration + +**Error Handling**: +- Malformed HTML: BeautifulSoup handles gracefully, continues parsing +- Missing rel="me" links: Log warning, return None +- Invalid email format in link: Log warning, skip link, continue searching +- Multiple rel="me" mailto links: Return first valid one +- Empty href attribute: Skip link, continue searching +- Exception during parsing: Log error, return None + +**Security Considerations**: +- No script execution: BeautifulSoup only extracts attributes, never executes JavaScript +- Email validation: Strict format checking prevents injection +- Link extraction only: No rendering or evaluation of HTML +- Partial masking in logs: Only log first 3 chars of email (privacy) + +**Testing Requirements**: +- ✅ Discovery from `<link rel="me">` 
tag +- ✅ Discovery from `<a rel="me">` tag +- ✅ Multiple rel="me" links: select first mailto +- ✅ Malformed HTML handled gracefully +- ✅ Missing rel="me" links returns None +- ✅ Invalid email format in link returns None (but logs warning) +- ✅ Empty href returns None +- ✅ Non-mailto rel="me" links ignored (e.g., https:// links) +- ✅ mailto with query parameters (e.g., ?subject=Hi) strips params +- ✅ Email validation: valid formats accepted +- ✅ Email validation: invalid formats rejected (no @, no domain, too long, etc.) + +--- + +### 3. Domain Verification Service + +**File**: `src/gondulf/domain_verification.py` + +**Purpose**: Orchestrate two-factor domain verification (DNS TXT + Email via rel="me"). + +**Public Interface**: + +```python +from typing import Tuple, Optional +from .dns import DNSService +from .html_fetcher import HTMLFetcherService +from .relme import RelMeDiscoveryService +from .email import EmailService +from .storage import CodeStorage +from .database.connection import DatabaseConnection +import secrets + +class DomainVerificationService: + """ + Two-factor domain verification service. + + Verifies domain ownership through: + 1. DNS TXT record verification (_gondulf.{domain} = verified) + 2. Email verification via rel="me" discovery + """ + + def __init__( + self, + dns_service: DNSService, + html_fetcher: HTMLFetcherService, + relme_discovery: RelMeDiscoveryService, + email_service: EmailService, + code_storage: CodeStorage, + database: DatabaseConnection, + code_ttl: int = 900 # 15 minutes + ): + """ + Initialize domain verification service. 
+ + Args: + dns_service: DNS service for TXT record verification + html_fetcher: HTML fetcher service + relme_discovery: rel="me" email discovery service + email_service: Email service for sending codes + code_storage: In-memory storage for verification codes + database: Database connection for storing verified domains + code_ttl: Verification code TTL in seconds (default: 900 = 15 min) + """ + + def start_verification(self, domain: str) -> Tuple[bool, Optional[str], Optional[str]]: + """ + Start domain verification process. + + Steps: + 1. Verify DNS TXT record exists + 2. Fetch user's homepage + 3. Discover email from rel="me" link + 4. Generate and send verification code + + Args: + domain: Domain to verify (e.g., "example.com") + + Returns: + Tuple of (success, discovered_email_masked, error_message) + - success: True if code sent, False if verification cannot start + - discovered_email_masked: Email with partial masking (e.g., "u***@example.com") + - error_message: Error description if success=False, None otherwise + """ + + def verify_code(self, email: str, submitted_code: str) -> Tuple[bool, Optional[str], Optional[str]]: + """ + Verify submitted code. + + Args: + email: Email address (discovered from rel="me") + submitted_code: 6-digit code entered by user + + Returns: + Tuple of (success, domain, error_message) + - success: True if code valid, False otherwise + - domain: User's verified domain if success=True + - error_message: Error description if success=False + """ + + def is_domain_verified(self, domain: str) -> bool: + """ + Check if domain is already verified (cached in database). 
+ + Args: + domain: Domain to check + + Returns: + True if domain previously verified, False otherwise + """ +``` + +**Implementation Details**: + +```python +def start_verification(self, domain: str) -> Tuple[bool, Optional[str], Optional[str]]: + """Start domain verification process.""" + logger.info(f"Starting domain verification: {domain}") + + # Step 1: Verify DNS TXT record (first factor) + logger.debug(f"Verifying DNS TXT record for {domain}") + dns_verified = self.dns_service.verify_txt_record(domain, "verified") + + if not dns_verified: + error = ( + f"DNS verification failed. TXT record not found for _gondulf.{domain}. " + f"Please add: Type=TXT, Name=_gondulf.{domain}, Value=verified" + ) + logger.warning(f"DNS verification failed: {domain}") + return False, None, error + + logger.info(f"DNS TXT record verified: {domain}") + + # Step 2: Fetch site homepage + logger.debug(f"Fetching homepage for {domain}") + html = self.html_fetcher.fetch_site(domain) + + if html is None: + error = ( + f"Could not fetch site at https://{domain}. " + f"Please ensure site is accessible via HTTPS with valid SSL certificate." + ) + logger.warning(f"Site fetch failed: {domain}") + return False, None, error + + logger.info(f"Successfully fetched homepage: {domain}") + + # Step 3: Discover email from rel="me" (second factor discovery) + logger.debug(f"Discovering email via rel='me' for {domain}") + email = self.relme_discovery.discover_email(html) + + if email is None: + # HTML example shown to the user; the address is a placeholder + error = ( + 'No rel="me" mailto: link found on homepage. 
' + f'Please add to https://{domain}: ' + '<link rel="me" href="mailto:you@example.com">' + ) + logger.warning(f"rel='me' discovery failed: {domain}") + return False, None, error + + logger.info(f"Email discovered via rel='me' for {domain}: {email[:3]}***") + + # Step 4: Check rate limiting + if self._is_rate_limited(domain): + error = ( + f"Rate limit exceeded for {domain}. " + f"Please wait before requesting another verification code." 
+ ) + logger.warning(f"Rate limit exceeded: {domain}") + # Return the masked email, as documented in the public interface + return False, self._mask_email(email), error + + # Step 5: Generate verification code + code = self._generate_code() + + # Step 6: Store code with metadata + self.code_storage.store(email, code, ttl=self.code_ttl) + + # Store metadata for rate limiting and domain association + self._store_code_metadata(email, domain) + + logger.debug(f"Verification code generated and stored for {email[:3]}***") + + # Step 7: Send verification email (second factor verification) + logger.debug(f"Sending verification email to {email[:3]}***") + email_sent = self.email_service.send_verification_email(email, code) + + if not email_sent: + # Clean up stored code if email fails + self.code_storage.delete(email) + # Never expose the full address to the (unauthenticated) caller + email_masked = self._mask_email(email) + error = ( + f"Failed to send verification code to {email_masked}. " + f"Please check email address in rel='me' link and try again." + ) + logger.error(f"Email send failed: {email[:3]}***") + return False, email_masked, error + + logger.info(f"Verification code sent successfully to {email[:3]}***") + + # Mask email for display: u***@example.com + email_masked = self._mask_email(email) + + return True, email_masked, None + +def verify_code(self, email: str, submitted_code: str) -> Tuple[bool, Optional[str], Optional[str]]: + """Verify submitted code.""" + logger.info(f"Verifying code for {email[:3]}***") + + # Retrieve stored code + stored_code = self.code_storage.get(email) + + if stored_code is None: + logger.warning(f"No verification code found for {email[:3]}***") + return False, None, "No verification code found. Please request a new code." 
+ + # Get code metadata + metadata = self._get_code_metadata(email) + if metadata is None: + logger.error(f"Code found but metadata missing for {email[:3]}***") + return False, None, "Verification error. Please request a new code." 
+ + domain = metadata['domain'] + attempts = metadata.get('attempts', 0) + + # Check attempt limit (prevent brute force) + if attempts >= 3: + logger.warning(f"Too many attempts for {email[:3]}***") + self.code_storage.delete(email) + self._delete_code_metadata(email) + return False, None, "Too many attempts. Please request a new code." + + # Increment attempt counter + self._increment_attempts(email) + + # Verify code using constant-time comparison (SECURITY: prevent timing attacks) + if not secrets.compare_digest(submitted_code, stored_code): + logger.warning(f"Invalid code submitted for {email[:3]}***") + return False, None, f"Invalid code. {3 - attempts - 1} attempts remaining." + + # Code is valid - clean up and mark domain as verified + logger.info(f"Code verified successfully for {domain}") + + self.code_storage.delete(email) + self._delete_code_metadata(email) + + # Store verified domain in database + self._store_verified_domain(domain) + + return True, domain, None + +def is_domain_verified(self, domain: str) -> bool: + """Check if domain already verified.""" + with self.database.get_connection() as conn: + result = conn.execute( + "SELECT verified FROM domains WHERE domain = ?", + (domain,) + ).fetchone() + + if result and result['verified']: + logger.debug(f"Domain already verified: {domain}") + return True + + return False + +def _generate_code(self) -> str: + """Generate 6-digit verification code.""" + return ''.join(secrets.choice('0123456789') for _ in range(6)) + +def _mask_email(self, email: str) -> str: + """Mask email for display: u***@example.com""" + local, domain = email.split('@') + if len(local) <= 1: + return f"{local[0]}***@{domain}" + return f"{local[0]}***@{domain}" + +def _is_rate_limited(self, domain: str) -> bool: + """ + Check if domain is rate limited. + + Rate limit: Max 3 codes per domain per hour. 
+ """ + # TODO: Implement rate limiting using code metadata + # For Phase 2, we'll implement simple in-memory tracking + # Future: Use Redis for distributed rate limiting + return False # Placeholder - implement in actual code + +def _store_code_metadata(self, email: str, domain: str) -> None: + """Store code metadata for rate limiting and domain association.""" + # TODO: Implement metadata storage + # Store: email -> {domain, created_at, attempts} + pass + +def _get_code_metadata(self, email: str) -> Optional[dict]: + """Retrieve code metadata.""" + # TODO: Implement metadata retrieval + # Return: {domain, created_at, attempts} + return {'domain': 'example.com', 'attempts': 0} # Placeholder + +def _delete_code_metadata(self, email: str) -> None: + """Delete code metadata.""" + # TODO: Implement metadata deletion + pass + +def _increment_attempts(self, email: str) -> None: + """Increment attempt counter for email.""" + # TODO: Implement attempt increment + pass + +def _store_verified_domain(self, domain: str) -> None: + """Store verified domain in database.""" + from datetime import datetime + + with self.database.get_connection() as conn: + conn.execute( + """ + INSERT OR REPLACE INTO domains (domain, verification_method, verified, verified_at, last_dns_check) + VALUES (?, ?, ?, ?, ?) 
+ """, + (domain, 'two_factor', True, datetime.utcnow(), datetime.utcnow()) + ) + conn.commit() + + logger.info(f"Domain verification stored in database: {domain}") +``` + +**Dependencies**: +- All Phase 1 services (DNS, Email, Storage, Database) +- HTML fetcher service (Phase 2) +- rel="me" discovery service (Phase 2) +- Python standard library: secrets, datetime + +**Error Handling**: +- DNS verification failure: Return error with setup instructions +- Site fetch failure: Return error with troubleshooting steps +- rel="me" discovery failure: Return error with HTML example +- Email send failure: Return error, clean up stored code +- Code not found: Return error, suggest requesting new code +- Code expired: Handled by CodeStorage TTL +- Too many attempts: Return error, invalidate code +- Invalid code: Return error with remaining attempts +- Rate limit exceeded: Return error, suggest waiting + +**Security Considerations**: +- Two-factor verification: Both DNS and email required +- Constant-time code comparison: Prevent timing attacks (secrets.compare_digest) +- Rate limiting: Max 3 codes per domain per hour (prevents abuse) +- Attempt limiting: Max 3 code submission attempts (prevents brute force) +- Single-use codes: Deleted after successful verification +- Email masking in logs: Only log partial email (privacy) +- No email storage: Email used only during verification, never persisted + +**Testing Requirements**: +- ✅ Full verification flow: DNS → rel="me" → email → code verification +- ✅ DNS verification failure blocks flow +- ✅ Site fetch failure blocks flow +- ✅ rel="me" discovery failure blocks flow +- ✅ Email send failure cleans up stored code +- ✅ Code verification success stores domain in database +- ✅ Code verification failure decrements remaining attempts +- ✅ Too many attempts invalidates code +- ✅ Invalid code returns error with attempts remaining +- ✅ Code expiration handled by storage layer +- ✅ Rate limiting prevents excessive code requests +- ✅ 
Already verified domain check works +- ✅ Email masking works correctly + +--- + +### 4. Domain Verification Endpoints + +**File**: `src/gondulf/routers/verification.py` + +**Purpose**: HTTP API endpoints for user interaction during verification flow. + +**Public Interface**: + +```python +from fastapi import APIRouter, HTTPException, Depends +from pydantic import BaseModel, Field +from typing import Optional + +router = APIRouter(prefix="/api/verify", tags=["verification"]) + +# Request/Response Models +class VerificationStartRequest(BaseModel): + """Request to start domain verification.""" + domain: str = Field( + ..., + min_length=3, + max_length=253, + description="Domain to verify (e.g., 'example.com')" + ) + +class VerificationStartResponse(BaseModel): + """Response from starting verification.""" + success: bool + email_masked: Optional[str] = Field(None, description="Partially masked email (e.g., 'u***@example.com')") + error: Optional[str] = Field(None, description="Error message if success=False") + +class VerificationCodeRequest(BaseModel): + """Request to verify code.""" + email: str = Field(..., description="Email address discovered from rel='me'") + code: str = Field(..., min_length=6, max_length=6, pattern="^[0-9]{6}$", description="6-digit verification code") + +class VerificationCodeResponse(BaseModel): + """Response from code verification.""" + success: bool + domain: Optional[str] = Field(None, description="Verified domain if success=True") + error: Optional[str] = Field(None, description="Error message if success=False") + +# Endpoints +@router.post("/start", response_model=VerificationStartResponse) +async def start_verification( + request: VerificationStartRequest, + domain_verification: DomainVerificationService = Depends(get_domain_verification_service) +) -> VerificationStartResponse: + """ + Start domain verification process. + + Steps: + 1. Verify DNS TXT record exists + 2. Discover email from rel="me" link + 3. 
Send verification code to discovered email + + Returns masked email on success, error message on failure. + """ + +@router.post("/code", response_model=VerificationCodeResponse) +async def verify_code( + request: VerificationCodeRequest, + domain_verification: DomainVerificationService = Depends(get_domain_verification_service) +) -> VerificationCodeResponse: + """ + Verify submitted code. + + Returns verified domain on success, error message on failure. + """ +``` + +**Implementation Details**: + +```python +@router.post("/start", response_model=VerificationStartResponse) +async def start_verification( + request: VerificationStartRequest, + domain_verification: DomainVerificationService = Depends(get_domain_verification_service) +) -> VerificationStartResponse: + """Start domain verification process.""" + logger.info(f"Verification start request: {request.domain}") + + # Normalize domain (lowercase, remove trailing slash) + domain = request.domain.lower().rstrip('/') + + # Remove protocol if present + if domain.startswith('http://') or domain.startswith('https://'): + domain = domain.split('://', 1)[1] + + # Remove path if present + if '/' in domain: + domain = domain.split('/')[0] + + # Validate domain format (basic validation) + if not domain or '.' not in domain: + logger.warning(f"Invalid domain format: {request.domain}") + return VerificationStartResponse( + success=False, + email_masked=None, + error="Invalid domain format. Please provide a valid domain (e.g., 'example.com')." 
+ ) + + # Start verification + success, email_masked, error = domain_verification.start_verification(domain) + + if not success: + logger.warning(f"Verification start failed for {domain}: {error}") + return VerificationStartResponse( + success=False, + email_masked=email_masked, + error=error + ) + + logger.info(f"Verification started successfully for {domain}") + return VerificationStartResponse( + success=True, + email_masked=email_masked, + error=None + ) + +@router.post("/code", response_model=VerificationCodeResponse) +async def verify_code( + request: VerificationCodeRequest, + domain_verification: DomainVerificationService = Depends(get_domain_verification_service) +) -> VerificationCodeResponse: + """Verify submitted code.""" + logger.info(f"Code verification request for email: {request.email[:3]}***") + + # Verify code + success, domain, error = domain_verification.verify_code(request.email, request.code) + + if not success: + logger.warning(f"Code verification failed for {request.email[:3]}***: {error}") + return VerificationCodeResponse( + success=False, + domain=None, + error=error + ) + + logger.info(f"Code verified successfully for domain: {domain}") + return VerificationCodeResponse( + success=True, + domain=domain, + error=None + ) +``` + +**Dependencies**: +- FastAPI router and dependency injection +- Pydantic models for request/response validation +- Domain verification service (injected via Depends) +- Phase 1 logging configuration + +**Error Handling**: +- Invalid domain format: Return 200 with success=False, descriptive error +- Pydantic validation errors: Automatic 422 response with validation details +- Service errors: Propagated via success=False in response +- All errors logged at WARNING level +- No 500 errors expected (all errors handled gracefully) + +**Security Considerations**: +- Input validation: Pydantic models enforce constraints +- Domain normalization: Prevent URL injection +- No authentication required: Public endpoints 
(verification is the authentication) +- Rate limiting: Handled by DomainVerificationService (not endpoint level) +- Email not validated at endpoint level: Service handles validation + +**Testing Requirements**: +- ✅ POST /api/verify/start with valid domain returns success +- ✅ POST /api/verify/start with invalid domain format returns error +- ✅ POST /api/verify/start with DNS failure returns error +- ✅ POST /api/verify/start with rel="me" failure returns error +- ✅ POST /api/verify/start with email send failure returns error +- ✅ POST /api/verify/code with valid code returns domain +- ✅ POST /api/verify/code with invalid code returns error +- ✅ POST /api/verify/code with expired code returns error +- ✅ POST /api/verify/code with missing code returns error +- ✅ POST /api/verify/code with too many attempts returns error +- ✅ Pydantic validation errors return 422 + +--- + +### 5. Authorization Endpoint + +**File**: `src/gondulf/routers/authorization.py` + +**Purpose**: Implement IndieAuth authorization endpoint (`/authorize`) per W3C spec. + +**Public Interface**: + +```python +from fastapi import APIRouter, Request, HTTPException, Depends +from fastapi.responses import RedirectResponse, HTMLResponse +from pydantic import BaseModel, HttpUrl, Field +from typing import Optional, Literal + +router = APIRouter(tags=["indieauth"]) + +# Request Models +class AuthorizeRequest(BaseModel): + """ + IndieAuth authorization request parameters. 
+ + Per W3C IndieAuth specification (Section 5.1): + https://www.w3.org/TR/indieauth/#authorization-request + """ + me: HttpUrl = Field(..., description="User's profile URL (domain identity)") + client_id: HttpUrl = Field(..., description="Client application URL") + redirect_uri: HttpUrl = Field(..., description="Where to redirect after authorization") + state: str = Field(..., min_length=1, max_length=512, description="CSRF protection token") + response_type: Literal["code"] = Field(..., description="Must be 'code' for authorization code flow") + scope: Optional[str] = Field(None, description="Requested scopes (ignored in v1.0.0)") + code_challenge: Optional[str] = Field(None, description="PKCE challenge (not supported in v1.0.0)") + code_challenge_method: Optional[str] = Field(None, description="PKCE method (not supported in v1.0.0)") + +# Endpoints +@router.get("/authorize") +async def authorize( + request: Request, + me: str, + client_id: str, + redirect_uri: str, + state: str, + response_type: str, + scope: Optional[str] = None, + code_challenge: Optional[str] = None, + code_challenge_method: Optional[str] = None, + domain_verification: DomainVerificationService = Depends(get_domain_verification_service) +) -> HTMLResponse: + """ + IndieAuth authorization endpoint. + + Per W3C IndieAuth specification: + https://www.w3.org/TR/indieauth/#authorization-request + + Flow: + 1. Validate all parameters + 2. Check if domain already verified (skip verification if cached) + 3. If not verified, initiate two-factor verification flow + 4. Display consent screen with client info + 5. On approval, generate authorization code + 6. Redirect to client with code + state + """ +``` + +**Implementation Details** (High-Level - Full implementation too long for this doc): + +```python +@router.get("/authorize") +async def authorize( + request: Request, + me: str, + client_id: str, + redirect_uri: str, + state: str, + response_type: str, + # ... 
other parameters +) -> HTMLResponse: + """IndieAuth authorization endpoint.""" + + # STEP 1: Validate response_type + if response_type != "code": + # Return error (redirect if possible) + return _error_response( + redirect_uri=redirect_uri, + state=state, + error="unsupported_response_type", + description="Only response_type=code is supported" + ) + + # STEP 2: Validate and normalize 'me' parameter + me_normalized = _validate_and_normalize_me(me) + if me_normalized is None: + return _error_response( + redirect_uri=redirect_uri, + state=state, + error="invalid_request", + description="Invalid 'me' parameter format" + ) + + # STEP 3: Validate client_id + client_valid = _validate_client_id(client_id) + if not client_valid: + return _error_response( + redirect_uri=redirect_uri, + state=state, + error="invalid_client", + description="Invalid client_id" + ) + + # STEP 4: Validate redirect_uri + redirect_valid = _validate_redirect_uri(redirect_uri, client_id) + if not redirect_valid: + # SECURITY: Cannot redirect to invalid URI - display error page + return _error_page("Invalid redirect_uri") + + # STEP 5: Check if domain already verified + domain = _extract_domain_from_me(me_normalized) + + if domain_verification.is_domain_verified(domain): + # Skip verification, go directly to consent + logger.info(f"Domain already verified: {domain}") + return await _show_consent_screen( + me=me_normalized, + client_id=client_id, + redirect_uri=redirect_uri, + state=state + ) + + # STEP 6: Domain not verified - start verification flow + logger.info(f"Starting verification for new domain: {domain}") + + success, email_masked, error = domain_verification.start_verification(domain) + + if not success: + # Verification failed - show error with instructions + return _verification_error_page(domain, error) + + # STEP 7: Show code entry form + return _code_entry_page( + domain=domain, + email_masked=email_masked, + me=me_normalized, + client_id=client_id, + redirect_uri=redirect_uri, + 
state=state + ) + +# Additional endpoints for verification flow +@router.post("/authorize/verify-code") +async def verify_code_and_consent( + request: Request, + email: str, + code: str, + me: str, + client_id: str, + redirect_uri: str, + state: str, + domain_verification: DomainVerificationService = Depends(get_domain_verification_service) +) -> HTMLResponse: + """ + Verify code and show consent screen. + + Called when user submits verification code during authorization flow. + """ + # Verify code + success, domain, error = domain_verification.verify_code(email, code) + + if not success: + # Code invalid - show error, allow retry + return _code_entry_page_with_error( + domain=_extract_domain_from_me(me), + email_masked=_mask_email(email), + error=error, + me=me, + client_id=client_id, + redirect_uri=redirect_uri, + state=state + ) + + # Code valid - show consent screen + return await _show_consent_screen( + me=me, + client_id=client_id, + redirect_uri=redirect_uri, + state=state + ) + +@router.post("/authorize/consent") +async def handle_consent( + request: Request, + action: Literal["approve", "deny"], + me: str, + client_id: str, + redirect_uri: str, + state: str, + code_storage: CodeStorage = Depends(get_code_storage) +) -> RedirectResponse: + """ + Handle user consent decision. + + Called when user approves or denies authorization. 
+ """ + # Local imports keep this excerpt self-contained + import time + from urllib.parse import urlencode + + # Append to any query string already present on redirect_uri + separator = '&' if '?' in redirect_uri else '?' + + if action == "deny": + # User denied - redirect with error (parameters URL-encoded) + params = urlencode({ + 'error': 'access_denied', + 'error_description': 'User denied authorization', + 'state': state + }) + return RedirectResponse( + url=f"{redirect_uri}{separator}{params}", + status_code=302 + ) + + # User approved - generate authorization code + auth_code = _generate_authorization_code() + + # Store code in memory with metadata + # Per the Phase 2 clarifications: `authz:` key prefix, epoch-integer timestamp, 'used' flag + code_storage.store(f"authz:{auth_code}", { + 'me': me, + 'client_id': client_id, + 'redirect_uri': redirect_uri, + 'state': state, + 'created_at': int(time.time()), + 'used': False # consumed by the token endpoint in Phase 3 + }, ttl=600) # 10 minutes + + logger.info(f"Authorization code generated for {me} / {client_id}") + + # Redirect to client with code + state (parameters URL-encoded) + params = urlencode({'code': auth_code, 'state': state}) + return RedirectResponse( + url=f"{redirect_uri}{separator}{params}", + status_code=302 + ) + +# Helper functions (implementations not shown for brevity) +def _validate_and_normalize_me(me: str) -> Optional[str]: + """Validate and normalize 'me' parameter per IndieAuth spec.""" + pass + +def _validate_client_id(client_id: str) -> bool: + """Validate client_id is a valid URL.""" + pass + +def _validate_redirect_uri(redirect_uri: str, client_id: str) -> bool: + """Validate redirect_uri against client_id.""" + pass + +def _extract_domain_from_me(me: str) -> str: + """Extract domain from 'me' URL.""" + pass + +async def _show_consent_screen(...) -> HTMLResponse: + """Render consent screen HTML.""" + pass + +def _code_entry_page(...) 
-> HTMLResponse: + """Render code entry page HTML.""" + pass + +def _error_response(...) 
-> RedirectResponse: + """Generate OAuth 2.0 error redirect.""" + pass + +def _generate_authorization_code() -> str: + """Generate cryptographically secure authorization code.""" + return secrets.token_urlsafe(32) # 256 bits +``` + +**Dependencies**: +- FastAPI router, Request, Response types +- Pydantic models for validation +- Domain verification service (Phase 2) +- Code storage (Phase 1) +- HTML templates (new - Jinja2) +- Python standard library: secrets, datetime + +**Error Handling**: +- Invalid response_type: Redirect with `unsupported_response_type` error +- Invalid me parameter: Redirect with `invalid_request` error +- Invalid client_id: Redirect with `invalid_client` error +- Invalid redirect_uri: Display error page (cannot redirect) +- DNS verification failure: Display error page with setup instructions +- rel="me" discovery failure: Display error page with HTML example +- Email send failure: Display error page with troubleshooting +- Code verification failure: Display code entry page with error, allow retry +- User denies consent: Redirect with `access_denied` error +- All errors follow OAuth 2.0 error response format + +**Security Considerations**: +- HTTPS only: Enforced by middleware (production) +- redirect_uri validation: Prevent open redirect attacks +- State parameter: Passed through, client validates (CSRF protection) +- Authorization code: Cryptographically secure (256 bits) +- Code single-use: Enforced by token endpoint (Phase 3) +- Code expiration: 10 minutes TTL +- Domain verification: Two-factor required before code generation +- No client secrets: All clients are public per IndieAuth spec + +**Testing Requirements**: +- ✅ GET /authorize with valid parameters shows verification or consent +- ✅ GET /authorize with invalid response_type returns error +- ✅ GET /authorize with invalid me parameter returns error +- ✅ GET /authorize with invalid client_id returns error +- ✅ GET /authorize with invalid redirect_uri shows error page +- ✅ GET 
/authorize with already verified domain skips to consent +- ✅ POST /authorize/verify-code with valid code shows consent +- ✅ POST /authorize/verify-code with invalid code shows error +- ✅ POST /authorize/consent with action=approve generates code and redirects +- ✅ POST /authorize/consent with action=deny redirects with access_denied +- ✅ Authorization code stored in memory with correct metadata +- ✅ Authorization code expires after 10 minutes +- ✅ State parameter passed through all steps + +--- + +## Data Flow + +### Complete Two-Factor Verification Flow + +``` +┌─────────────────────────────────────────────────────────────────┐ +│ User / Client Application │ +└───────────────────────────────┬─────────────────────────────────┘ + │ + │ GET /authorize?me=example.com&... + │ + ▼ +┌─────────────────────────────────────────────────────────────────┐ +│ Authorization Endpoint │ +│ ┌──────────────────────────────────────────────────────────┐ │ +│ │ 1. Validate parameters (me, client_id, redirect_uri, │ │ +│ │ state, response_type) │ │ +│ └──────────────────────────┬───────────────────────────────┘ │ +│ │ │ +│ ┌──────────────────────────▼───────────────────────────────┐ │ +│ │ 2. Check if domain already verified in database │ │ +│ └──────────────────────────┬───────────────────────────────┘ │ +│ │ │ +│ ┌────────┴────────┐ │ +│ │ │ │ +│ │ Verified? │ │ +│ │ │ │ +│ ┌─────────┴─────No─────────┴─────────┐ │ +│ │ │ │ +│ │ YES │ NO │ +│ │ │ │ +│ ▼ ▼ │ +│ ┌──────────────────┐ ┌──────────────────────────┐ │ +│ │ Skip to Consent │ │ Start Verification Flow │ │ +│ │ (Step 9) │ │ (Step 3) │ │ +│ └──────────────────┘ └─────────┬────────────────┘ │ +│ │ │ +└───────────────────────────────────────────────┼──────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────┐ +│ Domain Verification Service (Two-Factor) │ +│ ┌──────────────────────────────────────────────────────────┐ │ +│ │ 3. 
Verify DNS TXT Record (First Factor) │ │ +│ │ Query: _gondulf.example.com TXT │ │ +│ │ Expected: "verified" │ │ +│ └──────────────────────────┬───────────────────────────────┘ │ +│ │ │ +│ ┌────────┴────────┐ │ +│ │ TXT found? │ │ +│ ┌─────────┴─────No─────────┴─────────┐ │ +│ │ YES │ NO │ +│ ▼ ▼ │ +│ ┌──────────────────┐ ┌──────────────────────────┐ │ +│ │ Continue to │ │ FAIL: Display error │ │ +│ │ Step 4 │ │ "Add DNS TXT record" │ │ +│ └─────────┬────────┘ └──────────────────────────┘ │ +│ │ │ +│ ┌─────────▼────────────────────────────────────────────────┐ │ +│ │ 4. Fetch User's Homepage via HTTPS │ │ +│ │ URL: https://example.com │ │ +│ │ Timeout: 10s, Max size: 5MB, Verify SSL │ │ +│ └──────────────────────────┬───────────────────────────────┘ │ +│ │ │ +│ ┌────────┴────────┐ │ +│ │ Fetch success? │ │ +│ ┌─────────┴─────No─────────┴─────────┐ │ +│ │ YES │ NO │ +│ ▼ ▼ │ +│ ┌──────────────────┐ ┌──────────────────────────┐ │ +│ │ Continue to │ │ FAIL: Display error │ │ +│ │ Step 5 │ │ "Site unreachable" │ │ +│ └─────────┬────────┘ └──────────────────────────┘ │ +│ │ │ +│ ┌─────────▼────────────────────────────────────────────────┐ │ +│ │ 5. Discover Email via rel="me" (Second Factor Discovery)│ │ +│ │ Parse HTML for: │ │ +│ │ Extract and validate email format │ │ +│ └──────────────────────────┬───────────────────────────────┘ │ +│ │ │ +│ ┌────────┴────────┐ │ +│ │ Email found? │ │ +│ ┌─────────┴─────No─────────┴─────────┐ │ +│ │ YES │ NO │ +│ ▼ ▼ │ +│ ┌──────────────────┐ ┌──────────────────────────┐ │ +│ │ Continue to │ │ FAIL: Display error │ │ +│ │ Step 6 │ │ "Add rel='me' link" │ │ +│ └─────────┬────────┘ └──────────────────────────┘ │ +│ │ │ +│ ┌─────────▼────────────────────────────────────────────────┐ │ +│ │ 6. 
Generate and Send Verification Code │ │ +│ │ (Second Factor Verification) │ │ +│ │ - Generate 6-digit code (cryptographically secure) │ │ +│ │ - Store code in memory (TTL: 15 minutes) │ │ +│ │ - Send code to discovered email via SMTP │ │ +│ └──────────────────────────┬───────────────────────────────┘ │ +│ │ │ +└─────────────────────────────┼────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────┐ +│ Display Code Entry Form │ +│ ┌──────────────────────────────────────────────────────────┐ │ +│ │ "Verification code sent to u***@example.com" │ │ +│ │ [Enter 6-digit code: ______] │ │ +│ │ [Submit] │ │ +│ └──────────────────────────┬───────────────────────────────┘ │ +└─────────────────────────────┼────────────────────────────────────┘ + │ + │ POST /authorize/verify-code + │ {email, code, me, client_id, ...} + │ + ▼ +┌─────────────────────────────────────────────────────────────────┐ +│ Domain Verification Service (Continued) │ +│ ┌──────────────────────────────────────────────────────────┐ │ +│ │ 7. Verify Submitted Code │ │ +│ │ - Retrieve stored code from memory │ │ +│ │ - Check expiration (15 min TTL) │ │ +│ │ - Check attempts (max 3) │ │ +│ │ - Constant-time compare submitted vs stored │ │ +│ └──────────────────────────┬───────────────────────────────┘ │ +│ │ │ +│ ┌────────┴────────┐ │ +│ │ Code valid? │ │ +│ ┌─────────┴─────No─────────┴─────────┐ │ +│ │ YES │ NO │ +│ ▼ ▼ │ +│ ┌──────────────────┐ ┌──────────────────────────┐ │ +│ │ Store verified │ │ Show error, allow retry │ │ +│ │ domain in DB │ │ (if attempts remaining) │ │ +│ └─────────┬────────┘ └──────────────────────────┘ │ +│ │ │ +│ ┌─────────▼────────────────────────────────────────────────┐ │ +│ │ 8. 
Domain Verified (Two-Factor Complete) │ │ +│ │ - DNS TXT verified ✓ │ │ +│ │ - Email verified ✓ │ │ +│ │ - Store in database: verification_method='two_factor' │ │ +│ └──────────────────────────┬───────────────────────────────┘ │ +└─────────────────────────────┼────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────┐ +│ Display Consent Screen │ +│ ┌──────────────────────────────────────────────────────────┐ │ +│ │ "Sign in to [App Name] as example.com" │ │ +│ │ │ │ +│ │ Client: https://client.example.com │ │ +│ │ Redirect: https://client.example.com/callback │ │ +│ │ │ │ +│ │ [Approve] [Deny] │ │ +│ └──────────────────────────┬───────────────────────────────┘ │ +└─────────────────────────────┼────────────────────────────────────┘ + │ + │ POST /authorize/consent + │ {action: "approve", ...} + │ + ▼ +┌─────────────────────────────────────────────────────────────────┐ +│ Authorization Endpoint (Continued) │ +│ ┌──────────────────────────────────────────────────────────┐ │ +│ │ 9. Generate Authorization Code │ │ +│ │ - Generate cryptographically secure code (256 bits) │ │ +│ │ - Store in memory with metadata: │ │ +│ │ • me (user's domain) │ │ +│ │ • client_id │ │ +│ │ • redirect_uri │ │ +│ │ • state │ │ +│ │ • TTL: 10 minutes │ │ +│ └──────────────────────────┬───────────────────────────────┘ │ +│ │ │ +│ ┌──────────────────────────▼───────────────────────────────┐ │ +│ │ 10. 
Redirect to Client with Code │ │ +│ │ {redirect_uri}?code={code}&state={state} │ │ +│ └──────────────────────────┬───────────────────────────────┘ │ +└─────────────────────────────┼────────────────────────────────────┘ + │ + │ HTTP 302 Redirect + │ + ▼ +┌─────────────────────────────────────────────────────────────────┐ +│ Client Application │ +│ • Receives authorization code │ +│ • Validates state parameter (CSRF protection) │ +│ • Exchanges code for token (Phase 3: Token Endpoint) │ +└─────────────────────────────────────────────────────────────────┘ +``` + +### State Transitions + +**Domain Verification States**: +1. **Unverified**: Domain never seen before +2. **DNS Verified**: TXT record confirmed +3. **Email Discovered**: rel="me" link found +4. **Code Sent**: Verification code sent to email +5. **Fully Verified**: Code verified, stored in database +6. **Cached**: Domain verification cached (skip steps 1-5 on future auth) + +**Authorization Flow States**: +1. **Request Received**: Parameters validated +2. **Domain Check**: Checking if domain verified +3. **Verification In Progress**: User entering code +4. **Consent Pending**: User viewing consent screen +5. **Approved**: User approved, code generated +6. **Denied**: User denied, error redirect +7. 
**Complete**: Redirected to client with code + +### Error Paths + +**DNS Verification Failure**: +``` +/authorize → Validate params → Check DNS TXT → [NOT FOUND] + → Display error page with instructions + → User adds TXT record, clicks "Retry" + → Loop back to Check DNS TXT +``` + +**rel="me" Discovery Failure**: +``` +/authorize → DNS verified → Fetch site → Discover email → [NOT FOUND] + → Display error page with HTML example + → User adds rel="me" link, clicks "Retry" + → Loop back to Fetch site +``` + +**Email Send Failure**: +``` +/authorize → DNS + rel="me" OK → Send email → [SMTP ERROR] + → Display error page with troubleshooting + → User checks SMTP config, clicks "Retry" + → Loop back to Send email +``` + +**Invalid Code**: +``` +/authorize/verify-code → Verify code → [INVALID] + → Display code entry form with error + → "Invalid code. 2 attempts remaining." + → User enters code again + → Loop back to Verify code +``` + +**Rate Limit Exceeded**: +``` +/authorize → Start verification → Check rate limit → [EXCEEDED] + → Display error: "Too many attempts, wait 1 hour" + → User waits, tries again later +``` + +## API Endpoints + +### POST /api/verify/start + +**Purpose**: Start domain verification process. + +**Request**: +```json +{ + "domain": "example.com" +} +``` + +**Success Response** (200 OK): +```json +{ + "success": true, + "email_masked": "u***@example.com", + "error": null +} +``` + +**Error Response** (200 OK with success=false): +```json +{ + "success": false, + "email_masked": null, + "error": "DNS TXT record not found for _gondulf.example.com.
Please add: Type=TXT, Name=_gondulf.example.com, Value=verified" +} +``` + +**Validation Errors** (422 Unprocessable Entity): +```json +{ + "detail": [ + { + "loc": ["body", "domain"], + "msg": "field required", + "type": "value_error.missing" + } + ] +} +``` + +**Rate Limiting**: +- Max 3 requests per domain per hour +- Enforced by DomainVerificationService + +**Authentication**: None required (public endpoint) + +--- + +### POST /api/verify/code + +**Purpose**: Verify submitted 6-digit code. + +**Request**: +```json +{ + "email": "user@example.com", + "code": "123456" +} +``` + +**Success Response** (200 OK): +```json +{ + "success": true, + "domain": "example.com", + "error": null +} +``` + +**Error Response** (200 OK with success=false): +```json +{ + "success": false, + "domain": null, + "error": "Invalid code. 2 attempts remaining." +} +``` + +**Validation Errors** (422 Unprocessable Entity): +```json +{ + "detail": [ + { + "loc": ["body", "code"], + "msg": "string does not match regex \"^[0-9]{6}$\"", + "type": "value_error.str.regex" + } + ] +} +``` + +**Rate Limiting**: +- Max 3 attempts per email per code +- Enforced by code verification logic + +**Authentication**: None required (code is the authentication) + +--- + +### GET /authorize + +**Purpose**: IndieAuth authorization endpoint. 
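As a rough illustration of how a client might call this endpoint, the sketch below assembles the request URL from the required query parameters documented for it (`me`, `client_id`, `redirect_uri`, `state`, `response_type`). The endpoint host `https://auth.example.com` and the helper name `build_authorize_url` are assumptions made for the example; they do not come from the Gondulf codebase.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse


def build_authorize_url(endpoint: str, me: str, client_id: str,
                        redirect_uri: str, state: str) -> str:
    """Return a GET /authorize URL with form-encoded query parameters.

    Hypothetical client-side helper, shown for illustration only.
    """
    params = {
        "me": me,
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "state": state,
        "response_type": "code",  # the only value the endpoint accepts
    }
    # urlencode percent-encodes reserved characters and turns spaces into '+'
    return f"{endpoint}?{urlencode(params)}"


# The client generates an unguessable state value and checks it on return.
state = secrets.token_urlsafe(16)
url = build_authorize_url(
    "https://auth.example.com/authorize",  # assumed deployment URL
    me="https://example.com",
    client_id="https://client.example.com",
    redirect_uri="https://client.example.com/callback",
    state=state,
)

# Round-trip the query string to confirm nothing was lost in encoding.
query = parse_qs(urlparse(url).query)
print(url)
```

Encoding through `urlencode` rather than string concatenation keeps characters such as `&`, `?`, and spaces inside parameter values from corrupting the query string.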
+ +**Query Parameters**: +- `me` (required): User's profile URL (e.g., "https://example.com") +- `client_id` (required): Client application URL +- `redirect_uri` (required): Where to redirect after authorization +- `state` (required): CSRF protection token +- `response_type` (required): Must be "code" +- `scope` (optional): Requested scopes (ignored in v1.0.0) +- `code_challenge` (optional): PKCE challenge (not supported in v1.0.0) +- `code_challenge_method` (optional): PKCE method (not supported in v1.0.0) + +**Success Response**: HTML page (verification form or consent screen) + +**Error Redirect** (302 Found): +``` +{redirect_uri}?error=invalid_request&error_description=Invalid+me+parameter&state={state} +``` + +**Error Codes** (OAuth 2.0 standard): +- `invalid_request`: Missing or invalid parameter +- `unauthorized_client`: Client not authorized +- `access_denied`: User denied authorization +- `unsupported_response_type`: response_type not "code" +- `server_error`: Internal server error + +**Error Page** (when redirect not possible): +```html + + +Authorization Error + +

<h1>Authorization Error</h1> +<p>Invalid redirect_uri. Cannot redirect safely.</p>

+ + +``` + +**Rate Limiting**: None at endpoint level (handled by verification service) + +**Authentication**: None initially (domain verification IS the authentication) + +--- + +### POST /authorize/verify-code + +**Purpose**: Verify code during authorization flow. + +**Form Data**: +- `email` (required): Email address from rel="me" +- `code` (required): 6-digit verification code +- `me` (required): User's profile URL +- `client_id` (required): Client application URL +- `redirect_uri` (required): Redirect URI +- `state` (required): State parameter + +**Success Response**: HTML page (consent screen) + +**Error Response**: HTML page (code entry form with error message) + +--- + +### POST /authorize/consent + +**Purpose**: Handle user consent decision. + +**Form Data**: +- `action` (required): "approve" or "deny" +- `me` (required): User's profile URL +- `client_id` (required): Client application URL +- `redirect_uri` (required): Redirect URI +- `state` (required): State parameter + +**Success Response (Approve)** (302 Found): +``` +{redirect_uri}?code={authorization_code}&state={state} +``` + +**Success Response (Deny)** (302 Found): +``` +{redirect_uri}?error=access_denied&error_description=User+denied+authorization&state={state} +``` + +## Data Models + +### Verified Domain (Database Table) + +**Table**: `domains` + +**Schema** (from Phase 1): +```sql +CREATE TABLE domains ( + domain TEXT PRIMARY KEY, + verification_method TEXT NOT NULL, -- 'two_factor' for v1.0.0 + verified BOOLEAN NOT NULL DEFAULT FALSE, + verified_at TIMESTAMP, + last_dns_check TIMESTAMP, + last_email_check TIMESTAMP +); +``` + +**Updated in Phase 2**: Change `verification_method` values from `'email'` / `'txt_record'` to `'two_factor'`. 
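The effect of this change can be tried out with a minimal sketch that uses Python's built-in `sqlite3` module and an in-memory database. The table mirrors the Phase 1 schema shown above (with `DEFAULT 0` standing in for `DEFAULT FALSE`), and the `UPDATE` is the statement from the migration that follows. The sample rows are invented, and the project's actual SQLAlchemy setup and migration runner are not modelled here.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE domains (
        domain TEXT PRIMARY KEY,
        verification_method TEXT NOT NULL,
        verified BOOLEAN NOT NULL DEFAULT 0,
        verified_at TIMESTAMP,
        last_dns_check TIMESTAMP,
        last_email_check TIMESTAMP
    )
    """
)

# Invented sample rows: two legacy values and one already migrated.
conn.executemany(
    "INSERT INTO domains (domain, verification_method, verified) VALUES (?, ?, ?)",
    [
        ("a.example", "email", 1),
        ("b.example", "txt_record", 1),
        ("c.example", "two_factor", 1),
    ],
)

# Same UPDATE as the 002 migration: rewrite only the legacy values.
cur = conn.execute(
    "UPDATE domains SET verification_method = 'two_factor' "
    "WHERE verification_method IN ('email', 'txt_record')"
)
updated = cur.rowcount

methods = [
    row[0]
    for row in conn.execute(
        "SELECT verification_method FROM domains ORDER BY domain"
    )
]
print(updated, methods)
```

Because the `WHERE` clause matches only the legacy values, running the statement a second time changes no rows, so the migration is safe to re-apply.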
+ +**Migration**: `002_update_verification_method.sql`: +```sql +-- Update verification_method values to reflect two-factor requirement +UPDATE domains +SET verification_method = 'two_factor' +WHERE verification_method IN ('email', 'txt_record'); +``` + +**Indexes** (from Phase 1): +```sql +CREATE INDEX idx_domains_domain ON domains(domain); +CREATE INDEX idx_domains_verified ON domains(verified); +``` + +--- + +### Authorization Code (In-Memory) + +**Storage**: Phase 1 CodeStorage with metadata + +**Structure**: +```python +{ + "code": "abc123...", # 43-char base64url (32 bytes) + "me": "https://example.com", + "client_id": "https://client.example.com", + "redirect_uri": "https://client.example.com/callback", + "state": "client-provided-state", + "created_at": datetime, + "expires_at": datetime, # created_at + 10 minutes + "used": False # For Phase 3 token endpoint +} +``` + +**TTL**: 10 minutes (per W3C spec: "shortly after") + +**Storage Location**: Phase 1 CodeStorage service + +--- + +### Verification Code Metadata (In-Memory) + +**Storage**: Additional metadata alongside verification codes + +**Structure**: +```python +{ + "email": "user@example.com", + "domain": "example.com", + "attempts": 0, # Increment on each failed attempt + "created_at": datetime +} +``` + +**Purpose**: Track attempts and associate email with domain for rate limiting. + +**TTL**: Same as verification code (15 minutes) + +## Security Requirements + +### Input Validation + +**Domain Parameter**: +```python +def validate_domain(domain: str) -> Tuple[bool, Optional[str], Optional[str]]: + """ + Validate domain parameter. + + Returns: (is_valid, normalized_domain, error_message) + """ + # Remove protocol if present + if domain.startswith('http://') or domain.startswith('https://'): + domain = domain.split('://', 1)[1] + + # Remove path if present + if '/' in domain: + domain = domain.split('/')[0] + + # Lowercase + domain = domain.lower().strip() + + # Must contain at least one dot + if '.' 
not in domain: + return False, None, "Domain must contain at least one dot (e.g., example.com)" + + # Must not be empty + if not domain: + return False, None, "Domain cannot be empty" + + # Must not contain invalid characters + if any(c in domain for c in [' ', '@', ':', '?', '#']): + return False, None, "Domain contains invalid characters" + + # Length check + if len(domain) > 253: + return False, None, "Domain too long (max 253 characters)" + + return True, domain, None +``` + +**Email Parameter**: +```python +def validate_email(email: str) -> bool: + """ + Validate email format (RFC 5322 simplified). + + Used by rel="me" discovery service. + """ + email_regex = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$' + + if not re.match(email_regex, email): + return False + + if len(email) > 254: # RFC 5321 maximum + return False + + if email.count('@') != 1: + return False + + local, domain = email.split('@') + if '.' not in domain: + return False + + return True +``` + +**URL Parameters** (me, client_id, redirect_uri): +```python +def validate_url(url: str, param_name: str) -> Tuple[bool, Optional[str]]: + """ + Validate URL parameter. 
+ + Returns: (is_valid, error_message) + """ + from urllib.parse import urlparse + + try: + parsed = urlparse(url) + except Exception: + return False, f"{param_name} must be a valid URL" + + # Must have scheme and netloc + if not parsed.scheme or not parsed.netloc: + return False, f"{param_name} must be a complete URL (e.g., https://example.com)" + + # Must be http or https + if parsed.scheme not in ['http', 'https']: + return False, f"{param_name} must use http or https" + + # No fragments for 'me' parameter + if param_name == "me" and parsed.fragment: + return False, "me parameter must not contain fragment" + + # No credentials + if parsed.username or parsed.password: + return False, f"{param_name} must not contain credentials" + + return True, None +``` + +--- + +### HTTPS Enforcement + +**Configuration**: +```python +# In production config +if not DEBUG: + # Enforce HTTPS + app.add_middleware(HTTPSRedirectMiddleware) + + # Reject HTTP redirect_uri (except localhost) + if redirect_uri.startswith('http://'): + parsed = urlparse(redirect_uri) + if parsed.hostname not in ['localhost', '127.0.0.1']: + return error_response("redirect_uri must use HTTPS in production") +``` + +**HTML Fetching**: +- HTTPS only (hardcoded `https://` in URL) +- SSL certificate verification enforced (`verify=True`, no option to disable) +- Reject sites with invalid certificates + +--- + +### HTML Parsing Security + +**BeautifulSoup Configuration**: +```python +# Use html.parser (Python standard library, safe for untrusted HTML) +soup = BeautifulSoup(html_content, 'html.parser') +``` + +**Why html.parser**: +- Part of Python standard library (no external C dependencies) +- Designed for untrusted HTML +- No script execution +- No external resource loading +- Handles malformed HTML gracefully + +**Size Limits**: +- Maximum response size: 5MB (configurable) +- Checked both in Content-Length header and actual content + +**Timeout**: +- HTTP request timeout: 10 seconds (configurable) +- 
Prevents hanging on slow sites + +--- + +### Protection Against Open Redirects + +**redirect_uri Validation**: +```python +def validate_redirect_uri(redirect_uri: str, client_id: str) -> Tuple[bool, Optional[str]]: + """ + Validate redirect_uri against client_id. + + Returns: (is_valid, warning_message) + """ + from urllib.parse import urlparse + + redirect_parsed = urlparse(redirect_uri) + client_parsed = urlparse(client_id) + + # Must be HTTPS (except localhost) + if redirect_parsed.scheme != 'https': + if redirect_parsed.hostname not in ['localhost', '127.0.0.1']: + return False, "redirect_uri must use HTTPS" + + # Must have valid hostname + if not redirect_parsed.hostname: + return False, "redirect_uri must have valid hostname" + + redirect_domain = redirect_parsed.hostname.lower() + client_domain = client_parsed.hostname.lower() + + # Exact match: OK + if redirect_domain == client_domain: + return True, None + + # Subdomain of client: OK + if redirect_domain.endswith('.' + client_domain): + return True, None + + # Different domain: WARNING (display to user, but allow) + warning = ( + f"Warning: Redirect to different domain ({redirect_domain}) " + f"than client ({client_domain}). Ensure you trust this application." 
+ ) + return True, warning +``` + +**Display Warning to User**: +- If redirect_uri domain differs from client_id domain, show warning on consent screen +- User must explicitly approve redirect to different domain +- Prevents phishing via redirect URI manipulation + +--- + +### CSRF Protection + +**State Parameter**: +- Required in authorization request +- Stored with authorization code +- Passed through verification and consent steps +- Returned unchanged in redirect +- Client validates state matches original (client responsibility per OAuth 2.0) + +**Gondulf does NOT validate state** - This is intentional per OAuth 2.0: +- State is opaque to authorization server +- Client generates state, client validates state +- Gondulf only passes it through unchanged + +--- + +### Code Replay Prevention + +**Authorization Code**: +- Single-use enforcement (Phase 3 token endpoint marks as used) +- 10-minute expiration +- Bound to client_id, redirect_uri, me +- Stored in memory (Phase 1 CodeStorage) + +**Verification Code**: +- Single-use: Deleted after successful verification +- 15-minute expiration +- Max 3 attempts before invalidation +- Constant-time comparison (prevent timing attacks) + +## Testing Requirements + +### Unit Tests + +**HTML Fetcher Service** (9 tests): +- ✅ Successful HTTPS fetch returns content +- ✅ SSL verification failure returns None +- ✅ Timeout returns None +- ✅ HTTP error codes (404, 500) return None +- ✅ Redirects followed (up to max) +- ✅ Too many redirects returns None +- ✅ Content-Length exceeds limit returns None +- ✅ Actual content exceeds limit returns None +- ✅ Custom User-Agent sent + +**rel="me" Discovery Service** (12 tests): +- ✅ Discovery from `` tag +- ✅ Discovery from `
` tag +- ✅ Multiple rel="me" links: first mailto selected +- ✅ Malformed HTML handled +- ✅ Missing rel="me" returns None +- ✅ Invalid email in link returns None +- ✅ Empty href returns None +- ✅ Non-mailto links ignored +- ✅ mailto with query params strips params +- ✅ Email validation: valid formats +- ✅ Email validation: invalid formats +- ✅ Exception during parsing returns None + +**Domain Verification Service** (15 tests): +- ✅ Full flow: DNS → rel="me" → email → code +- ✅ DNS failure blocks flow +- ✅ Site fetch failure blocks flow +- ✅ rel="me" failure blocks flow +- ✅ Email send failure cleans up code +- ✅ Code verification success stores domain +- ✅ Code verification failure decrements attempts +- ✅ Too many attempts invalidates code +- ✅ Invalid code returns error +- ✅ Code expiration handled +- ✅ Rate limiting works +- ✅ Already verified domain check +- ✅ Email masking correct +- ✅ Constant-time comparison used +- ✅ Metadata tracking works + +**Estimated Unit Test Count**: ~36 tests + +--- + +### Integration Tests + +**Verification Endpoints** (10 tests): +- ✅ POST /api/verify/start success case +- ✅ POST /api/verify/start with invalid domain +- ✅ POST /api/verify/start with DNS failure +- ✅ POST /api/verify/start with rel="me" failure +- ✅ POST /api/verify/start with email send failure +- ✅ POST /api/verify/code success case +- ✅ POST /api/verify/code with invalid code +- ✅ POST /api/verify/code with expired code +- ✅ POST /api/verify/code with missing code +- ✅ POST /api/verify/code with too many attempts + +**Authorization Endpoint** (15 tests): +- ✅ GET /authorize with valid params (already verified domain) +- ✅ GET /authorize with valid params (new domain) +- ✅ GET /authorize with invalid response_type +- ✅ GET /authorize with invalid me parameter +- ✅ GET /authorize with invalid client_id +- ✅ GET /authorize with invalid redirect_uri +- ✅ GET /authorize with missing state +- ✅ POST /authorize/verify-code with valid code +- ✅ POST 
/authorize/verify-code with invalid code +- ✅ POST /authorize/consent with action=approve +- ✅ POST /authorize/consent with action=deny +- ✅ Authorization code stored with metadata +- ✅ Authorization code expires after 10 min +- ✅ State parameter passed through +- ✅ redirect_uri domain mismatch shows warning + +**Estimated Integration Test Count**: ~25 tests + +--- + +### End-to-End Tests + +**Complete Flows** (5 tests): +- ✅ Full auth flow: /authorize → verify → consent → redirect with code +- ✅ Full auth flow with cached domain (skip verification) +- ✅ User denies consent → redirect with access_denied +- ✅ DNS verification failure → error page → retry → success +- ✅ Invalid code × 3 → error "too many attempts" + +**Estimated E2E Test Count**: ~5 tests + +--- + +### Security Tests + +**Input Validation** (8 tests): +- ✅ Malformed domain rejected +- ✅ Malformed email rejected (during validation) +- ✅ Malformed URL (me, client_id, redirect_uri) rejected +- ✅ URL with credentials rejected +- ✅ URL with fragment rejected (me parameter) +- ✅ Oversized HTML (>5MB) rejected +- ✅ Invalid email in rel="me" logged and skipped +- ✅ SQL injection attempts in domain parameter (should be parameterized) + +**Authentication Security** (5 tests): +- ✅ Expired code rejected +- ✅ Used code rejected (Phase 3) +- ✅ Invalid code rejected +- ✅ Brute force prevented (max 3 attempts) +- ✅ Constant-time comparison used (verify via timing analysis - difficult to test) + +**TLS/HTTPS** (4 tests): +- ✅ HTTP redirect_uri rejected in production +- ✅ Invalid SSL certificate rejected +- ✅ Site fetch over HTTPS only +- ✅ HTTP allowed for localhost only + +**Open Redirect** (3 tests): +- ✅ redirect_uri domain mismatch shows warning +- ✅ Invalid redirect_uri shows error page (no redirect) +- ✅ redirect_uri without hostname rejected + +**Estimated Security Test Count**: ~20 tests + +--- + +### Coverage Target + +**Phase 2 Overall**: 80%+ coverage (same as Phase 1) + +**Critical Code** (95%+ 
coverage): +- Domain verification service (orchestration logic) +- rel="me" discovery (email extraction) +- Authorization endpoint (parameter validation) +- Security functions (validation, constant-time comparison) + +**Total Estimated Test Count**: ~86 tests + +## Error Handling + +### DNS Verification Failure + +**Error Message**: +``` +DNS Verification Failed + +The DNS TXT record was not found for your domain. + +Please add the following TXT record to your DNS: + Type: TXT + Name: _gondulf.example.com + Value: verified + +DNS changes may take up to 24 hours to propagate. + +[Retry] +``` + +**HTTP Response**: 200 OK (HTML error page) + +**Logging**: WARNING level with domain + +--- + +### rel="me" Discovery Failure + +**Error Message**: +``` +Email Discovery Failed + +No rel="me" email link was found on your homepage. + +Please add the following to https://example.com: + + +This allows us to discover your email address automatically. + +Learn more: https://indieweb.org/rel-me + +[Retry] +``` + +**HTTP Response**: 200 OK (HTML error page) + +**Logging**: WARNING level with domain + +--- + +### Site Unreachable + +**Error Message**: +``` +Site Fetch Failed + +Could not fetch your site at https://example.com + +Please check: +• Site is accessible via HTTPS +• SSL certificate is valid +• No firewall blocking requests + +[Retry] +``` + +**HTTP Response**: 200 OK (HTML error page) + +**Logging**: ERROR level with domain and error details + +--- + +### Email Send Failure + +**Error Message**: +``` +Email Delivery Failed + +Failed to send verification code to u***@example.com + +Please check: +• Email address is correct in your rel="me" link +• Email server is accepting mail +• Check spam/junk folder + +[Retry] +``` + +**HTTP Response**: 200 OK (HTML error page) + +**Logging**: ERROR level with masked email + +--- + +### Invalid Code + +**Error Message**: +``` +Invalid code. 2 attempts remaining. 
+``` + +**HTTP Response**: 200 OK (code entry form with error) + +**Logging**: WARNING level with masked email + +--- + +### Too Many Attempts + +**Error Message**: +``` +Too Many Attempts + +You have exceeded the maximum number of attempts. + +Please request a new verification code. + +[Request New Code] +``` + +**HTTP Response**: 200 OK (error page with retry link) + +**Logging**: WARNING level with masked email + +--- + +### Rate Limit Exceeded + +**Error Message**: +``` +Rate Limit Exceeded + +Too many verification requests for this domain. + +Please wait 1 hour before requesting another code. +``` + +**HTTP Response**: 200 OK (error page) + +**Logging**: WARNING level with domain + +--- + +### OAuth 2.0 Errors (Authorization Endpoint) + +**Error Redirect Format**: +``` +{redirect_uri}?error={error_code}&error_description={description}&state={state} +``` + +**Error Codes**: +- `invalid_request`: Missing or invalid parameter +- `unauthorized_client`: Client not authorized +- `access_denied`: User denied authorization +- `unsupported_response_type`: response_type not "code" +- `server_error`: Internal server error + +**Example**: +``` +https://client.example.com/callback?error=invalid_request&error_description=Missing+state+parameter&state=abc123 +``` + +**Logging**: WARNING or ERROR level depending on error type + +--- + +### Error Logging Standards + +**Log Levels**: +- **DEBUG**: Normal operations, detailed flow +- **INFO**: Successful operations (code sent, domain verified) +- **WARNING**: Expected errors (invalid code, DNS not found) +- **ERROR**: Unexpected errors (SMTP failure, site unreachable) +- **CRITICAL**: System failures (should not occur in Phase 2) + +**What to Log**: +- ✅ Domain (public information) +- ✅ Email (partial mask: first 3 chars) +- ✅ Error details (for debugging) +- ✅ Request IDs (for correlation) + +**What NOT to Log**: +- ❌ Full email addresses +- ❌ Verification codes +- ❌ Authorization codes +- ❌ User-Agent (GDPR) +- ❌ IP addresses 
(GDPR) + +## Dependencies + +### New Python Packages + +**Add to pyproject.toml**: +```toml +[project] +dependencies = [ + # ... existing dependencies from Phase 1 + "beautifulsoup4>=4.12.0", # HTML parsing for rel="me" discovery +] +``` + +**Why beautifulsoup4**: +- Robust HTML parsing (handles malformed HTML) +- Safe for untrusted content (no script execution) +- Standard in Python ecosystem +- Pure Python (no C dependencies with html.parser) + +### Phase 1 Dependencies Used + +- `requests` (HTTP fetching - already in pyproject.toml) +- `dnspython` (DNS queries - Phase 1) +- `smtplib` (Email sending - Python stdlib, used by Phase 1) +- `sqlalchemy` (Database - Phase 1) +- `fastapi` (Web framework - Phase 1) +- `pydantic` (Data validation - Phase 1) + +### Configuration Additions + +**Optional new environment variables**: +```bash +# HTML Fetching (optional - has defaults) +GONDULF_HTML_FETCH_TIMEOUT=10 # seconds +GONDULF_HTML_MAX_SIZE=5242880 # bytes (5MB) +GONDULF_HTML_MAX_REDIRECTS=5 + +# Rate Limiting (optional - has defaults) +GONDULF_VERIFICATION_RATE_LIMIT=3 # codes per domain per hour +``` + +**Add to .env.example**: +```bash +# HTML Fetching Configuration (optional) +GONDULF_HTML_FETCH_TIMEOUT=10 +GONDULF_HTML_MAX_SIZE=5242880 +GONDULF_HTML_MAX_REDIRECTS=5 + +# Rate Limiting (optional) +GONDULF_VERIFICATION_RATE_LIMIT=3 +``` + +## Implementation Notes + +### Suggested Implementation Order + +1. **HTML Fetcher Service** (0.5 days) + - Straightforward HTTP fetching + - Few dependencies + - Easy to test in isolation + +2. **rel="me" Discovery Service** (0.5 days) + - Pure parsing logic + - No external dependencies (besides HTML input) + - Easy to test with mock HTML + +3. **Domain Verification Service** (1 day) + - Orchestrates all services + - More complex logic + - Needs all previous services complete + +4. **Database Migration** (0.5 days) + - Simple UPDATE query + - Apply before verification endpoints + +5. 
**Verification Endpoints** (0.5 days) + - Thin API layer over service + - FastAPI makes this straightforward + +6. **Authorization Endpoint** (3-4 days) + - Most complex component + - HTML templates needed + - Multiple sub-endpoints + - Needs comprehensive testing + +7. **Integration Testing** (1 day) + - Test all components together + - End-to-end flow verification + +**Total**: ~7-8 days (matches estimate in phase-1-impact-assessment.md) + +--- + +### Risks and Mitigations + +**Risk 1: HTML Parsing Edge Cases** +- **Mitigation**: BeautifulSoup handles malformed HTML gracefully +- **Testing**: Include malformed HTML in test cases +- **Fallback**: Clear error messages guide users to fix HTML + +**Risk 2: Email Delivery Failures** +- **Mitigation**: Comprehensive SMTP error handling +- **Testing**: Mock SMTP failures in tests +- **Fallback**: Clear troubleshooting instructions in error messages + +**Risk 3: DNS TXT Record Setup Complexity** +- **Mitigation**: Clear setup instructions with examples +- **User Education**: Document common DNS providers +- **Support**: Provide example DNS configurations + +**Risk 4: Authorization Endpoint Complexity** +- **Mitigation**: Break into smaller sub-endpoints (verify-code, consent) +- **Testing**: Comprehensive integration tests +- **Design**: Keep state management simple (use forms, avoid complex sessions) + +**Risk 5: Rate Limiting Implementation** +- **Mitigation**: Start with simple in-memory tracking (Phase 2) +- **Future**: Migrate to Redis for distributed rate limiting (Phase 3+) +- **Scope**: Enforce the limit of 3 codes per domain per hour in Phase 2 (actual check, not a stub) + +--- + +### Performance Considerations + +**HTML Fetching**: +- Timeout: 10 seconds (prevent hanging) +- Size limit: 5MB (prevent memory exhaustion) +- Concurrent requests: Not needed in Phase 2 (one request per auth flow) + +**Database Queries**: +- Index on domains.domain ensures fast lookups +- Simple SELECT queries (no joins in Phase 2) +- Consider adding index on
domains.verified if needed + +**In-Memory Storage**: +- Verification codes: ~100 bytes each +- Authorization codes: ~200 bytes each +- Expected load: 10s of users, <100 concurrent verifications +- Memory impact: Negligible (<10KB) + +**rel="me" Parsing**: +- BeautifulSoup is pure Python (not fastest, but sufficient) +- HTML size limited to 5MB (parse time <1 second) +- No performance issues expected for typical homepages + +--- + +### Future Extensibility + +**Redis Integration** (Phase 3+): +- Replace in-memory CodeStorage with Redis +- Enables distributed deployment (multiple Gondulf instances) +- No code changes needed (CodeStorage interface unchanged) + +**Client Metadata Caching** (Phase 3): +- Cache client_id fetch results +- Reduces HTTP requests during authorization +- Store in database or Redis + +**PKCE Support** (v1.1.0): +- Add code_challenge validation in authorization endpoint +- Add code_verifier validation in token endpoint (Phase 3) +- No breaking changes to v1.0.0 clients + +**Additional Authentication Methods** (v1.2.0+): +- GitHub/GitLab OAuth providers +- WebAuthn support +- All additive (user chooses method) + +## Acceptance Criteria + +Phase 2 is complete when ALL of the following criteria are met: + +### Functionality + +- [ ] HTML fetcher service fetches user homepages successfully +- [ ] rel="me" discovery service discovers email from HTML +- [ ] Domain verification service orchestrates two-factor verification +- [ ] DNS TXT verification required and working +- [ ] Email verification via rel="me" required and working +- [ ] Verification endpoints (/api/verify/start, /api/verify/code) working +- [ ] Authorization endpoint (/authorize) validates all parameters +- [ ] Authorization endpoint checks domain verification status +- [ ] Authorization endpoint shows verification form for unverified domains +- [ ] Authorization endpoint shows consent screen after verification +- [ ] Authorization code generated and stored on approval +- [ ] User can 
deny consent (redirects with access_denied) +- [ ] State parameter passed through all steps + +### Testing + +- [ ] All unit tests passing (estimated ~36 tests) +- [ ] All integration tests passing (estimated ~25 tests) +- [ ] All end-to-end tests passing (estimated ~5 tests) +- [ ] All security tests passing (estimated ~20 tests) +- [ ] Test coverage ≥80% overall +- [ ] Test coverage ≥95% for domain verification service +- [ ] Test coverage ≥95% for authorization endpoint +- [ ] No known bugs or failing tests + +### Security + +- [ ] HTTPS enforcement working (production) +- [ ] SSL certificate validation enforced (HTML fetching) +- [ ] HTML parsing secure (BeautifulSoup with html.parser) +- [ ] Input validation comprehensive (domain, email, URLs) +- [ ] Open redirect protection working (redirect_uri validation) +- [ ] Constant-time code comparison used +- [ ] Rate limiting implemented (basic in-memory) +- [ ] Attempt limiting working (max 3 per code) +- [ ] No PII in logs (email masked, no full addresses) +- [ ] Authorization codes single-use (marked for Phase 3) + +### Error Handling + +- [ ] DNS verification failure shows clear instructions +- [ ] rel="me" discovery failure shows HTML example +- [ ] Site unreachable shows troubleshooting steps +- [ ] Email send failure shows error with retry +- [ ] Invalid code shows attempts remaining +- [ ] Too many attempts invalidates code +- [ ] Rate limit exceeded shows wait time +- [ ] OAuth 2.0 errors formatted correctly +- [ ] All errors logged appropriately + +### Documentation + +- [ ] All new services have docstrings +- [ ] All public methods have type hints +- [ ] API endpoints documented (this design doc) +- [ ] Error messages user-friendly +- [ ] Setup instructions clear (DNS + rel="me") +- [ ] Database migration documented + +### Dependencies + +- [ ] beautifulsoup4 added to pyproject.toml +- [ ] No new system dependencies (all Python) +- [ ] Configuration updated (.env.example) + +### Database + +- [ ] 
Migration 002 applied successfully +- [ ] domains.verification_method updated to 'two_factor' +- [ ] No schema changes needed (existing schema works) + +### Integration + +- [ ] All Phase 1 services integrated successfully +- [ ] DNS service used for TXT verification +- [ ] Email service used for code sending +- [ ] Database service used for storing verified domains +- [ ] In-memory storage used for codes +- [ ] Logging used throughout + +### Performance + +- [ ] HTML fetching completes within 10 seconds +- [ ] rel="me" parsing completes within 1 second +- [ ] Full verification flow completes within 30 seconds +- [ ] Authorization endpoint responds within 2 seconds +- [ ] No memory leaks (codes expire and clean up) + +## Timeline Estimate + +**Phase 2 Implementation**: 7-9 days + +**Breakdown**: +- HTML Fetcher Service: 0.5 days +- rel="me" Discovery Service: 0.5 days +- Domain Verification Service: 1 day +- Database Migration: 0.5 days +- Verification Endpoints: 0.5 days +- Authorization Endpoint: 3-4 days +- Integration Testing: 1 day +- Documentation: 0.5 days (included in parallel) + +**Dependencies**: Phase 1 complete and approved + +**Risk Buffer**: +2 days (for unforeseen issues with HTML parsing or authorization flow complexity) + +## Sign-off + +**Design Status**: Complete and ready for implementation + +**Architect**: Claude (Architect Agent) +**Date**: 2025-11-20 + +**Next Steps**: +1. Developer reviews design document +2. Developer asks clarification questions if needed +3. Architect updates design based on feedback +4. Developer begins implementation following design +5. Developer creates implementation report upon completion +6. 
Architect reviews implementation report + +**Related Documents**: +- `/docs/architecture/overview.md` - System architecture +- `/docs/architecture/indieauth-protocol.md` - IndieAuth protocol implementation +- `/docs/architecture/security.md` - Security architecture +- `/docs/architecture/phase-1-impact-assessment.md` - Phase 2 requirements +- `/docs/decisions/ADR-005-email-based-authentication-v1-0-0.md` - Two-factor verification decision +- `/docs/decisions/ADR-008-rel-me-email-discovery.md` - rel="me" pattern decision +- `/docs/reports/2025-11-20-phase-1-foundation.md` - Phase 1 implementation +- `/docs/roadmap/v1.0.0.md` - Version plan + +--- + +**DESIGN READY: Phase 2 Domain Verification - Please review /docs/designs/phase-2-domain-verification.md** diff --git a/docs/designs/phase-2-implementation-guide.md b/docs/designs/phase-2-implementation-guide.md new file mode 100644 index 0000000..df5746e --- /dev/null +++ b/docs/designs/phase-2-implementation-guide.md @@ -0,0 +1,739 @@ +# Phase 2 Implementation Guide - Specific Details + +**Date**: 2025-11-20 +**Architect**: Claude (Architect Agent) +**Status**: Supplementary to Phase 2 Design +**Purpose**: Provide specific implementation details for Developer clarification questions + +This document supplements `/docs/designs/phase-2-domain-verification.md` with specific implementation decisions from ADR-0004. + +## 1. Rate Limiting Implementation + +### Approach +Implement real in-memory rate limiting (not a stub), tracking a list of attempt timestamps per domain. 
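Because attempts are tracked as epoch timestamps per domain, the same data can also yield the wait time that a "rate limit exceeded" message would show. The sketch below is illustrative only: `seconds_until_allowed` is a hypothetical helper, not part of the documented `RateLimiter` interface, and it assumes the defaults of 3 attempts per 1-hour window.

```python
import time
from typing import List, Optional


def seconds_until_allowed(
    timestamps: List[int],
    max_attempts: int = 3,
    window_seconds: int = 3600,
    now: Optional[int] = None,
) -> int:
    """Return how many seconds a domain must wait before its next attempt.

    Hypothetical helper: `timestamps` is the per-domain list of epoch seconds
    recorded for earlier attempts. Returns 0 when an attempt is allowed now.
    """
    current = int(time.time()) if now is None else now
    cutoff = current - window_seconds
    # Only attempts inside the sliding window count against the limit.
    recent = sorted(ts for ts in timestamps if ts > cutoff)
    if len(recent) < max_attempts:
        return 0
    # A slot frees up once this attempt falls out of the window, leaving
    # max_attempts - 1 recent attempts.
    blocking = recent[len(recent) - max_attempts]
    return blocking + window_seconds - current
```

Passing `now` explicitly keeps the function deterministic in tests; production callers would omit it and let it default to the current time.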
+ +### Implementation Specifications + +**Service Structure**: +```python +# src/gondulf/rate_limiter.py +from typing import Dict, List +import time + +class RateLimiter: + """In-memory rate limiter for domain verification attempts.""" + + def __init__(self, max_attempts: int = 3, window_hours: int = 1): + """ + Args: + max_attempts: Maximum attempts per domain in time window (default: 3) + window_hours: Time window in hours (default: 1) + """ + self.max_attempts = max_attempts + self.window_seconds = window_hours * 3600 + self._attempts: Dict[str, List[int]] = {} # domain -> [timestamp1, timestamp2, ...] + + def check_rate_limit(self, domain: str) -> bool: + """ + Check if domain has exceeded rate limit. + + Args: + domain: Domain to check + + Returns: + True if within rate limit, False if exceeded + """ + # Clean old timestamps first + self._clean_old_attempts(domain) + + # Check current count + if domain not in self._attempts: + return True + + return len(self._attempts[domain]) < self.max_attempts + + def record_attempt(self, domain: str) -> None: + """Record a verification attempt for domain.""" + now = int(time.time()) + if domain not in self._attempts: + self._attempts[domain] = [] + self._attempts[domain].append(now) + + def _clean_old_attempts(self, domain: str) -> None: + """Remove timestamps older than window.""" + if domain not in self._attempts: + return + + now = int(time.time()) + cutoff = now - self.window_seconds + self._attempts[domain] = [ts for ts in self._attempts[domain] if ts > cutoff] + + # Remove domain entirely if no recent attempts + if not self._attempts[domain]: + del self._attempts[domain] +``` + +**Usage in Endpoints**: +```python +# In verification endpoint +rate_limiter = get_rate_limiter() +if not rate_limiter.check_rate_limit(domain): + return {"success": False, "error": "rate_limit_exceeded"} + +rate_limiter.record_attempt(domain) +# ... 
proceed with verification +``` + +**Consequences**: +- State lost on restart (acceptable trade-off for simplicity) +- No persistence needed +- Simple dictionary-based implementation + +## 2. Authorization Code Metadata Structure + +### Approach +Use Phase 1's `CodeStorage` service with complete metadata structure from the start. + +### Data Structure Specification + +**Authorization Code Metadata**: +```python +{ + "client_id": "https://client.example.com/", + "redirect_uri": "https://client.example.com/callback", + "state": "client_state_value", + "code_challenge": "base64url_encoded_challenge", + "code_challenge_method": "S256", + "scope": "profile email", + "me": "https://user.example.com/", + "created_at": 1700000000, # epoch integer + "expires_at": 1700000600, # epoch integer (created_at + 600) + "used": False # Include now, consume in Phase 3 +} +``` + +**Storage Implementation**: +```python +# Use Phase 1's CodeStorage +code_storage = get_code_storage() +authorization_code = generate_random_code() +metadata = { + "client_id": client_id, + "redirect_uri": redirect_uri, + "state": state, + "code_challenge": code_challenge, + "code_challenge_method": code_challenge_method, + "scope": scope, + "me": me, + "created_at": int(time.time()), + "expires_at": int(time.time()) + 600, + "used": False +} +code_storage.store(f"authz:{authorization_code}", metadata, ttl=600) +``` + +**Rationale**: +- Epoch integers simpler than datetime objects +- Include `used` field now (Phase 3 will check/update it) +- Reuse existing `CodeStorage` infrastructure +- Key prefix `authz:` distinguishes from verification codes + +## 3. HTML Template Implementation + +### Approach +Use Jinja2 templates with separate template files. 
+ +### Directory Structure +``` +src/gondulf/templates/ +├── base.html # Shared layout +├── verify_email.html # Email verification form +├── verify_totp.html # TOTP verification form (future) +├── authorize.html # Authorization consent page +└── error.html # Generic error page +``` + +### Base Template +```html + + + + + + + {% block title %}Gondulf IndieAuth{% endblock %} + + + + {% block content %}{% endblock %} + + +``` + +### Email Verification Template +```html + +{% extends "base.html" %} + +{% block title %}Verify Email - Gondulf{% endblock %} + +{% block content %} +

Verify Your Email + +A verification code has been sent to {{ masked_email }} + +Please enter the 6-digit code to complete verification: + +{% if error %} +{{ error }} +{% endif %} +
+{% endblock %} +``` + +### FastAPI Integration +```python +from fastapi import FastAPI, Request +from fastapi.templating import Jinja2Templates + +from gondulf.utils.validation import mask_email + +app = FastAPI() # in the real application, reuse the existing app instance +templates = Jinja2Templates(directory="src/gondulf/templates") + +@app.get("/verify/email") +async def verify_email_page(request: Request, domain: str): + # discovered_email: address found via rel="me" discovery for this domain (lookup omitted here) + masked = mask_email(discovered_email) + return templates.TemplateResponse("verify_email.html", { + "request": request, + "domain": domain, + "masked_email": masked + }) +``` + +**Dependencies**: +- Add to the `dependencies` list in `pyproject.toml`: `"jinja2>=3.1.0,<4.0.0"` + +## 4. Database Migration Timing + +### Approach +Apply migration 002 immediately as part of Phase 2 setup. + +### Execution Order +1. Developer runs migration: `alembic upgrade head` +2. Migration 002 adds `two_factor` column with default value `false` +3. All Phase 2 code assumes column exists +4. New domains inserted with explicit `two_factor` value + +### Migration File (if not already created) +```python +# migrations/versions/002_add_two_factor_column.py +"""Add two_factor column to domains table + +Revision ID: 002 +Revises: 001 +Create Date: 2025-11-20 +""" +from alembic import op +import sqlalchemy as sa + +# Revision identifiers, used by Alembic +revision = '002' +down_revision = '001' + +def upgrade(): + op.add_column('domains', + sa.Column('two_factor', sa.Boolean(), nullable=False, server_default='false') + ) + +def downgrade(): + op.drop_column('domains', 'two_factor') +``` + +**Rationale**: +- Keep database schema current with code expectations +- No conditional logic needed in Phase 2 code +- Clean separation: migration handles existing data, new code uses new schema + +## 5. Client Validation Helper Functions + +### Approach +Standalone utility functions in shared module. 
+ +### Module Structure +```python +# src/gondulf/utils/validation.py +"""Client validation and utility functions.""" +from urllib.parse import urlparse +import re + +def mask_email(email: str) -> str: + """ + Mask email for display: user@example.com -> u***@example.com + + Args: + email: Email address to mask + + Returns: + Masked email string + """ + if '@' not in email: + return email + + local, domain = email.split('@', 1) + if len(local) <= 1: + return email + + masked_local = local[0] + '***' + return f"{masked_local}@{domain}" + + +def normalize_client_id(client_id: str) -> str: + """ + Normalize client_id URL to canonical form. + + Rules: + - Ensure https:// scheme + - Remove default port (443) + - Preserve path + + Args: + client_id: Client ID URL + + Returns: + Normalized client_id + """ + parsed = urlparse(client_id) + + # Ensure https + if parsed.scheme != 'https': + raise ValueError("client_id must use https scheme") + + # Remove default HTTPS port + netloc = parsed.netloc + if netloc.endswith(':443'): + netloc = netloc[:-4] + + # Reconstruct + normalized = f"https://{netloc}{parsed.path}" + if parsed.query: + normalized += f"?{parsed.query}" + if parsed.fragment: + normalized += f"#{parsed.fragment}" + + return normalized + + +def validate_redirect_uri(redirect_uri: str, client_id: str) -> bool: + """ + Validate redirect_uri against client_id per IndieAuth spec. 
+ + Rules: + - Must use https scheme (except localhost) + - Must share same origin as client_id OR + - Must be subdomain of client_id domain + + Args: + redirect_uri: Redirect URI to validate + client_id: Client ID for comparison + + Returns: + True if valid, False otherwise + """ + try: + redirect_parsed = urlparse(redirect_uri) + client_parsed = urlparse(client_id) + + # Check scheme (allow http for localhost only) + if redirect_parsed.scheme != 'https': + if redirect_parsed.hostname not in ('localhost', '127.0.0.1'): + return False + + # Same origin check + if (redirect_parsed.scheme == client_parsed.scheme and + redirect_parsed.netloc == client_parsed.netloc): + return True + + # Subdomain check + redirect_host = redirect_parsed.hostname or '' + client_host = client_parsed.hostname or '' + + # Must end with .{client_host} + if redirect_host.endswith(f".{client_host}"): + return True + + return False + + except Exception: + return False +``` + +**Usage**: +```python +from gondulf.utils.validation import mask_email, validate_redirect_uri, normalize_client_id + +# In verification endpoint +masked = mask_email(discovered_email) + +# In authorization endpoint +normalized_client = normalize_client_id(client_id) +if not validate_redirect_uri(redirect_uri, normalized_client): + return error_response("invalid_redirect_uri") +``` + +## 6. Error Response Format Consistency + +### Approach +Use format appropriate to endpoint type. 
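The format choice depends on how far request validation has progressed, so it can help to centralise it in one small helper. The sketch below is an illustration under stated assumptions, not the project's code: `ErrorFormat`, `choose_error_format`, and `build_error_redirect` are hypothetical names. It also shows one way to append OAuth error parameters without clobbering a query string that `redirect_uri` may already carry.

```python
from enum import Enum
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


class ErrorFormat(Enum):
    JSON = "json"          # verification endpoints
    HTML = "html"          # redirecting is not yet safe
    REDIRECT = "redirect"  # OAuth error redirect back to the client


def choose_error_format(
    is_verification_endpoint: bool,
    client_id_validated: bool,
    redirect_uri_valid: bool,
) -> ErrorFormat:
    """Pick the error format for the current stage of request validation."""
    if is_verification_endpoint:
        return ErrorFormat.JSON
    # Never redirect to a URI that has not been validated against the client.
    if not client_id_validated or not redirect_uri_valid:
        return ErrorFormat.HTML
    return ErrorFormat.REDIRECT


def build_error_redirect(
    redirect_uri: str,
    error: str,
    description: str,
    state: Optional[str] = None,
) -> str:
    """Append OAuth error parameters, preserving any existing query string."""
    parts = urlsplit(redirect_uri)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("error", error))
    query.append(("error_description", description))
    if state is not None:
        # Echo state back only when the client supplied one.
        query.append(("state", state))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Building the URL from parsed parts rather than string concatenation avoids producing a second `?` when the client's registered `redirect_uri` already includes query parameters.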
+ +### Format Rules by Endpoint Type + +**Verification Endpoints** (`/verify/email`, `/verify/totp`): +```python +# Always return 200 OK with JSON +return JSONResponse( + status_code=200, + content={"success": False, "error": "invalid_code"} +) +``` + +**Authorization Endpoint - Pre-Client Validation**: +```python +# Return HTML error page if client_id not yet validated +return templates.TemplateResponse("error.html", { + "request": request, + "error": "Missing required parameter: client_id", + "error_code": "invalid_request" +}, status_code=400) +``` + +**Authorization Endpoint - Post-Client Validation**: +```python +# Return OAuth redirect with error parameter +from urllib.parse import urlencode +error_params = { + "error": "invalid_request", + "error_description": "Missing code_challenge parameter", + "state": request.query_params.get("state", "") +} +redirect_url = f"{redirect_uri}?{urlencode(error_params)}" +return RedirectResponse(url=redirect_url, status_code=302) +``` + +**Token Endpoint** (Phase 3): +```python +# Always return JSON with appropriate status code +return JSONResponse( + status_code=400, + content={ + "error": "invalid_grant", + "error_description": "Authorization code has expired" + } +) +``` + +### Error Flow Decision Tree +``` +Is this a verification endpoint? + YES -> Return JSON (200 OK) with success:false + NO -> Continue + +Has client_id been validated yet? + NO -> Return HTML error page + YES -> Continue + +Is redirect_uri valid? + NO -> Return HTML error page (can't redirect safely) + YES -> Return OAuth redirect with error +``` + +## 7. Dependency Injection Pattern + +### Approach +Singleton services instantiated at startup in `dependencies.py`. 
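The singleton behaviour relies on `functools.lru_cache`: a zero-argument function decorated with it runs once, on first call, and returns the cached object afterwards. A minimal stdlib-only sketch follows; `FakeService` and `get_fake_service` are hypothetical stand-ins for the real services and provider functions.

```python
from functools import lru_cache


class FakeService:
    """Hypothetical stand-in for a real service such as CodeStorage."""

    instances_created = 0

    def __init__(self) -> None:
        FakeService.instances_created += 1


@lru_cache()
def get_fake_service() -> FakeService:
    """Create the service on first call, then return the cached instance."""
    return FakeService()


first = get_fake_service()
second = get_fake_service()
assert first is second                      # same object every time
assert FakeService.instances_created == 1   # constructor ran only once

# cache_clear() discards the cached instance, e.g. between tests.
get_fake_service.cache_clear()
third = get_fake_service()
assert third is not first
```

Because the cached instance lives for the life of the process, tests that need fresh state can either call `cache_clear()` on the provider or replace it through FastAPI's `dependency_overrides`.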
+ +### Implementation Structure + +**Dependencies Module**: +```python +# src/gondulf/dependencies.py +"""FastAPI dependency injection for services.""" +from functools import lru_cache +from gondulf.config import get_config +from gondulf.database import DatabaseService +from gondulf.code_storage import CodeStorage +from gondulf.email_service import EmailService +from gondulf.dns_service import DNSService +from gondulf.html_fetcher import HTMLFetcherService +from gondulf.relme_parser import RelMeParser +from gondulf.verification_service import DomainVerificationService +from gondulf.rate_limiter import RateLimiter + +# Configuration +@lru_cache() +def get_config_singleton(): + """Get singleton configuration instance.""" + return get_config() + +# Phase 1 Services +@lru_cache() +def get_database(): + """Get singleton database service.""" + config = get_config_singleton() + return DatabaseService(config.database_url) + +@lru_cache() +def get_code_storage(): + """Get singleton code storage service.""" + return CodeStorage() + +@lru_cache() +def get_email_service(): + """Get singleton email service.""" + config = get_config_singleton() + return EmailService( + smtp_host=config.smtp_host, + smtp_port=config.smtp_port, + smtp_username=config.smtp_username, + smtp_password=config.smtp_password, + from_address=config.smtp_from_address + ) + +@lru_cache() +def get_dns_service(): + """Get singleton DNS service.""" + config = get_config_singleton() + return DNSService(nameservers=config.dns_nameservers) + +# Phase 2 Services +@lru_cache() +def get_html_fetcher(): + """Get singleton HTML fetcher service.""" + return HTMLFetcherService() + +@lru_cache() +def get_relme_parser(): + """Get singleton rel=me parser service.""" + return RelMeParser() + +@lru_cache() +def get_rate_limiter(): + """Get singleton rate limiter service.""" + return RateLimiter(max_attempts=3, window_hours=1) + +@lru_cache() +def get_verification_service(): + """Get singleton domain verification service.""" 
+ return DomainVerificationService( + dns_service=get_dns_service(), + email_service=get_email_service(), + code_storage=get_code_storage(), + html_fetcher=get_html_fetcher(), + relme_parser=get_relme_parser() + ) +``` + +**Usage in Endpoints**: +```python +from fastapi import Depends +from gondulf.dependencies import get_verification_service, get_rate_limiter + +@app.post("/verify/email") +async def verify_email( + domain: str, + code: str, + verification_service: DomainVerificationService = Depends(get_verification_service), + rate_limiter: RateLimiter = Depends(get_rate_limiter) +): + # Use injected services + if not rate_limiter.check_rate_limit(domain): + return {"success": False, "error": "rate_limit_exceeded"} + + result = verification_service.verify_email_code(domain, code) + return {"success": result} +``` + +**Rationale**: +- `@lru_cache()` ensures single instance per function +- Services configured once at startup +- Consistent with Phase 1 pattern +- Simple to test (can override dependencies in tests) + +## 8. Test Organization for Authorization Endpoint + +### Approach +Separate test files per major endpoint with shared fixtures. 
+ +### File Structure +``` +tests/ +├── conftest.py # Shared fixtures and configuration +├── test_verification_endpoints.py # Email/TOTP verification tests +└── test_authorization_endpoint.py # Authorization flow tests +``` + +### Shared Fixtures Module +```python +# tests/conftest.py +import pytest +from fastapi.testclient import TestClient +from gondulf.main import app +from gondulf.dependencies import get_database, get_code_storage, get_rate_limiter + +@pytest.fixture +def client(): + """FastAPI test client.""" + return TestClient(app) + +@pytest.fixture +def mock_database(): + """Mock database service for testing.""" + # Create in-memory test database + from gondulf.database import DatabaseService + db = DatabaseService("sqlite:///:memory:") + db.initialize() + return db + +@pytest.fixture +def mock_code_storage(): + """Mock code storage for testing.""" + from gondulf.code_storage import CodeStorage + return CodeStorage() + +@pytest.fixture +def mock_rate_limiter(): + """Mock rate limiter with clean state.""" + from gondulf.rate_limiter import RateLimiter + return RateLimiter() + +@pytest.fixture +def verified_domain(mock_database): + """Fixture providing a pre-verified domain.""" + domain = "example.com" + mock_database.store_verified_domain( + domain=domain, + email="user@example.com", + two_factor=True + ) + return domain + +@pytest.fixture +def override_dependencies(mock_database, mock_code_storage, mock_rate_limiter): + """Override FastAPI dependencies with test mocks.""" + app.dependency_overrides[get_database] = lambda: mock_database + app.dependency_overrides[get_code_storage] = lambda: mock_code_storage + app.dependency_overrides[get_rate_limiter] = lambda: mock_rate_limiter + yield + app.dependency_overrides.clear() +``` + +### Verification Endpoints Tests +```python +# tests/test_verification_endpoints.py +import pytest + +class TestEmailVerification: + """Tests for /verify/email endpoint.""" + + def test_email_verification_success(self, client, 
override_dependencies): + """Test successful email verification.""" + # Test implementation + pass + + def test_email_verification_invalid_code(self, client, override_dependencies): + """Test email verification with invalid code.""" + pass + + def test_email_verification_rate_limit(self, client, override_dependencies): + """Test rate limiting on email verification.""" + pass + +class TestTOTPVerification: + """Tests for /verify/totp endpoint (future).""" + pass +``` + +### Authorization Endpoint Tests +```python +# tests/test_authorization_endpoint.py +import pytest +from urllib.parse import parse_qs, urlparse + +class TestAuthorizationEndpoint: + """Tests for /authorize endpoint.""" + + def test_authorize_missing_client_id(self, client, override_dependencies): + """Test authorization with missing client_id parameter.""" + response = client.get("/authorize") + assert response.status_code == 400 + assert "client_id" in response.text + + def test_authorize_invalid_redirect_uri(self, client, override_dependencies): + """Test authorization with mismatched redirect_uri.""" + params = { + "client_id": "https://client.example.com/", + "redirect_uri": "https://evil.com/callback", + "response_type": "code", + "state": "test_state" + } + response = client.get("/authorize", params=params) + assert response.status_code == 400 + + def test_authorize_success_flow(self, client, override_dependencies, verified_domain): + """Test complete successful authorization flow.""" + # Full flow test with verified domain + params = { + "client_id": "https://client.example.com/", + "redirect_uri": "https://client.example.com/callback", + "response_type": "code", + "state": "test_state", + "code_challenge": "test_challenge", + "code_challenge_method": "S256", + "me": f"https://{verified_domain}/" + } + # follow_redirects replaces the deprecated allow_redirects in the httpx-based TestClient + response = client.get("/authorize", params=params, follow_redirects=False) + assert response.status_code == 302 + + # Verify redirect contains authorization code + redirect_url = 
response.headers["location"] + parsed = urlparse(redirect_url) + query_params = parse_qs(parsed.query) + assert "code" in query_params + assert query_params["state"][0] == "test_state" +``` + +### Test Organization Rules +1. **One test class per major functionality** (email verification, authorization flow) +2. **Test complete flows, not internal methods** (black box testing) +3. **Use shared fixtures** for common setup (verified domains, mock services) +4. **Test both success and error paths** +5. **Test security boundaries** (rate limiting, invalid inputs, unauthorized access) + +## Summary + +These implementation decisions provide the Developer with unambiguous direction for Phase 2 implementation. All decisions prioritize simplicity while maintaining security and specification compliance. + +**Key Principles Applied**: +- Real implementations over stubs (rate limiting, validation) +- Reuse existing infrastructure (CodeStorage, dependency pattern) +- Standard tools over custom solutions (Jinja2 templates) +- Simple data structures (epoch integers, dictionaries) +- Clear separation of concerns (utility functions, test organization) + +**Next Steps for Developer**: +1. Review this guide alongside Phase 2 design document +2. Implement in the order specified by Phase 2 design +3. Follow patterns and structures defined here +4. Ask clarification questions if any ambiguity remains before implementation + +All architectural decisions are now documented and ready for implementation. diff --git a/docs/roadmap/backlog.md b/docs/roadmap/backlog.md index dcbeb98..9386463 100644 --- a/docs/roadmap/backlog.md +++ b/docs/roadmap/backlog.md @@ -568,9 +568,86 @@ These features are REQUIRED for the first production-ready release. Technical debt items are tracked here with a DEBT: prefix. Per project standards, each release must allocate at least 10% of effort to technical debt reduction. 
-### DEBT: Add Redis for session storage (M) +### DEBT: TD-001 - FastAPI Lifespan Migration (XS) +**Created**: 2025-11-20 (Phase 1 review) +**Priority**: P2 +**Component**: Core Infrastructure + +**Issue**: Using deprecated `@app.on_event()` decorators instead of lifespan context manager. + +**Impact**: +- Deprecation warnings in FastAPI 0.109+ +- Will break in future FastAPI version +- Not following current best practices + +**Current Mitigation**: Still works in current FastAPI version. + +**Effort to Fix**: < 1 day +- Replace `@app.on_event("startup")` with lifespan context manager +- Update database initialization to use lifespan +- Update tests if needed + +**Plan**: Address in v1.1.0 or during FastAPI upgrade. + +**References**: FastAPI lifespan documentation + +--- + +### DEBT: TD-002 - Database Migration Rollback Safety (S) +**Created**: 2025-11-20 (Phase 1 review) +**Priority**: P2 +**Component**: Database Layer + +**Issue**: No migration rollback capability. Migrations are one-way only. + +**Impact**: +- Cannot easily roll back schema changes +- Requires manual SQL to undo migrations +- Risk during production deployments + +**Current Mitigation**: Simple schema, manual SQL backups acceptable for v1.0.0. + +**Effort to Fix**: 1-2 days +- Integrate Alembic for migration management +- Create rollback scripts for existing migrations +- Update deployment documentation + +**Plan**: Address before v1.1.0 when schema changes become more frequent. + +**References**: Alembic documentation + +--- + +### DEBT: TD-003 - Async Email Support (S) +**Created**: 2025-11-20 (Phase 1 review) +**Priority**: P2 +**Component**: Email Service + +**Issue**: Synchronous SMTP blocks request thread during email sending. + +**Impact**: +- Email sending delays response to user (1-5 seconds) +- Thread blocked during SMTP operation +- Poor UX during slow email delivery + +**Current Mitigation**: Acceptable for low-volume v1.0.0. Timeout limits (10s) prevent long blocks. 
+ +**Effort to Fix**: 1-2 days +- Implement background task queue (FastAPI BackgroundTasks or Celery) +- Make email sending non-blocking +- Update UX to show "Sending email..." message +- Add retry logic for failed sends + +**Plan**: Address in v1.1.0 when user volume increases or when UX feedback indicates issue. + +**Alternative**: Use async SMTP library (aiosmtplib) + +--- + +### DEBT: TD-004 - Add Redis for Session Storage (M) **Created**: 2025-11-20 (architectural decision) **Priority**: P2 +**Component**: Storage Layer **Issue**: In-memory storage doesn't survive restarts. @@ -584,22 +661,6 @@ Technical debt items are tracked here with a DEBT: prefix. Per project standards --- -### DEBT: Implement schema migrations (S) -**Created**: 2025-11-20 (architectural decision) -**Priority**: P2 - -**Issue**: No formal migration system, using raw SQL files. - -**Impact**: Schema changes require manual intervention. - -**Mitigation (current)**: Simple schema, infrequent changes acceptable for v1.0.0. - -**Effort to Fix**: 1-2 days (Alembic integration) - -**Plan**: Address before v1.1.0 when schema changes become more frequent. - ---- - ## Backlog Management ### Adding New Features