This is my personal website.
+ + +``` + +Or visible link: + +```html +Email me +``` + +## Rationale + +### Follows IndieWeb Standards + +**IndieWeb Alignment**: +- rel="me" is the standard way to assert identity in IndieWeb +- Users familiar with IndieAuth likely already have rel="me" configured +- Interoperability with other IndieWeb tools +- Well-documented pattern: https://indieweb.org/rel-me + +**Community Expectations**: +- IndieAuth ecosystem uses rel="me" extensively +- Users understand the pattern +- Existing tutorials and documentation available +- Aligns with decentralized identity principles + +### Security Benefits + +**Prevents Social Engineering**: +- User cannot claim arbitrary email addresses +- Email must be published on the user's own site +- Attacker cannot trick user into entering wrong email +- Self-attested identity (user declares on their domain) + +**Reduces Attack Surface**: +- No user input field for email (no typos, no XSS) +- No email enumeration via guessing +- Email discovery transparent and auditable +- User controls what email is published + +**Transparency**: +- User explicitly publishes email on their site +- Public declaration of email relationship +- User aware they're making email public +- No hidden or implicit email collection + +### Implementation Simplicity + +**Standard Libraries**: +- BeautifulSoup: Robust HTML parsing (handles malformed HTML) +- requests: HTTP client (widely used, well-tested) +- No custom protocols or complex parsing +- Python standard library for email validation + +**Error Handling**: +- Clear error messages with setup instructions +- Graceful degradation (site unavailable, etc.) 
+- Standard HTTP status codes +- No complex state management + +**Testing**: +- Easy to mock HTTP responses +- Straightforward unit tests +- BeautifulSoup handles edge cases (malformed HTML) +- No external service dependencies + +### User Experience + +**Self-Documenting**: +- User adds one HTML tag to their site +- Clear relationship between domain and email +- User understands what they're publishing +- No hidden configuration + +**Familiar Pattern**: +- Similar to verifying site ownership (Google Search Console, etc.) +- Adding meta tags is common web practice +- Many users already have rel="me" for other purposes +- Works with static sites (no backend required) + +**Setup Time**: +- ~1 minute to add link tag +- No waiting (unlike DNS propagation) +- Immediate verification possible +- Can be combined with other rel="me" links + +## Consequences + +### Positive Consequences + +1. **IndieWeb Standard Compliance**: + - Follows established rel="me" pattern + - Interoperability with IndieWeb tools + - Community-vetted approach + - Well-documented standard + +2. **Enhanced Security**: + - No user-provided email input (prevents social engineering) + - Email explicitly published by user + - Transparent and auditable + - Reduces phishing risk + +3. **Implementation Simplicity**: + - Standard libraries (BeautifulSoup, requests) + - No complex protocols + - Easy to test and maintain + - Handles malformed HTML gracefully + +4. **User Control**: + - User explicitly declares email on their site + - Can change email by updating HTML + - No hidden email collection + - User aware of public email + +5. **Flexibility**: + - Works with static sites (no backend needed) + - Can use any email provider + - Email can be at different domain (e.g., Gmail) + - Supports multiple rel="me" links + +### Negative Consequences + +1. 
**Public Email Requirement**: + - User must publish email publicly on their site + - Not suitable for users who want private email + - Email harvesters can discover address + - Spam risk (mitigated: users can use spam filters) + +2. **HTML Parsing Complexity**: + - Must handle various HTML formats + - Malformed HTML can cause issues (mitigated: BeautifulSoup) + - Case sensitivity considerations + - Multiple possible HTML structures + +3. **Website Dependency**: + - User's site must be available during authentication + - Site downtime blocks authentication + - No fallback if site unreachable + - Requires HTTPS (not all sites have valid certificates) + +4. **Discovery Failures**: + - User may not have rel="me" configured + - Link may be in wrong format + - Email may be invalid format + - Clear error messages required + +5. **Privacy Considerations**: + - Email addresses visible to anyone + - Cannot use email verification without public disclosure + - Users must accept public email + - May deter privacy-conscious users + +### Mitigation Strategies + +**For Public Email Concern**: +- Document clearly that email will be public +- Suggest using dedicated email for IndieAuth +- Recommend spam filtering +- Note: Email is user's choice (they publish it) + +**For HTML Parsing**: +```python +from bs4 import BeautifulSoup + +# BeautifulSoup handles malformed HTML gracefully +soup = BeautifulSoup(html_content, 'html.parser') + +# Case-insensitive attribute matching +me_links = soup.find_all('link', rel='me') + soup.find_all('a', rel='me') + +# Multiple link formats supported +# +# Email +``` + +**For Website Dependency**: +- Clear error messages with retry option +- Suggest checking site availability +- Timeout limits (10 seconds) +- Log errors for debugging + +**For Discovery Failures**: +```markdown +Error: No rel="me" email link found + +Please add this to your homepage: + + +See: https://indieweb.org/rel-me for more information. 
+``` + +## Implementation + +### Email Discovery Service + +```python +from bs4 import BeautifulSoup +import requests +from typing import Optional +import logging +import re + +logger = logging.getLogger(__name__) + +class RelMeEmailDiscovery: + """ + Discover email addresses from rel="me" links on user's homepage. + """ + + def discover_email(self, domain: str) -> Optional[str]: + """ + Fetch domain homepage and discover email from rel="me" link. + + Args: + domain: User's domain (e.g., "example.com") + + Returns: + Email address or None if not found + """ + url = f"https://{domain}" + + try: + # Fetch homepage with safety limits + # requests.get() accepts no max_redirects argument; the limit is set on a Session + session = requests.Session() + session.max_redirects = 5 + response = session.get( + url, + timeout=10, + allow_redirects=True, + verify=True # Verify SSL certificate + ) + response.raise_for_status() + + # Parse HTML (handles malformed HTML) + soup = BeautifulSoup(response.content, 'html.parser') + + # Find all rel="me" links + # Both <link> and <a> tags supported + me_links = soup.find_all('link', rel='me') + soup.find_all('a', rel='me') + + # Look for mailto: links + for link in me_links: + href = link.get('href', '') + if href.startswith('mailto:'): + # Strip only the leading scheme prefix + email = href[len('mailto:'):].strip() + + # Validate email format + if self._validate_email_format(email): + logger.info(f"Discovered email via rel='me' for {domain}") + return email + + logger.warning(f"No rel='me' mailto: link found on {domain}") + return None + + except requests.exceptions.SSLError as e: + logger.error(f"SSL verification failed for {domain}: {e}") + return None + except requests.exceptions.Timeout: + logger.error(f"Timeout fetching {domain}") + return None + except requests.exceptions.HTTPError as e: + logger.error(f"HTTP error fetching {domain}: {e}") + return None + except Exception as e: + logger.error(f"Failed to discover email for {domain}: {e}") + return None + + def _validate_email_format(self, email: str) -> bool: + """ + Validate email address format.
+ + Args: + email: Email address to validate + + Returns: + True if valid format, False otherwise + """ + # Basic RFC 5322 format check + email_regex = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$' + if not re.match(email_regex, email): + return False + + # Length check (RFC 5321) + if len(email) > 254: + return False + + # Must have exactly one @ + if email.count('@') != 1: + return False + + return True +``` + +### Error Messages + +```python +# DNS TXT found, but no rel="me" link +error_message = """ +Domain verified via DNS, but no email found on your site. + +Please add this to your homepage: + + +This allows us to discover your email address automatically. + +Learn more: https://indieweb.org/rel-me +""" + +# Site unreachable +error_message = """ +Could not fetch your site at https://{domain} + +Please check: +- Site is accessible via HTTPS +- SSL certificate is valid +- No firewall blocking requests + +Try again once your site is accessible. +""" + +# Invalid email format in rel="me" +error_message = """ +Found rel="me" link, but email format is invalid: {email} + +Please check your rel="me" link uses valid email format: + +""" +``` + +## Alternatives Considered + +### Alternative 1: User-Provided Email Input + +**Pros**: +- Simpler implementation (no HTTP fetch, no parsing) +- Works even if site is down +- User can use private email (not public) +- Immediate (no HTTP round-trip) + +**Cons**: +- Social engineering risk (attacker tricks user into entering wrong email) +- Typo risk (user enters incorrect email) +- No self-attestation (email not on user's site) +- Not aligned with IndieWeb standards + +**Rejected**: Security risks outweigh simplicity benefits. rel="me" provides self-attestation and prevents social engineering. 
+ +--- + +### Alternative 2: DNS TXT Record for Email + +**Pros**: +- Stronger proof of domain control (DNS) +- No website dependency +- Machine-readable format +- Fast lookups (DNS cache) + +**Cons**: +- Requires DNS configuration (more complex than HTML) +- DNS propagation delays (can be hours) +- Not user-friendly for non-technical users +- Not standard IndieWeb practice + +**Rejected**: DNS configuration is more complex than adding HTML tag. rel="me" is more aligned with IndieWeb standards. + +--- + +### Alternative 3: WebFinger Protocol + +**Pros**: +- Standard protocol (RFC 7033) +- Machine-readable format (JSON) +- Supports multiple identities +- Well-defined spec + +**Cons**: +- Requires server-side endpoint (not for static sites) +- More complex implementation +- Not common in IndieWeb ecosystem +- Overkill for email discovery + +**Rejected**: Too complex for v1.0.0 MVP. Doesn't work with static sites. rel="me" is simpler and more aligned with IndieWeb. + +--- + +### Alternative 4: Well-Known URI + +**Pros**: +- Standard approach (`/.well-known/email`) +- Simple file-based implementation +- No HTML parsing required +- Fast lookups + +**Cons**: +- Not an established standard for email +- Requires server configuration +- Not aligned with IndieWeb practices +- Duplicate effort (rel="me" already exists) + +**Rejected**: Not standard practice. rel="me" is already established in IndieWeb ecosystem. 
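
To make the comparison concrete, the chosen rel="me" approach can be exercised against a sample homepage. The sketch below is illustrative only: it swaps BeautifulSoup for Python's standard-library `html.parser` so that it runs without dependencies, it omits the email format validation described in the Implementation section, and the names `RelMeMailtoParser`, `discover_relme_email`, and `SAMPLE_HOMEPAGE` are hypothetical rather than part of Gondulf.

```python
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import unquote, urlsplit


class RelMeMailtoParser(HTMLParser):
    """Collect mailto: targets of <link>/<a> elements whose rel includes "me"."""

    def __init__(self) -> None:
        super().__init__()
        self.emails: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("link", "a"):
            return
        attributes = dict(attrs)
        # rel is a space-separated token list, e.g. rel="me nofollow"
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "me" not in rel_tokens:
            return
        href = (attributes.get("href") or "").strip()
        parts = urlsplit(href)
        # urlsplit separates any ?subject=... query from the address itself
        if parts.scheme == "mailto" and parts.path:
            self.emails.append(unquote(parts.path))


def discover_relme_email(html: str) -> Optional[str]:
    """Return the first rel="me" mailto address in document order, or None."""
    parser = RelMeMailtoParser()
    parser.feed(html)
    parser.close()
    return parser.emails[0] if parser.emails else None


# Hypothetical homepage used only for this illustration
SAMPLE_HOMEPAGE = """
<html><head>
  <link rel="me" href="https://github.com/example">
  <link rel="me" href="mailto:user@example.com?subject=Hello">
</head><body>
  <a rel="me nofollow" href="mailto:other@example.com">Email me</a>
</body></html>
"""

if __name__ == "__main__":
    print(discover_relme_email(SAMPLE_HOMEPAGE))
```

Non-mailto rel="me" links are skipped, and the first mailto link in document order wins, which matches the "return first valid one" behavior the design describes.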
+ +## References + +- IndieWeb rel="me": https://indieweb.org/rel-me +- Example Implementation: https://thesatelliteoflove.com (Phil Skents' identity page) +- HTML Link Relations (W3C): https://www.w3.org/TR/html5/links.html#linkTypes +- BeautifulSoup Documentation: https://www.crummy.com/software/BeautifulSoup/ +- RFC 5322 (Email Format): https://datatracker.ietf.org/doc/html/rfc5322 +- RFC 5321 (SMTP): https://datatracker.ietf.org/doc/html/rfc5321 +- WebFinger (RFC 7033): https://datatracker.ietf.org/doc/html/rfc7033 (alternative considered) + +## Decision History + +- 2025-11-20: Proposed (Architect) +- 2025-11-20: Accepted (Architect) +- Related to ADR-005 (Two-Factor Domain Verification) diff --git a/docs/designs/phase-2-domain-verification.md b/docs/designs/phase-2-domain-verification.md new file mode 100644 index 0000000..71cd7f3 --- /dev/null +++ b/docs/designs/phase-2-domain-verification.md @@ -0,0 +1,2559 @@ +# Phase 2 Design: Domain Verification & Authorization Endpoint + +**Date**: 2025-11-20 +**Architect**: Claude (Architect Agent) +**Status**: Ready for Implementation +**Design Version**: 1.0 + +## Overview + +### What Phase 2 Builds + +Phase 2 implements the complete two-factor domain verification flow and the IndieAuth authorization endpoint, building on Phase 1's foundational services. + +**Core Functionality**: +1. HTML fetching service to retrieve user's homepage +2. rel="me" email discovery service to parse HTML for email links +3. Domain verification service to orchestrate two-factor verification (DNS TXT + Email) +4. HTTP endpoints for verification flow +5. Authorization endpoint to start IndieAuth authentication flow + +**Connection to IndieAuth Protocol**: Phase 2 implements steps 1-7 of the IndieAuth authorization flow (see `/docs/architecture/indieauth-protocol.md` lines 165-174), completing the domain verification and authorization code generation. 
+ +**Connection to Phase 1**: Phase 2 uses all Phase 1 services: +- Configuration (SMTP, DNS, database settings) +- Database (to store verified domains) +- In-memory storage (for authorization codes) +- Email service (to send verification codes) +- DNS service (to verify TXT records) +- Logging (structured logging throughout) + +### Authentication Security Model + +Per ADR-005 and ADR-008, Phase 2 implements two-factor domain verification: + +**Factor 1: DNS TXT Record** (proves DNS control) +- Required: `_gondulf.{domain}` TXT record = `verified` +- Verified via Phase 1 DNS service +- Consensus from multiple resolvers + +**Factor 2: Email Verification via rel="me"** (proves email control) +- Discover email from the rel="me" `mailto:` link on user's site +- Send 6-digit code to discovered email +- User enters code to complete verification + +**Combined Security**: Attacker must compromise BOTH DNS and email to authenticate fraudulently. + +## Components + +### 1. HTML Fetching Service + +**File**: `src/gondulf/html_fetcher.py` + +**Purpose**: Fetch user's homepage over HTTPS to discover rel="me" links. + +**Public Interface**: + +```python +from typing import Optional +import requests + +class HTMLFetcherService: + """ + Fetch user's homepage over HTTPS with security safeguards. + """ + + def __init__( + self, + timeout: int = 10, + max_redirects: int = 5, + max_size: int = 5 * 1024 * 1024 # 5MB + ): + """ + Initialize HTML fetcher service. + + Args: + timeout: HTTP request timeout in seconds (default: 10) + max_redirects: Maximum redirects to follow (default: 5) + max_size: Maximum response size in bytes (default: 5MB) + """ + self.timeout = timeout + self.max_redirects = max_redirects + self.max_size = max_size + + def fetch_site(self, domain: str) -> Optional[str]: + """ + Fetch site HTML content over HTTPS.
+ + Args: + domain: Domain to fetch (e.g., "example.com") + + Returns: + HTML content as string, or None if fetch fails + + Raises: + No exceptions raised - all errors logged and None returned + """ +``` + +**Implementation Details**: + +```python +def fetch_site(self, domain: str) -> Optional[str]: + """Fetch site HTML content over HTTPS.""" + url = f"https://{domain}" + + try: + # Fetch with security limits + # requests.get() accepts no max_redirects argument; the limit is set on a Session + session = requests.Session() + session.max_redirects = self.max_redirects + response = session.get( + url, + timeout=self.timeout, + allow_redirects=True, + verify=True, # SECURITY: Enforce SSL certificate verification + headers={ + 'User-Agent': 'Gondulf/1.0.0 IndieAuth (+https://github.com/yourusername/gondulf)' + } + ) + response.raise_for_status() + + # SECURITY: Reject oversized responses + # NOTE: without stream=True the body has already been downloaded at this point, + # so these checks bound what is processed, not what is fetched + content_length = int(response.headers.get('Content-Length', 0)) + if content_length > self.max_size: + logger.warning(f"Response too large for {domain}: {content_length} bytes") + return None + + # Check actual content size (Content-Length may be absent) + if len(response.content) > self.max_size: + logger.warning(f"Response content too large for {domain}: {len(response.content)} bytes") + return None + + logger.info(f"Successfully fetched {domain}: {len(response.content)} bytes") + return response.text + + except requests.exceptions.SSLError as e: + logger.error(f"SSL verification failed for {domain}: {e}") + return None + except requests.exceptions.Timeout: + logger.error(f"Timeout fetching {domain} after {self.timeout}s") + return None + except requests.exceptions.TooManyRedirects: + logger.error(f"Too many redirects for {domain}") + return None + except requests.exceptions.HTTPError as e: + logger.error(f"HTTP error fetching {domain}: {e}") + return None + except Exception as e: + logger.error(f"Unexpected error fetching {domain}: {e}") + return None +``` + +**Dependencies**: +- `requests` library (already in pyproject.toml) +- Python standard library: typing +- Phase 1 logging
configuration + +**Error Handling**: +- SSL verification failure: Log error, return None (security: reject invalid certificates) +- Timeout: Log error, return None (configurable timeout via __init__) +- HTTP errors (404, 500, etc.): Log error with status code, return None +- Size limit exceeded: Log warning, return None (prevent DoS) +- Too many redirects: Log error, return None (prevent redirect loops) +- Generic exceptions: Log error, return None (fail-safe) + +**Security Considerations**: +- HTTPS only (hardcoded in URL) +- SSL certificate verification enforced (verify=True, cannot be disabled) +- Response size limit (5MB default, configurable) +- Timeout to prevent hanging (10s default, configurable) +- Redirect limit (5 max, configurable) +- User-Agent header identifies Gondulf for server logs + +**Testing Requirements**: +- ✅ Successful HTTPS fetch returns HTML content +- ✅ SSL verification failure returns None +- ✅ Timeout returns None +- ✅ HTTP error codes (404, 500) return None +- ✅ Redirects followed (up to max_redirects) +- ✅ Too many redirects returns None +- ✅ Content-Length exceeds max_size returns None +- ✅ Actual content exceeds max_size returns None +- ✅ Custom User-Agent sent in request + +--- + +### 2. rel="me" Email Discovery Service + +**File**: `src/gondulf/relme.py` + +**Purpose**: Parse HTML to discover email addresses from rel="me" links following IndieWeb standards. + +**Public Interface**: + +```python +from typing import Optional +from bs4 import BeautifulSoup +import re + +class RelMeDiscoveryService: + """ + Discover email addresses from rel="me" links in HTML. + + Follows IndieWeb rel="me" standard: https://indieweb.org/rel-me + """ + + def discover_email(self, html_content: str) -> Optional[str]: + """ + Parse HTML and discover email from rel="me" link. 
+ + Args: + html_content: HTML content as string + + Returns: + Email address or None if not found + + Raises: + No exceptions raised - all errors logged and None returned + """ + + def validate_email_format(self, email: str) -> bool: + """ + Validate email address format (RFC 5322 simplified). + + Args: + email: Email address to validate + + Returns: + True if valid format, False otherwise + """ +``` + +**Implementation Details**: + +```python +def discover_email(self, html_content: str) -> Optional[str]: + """Parse HTML and discover email from rel='me' link.""" + try: + # Parse HTML (BeautifulSoup handles malformed HTML gracefully) + soup = BeautifulSoup(html_content, 'html.parser') + + # Find all rel="me" links - both and tags + # Case-insensitive matching via BeautifulSoup + me_links = soup.find_all('link', rel='me') + soup.find_all('a', rel='me') + + # Look for mailto: links + for link in me_links: + href = link.get('href', '') + if href.startswith('mailto:'): + # Extract email from mailto: URL + email = href.replace('mailto:', '').strip() + + # Remove query parameters if present (e.g., mailto:user@example.com?subject=Hello) + if '?' 
in email: + email = email.split('?')[0] + + # Validate email format + if self.validate_email_format(email): + logger.info(f"Discovered email via rel='me': {email[:3]}***@{email.split('@')[1]}") + return email + else: + logger.warning(f"Found rel='me' mailto link with invalid email format: {email}") + + logger.warning("No rel='me' mailto: link found in HTML") + return None + + except Exception as e: + logger.error(f"Failed to parse HTML for rel='me' links: {e}") + return None + +def validate_email_format(self, email: str) -> bool: + """Validate email address format (RFC 5322 simplified).""" + # Basic format validation + email_regex = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$' + + if not re.match(email_regex, email): + return False + + # Length check (RFC 5321 maximum) + if len(email) > 254: + return False + + # Must have exactly one @ + if email.count('@') != 1: + return False + + # Domain must have at least one dot + local, domain = email.split('@') + if '.' not in domain: + return False + + return True +``` + +**Dependencies**: +- `beautifulsoup4>=4.12.0` (NEW - add to pyproject.toml) +- `html.parser` (Python standard library, used by BeautifulSoup) +- `re` (Python standard library) +- Phase 1 logging configuration + +**Error Handling**: +- Malformed HTML: BeautifulSoup handles gracefully, continues parsing +- Missing rel="me" links: Log warning, return None +- Invalid email format in link: Log warning, skip link, continue searching +- Multiple rel="me" mailto links: Return first valid one +- Empty href attribute: Skip link, continue searching +- Exception during parsing: Log error, return None + +**Security Considerations**: +- No script execution: BeautifulSoup only extracts attributes, never executes JavaScript +- Email validation: Strict format checking prevents injection +- Link extraction only: No rendering or evaluation of HTML +- Partial masking in logs: Only log first 3 chars of email (privacy) + +**Testing Requirements**: +- ✅ Discovery from `` 
tag +- ✅ Discovery from `` tag +- ✅ Multiple rel="me" links: select first mailto +- ✅ Malformed HTML handled gracefully +- ✅ Missing rel="me" links returns None +- ✅ Invalid email format in link returns None (but logs warning) +- ✅ Empty href returns None +- ✅ Non-mailto rel="me" links ignored (e.g., https:// links) +- ✅ mailto with query parameters (e.g., ?subject=Hi) strips params +- ✅ Email validation: valid formats accepted +- ✅ Email validation: invalid formats rejected (no @, no domain, too long, etc.) + +--- + +### 3. Domain Verification Service + +**File**: `src/gondulf/domain_verification.py` + +**Purpose**: Orchestrate two-factor domain verification (DNS TXT + Email via rel="me"). + +**Public Interface**: + +```python +from typing import Tuple, Optional +from .dns import DNSService +from .html_fetcher import HTMLFetcherService +from .relme import RelMeDiscoveryService +from .email import EmailService +from .storage import CodeStorage +from .database.connection import DatabaseConnection +import secrets + +class DomainVerificationService: + """ + Two-factor domain verification service. + + Verifies domain ownership through: + 1. DNS TXT record verification (_gondulf.{domain} = verified) + 2. Email verification via rel="me" discovery + """ + + def __init__( + self, + dns_service: DNSService, + html_fetcher: HTMLFetcherService, + relme_discovery: RelMeDiscoveryService, + email_service: EmailService, + code_storage: CodeStorage, + database: DatabaseConnection, + code_ttl: int = 900 # 15 minutes + ): + """ + Initialize domain verification service. 
+ + Args: + dns_service: DNS service for TXT record verification + html_fetcher: HTML fetcher service + relme_discovery: rel="me" email discovery service + email_service: Email service for sending codes + code_storage: In-memory storage for verification codes + database: Database connection for storing verified domains + code_ttl: Verification code TTL in seconds (default: 900 = 15 min) + """ + + def start_verification(self, domain: str) -> Tuple[bool, Optional[str], Optional[str]]: + """ + Start domain verification process. + + Steps: + 1. Verify DNS TXT record exists + 2. Fetch user's homepage + 3. Discover email from rel="me" link + 4. Generate and send verification code + + Args: + domain: Domain to verify (e.g., "example.com") + + Returns: + Tuple of (success, discovered_email_masked, error_message) + - success: True if code sent, False if verification cannot start + - discovered_email_masked: Email with partial masking (e.g., "u***@example.com") + - error_message: Error description if success=False, None otherwise + """ + + def verify_code(self, email: str, submitted_code: str) -> Tuple[bool, Optional[str], Optional[str]]: + """ + Verify submitted code. + + Args: + email: Email address (discovered from rel="me") + submitted_code: 6-digit code entered by user + + Returns: + Tuple of (success, domain, error_message) + - success: True if code valid, False otherwise + - domain: User's verified domain if success=True + - error_message: Error description if success=False + """ + + def is_domain_verified(self, domain: str) -> bool: + """ + Check if domain is already verified (cached in database). 
+ + Args: + domain: Domain to check + + Returns: + True if domain previously verified, False otherwise + """ +``` + +**Implementation Details**: + +```python +def start_verification(self, domain: str) -> Tuple[bool, Optional[str], Optional[str]]: + """Start domain verification process.""" + logger.info(f"Starting domain verification: {domain}") + + # Step 1: Verify DNS TXT record (first factor) + logger.debug(f"Verifying DNS TXT record for {domain}") + dns_verified = self.dns_service.verify_txt_record(domain, "verified") + + if not dns_verified: + error = ( + f"DNS verification failed. TXT record not found for _gondulf.{domain}. " + f"Please add: Type=TXT, Name=_gondulf.{domain}, Value=verified" + ) + logger.warning(f"DNS verification failed: {domain}") + return False, None, error + + logger.info(f"DNS TXT record verified: {domain}") + + # Step 2: Fetch site homepage + logger.debug(f"Fetching homepage for {domain}") + html = self.html_fetcher.fetch_site(domain) + + if html is None: + error = ( + f"Could not fetch site at https://{domain}. " + f"Please ensure site is accessible via HTTPS with valid SSL certificate." + ) + logger.warning(f"Site fetch failed: {domain}") + return False, None, error + + logger.info(f"Successfully fetched homepage: {domain}") + + # Step 3: Discover email from rel="me" (second factor discovery) + logger.debug(f"Discovering email via rel='me' for {domain}") + email = self.relme_discovery.discover_email(html) + + if email is None: + error = ( + 'No rel="me" mailto: link found on homepage. ' + f'Please add to https://{domain}: ' + '' + ) + logger.warning(f"rel='me' discovery failed: {domain}") + return False, None, error + + logger.info(f"Email discovered via rel='me' for {domain}: {email[:3]}***") + + # Step 4: Check rate limiting + if self._is_rate_limited(domain): + error = ( + f"Rate limit exceeded for {domain}. " + f"Please wait before requesting another verification code." 
+ ) + logger.warning(f"Rate limit exceeded: {domain}") + return False, email, error + + # Step 5: Generate verification code + code = self._generate_code() + + # Step 6: Store code with metadata + self.code_storage.store(email, code, ttl=self.code_ttl) + + # Store metadata for rate limiting and domain association + self._store_code_metadata(email, domain) + + logger.debug(f"Verification code generated and stored for {email[:3]}***") + + # Step 7: Send verification email (second factor verification) + logger.debug(f"Sending verification email to {email[:3]}***") + email_sent = self.email_service.send_verification_email(email, code) + + if not email_sent: + # Clean up stored code if email fails + self.code_storage.delete(email) + error = ( + f"Failed to send verification code to {email}. " + f"Please check email address in rel='me' link and try again." + ) + logger.error(f"Email send failed: {email[:3]}***") + return False, email, error + + logger.info(f"Verification code sent successfully to {email[:3]}***") + + # Mask email for display: u***@example.com + email_masked = self._mask_email(email) + + return True, email_masked, None + +def verify_code(self, email: str, submitted_code: str) -> Tuple[bool, Optional[str], Optional[str]]: + """Verify submitted code.""" + logger.info(f"Verifying code for {email[:3]}***") + + # Retrieve stored code + stored_code = self.code_storage.get(email) + + if stored_code is None: + logger.warning(f"No verification code found for {email[:3]}***") + return False, None, "No verification code found. Please request a new code." + + # Get code metadata + metadata = self._get_code_metadata(email) + if metadata is None: + logger.error(f"Code found but metadata missing for {email[:3]}***") + return False, None, "Verification error. Please request a new code." 
+ + domain = metadata['domain'] + attempts = metadata.get('attempts', 0) + + # Check attempt limit (prevent brute force) + if attempts >= 3: + logger.warning(f"Too many attempts for {email[:3]}***") + self.code_storage.delete(email) + self._delete_code_metadata(email) + return False, None, "Too many attempts. Please request a new code." + + # Increment attempt counter + self._increment_attempts(email) + + # Verify code using constant-time comparison (SECURITY: prevent timing attacks) + if not secrets.compare_digest(submitted_code, stored_code): + logger.warning(f"Invalid code submitted for {email[:3]}***") + return False, None, f"Invalid code. {3 - attempts - 1} attempts remaining." + + # Code is valid - clean up and mark domain as verified + logger.info(f"Code verified successfully for {domain}") + + self.code_storage.delete(email) + self._delete_code_metadata(email) + + # Store verified domain in database + self._store_verified_domain(domain) + + return True, domain, None + +def is_domain_verified(self, domain: str) -> bool: + """Check if domain already verified.""" + with self.database.get_connection() as conn: + result = conn.execute( + "SELECT verified FROM domains WHERE domain = ?", + (domain,) + ).fetchone() + + if result and result['verified']: + logger.debug(f"Domain already verified: {domain}") + return True + + return False + +def _generate_code(self) -> str: + """Generate 6-digit verification code.""" + return ''.join(secrets.choice('0123456789') for _ in range(6)) + +def _mask_email(self, email: str) -> str: + """Mask email for display: u***@example.com""" + local, domain = email.split('@') + if len(local) <= 1: + return f"{local[0]}***@{domain}" + return f"{local[0]}***@{domain}" + +def _is_rate_limited(self, domain: str) -> bool: + """ + Check if domain is rate limited. + + Rate limit: Max 3 codes per domain per hour. 
+ """ + # TODO: Implement rate limiting using code metadata + # For Phase 2, we'll implement simple in-memory tracking + # Future: Use Redis for distributed rate limiting + return False # Placeholder - implement in actual code + +def _store_code_metadata(self, email: str, domain: str) -> None: + """Store code metadata for rate limiting and domain association.""" + # TODO: Implement metadata storage + # Store: email -> {domain, created_at, attempts} + pass + +def _get_code_metadata(self, email: str) -> Optional[dict]: + """Retrieve code metadata.""" + # TODO: Implement metadata retrieval + # Return: {domain, created_at, attempts} + return {'domain': 'example.com', 'attempts': 0} # Placeholder + +def _delete_code_metadata(self, email: str) -> None: + """Delete code metadata.""" + # TODO: Implement metadata deletion + pass + +def _increment_attempts(self, email: str) -> None: + """Increment attempt counter for email.""" + # TODO: Implement attempt increment + pass + +def _store_verified_domain(self, domain: str) -> None: + """Store verified domain in database.""" + from datetime import datetime + + with self.database.get_connection() as conn: + conn.execute( + """ + INSERT OR REPLACE INTO domains (domain, verification_method, verified, verified_at, last_dns_check) + VALUES (?, ?, ?, ?, ?) 
+ """, + (domain, 'two_factor', True, datetime.utcnow(), datetime.utcnow()) + ) + conn.commit() + + logger.info(f"Domain verification stored in database: {domain}") +``` + +**Dependencies**: +- All Phase 1 services (DNS, Email, Storage, Database) +- HTML fetcher service (Phase 2) +- rel="me" discovery service (Phase 2) +- Python standard library: secrets, datetime + +**Error Handling**: +- DNS verification failure: Return error with setup instructions +- Site fetch failure: Return error with troubleshooting steps +- rel="me" discovery failure: Return error with HTML example +- Email send failure: Return error, clean up stored code +- Code not found: Return error, suggest requesting new code +- Code expired: Handled by CodeStorage TTL +- Too many attempts: Return error, invalidate code +- Invalid code: Return error with remaining attempts +- Rate limit exceeded: Return error, suggest waiting + +**Security Considerations**: +- Two-factor verification: Both DNS and email required +- Constant-time code comparison: Prevent timing attacks (secrets.compare_digest) +- Rate limiting: Max 3 codes per domain per hour (prevents abuse) +- Attempt limiting: Max 3 code submission attempts (prevents brute force) +- Single-use codes: Deleted after successful verification +- Email masking in logs: Only log partial email (privacy) +- No email storage: Email used only during verification, never persisted + +**Testing Requirements**: +- ✅ Full verification flow: DNS → rel="me" → email → code verification +- ✅ DNS verification failure blocks flow +- ✅ Site fetch failure blocks flow +- ✅ rel="me" discovery failure blocks flow +- ✅ Email send failure cleans up stored code +- ✅ Code verification success stores domain in database +- ✅ Code verification failure decrements remaining attempts +- ✅ Too many attempts invalidates code +- ✅ Invalid code returns error with attempts remaining +- ✅ Code expiration handled by storage layer +- ✅ Rate limiting prevents excessive code requests +- ✅ 
Already verified domain check works +- ✅ Email masking works correctly + +--- + +### 4. Domain Verification Endpoints + +**File**: `src/gondulf/routers/verification.py` + +**Purpose**: HTTP API endpoints for user interaction during verification flow. + +**Public Interface**: + +```python +from fastapi import APIRouter, HTTPException, Depends +from pydantic import BaseModel, Field +from typing import Optional + +router = APIRouter(prefix="/api/verify", tags=["verification"]) + +# Request/Response Models +class VerificationStartRequest(BaseModel): + """Request to start domain verification.""" + domain: str = Field( + ..., + min_length=3, + max_length=253, + description="Domain to verify (e.g., 'example.com')" + ) + +class VerificationStartResponse(BaseModel): + """Response from starting verification.""" + success: bool + email_masked: Optional[str] = Field(None, description="Partially masked email (e.g., 'u***@example.com')") + error: Optional[str] = Field(None, description="Error message if success=False") + +class VerificationCodeRequest(BaseModel): + """Request to verify code.""" + email: str = Field(..., description="Email address discovered from rel='me'") + code: str = Field(..., min_length=6, max_length=6, pattern="^[0-9]{6}$", description="6-digit verification code") + +class VerificationCodeResponse(BaseModel): + """Response from code verification.""" + success: bool + domain: Optional[str] = Field(None, description="Verified domain if success=True") + error: Optional[str] = Field(None, description="Error message if success=False") + +# Endpoints +@router.post("/start", response_model=VerificationStartResponse) +async def start_verification( + request: VerificationStartRequest, + domain_verification: DomainVerificationService = Depends(get_domain_verification_service) +) -> VerificationStartResponse: + """ + Start domain verification process. + + Steps: + 1. Verify DNS TXT record exists + 2. Discover email from rel="me" link + 3. 
Send verification code to discovered email + + Returns masked email on success, error message on failure. + """ + +@router.post("/code", response_model=VerificationCodeResponse) +async def verify_code( + request: VerificationCodeRequest, + domain_verification: DomainVerificationService = Depends(get_domain_verification_service) +) -> VerificationCodeResponse: + """ + Verify submitted code. + + Returns verified domain on success, error message on failure. + """ +``` + +**Implementation Details**: + +```python +@router.post("/start", response_model=VerificationStartResponse) +async def start_verification( + request: VerificationStartRequest, + domain_verification: DomainVerificationService = Depends(get_domain_verification_service) +) -> VerificationStartResponse: + """Start domain verification process.""" + logger.info(f"Verification start request: {request.domain}") + + # Normalize domain (lowercase, remove trailing slash) + domain = request.domain.lower().rstrip('/') + + # Remove protocol if present + if domain.startswith('http://') or domain.startswith('https://'): + domain = domain.split('://', 1)[1] + + # Remove path if present + if '/' in domain: + domain = domain.split('/')[0] + + # Validate domain format (basic validation) + if not domain or '.' not in domain: + logger.warning(f"Invalid domain format: {request.domain}") + return VerificationStartResponse( + success=False, + email_masked=None, + error="Invalid domain format. Please provide a valid domain (e.g., 'example.com')." 
+ ) + + # Start verification + success, email_masked, error = domain_verification.start_verification(domain) + + if not success: + logger.warning(f"Verification start failed for {domain}: {error}") + return VerificationStartResponse( + success=False, + email_masked=email_masked, + error=error + ) + + logger.info(f"Verification started successfully for {domain}") + return VerificationStartResponse( + success=True, + email_masked=email_masked, + error=None + ) + +@router.post("/code", response_model=VerificationCodeResponse) +async def verify_code( + request: VerificationCodeRequest, + domain_verification: DomainVerificationService = Depends(get_domain_verification_service) +) -> VerificationCodeResponse: + """Verify submitted code.""" + logger.info(f"Code verification request for email: {request.email[:3]}***") + + # Verify code + success, domain, error = domain_verification.verify_code(request.email, request.code) + + if not success: + logger.warning(f"Code verification failed for {request.email[:3]}***: {error}") + return VerificationCodeResponse( + success=False, + domain=None, + error=error + ) + + logger.info(f"Code verified successfully for domain: {domain}") + return VerificationCodeResponse( + success=True, + domain=domain, + error=None + ) +``` + +**Dependencies**: +- FastAPI router and dependency injection +- Pydantic models for request/response validation +- Domain verification service (injected via Depends) +- Phase 1 logging configuration + +**Error Handling**: +- Invalid domain format: Return 200 with success=False, descriptive error +- Pydantic validation errors: Automatic 422 response with validation details +- Service errors: Propagated via success=False in response +- All errors logged at WARNING level +- No 500 errors expected (all errors handled gracefully) + +**Security Considerations**: +- Input validation: Pydantic models enforce constraints +- Domain normalization: Prevent URL injection +- No authentication required: Public endpoints 
(verification is the authentication) +- Rate limiting: Handled by DomainVerificationService (not endpoint level) +- Email not validated at endpoint level: Service handles validation + +**Testing Requirements**: +- ✅ POST /api/verify/start with valid domain returns success +- ✅ POST /api/verify/start with invalid domain format returns error +- ✅ POST /api/verify/start with DNS failure returns error +- ✅ POST /api/verify/start with rel="me" failure returns error +- ✅ POST /api/verify/start with email send failure returns error +- ✅ POST /api/verify/code with valid code returns domain +- ✅ POST /api/verify/code with invalid code returns error +- ✅ POST /api/verify/code with expired code returns error +- ✅ POST /api/verify/code with missing code returns error +- ✅ POST /api/verify/code with too many attempts returns error +- ✅ Pydantic validation errors return 422 + +--- + +### 5. Authorization Endpoint + +**File**: `src/gondulf/routers/authorization.py` + +**Purpose**: Implement IndieAuth authorization endpoint (`/authorize`) per W3C spec. + +**Public Interface**: + +```python +from fastapi import APIRouter, Request, HTTPException, Depends +from fastapi.responses import RedirectResponse, HTMLResponse +from pydantic import BaseModel, HttpUrl, Field +from typing import Optional, Literal + +router = APIRouter(tags=["indieauth"]) + +# Request Models +class AuthorizeRequest(BaseModel): + """ + IndieAuth authorization request parameters. 
+ + Per W3C IndieAuth specification (Section 5.1): + https://www.w3.org/TR/indieauth/#authorization-request + """ + me: HttpUrl = Field(..., description="User's profile URL (domain identity)") + client_id: HttpUrl = Field(..., description="Client application URL") + redirect_uri: HttpUrl = Field(..., description="Where to redirect after authorization") + state: str = Field(..., min_length=1, max_length=512, description="CSRF protection token") + response_type: Literal["code"] = Field(..., description="Must be 'code' for authorization code flow") + scope: Optional[str] = Field(None, description="Requested scopes (ignored in v1.0.0)") + code_challenge: Optional[str] = Field(None, description="PKCE challenge (not supported in v1.0.0)") + code_challenge_method: Optional[str] = Field(None, description="PKCE method (not supported in v1.0.0)") + +# Endpoints +@router.get("/authorize") +async def authorize( + request: Request, + me: str, + client_id: str, + redirect_uri: str, + state: str, + response_type: str, + scope: Optional[str] = None, + code_challenge: Optional[str] = None, + code_challenge_method: Optional[str] = None, + domain_verification: DomainVerificationService = Depends(get_domain_verification_service) +) -> HTMLResponse: + """ + IndieAuth authorization endpoint. + + Per W3C IndieAuth specification: + https://www.w3.org/TR/indieauth/#authorization-request + + Flow: + 1. Validate all parameters + 2. Check if domain already verified (skip verification if cached) + 3. If not verified, initiate two-factor verification flow + 4. Display consent screen with client info + 5. On approval, generate authorization code + 6. Redirect to client with code + state + """ +``` + +**Implementation Details** (High-Level - Full implementation too long for this doc): + +```python +@router.get("/authorize") +async def authorize( + request: Request, + me: str, + client_id: str, + redirect_uri: str, + state: str, + response_type: str, + # ... 
other parameters +) -> HTMLResponse: + """IndieAuth authorization endpoint.""" + + # STEP 1: Validate response_type + if response_type != "code": + # Return error (redirect if possible) + return _error_response( + redirect_uri=redirect_uri, + state=state, + error="unsupported_response_type", + description="Only response_type=code is supported" + ) + + # STEP 2: Validate and normalize 'me' parameter + me_normalized = _validate_and_normalize_me(me) + if me_normalized is None: + return _error_response( + redirect_uri=redirect_uri, + state=state, + error="invalid_request", + description="Invalid 'me' parameter format" + ) + + # STEP 3: Validate client_id + client_valid = _validate_client_id(client_id) + if not client_valid: + return _error_response( + redirect_uri=redirect_uri, + state=state, + error="invalid_client", + description="Invalid client_id" + ) + + # STEP 4: Validate redirect_uri + redirect_valid = _validate_redirect_uri(redirect_uri, client_id) + if not redirect_valid: + # SECURITY: Cannot redirect to invalid URI - display error page + return _error_page("Invalid redirect_uri") + + # STEP 5: Check if domain already verified + domain = _extract_domain_from_me(me_normalized) + + if domain_verification.is_domain_verified(domain): + # Skip verification, go directly to consent + logger.info(f"Domain already verified: {domain}") + return await _show_consent_screen( + me=me_normalized, + client_id=client_id, + redirect_uri=redirect_uri, + state=state + ) + + # STEP 6: Domain not verified - start verification flow + logger.info(f"Starting verification for new domain: {domain}") + + success, email_masked, error = domain_verification.start_verification(domain) + + if not success: + # Verification failed - show error with instructions + return _verification_error_page(domain, error) + + # STEP 7: Show code entry form + return _code_entry_page( + domain=domain, + email_masked=email_masked, + me=me_normalized, + client_id=client_id, + redirect_uri=redirect_uri, + 
state=state + ) + +# Additional endpoints for verification flow +@router.post("/authorize/verify-code") +async def verify_code_and_consent( + request: Request, + email: str, + code: str, + me: str, + client_id: str, + redirect_uri: str, + state: str, + domain_verification: DomainVerificationService = Depends(get_domain_verification_service) +) -> HTMLResponse: + """ + Verify code and show consent screen. + + Called when user submits verification code during authorization flow. + """ + # Verify code + success, domain, error = domain_verification.verify_code(email, code) + + if not success: + # Code invalid - show error, allow retry + return _code_entry_page_with_error( + domain=_extract_domain_from_me(me), + email_masked=_mask_email(email), + error=error, + me=me, + client_id=client_id, + redirect_uri=redirect_uri, + state=state + ) + + # Code valid - show consent screen + return await _show_consent_screen( + me=me, + client_id=client_id, + redirect_uri=redirect_uri, + state=state + ) + +@router.post("/authorize/consent") +async def handle_consent( + request: Request, + action: Literal["approve", "deny"], + me: str, + client_id: str, + redirect_uri: str, + state: str, + code_storage: CodeStorage = Depends(get_code_storage) +) -> RedirectResponse: + """ + Handle user consent decision. + + Called when user approves or denies authorization. 
+ """ + from urllib.parse import urlencode + + if action == "deny": + # User denied - redirect with error (query values must be URL-encoded) + query = urlencode({ + 'error': 'access_denied', + 'error_description': 'User denied authorization', + 'state': state + }) + return RedirectResponse( + url=f"{redirect_uri}?{query}", + status_code=302 + ) + + # User approved - generate authorization code + auth_code = _generate_authorization_code() + + # Store code in memory with metadata + code_storage.store(auth_code, { + 'me': me, + 'client_id': client_id, + 'redirect_uri': redirect_uri, + 'state': state, + 'created_at': datetime.utcnow() + }, ttl=600) # 10 minutes + + logger.info(f"Authorization code generated for {me} / {client_id}") + + # Redirect to client with code + state (URL-encoded so state survives intact) + query = urlencode({'code': auth_code, 'state': state}) + return RedirectResponse( + url=f"{redirect_uri}?{query}", + status_code=302 + ) + +# Helper functions (implementations not shown for brevity) +def _validate_and_normalize_me(me: str) -> Optional[str]: + """Validate and normalize 'me' parameter per IndieAuth spec.""" + pass + +def _validate_client_id(client_id: str) -> bool: + """Validate client_id is a valid URL.""" + pass + +def _validate_redirect_uri(redirect_uri: str, client_id: str) -> bool: + """Validate redirect_uri against client_id.""" + pass + +def _extract_domain_from_me(me: str) -> str: + """Extract domain from 'me' URL.""" + pass + +async def _show_consent_screen(...) -> HTMLResponse: + """Render consent screen HTML.""" + pass + +def _code_entry_page(...) -> HTMLResponse: + """Render code entry page HTML.""" + pass + +def _error_response(...) 
-> RedirectResponse: + """Generate OAuth 2.0 error redirect.""" + pass + +def _generate_authorization_code() -> str: + """Generate cryptographically secure authorization code.""" + return secrets.token_urlsafe(32) # 256 bits +``` + +**Dependencies**: +- FastAPI router, Request, Response types +- Pydantic models for validation +- Domain verification service (Phase 2) +- Code storage (Phase 1) +- HTML templates (new - Jinja2) +- Python standard library: secrets, datetime + +**Error Handling**: +- Invalid response_type: Redirect with `unsupported_response_type` error +- Invalid me parameter: Redirect with `invalid_request` error +- Invalid client_id: Redirect with `invalid_client` error +- Invalid redirect_uri: Display error page (cannot redirect) +- DNS verification failure: Display error page with setup instructions +- rel="me" discovery failure: Display error page with HTML example +- Email send failure: Display error page with troubleshooting +- Code verification failure: Display code entry page with error, allow retry +- User denies consent: Redirect with `access_denied` error +- All errors follow OAuth 2.0 error response format + +**Security Considerations**: +- HTTPS only: Enforced by middleware (production) +- redirect_uri validation: Prevent open redirect attacks +- State parameter: Passed through, client validates (CSRF protection) +- Authorization code: Cryptographically secure (256 bits) +- Code single-use: Enforced by token endpoint (Phase 3) +- Code expiration: 10 minutes TTL +- Domain verification: Two-factor required before code generation +- No client secrets: All clients are public per IndieAuth spec + +**Testing Requirements**: +- ✅ GET /authorize with valid parameters shows verification or consent +- ✅ GET /authorize with invalid response_type returns error +- ✅ GET /authorize with invalid me parameter returns error +- ✅ GET /authorize with invalid client_id returns error +- ✅ GET /authorize with invalid redirect_uri shows error page +- ✅ GET 
/authorize with already verified domain skips to consent +- ✅ POST /authorize/verify-code with valid code shows consent +- ✅ POST /authorize/verify-code with invalid code shows error +- ✅ POST /authorize/consent with action=approve generates code and redirects +- ✅ POST /authorize/consent with action=deny redirects with access_denied +- ✅ Authorization code stored in memory with correct metadata +- ✅ Authorization code expires after 10 minutes +- ✅ State parameter passed through all steps + +--- + +## Data Flow + +### Complete Two-Factor Verification Flow + +``` +┌─────────────────────────────────────────────────────────────────┐ +│ User / Client Application │ +└───────────────────────────────┬─────────────────────────────────┘ + │ + │ GET /authorize?me=example.com&... + │ + ▼ +┌─────────────────────────────────────────────────────────────────┐ +│ Authorization Endpoint │ +│ ┌──────────────────────────────────────────────────────────┐ │ +│ │ 1. Validate parameters (me, client_id, redirect_uri, │ │ +│ │ state, response_type) │ │ +│ └──────────────────────────┬───────────────────────────────┘ │ +│ │ │ +│ ┌──────────────────────────▼───────────────────────────────┐ │ +│ │ 2. Check if domain already verified in database │ │ +│ └──────────────────────────┬───────────────────────────────┘ │ +│ │ │ +│ ┌────────┴────────┐ │ +│ │ │ │ +│ │ Verified? │ │ +│ │ │ │ +│ ┌─────────┴─────No─────────┴─────────┐ │ +│ │ │ │ +│ │ YES │ NO │ +│ │ │ │ +│ ▼ ▼ │ +│ ┌──────────────────┐ ┌──────────────────────────┐ │ +│ │ Skip to Consent │ │ Start Verification Flow │ │ +│ │ (Step 9) │ │ (Step 3) │ │ +│ └──────────────────┘ └─────────┬────────────────┘ │ +│ │ │ +└───────────────────────────────────────────────┼──────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────┐ +│ Domain Verification Service (Two-Factor) │ +│ ┌──────────────────────────────────────────────────────────┐ │ +│ │ 3. 
Verify DNS TXT Record (First Factor) │ │ +│ │ Query: _gondulf.example.com TXT │ │ +│ │ Expected: "verified" │ │ +│ └──────────────────────────┬───────────────────────────────┘ │ +│ │ │ +│ ┌────────┴────────┐ │ +│ │ TXT found? │ │ +│ ┌─────────┴─────No─────────┴─────────┐ │ +│ │ YES │ NO │ +│ ▼ ▼ │ +│ ┌──────────────────┐ ┌──────────────────────────┐ │ +│ │ Continue to │ │ FAIL: Display error │ │ +│ │ Step 4 │ │ "Add DNS TXT record" │ │ +│ └─────────┬────────┘ └──────────────────────────┘ │ +│ │ │ +│ ┌─────────▼────────────────────────────────────────────────┐ │ +│ │ 4. Fetch User's Homepage via HTTPS │ │ +│ │ URL: https://example.com │ │ +│ │ Timeout: 10s, Max size: 5MB, Verify SSL │ │ +│ └──────────────────────────┬───────────────────────────────┘ │ +│ │ │ +│ ┌────────┴────────┐ │ +│ │ Fetch success? │ │ +│ ┌─────────┴─────No─────────┴─────────┐ │ +│ │ YES │ NO │ +│ ▼ ▼ │ +│ ┌──────────────────┐ ┌──────────────────────────┐ │ +│ │ Continue to │ │ FAIL: Display error │ │ +│ │ Step 5 │ │ "Site unreachable" │ │ +│ └─────────┬────────┘ └──────────────────────────┘ │ +│ │ │ +│ ┌─────────▼────────────────────────────────────────────────┐ │ +│ │ 5. Discover Email via rel="me" (Second Factor Discovery)│ │ +│ │ Parse HTML for: │ │ +│ │ Extract and validate email format │ │ +│ └──────────────────────────┬───────────────────────────────┘ │ +│ │ │ +│ ┌────────┴────────┐ │ +│ │ Email found? │ │ +│ ┌─────────┴─────No─────────┴─────────┐ │ +│ │ YES │ NO │ +│ ▼ ▼ │ +│ ┌──────────────────┐ ┌──────────────────────────┐ │ +│ │ Continue to │ │ FAIL: Display error │ │ +│ │ Step 6 │ │ "Add rel='me' link" │ │ +│ └─────────┬────────┘ └──────────────────────────┘ │ +│ │ │ +│ ┌─────────▼────────────────────────────────────────────────┐ │ +│ │ 6. 
Generate and Send Verification Code │ │ +│ │ (Second Factor Verification) │ │ +│ │ - Generate 6-digit code (cryptographically secure) │ │ +│ │ - Store code in memory (TTL: 15 minutes) │ │ +│ │ - Send code to discovered email via SMTP │ │ +│ └──────────────────────────┬───────────────────────────────┘ │ +│ │ │ +└─────────────────────────────┼────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────┐ +│ Display Code Entry Form │ +│ ┌──────────────────────────────────────────────────────────┐ │ +│ │ "Verification code sent to u***@example.com" │ │ +│ │ [Enter 6-digit code: ______] │ │ +│ │ [Submit] │ │ +│ └──────────────────────────┬───────────────────────────────┘ │ +└─────────────────────────────┼────────────────────────────────────┘ + │ + │ POST /authorize/verify-code + │ {email, code, me, client_id, ...} + │ + ▼ +┌─────────────────────────────────────────────────────────────────┐ +│ Domain Verification Service (Continued) │ +│ ┌──────────────────────────────────────────────────────────┐ │ +│ │ 7. Verify Submitted Code │ │ +│ │ - Retrieve stored code from memory │ │ +│ │ - Check expiration (15 min TTL) │ │ +│ │ - Check attempts (max 3) │ │ +│ │ - Constant-time compare submitted vs stored │ │ +│ └──────────────────────────┬───────────────────────────────┘ │ +│ │ │ +│ ┌────────┴────────┐ │ +│ │ Code valid? │ │ +│ ┌─────────┴─────No─────────┴─────────┐ │ +│ │ YES │ NO │ +│ ▼ ▼ │ +│ ┌──────────────────┐ ┌──────────────────────────┐ │ +│ │ Store verified │ │ Show error, allow retry │ │ +│ │ domain in DB │ │ (if attempts remaining) │ │ +│ └─────────┬────────┘ └──────────────────────────┘ │ +│ │ │ +│ ┌─────────▼────────────────────────────────────────────────┐ │ +│ │ 8. 
Domain Verified (Two-Factor Complete) │ │ +│ │ - DNS TXT verified ✓ │ │ +│ │ - Email verified ✓ │ │ +│ │ - Store in database: verification_method='two_factor' │ │ +│ └──────────────────────────┬───────────────────────────────┘ │ +└─────────────────────────────┼────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────┐ +│ Display Consent Screen │ +│ ┌──────────────────────────────────────────────────────────┐ │ +│ │ "Sign in to [App Name] as example.com" │ │ +│ │ │ │ +│ │ Client: https://client.example.com │ │ +│ │ Redirect: https://client.example.com/callback │ │ +│ │ │ │ +│ │ [Approve] [Deny] │ │ +│ └──────────────────────────┬───────────────────────────────┘ │ +└─────────────────────────────┼────────────────────────────────────┘ + │ + │ POST /authorize/consent + │ {action: "approve", ...} + │ + ▼ +┌─────────────────────────────────────────────────────────────────┐ +│ Authorization Endpoint (Continued) │ +│ ┌──────────────────────────────────────────────────────────┐ │ +│ │ 9. Generate Authorization Code │ │ +│ │ - Generate cryptographically secure code (256 bits) │ │ +│ │ - Store in memory with metadata: │ │ +│ │ • me (user's domain) │ │ +│ │ • client_id │ │ +│ │ • redirect_uri │ │ +│ │ • state │ │ +│ │ • TTL: 10 minutes │ │ +│ └──────────────────────────┬───────────────────────────────┘ │ +│ │ │ +│ ┌──────────────────────────▼───────────────────────────────┐ │ +│ │ 10. 
Redirect to Client with Code │ │ +│ │ {redirect_uri}?code={code}&state={state} │ │ +│ └──────────────────────────┬───────────────────────────────┘ │ +└─────────────────────────────┼────────────────────────────────────┘ + │ + │ HTTP 302 Redirect + │ + ▼ +┌─────────────────────────────────────────────────────────────────┐ +│ Client Application │ +│ • Receives authorization code │ +│ • Validates state parameter (CSRF protection) │ +│ • Exchanges code for token (Phase 3: Token Endpoint) │ +└─────────────────────────────────────────────────────────────────┘ +``` + +### State Transitions + +**Domain Verification States**: +1. **Unverified**: Domain never seen before +2. **DNS Verified**: TXT record confirmed +3. **Email Discovered**: rel="me" link found +4. **Code Sent**: Verification code sent to email +5. **Fully Verified**: Code verified, stored in database +6. **Cached**: Domain verification cached (skip steps 1-5 on future auth) + +**Authorization Flow States**: +1. **Request Received**: Parameters validated +2. **Domain Check**: Checking if domain verified +3. **Verification In Progress**: User entering code +4. **Consent Pending**: User viewing consent screen +5. **Approved**: User approved, code generated +6. **Denied**: User denied, error redirect +7. 
**Complete**: Redirected to client with code + +### Error Paths + +**DNS Verification Failure**: +``` +/authorize → Validate params → Check DNS TXT → [NOT FOUND] + → Display error page with instructions + → User adds TXT record, clicks "Retry" + → Loop back to Check DNS TXT +``` + +**rel="me" Discovery Failure**: +``` +/authorize → DNS verified → Fetch site → Discover email → [NOT FOUND] + → Display error page with HTML example + → User adds <link rel="me" href="mailto:...">, clicks "Retry" + → Loop back to Fetch site +``` + +**Email Send Failure**: +``` +/authorize → DNS + rel="me" OK → Send email → [SMTP ERROR] + → Display error page with troubleshooting + → User checks SMTP config, clicks "Retry" + → Loop back to Send email +``` + +**Invalid Code**: +``` +/authorize/verify-code → Verify code → [INVALID] + → Display code entry form with error + → "Invalid code. 2 attempts remaining." + → User enters code again + → Loop back to Verify code +``` + +**Rate Limit Exceeded**: +``` +/authorize → Start verification → Check rate limit → [EXCEEDED] + → Display error: "Too many attempts, wait 1 hour" + → User waits, tries again later +``` + +## API Endpoints + +### POST /api/verify/start + +**Purpose**: Start domain verification process. + +**Request**: +```json +{ + "domain": "example.com" +} +``` + +**Success Response** (200 OK): +```json +{ + "success": true, + "email_masked": "u***@example.com", + "error": null +} +``` + +**Error Response** (200 OK with success=false): +```json +{ + "success": false, + "email_masked": null, + "error": "DNS TXT record not found for _gondulf.example.com. 
Please add: Type=TXT, Name=_gondulf.example.com, Value=verified" +} +``` + +**Validation Errors** (422 Unprocessable Entity): +```json +{ + "detail": [ + { + "loc": ["body", "domain"], + "msg": "Field required", + "type": "missing" + } + ] +} +``` + +**Rate Limiting**: +- Max 3 requests per domain per hour +- Enforced by DomainVerificationService + +**Authentication**: None required (public endpoint) + +--- + +### POST /api/verify/code + +**Purpose**: Verify submitted 6-digit code. + +**Request**: +```json +{ + "email": "user@example.com", + "code": "123456" +} +``` + +**Success Response** (200 OK): +```json +{ + "success": true, + "domain": "example.com", + "error": null +} +``` + +**Error Response** (200 OK with success=false): +```json +{ + "success": false, + "domain": null, + "error": "Invalid code. 2 attempts remaining." +} +``` + +**Validation Errors** (422 Unprocessable Entity): +```json +{ + "detail": [ + { + "loc": ["body", "code"], + "msg": "String should match pattern '^[0-9]{6}$'", + "type": "string_pattern_mismatch" + } + ] +} +``` + +**Rate Limiting**: +- Max 3 attempts per email per code +- Enforced by code verification logic + +**Authentication**: None required (code is the authentication) + +--- + +### GET /authorize + +**Purpose**: IndieAuth authorization endpoint. 
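
As a rough sketch of what a request to this endpoint might look like from the client side, the snippet below builds an authorization URL with the Python standard library. The server location `https://auth.example.com` and the helper name `build_authorize_url` are assumptions made for illustration only; neither is defined by this design. URL-encoding the values matters because `me`, `client_id`, and `redirect_uri` are themselves URLs.

```python
import secrets
from typing import Tuple
from urllib.parse import parse_qs, urlencode, urlsplit


def build_authorize_url(base_url: str, me: str, client_id: str, redirect_uri: str) -> Tuple[str, str]:
    """Return (url, state) for an authorization request.

    Hypothetical client-side helper, not part of the server described here.
    """
    # Random state value; the client stores it and compares it on the callback.
    state = secrets.token_urlsafe(16)
    query = urlencode({
        "me": me,
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "state": state,
        "response_type": "code",
    })
    return f"{base_url}/authorize?{query}", state


url, state = build_authorize_url(
    "https://auth.example.com",  # assumed server location
    me="https://example.com",
    client_id="https://client.example.com",
    redirect_uri="https://client.example.com/callback",
)
print(url)

# Round-trip check: the encoded values decode back to the originals.
params = parse_qs(urlsplit(url).query)
print(params["redirect_uri"][0])
```

The client would keep the returned `state` in its session and reject any callback whose `state` does not match.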
+ +**Query Parameters**: +- `me` (required): User's profile URL (e.g., "https://example.com") +- `client_id` (required): Client application URL +- `redirect_uri` (required): Where to redirect after authorization +- `state` (required): CSRF protection token +- `response_type` (required): Must be "code" +- `scope` (optional): Requested scopes (ignored in v1.0.0) +- `code_challenge` (optional): PKCE challenge (not supported in v1.0.0) +- `code_challenge_method` (optional): PKCE method (not supported in v1.0.0) + +**Success Response**: HTML page (verification form or consent screen) + +**Error Redirect** (302 Found): +``` +{redirect_uri}?error=invalid_request&error_description=Invalid+me+parameter&state={state} +``` + +**Error Codes** (OAuth 2.0 standard): +- `invalid_request`: Missing or invalid parameter +- `unauthorized_client`: Client not authorized +- `access_denied`: User denied authorization +- `unsupported_response_type`: response_type not "code" +- `server_error`: Internal server error + +**Error Page** (when redirect not possible): +```html + + +Invalid redirect_uri. Cannot redirect safely.
+ + +``` + +**Rate Limiting**: None at endpoint level (handled by verification service) + +**Authentication**: None initially (domain verification IS the authentication) + +--- + +### POST /authorize/verify-code + +**Purpose**: Verify code during authorization flow. + +**Form Data**: +- `email` (required): Email address from rel="me" +- `code` (required): 6-digit verification code +- `me` (required): User's profile URL +- `client_id` (required): Client application URL +- `redirect_uri` (required): Redirect URI +- `state` (required): State parameter + +**Success Response**: HTML page (consent screen) + +**Error Response**: HTML page (code entry form with error message) + +--- + +### POST /authorize/consent + +**Purpose**: Handle user consent decision. + +**Form Data**: +- `action` (required): "approve" or "deny" +- `me` (required): User's profile URL +- `client_id` (required): Client application URL +- `redirect_uri` (required): Redirect URI +- `state` (required): State parameter + +**Success Response (Approve)** (302 Found): +``` +{redirect_uri}?code={authorization_code}&state={state} +``` + +**Success Response (Deny)** (302 Found): +``` +{redirect_uri}?error=access_denied&error_description=User+denied+authorization&state={state} +``` + +## Data Models + +### Verified Domain (Database Table) + +**Table**: `domains` + +**Schema** (from Phase 1): +```sql +CREATE TABLE domains ( + domain TEXT PRIMARY KEY, + verification_method TEXT NOT NULL, -- 'two_factor' for v1.0.0 + verified BOOLEAN NOT NULL DEFAULT FALSE, + verified_at TIMESTAMP, + last_dns_check TIMESTAMP, + last_email_check TIMESTAMP +); +``` + +**Updated in Phase 2**: Change `verification_method` values from `'email'` / `'txt_record'` to `'two_factor'`. 
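
To make the table's role concrete, here is a minimal sketch of writing and checking a verified domain. It assumes SQLite (suggested by the `INSERT OR REPLACE` statement used by the verification service) and an in-memory database. The function names mirror names used in this document, but the bodies are illustrative assumptions rather than the actual implementation, and `DEFAULT 0` stands in for `DEFAULT FALSE` so the sketch also runs on older SQLite builds.

```python
import sqlite3
from datetime import datetime, timezone

# Same columns as the Phase 1 schema; DEFAULT 0 is an assumption for portability.
SCHEMA = """
CREATE TABLE domains (
    domain TEXT PRIMARY KEY,
    verification_method TEXT NOT NULL,
    verified BOOLEAN NOT NULL DEFAULT 0,
    verified_at TIMESTAMP,
    last_dns_check TIMESTAMP,
    last_email_check TIMESTAMP
);
"""


def store_verified_domain(conn: sqlite3.Connection, domain: str) -> None:
    """Record a domain that passed both verification factors."""
    now = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT OR REPLACE INTO domains "
        "(domain, verification_method, verified, verified_at, last_dns_check) "
        "VALUES (?, ?, ?, ?, ?)",
        (domain, "two_factor", True, now, now),
    )
    conn.commit()


def is_domain_verified(conn: sqlite3.Connection, domain: str) -> bool:
    """Return True only if the domain has a row with verified set."""
    row = conn.execute(
        "SELECT verified FROM domains WHERE domain = ?", (domain,)
    ).fetchone()
    return bool(row and row[0])


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
store_verified_domain(conn, "example.com")
print(is_domain_verified(conn, "example.com"))    # True
print(is_domain_verified(conn, "other.example"))  # False
```

Because `domain` is the primary key, repeating the verification for the same domain overwrites the existing row instead of creating a duplicate.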
+ +**Migration**: `002_update_verification_method.sql`: +```sql +-- Update verification_method values to reflect two-factor requirement +UPDATE domains +SET verification_method = 'two_factor' +WHERE verification_method IN ('email', 'txt_record'); +``` + +**Indexes** (from Phase 1): +```sql +CREATE INDEX idx_domains_domain ON domains(domain); +CREATE INDEX idx_domains_verified ON domains(verified); +``` + +--- + +### Authorization Code (In-Memory) + +**Storage**: Phase 1 CodeStorage with metadata + +**Structure**: +```python +{ + "code": "abc123...", # 43-char base64url (32 bytes) + "me": "https://example.com", + "client_id": "https://client.example.com", + "redirect_uri": "https://client.example.com/callback", + "state": "client-provided-state", + "created_at": datetime, + "expires_at": datetime, # created_at + 10 minutes + "used": False # For Phase 3 token endpoint +} +``` + +**TTL**: 10 minutes (per W3C spec: "shortly after") + +**Storage Location**: Phase 1 CodeStorage service + +--- + +### Verification Code Metadata (In-Memory) + +**Storage**: Additional metadata alongside verification codes + +**Structure**: +```python +{ + "email": "user@example.com", + "domain": "example.com", + "attempts": 0, # Increment on each failed attempt + "created_at": datetime +} +``` + +**Purpose**: Track attempts and associate email with domain for rate limiting. + +**TTL**: Same as verification code (15 minutes) + +## Security Requirements + +### Input Validation + +**Domain Parameter**: +```python +from typing import Optional, Tuple + +def validate_domain(domain: str) -> Tuple[bool, Optional[str], Optional[str]]: + """ + Validate domain parameter. + + Returns: (is_valid, normalized_domain, error_message) + """ + # Normalize first so the checks below see trimmed, lowercase input + domain = domain.strip().lower() + + # Remove protocol if present + if domain.startswith('http://') or domain.startswith('https://'): + domain = domain.split('://', 1)[1] + + # Remove path if present + if '/' in domain: + domain = domain.split('/')[0] + + # Must not be empty (checked before the dot test, which would otherwise mask it) + if not domain: + return False, None, "Domain cannot be empty" + + # Must contain at least one dot + if '.' not in domain: + return False, None, "Domain must contain at least one dot (e.g., example.com)" + + # Must not contain invalid characters + if any(c in domain for c in [' ', '@', ':', '?', '#']): + return False, None, "Domain contains invalid characters" + + # Length check + if len(domain) > 253: + return False, None, "Domain too long (max 253 characters)" + + return True, domain, None +``` + +**Email Parameter**: +```python +import re + +def validate_email(email: str) -> bool: + """ + Validate email format (RFC 5322 simplified). + + Used by rel="me" discovery service. + """ + email_regex = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$' + + if not re.match(email_regex, email): + return False + + if len(email) > 254: # RFC 5321 maximum + return False + + if email.count('@') != 1: + return False + + local, domain = email.split('@') + if '.' not in domain: + return False + + return True +``` + +**URL Parameters** (me, client_id, redirect_uri): +```python +def validate_url(url: str, param_name: str) -> Tuple[bool, Optional[str]]: + """ + Validate URL parameter. 
+ + Returns: (is_valid, error_message) + """ + from urllib.parse import urlparse + + try: + parsed = urlparse(url) + except Exception: + return False, f"{param_name} must be a valid URL" + + # Must have scheme and netloc + if not parsed.scheme or not parsed.netloc: + return False, f"{param_name} must be a complete URL (e.g., https://example.com)" + + # Must be http or https + if parsed.scheme not in ['http', 'https']: + return False, f"{param_name} must use http or https" + + # No fragments for 'me' parameter + if param_name == "me" and parsed.fragment: + return False, "me parameter must not contain fragment" + + # No credentials + if parsed.username or parsed.password: + return False, f"{param_name} must not contain credentials" + + return True, None +``` + +--- + +### HTTPS Enforcement + +**Configuration**: +```python +# In production config +if not DEBUG: + # Enforce HTTPS + app.add_middleware(HTTPSRedirectMiddleware) + + # Reject HTTP redirect_uri (except localhost) + if redirect_uri.startswith('http://'): + parsed = urlparse(redirect_uri) + if parsed.hostname not in ['localhost', '127.0.0.1']: + return error_response("redirect_uri must use HTTPS in production") +``` + +**HTML Fetching**: +- HTTPS only (hardcoded `https://` in URL) +- SSL certificate verification enforced (`verify=True`, no option to disable) +- Reject sites with invalid certificates + +--- + +### HTML Parsing Security + +**BeautifulSoup Configuration**: +```python +# Use html.parser (Python standard library, safe for untrusted HTML) +soup = BeautifulSoup(html_content, 'html.parser') +``` + +**Why html.parser**: +- Part of Python standard library (no external C dependencies) +- Designed for untrusted HTML +- No script execution +- No external resource loading +- Handles malformed HTML gracefully + +**Size Limits**: +- Maximum response size: 5MB (configurable) +- Checked both in Content-Length header and actual content + +**Timeout**: +- HTTP request timeout: 10 seconds (configurable) +- 
Prevents hanging on slow sites + +--- + +### Protection Against Open Redirects + +**redirect_uri Validation**: +```python +def validate_redirect_uri(redirect_uri: str, client_id: str) -> Tuple[bool, Optional[str]]: + """ + Validate redirect_uri against client_id. + + Returns: (is_valid, warning_message) + """ + from urllib.parse import urlparse + + redirect_parsed = urlparse(redirect_uri) + client_parsed = urlparse(client_id) + + # Must be HTTPS (except localhost) + if redirect_parsed.scheme != 'https': + if redirect_parsed.hostname not in ['localhost', '127.0.0.1']: + return False, "redirect_uri must use HTTPS" + + # Must have valid hostname + if not redirect_parsed.hostname: + return False, "redirect_uri must have valid hostname" + + redirect_domain = redirect_parsed.hostname.lower() + client_domain = client_parsed.hostname.lower() + + # Exact match: OK + if redirect_domain == client_domain: + return True, None + + # Subdomain of client: OK + if redirect_domain.endswith('.' + client_domain): + return True, None + + # Different domain: WARNING (display to user, but allow) + warning = ( + f"Warning: Redirect to different domain ({redirect_domain}) " + f"than client ({client_domain}). Ensure you trust this application." 
+ ) + return True, warning +``` + +**Display Warning to User**: +- If redirect_uri domain differs from client_id domain, show warning on consent screen +- User must explicitly approve redirect to different domain +- Prevents phishing via redirect URI manipulation + +--- + +### CSRF Protection + +**State Parameter**: +- Required in authorization request +- Stored with authorization code +- Passed through verification and consent steps +- Returned unchanged in redirect +- Client validates state matches original (client responsibility per OAuth 2.0) + +**Gondulf does NOT validate state** - This is intentional per OAuth 2.0: +- State is opaque to authorization server +- Client generates state, client validates state +- Gondulf only passes it through unchanged + +--- + +### Code Replay Prevention + +**Authorization Code**: +- Single-use enforcement (Phase 3 token endpoint marks as used) +- 10-minute expiration +- Bound to client_id, redirect_uri, me +- Stored in memory (Phase 1 CodeStorage) + +**Verification Code**: +- Single-use: Deleted after successful verification +- 15-minute expiration +- Max 3 attempts before invalidation +- Constant-time comparison (prevent timing attacks) + +## Testing Requirements + +### Unit Tests + +**HTML Fetcher Service** (9 tests): +- ✅ Successful HTTPS fetch returns content +- ✅ SSL verification failure returns None +- ✅ Timeout returns None +- ✅ HTTP error codes (404, 500) return None +- ✅ Redirects followed (up to max) +- ✅ Too many redirects returns None +- ✅ Content-Length exceeds limit returns None +- ✅ Actual content exceeds limit returns None +- ✅ Custom User-Agent sent + +**rel="me" Discovery Service** (12 tests): +- ✅ Discovery from `` tag +- ✅ Discovery from `` tag +- ✅ Multiple rel="me" links: first mailto selected +- ✅ Malformed HTML handled +- ✅ Missing rel="me" returns None +- ✅ Invalid email in link returns None +- ✅ Empty href returns None +- ✅ Non-mailto links ignored +- ✅ mailto with query params strips params +- ✅ 
Email validation: valid formats +- ✅ Email validation: invalid formats +- ✅ Exception during parsing returns None + +**Domain Verification Service** (15 tests): +- ✅ Full flow: DNS → rel="me" → email → code +- ✅ DNS failure blocks flow +- ✅ Site fetch failure blocks flow +- ✅ rel="me" failure blocks flow +- ✅ Email send failure cleans up code +- ✅ Code verification success stores domain +- ✅ Code verification failure decrements attempts +- ✅ Too many attempts invalidates code +- ✅ Invalid code returns error +- ✅ Code expiration handled +- ✅ Rate limiting works +- ✅ Already verified domain check +- ✅ Email masking correct +- ✅ Constant-time comparison used +- ✅ Metadata tracking works + +**Estimated Unit Test Count**: ~36 tests + +--- + +### Integration Tests + +**Verification Endpoints** (10 tests): +- ✅ POST /api/verify/start success case +- ✅ POST /api/verify/start with invalid domain +- ✅ POST /api/verify/start with DNS failure +- ✅ POST /api/verify/start with rel="me" failure +- ✅ POST /api/verify/start with email send failure +- ✅ POST /api/verify/code success case +- ✅ POST /api/verify/code with invalid code +- ✅ POST /api/verify/code with expired code +- ✅ POST /api/verify/code with missing code +- ✅ POST /api/verify/code with too many attempts + +**Authorization Endpoint** (15 tests): +- ✅ GET /authorize with valid params (already verified domain) +- ✅ GET /authorize with valid params (new domain) +- ✅ GET /authorize with invalid response_type +- ✅ GET /authorize with invalid me parameter +- ✅ GET /authorize with invalid client_id +- ✅ GET /authorize with invalid redirect_uri +- ✅ GET /authorize with missing state +- ✅ POST /authorize/verify-code with valid code +- ✅ POST /authorize/verify-code with invalid code +- ✅ POST /authorize/consent with action=approve +- ✅ POST /authorize/consent with action=deny +- ✅ Authorization code stored with metadata +- ✅ Authorization code expires after 10 min +- ✅ State parameter passed through +- ✅ redirect_uri domain 
mismatch shows warning + +**Estimated Integration Test Count**: ~25 tests + +--- + +### End-to-End Tests + +**Complete Flows** (5 tests): +- ✅ Full auth flow: /authorize → verify → consent → redirect with code +- ✅ Full auth flow with cached domain (skip verification) +- ✅ User denies consent → redirect with access_denied +- ✅ DNS verification failure → error page → retry → success +- ✅ Invalid code × 3 → error "too many attempts" + +**Estimated E2E Test Count**: ~5 tests + +--- + +### Security Tests + +**Input Validation** (8 tests): +- ✅ Malformed domain rejected +- ✅ Malformed email rejected (during validation) +- ✅ Malformed URL (me, client_id, redirect_uri) rejected +- ✅ URL with credentials rejected +- ✅ URL with fragment rejected (me parameter) +- ✅ Oversized HTML (>5MB) rejected +- ✅ Invalid email in rel="me" logged and skipped +- ✅ SQL injection attempts in domain parameter (should be parameterized) + +**Authentication Security** (5 tests): +- ✅ Expired code rejected +- ✅ Used code rejected (Phase 3) +- ✅ Invalid code rejected +- ✅ Brute force prevented (max 3 attempts) +- ✅ Constant-time comparison used (verify via timing analysis - difficult to test) + +**TLS/HTTPS** (4 tests): +- ✅ HTTP redirect_uri rejected in production +- ✅ Invalid SSL certificate rejected +- ✅ Site fetch over HTTPS only +- ✅ HTTP allowed for localhost only + +**Open Redirect** (3 tests): +- ✅ redirect_uri domain mismatch shows warning +- ✅ Invalid redirect_uri shows error page (no redirect) +- ✅ redirect_uri without hostname rejected + +**Estimated Security Test Count**: ~20 tests + +--- + +### Coverage Target + +**Phase 2 Overall**: 80%+ coverage (same as Phase 1) + +**Critical Code** (95%+ coverage): +- Domain verification service (orchestration logic) +- rel="me" discovery (email extraction) +- Authorization endpoint (parameter validation) +- Security functions (validation, constant-time comparison) + +**Total Estimated Test Count**: ~86 tests + +## Error Handling + +### DNS 
Verification Failure + +**Error Message**: +``` +DNS Verification Failed + +The DNS TXT record was not found for your domain. + +Please add the following TXT record to your DNS: + Type: TXT + Name: _gondulf.example.com + Value: verified + +DNS changes may take up to 24 hours to propagate. + +[Retry] +``` + +**HTTP Response**: 200 OK (HTML error page) + +**Logging**: WARNING level with domain + +--- + +### rel="me" Discovery Failure + +**Error Message**: +``` +Email Discovery Failed + +No rel="me" email link was found on your homepage. + +Please add the following to https://example.com: + + +This allows us to discover your email address automatically. + +Learn more: https://indieweb.org/rel-me + +[Retry] +``` + +**HTTP Response**: 200 OK (HTML error page) + +**Logging**: WARNING level with domain + +--- + +### Site Unreachable + +**Error Message**: +``` +Site Fetch Failed + +Could not fetch your site at https://example.com + +Please check: +• Site is accessible via HTTPS +• SSL certificate is valid +• No firewall blocking requests + +[Retry] +``` + +**HTTP Response**: 200 OK (HTML error page) + +**Logging**: ERROR level with domain and error details + +--- + +### Email Send Failure + +**Error Message**: +``` +Email Delivery Failed + +Failed to send verification code to u***@example.com + +Please check: +• Email address is correct in your rel="me" link +• Email server is accepting mail +• Check spam/junk folder + +[Retry] +``` + +**HTTP Response**: 200 OK (HTML error page) + +**Logging**: ERROR level with masked email + +--- + +### Invalid Code + +**Error Message**: +``` +Invalid code. 2 attempts remaining. +``` + +**HTTP Response**: 200 OK (code entry form with error) + +**Logging**: WARNING level with masked email + +--- + +### Too Many Attempts + +**Error Message**: +``` +Too Many Attempts + +You have exceeded the maximum number of attempts. + +Please request a new verification code. 
+ +[Request New Code] +``` + +**HTTP Response**: 200 OK (error page with retry link) + +**Logging**: WARNING level with masked email + +--- + +### Rate Limit Exceeded + +**Error Message**: +``` +Rate Limit Exceeded + +Too many verification requests for this domain. + +Please wait 1 hour before requesting another code. +``` + +**HTTP Response**: 200 OK (error page) + +**Logging**: WARNING level with domain + +--- + +### OAuth 2.0 Errors (Authorization Endpoint) + +**Error Redirect Format**: +``` +{redirect_uri}?error={error_code}&error_description={description}&state={state} +``` + +**Error Codes**: +- `invalid_request`: Missing or invalid parameter +- `unauthorized_client`: Client not authorized +- `access_denied`: User denied authorization +- `unsupported_response_type`: response_type not "code" +- `server_error`: Internal server error + +**Example**: +``` +https://client.example.com/callback?error=invalid_request&error_description=Missing+me+parameter&state=abc123 +``` + +**Logging**: WARNING or ERROR level depending on error type + +--- + +### Error Logging Standards + +**Log Levels**: +- **DEBUG**: Normal operations, detailed flow +- **INFO**: Successful operations (code sent, domain verified) +- **WARNING**: Expected errors (invalid code, DNS not found) +- **ERROR**: Unexpected errors (SMTP failure, site unreachable) +- **CRITICAL**: System failures (should not occur in Phase 2) + +**What to Log**: +- ✅ Domain (public information) +- ✅ Email (masked: first character of the local part only, e.g., u***@example.com) +- ✅ Error details (for debugging) +- ✅ Request IDs (for correlation) + +**What NOT to Log**: +- ❌ Full email addresses +- ❌ Verification codes +- ❌ Authorization codes +- ❌ User-Agent (GDPR) +- ❌ IP addresses (GDPR) + +## Dependencies + +### New Python Packages + +**Add to pyproject.toml**: +```toml +[project] +dependencies = [ + # ... 
existing dependencies from Phase 1 + "beautifulsoup4>=4.12.0", # HTML parsing for rel="me" discovery +] +``` + +**Why beautifulsoup4**: +- Robust HTML parsing (handles malformed HTML) +- Safe for untrusted content (no script execution) +- Standard in Python ecosystem +- Pure Python (no C dependencies with html.parser) + +### Phase 1 Dependencies Used + +- `requests` (HTTP fetching - already in pyproject.toml) +- `dnspython` (DNS queries - Phase 1) +- `smtplib` (Email sending - Python stdlib, used by Phase 1) +- `sqlalchemy` (Database - Phase 1) +- `fastapi` (Web framework - Phase 1) +- `pydantic` (Data validation - Phase 1) + +### Configuration Additions + +**Optional new environment variables**: +```bash +# HTML Fetching (optional - has defaults) +GONDULF_HTML_FETCH_TIMEOUT=10 # seconds +GONDULF_HTML_MAX_SIZE=5242880 # bytes (5MB) +GONDULF_HTML_MAX_REDIRECTS=5 + +# Rate Limiting (optional - has defaults) +GONDULF_VERIFICATION_RATE_LIMIT=3 # codes per domain per hour +``` + +**Add to .env.example**: +```bash +# HTML Fetching Configuration (optional) +GONDULF_HTML_FETCH_TIMEOUT=10 +GONDULF_HTML_MAX_SIZE=5242880 +GONDULF_HTML_MAX_REDIRECTS=5 + +# Rate Limiting (optional) +GONDULF_VERIFICATION_RATE_LIMIT=3 +``` + +## Implementation Notes + +### Suggested Implementation Order + +1. **HTML Fetcher Service** (0.5 days) + - Straightforward HTTP fetching + - Few dependencies + - Easy to test in isolation + +2. **rel="me" Discovery Service** (0.5 days) + - Pure parsing logic + - No external dependencies (besides HTML input) + - Easy to test with mock HTML + +3. **Domain Verification Service** (1 day) + - Orchestrates all services + - More complex logic + - Needs all previous services complete + +4. **Database Migration** (0.5 days) + - Simple UPDATE query + - Apply before verification endpoints + +5. **Verification Endpoints** (0.5 days) + - Thin API layer over service + - FastAPI makes this straightforward + +6. 
**Authorization Endpoint** (3-4 days) + - Most complex component + - HTML templates needed + - Multiple sub-endpoints + - Needs comprehensive testing + +7. **Integration Testing** (1 day) + - Test all components together + - End-to-end flow verification + +**Total**: ~7-8 days (matches estimate in phase-1-impact-assessment.md) + +--- + +### Risks and Mitigations + +**Risk 1: HTML Parsing Edge Cases** +- **Mitigation**: BeautifulSoup handles malformed HTML gracefully +- **Testing**: Include malformed HTML in test cases +- **Fallback**: Clear error messages guide users to fix HTML + +**Risk 2: Email Delivery Failures** +- **Mitigation**: Comprehensive SMTP error handling +- **Testing**: Mock SMTP failures in tests +- **Fallback**: Clear troubleshooting instructions in error messages + +**Risk 3: DNS TXT Record Setup Complexity** +- **Mitigation**: Clear setup instructions with examples +- **User Education**: Document common DNS providers +- **Support**: Provide example DNS configurations + +**Risk 4: Authorization Endpoint Complexity** +- **Mitigation**: Break into smaller sub-endpoints (verify-code, consent) +- **Testing**: Comprehensive integration tests +- **Design**: Keep state management simple (use forms, avoid complex sessions) + +**Risk 5: Rate Limiting Implementation** +- **Mitigation**: Start with simple in-memory tracking (Phase 2) +- **Future**: Migrate to Redis for distributed rate limiting (Phase 3+) +- **Placeholder**: Implement rate limit check, return False for now + +--- + +### Performance Considerations + +**HTML Fetching**: +- Timeout: 10 seconds (prevent hanging) +- Size limit: 5MB (prevent memory exhaustion) +- Concurrent requests: Not needed in Phase 2 (one request per auth flow) + +**Database Queries**: +- Index on domains.domain ensures fast lookups +- Simple SELECT queries (no joins in Phase 2) +- Consider adding index on domains.verified if needed + +**In-Memory Storage**: +- Verification codes: ~100 bytes each +- Authorization codes: ~200 
bytes each +- Expected load: tens of users, <100 concurrent verifications +- Memory impact: Negligible (about 30KB at 100 concurrent verifications, i.e., 100 × ~300 bytes) + +**rel="me" Parsing**: +- BeautifulSoup is pure Python (not fastest, but sufficient) +- HTML size limited to 5MB (parse time <1 second) +- No performance issues expected for typical homepages + +--- + +### Future Extensibility + +**Redis Integration** (Phase 3+): +- Replace in-memory CodeStorage with Redis +- Enables distributed deployment (multiple Gondulf instances) +- No changes needed in calling code (CodeStorage interface unchanged) + +**Client Metadata Caching** (Phase 3): +- Cache client_id fetch results +- Reduces HTTP requests during authorization +- Store in database or Redis + +**PKCE Support** (v1.1.0): +- Add code_challenge validation in authorization endpoint +- Add code_verifier validation in token endpoint (Phase 3) +- No breaking changes to v1.0.0 clients + +**Additional Authentication Methods** (v1.2.0+): +- GitHub/GitLab OAuth providers +- WebAuthn support +- All additive (user chooses method) + +## Acceptance Criteria + +Phase 2 is complete when ALL of the following criteria are met: + +### Functionality + +- [ ] HTML fetcher service fetches user homepages successfully +- [ ] rel="me" discovery service discovers email from HTML +- [ ] Domain verification service orchestrates two-factor verification +- [ ] DNS TXT verification required and working +- [ ] Email verification via rel="me" required and working +- [ ] Verification endpoints (/api/verify/start, /api/verify/code) working +- [ ] Authorization endpoint (/authorize) validates all parameters +- [ ] Authorization endpoint checks domain verification status +- [ ] Authorization endpoint shows verification form for unverified domains +- [ ] Authorization endpoint shows consent screen after verification +- [ ] Authorization code generated and stored on approval +- [ ] User can deny consent (redirects with access_denied) +- [ ] State parameter passed through all steps + +### Testing + +- [ ] All 
unit tests passing (estimated ~36 tests) +- [ ] All integration tests passing (estimated ~25 tests) +- [ ] All end-to-end tests passing (estimated ~5 tests) +- [ ] All security tests passing (estimated ~20 tests) +- [ ] Test coverage ≥80% overall +- [ ] Test coverage ≥95% for domain verification service +- [ ] Test coverage ≥95% for authorization endpoint +- [ ] No known bugs or failing tests + +### Security + +- [ ] HTTPS enforcement working (production) +- [ ] SSL certificate validation enforced (HTML fetching) +- [ ] HTML parsing secure (BeautifulSoup with html.parser) +- [ ] Input validation comprehensive (domain, email, URLs) +- [ ] Open redirect protection working (redirect_uri validation) +- [ ] Constant-time code comparison used +- [ ] Rate limiting implemented (basic in-memory) +- [ ] Attempt limiting working (max 3 per code) +- [ ] No PII in logs (email masked, no full addresses) +- [ ] Authorization codes single-use (marked for Phase 3) + +### Error Handling + +- [ ] DNS verification failure shows clear instructions +- [ ] rel="me" discovery failure shows HTML example +- [ ] Site unreachable shows troubleshooting steps +- [ ] Email send failure shows error with retry +- [ ] Invalid code shows attempts remaining +- [ ] Too many attempts invalidates code +- [ ] Rate limit exceeded shows wait time +- [ ] OAuth 2.0 errors formatted correctly +- [ ] All errors logged appropriately + +### Documentation + +- [ ] All new services have docstrings +- [ ] All public methods have type hints +- [ ] API endpoints documented (this design doc) +- [ ] Error messages user-friendly +- [ ] Setup instructions clear (DNS + rel="me") +- [ ] Database migration documented + +### Dependencies + +- [ ] beautifulsoup4 added to pyproject.toml +- [ ] No new system dependencies (all Python) +- [ ] Configuration updated (.env.example) + +### Database + +- [ ] Migration 002 applied successfully +- [ ] domains.verification_method updated to 'two_factor' +- [ ] No schema changes needed 
(existing schema works) + +### Integration + +- [ ] All Phase 1 services integrated successfully +- [ ] DNS service used for TXT verification +- [ ] Email service used for code sending +- [ ] Database service used for storing verified domains +- [ ] In-memory storage used for codes +- [ ] Logging used throughout + +### Performance + +- [ ] HTML fetching completes within 10 seconds +- [ ] rel="me" parsing completes within 1 second +- [ ] Full verification flow completes within 30 seconds +- [ ] Authorization endpoint responds within 2 seconds +- [ ] No memory leaks (codes expire and clean up) + +## Timeline Estimate + +**Phase 2 Implementation**: 7-9 days + +**Breakdown**: +- HTML Fetcher Service: 0.5 days +- rel="me" Discovery Service: 0.5 days +- Domain Verification Service: 1 day +- Database Migration: 0.5 days +- Verification Endpoints: 0.5 days +- Authorization Endpoint: 3-4 days +- Integration Testing: 1 day +- Documentation: 0.5 days (included in parallel) + +**Dependencies**: Phase 1 complete and approved + +**Risk Buffer**: +2 days (for unforeseen issues with HTML parsing or authorization flow complexity) + +## Sign-off + +**Design Status**: Complete and ready for implementation + +**Architect**: Claude (Architect Agent) +**Date**: 2025-11-20 + +**Next Steps**: +1. Developer reviews design document +2. Developer asks clarification questions if needed +3. Architect updates design based on feedback +4. Developer begins implementation following design +5. Developer creates implementation report upon completion +6. 
Architect reviews implementation report + +**Related Documents**: +- `/docs/architecture/overview.md` - System architecture +- `/docs/architecture/indieauth-protocol.md` - IndieAuth protocol implementation +- `/docs/architecture/security.md` - Security architecture +- `/docs/architecture/phase-1-impact-assessment.md` - Phase 2 requirements +- `/docs/decisions/ADR-005-email-based-authentication-v1-0-0.md` - Two-factor verification decision +- `/docs/decisions/ADR-008-rel-me-email-discovery.md` - rel="me" pattern decision +- `/docs/reports/2025-11-20-phase-1-foundation.md` - Phase 1 implementation +- `/docs/roadmap/v1.0.0.md` - Version plan + +--- + +**DESIGN READY: Phase 2 Domain Verification - Please review /docs/designs/phase-2-domain-verification.md** diff --git a/docs/designs/phase-2-implementation-guide.md b/docs/designs/phase-2-implementation-guide.md new file mode 100644 index 0000000..df5746e --- /dev/null +++ b/docs/designs/phase-2-implementation-guide.md @@ -0,0 +1,739 @@ +# Phase 2 Implementation Guide - Specific Details + +**Date**: 2024-11-20 +**Architect**: Claude (Architect Agent) +**Status**: Supplementary to Phase 2 Design +**Purpose**: Provide specific implementation details for Developer clarification questions + +This document supplements `/docs/designs/phase-2-domain-verification.md` with specific implementation decisions from ADR-0004. + +## 1. Rate Limiting Implementation + +### Approach +Implement actual in-memory rate limiting with timestamp tracking. 
+ +### Implementation Specifications + +**Service Structure**: +```python +# src/gondulf/rate_limiter.py +from typing import Dict, List +import time + +class RateLimiter: + """In-memory rate limiter for domain verification attempts.""" + + def __init__(self, max_attempts: int = 3, window_hours: int = 1): + """ + Args: + max_attempts: Maximum attempts per domain in time window (default: 3) + window_hours: Time window in hours (default: 1) + """ + self.max_attempts = max_attempts + self.window_seconds = window_hours * 3600 + self._attempts: Dict[str, List[int]] = {} # domain -> [timestamp1, timestamp2, ...] + + def check_rate_limit(self, domain: str) -> bool: + """ + Check if domain has exceeded rate limit. + + Args: + domain: Domain to check + + Returns: + True if within rate limit, False if exceeded + """ + # Clean old timestamps first + self._clean_old_attempts(domain) + + # Check current count + if domain not in self._attempts: + return True + + return len(self._attempts[domain]) < self.max_attempts + + def record_attempt(self, domain: str) -> None: + """Record a verification attempt for domain.""" + now = int(time.time()) + if domain not in self._attempts: + self._attempts[domain] = [] + self._attempts[domain].append(now) + + def _clean_old_attempts(self, domain: str) -> None: + """Remove timestamps older than window.""" + if domain not in self._attempts: + return + + now = int(time.time()) + cutoff = now - self.window_seconds + self._attempts[domain] = [ts for ts in self._attempts[domain] if ts > cutoff] + + # Remove domain entirely if no recent attempts + if not self._attempts[domain]: + del self._attempts[domain] +``` + +**Usage in Endpoints**: +```python +# In verification endpoint +rate_limiter = get_rate_limiter() +if not rate_limiter.check_rate_limit(domain): + return {"success": False, "error": "rate_limit_exceeded"} + +rate_limiter.record_attempt(domain) +# ... 
proceed with verification +``` + +**Consequences**: +- State lost on restart (acceptable trade-off for simplicity) +- No persistence needed +- Simple dictionary-based implementation + +## 2. Authorization Code Metadata Structure + +### Approach +Use Phase 1's `CodeStorage` service with complete metadata structure from the start. + +### Data Structure Specification + +**Authorization Code Metadata**: +```python +{ + "client_id": "https://client.example.com/", + "redirect_uri": "https://client.example.com/callback", + "state": "client_state_value", + "code_challenge": "base64url_encoded_challenge", + "code_challenge_method": "S256", + "scope": "profile email", + "me": "https://user.example.com/", + "created_at": 1700000000, # epoch integer + "expires_at": 1700000600, # epoch integer (created_at + 600) + "used": False # Include now, consume in Phase 3 +} +``` + +**Storage Implementation**: +```python +# Use Phase 1's CodeStorage +code_storage = get_code_storage() +authorization_code = generate_random_code() +metadata = { + "client_id": client_id, + "redirect_uri": redirect_uri, + "state": state, + "code_challenge": code_challenge, + "code_challenge_method": code_challenge_method, + "scope": scope, + "me": me, + "created_at": int(time.time()), + "expires_at": int(time.time()) + 600, + "used": False +} +code_storage.store(f"authz:{authorization_code}", metadata, ttl=600) +``` + +**Rationale**: +- Epoch integers simpler than datetime objects +- Include `used` field now (Phase 3 will check/update it) +- Reuse existing `CodeStorage` infrastructure +- Key prefix `authz:` distinguishes from verification codes + +## 3. HTML Template Implementation + +### Approach +Use Jinja2 templates with separate template files. 
+ +### Directory Structure +``` +src/gondulf/templates/ +├── base.html # Shared layout +├── verify_email.html # Email verification form +├── verify_totp.html # TOTP verification form (future) +├── authorize.html # Authorization consent page +└── error.html # Generic error page +``` + +### Base Template +```html + + + + + + +A verification code has been sent to {{ masked_email }}
+Please enter the 6-digit code to complete verification:
+ +{% if error %} +{{ error }}
+{% endif %} + + +{% endblock %} +``` + +### FastAPI Integration +```python +from fastapi import FastAPI, Request +from fastapi.templating import Jinja2Templates +from gondulf.utils.validation import mask_email + +app = FastAPI() +templates = Jinja2Templates(directory="src/gondulf/templates") + +@app.get("/verify/email") +async def verify_email_page(request: Request, domain: str): + # discovered_email comes from the rel="me" discovery step for this domain (lookup not shown) + masked = mask_email(discovered_email) + return templates.TemplateResponse("verify_email.html", { + "request": request, + "domain": domain, + "masked_email": masked + }) +``` + +**Dependencies**: +- Add to the `[project]` `dependencies` list in `pyproject.toml`: `"jinja2>=3.1.0,<4"` + +## 4. Database Migration Timing + +### Approach +Apply migration 002 immediately as part of Phase 2 setup. + +### Execution Order +1. Developer runs migration: `alembic upgrade head` +2. Migration 002 adds `two_factor` column with default value `false` +3. All Phase 2 code assumes the column exists +4. New domains are inserted with an explicit `two_factor` value + +### Migration File (if not already created) +```python +# migrations/versions/002_add_two_factor_column.py +"""Add two_factor column to domains table + +Revision ID: 002 +Revises: 001 +Create Date: 2024-11-20 +""" +from alembic import op +import sqlalchemy as sa + +# Revision identifiers, used by Alembic to order migrations +revision = '002' +down_revision = '001' + +def upgrade(): + op.add_column('domains', + sa.Column('two_factor', sa.Boolean(), nullable=False, server_default='false') + ) + +def downgrade(): + op.drop_column('domains', 'two_factor') +``` + +**Rationale**: +- Keep database schema current with code expectations +- No conditional logic needed in Phase 2 code +- Clean separation: migration handles existing data, new code uses new schema + +## 5. Client Validation Helper Functions + +### Approach +Standalone utility functions in a shared module. 
+ +### Module Structure +```python +# src/gondulf/utils/validation.py +"""Client validation and utility functions.""" +from urllib.parse import urlparse +import re + +def mask_email(email: str) -> str: + """ + Mask email for display: user@example.com -> u***@example.com + + Args: + email: Email address to mask + + Returns: + Masked email string + """ + if '@' not in email: + return email + + local, domain = email.split('@', 1) + if len(local) <= 1: + return email + + masked_local = local[0] + '***' + return f"{masked_local}@{domain}" + + +def normalize_client_id(client_id: str) -> str: + """ + Normalize client_id URL to canonical form. + + Rules: + - Ensure https:// scheme + - Remove default port (443) + - Preserve path + + Args: + client_id: Client ID URL + + Returns: + Normalized client_id + """ + parsed = urlparse(client_id) + + # Ensure https + if parsed.scheme != 'https': + raise ValueError("client_id must use https scheme") + + # Remove default HTTPS port + netloc = parsed.netloc + if netloc.endswith(':443'): + netloc = netloc[:-4] + + # Reconstruct + normalized = f"https://{netloc}{parsed.path}" + if parsed.query: + normalized += f"?{parsed.query}" + if parsed.fragment: + normalized += f"#{parsed.fragment}" + + return normalized + + +def validate_redirect_uri(redirect_uri: str, client_id: str) -> bool: + """ + Validate redirect_uri against client_id per IndieAuth spec. 
+ + Rules: + - Must use https scheme (except localhost) + - Must share same origin as client_id OR + - Must be subdomain of client_id domain + + Args: + redirect_uri: Redirect URI to validate + client_id: Client ID for comparison + + Returns: + True if valid, False otherwise + """ + try: + redirect_parsed = urlparse(redirect_uri) + client_parsed = urlparse(client_id) + + # Check scheme (allow http for localhost only) + if redirect_parsed.scheme != 'https': + if redirect_parsed.hostname not in ('localhost', '127.0.0.1'): + return False + + # Same origin check + if (redirect_parsed.scheme == client_parsed.scheme and + redirect_parsed.netloc == client_parsed.netloc): + return True + + # Subdomain check + redirect_host = redirect_parsed.hostname or '' + client_host = client_parsed.hostname or '' + + # Must end with .{client_host} + if redirect_host.endswith(f".{client_host}"): + return True + + return False + + except Exception: + return False +``` + +**Usage**: +```python +from gondulf.utils.validation import mask_email, validate_redirect_uri, normalize_client_id + +# In verification endpoint +masked = mask_email(discovered_email) + +# In authorization endpoint +normalized_client = normalize_client_id(client_id) +if not validate_redirect_uri(redirect_uri, normalized_client): + return error_response("invalid_redirect_uri") +``` + +## 6. Error Response Format Consistency + +### Approach +Use format appropriate to endpoint type. 
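
The selection rule this section describes (JSON for verification endpoints, an HTML page until both `client_id` and `redirect_uri` are trusted, an OAuth redirect afterwards) can be sketched as a pure function. This is only an illustrative sketch: `choose_error_format` and `build_error_redirect` are hypothetical names, not part of the design, and appending with `&` when the redirect URI already carries a query string is an assumption added here.

```python
from urllib.parse import urlencode


def choose_error_format(is_verification_endpoint: bool,
                        client_id_validated: bool,
                        redirect_uri_valid: bool) -> str:
    """Return which error format applies: 'json', 'html', or 'redirect'."""
    if is_verification_endpoint:
        return "json"      # 200 OK with success: false
    if not client_id_validated:
        return "html"      # client is not trusted yet, show an error page
    if not redirect_uri_valid:
        return "html"      # cannot redirect safely
    return "redirect"      # OAuth redirect carrying the error parameters


def build_error_redirect(redirect_uri: str, error: str,
                         description: str, state: str) -> str:
    """Build the OAuth error redirect URL with URL-encoded parameters."""
    params = {"error": error, "error_description": description, "state": state}
    # Assumption: keep any existing query string on redirect_uri intact.
    separator = "&" if "?" in redirect_uri else "?"
    return f"{redirect_uri}{separator}{urlencode(params)}"
```

Keeping the decision in one small function makes it easy to unit test each branch without starting the web application.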
+
+### Format Rules by Endpoint Type
+
+**Verification Endpoints** (`/verify/email`, `/verify/totp`):
+```python
+# Always return 200 OK with JSON
+return JSONResponse(
+    status_code=200,
+    content={"success": False, "error": "invalid_code"}
+)
+```
+
+**Authorization Endpoint - Pre-Client Validation**:
+```python
+# Return HTML error page if client_id not yet validated
+return templates.TemplateResponse("error.html", {
+    "request": request,
+    "error": "Missing required parameter: client_id",
+    "error_code": "invalid_request"
+}, status_code=400)
+```
+
+**Authorization Endpoint - Post-Client Validation**:
+```python
+# Return OAuth redirect with error parameter
+from urllib.parse import urlencode
+error_params = {
+    "error": "invalid_request",
+    "error_description": "Missing code_challenge parameter",
+    "state": request.query_params.get("state", "")
+}
+# Use "&" when redirect_uri already carries a query string
+separator = "&" if "?" in redirect_uri else "?"
+redirect_url = f"{redirect_uri}{separator}{urlencode(error_params)}"
+return RedirectResponse(url=redirect_url, status_code=302)
+```
+
+**Token Endpoint** (Phase 3):
+```python
+# Always return JSON with appropriate status code
+return JSONResponse(
+    status_code=400,
+    content={
+        "error": "invalid_grant",
+        "error_description": "Authorization code has expired"
+    }
+)
+```
+
+### Error Flow Decision Tree
+```
+Is this a verification endpoint?
+  YES -> Return JSON (200 OK) with success:false
+  NO -> Continue
+
+Has client_id been validated yet?
+  NO -> Return HTML error page
+  YES -> Continue
+
+Is redirect_uri valid?
+  NO -> Return HTML error page (can't redirect safely)
+  YES -> Return OAuth redirect with error
+```
+
+## 7. Dependency Injection Pattern
+
+### Approach
+Singleton services instantiated at startup in `dependencies.py`.
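
A minimal sketch of how a cached factory function behaves as a singleton provider. The `RateLimiter` class below is a stand-in written only to make the example runnable (the real one is assumed to live in `gondulf.rate_limiter`). Note that `functools.lru_cache` builds the instance on the first call rather than at import time, so "at startup" holds only if something calls the getter during startup.

```python
from functools import lru_cache


class RateLimiter:
    """Stand-in for the real service; exists only for this sketch."""

    def __init__(self, max_attempts: int, window_hours: int) -> None:
        self.max_attempts = max_attempts
        self.window_hours = window_hours


@lru_cache()
def get_rate_limiter() -> RateLimiter:
    """Return the shared RateLimiter, constructing it on first call."""
    return RateLimiter(max_attempts=3, window_hours=1)


# Repeated calls hand back the very same object.
first = get_rate_limiter()
second = get_rate_limiter()
same_instance = first is second

# cache_clear() drops the cached instance, so the next call builds a new one.
get_rate_limiter.cache_clear()
fresh = get_rate_limiter()
rebuilt = fresh is not first
```

Calling `cache_clear()` between tests is one way to avoid state leaking from one test into the next when the real getters are used instead of dependency overrides.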
+
+### Implementation Structure
+
+**Dependencies Module**:
+```python
+# src/gondulf/dependencies.py
+"""FastAPI dependency injection for services."""
+from functools import lru_cache
+from gondulf.config import get_config
+from gondulf.database import DatabaseService
+from gondulf.code_storage import CodeStorage
+from gondulf.email_service import EmailService
+from gondulf.dns_service import DNSService
+from gondulf.html_fetcher import HTMLFetcherService
+from gondulf.relme_parser import RelMeParser
+from gondulf.verification_service import DomainVerificationService
+from gondulf.rate_limiter import RateLimiter
+
+# Configuration
+@lru_cache()
+def get_config_singleton():
+    """Get singleton configuration instance."""
+    return get_config()
+
+# Phase 1 Services
+@lru_cache()
+def get_database():
+    """Get singleton database service."""
+    config = get_config_singleton()
+    return DatabaseService(config.database_url)
+
+@lru_cache()
+def get_code_storage():
+    """Get singleton code storage service."""
+    return CodeStorage()
+
+@lru_cache()
+def get_email_service():
+    """Get singleton email service."""
+    config = get_config_singleton()
+    return EmailService(
+        smtp_host=config.smtp_host,
+        smtp_port=config.smtp_port,
+        smtp_username=config.smtp_username,
+        smtp_password=config.smtp_password,
+        from_address=config.smtp_from_address
+    )
+
+@lru_cache()
+def get_dns_service():
+    """Get singleton DNS service."""
+    config = get_config_singleton()
+    return DNSService(nameservers=config.dns_nameservers)
+
+# Phase 2 Services
+@lru_cache()
+def get_html_fetcher():
+    """Get singleton HTML fetcher service."""
+    return HTMLFetcherService()
+
+@lru_cache()
+def get_relme_parser():
+    """Get singleton rel=me parser service."""
+    return RelMeParser()
+
+@lru_cache()
+def get_rate_limiter():
+    """Get singleton rate limiter service."""
+    return RateLimiter(max_attempts=3, window_hours=1)
+
+@lru_cache()
+def get_verification_service():
+    """Get singleton domain verification service."""
+    return DomainVerificationService(
+        dns_service=get_dns_service(),
+        email_service=get_email_service(),
+        code_storage=get_code_storage(),
+        html_fetcher=get_html_fetcher(),
+        relme_parser=get_relme_parser()
+    )
+```
+
+**Usage in Endpoints**:
+```python
+from fastapi import Depends
+from gondulf.dependencies import get_verification_service, get_rate_limiter
+from gondulf.rate_limiter import RateLimiter
+from gondulf.verification_service import DomainVerificationService
+
+@app.post("/verify/email")
+async def verify_email(
+    domain: str,
+    code: str,
+    verification_service: DomainVerificationService = Depends(get_verification_service),
+    rate_limiter: RateLimiter = Depends(get_rate_limiter)
+):
+    # Use injected services
+    if not rate_limiter.check_rate_limit(domain):
+        return {"success": False, "error": "rate_limit_exceeded"}
+
+    result = verification_service.verify_email_code(domain, code)
+    return {"success": result}
+```
+
+**Rationale**:
+- `@lru_cache()` ensures single instance per function
+- Services configured once at startup
+- Consistent with Phase 1 pattern
+- Simple to test (can override dependencies in tests)
+
+## 8. Test Organization for Authorization Endpoint
+
+### Approach
+Separate test files per major endpoint with shared fixtures.
+
+### File Structure
+```
+tests/
+├── conftest.py                      # Shared fixtures and configuration
+├── test_verification_endpoints.py   # Email/TOTP verification tests
+└── test_authorization_endpoint.py   # Authorization flow tests
+```
+
+### Shared Fixtures Module
+```python
+# tests/conftest.py
+import pytest
+from fastapi.testclient import TestClient
+from gondulf.main import app
+from gondulf.dependencies import get_database, get_code_storage, get_rate_limiter
+
+@pytest.fixture
+def client():
+    """FastAPI test client."""
+    return TestClient(app)
+
+@pytest.fixture
+def mock_database():
+    """Mock database service for testing."""
+    # Create in-memory test database
+    from gondulf.database import DatabaseService
+    db = DatabaseService("sqlite:///:memory:")
+    db.initialize()
+    return db
+
+@pytest.fixture
+def mock_code_storage():
+    """Mock code storage for testing."""
+    from gondulf.code_storage import CodeStorage
+    return CodeStorage()
+
+@pytest.fixture
+def mock_rate_limiter():
+    """Mock rate limiter with clean state."""
+    from gondulf.rate_limiter import RateLimiter
+    return RateLimiter()
+
+@pytest.fixture
+def verified_domain(mock_database):
+    """Fixture providing a pre-verified domain."""
+    domain = "example.com"
+    mock_database.store_verified_domain(
+        domain=domain,
+        email="user@example.com",
+        two_factor=True
+    )
+    return domain
+
+@pytest.fixture
+def override_dependencies(mock_database, mock_code_storage, mock_rate_limiter):
+    """Override FastAPI dependencies with test mocks."""
+    app.dependency_overrides[get_database] = lambda: mock_database
+    app.dependency_overrides[get_code_storage] = lambda: mock_code_storage
+    app.dependency_overrides[get_rate_limiter] = lambda: mock_rate_limiter
+    yield
+    app.dependency_overrides.clear()
+```
+
+### Verification Endpoints Tests
+```python
+# tests/test_verification_endpoints.py
+import pytest
+
+class TestEmailVerification:
+    """Tests for /verify/email endpoint."""
+
+    def test_email_verification_success(self, client, override_dependencies):
+        """Test successful email verification."""
+        # Test implementation
+        pass
+
+    def test_email_verification_invalid_code(self, client, override_dependencies):
+        """Test email verification with invalid code."""
+        pass
+
+    def test_email_verification_rate_limit(self, client, override_dependencies):
+        """Test rate limiting on email verification."""
+        pass
+
+class TestTOTPVerification:
+    """Tests for /verify/totp endpoint (future)."""
+    pass
+```
+
+### Authorization Endpoint Tests
+```python
+# tests/test_authorization_endpoint.py
+import pytest
+from urllib.parse import parse_qs, urlparse
+
+class TestAuthorizationEndpoint:
+    """Tests for /authorize endpoint."""
+
+    def test_authorize_missing_client_id(self, client, override_dependencies):
+        """Test authorization with missing client_id parameter."""
+        response = client.get("/authorize")
+        assert response.status_code == 400
+        assert "client_id" in response.text
+
+    def test_authorize_invalid_redirect_uri(self, client, override_dependencies):
+        """Test authorization with mismatched redirect_uri."""
+        params = {
+            "client_id": "https://client.example.com/",
+            "redirect_uri": "https://evil.com/callback",
+            "response_type": "code",
+            "state": "test_state"
+        }
+        response = client.get("/authorize", params=params)
+        assert response.status_code == 400
+
+    def test_authorize_success_flow(self, client, override_dependencies, verified_domain):
+        """Test complete successful authorization flow."""
+        # Full flow test with verified domain
+        params = {
+            "client_id": "https://client.example.com/",
+            "redirect_uri": "https://client.example.com/callback",
+            "response_type": "code",
+            "state": "test_state",
+            "code_challenge": "test_challenge",
+            "code_challenge_method": "S256",
+            "me": f"https://{verified_domain}/"
+        }
+        # The httpx-based TestClient takes follow_redirects; the older
+        # requests-based client used allow_redirects instead.
+        response = client.get("/authorize", params=params, follow_redirects=False)
+        assert response.status_code == 302
+
+        # Verify redirect contains authorization code
+        redirect_url = response.headers["location"]
+        parsed = urlparse(redirect_url)
+        query_params = parse_qs(parsed.query)
+        assert "code" in query_params
+        assert query_params["state"][0] == "test_state"
+```
+
+### Test Organization Rules
+1. **One test class per major functionality** (email verification, authorization flow)
+2. **Test complete flows, not internal methods** (black box testing)
+3. **Use shared fixtures** for common setup (verified domains, mock services)
+4. **Test both success and error paths**
+5. **Test security boundaries** (rate limiting, invalid inputs, unauthorized access)
+
+## Summary
+
+These implementation decisions provide the Developer with unambiguous direction for Phase 2 implementation. All decisions prioritize simplicity while maintaining security and specification compliance.
+
+**Key Principles Applied**:
+- Real implementations over stubs (rate limiting, validation)
+- Reuse existing infrastructure (CodeStorage, dependency pattern)
+- Standard tools over custom solutions (Jinja2 templates)
+- Simple data structures (epoch integers, dictionaries)
+- Clear separation of concerns (utility functions, test organization)
+
+**Next Steps for Developer**:
+1. Review this guide alongside Phase 2 design document
+2. Implement in the order specified by Phase 2 design
+3. Follow patterns and structures defined here
+4. Ask clarifying questions if any ambiguity remains before implementation
+
+All architectural decisions are now documented and ready for implementation.
diff --git a/docs/roadmap/backlog.md b/docs/roadmap/backlog.md
index dcbeb98..9386463 100644
--- a/docs/roadmap/backlog.md
+++ b/docs/roadmap/backlog.md
@@ -568,9 +568,86 @@ These features are REQUIRED for the first production-ready release.
 
 Technical debt items are tracked here with a DEBT: prefix. Per project standards, each release must allocate at least 10% of effort to technical debt reduction.
 
-### DEBT: Add Redis for session storage (M)
+### DEBT: TD-001 - FastAPI Lifespan Migration (XS)
+**Created**: 2025-11-20 (Phase 1 review)
+**Priority**: P2
+**Component**: Core Infrastructure
+
+**Issue**: Using deprecated `@app.on_event()` decorators instead of lifespan context manager.
+
+**Impact**:
+- Deprecation warnings in FastAPI 0.109+
+- Will break in future FastAPI version
+- Not following current best practices
+
+**Current Mitigation**: Still works in current FastAPI version.
+
+**Effort to Fix**: < 1 day
+- Replace `@app.on_event("startup")` with lifespan context manager
+- Update database initialization to use lifespan
+- Update tests if needed
+
+**Plan**: Address in v1.1.0 or during FastAPI upgrade.
+
+**References**: FastAPI lifespan documentation
+
+---
+
+### DEBT: TD-002 - Database Migration Rollback Safety (S)
+**Created**: 2025-11-20 (Phase 1 review)
+**Priority**: P2
+**Component**: Database Layer
+
+**Issue**: No migration rollback capability. Migrations are one-way only.
+
+**Impact**:
+- Cannot easily roll back schema changes
+- Requires manual SQL to undo migrations
+- Risk during production deployments
+
+**Current Mitigation**: Simple schema, manual SQL backups acceptable for v1.0.0.
+
+**Effort to Fix**: 1-2 days
+- Integrate Alembic for migration management
+- Create rollback scripts for existing migrations
+- Update deployment documentation
+
+**Plan**: Address before v1.1.0 when schema changes become more frequent.
+
+**References**: Alembic documentation
+
+---
+
+### DEBT: TD-003 - Async Email Support (S)
+**Created**: 2025-11-20 (Phase 1 review)
+**Priority**: P2
+**Component**: Email Service
+
+**Issue**: Synchronous SMTP blocks request thread during email sending.
+
+**Impact**:
+- Email sending delays response to user (1-5 seconds)
+- Thread blocked during SMTP operation
+- Poor UX during slow email delivery
+
+**Current Mitigation**: Acceptable for low-volume v1.0.0. Timeout limits (10s) prevent long blocks.
+
+**Effort to Fix**: 1-2 days
+- Implement background task queue (FastAPI BackgroundTasks or Celery)
+- Make email sending non-blocking
+- Update UX to show "Sending email..." message
+- Add retry logic for failed sends
+
+**Plan**: Address in v1.1.0 when user volume increases or when UX feedback indicates issue.
+
+**Alternative**: Use async SMTP library (aiosmtplib)
+
+---
+
+### DEBT: TD-004 - Add Redis for Session Storage (M)
 **Created**: 2025-11-20 (architectural decision)
 **Priority**: P2
+**Component**: Storage Layer
 
 **Issue**: In-memory storage doesn't survive restarts.
 
@@ -584,22 +661,6 @@ Technical debt items are tracked here with a DEBT: prefix. Per project standards
 
 ---
 
-### DEBT: Implement schema migrations (S)
-**Created**: 2025-11-20 (architectural decision)
-**Priority**: P2
-
-**Issue**: No formal migration system, using raw SQL files.
-
-**Impact**: Schema changes require manual intervention.
-
-**Mitigation (current)**: Simple schema, infrequent changes acceptable for v1.0.0.
-
-**Effort to Fix**: 1-2 days (Alembic integration)
-
-**Plan**: Address before v1.1.0 when schema changes become more frequent.
-
----
-
 ## Backlog Management
 
 ### Adding New Features