docs: add Phase 2 domain verification design and clarifications

Add comprehensive Phase 2 documentation:

- Complete design document for two-factor domain verification
- Implementation guide with code examples
- ADR for implementation decisions (ADR-0004)
- ADR for rel="me" email discovery (ADR-008)
- Phase 1 impact assessment
- All 23 clarification questions answered
- Updated architecture docs (indieauth-protocol, security)
- Updated ADR-005 with rel="me" approach
- Updated backlog with technical debt items

Design ready for Phase 2 implementation.

Generated with Claude Code
https://claude.com/claude-code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 13:05:09 -07:00
parent bebd47955f
commit 6f06aebf40
10 changed files with 5605 additions and 410 deletions
--- a/docs/architecture/indieauth-protocol.md
+++ b/docs/architecture/indieauth-protocol.md
@@ -162,26 +162,34 @@ Accept: text/html
 - Reject non-200 responses
 - Log client_id fetch failures

-#### Authentication Flow (v1.0.0: Email-based)
+#### Authentication Flow (v1.0.0: Two-Factor Domain Verification)

-1. **Domain Ownership Check**
-   - Check if `me` domain has verified TXT record: `_gondulf.example.com` = `verified`
-   - If found and cached, skip email verification
-   - If not found, proceed to email verification
+1. **DNS TXT Record Verification (Required)**
+   - Check if `me` domain has TXT record: `_gondulf.{domain}` = `verified`
+   - Query multiple DNS resolvers (Google 8.8.8.8, Cloudflare 1.1.1.1)
+   - Require consensus from at least 2 resolvers
+   - If not found: Display error with instructions to add TXT record
+   - If found: Proceed to email discovery
+   - Proves: User controls DNS for the domain

-2. **Email Verification**
-   - Display form requesting email address
-   - Validate email is at `me` domain (e.g., `admin@example.com` for `https://example.com`)
+2. **Email Discovery via rel="me" (Required)**
+   - Fetch user's domain homepage (e.g., https://example.com)
+   - Parse HTML for `<link rel="me" href="mailto:user@example.com">` or `<a rel="me" href="mailto:user@example.com">`
+   - Extract email address from first matching mailto: link
+   - If not found: Display error with instructions to add rel="me" link
+   - If found: Proceed to email verification
+   - Proves: User has published email relationship on their site
+   - Reference: https://indieweb.org/rel-me
+
+3. **Email Verification Code (Required)**
   - Generate 6-digit verification code (cryptographically random)
   - Store code in memory with 15-minute TTL
-   - Send code via SMTP
-   - Display code entry form
-
-3. **Code Verification**
+   - Send code to discovered email address via SMTP
+   - Display code entry form showing discovered email (partially masked)
   - User enters 6-digit code
-   - Validate code matches and hasn't expired
+   - Validate code matches and hasn't expired (max 3 attempts)
+   - Proves: User controls the email account
   - Mark domain as verified (store in database)
-   - Proceed to authorization

 4. **User Consent**
   - Display authorization prompt:
@@ -208,6 +216,8 @@ Accept: text/html
   Location: {redirect_uri}?code={code}&state={state}
   ```

+**Security Model**: Two-factor verification requires BOTH DNS control AND email control. An attacker would need to compromise both to authenticate fraudulently.
+
 #### Error Responses

 Return error via redirect when possible:
@@ -404,18 +414,19 @@ Future implementation per RFC 7009.

 ```python
 {
-    "email": "admin@example.com",
+    "email": "admin@example.com",  # Discovered from rel="me", not user-provided
    "code": "123456",  # 6-digit string
    "domain": "example.com",
    "created_at": datetime,
    "expires_at": datetime,  # created_at + 15 minutes
-    "attempts": 0  # Rate limiting
+    "attempts": 0  # Rate limiting (max 3 attempts)
 }
 ```

 **Storage**: Python dict with TTL management
+**Email Source**: Discovered from site's rel="me" link (not user input)
 **Expiration**: 15 minutes
-**Rate Limiting**: Max 3 attempts per email
+**Rate Limiting**: Max 3 attempts per email, max 3 codes per domain per hour
 **Cleanup**: Automatic expiration via TTL
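
The storage rules above (15-minute TTL, at most 3 attempts, single use) can be sketched as a small dict-backed store. This is a minimal illustration, not the Phase 1 `CodeStorage` implementation: the class name `InMemoryCodeStore`, the `PendingCode` record, and the injectable `now` argument are all assumptions made for the example.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PendingCode:
    """One outstanding verification code (hypothetical record type)."""
    email: str
    code: str
    domain: str
    expires_at: float
    attempts: int = 0


class InMemoryCodeStore:
    """Hypothetical dict-backed store with TTL and attempt limiting."""

    def __init__(self, ttl_seconds: int = 900, max_attempts: int = 3) -> None:
        self.ttl_seconds = ttl_seconds
        self.max_attempts = max_attempts
        self._codes: Dict[str, PendingCode] = {}

    def issue(self, email: str, domain: str, now: Optional[float] = None) -> str:
        """Create and remember a fresh 6-digit code for this email."""
        now = time.time() if now is None else now
        code = "".join(secrets.choice("0123456789") for _ in range(6))
        self._codes[email] = PendingCode(email, code, domain, now + self.ttl_seconds)
        return code

    def verify(self, email: str, submitted: str, now: Optional[float] = None) -> bool:
        """Return True only for an unexpired code within the attempt budget."""
        now = time.time() if now is None else now
        entry = self._codes.get(email)
        if entry is None:
            return False
        if now >= entry.expires_at:
            del self._codes[email]  # expired: clean up lazily
            return False
        entry.attempts += 1
        # Compare as bytes so non-ASCII input cannot raise TypeError.
        matches = secrets.compare_digest(submitted.encode(), entry.code.encode())
        if matches or entry.attempts >= self.max_attempts:
            del self._codes[email]  # single use, or attempt budget exhausted
        return matches
```

Passing `now` explicitly keeps expiry testable without sleeping; production code would rely on the default clock.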

 ### Access Token (SQLite)
@@ -448,18 +459,21 @@ CREATE TABLE tokens (
 CREATE TABLE domains (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    domain TEXT NOT NULL UNIQUE,
-    verification_method TEXT NOT NULL,  -- 'txt_record' or 'email'
+    verification_method TEXT NOT NULL,  -- 'two_factor' (DNS + Email)
    verified_at TIMESTAMP NOT NULL,
-    last_checked TIMESTAMP,
-    txt_record_valid BOOLEAN DEFAULT 0,
+    last_dns_check TIMESTAMP,
+    dns_txt_valid BOOLEAN DEFAULT 0,
+    last_email_check TIMESTAMP,

    INDEX idx_domain (domain)
 );
 ```

 **Purpose**: Cache domain ownership verification
-**TXT Record**: Re-verified periodically (daily)
-**Email Verification**: Permanent unless admin deletes
+**Verification Method**: Always 'two_factor' in v1.0.0 (DNS TXT + Email via rel="me")
+**DNS TXT**: Re-verified periodically (daily check)
+**Email**: NOT stored (only verification timestamp recorded)
+**Re-verification**: DNS checked periodically, email re-verified on each login
 **Cleanup**: Optional (admin decision)

 ## Security Considerations
--- a/docs/architecture/phase-1-impact-assessment.md
+++ b/docs/architecture/phase-1-impact-assessment.md
@@ -0,0 +1,809 @@
+# Phase 1 Impact Assessment: Authentication Flow Change
+
+**Date**: 2025-11-20
+**Architect**: Claude (Architect Agent)
+**Related ADRs**: ADR-005 (updated), ADR-008 (new)
+**Related Report**: /docs/reports/2025-11-20-phase-1-foundation.md
+
+## Summary
+
+The authentication design has been updated to require BOTH DNS TXT verification AND email verification via rel="me" discovery. This change impacts Phase 1 implementation and defines new requirements for Phase 2.
+
+## Authentication Flow Change
+
+### Original Design (ADR-005 v1)
+- **Primary**: Email verification (user provides email)
+- **Optional**: DNS TXT verification (fast-path to skip email)
+- **Flow**: DNS check → if not found, request email → send code → verify code
+
+### Updated Design (ADR-005 v2 + ADR-008)
+- **Required Factor 1**: DNS TXT verification (`_gondulf.{domain}` = `verified`)
+- **Required Factor 2**: Email verification via rel="me" discovery
+- **Flow**: DNS check → rel="me" discovery → send code to discovered email → verify code
+
+### Key Differences
+
+| Aspect | Original | Updated |
+|--------|----------|---------|
+| DNS TXT | Optional (fast-path) | Required (first factor) |
+| Email Discovery | User input | rel="me" link parsing |
+| Email Verification | Optional (fallback) | Required (second factor) |
+| Security Model | Single-factor | Two-factor |
+| Attack Resistance | Moderate | High (requires DNS + email control) |
+| Setup Complexity | Lower (email alone suffices) | Higher (both required) |
+
+## Phase 1 Implementation Impact
+
+### What Phase 1 Implemented
+
+Phase 1 successfully implemented:
+- ✅ Configuration management (GONDULF_* environment variables)
+- ✅ Database layer with migrations (SQLite, SQLAlchemy Core)
+- ✅ In-memory code storage (TTL-based expiration)
+- ✅ Email service (SMTP with STARTTLS support)
+- ✅ DNS service (TXT record querying with fallback resolvers)
+- ✅ Structured logging
+- ✅ FastAPI application with health check endpoint
+- ✅ 94.16% test coverage (96 tests passing)
+
+### Does Phase 1 Need Changes?
+
+**Answer: NO. Phase 1 implementation remains valid.**
+
+#### Analysis
+
+**Email Service** (`src/gondulf/email.py`):
+- Current: Generic email sending service
+- Change Impact: **None**
+- Reason: Email service sends codes to any email address. Whether email is user-provided or rel="me"-discovered doesn't affect this service.
+- Status: **No changes needed**
+
+**DNS Service** (`src/gondulf/dns.py`):
+- Current: TXT record verification with fallback resolvers
+- Change Impact: **None**
+- Reason: DNS service already implements TXT record verification as designed. Changing from "optional" to "required" is a business logic change, not a DNS service change.
+- Status: **No changes needed**
+
+**In-Memory Storage** (`src/gondulf/storage.py`):
+- Current: TTL-based code storage
+- Change Impact: **None**
+- Reason: Storage mechanism is independent of how email is discovered or whether DNS is optional/required.
+- Status: **No changes needed**
+
+**Database Schema** (`001_initial_schema.sql`):
+- Current: `domains` table with `domain`, `verification_method`, `verified_at`
+- Change Impact: **Minor update needed in Phase 2**
+- Reason: Schema already supports storing verification method. Will need to update from `'txt_record'` or `'email'` to `'two_factor'` when storing records.
+- Status: **Schema structure OK, values will change in Phase 2**
+
+**Configuration** (`src/gondulf/config.py`):
+- Current: SMTP configuration, DNS configuration, timeouts
+- Change Impact: **None immediately, optional addition in Phase 2**
+- Reason: Current configuration supports both email and DNS. May want to add timeout for HTML fetching in Phase 2.
+- Status: **No changes needed now**
+
+### Phase 1 Status: APPROVED
+
+Phase 1 implementation remains valid and does NOT require any revisions due to the authentication flow change. All Phase 1 components are foundational services that work regardless of how they're orchestrated in the authentication flow.
+
+## Phase 2 Requirements: New Implementation Needs
+
+Phase 2 must now implement the updated authentication flow. Here's what needs to be built:
+
+### 1. HTML Fetching Service (NEW)
+
+**Purpose**: Fetch user's homepage to discover rel="me" links
+
+**Implementation**:
+```python
+# src/gondulf/html_fetcher.py
+
+import logging
+from typing import Optional
+
+import requests
+
+logger = logging.getLogger(__name__)
+
+class HTMLFetcherService:
+    """
+    Fetch user's homepage over HTTPS.
+    """
+    def __init__(self, timeout: int = 10):
+        self.timeout = timeout
+        self.max_redirects = 5
+        self.max_size = 5 * 1024 * 1024  # 5MB
+
+    def fetch_site(self, domain: str) -> Optional[str]:
+        """
+        Fetch site HTML content.
+
+        Args:
+            domain: Domain to fetch (e.g., "example.com")
+
+        Returns:
+            HTML content as string, or None if fetch fails
+        """
+        url = f"https://{domain}"
+
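+        try:
+            # requests.get() takes no max_redirects argument; the limit is
+            # configured on a Session instead.
+            with requests.Session() as session:
+                session.max_redirects = self.max_redirects
+                response = session.get(
+                    url,
+                    timeout=self.timeout,
+                    allow_redirects=True,
+                    verify=True,  # Enforce SSL verification
+                )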
+            response.raise_for_status()
+
+            # Check content size
+            if len(response.content) > self.max_size:
+                raise ValueError(f"Response too large: {len(response.content)} bytes")
+
+            return response.text
+
+        except requests.exceptions.SSLError as e:
+            logger.error(f"SSL verification failed for {domain}: {e}")
+            return None
+        except requests.exceptions.Timeout:
+            logger.error(f"Timeout fetching {domain}")
+            return None
+        except Exception as e:
+            logger.error(f"Failed to fetch {domain}: {e}")
+            return None
+```
+
+**Dependencies**:
+- `requests` library (already in pyproject.toml)
+- Timeout configuration (add to Config if needed)
+
+**Tests Required**:
+- Successful HTTPS fetch
+- SSL verification failure
+- Timeout handling
+- HTTP error codes (404, 500, etc.)
+- Redirect following
+- Size limit enforcement
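
Size limit enforcement is easier to reason about when the body is read in chunks and the read aborts as soon as the cap is crossed, rather than after the full download. The helper below is a sketch under that assumption; `read_capped` and `ResponseTooLarge` are hypothetical names that do not appear in the design.

```python
from typing import Iterable


class ResponseTooLarge(ValueError):
    """Raised when the body exceeds the configured size limit."""


def read_capped(chunks: Iterable[bytes], max_size: int) -> bytes:
    """Join byte chunks, aborting as soon as the running total exceeds max_size."""
    received = bytearray()
    for chunk in chunks:
        received.extend(chunk)
        if len(received) > max_size:
            raise ResponseTooLarge(f"Response exceeded {max_size} bytes")
    return bytes(received)
```

With `requests`, the chunks could come from a response opened with `stream=True` via `response.iter_content(chunk_size=8192)`, so an oversized page is never held in memory in full.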
+
+---
+
+### 2. rel="me" Email Discovery Service (NEW)
+
+**Purpose**: Parse HTML to discover email from rel="me" links
+
+**Implementation**:
+```python
+# src/gondulf/relme.py
+
+import logging
+import re
+from typing import Optional
+
+from bs4 import BeautifulSoup
+
+logger = logging.getLogger(__name__)
+
+class RelMeDiscoveryService:
+    """
+    Discover email addresses from rel="me" links in HTML.
+    """
+
+    def discover_email(self, html_content: str) -> Optional[str]:
+        """
+        Parse HTML and discover email from rel="me" link.
+
+        Args:
+            html_content: HTML content as string
+
+        Returns:
+            Email address or None if not found
+        """
+        try:
+            # Parse HTML (BeautifulSoup handles malformed HTML)
+            soup = BeautifulSoup(html_content, 'html.parser')
+
+            # Find all rel="me" links (<link> and <a> tags)
+            me_links = soup.find_all('link', rel='me') + soup.find_all('a', rel='me')
+
+            # Look for mailto: links
+            for link in me_links:
+                href = link.get('href', '')
+                if href.lower().startswith('mailto:'):
+                    # Drop the scheme and any ?subject=... query string
+                    email = href[len('mailto:'):].split('?', 1)[0].strip()
+
+                    # Validate email format
+                    if self._validate_email_format(email):
+                        logger.info(f"Discovered email via rel='me': {email[:3]}***")
+                        return email
+
+            logger.warning("No rel='me' mailto: link found in HTML")
+            return None
+
+        except Exception as e:
+            logger.error(f"Failed to parse HTML: {e}")
+            return None
+
+    def _validate_email_format(self, email: str) -> bool:
+        """Validate email address format (RFC 5322 simplified)."""
+        email_regex = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
+
+        if not re.match(email_regex, email):
+            return False
+
+        if len(email) > 254:  # RFC 5321 maximum
+            return False
+
+        if email.count('@') != 1:
+            return False
+
+        return True
+```
+
+**Dependencies**:
+- `beautifulsoup4` library (add to pyproject.toml)
+- `html.parser` (Python standard library)
+
+**Tests Required**:
+- Discovery from `<link rel="me">` tags
+- Discovery from `<a rel="me">` tags
+- Multiple rel="me" links (select first mailto)
+- Malformed HTML handling
+- Missing rel="me" links
+- Invalid email format in link
+- Edge cases (empty href, non-mailto links, etc.)
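
For comparison, the same discovery rule (first `rel="me"` link whose `href` is a `mailto:` URI, in document order) can be sketched with only the standard library. This is an alternative illustration, not the BeautifulSoup-based service described above; `RelMeMailtoParser` and `first_rel_me_email` are hypothetical names, and email format validation is left out.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class RelMeMailtoParser(HTMLParser):
    """Collect mailto: addresses from <a>/<link> tags carrying rel="me"."""

    def __init__(self) -> None:
        super().__init__()
        self.emails: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        # rel is a space-separated token list, e.g. rel="me nofollow"
        rel_tokens = (attributes.get("rel") or "").lower().split()
        href = (attributes.get("href") or "").strip()
        if "me" in rel_tokens and href.lower().startswith("mailto:"):
            # Drop the scheme and any ?subject=... query string
            address = href[len("mailto:"):].split("?", 1)[0].strip()
            if address:
                self.emails.append(address)


def first_rel_me_email(html: str) -> Optional[str]:
    """Return the first rel="me" mailto address in document order, if any."""
    parser = RelMeMailtoParser()
    parser.feed(html)
    parser.close()
    return parser.emails[0] if parser.emails else None
```

Because the parser walks tags in source order, "first matching link" here means first in the document, regardless of whether it is a `<link>` or an `<a>` element.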
+
+---
+
+### 3. Domain Verification Service (UPDATED)
+
+**Purpose**: Orchestrate two-factor verification (DNS + Email)
+
+**Implementation**:
+```python
+# src/gondulf/domain_verification.py
+
+import secrets
+from typing import Tuple, Optional
+from .dns import DNSService
+from .html_fetcher import HTMLFetcherService
+from .relme import RelMeDiscoveryService
+from .email import EmailService
+from .storage import CodeStorage
+
+class DomainVerificationService:
+    """
+    Two-factor domain verification service.
+
+    Verifies domain ownership through:
+    1. DNS TXT record verification
+    2. Email verification via rel="me" discovery
+    """
+
+    def __init__(
+        self,
+        dns_service: DNSService,
+        html_fetcher: HTMLFetcherService,
+        relme_discovery: RelMeDiscoveryService,
+        email_service: EmailService,
+        code_storage: CodeStorage
+    ):
+        self.dns = dns_service
+        self.html_fetcher = html_fetcher
+        self.relme = relme_discovery
+        self.email = email_service
+        self.code_storage = code_storage
+
+    def start_verification(self, domain: str) -> Tuple[bool, Optional[str], Optional[str]]:
+        """
+        Start domain verification process.
+
+        Returns: (success, discovered_email, error_message)
+
+        Failures are reported through the returned error_message; no exception is raised.
+        """
+        # Step 1: Verify DNS TXT record
+        dns_verified = self.dns.verify_txt_record(domain, "verified")
+        if not dns_verified:
+            error = f"DNS TXT record not found for {domain}. Please add: _gondulf.{domain} TXT verified"
+            return False, None, error
+
+        # Step 2: Fetch site and discover email
+        html = self.html_fetcher.fetch_site(domain)
+        if html is None:
+            error = f"Could not fetch site at https://{domain}. Please ensure site is accessible via HTTPS."
+            return False, None, error
+
+        # Step 3: Discover email from rel="me"
+        email = self.relme.discover_email(html)
+        if email is None:
+            error = 'No rel="me" mailto: link found. Please add: <link rel="me" href="mailto:you@example.com">'
+            return False, None, error
+
+        # Step 4: Generate and send verification code
+        code = self._generate_code()
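+        # NOTE: verify_code() unpacks (code, domain) from storage, so the stored
+        # value must carry the domain as well as the code.
+        self.code_storage.store(email, code, ttl=900)  # 15 minutes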
+
+        email_sent = self.email.send_verification_email(email, code)
+        if not email_sent:
+            error = f"Failed to send verification email to {email}. Please try again."
+            return False, email, error
+
+        # Success: code sent to discovered email
+        return True, email, None
+
+    def verify_code(self, email: str, submitted_code: str) -> Tuple[bool, str]:
+        """
+        Verify submitted code.
+
+        Returns: (success, domain_or_error_message)
+        """
+        stored_data = self.code_storage.get(email)
+
+        if stored_data is None:
+            return False, "No verification code found. Please restart verification."
+
+        code, domain = stored_data
+
+        # Verify code (constant-time comparison)
+        if not secrets.compare_digest(submitted_code, code):
+            return False, "Invalid code. Please try again."
+
+        # Success: mark code as used
+        self.code_storage.delete(email)
+
+        return True, domain
+
+    def _generate_code(self) -> str:
+        """Generate 6-digit verification code."""
+        return ''.join(secrets.choice('0123456789') for _ in range(6))
+```
+
+**Dependencies**:
+- All Phase 1 services (DNS, Email, Storage)
+- New HTML fetcher service
+- New rel="me" discovery service
+
+**Tests Required**:
+- Full verification flow (DNS → rel="me" → email → code)
+- DNS verification failure
+- Site fetch failure
+- rel="me" discovery failure
+- Email send failure
+- Code verification success/failure
+- Multiple attempts tracking
+- Code expiration
+
+---
+
+### 4. Domain Verification UI Endpoints (NEW)
+
+**Purpose**: HTTP endpoints for user interaction
+
+**Implementation**:
+```python
+# src/gondulf/routers/verification.py
+
+from typing import Optional
+
+from fastapi import APIRouter, HTTPException
+from pydantic import BaseModel
+
+router = APIRouter(prefix="/verify", tags=["verification"])
+
+class VerificationStartRequest(BaseModel):
+    domain: str
+
+class VerificationStartResponse(BaseModel):
+    success: bool
+    email_masked: Optional[str]  # e.g., "u***@example.com"
+    error: Optional[str]
+
+class VerificationCodeRequest(BaseModel):
+    email: str
+    code: str
+
+class VerificationCodeResponse(BaseModel):
+    success: bool
+    domain: Optional[str]
+    error: Optional[str]
+
+@router.post("/start", response_model=VerificationStartResponse)
+async def start_verification(request: VerificationStartRequest):
+    """
+    Start domain verification process.
+
+    Steps:
+    1. Verify DNS TXT record
+    2. Discover email from rel="me"
+    3. Send verification code to email
+    """
+    success, email, error = domain_verification_service.start_verification(request.domain)
+
+    if not success:
+        return VerificationStartResponse(success=False, email_masked=None, error=error)
+
+    # Mask email for display: u***@example.com
+    masked_email = f"{email[0]}***@{email.split('@')[1]}"
+
+    return VerificationStartResponse(
+        success=True,
+        email_masked=masked_email,
+        error=None
+    )
+
+@router.post("/code", response_model=VerificationCodeResponse)
+async def verify_code(request: VerificationCodeRequest):
+    """
+    Verify submitted code.
+
+    Returns domain if code is valid.
+    """
+    success, result = domain_verification_service.verify_code(request.email, request.code)
+
+    if not success:
+        return VerificationCodeResponse(success=False, domain=None, error=result)
+
+    return VerificationCodeResponse(success=True, domain=result, error=None)
+```
+
+**Dependencies**:
+- FastAPI router
+- Pydantic models
+- Domain verification service
+
+**Tests Required**:
+- POST /verify/start success case
+- POST /verify/start with DNS failure
+- POST /verify/start with rel="me" failure
+- POST /verify/start with email send failure
+- POST /verify/code success case
+- POST /verify/code with invalid code
+- POST /verify/code with expired code
+- POST /verify/code with missing code
+
+---
+
+### 5. Authorization Endpoint Integration (UPDATED)
+
+**Changes to Authorization Flow**:
+
+**Before** (original design):
+```
+1. User enters domain (me parameter)
+2. Display form: "Enter your email at {domain}"
+3. User enters email manually
+4. Send code, user enters code
+5. Display consent screen
+```
+
+**After** (updated design):
+```
+1. User enters domain (me parameter)
+2. Server performs two-factor verification:
+   a. Verify DNS TXT record
+   b. Discover email from rel="me"
+   c. Send code to discovered email
+3. Display code entry form (show discovered email masked)
+4. User enters code
+5. Display consent screen
+```
+
+**Implementation Changes**:
+- Call `DomainVerificationService.start_verification()` instead of requesting email from user
+- Update UI to show "Sending code to u***@example.com" instead of email input form
+- Handle new error cases (DNS not found, rel="me" not found, site unreachable)
+
+---
+
+## Phase 2 Feature Breakdown
+
+### New Dependencies to Add
+
+**pyproject.toml additions**:
+```toml
+[project]
+dependencies = [
+    # ... existing dependencies
+    "beautifulsoup4>=4.12.0",  # HTML parsing for rel="me" discovery
+]
+```
+
+### New Source Files
+
+1. `src/gondulf/html_fetcher.py` - HTML fetching service
+2. `src/gondulf/relme.py` - rel="me" email discovery service
+3. `src/gondulf/domain_verification.py` - Two-factor verification orchestration
+4. `src/gondulf/routers/verification.py` - Verification endpoints (if implemented separately from authorization)
+
+### Updated Files
+
+1. `src/gondulf/main.py` - Register new routers, initialize new services
+2. `src/gondulf/config.py` - Optional: add HTML fetch timeout config
+3. Database migration (002_update_verification_method.sql) - Change domain.verification_method values
+
+### New Test Files
+
+1. `tests/unit/test_html_fetcher.py` - HTML fetching tests
+2. `tests/unit/test_relme.py` - rel="me" discovery tests
+3. `tests/unit/test_domain_verification.py` - Verification service tests
+4. `tests/integration/test_verification_endpoints.py` - Verification endpoint tests
+
+### Estimated Effort
+
+**New Components**:
+- HTML Fetcher Service: 0.5 days
+- rel="me" Discovery Service: 0.5 days
+- Domain Verification Service: 1 day
+- Verification Endpoints: 0.5 days
+- Tests (all new components): 1 day
+
+**Total New Work**: ~3.5 days
+
+**Authorization Endpoint** (already planned):
+- Original estimate: 3-5 days
+- Updated estimate: 3-5 days (same - just uses DomainVerificationService)
+
+## Database Schema Updates
+
+### Migration: 002_update_verification_method.sql
+
+```sql
+-- Update verification_method values from single-factor to two-factor
+-- This is a data migration, not schema change
+
+UPDATE domains
+SET verification_method = 'two_factor'
+WHERE verification_method IN ('txt_record', 'email');
+
+-- No schema changes needed - 'verification_method' column already exists
+```
+
+**When to Apply**: Phase 2, before authorization endpoint implementation
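
As a quick check that the data migration behaves as intended, it can be run against an in-memory SQLite database. The sketch below uses only the columns named in this assessment; the function names and sample rows are made up for illustration, and the project's actual migration runner is not shown here.

```python
import sqlite3

MIGRATION_SQL = """
UPDATE domains
SET verification_method = 'two_factor'
WHERE verification_method IN ('txt_record', 'email')
"""


def make_demo_db() -> sqlite3.Connection:
    """Build a throwaway database holding two legacy rows (illustrative data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE domains ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " domain TEXT NOT NULL UNIQUE,"
        " verification_method TEXT NOT NULL,"
        " verified_at TIMESTAMP NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO domains (domain, verification_method, verified_at) VALUES (?, ?, ?)",
        [
            ("a.example", "txt_record", "2025-11-01 00:00:00"),
            ("b.example", "email", "2025-11-02 00:00:00"),
        ],
    )
    conn.commit()
    return conn


def apply_migration(conn: sqlite3.Connection) -> int:
    """Run the data migration and return the number of rows rewritten."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(MIGRATION_SQL)
    return cursor.rowcount
```

Running `apply_migration` a second time rewrites zero rows, so the migration is safe to re-apply.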
+
+## Error Message Updates
+
+### DNS TXT Not Found
+
+```
+DNS Verification Failed
+
+Please add this TXT record to your domain's DNS:
+
+Type: TXT
+Name: _gondulf.example.com
+Value: verified
+
+DNS changes may take up to 24 hours to propagate.
+
+Need help? See: https://docs.gondulf.example.com/setup/dns
+```
+
+### rel="me" Not Found
+
+```
+Email Discovery Failed
+
+Could not find a rel="me" email link on your homepage.
+
+Please add this to your homepage (https://example.com):
+<link rel="me" href="mailto:your-email@example.com">
+
+This declares your email address for IndieAuth verification.
+
+Learn more: https://indieweb.org/rel-me
+```
+
+### Site Unreachable
+
+```
+Site Fetch Failed
+
+Could not fetch your site at https://example.com
+
+Please check:
+- Site is accessible via HTTPS
+- SSL certificate is valid
+- No firewall blocking requests
+
+Try again once your site is accessible.
+```
+
+### Email Send Failure
+
+```
+Email Delivery Failed
+
+Failed to send verification code to u***@example.com
+
+Please check:
+- Email address is correct in your rel="me" link
+- Email server is accepting mail
+- Check spam/junk folder
+
+Try again, or contact support if the issue persists.
+```
+
+## Documentation Updates Needed
+
+### User Documentation (Phase 2)
+
+1. **Setup Guide**: `/docs/user/setup.md`
+   - Step 1: Add DNS TXT record
+   - Step 2: Add rel="me" link to homepage
+   - Step 3: Test verification
+
+2. **Troubleshooting**: `/docs/user/troubleshooting.md`
+   - DNS verification failures
+   - rel="me" discovery issues
+   - Email delivery problems
+
+3. **Examples**: `/docs/user/examples.md`
+   - Example HTML with rel="me" link
+   - Example DNS configuration (various providers)
+
+### Developer Documentation (Phase 2)
+
+1. **API Reference**: `/docs/api/verification.md`
+   - POST /verify/start endpoint
+   - POST /verify/code endpoint
+   - Error codes and responses
+
+2. **Architecture**: `/docs/architecture/domain-verification.md`
+   - Two-factor verification flow diagram
+   - Service interaction diagram
+   - Error handling flowchart
+
+## Security Considerations for Phase 2
+
+### New Attack Surfaces
+
+1. **HTML Parsing**:
+   - Risk: Malicious HTML exploiting parser
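+   - Mitigation: Parse with BeautifulSoup's `html.parser`, which only builds a tree (no script execution, no resource loading); the 5MB size cap bounds parser input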
+   - Test: Fuzzing with malformed HTML
+
+2. **HTTPS Fetching**:
+   - Risk: SSL verification bypass
+   - Mitigation: Enforce `verify=True` in requests
+   - Test: Attempt to fetch site with invalid certificate (must fail)
+
+3. **rel="me" Spoofing**:
+   - Risk: Attacker adds rel="me" to compromised site
+   - Mitigation: Two-factor requirement (also need DNS control)
+   - Test: Verify DNS check happens BEFORE rel="me" discovery
+
+### Security Testing Required
+
+1. **Input Validation**:
+   - Malformed domain names
+   - Oversized HTML responses (>5MB)
+   - Invalid email formats in rel="me" links
+
+2. **TLS Enforcement**:
+   - Verify HTTPS-only fetching
+   - Verify SSL certificate validation
+   - Reject sites with invalid certificates
+
+3. **Rate Limiting** (future):
+   - Prevent bulk rel="me" discovery
+   - Limit verification attempts per domain
+
+## Configuration Updates
+
+### Optional New Config
+
+```python
+# src/gondulf/config.py
+
+class Config:
+    # ... existing config
+
+    # HTML Fetching (optional, has sensible defaults)
+    HTML_FETCH_TIMEOUT: int = 10  # seconds
+    HTML_MAX_SIZE: int = 5 * 1024 * 1024  # 5MB
+    HTML_MAX_REDIRECTS: int = 5
+```
+
+### Environment Variables
+
+```bash
+# .env.example additions (optional)
+
+# HTML Fetching Configuration (optional - has defaults)
+GONDULF_HTML_FETCH_TIMEOUT=10  # Timeout for fetching user's site (seconds)
+GONDULF_HTML_MAX_SIZE=5242880  # Maximum HTML size (bytes, default 5MB)
+GONDULF_HTML_MAX_REDIRECTS=5   # Maximum redirects to follow
+```
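
One way to read these optional variables with their defaults is sketched below. The `HTMLFetchSettings` dataclass and `load_html_fetch_settings` function are hypothetical and are not the existing `Config` class; only the variable names and default values come from the listing above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class HTMLFetchSettings:
    """Hypothetical container for the three optional settings."""
    timeout: int = 10                 # seconds
    max_size: int = 5 * 1024 * 1024   # bytes (5MB)
    max_redirects: int = 5


def load_html_fetch_settings(env: Mapping[str, str] = os.environ) -> HTMLFetchSettings:
    """Read GONDULF_HTML_* variables, falling back to the documented defaults."""
    defaults = HTMLFetchSettings()
    settings = HTMLFetchSettings(
        timeout=int(env.get("GONDULF_HTML_FETCH_TIMEOUT", defaults.timeout)),
        max_size=int(env.get("GONDULF_HTML_MAX_SIZE", defaults.max_size)),
        max_redirects=int(env.get("GONDULF_HTML_MAX_REDIRECTS", defaults.max_redirects)),
    )
    if settings.timeout <= 0 or settings.max_size <= 0 or settings.max_redirects < 0:
        raise ValueError("HTML fetch settings must be positive")
    return settings
```

Taking the environment as a parameter lets tests pass a plain dict instead of patching `os.environ`.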
+
+## Testing Strategy for Phase 2
+
+### Unit Tests
+
+**HTML Fetcher**:
+- Mock successful HTTPS response
+- Mock SSL verification failure
+- Mock timeout
+- Mock HTTP errors (404, 500, etc.)
+- Mock size limit exceeded
+- Mock redirect following
+
+**rel="me" Discovery**:
+- Parse `<link rel="me" href="mailto:...">`
+- Parse `<a rel="me" href="mailto:...">`
+- Handle malformed HTML
+- Handle missing rel="me" links
+- Handle invalid email in link
+- Handle multiple rel="me" links (select first)
+
+**Domain Verification Service**:
+- Full two-factor flow success
+- DNS verification failure
+- Site fetch failure
+- rel="me" discovery failure
+- Email send failure
+- Code verification success/failure
+
+### Integration Tests
+
+**Verification Endpoints**:
+- POST /verify/start with valid domain (mock services)
+- POST /verify/start with DNS failure
+- POST /verify/start with rel="me" failure
+- POST /verify/code with valid code
+- POST /verify/code with invalid code
+
+### End-to-End Tests (Future)
+
+- Complete verification flow with real HTML
+- Authorization flow integration
+- Token issuance after successful verification
+
+## Acceptance Criteria for Phase 2
+
+Phase 2 will be considered complete when:
+
+1. ✅ HTML fetcher service implemented and tested
+2. ✅ rel="me" discovery service implemented and tested
+3. ✅ Domain verification service orchestrates two-factor verification
+4. ✅ Verification endpoints return correct responses for all cases
+5. ✅ Error messages are clear and actionable
+6. ✅ All new tests passing (unit + integration)
+7. ✅ Test coverage remains >80% overall
+8. ✅ Security testing complete (HTML parsing, TLS enforcement)
+9. ✅ Documentation updated (user setup guide, API reference)
+10. ✅ Database migration applied successfully
+
+## Timeline Estimate
+
+**Phase 2 Components**:
+- HTML Fetcher: 0.5 days
+- rel="me" Discovery: 0.5 days
+- Domain Verification Service: 1 day
+- Verification Endpoints: 0.5 days
+- Testing: 1 day
+- Documentation: 0.5 days
+
+**Total New Work**: ~4 days
+
+**Authorization Endpoint** (already planned):
+- Original estimate: 3-5 days
+- Updated estimate: 3-5 days (uses DomainVerificationService)
+
+**Phase 2 Total**: ~7-9 days (vs. original estimate of 3-5 days)
+
+**Impact**: +4 days of work due to authentication flow change
+
+## Recommendation
+
+**Phase 1**: APPROVED as-is. No changes needed.
+
+**Phase 2**: Proceed with implementation of:
+1. HTML fetching service
+2. rel="me" discovery service
+3. Domain verification service (two-factor orchestration)
+4. Verification endpoints
+5. Updated authorization endpoint to use domain verification service
+
+The additional work (HTML fetching + rel="me" discovery) adds ~4 days to Phase 2, bringing total Phase 2 estimate to 7-9 days instead of original 3-5 days.
+
+## Sign-off
+
+**Assessment Status**: Complete
+**Phase 1 Impact**: None - Phase 1 approved as-is
+**Phase 2 Impact**: Additional 4 days of work for new services
+**Risk Level**: Low - All new work is well-scoped and testable
+**Ready to Proceed**: Yes
+
+---
+
+**Assessment completed**: 2025-11-20
+**Architect**: Claude (Architect Agent)
--- a/docs/architecture/security.md
+++ b/docs/architecture/security.md
@@ -58,108 +58,174 @@ Gondulf follows a defense-in-depth security model with these core principles:

 ## Authentication Security

-### Email-Based Verification (v1.0.0)
+### Two-Factor Domain Verification (v1.0.0)

-**Mechanism**: Users prove domain ownership by receiving verification code at email address on that domain.
+**Mechanism**: Users prove domain ownership through TWO independent factors:
+1. **DNS TXT Record**: Proves DNS control (`_gondulf.{domain}` = `verified`)
+2. **Email via rel="me"**: Proves email control (discovered from site's rel="me" link)
+
+**Security Model**: An attacker must compromise BOTH factors to authenticate fraudulently. This is significantly stronger than single-factor verification.

 #### Threat: Email Interception

 **Risk**: Attacker intercepts email containing verification code.

 **Mitigations**:
-1. **Short Code Lifetime**: 15-minute expiration
-2. **Single Use**: Code invalidated after verification
-3. **Rate Limiting**: Max 3 code requests per email per hour
-4. **TLS Email Delivery**: Require STARTTLS for SMTP
-5. **Display Warning**: "Only request code if you initiated this login"
+1. **Two-Factor Requirement**: Email alone is insufficient (DNS also required)
+2. **Short Code Lifetime**: 15-minute expiration
+3. **Single Use**: Code invalidated after verification
+4. **Rate Limiting**: Max 3 code requests per domain per hour
+5. **TLS Email Delivery**: Require STARTTLS for SMTP
+6. **Display Warning**: "Only request code if you initiated this login"

-**Residual Risk**: Acceptable for v1.0.0 given short lifetime and single-use.
+**Residual Risk**: Low. Even with email interception, attacker still needs DNS control.
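
The code-lifetime, single-use, and attempt-limit mitigations above could be combined roughly as follows. This is a minimal sketch, not the project's implementation: `VerificationCodeStore`, `PendingCode`, and the per-domain keying are hypothetical names and assumptions chosen for illustration.

```python
import hmac
import secrets
import time
from dataclasses import dataclass

CODE_TTL_SECONDS = 15 * 60  # 15-minute expiration
MAX_ATTEMPTS = 3            # assumed: attempts are counted per issued code


@dataclass
class PendingCode:
    code: str
    expires_at: float
    attempts: int = 0


class VerificationCodeStore:
    """Hypothetical in-memory store for pending verification codes."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._pending = {}  # domain -> PendingCode

    def issue(self, domain: str) -> str:
        # secrets draws from the OS CSPRNG; zero-pad to six digits.
        code = f"{secrets.randbelow(1_000_000):06d}"
        self._pending[domain] = PendingCode(code, self._clock() + CODE_TTL_SECONDS)
        return code

    def verify(self, domain: str, submitted: str) -> bool:
        entry = self._pending.get(domain)
        if entry is None:
            return False
        if self._clock() >= entry.expires_at:
            del self._pending[domain]  # expired codes are discarded
            return False
        entry.attempts += 1
        # Constant-time comparison avoids leaking matching prefixes via timing.
        ok = hmac.compare_digest(entry.code.encode(), submitted.encode())
        if ok or entry.attempts >= MAX_ATTEMPTS:
            del self._pending[domain]  # single use, or attempts exhausted
        return ok
```

Injecting the clock keeps expiry behaviour testable without sleeping; a real deployment would also need the per-domain request rate limit, which this sketch omits.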

 #### Threat: Code Brute Force

 **Risk**: Attacker guesses 6-digit verification code.

 **Mitigations**:
-1. **Sufficient Entropy**: 1,000,000 possible codes (6 digits)
-2. **Attempt Limiting**: Max 3 attempts per email
-3. **Short Lifetime**: 15-minute window
-4. **Rate Limiting**: Max 10 attempts per IP per hour
-5. **Exponential Backoff**: 5-second delay after each failed attempt
+1. **Two-Factor Requirement**: Code alone is insufficient (DNS also required)
+2. **Sufficient Entropy**: 1,000,000 possible codes (6 digits)
+3. **Attempt Limiting**: Max 3 attempts per email
+4. **Short Lifetime**: 15-minute window
+5. **Rate Limiting**: Max 3 codes per domain per hour
+6. **Single-Use**: Code invalidated after use

 **Math**:
 - 3 attempts ÷ 1,000,000 possible codes = 0.0003% success probability
 - 15-minute window limits attack time
- Rate limiting prevents distributed guessing
+- Even if guessed, attacker still needs DNS control

-**Residual Risk**: Very low, acceptable for v1.0.0.
+**Residual Risk**: Very low. Two-factor requirement makes brute force insufficient.
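
The brute-force arithmetic above can be checked with a short calculation. This is only a sketch: it assumes guesses are distinct and that the attempt limit applies per issued code, and `guess_success_probability` is a hypothetical helper name.

```python
from fractions import Fraction

CODE_SPACE = 10 ** 6  # six decimal digits
MAX_ATTEMPTS = 3      # attempts allowed per code
CODES_PER_HOUR = 3    # code requests allowed per domain per hour


def guess_success_probability(attempts: int = MAX_ATTEMPTS,
                              code_space: int = CODE_SPACE) -> Fraction:
    """Chance that at least one of `attempts` distinct guesses hits a uniformly random code."""
    return Fraction(min(attempts, code_space), code_space)


per_code = guess_success_probability()
# Upper bound for one hour, assuming every code request is followed by a full set of guesses.
per_hour = guess_success_probability(MAX_ATTEMPTS * CODES_PER_HOUR)

print(f"per code: {float(per_code) * 100:.4f}%")
print(f"per hour: {float(per_hour) * 100:.4f}%")
```

Under these assumptions the per-code figure is 3 in 1,000,000, and the hourly bound per domain is 9 in 1,000,000.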
+
+#### Threat: DNS TXT Record Spoofing
+
+**Risk**: Attacker attempts to spoof DNS responses.
+
+**Mitigations**:
+1. **Multiple Resolvers**: Query 2+ independent DNS servers (Google, Cloudflare)
+2. **Consensus Required**: Require agreement from at least 2 resolvers
+3. **DNSSEC Support**: Validate DNSSEC signatures when available (future)
+4. **Timeout Handling**: Fail securely if DNS unavailable
+5. **Logging**: Log all DNS verification attempts
+
+**Residual Risk**: Low. Spoofing multiple independent resolvers is difficult.
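
The consensus rule can be separated from the network lookups so it is easy to test in isolation. The helper below is a hedged sketch: `has_consensus` and its input shape (resolver IP mapped to the TXT strings it returned) are assumptions for illustration, not part of the documented design.

```python
from typing import Iterable, Mapping


def has_consensus(answers_by_resolver: Mapping[str, Iterable[str]],
                  expected: str = "verified",
                  required: int = 2) -> bool:
    """Return True when at least `required` resolvers each returned the expected TXT value."""
    agreeing = sum(
        1
        for values in answers_by_resolver.values()
        # TXT records are often rendered with surrounding quotes; strip them before comparing.
        if any(value.strip().strip('"') == expected for value in values)
    )
    return agreeing >= required
```

Counting resolvers rather than matching records means duplicate TXT records from a single resolver cannot satisfy the threshold on their own.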
+
+#### Threat: rel="me" Link Spoofing
+
+**Risk**: Attacker compromises user's website to add malicious rel="me" link.
+
+**Mitigations**:
+1. **Two-Factor Requirement**: Website compromise alone insufficient (DNS also required)
+2. **HTTPS Required**: Fetch site over TLS (prevents MITM)
+3. **Certificate Validation**: Verify SSL certificate
+4. **Email Domain Matching**: Email should match site domain (warning if not)
+5. **User Education**: Inform users to secure their website
+
+**Residual Risk**: Moderate. If attacker compromises both DNS and website, they can authenticate. This is acceptable as it represents full domain compromise.

 #### Threat: Email Address Enumeration

-**Risk**: Attacker discovers which domains are registered by requesting codes.
+**Risk**: Attacker discovers email addresses by triggering rel="me" discovery.

 **Mitigations**:
-1. **Consistent Response**: Always say "If email exists, code sent"
-2. **No Error Differentiation**: Same message for valid/invalid emails
-3. **Rate Limiting**: Prevent bulk enumeration
+1. **Public Information**: rel="me" links are intentionally public
+2. **User Awareness**: Users know they're publishing email on their site
+3. **Rate Limiting**: Prevent bulk scanning
+4. **Robots.txt**: Users can restrict crawler access if desired

-**Residual Risk**: Minimal, domain names are public anyway (DNS).
+**Residual Risk**: Minimal. Email addresses are intentionally published by users on their own sites.

-### Domain Ownership Verification
+### Domain Ownership Verification (Two-Factor)

-#### TXT Record Validation (Preferred)
+**Mechanism**: v1.0.0 requires BOTH verification methods:

-**Mechanism**: Admin adds DNS TXT record `_gondulf.example.com` = `verified`.
+#### 1. TXT Record Validation (Required)
+
+**Mechanism**: Admin adds DNS TXT record `_gondulf.{domain}` = `verified`.

 **Security Properties**:
- Requires DNS control (stronger than email)
+- Proves DNS control (first factor)
 - Verifiable without user interaction
 - Cacheable for performance
 - Re-verifiable periodically

-**Threat: DNS Spoofing**
-
-**Mitigations**:
-1. **DNSSEC**: Validate DNSSEC signatures if available
-2. **Multiple Resolvers**: Query 2+ DNS servers, require consensus
-3. **Caching**: Cache valid results, re-verify daily
-4. **Logging**: Log all DNS verification attempts
-
 **Implementation**:
 ```python
 import dns.resolver
-import dns.dnssec

 def verify_txt_record(domain: str) -> bool:
    """
    Verify _gondulf.{domain} TXT record exists with value 'verified'.
+    Requires consensus from multiple independent resolvers.
    """
    try:
        # Use Google and Cloudflare DNS for redundancy
        resolvers = ['8.8.8.8', '1.1.1.1']
-        results = []
+        verified_count = 0

        for resolver_ip in resolvers:
            resolver = dns.resolver.Resolver()
            resolver.nameservers = [resolver_ip]
            resolver.timeout = 5
-            resolver.lifetime = 5

            answers = resolver.resolve(f'_gondulf.{domain}', 'TXT')
            for rdata in answers:
                txt_value = rdata.to_text().strip('"')
                if txt_value == 'verified':
-                    results.append(True)
+                    verified_count += 1
                    break

-        # Require consensus from both resolvers
-        return len(results) >= 2
+        # Require consensus from at least 2 resolvers
+        return verified_count >= 2

    except Exception as e:
        logger.warning(f"DNS verification failed for {domain}: {e}")
        return False
 ```

-**Residual Risk**: Low, DNS is foundational internet infrastructure.
+#### 2. Email Verification via rel="me" (Required)
+
+**Mechanism**: Email discovered from site's `<link rel="me" href="mailto:...">`, then verified with code.
+
+**Security Properties**:
+- Proves website control (can modify HTML)
+- Proves email control (receives and enters code)
+- Follows IndieWeb standards (rel="me")
+- Self-documenting (user declares email publicly)
+
+**Implementation**:
+```python
+from typing import Optional
+
+from bs4 import BeautifulSoup
+import requests
+
+def discover_email_from_site(domain: str) -> Optional[str]:
+    """
+    Fetch site and discover email from rel="me" link.
+    """
+    try:
+        response = requests.get(f"https://{domain}", timeout=10, allow_redirects=True)
+        response.raise_for_status()
+
+        soup = BeautifulSoup(response.content, 'html.parser')
+        me_links = soup.find_all('link', rel='me') + soup.find_all('a', rel='me')
+
+        for link in me_links:
+            href = link.get('href', '')
+            if href.startswith('mailto:'):
+                email = href[len('mailto:'):].split('?', 1)[0].strip()  # drop ?subject=... parameters
+                if validate_email_format(email):
+                    return email
+
+        return None
+
+    except Exception as e:
+        logger.error(f"Failed to discover email for {domain}: {e}")
+        return None
+```
+
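+**Combined Residual Risk**: Low. To authenticate fraudulently, an attacker needs the DNS TXT record in place and must also control either the website (to change the rel="me" link) or the published email account (to receive the code).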
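
A `mailto:` href may carry query parameters such as `?subject=...`, percent-encoding, or an upper-case scheme, so plain string replacement can return something that is not an address. One possible way to extract the address with the standard library is sketched below; `email_from_mailto` is a hypothetical helper, and its result would still need to pass `validate_email_format`.

```python
from typing import Optional
from urllib.parse import unquote, urlsplit


def email_from_mailto(href: str) -> Optional[str]:
    """Extract the first address from a mailto: URL, or None if href is not a mailto link."""
    parts = urlsplit(href.strip())
    if parts.scheme.lower() != "mailto":
        return None
    # urlsplit has already separated any ?subject=... query from the path.
    # The path may list several comma-separated addresses; take the first.
    address = unquote(parts.path).split(",")[0].strip()
    return address or None
```

For example, `email_from_mailto("mailto:user@example.com?subject=Hi")` yields `user@example.com`, while a non-mailto href yields `None`.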

 ## Authorization Security

@@ -431,15 +497,80 @@ class AuthorizeRequest(BaseModel):

 **Residual Risk**: Minimal, Pydantic provides strong validation.

+### HTML Parsing Security (rel="me" Discovery)
+
+#### Threat: Malicious HTML Injection
+
+**Risk**: Attacker's site contains malicious HTML to exploit parser.
+
+**Mitigations**:
+1. **Robust Parser**: Use BeautifulSoup (handles malformed HTML safely)
+2. **Link Extraction Only**: Only extract href attributes, no script execution
+3. **Timeout**: 10-second timeout for HTTP requests
+4. **Size Limit**: Limit response size (prevent memory exhaustion)
+5. **HTTPS Required**: Fetch over TLS only
+6. **Certificate Validation**: Verify SSL certificates
+
+**Implementation**:
+```python
+from typing import Optional
+
+from bs4 import BeautifulSoup
+import requests
+
+def discover_email_from_site(domain: str) -> Optional[str]:
+    """
+    Safely discover email from rel="me" link.
+    """
+    try:
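+        # Fetch with safety limits
+        session = requests.Session()
+        session.max_redirects = 5  # redirect cap is a Session setting, not a get() argument
+        response = session.get(
+            f"https://{domain}",
+            timeout=10,
+            allow_redirects=True,
+            stream=True  # Don't load entire response into memory
+        )
+        response.raise_for_status()
+
+        # Limit response size (prevent memory exhaustion)
+        MAX_SIZE = 5 * 1024 * 1024  # 5MB
+        # decode_content=True undoes gzip/deflate so the parser sees HTML, not compressed bytes
+        content = response.raw.read(MAX_SIZE, decode_content=True)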
+
+        # Parse HTML (BeautifulSoup handles malformed HTML safely)
+        soup = BeautifulSoup(content, 'html.parser')
+
+        # Find rel="me" links (no script execution)
+        me_links = soup.find_all('link', rel='me') + soup.find_all('a', rel='me')
+
+        # Extract mailto: links only
+        for link in me_links:
+            href = link.get('href', '')
+            if href.startswith('mailto:'):
+                email = href[len('mailto:'):].split('?', 1)[0].strip()  # drop ?subject=... parameters
+                # Validate email format before returning
+                if validate_email_format(email):
+                    return email
+
+        return None
+
+    except requests.exceptions.SSLError as e:
+        logger.error(f"SSL certificate validation failed for {domain}: {e}")
+        return None
+    except Exception as e:
+        logger.error(f"Failed to discover email for {domain}: {e}")
+        return None
+```
+
+**Residual Risk**: Very low. BeautifulSoup tolerates malformed markup and never executes scripts, and the timeout and size limit bound resource use.
+
 ### Email Validation

 #### Threat: Email Injection Attacks

-**Risk**: Attacker injects SMTP commands via email address field.
+**Risk**: Attacker crafts malicious email address in rel="me" link.

 **Mitigations**:
 1. **Format Validation**: Conservative email regex (simplified subset of RFC 5322)
-2. **Domain Matching**: Require email domain match `me` domain
+2. **No User Input**: Email discovered from site (not user-provided)
 3. **SMTP Library**: Use well-tested library (smtplib)
 4. **Content Encoding**: Encode email content properly
 5. **Rate Limiting**: Prevent abuse
@@ -447,31 +578,27 @@ class AuthorizeRequest(BaseModel):
 **Validation**:
 ```python
 import re
-from email.utils import parseaddr

-def validate_email(email: str, required_domain: str) -> tuple[bool, str]:
+def validate_email_format(email: str) -> bool:
    """
-    Validate email address and domain match.
+    Validate email address format.
    """
-    # Parse email (RFC 5322 compliant)
-    name, addr = parseaddr(email)
-
-    # Basic format check
+    # Basic format check (RFC 5322 simplified)
    email_regex = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
-    if not re.match(email_regex, addr):
-        return False, "Invalid email format"
+    if not re.match(email_regex, email):
+        return False

-    # Extract domain
-    email_domain = addr.split('@')[1].lower()
-    required_domain = required_domain.lower()
+    # Sanity checks
+    if len(email) > 254:  # RFC 5321 maximum
+        return False
+    if email.count('@') != 1:
+        return False

-    # Domain must match
-    if email_domain != required_domain:
-        return False, f"Email must be at {required_domain}"
-
-    return True, ""
+    return True
 ```

+**Note**: Domain matching is NOT enforced in v1.0.0. User may have email at different domain than their identity site (e.g., phil@gmail.com for phil.example.com). This is acceptable as user explicitly publishes the email on their site.
+
 **Residual Risk**: Low, standard validation patterns.

 ## Network Security
@@ -567,21 +694,29 @@ async def add_security_headers(request: Request, call_next):

 **Email Handling**:
 ```python
-# Email stored ONLY during verification (in-memory, 15-min TTL)
+# Email discovered from rel="me" link (not user-provided)
+# Stored ONLY during verification (in-memory, 15-min TTL)
 verification_codes[code_id] = {
-    "email": email,  # ← Exists ONLY here, NEVER in database
+    "email": email,  # ← Discovered from site, exists ONLY here, NEVER in database
    "code": code,
+    "domain": domain,
    "expires_at": datetime.utcnow() + timedelta(minutes=15)
 }

-# After verification: email is deleted, only domain stored
+# After verification: email is deleted, only domain + timestamp stored
 db.execute('''
-    INSERT INTO domains (domain, verification_method, verified_at)
-    VALUES (?, 'email', ?)
-''', (domain, datetime.utcnow()))
-# Note: NO email address in database
+    INSERT INTO domains (domain, verification_method, verified_at, last_email_check)
+    VALUES (?, 'two_factor', ?, ?)
+''', (domain, datetime.utcnow(), datetime.utcnow()))
+# Note: NO email address in database, only verification timestamp
 ```

+**rel="me" Discovery**:
+- Email addresses are public (user publishes on their site)
+- Server fetches email from user's site (not user input)
+- Reduces social engineering risk (can't claim arbitrary email)
+- Follows IndieWeb standards for identity
+
 ### Database Security

 **SQLite Security**:
@@ -829,13 +964,15 @@ security:
 ## Security Roadmap

 ### v1.0.0 (MVP)
- ✅ Email-based authentication
+- ✅ Two-factor domain verification (DNS TXT + Email via rel="me")
+- ✅ rel="me" email discovery (IndieWeb standard)
+- ✅ HTML parsing security (BeautifulSoup)
 - ✅ TLS/HTTPS enforcement
 - ✅ Secure token generation (opaque, hashed)
 - ✅ URL validation (open redirect prevention)
 - ✅ Input validation (Pydantic)
 - ✅ Security headers
- ✅ Minimal data collection
+- ✅ Minimal data collection (no email storage)

 ### v1.1.0
 - PKCE support (code challenge/verifier)