Gondulf/docs/architecture/phase-1-impact-assessment.md
Phil Skentelbery 6f06aebf40 docs: add Phase 2 domain verification design and clarifications
Add comprehensive Phase 2 documentation:
- Complete design document for two-factor domain verification
- Implementation guide with code examples
- ADR for implementation decisions (ADR-0004)
- ADR for rel="me" email discovery (ADR-008)
- Phase 1 impact assessment
- All 23 clarification questions answered
- Updated architecture docs (indieauth-protocol, security)
- Updated ADR-005 with rel="me" approach
- Updated backlog with technical debt items

Design ready for Phase 2 implementation.

Generated with Claude Code https://claude.com/claude-code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 13:05:09 -07:00


Phase 1 Impact Assessment: Authentication Flow Change

Date: 2025-11-20
Architect: Claude (Architect Agent)
Related ADRs: ADR-005 (updated), ADR-008 (new)
Related Report: /docs/reports/2025-11-20-phase-1-foundation.md

Summary

The authentication design has been updated to require BOTH DNS TXT verification AND email verification via rel="me" discovery. This assessment determines how the change affects the Phase 1 implementation and defines the new requirements for Phase 2.

Authentication Flow Change

Original Design (ADR-005 v1)

  • Primary: Email verification (user provides email)
  • Optional: DNS TXT verification (fast-path to skip email)
  • Flow: DNS check → if not found, request email → send code → verify code

Updated Design (ADR-005 v2 + ADR-008)

  • Required Factor 1: DNS TXT verification (_gondulf.{domain} = verified)
  • Required Factor 2: Email verification via rel="me" discovery
  • Flow: DNS check → rel="me" discovery → send code to discovered email → verify code
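The first factor depends on a fixed record name and value. It can be sketched as two small pure functions, assuming the DNS service returns TXT record values as plain strings. The helper names `txt_record_name` and `has_verified_txt` are hypothetical and are not part of the Phase 1 `DNSService` API.

```python
from typing import Iterable


def txt_record_name(domain: str) -> str:
    """Build the name of the TXT record to query, e.g. _gondulf.example.com."""
    # DNS names are case-insensitive and may carry a trailing root dot
    return f"_gondulf.{domain.strip().rstrip('.').lower()}"


def has_verified_txt(txt_values: Iterable[str], expected: str = "verified") -> bool:
    """True if any TXT value equals `expected`, ignoring whitespace and surrounding quotes."""
    return any(value.strip().strip('"') == expected for value in txt_values)
```

A domain can publish several TXT records at the same name, so the check accepts any matching value rather than requiring exactly one record.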

Key Differences

| Aspect | Original | Updated |
|---|---|---|
| DNS TXT | Optional (fast-path) | Required (first factor) |
| Email Discovery | User input | rel="me" link parsing |
| Email Verification | Optional (fallback) | Required (second factor) |
| Security Model | Single-factor | Two-factor |
| Attack Resistance | Moderate | High (requires DNS + email control) |
| Setup Complexity | Lower (email alone suffices) | Higher (both required) |

Phase 1 Implementation Impact

What Phase 1 Implemented

Phase 1 successfully implemented:

  • Configuration management (GONDULF_* environment variables)
  • Database layer with migrations (SQLite, SQLAlchemy Core)
  • In-memory code storage (TTL-based expiration)
  • Email service (SMTP with STARTTLS support)
  • DNS service (TXT record querying with fallback resolvers)
  • Structured logging
  • FastAPI application with health check endpoint
  • 94.16% test coverage (96 tests passing)

Does Phase 1 Need Changes?

Answer: NO. Phase 1 implementation remains valid.

Analysis

Email Service (src/gondulf/email.py):

  • Current: Generic email sending service
  • Change Impact: None
  • Reason: Email service sends codes to any email address. Whether email is user-provided or rel="me"-discovered doesn't affect this service.
  • Status: No changes needed

DNS Service (src/gondulf/dns.py):

  • Current: TXT record verification with fallback resolvers
  • Change Impact: None
  • Reason: DNS service already implements TXT record verification as designed. Changing from "optional" to "required" is a business logic change, not a DNS service change.
  • Status: No changes needed

In-Memory Storage (src/gondulf/storage.py):

  • Current: TTL-based code storage
  • Change Impact: None
  • Reason: Storage mechanism is independent of how email is discovered or whether DNS is optional/required.
  • Status: No changes needed
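The storage service only needs three operations: store with TTL, get, and delete. The sketch below assumes that interface, modeled on the `store`/`get`/`delete` calls in the Domain Verification Service code later in this document. It also assumes an injectable clock for testing. The class name `TTLCodeStore` is hypothetical; this is not the actual Phase 1 `CodeStorage` implementation.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCodeStore:
    """In-memory key/value store whose entries expire after a per-entry TTL (seconds)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._data: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)

    def store(self, key: str, value: Any, ttl: int) -> None:
        self._data[key] = (value, self._clock() + ttl)

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Lazily evict expired entries on read
            del self._data[key]
            return None
        return value

    def delete(self, key: str) -> None:
        self._data.pop(key, None)
```

Because the store is agnostic about the value, keying by email and storing a `(code, domain)` tuple lets `verify_code()` recover which domain a code was issued for.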

Database Schema (001_initial_schema.sql):

  • Current: domains table with domain, verification_method, verified_at
  • Change Impact: Minor update needed in Phase 2
  • Reason: Schema already supports storing the verification method. New records will store 'two_factor' instead of 'txt_record' or 'email', and existing rows will be updated by a Phase 2 data migration.
  • Status: Schema structure OK, values will change in Phase 2

Configuration (src/gondulf/config.py):

  • Current: SMTP configuration, DNS configuration, timeouts
  • Change Impact: None immediately, optional addition in Phase 2
  • Reason: Current configuration supports both email and DNS. May want to add timeout for HTML fetching in Phase 2.
  • Status: No changes needed now

Phase 1 Status: APPROVED

Phase 1 implementation remains valid and does NOT require any revisions due to the authentication flow change. All Phase 1 components are foundational services that work regardless of how they're orchestrated in the authentication flow.

Phase 2 Requirements: New Implementation Needs

Phase 2 must now implement the updated authentication flow. Here's what needs to be built:

1. HTML Fetching Service (NEW)

Purpose: Fetch user's homepage to discover rel="me" links

Implementation:

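# src/gondulf/html_fetcher.py

import logging
from typing import Optional

import requests

logger = logging.getLogger(__name__)


class HTMLFetcherService:
    """
    Fetch user's homepage over HTTPS.
    """
    def __init__(self, timeout: int = 10, max_redirects: int = 5, max_size: int = 5 * 1024 * 1024):
        self.timeout = timeout  # seconds
        self.max_redirects = max_redirects
        self.max_size = max_size  # bytes (default 5MB)

    def fetch_site(self, domain: str) -> Optional[str]:
        """
        Fetch site HTML content.

        Args:
            domain: Domain to fetch (e.g., "example.com")

        Returns:
            HTML content as string, or None if fetch fails
        """
        url = f"https://{domain}"

        try:
            # requests.get() has no max_redirects argument; the limit is a Session attribute
            with requests.Session() as session:
                session.max_redirects = self.max_redirects
                with session.get(
                    url,
                    timeout=self.timeout,
                    allow_redirects=True,
                    verify=True,  # Enforce SSL verification
                    stream=True,  # Read incrementally so the size limit applies before buffering
                ) as response:
                    response.raise_for_status()

                    # Check content size while reading, not after the whole body is in memory
                    content = bytearray()
                    for chunk in response.iter_content(chunk_size=64 * 1024):
                        content.extend(chunk)
                        if len(content) > self.max_size:
                            raise ValueError(f"Response too large: over {self.max_size} bytes")

                    return content.decode(response.encoding or "utf-8", errors="replace")

        except requests.exceptions.SSLError as e:
            logger.error(f"SSL verification failed for {domain}: {e}")
            return None
        except requests.exceptions.Timeout:
            logger.error(f"Timeout fetching {domain}")
            return None
        except requests.exceptions.TooManyRedirects:
            logger.error(f"Too many redirects fetching {domain}")
            return None
        except Exception as e:
            logger.error(f"Failed to fetch {domain}: {e}")
            return None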

Dependencies:

  • requests library (already in pyproject.toml)
  • Timeout configuration (add to Config if needed)

Tests Required:

  • Successful HTTPS fetch
  • SSL verification failure
  • Timeout handling
  • HTTP error codes (404, 500, etc.)
  • Redirect following
  • Size limit enforcement
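Because redirects are followed automatically, a site could redirect the fetch from HTTPS to plain HTTP. One way to enforce the HTTPS-only requirement is to check the final URL after fetching (`response.url` in requests). The helper name `is_https_url` is hypothetical and only sketches that check.

```python
from urllib.parse import urlsplit


def is_https_url(url: str) -> bool:
    """Return True only for absolute https:// URLs that include a host."""
    parts = urlsplit(url)
    return parts.scheme.lower() == "https" and bool(parts.hostname)
```

A fetcher following this approach would return None when `is_https_url(response.url)` is false, and a redirect test can assert exactly that.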

2. rel="me" Email Discovery Service (NEW)

Purpose: Parse HTML to discover email from rel="me" links

Implementation:

# src/gondulf/relme.py

import logging
import re
from typing import Optional

from bs4 import BeautifulSoup

logger = logging.getLogger(__name__)

class RelMeDiscoveryService:
    """
    Discover email addresses from rel="me" links in HTML.
    """

    def discover_email(self, html_content: str) -> Optional[str]:
        """
        Parse HTML and discover email from rel="me" link.

        Args:
            html_content: HTML content as string

        Returns:
            Email address or None if not found
        """
        try:
            # Parse HTML (BeautifulSoup handles malformed HTML)
            soup = BeautifulSoup(html_content, 'html.parser')

            # Find all rel="me" links (<link> and <a> tags) in document order
            me_links = soup.find_all(['link', 'a'], rel='me')

            # Look for mailto: links
            for link in me_links:
                href = link.get('href', '').strip()
                if href.lower().startswith('mailto:'):
                    # Drop the scheme and any query string (e.g. ?subject=...)
                    email = href[len('mailto:'):].split('?', 1)[0].strip()

                    # Validate email format
                    if self._validate_email_format(email):
                        logger.info(f"Discovered email via rel='me': {email[:3]}***")
                        return email

            logger.warning("No rel='me' mailto: link found in HTML")
            return None

        except Exception as e:
            logger.error(f"Failed to parse HTML: {e}")
            return None

    def _validate_email_format(self, email: str) -> bool:
        """Validate email address format (RFC 5322 simplified)."""
        email_regex = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'

        if not re.match(email_regex, email):
            return False

        if len(email) > 254:  # RFC 5321 maximum
            return False

        if email.count('@') != 1:
            return False

        return True

Dependencies:

  • beautifulsoup4 library (add to pyproject.toml)
  • html.parser (Python standard library)

Tests Required:

  • Discovery from <link rel="me"> tags
  • Discovery from <a rel="me"> tags
  • Multiple rel="me" links (select first mailto)
  • Malformed HTML handling
  • Missing rel="me" links
  • Invalid email format in link
  • Edge cases (empty href, non-mailto links, etc.)
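For comparison, the same discovery logic can be sketched with only the standard library's `html.parser.HTMLParser`, which avoids the BeautifulSoup dependency. This is a swapped-in technique, not the planned implementation. The class and function names are hypothetical. The sketch treats `rel` as a space-separated token list, so `rel="me nofollow"` matches, and it drops any `?subject=` query from the mailto URL.

```python
from html.parser import HTMLParser
from typing import List, Optional


class _RelMeMailtoParser(HTMLParser):
    """Collect mailto: addresses from <a> and <link> tags whose rel includes "me"."""

    def __init__(self) -> None:
        super().__init__()
        self.mailtos: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr_map = dict(attrs)
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        href = (attr_map.get("href") or "").strip()
        if "me" in rel_tokens and href.lower().startswith("mailto:"):
            # Strip the scheme and any query string such as ?subject=...
            address = href[len("mailto:"):].split("?", 1)[0].strip()
            if address:
                self.mailtos.append(address)


def discover_mailto(html_content: str) -> Optional[str]:
    """Return the first rel="me" mailto address in document order, or None."""
    parser = _RelMeMailtoParser()
    parser.feed(html_content)
    parser.close()
    return parser.mailtos[0] if parser.mailtos else None
```

The returned address would still need to pass `_validate_email_format()` before a code is sent to it.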

3. Domain Verification Service (UPDATED)

Purpose: Orchestrate two-factor verification (DNS + Email)

Implementation:

# src/gondulf/domain_verification.py

import secrets
from typing import Tuple, Optional
from .dns import DNSService
from .html_fetcher import HTMLFetcherService
from .relme import RelMeDiscoveryService
from .email import EmailService
from .storage import CodeStorage

class DomainVerificationService:
    """
    Two-factor domain verification service.

    Verifies domain ownership through:
    1. DNS TXT record verification
    2. Email verification via rel="me" discovery
    """

    def __init__(
        self,
        dns_service: DNSService,
        html_fetcher: HTMLFetcherService,
        relme_discovery: RelMeDiscoveryService,
        email_service: EmailService,
        code_storage: CodeStorage
    ):
        self.dns = dns_service
        self.html_fetcher = html_fetcher
        self.relme = relme_discovery
        self.email = email_service
        self.code_storage = code_storage

    def start_verification(self, domain: str) -> Tuple[bool, Optional[str], Optional[str]]:
        """
        Start domain verification process.

        Returns: (success, discovered_email, error_message)

        Failures are reported through error_message in the returned tuple, not raised as exceptions.
        """
        # Step 1: Verify DNS TXT record
        dns_verified = self.dns.verify_txt_record(domain, "verified")
        if not dns_verified:
            error = f"DNS TXT record not found for {domain}. Please add: _gondulf.{domain} TXT verified"
            return False, None, error

        # Step 2: Fetch the site's HTML
        html = self.html_fetcher.fetch_site(domain)
        if html is None:
            error = f"Could not fetch site at https://{domain}. Please ensure site is accessible via HTTPS."
            return False, None, error

        # Step 3: Discover email from rel="me"
        email = self.relme.discover_email(html)
        if email is None:
            error = 'No rel="me" mailto: link found. Please add: <link rel="me" href="mailto:you@example.com">'
            return False, None, error

        # Step 4: Generate and send verification code
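        code = self._generate_code()
        # Store (code, domain) so verify_code() can recover the domain; assumes CodeStorage accepts tuple values
        self.code_storage.store(email, (code, domain), ttl=900)  # 15 minutes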

        email_sent = self.email.send_verification_email(email, code)
        if not email_sent:
            error = f"Failed to send verification email to {email}. Please try again."
            return False, email, error

        # Success: code sent to discovered email
        return True, email, None

    def verify_code(self, email: str, submitted_code: str) -> Tuple[bool, str]:
        """
        Verify submitted code.

        Returns: (success, domain_or_error_message)
        """
        stored_data = self.code_storage.get(email)

        if stored_data is None:
            return False, "No verification code found. Please restart verification."

        code, domain = stored_data

        # Verify code (constant-time comparison; bytes avoid a TypeError on non-ASCII input)
        if not secrets.compare_digest(submitted_code.encode(), code.encode()):
            return False, "Invalid code. Please try again."

        # Success: mark code as used
        self.code_storage.delete(email)

        return True, domain

    def _generate_code(self) -> str:
        """Generate 6-digit verification code."""
        return ''.join(secrets.choice('0123456789') for _ in range(6))

Dependencies:

  • All Phase 1 services (DNS, Email, Storage)
  • New HTML fetcher service
  • New rel="me" discovery service

Tests Required:

  • Full verification flow (DNS → rel="me" → email → code)
  • DNS verification failure
  • Site fetch failure
  • rel="me" discovery failure
  • Email send failure
  • Code verification success/failure
  • Multiple attempts tracking
  • Code expiration
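The "Multiple attempts tracking" test implies a limit on wrong guesses, which `verify_code()` above does not yet enforce. A minimal sketch follows, assuming a hypothetical limit of 3 attempts (this assessment does not specify a number). The class and method names are hypothetical, and expiry is left to the TTL storage.

```python
import secrets
from typing import Dict, Tuple

MAX_ATTEMPTS = 3  # hypothetical limit; not specified in this assessment


class AttemptLimitedVerifier:
    """Check submitted codes, invalidating a code after too many wrong guesses."""

    def __init__(self, max_attempts: int = MAX_ATTEMPTS):
        self.max_attempts = max_attempts
        self._pending: Dict[str, Tuple[str, str]] = {}  # email -> (code, domain)
        self._failures: Dict[str, int] = {}

    def issue(self, email: str, domain: str) -> str:
        code = "".join(secrets.choice("0123456789") for _ in range(6))
        self._pending[email] = (code, domain)
        self._failures[email] = 0
        return code

    def verify(self, email: str, submitted: str) -> Tuple[bool, str]:
        entry = self._pending.get(email)
        if entry is None:
            return False, "No verification code found. Please restart verification."
        code, domain = entry
        if secrets.compare_digest(submitted.encode(), code.encode()):
            # Single use: a correct code is removed immediately
            del self._pending[email]
            self._failures.pop(email, None)
            return True, domain
        self._failures[email] = self._failures.get(email, 0) + 1
        if self._failures[email] >= self.max_attempts:
            # Too many wrong guesses: discard the code entirely
            del self._pending[email]
            self._failures.pop(email, None)
            return False, "Too many invalid attempts. Please restart verification."
        return False, "Invalid code. Please try again."
```

Discarding the code after the limit bounds an attacker to `max_attempts` guesses out of 1,000,000 per issued code.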

4. Domain Verification UI Endpoints (NEW)

Purpose: HTTP endpoints for user interaction

Implementation:

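# src/gondulf/routers/verification.py

from typing import Optional

from fastapi import APIRouter
from pydantic import BaseModel

from ..domain_verification import DomainVerificationService

router = APIRouter(prefix="/verify", tags=["verification"])

# Set by main.py at startup, after the services are constructed
domain_verification_service: Optional[DomainVerificationService] = None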

class VerificationStartRequest(BaseModel):
    domain: str

class VerificationStartResponse(BaseModel):
    success: bool
    email_masked: Optional[str]  # e.g., "u***@example.com"
    error: Optional[str]

class VerificationCodeRequest(BaseModel):
    email: str
    code: str

class VerificationCodeResponse(BaseModel):
    success: bool
    domain: Optional[str]
    error: Optional[str]

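# Plain def (not async): FastAPI runs it in a threadpool, so blocking DNS, HTTP and SMTP calls don't stall the event loop
@router.post("/start", response_model=VerificationStartResponse)
def start_verification(request: VerificationStartRequest):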
    """
    Start domain verification process.

    Steps:
    1. Verify DNS TXT record
    2. Discover email from rel="me"
    3. Send verification code to email
    """
    success, email, error = domain_verification_service.start_verification(request.domain)

    if not success:
        return VerificationStartResponse(success=False, email_masked=None, error=error)

    # Mask email for display: u***@example.com
    masked_email = f"{email[0]}***@{email.split('@')[1]}"

    return VerificationStartResponse(
        success=True,
        email_masked=masked_email,
        error=None
    )

# Plain def (not async) so FastAPI runs this synchronous service call in its threadpool
@router.post("/code", response_model=VerificationCodeResponse)
def verify_code(request: VerificationCodeRequest):
    """
    Verify submitted code.

    Returns domain if code is valid.
    """
    success, result = domain_verification_service.verify_code(request.email, request.code)

    if not success:
        return VerificationCodeResponse(success=False, domain=None, error=result)

    return VerificationCodeResponse(success=True, domain=result, error=None)
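The inline masking expression in `start_verification` assumes the address is non-empty and contains `@`. Discovered addresses are already validated, so that holds in the flow above. A small defensive helper keeps the masking rule in one place if it is reused elsewhere, such as in the email-failure message. The name `mask_email` is hypothetical.

```python
def mask_email(email: str) -> str:
    """Mask the local part for display: user@example.com -> u***@example.com."""
    local, sep, domain = email.rpartition("@")
    if not sep or not local:
        # Not a usable address: reveal nothing
        return "***"
    return f"{local[0]}***@{domain}"
```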

Dependencies:

  • FastAPI router
  • Pydantic models
  • Domain verification service

Tests Required:

  • POST /verify/start success case
  • POST /verify/start with DNS failure
  • POST /verify/start with rel="me" failure
  • POST /verify/start with email send failure
  • POST /verify/code success case
  • POST /verify/code with invalid code
  • POST /verify/code with expired code
  • POST /verify/code with missing code

5. Authorization Endpoint Integration (UPDATED)

Changes to Authorization Flow:

Before (original design):

1. User enters domain (me parameter)
2. Display form: "Enter your email at {domain}"
3. User enters email manually
4. Send code, user enters code
5. Display consent screen

After (updated design):

1. User enters domain (me parameter)
2. Server performs two-factor verification:
   a. Verify DNS TXT record
   b. Discover email from rel="me"
   c. Send code to discovered email
3. Display code entry form (show discovered email masked)
4. User enters code
5. Display consent screen

Implementation Changes:

  • Call DomainVerificationService.start_verification() instead of requesting email from user
  • Update UI to show "Sending code to u***@example.com" instead of email input form
  • Handle new error cases (DNS not found, rel="me" not found, site unreachable)

Phase 2 Feature Breakdown

New Dependencies to Add

pyproject.toml additions:

[project]
dependencies = [
    # ... existing dependencies
    "beautifulsoup4>=4.12.0",  # HTML parsing for rel="me" discovery
]

New Source Files

  1. src/gondulf/html_fetcher.py - HTML fetching service
  2. src/gondulf/relme.py - rel="me" email discovery service
  3. src/gondulf/domain_verification.py - Two-factor verification orchestration
  4. src/gondulf/routers/verification.py - Verification endpoints (if implemented separately from authorization)

Updated Files

  1. src/gondulf/main.py - Register new routers, initialize new services
  2. src/gondulf/config.py - Optional: add HTML fetch timeout config
  3. Database migration (002_update_verification_method.sql) - Change domains.verification_method values

New Test Files

  1. tests/unit/test_html_fetcher.py - HTML fetching tests
  2. tests/unit/test_relme.py - rel="me" discovery tests
  3. tests/unit/test_domain_verification.py - Verification service tests
  4. tests/integration/test_verification_endpoints.py - Verification endpoint tests

Estimated Effort

New Components:

  • HTML Fetcher Service: 0.5 days
  • rel="me" Discovery Service: 0.5 days
  • Domain Verification Service: 1 day
  • Verification Endpoints: 0.5 days
  • Tests (all new components): 1 day

Total New Work: ~3.5 days (excluding documentation, which brings the total to ~4 days)

Authorization Endpoint (already planned):

  • Original estimate: 3-5 days
  • Updated estimate: 3-5 days (same - just uses DomainVerificationService)

Database Schema Updates

Migration: 002_update_verification_method.sql

-- Update verification_method values from single-factor to two-factor
-- This is a data migration, not schema change

UPDATE domains
SET verification_method = 'two_factor'
WHERE verification_method IN ('txt_record', 'email');

-- No schema changes needed - 'verification_method' column already exists

When to Apply: Phase 2, before authorization endpoint implementation
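The migration only rewrites values, so its effect can be checked against a throwaway database. The sketch below uses Python's built-in `sqlite3` with an in-memory database and assumes a minimal `domains` table with the three columns named in this assessment. The real 001 schema may define more columns or constraints.

```python
import sqlite3

# Hypothetical minimal schema with the columns described in this assessment
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE domains (domain TEXT PRIMARY KEY, verification_method TEXT, verified_at TIMESTAMP)"
)
conn.executemany(
    "INSERT INTO domains VALUES (?, ?, ?)",
    [
        ("a.example", "txt_record", "2025-11-01 00:00:00"),
        ("b.example", "email", "2025-11-02 00:00:00"),
        ("c.example", "two_factor", "2025-11-03 00:00:00"),
    ],
)

# The 002 migration: a data-only UPDATE, committed as one transaction
with conn:
    conn.execute(
        "UPDATE domains SET verification_method = 'two_factor' "
        "WHERE verification_method IN ('txt_record', 'email')"
    )

methods = {row[0]: row[1] for row in conn.execute("SELECT domain, verification_method FROM domains")}
conn.close()
```

Re-running the UPDATE matches no rows, so the migration is idempotent.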

Error Message Updates

DNS TXT Not Found

DNS Verification Failed

Please add this TXT record to your domain's DNS:

Type: TXT
Name: _gondulf.example.com
Value: verified

DNS changes may take up to 24 hours to propagate.

Need help? See: https://docs.gondulf.example.com/setup/dns

rel="me" Not Found

Email Discovery Failed

Could not find a rel="me" email link on your homepage.

Please add this to your homepage (https://example.com):
<link rel="me" href="mailto:your-email@example.com">

This declares your email address for IndieAuth verification.

Learn more: https://indieweb.org/rel-me

Site Unreachable

Site Fetch Failed

Could not fetch your site at https://example.com

Please check:
- Site is accessible via HTTPS
- SSL certificate is valid
- No firewall blocking requests

Try again once your site is accessible.

Email Send Failure

Email Delivery Failed

Failed to send verification code to u***@example.com

Please check:
- Email address is correct in your rel="me" link
- Email server is accepting mail
- Check spam/junk folder

Try again, or contact support if the issue persists.
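Keeping these messages as templates with a single substitution point helps the API and UI show identical text. The sketch uses the standard library's `string.Template`. The keys and the condensed wording are hypothetical and abbreviate the messages above. Any page that renders them as HTML should still escape the substituted domain, for example through template autoescaping.

```python
from string import Template

# Hypothetical keys; text condensed from the error messages above
ERROR_TEMPLATES = {
    "dns_not_found": Template(
        "DNS Verification Failed\n"
        "Please add this TXT record to your domain's DNS:\n"
        "Type: TXT\nName: _gondulf.$domain\nValue: verified"
    ),
    "relme_not_found": Template(
        "Email Discovery Failed\n"
        "Please add this to your homepage (https://$domain):\n"
        '<link rel="me" href="mailto:your-email@example.com">'
    ),
    "site_unreachable": Template("Site Fetch Failed\nCould not fetch your site at https://$domain"),
    "email_send_failed": Template(
        "Email Delivery Failed\nFailed to send verification code to $masked_email"
    ),
}


def render_error(key: str, **values: str) -> str:
    """Fill in a message template; raises KeyError if a placeholder value is missing."""
    return ERROR_TEMPLATES[key].substitute(**values)
```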

Documentation Updates Needed

User Documentation (Phase 2)

  1. Setup Guide: /docs/user/setup.md

    • Step 1: Add DNS TXT record
    • Step 2: Add rel="me" link to homepage
    • Step 3: Test verification
  2. Troubleshooting: /docs/user/troubleshooting.md

    • DNS verification failures
    • rel="me" discovery issues
    • Email delivery problems
  3. Examples: /docs/user/examples.md

    • Example HTML with rel="me" link
    • Example DNS configuration (various providers)

Developer Documentation (Phase 2)

  1. API Reference: /docs/api/verification.md

    • POST /verify/start endpoint
    • POST /verify/code endpoint
    • Error codes and responses
  2. Architecture: /docs/architecture/domain-verification.md

    • Two-factor verification flow diagram
    • Service interaction diagram
    • Error handling flowchart

Security Considerations for Phase 2

New Attack Surfaces

  1. HTML Parsing:

    • Risk: Malicious HTML exploiting parser
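    • Mitigation: Parse with BeautifulSoup (html.parser backend), which only builds a tree and never executes scripts; the 5MB response limit bounds parsing cost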
    • Test: Fuzzing with malformed HTML
  2. HTTPS Fetching:

    • Risk: SSL verification bypass
    • Mitigation: Enforce verify=True in requests
    • Test: Attempt to fetch site with invalid certificate (must fail)
  3. rel="me" Spoofing:

    • Risk: Attacker adds rel="me" to compromised site
    • Mitigation: Two-factor requirement (also need DNS control)
    • Test: Verify DNS check happens BEFORE rel="me" discovery

Security Testing Required

  1. Input Validation:

    • Malformed domain names
    • Oversized HTML responses (>5MB)
    • Invalid email formats in rel="me" links
  2. TLS Enforcement:

    • Verify HTTPS-only fetching
    • Verify SSL certificate validation
    • Reject sites with invalid certificates
  3. Rate Limiting (future):

    • Prevent bulk rel="me" discovery
    • Limit verification attempts per domain
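For the "Malformed domain names" case, one option is to normalize and reject bad input before any DNS or HTTP request is made. The sketch below assumes the `domain` field must be a bare ASCII hostname with at least two labels and no scheme or path. Internationalized names would need IDNA conversion first. The sketch also rejects all-numeric top labels, so IPv4 literals cannot be used as domains. The function name `normalize_domain` is hypothetical.

```python
import re

# One DNS label: 1-63 chars of letters, digits or hyphens, no leading or trailing hyphen
_LABEL = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)")


def normalize_domain(raw: str) -> str:
    """Return a lowercase hostname, or raise ValueError for malformed input."""
    domain = raw.strip().lower().rstrip(".")
    if not domain or len(domain) > 253:
        raise ValueError("Domain is empty or too long")
    if any(ch in domain for ch in "/:@?#"):
        raise ValueError("Expected a bare domain, not a URL")
    labels = domain.split(".")
    if len(labels) < 2 or not all(_LABEL.fullmatch(label) for label in labels):
        raise ValueError(f"Malformed domain name: {raw!r}")
    if labels[-1].isdigit():
        raise ValueError("IP addresses are not accepted as domains")
    return domain
```

`fullmatch` is used instead of `match` with `$` because `$` also matches before a trailing newline, which would let a label like `"a\n"` through.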

Configuration Updates

Optional New Config

# src/gondulf/config.py

class Config:
    # ... existing config

    # HTML Fetching (optional, has sensible defaults)
    HTML_FETCH_TIMEOUT: int = 10  # seconds
    HTML_MAX_SIZE: int = 5 * 1024 * 1024  # 5MB
    HTML_MAX_REDIRECTS: int = 5

Environment Variables

# .env.example additions (optional)

# HTML Fetching Configuration (optional - has defaults)
# Timeout for fetching user's site (seconds)
GONDULF_HTML_FETCH_TIMEOUT=10
# Maximum HTML size (bytes, default 5MB)
GONDULF_HTML_MAX_SIZE=5242880
# Maximum redirects to follow
GONDULF_HTML_MAX_REDIRECTS=5
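This assessment does not show how Phase 1's config module reads GONDULF_* variables. The sketch below therefore assumes plain environment lookups with the defaults listed above. The dataclass and function names are hypothetical. Redirects may be 0 (follow none), while timeout and size must be positive.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class HTMLFetchConfig:
    timeout: int = 10  # seconds
    max_size: int = 5 * 1024 * 1024  # bytes (5MB)
    max_redirects: int = 5


def load_html_fetch_config(env: Optional[Mapping[str, str]] = None) -> HTMLFetchConfig:
    """Read the optional GONDULF_HTML_* variables, falling back to the defaults above."""
    env = os.environ if env is None else env
    defaults = HTMLFetchConfig()

    def read_int(name: str, default: int, minimum: int) -> int:
        raw = env.get(name)
        if raw is None or raw.strip() == "":
            return default
        value = int(raw)  # raises ValueError on non-numeric input
        if value < minimum:
            raise ValueError(f"{name} must be >= {minimum}")
        return value

    return HTMLFetchConfig(
        timeout=read_int("GONDULF_HTML_FETCH_TIMEOUT", defaults.timeout, 1),
        max_size=read_int("GONDULF_HTML_MAX_SIZE", defaults.max_size, 1),
        max_redirects=read_int("GONDULF_HTML_MAX_REDIRECTS", defaults.max_redirects, 0),
    )
```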

Testing Strategy for Phase 2

Unit Tests

HTML Fetcher:

  • Mock successful HTTPS response
  • Mock SSL verification failure
  • Mock timeout
  • Mock HTTP errors (404, 500, etc.)
  • Mock size limit exceeded
  • Mock redirect following

rel="me" Discovery:

  • Parse <link rel="me" href="mailto:...">
  • Parse <a rel="me" href="mailto:...">
  • Handle malformed HTML
  • Handle missing rel="me" links
  • Handle invalid email in link
  • Handle multiple rel="me" links (select first)

Domain Verification Service:

  • Full two-factor flow success
  • DNS verification failure
  • Site fetch failure
  • rel="me" discovery failure
  • Email send failure
  • Code verification success/failure

Integration Tests

Verification Endpoints:

  • POST /verify/start with valid domain (mock services)
  • POST /verify/start with DNS failure
  • POST /verify/start with rel="me" failure
  • POST /verify/code with valid code
  • POST /verify/code with invalid code

End-to-End Tests (Future)

  • Complete verification flow with real HTML
  • Authorization flow integration
  • Token issuance after successful verification

Acceptance Criteria for Phase 2

Phase 2 will be considered complete when:

  1. HTML fetcher service implemented and tested
  2. rel="me" discovery service implemented and tested
  3. Domain verification service orchestrates two-factor verification
  4. Verification endpoints return correct responses for all cases
  5. Error messages are clear and actionable
  6. All new tests passing (unit + integration)
  7. Test coverage remains >80% overall
  8. Security testing complete (HTML parsing, TLS enforcement)
  9. Documentation updated (user setup guide, API reference)
  10. Database migration applied successfully

Timeline Estimate

Phase 2 Components:

  • HTML Fetcher: 0.5 days
  • rel="me" Discovery: 0.5 days
  • Domain Verification Service: 1 day
  • Verification Endpoints: 0.5 days
  • Testing: 1 day
  • Documentation: 0.5 days

Total New Work: ~4 days

Authorization Endpoint (already planned):

  • Original estimate: 3-5 days
  • Updated estimate: 3-5 days (uses DomainVerificationService)

Phase 2 Total: ~7-9 days (vs. original estimate of 3-5 days)

Impact: +4 days of work due to authentication flow change

Recommendation

Phase 1: APPROVED as-is. No changes needed.

Phase 2: Proceed with implementation of:

  1. HTML fetching service
  2. rel="me" discovery service
  3. Domain verification service (two-factor orchestration)
  4. Verification endpoints
  5. Updated authorization endpoint to use domain verification service

The additional work (HTML fetching + rel="me" discovery) adds ~4 days to Phase 2, bringing total Phase 2 estimate to 7-9 days instead of original 3-5 days.

Sign-off

Assessment Status: Complete
Phase 1 Impact: None - Phase 1 approved as-is
Phase 2 Impact: Additional 4 days of work for new services
Risk Level: Low - All new work is well-scoped and testable
Ready to Proceed: Yes


Assessment completed: 2025-11-20
Architect: Claude (Architect Agent)