Phase 2 Design: Domain Verification & Authorization Endpoint
Date: 2025-11-20
Architect: Claude (Architect Agent)
Status: Ready for Implementation
Design Version: 1.0
Overview
What Phase 2 Builds
Phase 2 implements the complete two-factor domain verification flow and the IndieAuth authorization endpoint, building on Phase 1's foundational services.
Core Functionality:
- HTML fetching service to retrieve user's homepage
- rel="me" email discovery service to parse HTML for email links
- Domain verification service to orchestrate two-factor verification (DNS TXT + Email)
- HTTP endpoints for verification flow
- Authorization endpoint to start IndieAuth authentication flow
Connection to IndieAuth Protocol: Phase 2 implements steps 1-7 of the IndieAuth authorization flow (see /docs/architecture/indieauth-protocol.md lines 165-174), completing the domain verification and authorization code generation.
Connection to Phase 1: Phase 2 uses all Phase 1 services:
- Configuration (SMTP, DNS, database settings)
- Database (to store verified domains)
- In-memory storage (for authorization codes)
- Email service (to send verification codes)
- DNS service (to verify TXT records)
- Logging (structured logging throughout)
Authentication Security Model
Per ADR-005 and ADR-008, Phase 2 implements two-factor domain verification:
Factor 1: DNS TXT Record (proves DNS control)
- Required: _gondulf.{domain} TXT record = verified
- Verified via Phase 1 DNS service
- Consensus from multiple resolvers
Factor 2: Email Verification via rel="me" (proves email control)
- Discover email from <link rel="me" href="mailto:..."> on user's site
- Send 6-digit code to discovered email
- User enters code to complete verification
Combined Security: Attacker must compromise BOTH DNS and email to authenticate fraudulently.
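Factor 1 mentions consensus from multiple resolvers. The Phase 1 DNS service is not shown in this document, so the following is only a sketch of one possible consensus rule, in which every queried resolver must return the expected value. The function name txt_consensus and the all-must-agree policy are assumptions for illustration, not the documented Phase 1 behavior.

```python
from typing import Iterable, Sequence


def txt_consensus(
    answers_per_resolver: Sequence[Iterable[str]],
    expected: str = "verified",
) -> bool:
    """Return True only if every resolver saw the expected TXT value.

    Hypothetical helper: each inner iterable holds the TXT strings that one
    resolver returned for _gondulf.{domain}.
    """
    if not answers_per_resolver:
        # No resolver answered, so there is nothing to agree on.
        return False
    return all(
        # TXT values are sometimes returned with surrounding quotes.
        any(value.strip().strip('"') == expected for value in answers)
        for answers in answers_per_resolver
    )
```

A majority rule would be a reasonable alternative if a single slow or stale resolver should not block verification.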
Components
1. HTML Fetching Service
File: src/gondulf/html_fetcher.py
Purpose: Fetch user's homepage over HTTPS to discover rel="me" links.
Public Interface:
from typing import Optional
import requests
class HTMLFetcherService:
"""
Fetch user's homepage over HTTPS with security safeguards.
"""
def __init__(
self,
timeout: int = 10,
max_redirects: int = 5,
max_size: int = 5 * 1024 * 1024 # 5MB
):
"""
Initialize HTML fetcher service.
Args:
timeout: HTTP request timeout in seconds (default: 10)
max_redirects: Maximum redirects to follow (default: 5)
max_size: Maximum response size in bytes (default: 5MB)
"""
self.timeout = timeout
self.max_redirects = max_redirects
self.max_size = max_size
def fetch_site(self, domain: str) -> Optional[str]:
"""
Fetch site HTML content over HTTPS.
Args:
domain: Domain to fetch (e.g., "example.com")
Returns:
HTML content as string, or None if fetch fails
Raises:
No exceptions raised - all errors logged and None returned
"""
Implementation Details:
def fetch_site(self, domain: str) -> Optional[str]:
    """Fetch site HTML content over HTTPS."""
    url = f"https://{domain}"
    try:
        # requests.get() accepts no max_redirects argument; the limit is a Session attribute
        with requests.Session() as session:
            session.max_redirects = self.max_redirects
            # stream=True defers the body download so the size limit is enforced while reading
            with session.get(
                url,
                timeout=self.timeout,
                allow_redirects=True,
                verify=True,  # SECURITY: Enforce SSL certificate verification
                stream=True,
                headers={
                    'User-Agent': 'Gondulf/1.0.0 IndieAuth (+https://github.com/yourusername/gondulf)'
                }
            ) as response:
                response.raise_for_status()
                # SECURITY: Check declared size to prevent memory exhaustion
                content_length = int(response.headers.get('Content-Length', 0))
                if content_length > self.max_size:
                    logger.warning(f"Response too large for {domain}: {content_length} bytes")
                    return None
                # Check actual size while reading (Content-Length may be absent or wrong)
                body = bytearray()
                for chunk in response.iter_content(chunk_size=8192):
                    body.extend(chunk)
                    if len(body) > self.max_size:
                        logger.warning(f"Response content too large for {domain}: over {self.max_size} bytes")
                        return None
                logger.info(f"Successfully fetched {domain}: {len(body)} bytes")
                # Decode with the charset requests detected, falling back to UTF-8
                return bytes(body).decode(response.encoding or 'utf-8', errors='replace')
    except requests.exceptions.SSLError as e:
        logger.error(f"SSL verification failed for {domain}: {e}")
        return None
    except requests.exceptions.Timeout:
        logger.error(f"Timeout fetching {domain} after {self.timeout}s")
        return None
    except requests.exceptions.TooManyRedirects:
        logger.error(f"Too many redirects for {domain}")
        return None
    except requests.exceptions.HTTPError as e:
        logger.error(f"HTTP error fetching {domain}: {e}")
        return None
    except Exception as e:
        logger.error(f"Unexpected error fetching {domain}: {e}")
        return None
Dependencies:
- requests library (already in pyproject.toml)
- Python standard library: typing
- Phase 1 logging configuration
Error Handling:
- SSL verification failure: Log error, return None (security: reject invalid certificates)
- Timeout: Log error, return None (timeout configurable via __init__)
- HTTP errors (404, 500, etc.): Log error with status code, return None
- Size limit exceeded: Log warning, return None (prevent DoS)
- Too many redirects: Log error, return None (prevent redirect loops)
- Generic exceptions: Log error, return None (fail-safe)
Security Considerations:
- HTTPS only for the initial request (scheme hardcoded in URL); redirect targets are not restricted to HTTPS by this code
- SSL certificate verification enforced (verify=True, cannot be disabled)
- Response size limit (5MB default, configurable)
- Timeout to prevent hanging (10s default, configurable)
- Redirect limit (5 max, configurable)
- User-Agent header identifies Gondulf for server logs
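Because Content-Length may be absent or inaccurate, the response size limit is most reliable when enforced while the body is being read. The sketch below shows that idea in isolation; read_limited is a hypothetical helper, not part of the service interface described above.

```python
from typing import Iterable, Optional


def read_limited(chunks: Iterable[bytes], max_size: int) -> Optional[bytes]:
    """Join chunks, giving up as soon as the total exceeds max_size."""
    body = bytearray()
    for chunk in chunks:
        body.extend(chunk)
        if len(body) > max_size:
            # Stop reading early instead of buffering an oversized body.
            return None
    return bytes(body)
```

With requests, the chunks could come from response.iter_content(chunk_size=8192) on a request made with stream=True, so an oversized page is abandoned after at most one chunk beyond the limit.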
Testing Requirements:
- ✅ Successful HTTPS fetch returns HTML content
- ✅ SSL verification failure returns None
- ✅ Timeout returns None
- ✅ HTTP error codes (404, 500) return None
- ✅ Redirects followed (up to max_redirects)
- ✅ Too many redirects returns None
- ✅ Content-Length exceeds max_size returns None
- ✅ Actual content exceeds max_size returns None
- ✅ Custom User-Agent sent in request
2. rel="me" Email Discovery Service
File: src/gondulf/relme.py
Purpose: Parse HTML to discover email addresses from rel="me" links following IndieWeb standards.
Public Interface:
from typing import Optional
from bs4 import BeautifulSoup
import re
class RelMeDiscoveryService:
"""
Discover email addresses from rel="me" links in HTML.
Follows IndieWeb rel="me" standard: https://indieweb.org/rel-me
"""
def discover_email(self, html_content: str) -> Optional[str]:
"""
Parse HTML and discover email from rel="me" link.
Args:
html_content: HTML content as string
Returns:
Email address or None if not found
Raises:
No exceptions raised - all errors logged and None returned
"""
def validate_email_format(self, email: str) -> bool:
"""
Validate email address format (RFC 5322 simplified).
Args:
email: Email address to validate
Returns:
True if valid format, False otherwise
"""
Implementation Details:
def discover_email(self, html_content: str) -> Optional[str]:
    """Parse HTML and discover email from rel='me' link."""
    try:
        # Parse HTML (BeautifulSoup handles malformed HTML gracefully)
        soup = BeautifulSoup(html_content, 'html.parser')
        # Find all rel="me" links - both <link> and <a> tags
        # rel is a multi-valued attribute, so rel="me nofollow" is matched too
        me_links = soup.find_all('link', rel='me') + soup.find_all('a', rel='me')
        # Look for mailto: links
        for link in me_links:
            href = link.get('href', '').strip()
            # URL schemes are case-insensitive, so accept "MAILTO:" as well
            if href.lower().startswith('mailto:'):
                # Extract email by dropping only the leading scheme
                email = href[len('mailto:'):].strip()
                # Remove query parameters if present (e.g., mailto:user@example.com?subject=Hello)
                if '?' in email:
                    email = email.split('?')[0]
                # Validate email format
                if self.validate_email_format(email):
                    logger.info(f"Discovered email via rel='me': {email[:3]}***@{email.split('@')[1]}")
                    return email
                else:
                    logger.warning(f"Found rel='me' mailto link with invalid email format: {email}")
        logger.warning("No rel='me' mailto: link found in HTML")
        return None
    except Exception as e:
        logger.error(f"Failed to parse HTML for rel='me' links: {e}")
        return None
def validate_email_format(self, email: str) -> bool:
"""Validate email address format (RFC 5322 simplified)."""
# Basic format validation
email_regex = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
if not re.match(email_regex, email):
return False
# Length check (RFC 5321 maximum)
if len(email) > 254:
return False
# Must have exactly one @
if email.count('@') != 1:
return False
# Domain must have at least one dot
local, domain = email.split('@')
if '.' not in domain:
return False
return True
Dependencies:
- beautifulsoup4>=4.12.0 (NEW - add to pyproject.toml)
- html.parser (Python standard library, used by BeautifulSoup)
- re (Python standard library)
- Phase 1 logging configuration
Error Handling:
- Malformed HTML: BeautifulSoup handles gracefully, continues parsing
- Missing rel="me" links: Log warning, return None
- Invalid email format in link: Log warning, skip link, continue searching
- Multiple rel="me" mailto links: Return first valid one
- Empty href attribute: Skip link, continue searching
- Exception during parsing: Log error, return None
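The href handling listed above (skip empty or non-mailto values, drop query parameters) can also be sketched with the standard library's URL parser instead of string slicing. This is an illustrative alternative, not the service's implementation; email_from_mailto is a hypothetical name, and taking only the first address of a comma-separated list is an assumption.

```python
from typing import Optional
from urllib.parse import unquote, urlsplit


def email_from_mailto(href: str) -> Optional[str]:
    """Extract the first address from a mailto: URL, or None."""
    parts = urlsplit(href.strip())
    # urlsplit lowercases the scheme, so "MAILTO:" is accepted too.
    if parts.scheme != "mailto":
        return None
    # The query (?subject=...) is already split off; keep the first recipient.
    first = parts.path.split(",")[0]
    address = unquote(first).strip()
    return address or None
```

The result would still need to pass validate_email_format before a code is sent to it.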
Security Considerations:
- No script execution: BeautifulSoup only extracts attributes, never executes JavaScript
- Email validation: Strict format checking prevents injection
- Link extraction only: No rendering or evaluation of HTML
- Partial masking in logs: Only log first 3 chars of email (privacy)
Testing Requirements:
- ✅ Discovery from <link rel="me" href="mailto:..."> tag
- ✅ Discovery from <a rel="me" href="mailto:..."> tag
- ✅ Multiple rel="me" links: select first mailto
- ✅ Malformed HTML handled gracefully
- ✅ Missing rel="me" links returns None
- ✅ Invalid email format in link returns None (but logs warning)
- ✅ Empty href returns None
- ✅ Non-mailto rel="me" links ignored (e.g., https:// links)
- ✅ mailto with query parameters (e.g., ?subject=Hi) strips params
- ✅ Email validation: valid formats accepted
- ✅ Email validation: invalid formats rejected (no @, no domain, too long, etc.)
3. Domain Verification Service
File: src/gondulf/domain_verification.py
Purpose: Orchestrate two-factor domain verification (DNS TXT + Email via rel="me").
Public Interface:
from typing import Tuple, Optional
from .dns import DNSService
from .html_fetcher import HTMLFetcherService
from .relme import RelMeDiscoveryService
from .email import EmailService
from .storage import CodeStorage
from .database.connection import DatabaseConnection
import secrets
class DomainVerificationService:
"""
Two-factor domain verification service.
Verifies domain ownership through:
1. DNS TXT record verification (_gondulf.{domain} = verified)
2. Email verification via rel="me" discovery
"""
def __init__(
self,
dns_service: DNSService,
html_fetcher: HTMLFetcherService,
relme_discovery: RelMeDiscoveryService,
email_service: EmailService,
code_storage: CodeStorage,
database: DatabaseConnection,
code_ttl: int = 900 # 15 minutes
):
"""
Initialize domain verification service.
Args:
dns_service: DNS service for TXT record verification
html_fetcher: HTML fetcher service
relme_discovery: rel="me" email discovery service
email_service: Email service for sending codes
code_storage: In-memory storage for verification codes
database: Database connection for storing verified domains
code_ttl: Verification code TTL in seconds (default: 900 = 15 min)
"""
def start_verification(self, domain: str) -> Tuple[bool, Optional[str], Optional[str]]:
"""
Start domain verification process.
Steps:
1. Verify DNS TXT record exists
2. Fetch user's homepage
3. Discover email from rel="me" link
4. Generate and send verification code
Args:
domain: Domain to verify (e.g., "example.com")
Returns:
Tuple of (success, discovered_email_masked, error_message)
- success: True if code sent, False if verification cannot start
- discovered_email_masked: Email with partial masking (e.g., "u***@example.com")
- error_message: Error description if success=False, None otherwise
"""
def verify_code(self, email: str, submitted_code: str) -> Tuple[bool, Optional[str], Optional[str]]:
"""
Verify submitted code.
Args:
email: Email address (discovered from rel="me")
submitted_code: 6-digit code entered by user
Returns:
Tuple of (success, domain, error_message)
- success: True if code valid, False otherwise
- domain: User's verified domain if success=True
- error_message: Error description if success=False
"""
def is_domain_verified(self, domain: str) -> bool:
"""
Check if domain is already verified (cached in database).
Args:
domain: Domain to check
Returns:
True if domain previously verified, False otherwise
"""
Implementation Details:
def start_verification(self, domain: str) -> Tuple[bool, Optional[str], Optional[str]]:
    """Start domain verification process."""
    logger.info(f"Starting domain verification: {domain}")
    # Step 1: Verify DNS TXT record (first factor)
    logger.debug(f"Verifying DNS TXT record for {domain}")
    dns_verified = self.dns_service.verify_txt_record(domain, "verified")
    if not dns_verified:
        error = (
            f"DNS verification failed. TXT record not found for _gondulf.{domain}. "
            f"Please add: Type=TXT, Name=_gondulf.{domain}, Value=verified"
        )
        logger.warning(f"DNS verification failed: {domain}")
        return False, None, error
    logger.info(f"DNS TXT record verified: {domain}")
    # Step 2: Fetch site homepage
    logger.debug(f"Fetching homepage for {domain}")
    html = self.html_fetcher.fetch_site(domain)
    if html is None:
        error = (
            f"Could not fetch site at https://{domain}. "
            f"Please ensure site is accessible via HTTPS with valid SSL certificate."
        )
        logger.warning(f"Site fetch failed: {domain}")
        return False, None, error
    logger.info(f"Successfully fetched homepage: {domain}")
    # Step 3: Discover email from rel="me" (second factor discovery)
    logger.debug(f"Discovering email via rel='me' for {domain}")
    email = self.relme_discovery.discover_email(html)
    if email is None:
        error = (
            'No rel="me" mailto: link found on homepage. '
            f'Please add to https://{domain}: '
            '<link rel="me" href="mailto:your-email@example.com">'
        )
        logger.warning(f"rel='me' discovery failed: {domain}")
        return False, None, error
    logger.info(f"Email discovered via rel='me' for {domain}: {email[:3]}***")
    # Mask email once: only the masked form (u***@example.com) is returned to callers
    email_masked = self._mask_email(email)
    # Step 4: Check rate limiting
    if self._is_rate_limited(domain):
        error = (
            f"Rate limit exceeded for {domain}. "
            f"Please wait before requesting another verification code."
        )
        logger.warning(f"Rate limit exceeded: {domain}")
        return False, email_masked, error
    # Step 5: Generate verification code
    code = self._generate_code()
    # Step 6: Store code with metadata
    self.code_storage.store(email, code, ttl=self.code_ttl)
    # Store metadata for rate limiting and domain association
    self._store_code_metadata(email, domain)
    logger.debug(f"Verification code generated and stored for {email[:3]}***")
    # Step 7: Send verification email (second factor verification)
    logger.debug(f"Sending verification email to {email[:3]}***")
    email_sent = self.email_service.send_verification_email(email, code)
    if not email_sent:
        # Clean up stored code if email fails
        self.code_storage.delete(email)
        error = (
            f"Failed to send verification code to {email_masked}. "
            f"Please check email address in rel='me' link and try again."
        )
        logger.error(f"Email send failed: {email[:3]}***")
        return False, email_masked, error
    logger.info(f"Verification code sent successfully to {email[:3]}***")
    return True, email_masked, None
def verify_code(self, email: str, submitted_code: str) -> Tuple[bool, Optional[str], Optional[str]]:
"""Verify submitted code."""
logger.info(f"Verifying code for {email[:3]}***")
# Retrieve stored code
stored_code = self.code_storage.get(email)
if stored_code is None:
logger.warning(f"No verification code found for {email[:3]}***")
return False, None, "No verification code found. Please request a new code."
# Get code metadata
metadata = self._get_code_metadata(email)
if metadata is None:
logger.error(f"Code found but metadata missing for {email[:3]}***")
return False, None, "Verification error. Please request a new code."
domain = metadata['domain']
attempts = metadata.get('attempts', 0)
# Check attempt limit (prevent brute force)
if attempts >= 3:
logger.warning(f"Too many attempts for {email[:3]}***")
self.code_storage.delete(email)
self._delete_code_metadata(email)
return False, None, "Too many attempts. Please request a new code."
# Increment attempt counter
self._increment_attempts(email)
# Verify code using constant-time comparison (SECURITY: prevent timing attacks)
if not secrets.compare_digest(submitted_code, stored_code):
logger.warning(f"Invalid code submitted for {email[:3]}***")
return False, None, f"Invalid code. {3 - attempts - 1} attempts remaining."
# Code is valid - clean up and mark domain as verified
logger.info(f"Code verified successfully for {domain}")
self.code_storage.delete(email)
self._delete_code_metadata(email)
# Store verified domain in database
self._store_verified_domain(domain)
return True, domain, None
def is_domain_verified(self, domain: str) -> bool:
"""Check if domain already verified."""
with self.database.get_connection() as conn:
result = conn.execute(
"SELECT verified FROM domains WHERE domain = ?",
(domain,)
).fetchone()
if result and result['verified']:
logger.debug(f"Domain already verified: {domain}")
return True
return False
def _generate_code(self) -> str:
"""Generate 6-digit verification code."""
return ''.join(secrets.choice('0123456789') for _ in range(6))
def _mask_email(self, email: str) -> str:
    """Mask email for display: u***@example.com"""
    local, domain = email.split('@', 1)
    # Guard against an empty local part, where local[0] would raise IndexError
    first = local[0] if local else ''
    return f"{first}***@{domain}"
def _is_rate_limited(self, domain: str) -> bool:
"""
Check if domain is rate limited.
Rate limit: Max 3 codes per domain per hour.
"""
# TODO: Implement rate limiting using code metadata
# For Phase 2, we'll implement simple in-memory tracking
# Future: Use Redis for distributed rate limiting
return False # Placeholder - implement in actual code
def _store_code_metadata(self, email: str, domain: str) -> None:
"""Store code metadata for rate limiting and domain association."""
# TODO: Implement metadata storage
# Store: email -> {domain, created_at, attempts}
pass
def _get_code_metadata(self, email: str) -> Optional[dict]:
"""Retrieve code metadata."""
# TODO: Implement metadata retrieval
# Return: {domain, created_at, attempts}
return {'domain': 'example.com', 'attempts': 0} # Placeholder
def _delete_code_metadata(self, email: str) -> None:
"""Delete code metadata."""
# TODO: Implement metadata deletion
pass
def _increment_attempts(self, email: str) -> None:
"""Increment attempt counter for email."""
# TODO: Implement attempt increment
pass
def _store_verified_domain(self, domain: str) -> None:
"""Store verified domain in database."""
from datetime import datetime
with self.database.get_connection() as conn:
conn.execute(
"""
INSERT OR REPLACE INTO domains (domain, verification_method, verified, verified_at, last_dns_check)
VALUES (?, ?, ?, ?, ?)
""",
(domain, 'two_factor', True, datetime.utcnow(), datetime.utcnow())
)
conn.commit()
logger.info(f"Domain verification stored in database: {domain}")
Dependencies:
- All Phase 1 services (DNS, Email, Storage, Database)
- HTML fetcher service (Phase 2)
- rel="me" discovery service (Phase 2)
- Python standard library: secrets, datetime
Error Handling:
- DNS verification failure: Return error with setup instructions
- Site fetch failure: Return error with troubleshooting steps
- rel="me" discovery failure: Return error with HTML example
- Email send failure: Return error, clean up stored code
- Code not found: Return error, suggest requesting new code
- Code expired: Handled by CodeStorage TTL
- Too many attempts: Return error, invalidate code
- Invalid code: Return error with remaining attempts
- Rate limit exceeded: Return error, suggest waiting
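The attempt counting behind "Too many attempts" and "Invalid code" depends on the metadata helpers that are still TODO in the implementation above. A minimal in-memory sketch of that bookkeeping follows; the names CodeMetadata, CodeMetadataStore, and record_attempt are hypothetical, and the store is not thread-safe or shared across processes.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class CodeMetadata:
    domain: str
    created_at: float
    attempts: int = 0


class CodeMetadataStore:
    """Hypothetical in-memory metadata store keyed by email address."""

    def __init__(self, max_attempts: int = 3, clock: Callable[[], float] = time.time):
        self._entries: Dict[str, CodeMetadata] = {}
        self._max_attempts = max_attempts
        self._clock = clock  # injectable so tests can control time

    def store(self, email: str, domain: str) -> None:
        # A new code resets the attempt counter for this email.
        self._entries[email] = CodeMetadata(domain=domain, created_at=self._clock())

    def get(self, email: str) -> Optional[CodeMetadata]:
        return self._entries.get(email)

    def record_attempt(self, email: str) -> int:
        """Count one submission and return the attempts remaining."""
        entry = self._entries.get(email)
        if entry is None:
            return 0
        entry.attempts += 1
        return max(self._max_attempts - entry.attempts, 0)

    def delete(self, email: str) -> None:
        self._entries.pop(email, None)
```

Entries here never expire on their own, so a real implementation would also need to drop metadata when the matching code's TTL runs out.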
Security Considerations:
- Two-factor verification: Both DNS and email required
- Constant-time code comparison: Prevent timing attacks (secrets.compare_digest)
- Rate limiting: Max 3 codes per domain per hour (prevents abuse)
- Attempt limiting: Max 3 code submission attempts (prevents brute force)
- Single-use codes: Deleted after successful verification
- Email masking in logs: Only log partial email (privacy)
- No email storage: Email used only during verification, never persisted
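The limit of 3 codes per domain per hour is left as a placeholder in _is_rate_limited. One way to implement it in memory is a sliding window of send timestamps per domain, sketched below. DomainRateLimiter and its allow method are hypothetical names, and the sliding-window policy is an assumption; a fixed hourly bucket would also satisfy the stated limit.

```python
import time
from collections import defaultdict, deque
from typing import Callable, DefaultDict, Deque


class DomainRateLimiter:
    """Hypothetical sliding-window limiter: max_codes per window per domain."""

    def __init__(
        self,
        max_codes: int = 3,
        window_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._max_codes = max_codes
        self._window = window_seconds
        self._clock = clock  # injectable so tests can control time
        self._sent: DefaultDict[str, Deque[float]] = defaultdict(deque)

    def allow(self, domain: str) -> bool:
        """Record a code request and return True if it is within the limit."""
        now = self._clock()
        sent = self._sent[domain]
        # Drop timestamps that have aged out of the window.
        while sent and now - sent[0] >= self._window:
            sent.popleft()
        if len(sent) >= self._max_codes:
            return False
        sent.append(now)
        return True
```

Under this sketch, _is_rate_limited(domain) would return the negation of allow(domain). State is per process, so it resets on restart and is not shared between workers.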
Testing Requirements:
- ✅ Full verification flow: DNS → rel="me" → email → code verification
- ✅ DNS verification failure blocks flow
- ✅ Site fetch failure blocks flow
- ✅ rel="me" discovery failure blocks flow
- ✅ Email send failure cleans up stored code
- ✅ Code verification success stores domain in database
- ✅ Code verification failure decrements remaining attempts
- ✅ Too many attempts invalidates code
- ✅ Invalid code returns error with attempts remaining
- ✅ Code expiration handled by storage layer
- ✅ Rate limiting prevents excessive code requests
- ✅ Already verified domain check works
- ✅ Email masking works correctly
4. Domain Verification Endpoints
File: src/gondulf/routers/verification.py
Purpose: HTTP API endpoints for user interaction during verification flow.
Public Interface:
from fastapi import APIRouter, HTTPException, Depends
from pydantic import BaseModel, Field
from typing import Optional
router = APIRouter(prefix="/api/verify", tags=["verification"])
# Request/Response Models
class VerificationStartRequest(BaseModel):
"""Request to start domain verification."""
domain: str = Field(
...,
min_length=3,
max_length=253,
description="Domain to verify (e.g., 'example.com')"
)
class VerificationStartResponse(BaseModel):
"""Response from starting verification."""
success: bool
email_masked: Optional[str] = Field(None, description="Partially masked email (e.g., 'u***@example.com')")
error: Optional[str] = Field(None, description="Error message if success=False")
class VerificationCodeRequest(BaseModel):
"""Request to verify code."""
email: str = Field(..., description="Email address discovered from rel='me'")
code: str = Field(..., min_length=6, max_length=6, pattern="^[0-9]{6}$", description="6-digit verification code")
class VerificationCodeResponse(BaseModel):
"""Response from code verification."""
success: bool
domain: Optional[str] = Field(None, description="Verified domain if success=True")
error: Optional[str] = Field(None, description="Error message if success=False")
# Endpoints
@router.post("/start", response_model=VerificationStartResponse)
async def start_verification(
request: VerificationStartRequest,
domain_verification: DomainVerificationService = Depends(get_domain_verification_service)
) -> VerificationStartResponse:
"""
Start domain verification process.
Steps:
1. Verify DNS TXT record exists
2. Discover email from rel="me" link
3. Send verification code to discovered email
Returns masked email on success, error message on failure.
"""
@router.post("/code", response_model=VerificationCodeResponse)
async def verify_code(
request: VerificationCodeRequest,
domain_verification: DomainVerificationService = Depends(get_domain_verification_service)
) -> VerificationCodeResponse:
"""
Verify submitted code.
Returns verified domain on success, error message on failure.
"""
Implementation Details:
@router.post("/start", response_model=VerificationStartResponse)
async def start_verification(
request: VerificationStartRequest,
domain_verification: DomainVerificationService = Depends(get_domain_verification_service)
) -> VerificationStartResponse:
"""Start domain verification process."""
logger.info(f"Verification start request: {request.domain}")
# Normalize domain (lowercase, remove trailing slash)
domain = request.domain.lower().rstrip('/')
# Remove protocol if present
if domain.startswith('http://') or domain.startswith('https://'):
domain = domain.split('://', 1)[1]
# Remove path if present
if '/' in domain:
domain = domain.split('/')[0]
# Validate domain format (basic validation)
if not domain or '.' not in domain:
logger.warning(f"Invalid domain format: {request.domain}")
return VerificationStartResponse(
success=False,
email_masked=None,
error="Invalid domain format. Please provide a valid domain (e.g., 'example.com')."
)
# Start verification
success, email_masked, error = domain_verification.start_verification(domain)
if not success:
logger.warning(f"Verification start failed for {domain}: {error}")
return VerificationStartResponse(
success=False,
email_masked=email_masked,
error=error
)
logger.info(f"Verification started successfully for {domain}")
return VerificationStartResponse(
success=True,
email_masked=email_masked,
error=None
)
@router.post("/code", response_model=VerificationCodeResponse)
async def verify_code(
request: VerificationCodeRequest,
domain_verification: DomainVerificationService = Depends(get_domain_verification_service)
) -> VerificationCodeResponse:
"""Verify submitted code."""
logger.info(f"Code verification request for email: {request.email[:3]}***")
# Verify code
success, domain, error = domain_verification.verify_code(request.email, request.code)
if not success:
logger.warning(f"Code verification failed for {request.email[:3]}***: {error}")
return VerificationCodeResponse(
success=False,
domain=None,
error=error
)
logger.info(f"Code verified successfully for domain: {domain}")
return VerificationCodeResponse(
success=True,
domain=domain,
error=None
)
Dependencies:
- FastAPI router and dependency injection
- Pydantic models for request/response validation
- Domain verification service (injected via Depends)
- Phase 1 logging configuration
Error Handling:
- Invalid domain format: Return 200 with success=False, descriptive error
- Pydantic validation errors: Automatic 422 response with validation details
- Service errors: Propagated via success=False in response
- All errors logged at WARNING level
- No 500 errors expected (all errors handled gracefully)
Security Considerations:
- Input validation: Pydantic models enforce constraints
- Domain normalization: Prevent URL injection
- No authentication required: Public endpoints (verification is the authentication)
- Rate limiting: Handled by DomainVerificationService (not endpoint level)
- Email not validated at endpoint level: Service handles validation
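The domain normalization mentioned above can be isolated into a pure function, which makes it easy to test independently of the endpoint. The sketch below uses urllib.parse rather than the string splitting shown in the endpoint code; normalize_domain is a hypothetical name, and dropping ports and userinfo is an assumption that goes slightly beyond what the endpoint does.

```python
from typing import Optional
from urllib.parse import urlsplit


def normalize_domain(raw: str) -> Optional[str]:
    """Reduce input such as 'https://Example.com/path' to 'example.com'."""
    candidate = raw.strip().lower()
    if not candidate.startswith(("http://", "https://")):
        # urlsplit only fills in hostname when the input starts with '//' or a scheme.
        candidate = f"//{candidate}"
    try:
        host = urlsplit(candidate).hostname or ""
    except ValueError:
        # Raised for malformed input such as an unbalanced IPv6 bracket.
        return None
    host = host.rstrip(".")
    if not host or "." not in host:
        return None
    return host
```

Returning None for anything without a dot mirrors the endpoint's basic format check, so single-label names such as localhost are rejected.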
Testing Requirements:
- ✅ POST /api/verify/start with valid domain returns success
- ✅ POST /api/verify/start with invalid domain format returns error
- ✅ POST /api/verify/start with DNS failure returns error
- ✅ POST /api/verify/start with rel="me" failure returns error
- ✅ POST /api/verify/start with email send failure returns error
- ✅ POST /api/verify/code with valid code returns domain
- ✅ POST /api/verify/code with invalid code returns error
- ✅ POST /api/verify/code with expired code returns error
- ✅ POST /api/verify/code with missing code returns error
- ✅ POST /api/verify/code with too many attempts returns error
- ✅ Pydantic validation errors return 422
5. Authorization Endpoint
File: src/gondulf/routers/authorization.py
Purpose: Implement IndieAuth authorization endpoint (/authorize) per W3C spec.
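One check this endpoint relies on is comparing redirect_uri against client_id. As far as the IndieAuth specification is understood here, a redirect_uri whose scheme, host, and port match the client_id can be accepted directly, while any other value has to be checked against redirect URLs published by the client. The sketch below covers only the same-origin case; redirect_uri_matches_client is a hypothetical name and is not the _validate_redirect_uri helper this design leaves unimplemented.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

_DEFAULT_PORTS = {"http": 80, "https": 443}


def _origin(url: str) -> Tuple[str, Optional[str], Optional[int]]:
    """Return (scheme, host, port) with default ports filled in."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or _DEFAULT_PORTS.get(scheme)
    return scheme, parts.hostname, port


def redirect_uri_matches_client(redirect_uri: str, client_id: str) -> bool:
    """True when redirect_uri shares scheme, host, and port with client_id."""
    try:
        redirect_origin = _origin(redirect_uri)
        client_origin = _origin(client_id)
    except ValueError:
        # urlsplit and .port raise ValueError on malformed URLs or ports.
        return False
    scheme, host, _ = redirect_origin
    if scheme not in _DEFAULT_PORTS or host is None:
        # Reject non-HTTP(S) schemes such as javascript: outright.
        return False
    return redirect_origin == client_origin
```

A redirect_uri that fails this check is not necessarily invalid; it would simply need the published-redirect lookup, which this sketch does not attempt.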
Public Interface:
from fastapi import APIRouter, Request, HTTPException, Depends
from fastapi.responses import RedirectResponse, HTMLResponse
from pydantic import BaseModel, HttpUrl, Field
from typing import Optional, Literal
router = APIRouter(tags=["indieauth"])
# Request Models
class AuthorizeRequest(BaseModel):
"""
IndieAuth authorization request parameters.
Per W3C IndieAuth specification (Section 5.1):
https://www.w3.org/TR/indieauth/#authorization-request
"""
me: HttpUrl = Field(..., description="User's profile URL (domain identity)")
client_id: HttpUrl = Field(..., description="Client application URL")
redirect_uri: HttpUrl = Field(..., description="Where to redirect after authorization")
state: str = Field(..., min_length=1, max_length=512, description="CSRF protection token")
response_type: Literal["code"] = Field(..., description="Must be 'code' for authorization code flow")
scope: Optional[str] = Field(None, description="Requested scopes (ignored in v1.0.0)")
code_challenge: Optional[str] = Field(None, description="PKCE challenge (not supported in v1.0.0)")
code_challenge_method: Optional[str] = Field(None, description="PKCE method (not supported in v1.0.0)")
# Endpoints
@router.get("/authorize")
async def authorize(
request: Request,
me: str,
client_id: str,
redirect_uri: str,
state: str,
response_type: str,
scope: Optional[str] = None,
code_challenge: Optional[str] = None,
code_challenge_method: Optional[str] = None,
domain_verification: DomainVerificationService = Depends(get_domain_verification_service)
) -> HTMLResponse:
"""
IndieAuth authorization endpoint.
Per W3C IndieAuth specification:
https://www.w3.org/TR/indieauth/#authorization-request
Flow:
1. Validate all parameters
2. Check if domain already verified (skip verification if cached)
3. If not verified, initiate two-factor verification flow
4. Display consent screen with client info
5. On approval, generate authorization code
6. Redirect to client with code + state
"""
Implementation Details (High-Level - Full implementation too long for this doc):
@router.get("/authorize")
async def authorize(
    request: Request,
    me: str,
    client_id: str,
    redirect_uri: str,
    state: str,
    response_type: str,
    # ... other parameters
    domain_verification: DomainVerificationService = Depends(get_domain_verification_service)
) -> HTMLResponse:
    """IndieAuth authorization endpoint."""
    # STEP 1: Validate client_id
    # SECURITY: Until client_id and redirect_uri are validated, errors are shown
    # as a page; redirecting to an unvalidated URI would be an open redirect
    client_valid = _validate_client_id(client_id)
    if not client_valid:
        return _error_page("Invalid client_id")
    # STEP 2: Validate redirect_uri
    redirect_valid = _validate_redirect_uri(redirect_uri, client_id)
    if not redirect_valid:
        # SECURITY: Cannot redirect to invalid URI - display error page
        return _error_page("Invalid redirect_uri")
    # STEP 3: Validate response_type (redirect_uri is validated, so errors may redirect)
    if response_type != "code":
        return _error_response(
            redirect_uri=redirect_uri,
            state=state,
            error="unsupported_response_type",
            description="Only response_type=code is supported"
        )
    # STEP 4: Validate and normalize 'me' parameter
    me_normalized = _validate_and_normalize_me(me)
    if me_normalized is None:
        return _error_response(
            redirect_uri=redirect_uri,
            state=state,
            error="invalid_request",
            description="Invalid 'me' parameter format"
        )
    # STEP 5: Check if domain already verified
    domain = _extract_domain_from_me(me_normalized)
    if domain_verification.is_domain_verified(domain):
        # Skip verification, go directly to consent
        logger.info(f"Domain already verified: {domain}")
        return await _show_consent_screen(
            me=me_normalized,
            client_id=client_id,
            redirect_uri=redirect_uri,
            state=state
        )
    # STEP 6: Domain not verified - start verification flow
    logger.info(f"Starting verification for new domain: {domain}")
    success, email_masked, error = domain_verification.start_verification(domain)
    if not success:
        # Verification failed - show error with instructions
        return _verification_error_page(domain, error)
    # STEP 7: Show code entry form
    return _code_entry_page(
        domain=domain,
        email_masked=email_masked,
        me=me_normalized,
        client_id=client_id,
        redirect_uri=redirect_uri,
        state=state
    )
# Additional endpoints for verification flow
@router.post("/authorize/verify-code")
async def verify_code_and_consent(
    request: Request,
    email: str,
    code: str,
    me: str,
    client_id: str,
    redirect_uri: str,
    state: str,
    domain_verification: DomainVerificationService = Depends(get_domain_verification_service)
) -> HTMLResponse:
    """
    Verify code and show consent screen.
    Called when user submits verification code during authorization flow.
    """
    # Verify code
    success, domain, error = domain_verification.verify_code(email, code)
    if not success:
        # Code invalid - show error, allow retry
        return _code_entry_page_with_error(
            domain=_extract_domain_from_me(me),
            email_masked=_mask_email(email),
            error=error,
            me=me,
            client_id=client_id,
            redirect_uri=redirect_uri,
            state=state
        )
    # Code valid - show consent screen
    return await _show_consent_screen(
        me=me,
        client_id=client_id,
        redirect_uri=redirect_uri,
        state=state
    )
@router.post("/authorize/consent")
async def handle_consent(
    request: Request,
    action: Literal["approve", "deny"],
    me: str,
    client_id: str,
    redirect_uri: str,
    state: str,
    code_storage: CodeStorage = Depends(get_code_storage)
) -> RedirectResponse:
    """
    Handle user consent decision.
    Called when user approves or denies authorization.
    """
    from urllib.parse import urlencode

    # Append to an existing query string if redirect_uri already carries one
    separator = '&' if '?' in redirect_uri else '?'
    if action == "deny":
        # User denied - redirect with error (values are percent-encoded)
        query = urlencode({
            'error': 'access_denied',
            'error_description': 'User denied authorization',
            'state': state
        })
        return RedirectResponse(
            url=f"{redirect_uri}{separator}{query}",
            status_code=302
        )
    # User approved - generate authorization code
    auth_code = _generate_authorization_code()
    # Store code in memory with metadata
    code_storage.store(auth_code, {
        'me': me,
        'client_id': client_id,
        'redirect_uri': redirect_uri,
        'state': state,
        'created_at': datetime.utcnow()
    }, ttl=600)  # 10 minutes
    logger.info(f"Authorization code generated for {me} / {client_id}")
    # Redirect to client with code + state (state is client-supplied, so encode it)
    query = urlencode({'code': auth_code, 'state': state})
    return RedirectResponse(
        url=f"{redirect_uri}{separator}{query}",
        status_code=302
    )
# Helper functions (implementations not shown for brevity)
def _validate_and_normalize_me(me: str) -> Optional[str]:
    """Validate and normalize 'me' parameter per IndieAuth spec."""
    pass

def _validate_client_id(client_id: str) -> bool:
    """Validate client_id is a valid URL."""
    pass

def _validate_redirect_uri(redirect_uri: str, client_id: str) -> bool:
    """Validate redirect_uri against client_id."""
    pass

def _extract_domain_from_me(me: str) -> str:
    """Extract domain from 'me' URL."""
    pass

async def _show_consent_screen(...) -> HTMLResponse:
    """Render consent screen HTML."""
    pass

def _code_entry_page(...) -> HTMLResponse:
    """Render code entry page HTML."""
    pass

def _error_response(...) -> RedirectResponse:
    """Generate OAuth 2.0 error redirect."""
    pass

def _generate_authorization_code() -> str:
    """Generate cryptographically secure authorization code."""
    return secrets.token_urlsafe(32)  # 32 random bytes = 256 bits
Dependencies:
- FastAPI router, Request, Response types
- Pydantic models for validation
- Domain verification service (Phase 2)
- Code storage (Phase 1)
- HTML templates (new - Jinja2)
- Python standard library: secrets, datetime
Error Handling:
- Invalid response_type: Redirect with unsupported_response_type error
- Invalid me parameter: Redirect with invalid_request error
- Invalid client_id: Redirect with invalid_client error
- Invalid redirect_uri: Display error page (cannot redirect)
- DNS verification failure: Display error page with setup instructions
- rel="me" discovery failure: Display error page with HTML example
- Email send failure: Display error page with troubleshooting
- Code verification failure: Display code entry page with error, allow retry
- User denies consent: Redirect with access_denied error
- All errors follow OAuth 2.0 error response format
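The redirect-style errors above all share one URL shape. The following is a minimal sketch of how such a redirect could be assembled with the standard library so that every value is percent-encoded; `build_error_redirect` is a hypothetical name and is not the `_error_response` helper referenced in this design.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_error_redirect(redirect_uri: str, error: str, description: str, state: str) -> str:
    """Append OAuth 2.0 error parameters to redirect_uri, encoding each value."""
    parts = urlsplit(redirect_uri)
    # Keep any query parameters the client already placed on redirect_uri.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [("error", error), ("error_description", description), ("state", state)]
    return urlunsplit(parts._replace(query=urlencode(query)))


url = build_error_redirect(
    "https://client.example.com/callback",
    "access_denied",
    "User denied authorization",
    "abc123",
)
print(url)
```

Encoding matters most for `state`, which is client-supplied and may contain characters such as `&` that would otherwise corrupt the query string.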
Security Considerations:
- HTTPS only: Enforced by middleware (production)
- redirect_uri validation: Prevent open redirect attacks
- State parameter: Passed through, client validates (CSRF protection)
- Authorization code: Cryptographically secure (256 bits)
- Code single-use: Enforced by token endpoint (Phase 3)
- Code expiration: 10 minutes TTL
- Domain verification: Two-factor required before code generation
- No client secrets: All clients are public per IndieAuth spec
Testing Requirements:
- ✅ GET /authorize with valid parameters shows verification or consent
- ✅ GET /authorize with invalid response_type returns error
- ✅ GET /authorize with invalid me parameter returns error
- ✅ GET /authorize with invalid client_id returns error
- ✅ GET /authorize with invalid redirect_uri shows error page
- ✅ GET /authorize with already verified domain skips to consent
- ✅ POST /authorize/verify-code with valid code shows consent
- ✅ POST /authorize/verify-code with invalid code shows error
- ✅ POST /authorize/consent with action=approve generates code and redirects
- ✅ POST /authorize/consent with action=deny redirects with access_denied
- ✅ Authorization code stored in memory with correct metadata
- ✅ Authorization code expires after 10 minutes
- ✅ State parameter passed through all steps
Data Flow
Complete Two-Factor Verification Flow
┌─────────────────────────────────────────────────────────────────┐
│ User / Client Application │
└───────────────────────────────┬─────────────────────────────────┘
│
│ GET /authorize?me=example.com&...
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Authorization Endpoint │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ 1. Validate parameters (me, client_id, redirect_uri, │ │
│ │ state, response_type) │ │
│ └──────────────────────────┬───────────────────────────────┘ │
│ │ │
│ ┌──────────────────────────▼───────────────────────────────┐ │
│ │ 2. Check if domain already verified in database │ │
│ └──────────────────────────┬───────────────────────────────┘ │
│ │ │
│ ┌────────┴────────┐ │
│ │ │ │
│ │ Verified? │ │
│ │ │ │
│ ┌─────────┴─────No─────────┴─────────┐ │
│ │ │ │
│ │ YES │ NO │
│ │ │ │
│ ▼ ▼ │
│ ┌──────────────────┐ ┌──────────────────────────┐ │
│ │ Skip to Consent │ │ Start Verification Flow │ │
│ │ (Step 9) │ │ (Step 3) │ │
│ └──────────────────┘ └─────────┬────────────────┘ │
│ │ │
└───────────────────────────────────────────────┼──────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Domain Verification Service (Two-Factor) │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ 3. Verify DNS TXT Record (First Factor) │ │
│ │ Query: _gondulf.example.com TXT │ │
│ │ Expected: "verified" │ │
│ └──────────────────────────┬───────────────────────────────┘ │
│ │ │
│ ┌────────┴────────┐ │
│ │ TXT found? │ │
│ ┌─────────┴─────No─────────┴─────────┐ │
│ │ YES │ NO │
│ ▼ ▼ │
│ ┌──────────────────┐ ┌──────────────────────────┐ │
│ │ Continue to │ │ FAIL: Display error │ │
│ │ Step 4 │ │ "Add DNS TXT record" │ │
│ └─────────┬────────┘ └──────────────────────────┘ │
│ │ │
│ ┌─────────▼────────────────────────────────────────────────┐ │
│ │ 4. Fetch User's Homepage via HTTPS │ │
│ │ URL: https://example.com │ │
│ │ Timeout: 10s, Max size: 5MB, Verify SSL │ │
│ └──────────────────────────┬───────────────────────────────┘ │
│ │ │
│ ┌────────┴────────┐ │
│ │ Fetch success? │ │
│ ┌─────────┴─────No─────────┴─────────┐ │
│ │ YES │ NO │
│ ▼ ▼ │
│ ┌──────────────────┐ ┌──────────────────────────┐ │
│ │ Continue to │ │ FAIL: Display error │ │
│ │ Step 5 │ │ "Site unreachable" │ │
│ └─────────┬────────┘ └──────────────────────────┘ │
│ │ │
│ ┌─────────▼────────────────────────────────────────────────┐ │
│ │ 5. Discover Email via rel="me" (Second Factor Discovery)│ │
│ │ Parse HTML for: <link rel="me" href="mailto:..."> │ │
│ │ Extract and validate email format │ │
│ └──────────────────────────┬───────────────────────────────┘ │
│ │ │
│ ┌────────┴────────┐ │
│ │ Email found? │ │
│ ┌─────────┴─────No─────────┴─────────┐ │
│ │ YES │ NO │
│ ▼ ▼ │
│ ┌──────────────────┐ ┌──────────────────────────┐ │
│ │ Continue to │ │ FAIL: Display error │ │
│ │ Step 6 │ │ "Add rel='me' link" │ │
│ └─────────┬────────┘ └──────────────────────────┘ │
│ │ │
│ ┌─────────▼────────────────────────────────────────────────┐ │
│ │ 6. Generate and Send Verification Code │ │
│ │ (Second Factor Verification) │ │
│ │ - Generate 6-digit code (cryptographically secure) │ │
│ │ - Store code in memory (TTL: 15 minutes) │ │
│ │ - Send code to discovered email via SMTP │ │
│ └──────────────────────────┬───────────────────────────────┘ │
│ │ │
└─────────────────────────────┼────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Display Code Entry Form │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ "Verification code sent to u***@example.com" │ │
│ │ [Enter 6-digit code: ______] │ │
│ │ [Submit] │ │
│ └──────────────────────────┬───────────────────────────────┘ │
└─────────────────────────────┼────────────────────────────────────┘
│
│ POST /authorize/verify-code
│ {email, code, me, client_id, ...}
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Domain Verification Service (Continued) │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ 7. Verify Submitted Code │ │
│ │ - Retrieve stored code from memory │ │
│ │ - Check expiration (15 min TTL) │ │
│ │ - Check attempts (max 3) │ │
│ │ - Constant-time compare submitted vs stored │ │
│ └──────────────────────────┬───────────────────────────────┘ │
│ │ │
│ ┌────────┴────────┐ │
│ │ Code valid? │ │
│ ┌─────────┴─────No─────────┴─────────┐ │
│ │ YES │ NO │
│ ▼ ▼ │
│ ┌──────────────────┐ ┌──────────────────────────┐ │
│ │ Store verified │ │ Show error, allow retry │ │
│ │ domain in DB │ │ (if attempts remaining) │ │
│ └─────────┬────────┘ └──────────────────────────┘ │
│ │ │
│ ┌─────────▼────────────────────────────────────────────────┐ │
│ │ 8. Domain Verified (Two-Factor Complete) │ │
│ │ - DNS TXT verified ✓ │ │
│ │ - Email verified ✓ │ │
│ │ - Store in database: verification_method='two_factor' │ │
│ └──────────────────────────┬───────────────────────────────┘ │
└─────────────────────────────┼────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Display Consent Screen │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ "Sign in to [App Name] as example.com" │ │
│ │ │ │
│ │ Client: https://client.example.com │ │
│ │ Redirect: https://client.example.com/callback │ │
│ │ │ │
│ │ [Approve] [Deny] │ │
│ └──────────────────────────┬───────────────────────────────┘ │
└─────────────────────────────┼────────────────────────────────────┘
│
│ POST /authorize/consent
│ {action: "approve", ...}
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Authorization Endpoint (Continued) │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ 9. Generate Authorization Code │ │
│ │ - Generate cryptographically secure code (256 bits) │ │
│ │ - Store in memory with metadata: │ │
│ │ • me (user's domain) │ │
│ │ • client_id │ │
│ │ • redirect_uri │ │
│ │ • state │ │
│ │ • TTL: 10 minutes │ │
│ └──────────────────────────┬───────────────────────────────┘ │
│ │ │
│ ┌──────────────────────────▼───────────────────────────────┐ │
│ │ 10. Redirect to Client with Code │ │
│ │ {redirect_uri}?code={code}&state={state} │ │
│ └──────────────────────────┬───────────────────────────────┘ │
└─────────────────────────────┼────────────────────────────────────┘
│
│ HTTP 302 Redirect
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Client Application │
│ • Receives authorization code │
│ • Validates state parameter (CSRF protection) │
│ • Exchanges code for token (Phase 3: Token Endpoint) │
└─────────────────────────────────────────────────────────────────┘
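Step 6 of the flow above calls for a cryptographically secure 6-digit code. One way this could be done is sketched below; `generate_verification_code` is an illustrative name, not a function defined by Phase 1 or Phase 2.

```python
import secrets


def generate_verification_code(digits: int = 6) -> str:
    """Return a zero-padded numeric code drawn from the OS CSPRNG."""
    # secrets.randbelow is uniform over [0, 10**digits), so every code,
    # including ones with leading zeros, is equally likely.
    return f"{secrets.randbelow(10 ** digits):0{digits}d}"


code = generate_verification_code()
print(code)
```

Formatting with zero padding keeps codes such as `004217` at full length, which a plain `str(int)` conversion would shorten.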
State Transitions
Domain Verification States:
- Unverified: Domain never seen before
- DNS Verified: TXT record confirmed
- Email Discovered: rel="me" link found
- Code Sent: Verification code sent to email
- Fully Verified: Code verified, stored in database
- Cached: Domain verification cached (DNS check, email discovery, and code verification are skipped on future auth)
Authorization Flow States:
- Request Received: Parameters validated
- Domain Check: Checking if domain verified
- Verification In Progress: User entering code
- Consent Pending: User viewing consent screen
- Approved: User approved, code generated
- Denied: User denied, error redirect
- Complete: Redirected to client with code
Error Paths
DNS Verification Failure:
/authorize → Validate params → Check DNS TXT → [NOT FOUND]
→ Display error page with instructions
→ User adds TXT record, clicks "Retry"
→ Loop back to Check DNS TXT
rel="me" Discovery Failure:
/authorize → DNS verified → Fetch site → Discover email → [NOT FOUND]
→ Display error page with HTML example
→ User adds <link rel="me">, clicks "Retry"
→ Loop back to Fetch site
Email Send Failure:
/authorize → DNS + rel="me" OK → Send email → [SMTP ERROR]
→ Display error page with troubleshooting
→ User checks SMTP config, clicks "Retry"
→ Loop back to Send email
Invalid Code:
/authorize/verify-code → Verify code → [INVALID]
→ Display code entry form with error
→ "Invalid code. 2 attempts remaining."
→ User enters code again
→ Loop back to Verify code
Rate Limit Exceeded:
/authorize → Start verification → Check rate limit → [EXCEEDED]
→ Display error: "Too many attempts, wait 1 hour"
→ User waits, tries again later
API Endpoints
POST /api/verify/start
Purpose: Start domain verification process.
Request:
{
"domain": "example.com"
}
Success Response (200 OK):
{
"success": true,
"email_masked": "u***@example.com",
"error": null
}
Error Response (200 OK with success=false):
{
"success": false,
"email_masked": null,
"error": "DNS TXT record not found for _gondulf.example.com. Please add: Type=TXT, Name=_gondulf.example.com, Value=verified"
}
Validation Errors (422 Unprocessable Entity):
{
"detail": [
{
"loc": ["body", "domain"],
"msg": "field required",
"type": "value_error.missing"
}
]
}
Rate Limiting:
- Max 3 requests per domain per hour
- Enforced by DomainVerificationService
Authentication: None required (public endpoint)
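The limit of 3 requests per domain per hour could be enforced with a small in-memory sliding window. This is a sketch under that assumption; `DomainRateLimiter` and its injectable `clock` are hypothetical and do not describe how DomainVerificationService actually stores its counters.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class DomainRateLimiter:
    """Allow at most max_requests per domain within a sliding time window."""

    def __init__(self, max_requests: int = 3, window_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, domain: str) -> bool:
        now = self._clock()
        hits = self._hits[domain]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


# Usage with a fake clock so the window can be advanced deterministically.
fake_now = [0.0]
limiter = DomainRateLimiter(clock=lambda: fake_now[0])
results = [limiter.allow("example.com") for _ in range(4)]
print(results)
```

Rejected requests are not recorded, so a client that keeps retrying while blocked does not extend its own lockout.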
POST /api/verify/code
Purpose: Verify submitted 6-digit code.
Request:
{
"email": "user@example.com",
"code": "123456"
}
Success Response (200 OK):
{
"success": true,
"domain": "example.com",
"error": null
}
Error Response (200 OK with success=false):
{
"success": false,
"domain": null,
"error": "Invalid code. 2 attempts remaining."
}
Validation Errors (422 Unprocessable Entity):
{
"detail": [
{
"loc": ["body", "code"],
"msg": "string does not match regex \"^[0-9]{6}$\"",
"type": "value_error.str.regex"
}
]
}
Rate Limiting:
- Max 3 attempts per email per code
- Enforced by code verification logic
Authentication: None required (code is the authentication)
GET /authorize
Purpose: IndieAuth authorization endpoint.
Query Parameters:
- me (required): User's profile URL (e.g., "https://example.com")
- client_id (required): Client application URL
- redirect_uri (required): Where to redirect after authorization
- state (required): CSRF protection token
- response_type (required): Must be "code"
- scope (optional): Requested scopes (ignored in v1.0.0)
- code_challenge (optional): PKCE challenge (not supported in v1.0.0)
- code_challenge_method (optional): PKCE method (not supported in v1.0.0)
Success Response: HTML page (verification form or consent screen)
Error Redirect (302 Found):
{redirect_uri}?error=invalid_request&error_description=Invalid+me+parameter&state={state}
Error Codes (OAuth 2.0 standard):
- invalid_request: Missing or invalid parameter
- unauthorized_client: Client not authorized
- access_denied: User denied authorization
- unsupported_response_type: response_type not "code"
- server_error: Internal server error
Error Page (when redirect not possible):
<!DOCTYPE html>
<html>
<head><title>Authorization Error</title></head>
<body>
<h1>Authorization Error</h1>
<p>Invalid redirect_uri. Cannot redirect safely.</p>
</body>
</html>
Rate Limiting: None at endpoint level (handled by verification service)
Authentication: None initially (domain verification IS the authentication)
POST /authorize/verify-code
Purpose: Verify code during authorization flow.
Form Data:
- email (required): Email address from rel="me"
- code (required): 6-digit verification code
- me (required): User's profile URL
- client_id (required): Client application URL
- redirect_uri (required): Redirect URI
- state (required): State parameter
Success Response: HTML page (consent screen)
Error Response: HTML page (code entry form with error message)
POST /authorize/consent
Purpose: Handle user consent decision.
Form Data:
- action (required): "approve" or "deny"
- me (required): User's profile URL
- client_id (required): Client application URL
- redirect_uri (required): Redirect URI
- state (required): State parameter
Success Response (Approve) (302 Found):
{redirect_uri}?code={authorization_code}&state={state}
Success Response (Deny) (302 Found):
{redirect_uri}?error=access_denied&error_description=User+denied+authorization&state={state}
Data Models
Verified Domain (Database Table)
Table: domains
Schema (from Phase 1):
CREATE TABLE domains (
domain TEXT PRIMARY KEY,
verification_method TEXT NOT NULL, -- 'two_factor' for v1.0.0
verified BOOLEAN NOT NULL DEFAULT FALSE,
verified_at TIMESTAMP,
last_dns_check TIMESTAMP,
last_email_check TIMESTAMP
);
Updated in Phase 2: Change verification_method values from 'email' / 'txt_record' to 'two_factor'.
Migration: 002_update_verification_method.sql:
-- Update verification_method values to reflect two-factor requirement
UPDATE domains
SET verification_method = 'two_factor'
WHERE verification_method IN ('email', 'txt_record');
Indexes (from Phase 1):
CREATE INDEX idx_domains_domain ON domains(domain);
CREATE INDEX idx_domains_verified ON domains(verified);
Authorization Code (In-Memory)
Storage: Phase 1 CodeStorage with metadata
Structure:
{
"code": "abc123...", # 43-char base64url (32 bytes)
"me": "https://example.com",
"client_id": "https://client.example.com",
"redirect_uri": "https://client.example.com/callback",
"state": "client-provided-state",
"created_at": datetime,
"expires_at": datetime, # created_at + 10 minutes
"used": False # For Phase 3 token endpoint
}
TTL: 10 minutes (per W3C spec: "shortly after")
Storage Location: Phase 1 CodeStorage service
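To make the TTL behaviour concrete, here is a minimal sketch of an expiring in-memory store. `TTLStore` is a hypothetical stand-in written for illustration; the real Phase 1 CodeStorage interface may differ, and only the `store(key, value, ttl=...)` call shape is taken from this document.

```python
import secrets
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLStore:
    """Hypothetical stand-in for CodeStorage: entries expire after ttl seconds."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: Dict[str, Tuple[float, Any]] = {}

    def store(self, key: str, value: Any, ttl: float) -> None:
        self._items[key] = (self._clock() + ttl, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Lazily evict expired entries on read.
            del self._items[key]
            return None
        return value


fake_now = [0.0]
store = TTLStore(clock=lambda: fake_now[0])
auth_code = secrets.token_urlsafe(32)  # 32 random bytes -> 43 base64url characters
store.store(auth_code, {"me": "https://example.com", "used": False}, ttl=600)
print(len(auth_code), store.get(auth_code))
```

Using a monotonic clock for expiry avoids surprises when the wall clock is adjusted; the injectable clock exists only to make the sketch testable.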
Verification Code Metadata (In-Memory)
Storage: Additional metadata alongside verification codes
Structure:
{
"email": "user@example.com",
"domain": "example.com",
"attempts": 0, # Increment on each failed attempt
"created_at": datetime
}
Purpose: Track attempts and associate email with domain for rate limiting.
TTL: Same as verification code (15 minutes)
Security Requirements
Input Validation
Domain Parameter:
def validate_domain(domain: str) -> Tuple[bool, Optional[str], Optional[str]]:
    """
    Validate domain parameter.
    Returns: (is_valid, normalized_domain, error_message)
    """
    # Normalize whitespace and case before inspecting the value
    domain = domain.strip().lower()
    # Remove protocol if present
    if domain.startswith('http://') or domain.startswith('https://'):
        domain = domain.split('://', 1)[1]
    # Remove path if present
    if '/' in domain:
        domain = domain.split('/')[0]
    # Must not be empty (checked first, otherwise the dot check masks it)
    if not domain:
        return False, None, "Domain cannot be empty"
    # Must contain at least one dot
    if '.' not in domain:
        return False, None, "Domain must contain at least one dot (e.g., example.com)"
    # Must not contain invalid characters
    if any(c in domain for c in [' ', '@', ':', '?', '#']):
        return False, None, "Domain contains invalid characters"
    # Length check
    if len(domain) > 253:
        return False, None, "Domain too long (max 253 characters)"
    return True, domain, None
Email Parameter:
import re

def validate_email(email: str) -> bool:
    """
    Validate email format (RFC 5322 simplified).
    Used by rel="me" discovery service.
    """
    email_regex = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    # fullmatch (not match) so a trailing newline cannot slip past '$'
    if not re.fullmatch(email_regex, email):
        return False
    if len(email) > 254:  # RFC 5321 maximum
        return False
    if email.count('@') != 1:
        return False
    local, domain = email.split('@')
    if '.' not in domain:
        return False
    return True
URL Parameters (me, client_id, redirect_uri):
def validate_url(url: str, param_name: str) -> Tuple[bool, Optional[str]]:
    """
    Validate URL parameter.
    Returns: (is_valid, error_message)
    """
    from urllib.parse import urlparse
    try:
        parsed = urlparse(url)
    except Exception:
        return False, f"{param_name} must be a valid URL"
    # Must have scheme and netloc
    if not parsed.scheme or not parsed.netloc:
        return False, f"{param_name} must be a complete URL (e.g., https://example.com)"
    # Must be http or https
    if parsed.scheme not in ['http', 'https']:
        return False, f"{param_name} must use http or https"
    # No fragments for 'me' parameter
    if param_name == "me" and parsed.fragment:
        return False, "me parameter must not contain fragment"
    # No credentials
    if parsed.username or parsed.password:
        return False, f"{param_name} must not contain credentials"
    return True, None
HTTPS Enforcement
Configuration:
# In production config
if not DEBUG:
    # Enforce HTTPS
    app.add_middleware(HTTPSRedirectMiddleware)
    # Reject HTTP redirect_uri (except localhost)
    if redirect_uri.startswith('http://'):
        parsed = urlparse(redirect_uri)
        if parsed.hostname not in ['localhost', '127.0.0.1']:
            return error_response("redirect_uri must use HTTPS in production")
HTML Fetching:
- HTTPS only (hardcoded https:// in URL)
- SSL certificate verification enforced (verify=True, no option to disable)
- Reject sites with invalid certificates
HTML Parsing Security
BeautifulSoup Configuration:
# Use html.parser (Python standard library, safe for untrusted HTML)
soup = BeautifulSoup(html_content, 'html.parser')
Why html.parser:
- Part of Python standard library (no external C dependencies)
- Designed for untrusted HTML
- No script execution
- No external resource loading
- Handles malformed HTML gracefully
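To illustrate the parsing step without a third-party dependency, the sketch below uses the standard library's `html.parser.HTMLParser` directly, which is the parser that BeautifulSoup's `'html.parser'` backend builds on. It is a swapped-in technique for illustration, not the BeautifulSoup-based service this design specifies; `RelMeMailtoParser` and `discover_email` are hypothetical names, and email format validation is deliberately left out.

```python
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import unquote


class RelMeMailtoParser(HTMLParser):
    """Collect mailto: targets from <link> and <a> tags whose rel contains "me"."""

    def __init__(self) -> None:
        super().__init__()
        self.emails: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("link", "a"):
            return
        attr = {name: (value or "") for name, value in attrs}
        # rel is a space-separated token list, e.g. rel="me nofollow".
        rel_tokens = attr.get("rel", "").lower().split()
        href = attr.get("href", "").strip()
        if "me" in rel_tokens and href.lower().startswith("mailto:"):
            # Drop the scheme and any ?subject=... query parameters.
            address = href[len("mailto:"):].split("?", 1)[0]
            if address:
                self.emails.append(unquote(address))


def discover_email(html: str) -> Optional[str]:
    """Return the first rel="me" mailto address in document order, or None."""
    parser = RelMeMailtoParser()
    parser.feed(html)
    parser.close()
    return parser.emails[0] if parser.emails else None


page = (
    '<html><head><link rel="me" href="https://github.com/user">'
    '<link rel="me" href="mailto:user@example.com?subject=hi"></head>'
    '<body><a rel="me nofollow" href="mailto:other@example.com">mail</a></body></html>'
)
print(discover_email(page))
```

The parser only reads tag attributes; it never executes scripts or fetches referenced resources, which is the property the list above relies on.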
Size Limits:
- Maximum response size: 5MB (configurable)
- Checked both in Content-Length header and actual content
Timeout:
- HTTP request timeout: 10 seconds (configurable)
- Prevents hanging on slow sites
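The two-stage size check (declared Content-Length first, then actual bytes) can be sketched independently of the HTTP client. `read_limited` is a hypothetical helper; it accepts any iterable of byte chunks so it can be exercised without network access.

```python
from typing import Iterable, Optional


def read_limited(chunks: Iterable[bytes], max_bytes: int,
                 declared_length: Optional[int] = None) -> Optional[bytes]:
    """Return the body, or None when it is (or claims to be) larger than max_bytes."""
    # First check: use the Content-Length header only to reject early.
    if declared_length is not None and declared_length > max_bytes:
        return None
    body = bytearray()
    for chunk in chunks:
        body.extend(chunk)
        # Second check: the header may be absent or wrong, so count real bytes.
        if len(body) > max_bytes:
            return None
    return bytes(body)


print(read_limited([b"<html>", b"</html>"], max_bytes=5 * 1024 * 1024))
```

With `requests`, the chunks could come from `response.iter_content(chunk_size=8192)` on a request made with `stream=True`, so an oversized body is abandoned before it is fully downloaded.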
Protection Against Open Redirects
redirect_uri Validation:
def validate_redirect_uri(redirect_uri: str, client_id: str) -> Tuple[bool, Optional[str]]:
    """
    Validate redirect_uri against client_id.
    Returns: (is_valid, message) where message is an error when invalid,
    or an optional warning when valid.
    """
    from urllib.parse import urlparse
    redirect_parsed = urlparse(redirect_uri)
    client_parsed = urlparse(client_id)
    # Must have valid hostname
    if not redirect_parsed.hostname:
        return False, "redirect_uri must have valid hostname"
    if not client_parsed.hostname:
        return False, "client_id must have valid hostname"
    # Must be HTTPS; plain HTTP is allowed for localhost only
    if redirect_parsed.scheme != 'https':
        is_local = redirect_parsed.hostname in ['localhost', '127.0.0.1']
        if redirect_parsed.scheme != 'http' or not is_local:
            return False, "redirect_uri must use HTTPS"
    redirect_domain = redirect_parsed.hostname.lower()
    client_domain = client_parsed.hostname.lower()
    # Exact match: OK
    if redirect_domain == client_domain:
        return True, None
    # Subdomain of client: OK
    if redirect_domain.endswith('.' + client_domain):
        return True, None
    # Different domain: WARNING (display to user, but allow)
    warning = (
        f"Warning: Redirect to different domain ({redirect_domain}) "
        f"than client ({client_domain}). Ensure you trust this application."
    )
    return True, warning
Display Warning to User:
- If redirect_uri domain differs from client_id domain, show warning on consent screen
- User must explicitly approve redirect to different domain
- Mitigates phishing via redirect URI manipulation (the final decision rests with the user)
CSRF Protection
State Parameter:
- Required in authorization request
- Stored with authorization code
- Passed through verification and consent steps
- Returned unchanged in redirect
- Client validates state matches original (client responsibility per OAuth 2.0)
Gondulf does NOT validate state - This is intentional per OAuth 2.0:
- State is opaque to authorization server
- Client generates state, client validates state
- Gondulf only passes it through unchanged
Code Replay Prevention
Authorization Code:
- Single-use enforcement (Phase 3 token endpoint marks as used)
- 10-minute expiration
- Bound to client_id, redirect_uri, me
- Stored in memory (Phase 1 CodeStorage)
Verification Code:
- Single-use: Deleted after successful verification
- 15-minute expiration
- Max 3 attempts before invalidation
- Constant-time comparison (prevent timing attacks)
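The verification-code rules above (single use, attempt cap, constant-time comparison) can be sketched as follows. `PendingCode` and this `verify_code` function are hypothetical and only mirror the `(success, domain, error)` tuple shape used elsewhere in this document; expiry is assumed to be handled by the storage layer's TTL and is not shown.

```python
import hmac
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

MAX_ATTEMPTS = 3


@dataclass
class PendingCode:
    code: str
    domain: str
    attempts: int = 0


def verify_code(pending: Dict[str, PendingCode], email: str,
                submitted: str) -> Tuple[bool, Optional[str], Optional[str]]:
    """Return (success, domain, error) for a submitted verification code."""
    entry = pending.get(email)
    if entry is None:
        return False, None, "No active verification code. Please request a new one."
    # compare_digest takes time independent of where the values differ.
    if hmac.compare_digest(entry.code.encode(), submitted.encode()):
        del pending[email]  # single use: a verified code cannot be replayed
        return True, entry.domain, None
    entry.attempts += 1
    remaining = MAX_ATTEMPTS - entry.attempts
    if remaining <= 0:
        del pending[email]  # invalidate the code after the third failure
        return False, None, "Too many attempts. Please request a new verification code."
    plural = "s" if remaining != 1 else ""
    return False, None, f"Invalid code. {remaining} attempt{plural} remaining."


pending = {"user@example.com": PendingCode(code="123456", domain="example.com")}
print(verify_code(pending, "user@example.com", "000000"))
```

Encoding both values to bytes before `hmac.compare_digest` avoids the `TypeError` that function raises for non-ASCII `str` input.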
Testing Requirements
Unit Tests
HTML Fetcher Service (9 tests):
- ✅ Successful HTTPS fetch returns content
- ✅ SSL verification failure returns None
- ✅ Timeout returns None
- ✅ HTTP error codes (404, 500) return None
- ✅ Redirects followed (up to max)
- ✅ Too many redirects returns None
- ✅ Content-Length exceeds limit returns None
- ✅ Actual content exceeds limit returns None
- ✅ Custom User-Agent sent
rel="me" Discovery Service (12 tests):
- ✅ Discovery from <link rel="me"> tag
- ✅ Discovery from <a rel="me"> tag
- ✅ Multiple rel="me" links: first mailto selected
- ✅ Malformed HTML handled
- ✅ Missing rel="me" returns None
- ✅ Invalid email in link returns None
- ✅ Empty href returns None
- ✅ Non-mailto links ignored
- ✅ mailto with query params strips params
- ✅ Email validation: valid formats
- ✅ Email validation: invalid formats
- ✅ Exception during parsing returns None
Domain Verification Service (15 tests):
- ✅ Full flow: DNS → rel="me" → email → code
- ✅ DNS failure blocks flow
- ✅ Site fetch failure blocks flow
- ✅ rel="me" failure blocks flow
- ✅ Email send failure cleans up code
- ✅ Code verification success stores domain
- ✅ Code verification failure increments the attempt counter (remaining attempts decrease)
- ✅ Too many attempts invalidates code
- ✅ Invalid code returns error
- ✅ Code expiration handled
- ✅ Rate limiting works
- ✅ Already verified domain check
- ✅ Email masking correct
- ✅ Constant-time comparison used
- ✅ Metadata tracking works
Estimated Unit Test Count: ~36 tests
Integration Tests
Verification Endpoints (10 tests):
- ✅ POST /api/verify/start success case
- ✅ POST /api/verify/start with invalid domain
- ✅ POST /api/verify/start with DNS failure
- ✅ POST /api/verify/start with rel="me" failure
- ✅ POST /api/verify/start with email send failure
- ✅ POST /api/verify/code success case
- ✅ POST /api/verify/code with invalid code
- ✅ POST /api/verify/code with expired code
- ✅ POST /api/verify/code with missing code
- ✅ POST /api/verify/code with too many attempts
Authorization Endpoint (15 tests):
- ✅ GET /authorize with valid params (already verified domain)
- ✅ GET /authorize with valid params (new domain)
- ✅ GET /authorize with invalid response_type
- ✅ GET /authorize with invalid me parameter
- ✅ GET /authorize with invalid client_id
- ✅ GET /authorize with invalid redirect_uri
- ✅ GET /authorize with missing state
- ✅ POST /authorize/verify-code with valid code
- ✅ POST /authorize/verify-code with invalid code
- ✅ POST /authorize/consent with action=approve
- ✅ POST /authorize/consent with action=deny
- ✅ Authorization code stored with metadata
- ✅ Authorization code expires after 10 min
- ✅ State parameter passed through
- ✅ redirect_uri domain mismatch shows warning
Estimated Integration Test Count: ~25 tests
End-to-End Tests
Complete Flows (5 tests):
- ✅ Full auth flow: /authorize → verify → consent → redirect with code
- ✅ Full auth flow with cached domain (skip verification)
- ✅ User denies consent → redirect with access_denied
- ✅ DNS verification failure → error page → retry → success
- ✅ Invalid code × 3 → error "too many attempts"
Estimated E2E Test Count: ~5 tests
Security Tests
Input Validation (8 tests):
- ✅ Malformed domain rejected
- ✅ Malformed email rejected (during validation)
- ✅ Malformed URL (me, client_id, redirect_uri) rejected
- ✅ URL with credentials rejected
- ✅ URL with fragment rejected (me parameter)
- ✅ Oversized HTML (>5MB) rejected
- ✅ Invalid email in rel="me" logged and skipped
- ✅ SQL injection attempts in domain parameter (should be parameterized)
Authentication Security (5 tests):
- ✅ Expired code rejected
- ✅ Used code rejected (Phase 3)
- ✅ Invalid code rejected
- ✅ Brute force prevented (max 3 attempts)
- ✅ Constant-time comparison used (verify via timing analysis - difficult to test)
TLS/HTTPS (4 tests):
- ✅ HTTP redirect_uri rejected in production
- ✅ Invalid SSL certificate rejected
- ✅ Site fetch over HTTPS only
- ✅ HTTP allowed for localhost only
Open Redirect (3 tests):
- ✅ redirect_uri domain mismatch shows warning
- ✅ Invalid redirect_uri shows error page (no redirect)
- ✅ redirect_uri without hostname rejected
Estimated Security Test Count: ~20 tests
Coverage Target
Phase 2 Overall: 80%+ coverage (same as Phase 1)
Critical Code (95%+ coverage):
- Domain verification service (orchestration logic)
- rel="me" discovery (email extraction)
- Authorization endpoint (parameter validation)
- Security functions (validation, constant-time comparison)
Total Estimated Test Count: ~86 tests
Error Handling
DNS Verification Failure
Error Message:
DNS Verification Failed
The DNS TXT record was not found for your domain.
Please add the following TXT record to your DNS:
Type: TXT
Name: _gondulf.example.com
Value: verified
DNS changes may take up to 24 hours to propagate.
[Retry]
HTTP Response: 200 OK (HTML error page)
Logging: WARNING level with domain
rel="me" Discovery Failure
Error Message:
Email Discovery Failed
No rel="me" email link was found on your homepage.
Please add the following to https://example.com:
<link rel="me" href="mailto:your-email@example.com">
This allows us to discover your email address automatically.
Learn more: https://indieweb.org/rel-me
[Retry]
HTTP Response: 200 OK (HTML error page)
Logging: WARNING level with domain
Site Unreachable
Error Message:
Site Fetch Failed
Could not fetch your site at https://example.com
Please check:
• Site is accessible via HTTPS
• SSL certificate is valid
• No firewall blocking requests
[Retry]
HTTP Response: 200 OK (HTML error page)
Logging: ERROR level with domain and error details
Email Send Failure
Error Message:
Email Delivery Failed
Failed to send verification code to u***@example.com
Please check:
• Email address is correct in your rel="me" link
• Email server is accepting mail
• Check spam/junk folder
[Retry]
HTTP Response: 200 OK (HTML error page)
Logging: ERROR level with masked email
Invalid Code
Error Message:
Invalid code. 2 attempts remaining.
HTTP Response: 200 OK (code entry form with error)
Logging: WARNING level with masked email
Too Many Attempts
Error Message:
Too Many Attempts
You have exceeded the maximum number of attempts.
Please request a new verification code.
[Request New Code]
HTTP Response: 200 OK (error page with retry link)
Logging: WARNING level with masked email
Rate Limit Exceeded
Error Message:
Rate Limit Exceeded
Too many verification requests for this domain.
Please wait 1 hour before requesting another code.
HTTP Response: 200 OK (error page)
Logging: WARNING level with domain
OAuth 2.0 Errors (Authorization Endpoint)
Error Redirect Format:
{redirect_uri}?error={error_code}&error_description={description}&state={state}
Error Codes:
- invalid_request: Missing or invalid parameter
- unauthorized_client: Client not authorized
- access_denied: User denied authorization
- unsupported_response_type: response_type not "code"
- server_error: Internal server error
Example:
https://client.example.com/callback?error=invalid_request&error_description=Missing+state+parameter&state=abc123
Logging: WARNING or ERROR level depending on error type
Error Logging Standards
Log Levels:
- DEBUG: Normal operations, detailed flow
- INFO: Successful operations (code sent, domain verified)
- WARNING: Expected errors (invalid code, DNS not found)
- ERROR: Unexpected errors (SMTP failure, site unreachable)
- CRITICAL: System failures (should not occur in Phase 2)
What to Log:
- ✅ Domain (public information)
- ✅ Email (partial mask: first 3 chars)
- ✅ Error details (for debugging)
- ✅ Request IDs (for correlation)
What NOT to Log:
- ❌ Full email addresses
- ❌ Verification codes
- ❌ Authorization codes
- ❌ User-Agent (GDPR)
- ❌ IP addresses (GDPR)
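A masking helper along these lines could serve both the UI and the logs. This is a sketch only: `mask_email` and its `visible` parameter are hypothetical, introduced because this document shows one visible character in the UI example (`u***@example.com`) and mentions three for log output.

```python
def mask_email(email: str, visible: int = 1) -> str:
    """Keep the first `visible` characters of the local part and hide the rest."""
    local, _, domain = email.partition("@")
    return f"{local[:visible]}***@{domain}"


print(mask_email("user@example.com"))
print(mask_email("user@example.com", visible=3))
```

The mask always emits three asterisks regardless of the local part's length, so the output does not reveal how long the hidden portion is.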
Dependencies
New Python Packages
Add to pyproject.toml:
[project]
dependencies = [
# ... existing dependencies from Phase 1
"beautifulsoup4>=4.12.0", # HTML parsing for rel="me" discovery
]
Why beautifulsoup4:
- Robust HTML parsing (handles malformed HTML)
- Safe for untrusted content (no script execution)
- Standard in Python ecosystem
- Pure Python (no C dependencies with html.parser)
Phase 1 Dependencies Used
- requests (HTTP fetching - already in pyproject.toml)
- dnspython (DNS queries - Phase 1)
- smtplib (Email sending - Python stdlib, used by Phase 1)
- sqlalchemy (Database - Phase 1)
- fastapi (Web framework - Phase 1)
- pydantic (Data validation - Phase 1)
Configuration Additions
Optional new environment variables:
# HTML Fetching (optional - has defaults)
GONDULF_HTML_FETCH_TIMEOUT=10 # seconds
GONDULF_HTML_MAX_SIZE=5242880 # bytes (5MB)
GONDULF_HTML_MAX_REDIRECTS=5
# Rate Limiting (optional - has defaults)
GONDULF_VERIFICATION_RATE_LIMIT=3 # codes per domain per hour
Add to .env.example:
# HTML Fetching Configuration (optional)
GONDULF_HTML_FETCH_TIMEOUT=10
GONDULF_HTML_MAX_SIZE=5242880
GONDULF_HTML_MAX_REDIRECTS=5
# Rate Limiting (optional)
GONDULF_VERIFICATION_RATE_LIMIT=3
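One way these optional variables might be read with their defaults is sketched below. The `Phase2Settings` class and `_int_env` helper are hypothetical names; the Phase 1 configuration service may already provide an equivalent mechanism, in which case the new values would be added there instead.

```python
import os
from dataclasses import dataclass


def _int_env(name: str, default: int) -> int:
    """Read a non-negative integer from the environment, or use the default."""
    raw = os.environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    value = int(raw)  # raises ValueError on non-numeric input
    if value < 0:
        raise ValueError(f"{name} must not be negative, got {value}")
    return value


@dataclass(frozen=True)
class Phase2Settings:
    """Hypothetical container for the Phase 2 settings listed above."""

    html_fetch_timeout: int       # seconds
    html_max_size: int            # bytes
    html_max_redirects: int
    verification_rate_limit: int  # codes per domain per hour

    @classmethod
    def from_env(cls) -> "Phase2Settings":
        return cls(
            html_fetch_timeout=_int_env("GONDULF_HTML_FETCH_TIMEOUT", 10),
            html_max_size=_int_env("GONDULF_HTML_MAX_SIZE", 5242880),
            html_max_redirects=_int_env("GONDULF_HTML_MAX_REDIRECTS", 5),
            verification_rate_limit=_int_env("GONDULF_VERIFICATION_RATE_LIMIT", 3),
        )
```

Failing fast on a malformed value at startup keeps misconfiguration from surfacing later as a confusing fetch or rate limit error.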
Implementation Notes
Suggested Implementation Order
- HTML Fetcher Service (0.5 days)
  - Straightforward HTTP fetching
  - Few dependencies
  - Easy to test in isolation
- rel="me" Discovery Service (0.5 days)
  - Pure parsing logic
  - No external dependencies (besides HTML input)
  - Easy to test with mock HTML
- Domain Verification Service (1 day)
  - Orchestrates all services
  - More complex logic
  - Needs all previous services complete
- Database Migration (0.5 days)
  - Simple UPDATE query
  - Apply before verification endpoints
- Verification Endpoints (0.5 days)
  - Thin API layer over service
  - FastAPI makes this straightforward
- Authorization Endpoint (3-4 days)
  - Most complex component
  - HTML templates needed
  - Multiple sub-endpoints
  - Needs comprehensive testing
- Integration Testing (1 day)
  - Test all components together
  - End-to-end flow verification
Total: ~7-8 days (matches estimate in phase-1-impact-assessment.md)
Risks and Mitigations
Risk 1: HTML Parsing Edge Cases
- Mitigation: BeautifulSoup handles malformed HTML gracefully
- Testing: Include malformed HTML in test cases
- Fallback: Clear error messages guide users to fix HTML
Risk 2: Email Delivery Failures
- Mitigation: Comprehensive SMTP error handling
- Testing: Mock SMTP failures in tests
- Fallback: Clear troubleshooting instructions in error messages
Risk 3: DNS TXT Record Setup Complexity
- Mitigation: Clear setup instructions with examples
- User Education: Document common DNS providers
- Support: Provide example DNS configurations
Risk 4: Authorization Endpoint Complexity
- Mitigation: Break into smaller sub-endpoints (verify-code, consent)
- Testing: Comprehensive integration tests
- Design: Keep state management simple (use forms, avoid complex sessions)
Risk 5: Rate Limiting Implementation
- Mitigation: Start with simple in-memory tracking (Phase 2)
- Future: Migrate to Redis for distributed rate limiting (Phase 3+)
- Placeholder: Implement rate limit check, return False for now
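A simple in-memory tracker for Risk 5 might look like the sketch below. The class name `InMemoryRateLimiter` and its `allow` method are hypothetical; the default of 3 events per hour mirrors GONDULF_VERIFICATION_RATE_LIMIT. It is not thread-safe and does not share state across processes, which is the gap Redis would later fill.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class InMemoryRateLimiter:
    """Sliding-window limiter keyed by domain (hypothetical sketch)."""

    def __init__(
        self,
        limit: int = 3,
        window_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._limit = limit
        self._window = window_seconds
        self._clock = clock  # injectable so tests can control time
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, domain: str) -> bool:
        """Record an attempt and return True if it is within the limit."""
        now = self._clock()
        events = self._events[domain]
        # Drop timestamps that have aged out of the window.
        while events and now - events[0] >= self._window:
            events.popleft()
        if len(events) >= self._limit:
            return False
        events.append(now)
        return True
```

Injecting the clock keeps unit tests deterministic: a test can advance a fake clock by an hour instead of sleeping.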
Performance Considerations
HTML Fetching:
- Timeout: 10 seconds (prevent hanging)
- Size limit: 5MB (prevent memory exhaustion)
- Concurrent requests: Not needed in Phase 2 (one request per auth flow)
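The size limit can be enforced while streaming rather than after the whole body is in memory. The sketch below is stdlib-only and takes any iterable of byte chunks; `read_limited` and `ResponseTooLarge` are hypothetical names. With `requests`, the chunks would typically come from `response.iter_content(chunk_size=8192)` on a request made with `stream=True` and `timeout=10`.

```python
from typing import Iterable


class ResponseTooLarge(Exception):
    """Raised when a response body exceeds the configured size limit."""


def read_limited(chunks: Iterable[bytes], max_size: int = 5 * 1024 * 1024) -> bytes:
    """Accumulate chunks, aborting as soon as max_size bytes is exceeded.

    Hypothetical helper; the 5 MB default matches GONDULF_HTML_MAX_SIZE.
    """
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
        if len(buf) > max_size:
            # Stop reading immediately so an oversized page cannot
            # exhaust memory, regardless of any Content-Length header.
            raise ResponseTooLarge(f"body exceeds {max_size} bytes")
    return bytes(buf)
```

Checking the accumulated length rather than trusting `Content-Length` matters because the header can be absent or wrong.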
Database Queries:
- Index on domains.domain ensures fast lookups
- Simple SELECT queries (no joins in Phase 2)
- Consider adding index on domains.verified if needed
In-Memory Storage:
- Verification codes: ~100 bytes each
- Authorization codes: ~200 bytes each
- Expected load: 10s of users, <100 concurrent verifications
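- Memory impact: Negligible (tens of KB at most; 100 verifications × ~100 bytes is already ~10KB before authorization codes are counted)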
rel="me" Parsing:
- BeautifulSoup is pure Python (not fastest, but sufficient)
- HTML size limited to 5MB (parse time <1 second)
- No performance issues expected for typical homepages
Future Extensibility
Redis Integration (Phase 3+):
- Replace in-memory CodeStorage with Redis
- Enables distributed deployment (multiple Gondulf instances)
- No changes to calling code needed (CodeStorage interface unchanged)
Client Metadata Caching (Phase 3):
- Cache client_id fetch results
- Reduces HTTP requests during authorization
- Store in database or Redis
PKCE Support (v1.1.0):
- Add code_challenge validation in authorization endpoint
- Add code_verifier validation in token endpoint (Phase 3)
- No breaking changes to v1.0.0 clients
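For reference, the S256 check that PKCE (RFC 7636) defines is small. The sketch below shows the transformation and comparison only; the function names are hypothetical, and how the challenge is stored alongside the authorization code is left to the v1.1.0 design.

```python
import base64
import hashlib
import hmac


def s256_challenge(code_verifier: str) -> str:
    """Compute BASE64URL(SHA256(ASCII(code_verifier))) without padding."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def verify_pkce(code_verifier: str, code_challenge: str) -> bool:
    """Check a verifier from the token request against the stored challenge."""
    try:
        expected = s256_challenge(code_verifier)
    except UnicodeEncodeError:
        # Verifiers must be ASCII; anything else cannot match.
        return False
    # Compare as bytes in constant time.
    return hmac.compare_digest(expected.encode("ascii"), code_challenge.encode("utf-8"))
```

The authorization endpoint would store the `code_challenge` with the issued code, and the token endpoint would call something like `verify_pkce` before exchanging it.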
Additional Authentication Methods (v1.2.0+):
- GitHub/GitLab OAuth providers
- WebAuthn support
- All additive (user chooses method)
Acceptance Criteria
Phase 2 is complete when ALL of the following criteria are met:
Functionality
- HTML fetcher service fetches user homepages successfully
- rel="me" discovery service discovers email from HTML
- Domain verification service orchestrates two-factor verification
- DNS TXT verification required and working
- Email verification via rel="me" required and working
- Verification endpoints (/api/verify/start, /api/verify/code) working
- Authorization endpoint (/authorize) validates all parameters
- Authorization endpoint checks domain verification status
- Authorization endpoint shows verification form for unverified domains
- Authorization endpoint shows consent screen after verification
- Authorization code generated and stored on approval
- User can deny consent (redirects with access_denied)
- State parameter passed through all steps
Testing
- All unit tests passing (estimated ~36 tests)
- All integration tests passing (estimated ~25 tests)
- All end-to-end tests passing (estimated ~5 tests)
- All security tests passing (estimated ~20 tests)
- Test coverage ≥80% overall
- Test coverage ≥95% for domain verification service
- Test coverage ≥95% for authorization endpoint
- No known bugs or failing tests
Security
- HTTPS enforcement working (production)
- SSL certificate validation enforced (HTML fetching)
- HTML parsing secure (BeautifulSoup with html.parser)
- Input validation comprehensive (domain, email, URLs)
- Open redirect protection working (redirect_uri validation)
- Constant-time code comparison used
- Rate limiting implemented (basic in-memory)
- Attempt limiting working (max 3 per code)
- No PII in logs (email masked, no full addresses)
- Authorization codes single-use (marked for Phase 3)
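The constant-time comparison and attempt-limiting criteria can be combined in one check, sketched below. `PendingCode` and `check_code` are hypothetical names, and invalidating the code after a successful match is an assumption; the criteria above only require invalidation after too many attempts.

```python
import hmac
from dataclasses import dataclass

MAX_ATTEMPTS = 3  # matches "max 3 per code" above


@dataclass
class PendingCode:
    """Hypothetical record for a verification code awaiting confirmation."""

    code: str
    attempts: int = 0
    invalidated: bool = False


def check_code(pending: PendingCode, submitted: str) -> bool:
    """Compare a submitted code in constant time and enforce the attempt cap."""
    if pending.invalidated:
        return False
    pending.attempts += 1
    # hmac.compare_digest avoids leaking how many leading characters matched.
    ok = hmac.compare_digest(pending.code.encode("utf-8"), submitted.encode("utf-8"))
    if ok or pending.attempts >= MAX_ATTEMPTS:
        # Assumption: burn the code on success as well as on exhaustion.
        pending.invalidated = True
    return ok
```

The number of attempts remaining for the "Invalid code" error message would be `MAX_ATTEMPTS - pending.attempts`.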
Error Handling
- DNS verification failure shows clear instructions
- rel="me" discovery failure shows HTML example
- Site unreachable shows troubleshooting steps
- Email send failure shows error with retry
- Invalid code shows attempts remaining
- Too many attempts invalidates code
- Rate limit exceeded shows wait time
- OAuth 2.0 errors formatted correctly
- All errors logged appropriately
Documentation
- All new services have docstrings
- All public methods have type hints
- API endpoints documented (this design doc)
- Error messages user-friendly
- Setup instructions clear (DNS + rel="me")
- Database migration documented
Dependencies
- beautifulsoup4 added to pyproject.toml
- No new system dependencies (all Python)
- Configuration updated (.env.example)
Database
- Migration 002 applied successfully
- domains.verification_method updated to 'two_factor'
- No schema changes needed (existing schema works)
Integration
- All Phase 1 services integrated successfully
- DNS service used for TXT verification
- Email service used for code sending
- Database service used for storing verified domains
- In-memory storage used for codes
- Logging used throughout
Performance
- HTML fetching completes within 10 seconds
- rel="me" parsing completes within 1 second
- Full verification flow completes within 30 seconds
- Authorization endpoint responds within 2 seconds
- No memory leaks (codes expire and clean up)
Timeline Estimate
Phase 2 Implementation: 7-9 days
Breakdown:
- HTML Fetcher Service: 0.5 days
- rel="me" Discovery Service: 0.5 days
- Domain Verification Service: 1 day
- Database Migration: 0.5 days
- Verification Endpoints: 0.5 days
- Authorization Endpoint: 3-4 days
- Integration Testing: 1 day
- Documentation: 0.5 days (included in parallel)
Dependencies: Phase 1 complete and approved
Risk Buffer: +2 days (for unforeseen issues with HTML parsing or authorization flow complexity)
Sign-off
Design Status: Complete and ready for implementation
Architect: Claude (Architect Agent)
Date: 2025-11-20
Next Steps:
- Developer reviews design document
- Developer asks clarification questions if needed
- Architect updates design based on feedback
- Developer begins implementation following design
- Developer creates implementation report upon completion
- Architect reviews implementation report
Related Documents:
- /docs/architecture/overview.md - System architecture
- /docs/architecture/indieauth-protocol.md - IndieAuth protocol implementation
- /docs/architecture/security.md - Security architecture
- /docs/architecture/phase-1-impact-assessment.md - Phase 2 requirements
- /docs/decisions/ADR-005-email-based-authentication-v1-0-0.md - Two-factor verification decision
- /docs/decisions/ADR-008-rel-me-email-discovery.md - rel="me" pattern decision
- /docs/reports/2025-11-20-phase-1-foundation.md - Phase 1 implementation
- /docs/roadmap/v1.0.0.md - Version plan
DESIGN READY: Phase 2 Domain Verification - Please review /docs/designs/phase-2-domain-verification.md