Gondulf/docs/architecture/indieauth-protocol.md
Phil Skentelbery 6f06aebf40 docs: add Phase 2 domain verification design and clarifications
Add comprehensive Phase 2 documentation:
- Complete design document for two-factor domain verification
- Implementation guide with code examples
- ADR for implementation decisions (ADR-0004)
- ADR for rel="me" email discovery (ADR-008)
- Phase 1 impact assessment
- All 23 clarification questions answered
- Updated architecture docs (indieauth-protocol, security)
- Updated ADR-005 with rel="me" approach
- Updated backlog with technical debt items

Design ready for Phase 2 implementation.

Generated with Claude Code https://claude.com/claude-code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 13:05:09 -07:00


# IndieAuth Protocol Implementation
## Specification Compliance
This document describes Gondulf's implementation of the W3C IndieAuth specification.
**Primary Reference**: https://www.w3.org/TR/indieauth/
**Reference Implementation**: https://github.com/aaronpk/indielogin.com
**Compliance Target**: Any compliant IndieAuth client MUST be able to authenticate successfully against Gondulf.
## Protocol Overview
IndieAuth is built on OAuth 2.0, extending it to enable decentralized authentication where users are identified by URLs (typically their own domain) rather than accounts on centralized services.
### Core Principle
Users prove ownership of a domain, and that domain becomes their identity. No usernames, no passwords stored by the server.
### IndieAuth vs OAuth 2.0
**Similarities**:
- Authorization code flow
- Token endpoint for code exchange
- State parameter for CSRF protection
- Redirect-based flow
**Differences**:
- User identity is a URL (`me` parameter), not an opaque user ID
- No client secrets (all clients are "public clients")
- Client IDs are URLs that must be fetchable
- Domain ownership verification instead of password authentication
## v1.0.0 Scope
Gondulf v1.0.0 implements **authentication only** (not authorization):
- Users can prove they own a domain
- Tokens are issued but carry no permissions (scope)
- Client applications can verify user identity
- NO resource server capabilities
- NO scope-based authorization
**Future versions** will add:
- Authorization with scopes
- Token refresh
- Token revocation
- Resource server capabilities
## Endpoints
### Discovery Endpoint (Optional)
**URL**: `/.well-known/oauth-authorization-server`
**Purpose**: Advertise server capabilities and endpoints per RFC 8414.
**Response** (JSON):
```json
{
  "issuer": "https://auth.example.com",
  "authorization_endpoint": "https://auth.example.com/authorize",
  "token_endpoint": "https://auth.example.com/token",
  "response_types_supported": ["code"],
  "grant_types_supported": ["authorization_code"],
  "token_endpoint_auth_methods_supported": ["none"]
}
```
**Implementation Notes**:
- Optional for v1.0.0 but recommended
- FastAPI endpoint: `GET /.well-known/oauth-authorization-server`
- Static response (no database access)
- `code_challenge_methods_supported` is omitted because PKCE is not supported in v1.0.0 (see ADR-003)
- Response header: `Cache-Control: public, max-age=86400`
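
Because the response is static, it can be derived from a single issuer URL. The following is a minimal sketch, not Gondulf's actual code: `build_metadata`, its `pkce_methods` parameter, and `METADATA_HEADERS` are hypothetical names chosen for illustration. The PKCE field is emitted only when methods are passed in, since PKCE is deferred per ADR-003.

```python
from typing import Optional


def build_metadata(issuer: str, pkce_methods: Optional[list] = None) -> dict:
    """Build an RFC 8414 metadata document from the issuer URL (illustrative helper)."""
    base = issuer.rstrip("/")
    metadata = {
        "issuer": base,
        "authorization_endpoint": f"{base}/authorize",
        "token_endpoint": f"{base}/token",
        "response_types_supported": ["code"],
        "grant_types_supported": ["authorization_code"],
        "token_endpoint_auth_methods_supported": ["none"],
    }
    # Advertise PKCE only once it is actually supported
    if pkce_methods:
        metadata["code_challenge_methods_supported"] = list(pkce_methods)
    return metadata


# Header to send alongside the JSON body
METADATA_HEADERS = {"Cache-Control": "public, max-age=86400"}
```

The returned dict can be handed to whatever JSON response helper the web framework provides.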
### Authorization Endpoint
**URL**: `/authorize`
**Method**: GET
**Purpose**: Initiate authentication flow
#### Required Parameters
| Parameter | Description | Validation |
|-----------|-------------|------------|
| `me` | User's domain/URL | Must be valid URL, no fragments/credentials/ports |
| `client_id` | Client application URL | Must be valid URL, must be fetchable |
| `redirect_uri` | Where to send user after auth | Must be valid URL, must match client_id domain OR be registered |
| `state` | CSRF protection token | Required, opaque string, returned unchanged |
| `response_type` | Must be `code` | Exactly `code` for auth code flow |
#### Optional Parameters (v1.0.0)
| Parameter | Description | v1.0.0 Behavior |
|-----------|-------------|-----------------|
| `scope` | Requested permissions | Ignored (authentication only) |
| `code_challenge` | PKCE challenge | NOT supported in v1.0.0 |
| `code_challenge_method` | PKCE method | NOT supported in v1.0.0 |
**PKCE Decision**: Deferred to post-v1.0.0 to maintain MVP simplicity. See ADR-003.
#### Request Validation Sequence
1. **Validate `response_type`**
   - MUST be exactly `code`
   - Error: `unsupported_response_type`
2. **Validate `me` parameter**
   - Must be a valid URL
   - Must NOT contain fragment (#)
   - Must NOT contain credentials (user:pass@)
   - Must NOT contain port (except :443 for HTTPS)
   - Must NOT be an IP address
   - Normalize: lowercase domain, remove trailing slash
   - Error: `invalid_request` with description
3. **Validate `client_id`**
   - Must be a valid URL
   - Must contain a domain component (not localhost in production)
   - Fetch client_id URL to retrieve app info (see Client Validation)
   - Error: `invalid_client` with description
4. **Validate `redirect_uri`**
   - Must be a valid URL
   - Must use HTTPS in production (HTTP allowed for localhost)
   - If domain differs from client_id domain:
     - Must match client_id subdomain pattern, OR
     - Must be registered in client metadata (future), OR
     - Display warning to user
   - Error: `invalid_request` with description
5. **Validate `state`**
   - Must be present
   - Must be non-empty string
   - Store with the authorization request so it can be returned unchanged to the client (the server does not verify it)
   - Error: `invalid_request` with description
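
The ordering of the sequence above matters because the first failing check decides which error code is returned. A minimal sketch of that ordering follows, covering only the presence checks. The function name `first_request_error` is hypothetical, and the URL-level rules for `me` and `redirect_uri` are left to the validators under Security Considerations.

```python
from typing import Mapping, Optional, Tuple


def first_request_error(params: Mapping[str, str]) -> Optional[Tuple[str, str]]:
    """Return (error_code, description) for the first failed check, else None."""
    # Step 1: response_type must be exactly "code"
    if params.get("response_type") != "code":
        return "unsupported_response_type", "response_type must be 'code'"

    # Steps 2-5: each parameter must be present and non-empty, checked in order
    checks = (
        ("me", "invalid_request"),
        ("client_id", "invalid_client"),
        ("redirect_uri", "invalid_request"),
        ("state", "invalid_request"),
    )
    for name, error in checks:
        if not params.get(name):
            return error, f"{name} is required"
    return None
```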
#### Client Validation
When `client_id` is provided, fetch the URL to retrieve application information:
**HTTP Request**:
```
GET https://client.example.com/
Accept: text/html
```
**Extract Application Info**:
- Look for `h-app` microformat in HTML
- Extract: app name, icon, URL
- Extract registered redirect URIs from `<link rel="redirect_uri">` tags
- Cache result for 24 hours
**Fallback**:
- If no h-app found, use domain name as app name
- If no icon, use generic icon
- If no redirect URIs registered, rely on domain matching
**Security**:
- Follow redirects (max 5)
- Timeout after 5 seconds
- Validate SSL certificates
- Reject non-200 responses
- Log client_id fetch failures
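
The extraction step can be sketched with the standard library's `html.parser`. This is an illustrative simplification rather than a full microformats parser: it takes the first `p-name` text found inside an `h-app` element, collects `<link rel="redirect_uri">` targets, ignores the icon, and never tracks where the `h-app` element ends. `ClientInfoParser` and `extract_client_info` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urlparse


class ClientInfoParser(HTMLParser):
    """Collect the h-app name and registered redirect URIs from client HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.redirect_uris = []
        self.app_name: Optional[str] = None
        self._in_app = False   # an element with class "h-app" has been seen
        self._in_name = False  # currently inside a "p-name" element

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        rels = (attr.get("rel") or "").split()
        if tag == "link" and "redirect_uri" in rels and attr.get("href"):
            self.redirect_uris.append(attr["href"])
        if "h-app" in classes:
            self._in_app = True
        if self._in_app and "p-name" in classes and self.app_name is None:
            self._in_name = True

    def handle_data(self, data):
        if self._in_name and data.strip():
            self.app_name = data.strip()
            self._in_name = False


def extract_client_info(html: str, client_id: str) -> dict:
    """Return app name (falling back to the client_id domain) and redirect URIs."""
    parser = ClientInfoParser()
    parser.feed(html)
    fallback = urlparse(client_id).hostname
    return {"name": parser.app_name or fallback, "redirect_uris": parser.redirect_uris}
```

A production implementation would more likely use a dedicated microformats parser. The sketch only shows which pieces of the page feed the consent screen.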
#### Authentication Flow (v1.0.0: Two-Factor Domain Verification)
1. **DNS TXT Record Verification (Required)**
   - Check if `me` domain has TXT record: `_gondulf.{domain}` = `verified`
   - Query multiple DNS resolvers (Google 8.8.8.8, Cloudflare 1.1.1.1)
   - Require consensus from at least 2 resolvers
   - If not found: Display error with instructions to add TXT record
   - If found: Proceed to email discovery
   - Proves: User controls DNS for the domain
2. **Email Discovery via rel="me" (Required)**
   - Fetch user's domain homepage (e.g., https://example.com)
   - Parse HTML for `<link rel="me" href="mailto:user@example.com">` or `<a rel="me" href="mailto:user@example.com">`
   - Extract email address from first matching mailto: link
   - If not found: Display error with instructions to add rel="me" link
   - If found: Proceed to email verification
   - Proves: User has published email relationship on their site
   - Reference: https://indieweb.org/rel-me
3. **Email Verification Code (Required)**
   - Generate 6-digit verification code (cryptographically random)
   - Store code in memory with 15-minute TTL
   - Send code to discovered email address via SMTP
   - Display code entry form showing discovered email (partially masked)
   - User enters 6-digit code
   - Validate code matches and hasn't expired (max 3 attempts)
   - Proves: User controls the email account
   - Mark domain as verified (store in database)
4. **User Consent**
   - Display authorization prompt:
     - "Sign in to [App Name] as [me]"
     - Show client_id full URL
     - Show redirect_uri if different domain
     - Show scope (future)
   - User approves or denies
5. **Authorization Code Generation**
   - Generate cryptographically secure code (32 bytes, base64url)
   - Store code in memory with 10-minute TTL
   - Store associated data:
     - `me` (user's domain)
     - `client_id`
     - `redirect_uri`
     - `state`
     - Timestamp
   - Code is single-use only
6. **Redirect to Client**
   ```
   HTTP/1.1 302 Found
   Location: {redirect_uri}?code={code}&state={state}
   ```
**Security Model**: Two-factor verification requires BOTH DNS control AND email control. An attacker would need to compromise both to authenticate fraudulently.
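
The email discovery step of this flow could look roughly like the sketch below. It uses only the standard library and assumes the homepage HTML has already been fetched. `RelMeEmailParser` and `discover_email` are hypothetical names, and the `"@" in address` check is a deliberately loose placeholder for real address validation.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import unquote


class RelMeEmailParser(HTMLParser):
    """Find the first rel="me" link whose href is a mailto: URL."""

    def __init__(self) -> None:
        super().__init__()
        self.email: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if self.email is not None or tag not in ("a", "link"):
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        href = (attr.get("href") or "").strip()
        if "me" in rels and href.lower().startswith("mailto:"):
            # Drop the scheme and any ?subject=... query part
            address = unquote(href[len("mailto:"):].split("?", 1)[0])
            if "@" in address:
                self.email = address


def discover_email(html: str) -> Optional[str]:
    """Return the first rel="me" mailto address in the page, or None."""
    parser = RelMeEmailParser()
    parser.feed(html)
    return parser.email
```

Returning `None` maps to the "display error with instructions" branch of the discovery step.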
#### Error Responses
Return error via redirect when possible:
```
HTTP/1.1 302 Found
Location: {redirect_uri}?error={error}&error_description={description}&state={state}
```
**Error Codes** (OAuth 2.0 standard):
- `invalid_request` - Malformed request
- `unauthorized_client` - Client not authorized
- `access_denied` - User denied authorization
- `unsupported_response_type` - response_type not `code`
- `invalid_scope` - Invalid scope requested (future)
- `server_error` - Internal server error
- `temporarily_unavailable` - Server temporarily unavailable
When redirect not possible (invalid redirect_uri), display error page.
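
Building the error redirect safely means URL-encoding the values and preserving any query string already present on `redirect_uri`. A minimal sketch using `urllib.parse` follows. The helper name `build_error_redirect` is hypothetical, and it should only be called after `redirect_uri` itself has passed validation.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def build_error_redirect(redirect_uri: str, error: str, description: str, state: str) -> str:
    """Append OAuth error parameters to redirect_uri, keeping its existing query."""
    parts = urlparse(redirect_uri)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [
        ("error", error),
        ("error_description", description),
        ("state", state),
    ]
    return urlunparse(parts._replace(query=urlencode(query)))
```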
### Token Endpoint
**URL**: `/token`
**Method**: POST
**Content-Type**: `application/x-www-form-urlencoded`
**Purpose**: Exchange authorization code for access token
#### Required Parameters
| Parameter | Description | Validation |
|-----------|-------------|------------|
| `grant_type` | Must be `authorization_code` | Exactly `authorization_code` |
| `code` | Authorization code from /authorize | Must be valid, unexpired, unused |
| `client_id` | Client application URL | Must match code's client_id |
| `redirect_uri` | Original redirect URI | Must match code's redirect_uri |
| `me` | User's domain | Must match code's me |
#### Request Validation Sequence
1. **Validate `grant_type`**
   - MUST be `authorization_code`
   - Error: `unsupported_grant_type`
2. **Validate `code`**
   - Must exist in storage
   - Must not have expired (10-minute TTL)
   - Must not have been used already
   - Mark as used immediately
   - Error: `invalid_grant`
3. **Validate `client_id`**
   - Must match the client_id associated with code
   - Error: `invalid_client`
4. **Validate `redirect_uri`**
   - Must exactly match the redirect_uri from authorization request
   - Error: `invalid_grant`
5. **Validate `me`**
   - Must exactly match the me from authorization request
   - Error: `invalid_request`
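
The sequence above can be sketched as a single function that returns the first applicable error code. The name `validate_token_request` and the shape of the `codes` mapping (code string to a record with `me`, `client_id`, `redirect_uri`, `used`, `expires_at`) are assumptions for illustration. Note that the code is marked used before the remaining checks, so a mismatched request still burns it.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def validate_token_request(params: dict, codes: dict, now: Optional[datetime] = None) -> Optional[str]:
    """Return an OAuth error code, or None when the code may be exchanged."""
    now = now or datetime.now(timezone.utc)

    if params.get("grant_type") != "authorization_code":
        return "unsupported_grant_type"

    record = codes.get(params.get("code", ""))
    if record is None or record["used"] or record["expires_at"] <= now:
        return "invalid_grant"
    record["used"] = True  # single-use: mark as used immediately

    if params.get("client_id") != record["client_id"]:
        return "invalid_client"
    if params.get("redirect_uri") != record["redirect_uri"]:
        return "invalid_grant"
    if params.get("me") != record["me"]:
        return "invalid_request"
    return None
```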
#### Token Generation
**v1.0.0 Implementation: Opaque Tokens**
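```python
import hashlib
import secrets
from datetime import datetime, timedelta, timezone


def build_token_record(me: str, client_id: str) -> tuple[str, dict]:
    """Return (token, record); only the record, which holds the hash, is stored."""
    # Generate token
    token = secrets.token_urlsafe(32)  # 256 bits

    # Take one timestamp so issued_at and expires_at are exactly 1 hour apart
    # (datetime.utcnow() is deprecated since Python 3.12)
    issued_at = datetime.now(timezone.utc)

    # Store in database
    token_record = {
        "token_hash": hashlib.sha256(token.encode()).hexdigest(),
        "me": me,
        "client_id": client_id,
        "scope": "",  # Empty for authentication-only
        "issued_at": issued_at,
        "expires_at": issued_at + timedelta(hours=1),
    }
    return token, token_record
```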
**Why Opaque Tokens in v1.0.0**:
- Simpler than JWT (no signing, no key rotation)
- Easier to revoke (database lookup)
- Sufficient for authentication-only use case
- Can migrate to JWT in future versions
**Token Properties**:
- Length: 43 characters (base64url of 32 bytes)
- Entropy: 256 bits (cryptographically secure)
- Storage: SHA-256 hash in database
- Expiration: 1 hour (configurable)
- Revocable: Delete from database
#### Success Response
**HTTP 200 OK**:
```json
{
  "access_token": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
  "token_type": "Bearer",
  "me": "https://example.com",
  "scope": ""
}
```
**Response Fields**:
- `access_token`: The opaque token (43 characters)
- `token_type`: Always `Bearer`
- `me`: User's canonical domain URL (normalized)
- `scope`: Empty string for authentication-only (future: space-separated scopes)
**Headers**:
```
Content-Type: application/json
Cache-Control: no-store
Pragma: no-cache
```
#### Error Responses
**HTTP 400 Bad Request**:
```json
{
  "error": "invalid_grant",
  "error_description": "Authorization code has expired"
}
```
**Error Codes** (OAuth 2.0 standard):
- `invalid_request` - Missing or invalid parameters
- `invalid_client` - Client authentication failed
- `invalid_grant` - Invalid or expired authorization code
- `unauthorized_client` - Client not authorized for grant type
- `unsupported_grant_type` - Grant type not `authorization_code`
### Token Verification Endpoint (Future)
**URL**: `/token/verify`
**Method**: GET
**Purpose**: Verify token validity (for resource servers)
**NOT implemented in v1.0.0** (authentication only, no resource servers).
Future implementation:
```
GET /token/verify
Authorization: Bearer {token}
Response 200 OK:
{
  "me": "https://example.com",
  "client_id": "https://client.example.com",
  "scope": ""
}
```
### Token Revocation Endpoint (Future)
**URL**: `/token/revoke`
**Method**: POST
**Purpose**: Revoke access token
**NOT implemented in v1.0.0**.
Future implementation per RFC 7009.
## Data Models
### Authorization Code (In-Memory)
```python
{
    "code": "abc123...",  # 43-char base64url
    "me": "https://example.com",
    "client_id": "https://client.example.com",
    "redirect_uri": "https://client.example.com/callback",
    "state": "client-provided-state",
    "created_at": datetime,
    "expires_at": datetime,  # created_at + 10 minutes
    "used": False
}
```
**Storage**: Python dict with TTL management
**Expiration**: 10 minutes (per spec: "shortly after")
**Single-use**: Marked as used after redemption
**Cleanup**: Automatic expiration via TTL
### Email Verification Code (In-Memory)
```python
{
    "email": "admin@example.com",  # Discovered from rel="me", not user-provided
    "code": "123456",  # 6-digit string
    "domain": "example.com",
    "created_at": datetime,
    "expires_at": datetime,  # created_at + 15 minutes
    "attempts": 0  # Rate limiting (max 3 attempts)
}
```
**Storage**: Python dict with TTL management
**Email Source**: Discovered from site's rel="me" link (not user input)
**Expiration**: 15 minutes
**Rate Limiting**: Max 3 attempts per email, max 3 codes per domain per hour
**Cleanup**: Automatic expiration via TTL
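
Code generation, attempt counting, and the partial masking shown on the entry form can be sketched as below. All three helper names are hypothetical, and the masking format (first character plus `***`) is only one possible choice since the document does not specify one.

```python
import secrets
from datetime import datetime, timedelta, timezone

MAX_ATTEMPTS = 3


def new_verification_code() -> str:
    """Return a zero-padded 6-digit code drawn from a CSPRNG."""
    return f"{secrets.randbelow(1_000_000):06d}"


def check_code(record: dict, submitted: str, now: datetime) -> bool:
    """Count the attempt, then compare in constant time."""
    if record["attempts"] >= MAX_ATTEMPTS or record["expires_at"] <= now:
        return False
    record["attempts"] += 1
    # Compare bytes so non-ASCII input cannot raise TypeError
    return secrets.compare_digest(record["code"].encode(), submitted.encode())


def mask_email(email: str) -> str:
    """Show only the first character of the local part."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}"
```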
### Access Token (SQLite)
```sql
CREATE TABLE tokens (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    token_hash TEXT NOT NULL UNIQUE,  -- SHA-256 hash; UNIQUE already creates an index
    me TEXT NOT NULL,
    client_id TEXT NOT NULL,
    scope TEXT NOT NULL,  -- Empty string for v1.0.0
    issued_at TIMESTAMP NOT NULL,
    expires_at TIMESTAMP NOT NULL,
    revoked BOOLEAN DEFAULT 0
);
-- SQLite does not accept inline INDEX clauses; indexes are created separately
CREATE INDEX idx_me ON tokens (me);
CREATE INDEX idx_expires_at ON tokens (expires_at);
```
**Lookup**: By token_hash (constant-time comparison)
**Expiration**: 1 hour default (configurable)
**Revocation**: Set `revoked = 1` (future feature)
**Cleanup**: Periodic deletion of expired tokens
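
To show how the table is used end to end, here is a sketch built on the standard `sqlite3` module. The helper names (`store_token`, `lookup_token`, `delete_expired`, `_ts`) are hypothetical, and storing timestamps as ISO 8601 text is an assumption made so that string comparison in SQL matches chronological order.

```python
import hashlib
import secrets
import sqlite3
from datetime import datetime, timedelta, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS tokens (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    token_hash TEXT NOT NULL UNIQUE,
    me TEXT NOT NULL,
    client_id TEXT NOT NULL,
    scope TEXT NOT NULL,
    issued_at TIMESTAMP NOT NULL,
    expires_at TIMESTAMP NOT NULL,
    revoked BOOLEAN DEFAULT 0
);
"""


def _ts(moment: datetime) -> str:
    """Serialize as fixed-width UTC ISO 8601 text."""
    return moment.astimezone(timezone.utc).isoformat(timespec="seconds")


def _hash(token: str) -> str:
    return hashlib.sha256(token.encode()).hexdigest()


def store_token(db: sqlite3.Connection, me: str, client_id: str, now: datetime) -> str:
    """Insert a hashed token valid for 1 hour and return the plaintext token."""
    token = secrets.token_urlsafe(32)
    db.execute(
        "INSERT INTO tokens (token_hash, me, client_id, scope, issued_at, expires_at) "
        "VALUES (?, ?, ?, '', ?, ?)",
        (_hash(token), me, client_id, _ts(now), _ts(now + timedelta(hours=1))),
    )
    return token


def lookup_token(db: sqlite3.Connection, token: str, now: datetime):
    """Return (me, client_id, scope) for a live token, else None."""
    return db.execute(
        "SELECT me, client_id, scope FROM tokens "
        "WHERE token_hash = ? AND revoked = 0 AND expires_at > ?",
        (_hash(token), _ts(now)),
    ).fetchone()


def delete_expired(db: sqlite3.Connection, now: datetime) -> int:
    """Periodic cleanup: remove expired rows and return the count."""
    return db.execute("DELETE FROM tokens WHERE expires_at <= ?", (_ts(now),)).rowcount
```

Only the hash ever reaches the database, so a leaked table does not expose usable tokens.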
### Verified Domain (SQLite)
```sql
CREATE TABLE domains (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    domain TEXT NOT NULL UNIQUE,  -- UNIQUE already creates an index on domain
    verification_method TEXT NOT NULL,  -- 'two_factor' (DNS + Email)
    verified_at TIMESTAMP NOT NULL,
    last_dns_check TIMESTAMP,
    dns_txt_valid BOOLEAN DEFAULT 0,
    last_email_check TIMESTAMP
);
-- SQLite does not accept inline INDEX clauses; no extra index is needed here
```
**Purpose**: Cache domain ownership verification
**Verification Method**: Always 'two_factor' in v1.0.0 (DNS TXT + Email via rel="me")
**DNS TXT**: Re-verified periodically (daily check)
**Email**: NOT stored (only verification timestamp recorded)
**Re-verification**: DNS checked periodically, email re-verified on each login
**Cleanup**: Optional (admin decision)
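
The periodic DNS re-check can reuse the same consensus rule as the login-time TXT check: at least two resolvers must return `verified` for `_gondulf.{domain}`. The sketch below keeps the network out of the picture by taking lookup callables, so the consensus logic is testable on its own. `txt_record_verified` and the `Lookup` type are hypothetical, and each callable is assumed to wrap a query against one resolver (for example via a DNS library of the implementer's choice).

```python
from typing import Callable, Iterable, List

# A lookup takes a record name and returns the TXT values one resolver sees
Lookup = Callable[[str], Iterable[str]]


def txt_record_verified(domain: str, lookups: List[Lookup],
                        required: int = 2, expected: str = "verified") -> bool:
    """Return True when at least `required` resolvers report the expected value."""
    name = f"_gondulf.{domain}"
    confirmations = 0
    for lookup in lookups:
        try:
            values = [value.strip().strip('"') for value in lookup(name)]
        except Exception:
            continue  # a failing resolver simply does not count as a confirmation
        if expected in values:
            confirmations += 1
    return confirmations >= required
```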
## Security Considerations
### URL Validation
**Critical**: Prevent open redirect and phishing attacks.
**`me` Validation**:
```python
import ipaddress
from urllib.parse import urlparse


def validate_me(me: str) -> tuple[bool, str, str]:
    """
    Validate me parameter.
    Returns: (valid, normalized_me, error_message)
    """
    parsed = urlparse(me)

    # Must have scheme and host
    if not parsed.scheme or not parsed.netloc or not parsed.hostname:
        return False, "", "me must be a complete URL"

    # Must be HTTP or HTTPS
    if parsed.scheme not in ['http', 'https']:
        return False, "", "me must use http or https"

    # No fragments
    if parsed.fragment:
        return False, "", "me must not contain fragment"

    # No credentials
    if parsed.username or parsed.password:
        return False, "", "me must not contain credentials"

    # No ports (except default); .port raises ValueError when malformed
    try:
        port = parsed.port
    except ValueError:
        return False, "", "me contains an invalid port"
    if port is not None and not (port == 443 and parsed.scheme == 'https'):
        return False, "", "me must not contain non-standard port"

    # No IP addresses; check hostname because netloc may carry a port or brackets
    try:
        ipaddress.ip_address(parsed.hostname)
    except ValueError:
        pass  # Good, not an IP
    else:
        return False, "", "me must be a domain, not IP address"

    # Normalize: lowercase host without port, no trailing slash
    domain = parsed.hostname.lower()
    path = parsed.path.rstrip('/')
    normalized = f"{parsed.scheme}://{domain}{path}"
    return True, normalized, ""
```
**`redirect_uri` Validation**:
```python
from urllib.parse import urlparse

DEBUG = False  # Assumed application setting; True only in development


def validate_redirect_uri(redirect_uri: str, client_id: str) -> tuple[bool, str]:
    """
    Validate redirect_uri against client_id.
    Returns: (valid, message). The message is an error when valid is False
    and may carry a warning when valid is True.
    """
    parsed_redirect = urlparse(redirect_uri)
    parsed_client = urlparse(client_id)

    # Must be valid URL
    if not parsed_redirect.scheme or not parsed_redirect.hostname:
        return False, "redirect_uri must be a complete URL"

    # Compare hostnames rather than netloc so a port does not affect matching
    redirect_domain = parsed_redirect.hostname
    client_domain = parsed_client.hostname or ""

    # Must be HTTPS in production (allow HTTP for localhost)
    if not DEBUG:
        if parsed_redirect.scheme != 'https':
            if redirect_domain != 'localhost':
                return False, "redirect_uri must use HTTPS"

    # Same domain: OK
    if redirect_domain == client_domain:
        return True, ""

    # Subdomain of client domain: OK
    if client_domain and redirect_domain.endswith('.' + client_domain):
        return True, ""

    # Different domain: Check if registered (future)
    # For v1.0.0: Display warning to user
    return True, "warning: redirect_uri domain differs from client_id"
```
### Constant-Time Comparison
Prevent timing attacks on token verification:
```python
import hashlib
import secrets


def verify_token(provided_token: str, stored_hash: str) -> bool:
    """
    Verify token using constant-time comparison.
    """
    provided_hash = hashlib.sha256(provided_token.encode()).hexdigest()
    return secrets.compare_digest(provided_hash, stored_hash)
```
### CSRF Protection
**State Parameter**:
- Client generates unguessable state
- Server returns state unchanged
- Client verifies state matches
- Server does NOT validate state (client's responsibility)
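
Since the check lives on the client, the following sketch shows what a client is expected to do rather than anything Gondulf runs. The helper names `new_state` and `state_matches` are hypothetical, and the client is assumed to keep the generated value in the user's session between the redirect and the callback.

```python
import secrets


def new_state() -> str:
    """Generate an unguessable state value to store in the user's session."""
    return secrets.token_urlsafe(16)


def state_matches(expected: str, returned: str) -> bool:
    """Compare the session value with the one returned on the callback."""
    return secrets.compare_digest(expected.encode(), returned.encode())
```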
### HTTPS Enforcement
**Production Requirements**:
- All endpoints MUST use HTTPS
- HTTP allowed only for localhost in development
- HSTS header recommended: `Strict-Transport-Security: max-age=31536000`
### Rate Limiting (Future)
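**v1.0.0**: General request rate limiting is not implemented (acceptable for small deployments). The only limits enforced are those on email verification (max 3 attempts per email, max 3 codes per domain per hour).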
**Future versions**:
- Authorization requests: 10/minute per IP
- Token requests: 30/minute per client_id
- Email codes: 3/hour per email
- Failed verifications: 5/hour per IP
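
One way such limits might be enforced later is an in-memory sliding window keyed by IP, client_id, or email. The sketch below is illustrative only: `SlidingWindowLimiter` is a hypothetical class, it is not part of v1.0.0, and it assumes a single process.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most `limit` events per key within the last `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)  # key -> timestamps of accepted events

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Forget events that have slid out of the window
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

For example, `SlidingWindowLimiter(limit=3, window_seconds=3600)` keyed by email address would correspond to the "3/hour per email" target above.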
## Protocol Deviations
### Intentional Deviations from W3C Spec
**ADR-003**: PKCE deferred to post-v1.0.0
- **Reason**: Simplicity for MVP, small user base, HTTPS mitigates risk
- **Impact**: Slightly less secure against code interception
- **Mitigation**: Enforce HTTPS, short code TTL (10 minutes)
- **Upgrade Path**: Add PKCE in v1.1.0 without breaking changes
**ADR-004**: No client pre-registration required (TBD)
- **Reason**: Aligns with user requirement for simplified client onboarding
- **Impact**: Must validate client_id on every request
- **Mitigation**: Cache client metadata, implement rate limiting
- **Spec Compliance**: Spec allows this ("client IDs are resolvable URLs")
### Scope Limitations (v1.0.0)
**Authentication Only**:
- `scope` parameter accepted but ignored
- All tokens issued with empty scope
- Tokens prove identity, not authorization
- Future versions will support scopes
## Testing Strategy
### Compliance Testing
**Required Tests**:
1. Valid authorization request → code generation
2. Valid token request → token generation
3. Invalid client_id → error
4. Invalid redirect_uri → error
5. Missing state → error
6. Expired authorization code → error
7. Used authorization code → error
8. Mismatched client_id on token request → error
### Interoperability Testing
**Test Against**:
- IndieAuth.com test suite (if available)
- Real IndieAuth clients (IndieLogin, etc.)
- Reference implementation comparison
### Security Testing
**Required Tests**:
1. Open redirect prevention (invalid redirect_uri)
2. Timing attack resistance (token verification)
3. CSRF protection (state parameter)
4. Code reuse prevention (single-use codes)
5. URL validation (me parameter malformation)
## Implementation Checklist
- [ ] `/authorize` endpoint with parameter validation
- [ ] Client metadata fetching (h-app microformat)
- [ ] Two-factor domain verification (DNS TXT check, rel="me" email discovery, email code generation, sending, validation)
- [ ] Domain ownership caching (SQLite)
- [ ] Authorization code generation and storage (in-memory)
- [ ] `/token` endpoint with grant validation
- [ ] Access token generation and storage (SQLite, hashed)
- [ ] Error responses (OAuth 2.0 compliant)
- [ ] HTTPS enforcement (production)
- [ ] URL validation (me, client_id, redirect_uri)
- [ ] Constant-time token comparison
- [ ] Metadata endpoint `/.well-known/oauth-authorization-server`
- [ ] Comprehensive test suite (80%+ coverage)
## References
- W3C IndieAuth Specification: https://www.w3.org/TR/indieauth/
- OAuth 2.0 (RFC 6749): https://datatracker.ietf.org/doc/html/rfc6749
- OAuth 2.0 Security Best Current Practice (RFC 9700): https://datatracker.ietf.org/doc/html/rfc9700
- PKCE (RFC 7636): https://datatracker.ietf.org/doc/html/rfc7636 (future)
- Token Revocation (RFC 7009): https://datatracker.ietf.org/doc/html/rfc7009 (future)
- Authorization Server Metadata (RFC 8414): https://datatracker.ietf.org/doc/html/rfc8414