
# IndieAuth Protocol Implementation

## Specification Compliance

This document describes Gondulf's implementation of the W3C IndieAuth specification.

**Primary Reference**: https://www.w3.org/TR/indieauth/
**Reference Implementation**: https://github.com/aaronpk/indielogin.com

**Compliance Target**: Any compliant IndieAuth client MUST be able to authenticate successfully against Gondulf.

## Protocol Overview

IndieAuth is built on OAuth 2.0, extending it to enable decentralized authentication where users are identified by URLs (typically their own domain) rather than accounts on centralized services.

### Core Principle
Users prove ownership of a domain, and that domain becomes their identity. No usernames, no passwords stored by the server.

### IndieAuth vs OAuth 2.0

**Similarities**:
- Authorization code flow
- Token endpoint for code exchange
- State parameter for CSRF protection
- Redirect-based flow

**Differences**:
- User identity is a URL (`me` parameter), not an opaque user ID
- No client secrets (all clients are "public clients")
- Client IDs are URLs that must be fetchable
- Domain ownership verification instead of password authentication

## v1.0.0 Scope

Gondulf v1.0.0 implements **authentication only** (not authorization):
- Users can prove they own a domain
- Tokens are issued but carry no permissions (scope)
- Client applications can verify user identity
- NO resource server capabilities
- NO scope-based authorization

**Future versions** will add:
- Authorization with scopes
- Token refresh
- Token revocation
- Resource server capabilities

## Endpoints

### Discovery Endpoint (Optional)

**URL**: `/.well-known/oauth-authorization-server`

**Purpose**: Advertise server capabilities and endpoints per RFC 8414.

**Response** (JSON):
```json
{
  "issuer": "https://auth.example.com",
  "authorization_endpoint": "https://auth.example.com/authorize",
  "token_endpoint": "https://auth.example.com/token",
  "response_types_supported": ["code"],
  "grant_types_supported": ["authorization_code"],
  "token_endpoint_auth_methods_supported": ["none"]
}
```

**Implementation Notes**:
- Optional for v1.0.0 but recommended
- FastAPI endpoint: `GET /.well-known/oauth-authorization-server`
- Static response (no database access)
- Cache-Control: public, max-age=86400

### Authorization Endpoint

**URL**: `/authorize`
**Method**: GET
**Purpose**: Initiate authentication flow

#### Required Parameters

| Parameter | Description | Validation |
|-----------|-------------|------------|
| `me` | User's domain/URL | Must be valid URL; no fragment, credentials, or non-default port |
| `client_id` | Client application URL | Must be valid URL, must be fetchable |
| `redirect_uri` | Where to send user after auth | Must be valid URL, must match client_id domain OR be registered |
| `state` | CSRF protection token | Required, opaque string, returned unchanged |
| `response_type` | Must be `code` | Exactly `code` for auth code flow |

#### Optional Parameters (v1.0.0)

| Parameter | Description | v1.0.0 Behavior |
|-----------|-------------|-----------------|
| `scope` | Requested permissions | Ignored (authentication only) |
| `code_challenge` | PKCE challenge | NOT supported in v1.0.0 |
| `code_challenge_method` | PKCE method | NOT supported in v1.0.0 |

**PKCE Decision**: Deferred to post-v1.0.0 to maintain MVP simplicity. See ADR-003.

#### Request Validation Sequence

1. **Validate `response_type`**
   - MUST be exactly `code`
   - Error: `unsupported_response_type`

2. **Validate `me` parameter**
   - Must be a valid URL
   - Must NOT contain fragment (#)
   - Must NOT contain credentials (user:pass@)
   - Must NOT contain port (except :443 for HTTPS)
   - Must NOT be an IP address
   - Normalize: lowercase domain, remove trailing slash
   - Error: `invalid_request` with description

3. **Validate `client_id`**
   - Must be a valid URL
   - Must contain a domain component (not localhost in production)
   - Fetch client_id URL to retrieve app info (see Client Validation)
   - Error: `invalid_client` with description

4. **Validate `redirect_uri`**
   - Must be a valid URL
   - Must use HTTPS in production (HTTP allowed for localhost)
   - If domain differs from client_id domain:
     - Must match client_id subdomain pattern, OR
     - Must be registered in client metadata (future), OR
     - Display warning to user
   - Error: `invalid_request` with description

5. **Validate `state`**
   - Must be present
   - Must be non-empty string
   - Store with the authorization code so it can be returned to the client unchanged (the server never interprets it)
   - Error: `invalid_request` with description
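
The ordering of these checks can be sketched as a small function that returns the error code of the first failing step. This is a minimal illustration only: the name `first_request_error` is hypothetical, and it covers presence checks only, not URL validation or the `client_id` fetch.

```python
from typing import Mapping, Optional

# Parameters checked after response_type, in sequence order, with the
# error code the sequence above assigns to each step.
PRESENCE_CHECKS = [
    ("me", "invalid_request"),
    ("client_id", "invalid_client"),
    ("redirect_uri", "invalid_request"),
    ("state", "invalid_request"),
]


def first_request_error(params: Mapping[str, str]) -> Optional[str]:
    """Return the error code for the first failing check, or None."""
    # Step 1: response_type must be exactly "code".
    if params.get("response_type") != "code":
        return "unsupported_response_type"
    # Steps 2-5: each parameter must be present and non-empty.
    for name, error in PRESENCE_CHECKS:
        if not params.get(name):
            return error
    return None
```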

#### Client Validation

When `client_id` is provided, fetch the URL to retrieve application information:

**HTTP Request**:
```
GET https://client.example.com/
Accept: text/html
```

**Extract Application Info**:
- Look for `h-app` microformat in HTML
- Extract: app name, icon, URL
- Extract registered redirect URIs from `<link rel="redirect_uri">` tags
- Cache result for 24 hours

**Fallback**:
- If no h-app found, use domain name as app name
- If no icon, use generic icon
- If no redirect URIs registered, rely on domain matching

**Security**:
- Follow redirects (max 5)
- Timeout after 5 seconds
- Validate SSL certificates
- Reject non-200 responses
- Log client_id fetch failures
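
As an illustration of the extraction step, the registered redirect URIs could be collected with the standard library's `html.parser`. This is a sketch under assumptions: the class and function names are hypothetical, and h-app parsing, HTTP fetching, and caching are left out.

```python
from html.parser import HTMLParser


class RedirectUriParser(HTMLParser):
    """Collect href values from <link rel="redirect_uri"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.redirect_uris: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel is a space-separated token list; a valueless attribute is None.
        rels = (attr.get("rel") or "").split()
        href = attr.get("href")
        if "redirect_uri" in rels and href:
            self.redirect_uris.append(href)


def extract_redirect_uris(html: str) -> list[str]:
    """Return all redirect URIs declared in the client's HTML."""
    parser = RedirectUriParser()
    parser.feed(html)
    return parser.redirect_uris
```

An empty result corresponds to the fallback above: rely on domain matching.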

#### Authentication Flow (v1.0.0: Two-Factor Domain Verification)

1. **DNS TXT Record Verification (Required)**
   - Check if `me` domain has TXT record: `_gondulf.{domain}` = `verified`
   - Query multiple DNS resolvers (Google 8.8.8.8, Cloudflare 1.1.1.1)
   - Require consensus from at least 2 resolvers
   - If not found: Display error with instructions to add TXT record
   - If found: Proceed to email discovery
   - Proves: User controls DNS for the domain

2. **Email Discovery via rel="me" (Required)**
   - Fetch user's domain homepage (e.g., https://example.com)
   - Parse HTML for `<link rel="me" href="mailto:user@example.com">` or `<a rel="me" href="mailto:user@example.com">`
   - Extract email address from first matching mailto: link
   - If not found: Display error with instructions to add rel="me" link
   - If found: Proceed to email verification
   - Proves: User has published email relationship on their site
   - Reference: https://indieweb.org/rel-me

3. **Email Verification Code (Required)**
   - Generate 6-digit verification code (cryptographically random)
   - Store code in memory with 15-minute TTL
   - Send code to discovered email address via SMTP
   - Display code entry form showing discovered email (partially masked)
   - User enters 6-digit code
   - Validate code matches and hasn't expired (max 3 attempts)
   - Proves: User controls the email account
   - Mark domain as verified (store in database)

4. **User Consent**
   - Display authorization prompt:
     - "Sign in to [App Name] as [me]"
     - Show client_id full URL
     - Show redirect_uri if different domain
     - Show scope (future)
   - User approves or denies

5. **Authorization Code Generation**
   - Generate cryptographically secure code (32 bytes, base64url)
   - Store code in memory with 10-minute TTL
   - Store associated data:
     - `me` (user's domain)
     - `client_id`
     - `redirect_uri`
     - `state`
     - Timestamp
   - Code is single-use only

6. **Redirect to Client**
   ```
   HTTP/1.1 302 Found
   Location: {redirect_uri}?code={code}&state={state}
   ```

**Security Model**: Two-factor verification requires BOTH DNS control AND email control. An attacker would need to compromise both to authenticate fraudulently.
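
Step 2 (email discovery) and the masked display in step 3 might look like the following sketch. The names `discover_email` and `mask_email` are hypothetical, the masking format is an assumption, and fetching the homepage is out of scope here.

```python
from html.parser import HTMLParser
from typing import Optional


class RelMeEmailParser(HTMLParser):
    """Find the first rel="me" link or anchor whose href is a mailto: URL."""

    def __init__(self) -> None:
        super().__init__()
        self.email: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if self.email is not None or tag not in ("link", "a"):
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        href = attr.get("href") or ""
        if "me" in rels and href.lower().startswith("mailto:"):
            # Drop the scheme and any ?subject=... query part.
            self.email = href[len("mailto:"):].split("?", 1)[0]


def discover_email(html: str) -> Optional[str]:
    """Return the first rel="me" email address, or None if absent."""
    parser = RelMeEmailParser()
    parser.feed(html)
    return parser.email


def mask_email(email: str) -> str:
    """Partially mask an address for display (format is an assumption)."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}"
```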

#### Error Responses

Return error via redirect when possible:
```
HTTP/1.1 302 Found
Location: {redirect_uri}?error={error}&error_description={description}&state={state}
```

**Error Codes** (OAuth 2.0 standard):
- `invalid_request` - Malformed request
- `unauthorized_client` - Client not authorized
- `access_denied` - User denied authorization
- `unsupported_response_type` - response_type not `code`
- `invalid_scope` - Invalid scope requested (future)
- `server_error` - Internal server error
- `temporarily_unavailable` - Server temporarily unavailable

When a redirect is not possible (for example, the `redirect_uri` itself is invalid), display an error page instead.
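
A minimal sketch of building the error redirect with `urllib.parse`, so that parameter values are percent-encoded and any query string already present on `redirect_uri` is preserved. The function name `build_error_redirect` is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def build_error_redirect(redirect_uri: str, error: str,
                         description: str, state: str) -> str:
    """Append error, error_description, and state to redirect_uri."""
    parsed = urlparse(redirect_uri)
    # Keep any query parameters the client already put on redirect_uri.
    query = parse_qsl(parsed.query, keep_blank_values=True)
    query += [
        ("error", error),
        ("error_description", description),
        ("state", state),
    ]
    return urlunparse(parsed._replace(query=urlencode(query)))
```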

### Token Endpoint

**URL**: `/token`
**Method**: POST
**Content-Type**: `application/x-www-form-urlencoded`
**Purpose**: Exchange authorization code for access token

#### Required Parameters

| Parameter | Description | Validation |
|-----------|-------------|------------|
| `grant_type` | Must be `authorization_code` | Exactly `authorization_code` |
| `code` | Authorization code from /authorize | Must be valid, unexpired, unused |
| `client_id` | Client application URL | Must match code's client_id |
| `redirect_uri` | Original redirect URI | Must match code's redirect_uri |
| `me` | User's domain | Must match code's me |

#### Request Validation Sequence

1. **Validate `grant_type`**
   - MUST be `authorization_code`
   - Error: `unsupported_grant_type`

2. **Validate `code`**
   - Must exist in storage
   - Must not have expired (10-minute TTL)
   - Must not have been used already
   - Mark as used immediately
   - Error: `invalid_grant`

3. **Validate `client_id`**
   - Must match the client_id associated with code
   - Error: `invalid_client`

4. **Validate `redirect_uri`**
   - Must exactly match the redirect_uri from authorization request
   - Error: `invalid_grant`

5. **Validate `me`**
   - Must exactly match the me from authorization request
   - Error: `invalid_request`
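
The sequence above could be expressed as a single function that takes the stored code record and returns the first applicable error. This is a sketch, not the actual implementation: `validate_token_request` is a hypothetical name, and the record layout follows the in-memory authorization code model described under Data Models.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def validate_token_request(params: dict, record: Optional[dict],
                           now: datetime) -> Optional[str]:
    """Return an OAuth error code, or None if the request is valid."""
    if params.get("grant_type") != "authorization_code":
        return "unsupported_grant_type"
    # The code must exist, be unexpired, and be unused.
    if record is None or record["used"] or now >= record["expires_at"]:
        return "invalid_grant"
    # Mark as used immediately, before the remaining checks, so a failed
    # exchange still burns the code.
    record["used"] = True
    if params.get("client_id") != record["client_id"]:
        return "invalid_client"
    if params.get("redirect_uri") != record["redirect_uri"]:
        return "invalid_grant"
    if params.get("me") != record["me"]:
        return "invalid_request"
    return None


# Usage with illustrative values:
issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
example_record = {
    "me": "https://example.com",
    "client_id": "https://client.example.com",
    "redirect_uri": "https://client.example.com/callback",
    "expires_at": issued + timedelta(minutes=10),
    "used": False,
}
```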

#### Token Generation

**v1.0.0 Implementation: Opaque Tokens**

```python
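import hashlib
import secrets
from datetime import datetime, timedelta, timezone

# Generate token
token = secrets.token_urlsafe(32)  # 32 random bytes = 256 bits

# Take one timezone-aware timestamp so issued_at and expires_at are
# exactly one hour apart (datetime.utcnow() is deprecated).
now = datetime.now(timezone.utc)

# Store in database: only the hash is persisted, never the raw token.
# `me` and `client_id` come from the redeemed authorization code.
token_record = {
    "token_hash": hashlib.sha256(token.encode()).hexdigest(),
    "me": me,
    "client_id": client_id,
    "scope": "",  # Empty for authentication-only
    "issued_at": now,
    "expires_at": now + timedelta(hours=1)
}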
```

**Why Opaque Tokens in v1.0.0**:
- Simpler than JWT (no signing, no key rotation)
- Easier to revoke (database lookup)
- Sufficient for authentication-only use case
- Can migrate to JWT in future versions

**Token Properties**:
- Length: 43 characters (base64url of 32 bytes)
- Entropy: 256 bits (cryptographically secure)
- Storage: SHA-256 hash in database
- Expiration: 1 hour (configurable)
- Revocable: Delete from database

#### Success Response

**HTTP 200 OK**:
```json
{
  "access_token": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
  "token_type": "Bearer",
  "me": "https://example.com",
  "scope": ""
}
```

**Response Fields**:
- `access_token`: The opaque token (43 characters)
- `token_type`: Always `Bearer`
- `me`: User's canonical domain URL (normalized)
- `scope`: Empty string for authentication-only (future: space-separated scopes)

**Headers**:
```
Content-Type: application/json
Cache-Control: no-store
Pragma: no-cache
```

#### Error Responses

**HTTP 400 Bad Request**:
```json
{
  "error": "invalid_grant",
  "error_description": "Authorization code has expired"
}
```

**Error Codes** (OAuth 2.0 standard):
- `invalid_request` - Missing or invalid parameters
- `invalid_client` - Client authentication failed
- `invalid_grant` - Invalid or expired authorization code
- `unauthorized_client` - Client not authorized for grant type
- `unsupported_grant_type` - Grant type not `authorization_code`

### Token Verification Endpoint (Future)

**URL**: `/token/verify`
**Method**: GET
**Purpose**: Verify token validity (for resource servers)

**NOT implemented in v1.0.0** (authentication only, no resource servers).

Future implementation:
```
GET /token/verify
Authorization: Bearer {token}

Response 200 OK:
{
  "me": "https://example.com",
  "client_id": "https://client.example.com",
  "scope": ""
}
```

### Token Revocation Endpoint (Future)

**URL**: `/token/revoke`
**Method**: POST
**Purpose**: Revoke access token

**NOT implemented in v1.0.0**.

Future implementation per RFC 7009.

## Data Models

### Authorization Code (In-Memory)

```python
{
    "code": "abc123...",  # 43-char base64url
    "me": "https://example.com",
    "client_id": "https://client.example.com",
    "redirect_uri": "https://client.example.com/callback",
    "state": "client-provided-state",
    "created_at": datetime,
    "expires_at": datetime,  # created_at + 10 minutes
    "used": False
}
```

**Storage**: Python dict with TTL management
**Expiration**: 10 minutes (per spec: "shortly after")
**Single-use**: Marked as used after redemption
**Cleanup**: Automatic expiration via TTL
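
A minimal sketch of such a store, assuming a module-level dict and lazy expiry on lookup. The names `issue_code` and `redeem_code` are hypothetical, and a real deployment would also need locking and periodic cleanup.

```python
import secrets
from datetime import datetime, timedelta, timezone
from typing import Optional

CODE_TTL = timedelta(minutes=10)
_codes: dict[str, dict] = {}


def issue_code(me: str, client_id: str, redirect_uri: str, state: str,
               now: Optional[datetime] = None) -> str:
    """Create and store a single-use authorization code."""
    now = now or datetime.now(timezone.utc)
    code = secrets.token_urlsafe(32)  # 43-char base64url string
    _codes[code] = {
        "code": code,
        "me": me,
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "state": state,
        "created_at": now,
        "expires_at": now + CODE_TTL,
        "used": False,
    }
    return code


def redeem_code(code: str, now: Optional[datetime] = None) -> Optional[dict]:
    """Return the record on first valid use; None if unknown, expired, or used."""
    now = now or datetime.now(timezone.utc)
    record = _codes.get(code)
    if record is None:
        return None
    if now >= record["expires_at"]:
        del _codes[code]  # lazy TTL cleanup
        return None
    if record["used"]:
        return None
    record["used"] = True
    return record
```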

### Email Verification Code (In-Memory)

```python
{
    "email": "admin@example.com",  # Discovered from rel="me", not user-provided
    "code": "123456",  # 6-digit string
    "domain": "example.com",
    "created_at": datetime,
    "expires_at": datetime,  # created_at + 15 minutes
    "attempts": 0  # Rate limiting (max 3 attempts)
}
```

**Storage**: Python dict with TTL management
**Email Source**: Discovered from site's rel="me" link (not user input)
**Expiration**: 15 minutes
**Rate Limiting**: Max 3 attempts per email, max 3 codes per domain per hour
**Cleanup**: Automatic expiration via TTL
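
Generating and checking the code might look like this sketch. The function names are hypothetical, and the per-domain hourly limit is not shown. `secrets.randbelow` gives a uniformly random value, and the comparison is done on bytes so non-ASCII input cannot raise.

```python
import secrets
from datetime import datetime, timedelta, timezone

MAX_ATTEMPTS = 3
VERIFICATION_TTL = timedelta(minutes=15)


def new_verification(email: str, domain: str, now: datetime) -> dict:
    """Create a verification record with a random 6-digit code."""
    return {
        "email": email,
        "code": f"{secrets.randbelow(10**6):06d}",  # zero-padded
        "domain": domain,
        "created_at": now,
        "expires_at": now + VERIFICATION_TTL,
        "attempts": 0,
    }


def check_code(record: dict, submitted: str, now: datetime) -> bool:
    """Count the attempt and compare in constant time."""
    if now >= record["expires_at"] or record["attempts"] >= MAX_ATTEMPTS:
        return False
    record["attempts"] += 1
    return secrets.compare_digest(record["code"].encode(), submitted.encode())


# Usage with illustrative values:
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
example = new_verification("admin@example.com", "example.com", start)
```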

### Access Token (SQLite)

```sql
CREATE TABLE tokens (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    token_hash TEXT NOT NULL UNIQUE,  -- SHA-256 hash
    me TEXT NOT NULL,
    client_id TEXT NOT NULL,
    scope TEXT NOT NULL,  -- Empty string for v1.0.0
    issued_at TIMESTAMP NOT NULL,
    expires_at TIMESTAMP NOT NULL,
    revoked BOOLEAN DEFAULT 0
);

-- SQLite does not accept inline INDEX clauses, so indexes are created
-- separately. token_hash is already indexed by its UNIQUE constraint.
CREATE INDEX idx_me ON tokens (me);
CREATE INDEX idx_expires_at ON tokens (expires_at);
```

**Lookup**: By token_hash (constant-time comparison)
**Expiration**: 1 hour default (configurable)
**Revocation**: Set `revoked = 1` (future feature)
**Cleanup**: Periodic deletion of expired tokens
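
The lookup path can be sketched with the standard `sqlite3` module. This is an illustration under assumptions: the helper names are hypothetical, the table is a simplified version of the schema above, and timestamps are stored as Unix epoch integers to keep the expiry comparison simple.

```python
import hashlib
import sqlite3
from datetime import datetime, timedelta, timezone


def hash_token(token: str) -> str:
    return hashlib.sha256(token.encode()).hexdigest()


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE tokens (
               token_hash TEXT NOT NULL UNIQUE,
               me TEXT NOT NULL,
               client_id TEXT NOT NULL,
               scope TEXT NOT NULL,
               issued_at INTEGER NOT NULL,   -- epoch seconds (assumption)
               expires_at INTEGER NOT NULL,  -- epoch seconds (assumption)
               revoked INTEGER DEFAULT 0
           )"""
    )
    return conn


def store_token(conn, token, me, client_id, now, ttl=timedelta(hours=1)):
    """Persist only the hash of the token."""
    with conn:  # commits on success
        conn.execute(
            "INSERT INTO tokens (token_hash, me, client_id, scope,"
            " issued_at, expires_at) VALUES (?, ?, ?, ?, ?, ?)",
            (hash_token(token), me, client_id, "",
             int(now.timestamp()), int((now + ttl).timestamp())),
        )


def lookup_token(conn, token, now):
    """Return (me, client_id, scope) for a live token, else None."""
    return conn.execute(
        "SELECT me, client_id, scope FROM tokens"
        " WHERE token_hash = ? AND revoked = 0 AND expires_at > ?",
        (hash_token(token), int(now.timestamp())),
    ).fetchone()
```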

### Verified Domain (SQLite)

```sql
CREATE TABLE domains (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    domain TEXT NOT NULL UNIQUE,  -- UNIQUE already creates an index
    verification_method TEXT NOT NULL,  -- 'two_factor' (DNS + Email)
    verified_at TIMESTAMP NOT NULL,
    last_dns_check TIMESTAMP,
    dns_txt_valid BOOLEAN DEFAULT 0,
    last_email_check TIMESTAMP
);
```

**Purpose**: Cache domain ownership verification
**Verification Method**: Always 'two_factor' in v1.0.0 (DNS TXT + Email via rel="me")
**DNS TXT**: Re-verified periodically (daily check)
**Email**: NOT stored (only verification timestamp recorded)
**Re-verification**: DNS checked periodically, email re-verified on each login
**Cleanup**: Optional (admin decision)

## Security Considerations

### URL Validation

**Critical**: Prevent open redirect and phishing attacks.

**`me` Validation**:
```python
import ipaddress
from urllib.parse import urlparse

def validate_me(me: str) -> tuple[bool, str, str]:
    """
    Validate me parameter.

    Returns: (valid, normalized_me, error_message)
    """
    parsed = urlparse(me)

    # Must have scheme and host
    if not parsed.scheme or not parsed.hostname:
        return False, "", "me must be a complete URL"

    # Must be HTTP or HTTPS
    if parsed.scheme not in ('http', 'https'):
        return False, "", "me must use http or https"

    # No fragments
    if parsed.fragment:
        return False, "", "me must not contain fragment"

    # No credentials
    if parsed.username or parsed.password:
        return False, "", "me must not contain credentials"

    # No ports (except default); .port raises ValueError if malformed
    try:
        port = parsed.port
    except ValueError:
        return False, "", "me contains an invalid port"
    if port is not None and not (port == 443 and parsed.scheme == 'https'):
        return False, "", "me must not contain non-standard port"

    # No IP addresses. Use hostname rather than netloc so that a port
    # or IPv6 brackets cannot hide an IP literal from this check.
    host = parsed.hostname
    try:
        ipaddress.ip_address(host)
        return False, "", "me must be a domain, not IP address"
    except ValueError:
        pass  # Good, not an IP

    # Normalize: hostname is already lowercased and excludes the port
    path = parsed.path.rstrip('/')
    normalized = f"{parsed.scheme}://{host}{path}"

    return True, normalized, ""
```

**`redirect_uri` Validation**:
```python
from urllib.parse import urlparse

DEBUG = False  # Assumed module-level setting; True only in development

def validate_redirect_uri(redirect_uri: str, client_id: str) -> tuple[bool, str]:
    """
    Validate redirect_uri against client_id.

    Returns: (valid, error_message)
    """
    parsed_redirect = urlparse(redirect_uri)
    parsed_client = urlparse(client_id)

    # Must be valid URL
    if not parsed_redirect.scheme or not parsed_redirect.hostname:
        return False, "redirect_uri must be a complete URL"

    # hostname is lowercased and excludes the port, so localhost:8080
    # is still recognized as localhost
    redirect_domain = parsed_redirect.hostname
    client_domain = parsed_client.hostname or ""

    # Must be HTTPS in production (allow HTTP for localhost)
    if not DEBUG and parsed_redirect.scheme != 'https':
        if redirect_domain != 'localhost':
            return False, "redirect_uri must use HTTPS"

    # Same domain: OK
    if redirect_domain == client_domain:
        return True, ""

    # Subdomain of client domain: OK
    if client_domain and redirect_domain.endswith('.' + client_domain):
        return True, ""

    # Different domain: Check if registered (future)
    # For v1.0.0: Display warning to user
    return True, "warning: redirect_uri domain differs from client_id"
```

### Constant-Time Comparison

Prevent timing attacks on token verification:

```python
import hashlib
import secrets

def verify_token(provided_token: str, stored_hash: str) -> bool:
    """
    Verify token using constant-time comparison.
    """
    # Hash the presented token, then compare hex digests in constant time.
    provided_hash = hashlib.sha256(provided_token.encode()).hexdigest()
    return secrets.compare_digest(provided_hash, stored_hash)
```

### CSRF Protection

**State Parameter**:
- Client generates unguessable state
- Server returns state unchanged
- Client verifies state matches
- Server does NOT validate state (client's responsibility)
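
For illustration, the client's side of this exchange could be as small as the sketch below. The function names are hypothetical and client behavior is outside Gondulf itself; the point is only that the state value should be unguessable and compared exactly.

```python
import secrets


def new_state() -> str:
    """Generate an unguessable state value for one authorization request."""
    return secrets.token_urlsafe(16)  # 128 bits of randomness


def state_matches(expected: str, returned: str) -> bool:
    """Compare the stored state with the one returned on the redirect."""
    return secrets.compare_digest(expected.encode(), returned.encode())
```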

### HTTPS Enforcement

**Production Requirements**:
- All endpoints MUST use HTTPS
- HTTP allowed only for localhost in development
- HSTS header recommended: `Strict-Transport-Security: max-age=31536000`

### Rate Limiting (Future)

**v1.0.0**: Not implemented (acceptable for small deployments).

**Future versions**:
- Authorization requests: 10/minute per IP
- Token requests: 30/minute per client_id
- Email codes: 3/hour per email
- Failed verifications: 5/hour per IP

## Protocol Deviations

### Intentional Deviations from W3C Spec

**ADR-003**: PKCE deferred to post-v1.0.0
- **Reason**: Simplicity for MVP, small user base, HTTPS mitigates risk
- **Impact**: Slightly less secure against code interception
- **Mitigation**: Enforce HTTPS, short code TTL (10 minutes)
- **Upgrade Path**: Add PKCE in v1.1.0 without breaking changes

**ADR-004**: No client pre-registration required (TBD)
- **Reason**: Aligns with user requirement for simplified client onboarding
- **Impact**: Must validate client_id on every request
- **Mitigation**: Cache client metadata, implement rate limiting
- **Spec Compliance**: Spec allows this ("client IDs are resolvable URLs")

### Scope Limitations (v1.0.0)

**Authentication Only**:
- `scope` parameter accepted but ignored
- All tokens issued with empty scope
- Tokens prove identity, not authorization
- Future versions will support scopes

## Testing Strategy

### Compliance Testing

**Required Tests**:
1. Valid authorization request → code generation
2. Valid token request → token generation
3. Invalid client_id → error
4. Invalid redirect_uri → error
5. Missing state → error
6. Expired authorization code → error
7. Used authorization code → error
8. Mismatched client_id on token request → error

### Interoperability Testing

**Test Against**:
- IndieAuth.com test suite (if available)
- Real IndieAuth clients (IndieLogin, etc.)
- Reference implementation comparison

### Security Testing

**Required Tests**:
1. Open redirect prevention (invalid redirect_uri)
2. Timing attack resistance (token verification)
3. CSRF protection (state parameter)
4. Code reuse prevention (single-use codes)
5. URL validation (me parameter malformation)

## Implementation Checklist

- [ ] `/authorize` endpoint with parameter validation
- [ ] Client metadata fetching (h-app microformat)
- [ ] Email verification flow (code generation, sending, validation)
- [ ] Domain ownership caching (SQLite)
- [ ] Authorization code generation and storage (in-memory)
- [ ] `/token` endpoint with grant validation
- [ ] Access token generation and storage (SQLite, hashed)
- [ ] Error responses (OAuth 2.0 compliant)
- [ ] HTTPS enforcement (production)
- [ ] URL validation (me, client_id, redirect_uri)
- [ ] Constant-time token comparison
- [ ] Metadata endpoint `/.well-known/oauth-authorization-server`
- [ ] Comprehensive test suite (80%+ coverage)

## References

- W3C IndieAuth Specification: https://www.w3.org/TR/indieauth/
- OAuth 2.0 (RFC 6749): https://datatracker.ietf.org/doc/html/rfc6749
- OAuth 2.0 Security Best Practices: https://datatracker.ietf.org/doc/html/draft-ietf-oauth-security-topics
- PKCE (RFC 7636): https://datatracker.ietf.org/doc/html/rfc7636 (future)
- Token Revocation (RFC 7009): https://datatracker.ietf.org/doc/html/rfc7009 (future)
- Authorization Server Metadata (RFC 8414): https://datatracker.ietf.org/doc/html/rfc8414