0009. Phase 3 Token Endpoint Implementation Clarifications

Date: 2025-11-20

Status

Accepted

Context

The Developer has reviewed the Phase 3 Token Endpoint design and identified eight clarification questions requiring architectural decisions. They range from critical (CodeStore value type compatibility) to minor (logging levels), and each needs a clear answer before implementation can proceed.

Decision

We make the following architectural decisions for Phase 3 implementation:

1. Authorization Code Storage Format (CRITICAL)

Decision: Modify CodeStore to accept dict values directly, with JSON serialization handled internally.

Implementation:

# Module-level imports needed by CodeStore
import json
import time
from typing import Optional, Union

# In CodeStore class
def store(self, key: str, value: Union[str, dict], ttl: int = 600) -> None:
    """Store key-value pair with TTL. Value can be a string or a dict."""
    if isinstance(value, dict):
        value_to_store = json.dumps(value)
    else:
        value_to_store = value

    expiry = time.time() + ttl
    self._data[key] = {
        'value': value_to_store,
        'expires': expiry
    }

def get(self, key: str) -> Optional[Union[str, dict]]:
    """Get value by key. Returns a dict if the stored value is a JSON object, the original string otherwise."""
    if key not in self._data:
        return None

    entry = self._data[key]
    if time.time() > entry['expires']:
        del self._data[key]
        return None

    value = entry['value']
    # Parse JSON, but only hand back dicts so that Phase 1 string values
    # (e.g. "482910") are returned unchanged.
    try:
        parsed = json.loads(value)
        return parsed if isinstance(parsed, dict) else value
    except (json.JSONDecodeError, TypeError):
        return value

Rationale: This is the simplest approach that maintains backward compatibility with Phase 1 (string values) while supporting Phase 2/3 needs (dict metadata). The CodeStore handles serialization internally, keeping the interface clean.
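
A brief usage sketch (key names and values are illustrative) showing that Phase 1 string values and Phase 2/3 dict metadata round-trip through the same interface:

store = CodeStore()
store.store("verification_code", "482910")  # Phase 1: plain string value
store.store("auth_code_abc", {"client_id": "https://app.example.com", "used": False})
assert store.get("verification_code") == "482910"           # returned unchanged as a string
assert store.get("auth_code_abc")["used"] is False           # returned as a parsed dict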

2. Authorization Code Single-Use Marking

Decision: Simplify to an atomic check-and-delete operation. Do NOT mark-then-delete.

Implementation:

# In token endpoint handler
# STEP 5: Check if code already used
if metadata.get('used'):
    logger.error(f"Authorization code replay detected: {code[:8]}...")
    raise HTTPException(400, {"error": "invalid_grant", "error_description": "Authorization code has already been used"})

# STEP 6-8: Extract user data, validate PKCE if needed, generate token...

# STEP 9: Delete authorization code immediately after successful token generation
code_storage.delete(code)
logger.info(f"Authorization code exchanged and deleted: {code[:8]}...")

Rationale: The simpler approach avoids the race condition complexity of calculating remaining TTL and re-storing. Since we control both the authorization and token endpoints, we can ensure codes are generated with the 'used' field set to False initially, then simply delete them after use.
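
For context, a minimal sketch (assuming the same code_storage interface, logger, and error format as the snippet above) of the lookup that produces metadata before STEP 5:

# STEP 1-4 (sketch): retrieve the authorization code metadata.
# A missing or expired code gets the same invalid_grant treatment as a replayed one.
metadata = code_storage.get(code)
if metadata is None:
    logger.warning(f"Authorization code not found or expired: {code[:8]}...")
    raise HTTPException(400, {"error": "invalid_grant", "error_description": "Authorization code is invalid or expired"})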

3. Token Endpoint Error Response Format

Decision: FastAPI automatically handles dict detail correctly for JSON responses. No custom handler needed.

Verification: FastAPI's HTTPException with dict detail automatically:

  • Sets Content-Type: application/json
  • Serializes the dict to JSON
  • Returns proper OAuth error response

Additional Headers: Add OAuth-required cache headers explicitly:

from fastapi import Response

@router.post("/token")
async def token_exchange(response: Response, ...):
    # Add OAuth cache headers
    response.headers["Cache-Control"] = "no-store"
    response.headers["Pragma"] = "no-cache"

    # ... rest of implementation

Rationale: Use FastAPI's built-in capabilities. Explicit headers ensure OAuth compliance.
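
For completeness, a sketch of the successful response body (the field set follows the OAuth 2.0 / IndieAuth token response; the token and metadata variables assume the snippets above):

# Successful exchange: return the OAuth 2.0 / IndieAuth token response.
# The Cache-Control / Pragma headers set above apply to this response as well.
return {
    "access_token": token,
    "token_type": "Bearer",
    "me": metadata["me"],
    "scope": metadata.get("scope", ""),
}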

4. Phase 2/3 Authorization Code Structure

Decision: Phase 2 must include PKCE fields with default values. Phase 3 does NOT need to handle missing keys.

Phase 2 Authorization Code Structure (UPDATE REQUIRED):

# Phase 2 authorization endpoint must store:
code_data = {
    'client_id': client_id,
    'redirect_uri': redirect_uri,
    'state': state,
    'me': verified_email,  # or domain
    'scope': scope,
    'code_challenge': code_challenge or "",  # Empty string if not provided
    'code_challenge_method': code_challenge_method or "",  # Empty string if not provided
    'created_at': int(time.time()),
    'expires_at': int(time.time() + 600),
    'used': False  # Always False when created
}

Rationale: Consistency is more important than backward compatibility within a single version. Since we're building v1.0.0, all components should use the same data structure.
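
A sketch (not part of the original design text) of how the token endpoint can consume the code_challenge fields for PKCE verification per RFC 7636; an empty challenge means the client did not use PKCE:

import base64
import hashlib
import secrets

def verify_pkce(code_verifier: str, code_challenge: str, method: str) -> bool:
    """Return True if the presented verifier matches the stored challenge."""
    if not code_challenge:
        # Empty string per the structure above: PKCE was not used for this code
        return True
    if method == "S256":
        digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
        computed = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    else:  # "plain"
        computed = code_verifier
    return secrets.compare_digest(computed, code_challenge)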

5. Database Connection Pattern

Decision: The Phase 1 database connection context manager does NOT auto-commit. Explicit commit required.

Confirmation from Phase 1 implementation:

# Phase 1 uses SQLite connection directly
with self.database.get_connection() as conn:
    conn.execute(query, params)
    conn.commit()  # Explicit commit required

Rationale: Explicit commits give us transaction control and match SQLite's default behavior.

6. Token Hash Collision Handling

Decision: Do NOT handle UNIQUE constraint violations. Let them fail catastrophically.

Implementation:

# Module-level imports: secrets, hashlib, time
def generate_token(self, me: str, client_id: str, scope: str = "") -> str:
    """Generate an access token, store only its SHA-256 hash, and return the plaintext token."""
    # Generate token (2^256 entropy when token_length is 32 bytes)
    token = secrets.token_urlsafe(self.token_length)
    token_hash = hashlib.sha256(token.encode('utf-8')).hexdigest()

    # Timestamps; self.token_ttl (lifetime in seconds) is an assumed attribute name
    issued_at = int(time.time())
    expires_at = issued_at + self.token_ttl

    # Store in database - if this fails (e.g. UNIQUE violation on token_hash), let it propagate
    with self.database.get_connection() as conn:
        conn.execute(
            """INSERT INTO tokens (token_hash, me, client_id, scope, issued_at, expires_at, revoked)
               VALUES (?, ?, ?, ?, ?, ?, 0)""",
            (token_hash, me, client_id, scope, issued_at, expires_at)
        )
        conn.commit()

    return token

Rationale: With 2^256 possible values, a collision is so astronomically unlikely that if it occurs, it indicates a fundamental problem (bad RNG, cosmic rays, etc.). Retrying won't help. The UNIQUE constraint violation will be logged as an ERROR and return 500 to client, which is appropriate for this "impossible" scenario.

7. Logging Token Validation

Decision: Use the Developer's suggested logging levels:

  • DEBUG for successful validations (high volume, not interesting)
  • INFO for token generation (important events)
  • WARNING for validation failures (potential attacks or misconfiguration)

Implementation:

# In validate_token
if valid:
    logger.debug(f"Token validated successfully (me: {token_data['me']})")
else:
    logger.warning(f"Token validation failed: {reason}")

# In generate_token
logger.info(f"Token generated for {me} (client: {client_id})")

Rationale: This provides appropriate visibility without flooding logs during normal operation.
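
A sketch of the validate_token shape these log lines assume (inside TokenService, using the same imports and logger as generate_token; sqlite3.Row-style column access and the tokens schema from decision 6 are assumed):

def validate_token(self, token: str) -> Optional[dict]:
    """Return token metadata if the token is valid, otherwise None."""
    # Hash the presented token; only hashes are stored, never plaintext.
    token_hash = hashlib.sha256(token.encode('utf-8')).hexdigest()

    with self.database.get_connection() as conn:
        row = conn.execute(
            "SELECT me, client_id, scope, expires_at, revoked FROM tokens WHERE token_hash = ?",
            (token_hash,)
        ).fetchone()

    if row is None:
        logger.warning("Token validation failed: unknown token")
        return None
    if row["revoked"] or row["expires_at"] <= int(time.time()):
        logger.warning("Token validation failed: revoked or expired token")
        return None

    logger.debug(f"Token validated successfully (me: {row['me']})")
    return dict(row)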

8. Token Cleanup Configuration

Decision: Implement as a utility method only for v1.0.0. It is not called automatically.

Implementation:

# In TokenService
def cleanup_expired_tokens(self) -> int:
    """Delete expired tokens. Call manually or via cron/scheduled task."""
    # Implementation as designed

# Not called automatically in v1.0.0
# Future v1.1.0 can add background task if needed

Configuration: Keep TOKEN_CLEANUP_ENABLED and TOKEN_CLEANUP_INTERVAL in config for future use, but don't act on them in v1.0.0.

Rationale: Simplicity for v1.0.0. With small scale (10s of users), manual or cron-based cleanup is sufficient. Automatic background tasks add complexity we don't need yet.
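
A sketch of the utility method body (assuming the tokens schema from decision 6); a manual invocation or a daily cron job is enough at this scale:

def cleanup_expired_tokens(self) -> int:
    """Delete expired tokens and return the number of rows removed."""
    with self.database.get_connection() as conn:
        cursor = conn.execute(
            "DELETE FROM tokens WHERE expires_at <= ?",
            (int(time.time()),)
        )
        conn.commit()  # explicit commit, per decision 5
        return cursor.rowcount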

Consequences

Positive

  • All decisions prioritize simplicity over complexity
  • No unnecessary defensive programming for "impossible" scenarios
  • Clear, consistent data structures across phases
  • Minimal changes to existing Phase 1/2 code
  • Appropriate logging levels for operational visibility

Negative

  • Phase 2 needs a minor update to include PKCE fields and 'used' flag
  • No automatic token cleanup in v1.0.0 (acceptable for small scale)
  • Token hash collisions cause hard failures (acceptable given probability)

Technical Debt Created

  • TOKEN_CLEANUP automation deferred to v1.1.0
  • CodeStore dict handling could be more elegant (but works fine)

Implementation Actions Required

  1. Update Phase 2 authorization endpoint to include all fields in code metadata (code_challenge, code_challenge_method, used)
  2. Modify CodeStore in Phase 1 to handle dict values with JSON serialization
  3. Implement Phase 3 with these clarifications
  4. Document the manual token cleanup process for operators

Sign-off

Architect: Claude (Architect Agent)
Date: 2025-11-20
Status: Approved for implementation