feat(core): implement Phase 1 foundation infrastructure

Implements Phase 1 Foundation with all core services: Core Components: - Configuration management with GONDULF_ environment variables - Database layer with SQLAlchemy and migration system - In-memory code storage with TTL support - Email service with SMTP and TLS support (STARTTLS + implicit TLS) - DNS service with TXT record verification - Structured logging with Python standard logging - FastAPI application with health check endpoint Database Schema: - authorization_codes table for OAuth 2.0 authorization codes - domains table for domain verification - migrations table for tracking schema versions - Simple sequential migration system (001_initial_schema.sql) Configuration: - Environment-based configuration with validation - .env.example template with all GONDULF_ variables - Fail-fast validation on startup - Sensible defaults for optional settings Testing: - 96 comprehensive tests (77 unit, 5 integration) - 94.16% code coverage (exceeds 80% requirement) - All tests passing - Test coverage includes: - Configuration loading and validation - Database migrations and health checks - In-memory storage with expiration - Email service (STARTTLS, implicit TLS, authentication) - DNS service (TXT records, domain verification) - Health check endpoint integration Documentation: - Implementation report with test results - Phase 1 clarifications document - ADRs for key decisions (config, database, email, logging) Technical Details: - Python 3.10+ with type hints - SQLite with configurable database URL - System DNS with public DNS fallback - Port-based TLS detection (465=SSL, 587=STARTTLS) - Lazy configuration loading for testability Exit Criteria Met: ✓ All foundation services implemented ✓ Application starts without errors ✓ Health check endpoint operational ✓ Database migrations working ✓ Test coverage exceeds 80% ✓ All tests passing Ready for Architect review and Phase 2 development. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-20 12:21:42 -07:00
parent 7255867fde
commit bebd47955f
39 changed files with 8134 additions and 13 deletions
--- a/docs/decisions/ADR-003-pkce-deferred-to-v1-1-0.md
+++ b/docs/decisions/ADR-003-pkce-deferred-to-v1-1-0.md
@@ -0,0 +1,292 @@
+# ADR-003: PKCE Support Deferred to v1.1.0
+
+Date: 2025-11-20
+
+## Status
+Accepted
+
+## Context
+
+PKCE (Proof Key for Code Exchange, RFC 7636) is a security extension to OAuth 2.0 that protects against authorization code interception attacks. It works by having the client generate a random secret, send a hash of it during the authorization request, and prove knowledge of the original secret during token exchange.
+
+### PKCE Security Benefits
+1. **Code Interception Protection**: Even if an attacker intercepts the authorization code, they cannot exchange it for a token without the code_verifier
+2. **Public Client Security**: Essential for native apps and SPAs where client secrets cannot be stored securely
+3. **Best Practice**: OAuth 2.0 Security Best Practices (draft-ietf-oauth-security-topics) recommends PKCE for all clients
+
+### W3C IndieAuth Specification
+The current W3C IndieAuth specification (published January 2018) does not mention PKCE. However, PKCE has become a standard security measure in modern OAuth 2.0 implementations since then.
+
+### v1.0.0 Constraints
+For the MVP release, we face a simplicity vs. security trade-off:
+- Target users: 10s of users initially
+- Deployment: Single-process Docker container
+- Timeline: 6-8 weeks to v1.0.0
+- Focus: Prove core authentication functionality
+
+### PKCE Implementation Effort
+**Estimated effort**: 1-2 days
+
+**Required changes**:
+1. Accept `code_challenge` and `code_challenge_method` parameters in /authorize endpoint
+2. Store code challenge with authorization code
+3. Accept `code_verifier` parameter in /token endpoint
+4. Validate code_verifier hashes to stored challenge using S256 method
+5. Update metadata endpoint to advertise PKCE support
+6. Add comprehensive tests
+
+**Complexity**:
+- Low technical complexity (straightforward hashing and comparison)
+- Well-documented in RFC 7636
+- Standard library support (hashlib for SHA-256)
+
+## Decision
+
+**PKCE support is deferred to v1.1.0 and will NOT be included in v1.0.0.**
+
+### Rationale
+
+**Simplicity Over Complexity**:
+- v1.0.0 is an MVP focused on proving core authentication functionality
+- Every additional feature increases risk and development time
+- PKCE adds security but is not required for the W3C IndieAuth specification compliance
+- Deferring PKCE reduces v1.0.0 scope without compromising compliance
+
+**Acceptable Risk for MVP**:
+- Target deployment: Small scale (10s of users), controlled environment
+- Mitigation: HTTPS enforcement + short code lifetime (10 minutes) + single-use codes
+- Risk window: 10-minute authorization code lifetime only
+- Attack complexity: Requires MITM during specific 10-minute window despite HTTPS
+
+**Clear Upgrade Path**:
+- PKCE can be added in v1.1.0 without breaking changes
+- Implementation is well-understood and straightforward
+- Clients that don't use PKCE will continue working (backward compatible)
+
+**Development Focus**:
+- Free up 1-2 days of effort for core functionality
+- Reduce testing surface area for MVP
+- Simplify initial security review
+- Gather real-world usage data before adding complexity
+
+## Consequences
+
+### Positive Consequences
+
+1. **Faster Time to Market**: v1.0.0 ships 1-2 days earlier
+2. **Reduced Complexity**: Fewer parameters to validate, fewer edge cases
+3. **Simpler Testing**: Smaller test surface area for initial release
+4. **Focus**: Development effort concentrated on core authentication flow
+5. **Learning Opportunity**: Real-world usage informs PKCE implementation in v1.1.0
+
+### Negative Consequences
+
+1. **Slightly Reduced Security**: Authorization codes vulnerable to interception (mitigated by HTTPS)
+2. **Not Best Practice**: Modern OAuth 2.0 guidance recommends PKCE for all flows
+3. **Client Compatibility**: Clients requiring PKCE cannot use v1.0.0 (upgrade to v1.1.0)
+4. **Perception**: Security-conscious users may view absence as weakness
+
+### Mitigation Strategies
+
+**For v1.0.0 (Without PKCE)**:
+
+1. **Enforce HTTPS**: Strictly enforce HTTPS in production (mitigates interception)
+   ```python
+   if not DEBUG and request.url.scheme != 'https':
+       raise HTTPException(status_code=400, detail="HTTPS required")
+   ```
+
+2. **Short Code Lifetime**: 10-minute maximum (per W3C spec)
+   ```python
+   CODE_EXPIRATION = timedelta(minutes=10)  # Minimize attack window
+   ```
+
+3. **Single-Use Codes**: Immediately invalidate after use (detect replay attacks)
+   ```python
+   if code_data.get('used'):
+       logger.error(f"Code replay attack detected: {code[:8]}...")
+       raise HTTPException(status_code=400, detail="invalid_grant")
+   ```
+
+4. **Documentation**: Clearly document PKCE absence and planned support
+   - README security section
+   - Release notes
+   - Roadmap (v1.1.0 feature)
+
+5. **Logging**: Monitor for potential code interception attempts
+   ```python
+   if code_expired:
+       logger.warning(f"Expired code presented: {code[:8]}... (potential attack)")
+   ```
+
+**For v1.1.0 (Adding PKCE)**:
+
+1. **Backward Compatibility**: PKCE optional, not required
+   - Clients without PKCE continue working
+   - Clients with PKCE get enhanced security
+   - Gradual migration path
+
+2. **Client Detection**: Detect PKCE capability and encourage usage
+   ```python
+   if code_challenge is None:
+       logger.info(f"Client {client_id} not using PKCE (consider upgrading)")
+   ```
+
+3. **Future Enforcement**: Option to require PKCE in configuration
+   ```python
+   if config.REQUIRE_PKCE and not code_challenge:
+       raise HTTPException(status_code=400, detail="PKCE required")
+   ```
+
+### Implementation Plan for v1.1.0
+
+**Effort**: 1-2 days
+
+**Changes Required**:
+
+1. **Authorization Endpoint** (`/authorize`):
+   ```python
+   class AuthorizeRequest(BaseModel):
+       # ... existing fields ...
+       code_challenge: Optional[str] = None
+       code_challenge_method: Optional[Literal["S256"]] = None
+
+   # Validation
+   if code_challenge and code_challenge_method != "S256":
+       raise HTTPException(400, "Only S256 challenge method supported")
+
+   # Store challenge with code
+   code_data = {
+       # ... existing data ...
+       "code_challenge": code_challenge,
+       "code_challenge_method": code_challenge_method
+   }
+   ```
+
+2. **Token Endpoint** (`/token`):
+   ```python
+   class TokenRequest(BaseModel):
+       # ... existing fields ...
+       code_verifier: Optional[str] = None
+
+   # Validate PKCE
+   if code_data.get('code_challenge'):
+       if not code_verifier:
+           raise HTTPException(400, "code_verifier required")
+
+       # Verify S256(code_verifier) == code_challenge
+       import hashlib
+       import base64
+       computed = base64.urlsafe_b64encode(
+           hashlib.sha256(code_verifier.encode()).digest()
+       ).decode().rstrip('=')
+
+       if not secrets.compare_digest(computed, code_data['code_challenge']):
+           raise HTTPException(400, "Invalid code_verifier")
+   ```
+
+3. **Metadata Endpoint**:
+   ```json
+   {
+       "code_challenge_methods_supported": ["S256"]
+   }
+   ```
+
+4. **Tests**:
+   - PKCE flow success cases
+   - Invalid code_verifier rejection
+   - Missing code_verifier when challenge present
+   - Backward compatibility (no PKCE)
+
+### Risk Assessment
+
+**Attack Scenario Without PKCE**:
+1. Attacker performs MITM attack on authorization redirect (despite HTTPS)
+2. Attacker intercepts authorization code
+3. Attacker exchanges code for token (within 10-minute window)
+4. Attacker uses token to impersonate user
+
+**Likelihood**: Very Low
+- Requires MITM capability (difficult with proper TLS)
+- Requires attack during specific 10-minute window
+- Requires client_id and redirect_uri knowledge
+
+**Impact**: High
+- Complete account impersonation
+- Access to user's identity
+
+**Risk Level**: **Low** (Very Low likelihood × High impact = Low overall risk)
+
+**Acceptable for MVP?**: Yes
+- Controlled deployment (small user base)
+- Proper TLS mitigates primary attack vector
+- Short code lifetime limits exposure window
+- Clear upgrade path to full PKCE in v1.1.0
+
+### Monitoring and Review
+
+**v1.0.0 Deployment**:
+- Monitor logs for expired code presentations (potential interception attempts)
+- Track time between code generation and redemption
+- Document any security concerns from real-world usage
+
+**v1.1.0 Planning**:
+- Review security logs from v1.0.0 deployment
+- Prioritize PKCE based on actual risk observed
+- Implement with lessons learned from v1.0.0
+
+## References
+
+- RFC 7636 - PKCE: https://datatracker.ietf.org/doc/html/rfc7636
+- OAuth 2.0 Security Best Practices: https://datatracker.ietf.org/doc/html/draft-ietf-oauth-security-topics
+- W3C IndieAuth Specification: https://www.w3.org/TR/indieauth/
+- OWASP OAuth 2.0 Cheat Sheet: https://cheatsheetseries.owasp.org/cheatsheets/OAuth2_Cheat_Sheet.html
+
+## Alternatives Considered
+
+### Alternative 1: Implement PKCE in v1.0.0
+
+**Pros**:
+- Better security from day one
+- Follows modern OAuth 2.0 best practices
+- No perceived security weakness
+
+**Cons**:
+- Adds 1-2 days to development timeline
+- Increases testing surface area
+- More complexity for MVP
+- Not required for W3C spec compliance
+
+**Rejected**: Violates simplicity principle for MVP.
+
+### Alternative 2: Make PKCE Required (not optional) in v1.1.0
+
+**Pros**:
+- Forces clients to adopt best security practices
+- Simpler server logic (no conditional handling)
+
+**Cons**:
+- Breaking change for clients
+- Not backward compatible
+- W3C spec doesn't require PKCE
+
+**Rejected**: Breaks compatibility, not required by spec.
+
+### Alternative 3: Never Implement PKCE
+
+**Pros**:
+- Permanent simplicity
+- No additional development
+
+**Cons**:
+- Permanent security weakness
+- Not following industry best practices
+- May limit adoption by security-conscious users
+
+**Rejected**: Security should improve over time, not stagnate.
+
+## Decision History
+
+- 2025-11-20: Proposed (Architect)
+- 2025-11-20: Accepted (Architect)
+- TBD: Implementation in v1.1.0