Gondulf/docs/roadmap/v1.1.0.md
Phil Skentelbery 404d723ef8 fix(auth): make PKCE optional per ADR-003
PKCE was incorrectly required in the /authorize endpoint,
contradicting ADR-003 which defers PKCE to v1.1.0.

Changes:
- PKCE parameters are now optional in /authorize
- If code_challenge provided, validates method is S256
- Defaults to S256 if method not specified
- Logs when clients don't use PKCE for monitoring
- Updated tests for optional PKCE behavior

This fixes authentication for clients that don't implement PKCE.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-17 15:23:44 -07:00


# v1.1.0 Release Plan: Security & Production Hardening
- **Status**: Planning
- **Target Release**: Q1 2026
- **Duration**: 3-4 weeks (12-18 days)
- **Theme**: Mixed approach - 30% technical debt cleanup, 70% new features
- **Compatibility**: Backward compatible with v1.0.0, maintains single-process simplicity
## Goals
1. Address critical technical debt that could compound
2. Implement security best practices (PKCE, token revocation, refresh tokens)
3. Add production observability (Prometheus metrics)
4. Maintain backward compatibility with v1.0.0
5. Keep deployment simple (no Redis requirement)
## Success Criteria
- All technical debt items TD-001, TD-002, TD-003 resolved
- PKCE support implemented per ADR-003
- Token revocation and refresh functional
- Prometheus metrics available
- All tests passing with >90% coverage
- Zero breaking changes for v1.0.0 clients
- Documentation complete with migration guide
## Features & Technical Debt
### Phase 1: Technical Debt Cleanup (30% - 4-5 days)
#### TD-001: FastAPI Lifespan Migration
- **Effort**: <1 day
- **Priority**: P2
- **Type**: Technical Debt
- **Description**: Replace deprecated `@app.on_event()` decorators with modern lifespan handlers
- **Rationale**: Current implementation uses deprecated API that will break in future FastAPI versions
- **Impact**: Removes deprecation warnings, future-proofs codebase
- **Files Affected**: `src/gondulf/main.py`
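
A minimal sketch of the lifespan pattern that replaces the `@app.on_event()` decorators. The `events` list and the `ready` flag are illustrative only and are not taken from `src/gondulf/main.py`; a plain namespace object stands in for the FastAPI app so the sketch runs without the framework installed.

```python
import asyncio
from contextlib import asynccontextmanager
from types import SimpleNamespace
from typing import List

events: List[str] = []  # records call order, for demonstration only


@asynccontextmanager
async def lifespan(app):
    # Startup: runs once before the server accepts requests
    # (takes the place of @app.on_event("startup")).
    events.append("startup")
    app.state.ready = True
    try:
        yield
    finally:
        # Shutdown: runs once when the server stops
        # (takes the place of @app.on_event("shutdown")).
        app.state.ready = False
        events.append("shutdown")


# With FastAPI installed, the handler would be registered as:
#     app = FastAPI(lifespan=lifespan)


async def demo() -> List[str]:
    # Stand-in for a FastAPI app; only the `state` attribute is used here.
    fake_app = SimpleNamespace(state=SimpleNamespace())
    async with lifespan(fake_app):
        events.append("serving")
    return events
```

Keeping startup and shutdown in one function means that resources opened before the `yield` are released in the `finally` block, even if the application exits with an error.
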
#### TD-002: Alembic Database Migration System
- **Effort**: 1-2 days
- **Priority**: P2
- **Type**: Technical Debt
- **Description**: Replace custom migration system with Alembic
- **Rationale**: Current migrations are one-way only, no rollback capability
- **Impact**: Production deployment safety, standard migration tooling
- **Deliverables**:
  - Alembic configuration
  - Convert existing migrations to Alembic format
  - Migration rollback capability
  - Updated deployment documentation
#### TD-003: Async Email Support
- **Effort**: 1-2 days
- **Priority**: P2
- **Type**: Technical Debt
- **Description**: Replace synchronous SMTP with aiosmtplib
- **Rationale**: Current SMTP blocks request thread (1-5 sec delays during email sending)
- **Impact**: Improved UX, non-blocking email operations
- **Files Affected**: `src/gondulf/services/email_service.py`
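
A hedged sketch of what a non-blocking send could look like with `aiosmtplib`. The function names, subject line, and message wording are hypothetical and do not reflect the existing `email_service.py`; the only library call used is `aiosmtplib.send()` with its `hostname` and `port` keyword arguments.

```python
from email.message import EmailMessage


def build_verification_email(sender: str, recipient: str, code: str) -> EmailMessage:
    # Building the message is cheap and synchronous; only the network send is awaited.
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = "Your verification code"
    message.set_content(f"Your verification code is: {code}")
    return message


async def send_email(message: EmailMessage, host: str, port: int) -> None:
    # Imported lazily so this module still loads if the dependency is missing.
    import aiosmtplib

    # Awaiting the send yields to the event loop while SMTP I/O is in flight,
    # so other requests keep being served during the 1-5 second delivery.
    await aiosmtplib.send(message, hostname=host, port=port)
```

A request handler would call `await send_email(build_verification_email(...), host, port)` instead of the blocking `smtplib` call.
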
### Phase 2: Security Features (40% - 5-7 days)
#### PKCE Support (RFC 7636)
- **Effort**: 1-2 days
- **Priority**: P1
- **Type**: Feature
- **ADR**: ADR-003 explicitly defers PKCE to v1.1.0
- **Description**: Implement Proof Key for Code Exchange
- **Rationale**: OAuth 2.0 security best practice, protects against authorization code interception
- **Backward Compatible**: Yes (PKCE is optional, non-PKCE clients continue working)
- **Implementation**:
  - Accept `code_challenge` and `code_challenge_method` parameters in `/authorize`
  - Store code challenge with authorization code
  - Accept `code_verifier` parameter in `/token` endpoint
  - Validate that BASE64URL(SHA256(code_verifier)), without padding, matches the stored `code_challenge` (S256 method, RFC 7636 Section 4.6)
  - Update metadata endpoint to advertise PKCE support
- **Testing**: Comprehensive tests for S256 method, optional PKCE, validation failures
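
The S256 check described above can be sketched with the standard library alone. The function names are hypothetical. Rejecting a `code_verifier` when no challenge was stored is an assumption made for this sketch, not a decision recorded in this plan.

```python
import base64
import hashlib
import hmac
import secrets
from typing import Optional


def make_code_verifier() -> str:
    # Client side: 32 random bytes give a 43-character URL-safe string
    # (RFC 7636 allows 43-128 characters).
    return secrets.token_urlsafe(32)


def s256_challenge(code_verifier: str) -> str:
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    # Base64url without '=' padding, as RFC 7636 requires.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def verify_pkce(code_verifier: Optional[str], stored_challenge: Optional[str]) -> bool:
    if stored_challenge is None:
        # The client did not use PKCE at /authorize. That stays valid in v1.1.0,
        # but a verifier with no stored challenge is rejected (assumption).
        return code_verifier is None
    if code_verifier is None:
        # A challenge was stored, so the verifier is mandatory at /token.
        return False
    # Constant-time comparison of the recomputed and stored challenges.
    return hmac.compare_digest(s256_challenge(code_verifier), stored_challenge)
```

The test vector from RFC 7636 Appendix B (`dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk` hashing to `E9Melhoa2OwvFrEMTJguCHaoeK1t8URWbuGJSstw-cM`) is a convenient fixture for the S256 unit tests.
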
#### Token Revocation Endpoint (RFC 7009)
- **Effort**: 1-2 days
- **Priority**: P1
- **Type**: Feature
- **Description**: POST /token/revoke endpoint for revoking access and refresh tokens
- **Rationale**: Security improvement - allows clients to invalidate tokens
- **Backward Compatible**: Yes (new endpoint)
- **Implementation**:
  - POST `/token/revoke` endpoint
  - Accept `token` and `token_type_hint` parameters
  - Mark tokens as revoked in database
  - Update token verification to check revocation status
- **Testing**: Revoke access tokens, refresh tokens, invalid tokens, already-revoked tokens
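
A minimal sketch of the revocation bookkeeping, using an in-memory SQLite database. The `tokens` table, its columns, and storing a SHA-256 hash of the token are assumptions made for illustration; the actual Gondulf schema may differ. Per RFC 7009, the endpoint answers 200 even for unknown or already-revoked tokens.

```python
import hashlib
import sqlite3


def token_hash(token: str) -> str:
    # Store a hash rather than the raw token (assumed storage scheme).
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def create_schema(db: sqlite3.Connection) -> None:
    db.execute(
        "CREATE TABLE tokens ("
        "token_hash TEXT PRIMARY KEY, "
        "revoked INTEGER NOT NULL DEFAULT 0)"
    )


def store_token(db: sqlite3.Connection, token: str) -> None:
    db.execute("INSERT INTO tokens (token_hash) VALUES (?)", (token_hash(token),))
    db.commit()


def revoke_token(db: sqlite3.Connection, token: str) -> int:
    # The UPDATE is a no-op for unknown tokens; the response is 200 either way,
    # so the endpoint does not reveal whether a token ever existed.
    db.execute(
        "UPDATE tokens SET revoked = 1 WHERE token_hash = ?", (token_hash(token),)
    )
    db.commit()
    return 200


def is_token_active(db: sqlite3.Connection, token: str) -> bool:
    # Token verification must consult the revocation flag.
    row = db.execute(
        "SELECT revoked FROM tokens WHERE token_hash = ?", (token_hash(token),)
    ).fetchone()
    return row is not None and row[0] == 0
```
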
#### Token Refresh (RFC 6749 Section 6)
- **Effort**: 3-5 days
- **Priority**: P1
- **Type**: Feature
- **Description**: Implement refresh token grant type for long-lived sessions
- **Rationale**: Standard OAuth 2.0 feature, enables long-lived sessions without re-authentication
- **Backward Compatible**: Yes (optional feature, clients must opt-in)
- **Implementation**:
  - Generate refresh tokens alongside access tokens
  - Store refresh tokens in database with expiration (30-90 days)
  - Accept `grant_type=refresh_token` in `/token` endpoint
  - Implement refresh token rotation (security best practice)
  - Update metadata endpoint
- **Testing**: Token refresh flow, rotation, expiration, revocation
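
Refresh token rotation can be sketched as follows. This uses an in-memory dictionary instead of the database the plan calls for, and the class names, the 30-day TTL, and the opaque token format are all assumptions chosen for illustration.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

REFRESH_TTL_SECONDS = 30 * 24 * 3600  # lower end of the 30-90 day range


@dataclass
class RefreshRecord:
    subject: str
    expires_at: float
    used: bool = False


class RefreshStore:
    def __init__(self) -> None:
        self._records: Dict[str, RefreshRecord] = {}

    def issue(self, subject: str, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(32)
        self._records[token] = RefreshRecord(subject, now + REFRESH_TTL_SECONDS)
        return token

    def rotate(
        self, token: str, now: Optional[float] = None
    ) -> Optional[Tuple[str, str]]:
        """Return (access_token, new_refresh_token), or None if rejected."""
        now = time.time() if now is None else now
        record = self._records.get(token)
        if record is None or record.used or record.expires_at <= now:
            return None
        # Rotation: each refresh token is single-use, so a replayed token fails.
        record.used = True
        access_token = secrets.token_urlsafe(32)
        return access_token, self.issue(record.subject, now)
```

A stricter variant could also revoke every token descended from a refresh token that is presented twice, since reuse suggests the token leaked.
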
### Phase 3: Operational Features (30% - 3-4 days)
#### Prometheus Metrics Endpoint
- **Effort**: 1-2 days
- **Priority**: P2
- **Type**: Feature
- **Description**: /metrics endpoint exposing Prometheus-compatible metrics
- **Rationale**: Production observability, monitoring, alerting
- **Backward Compatible**: Yes (new endpoint)
- **Metrics**:
  - HTTP request counters (by endpoint, method, status code)
  - Response time histograms
  - Active authorization sessions
  - Token issuance/verification counters
  - Error rates by type
  - Database connection pool stats
- **Implementation**: Use prometheus_client library
- **Testing**: Metrics accuracy, format compliance
#### Testing & Documentation
- **Effort**: 2-3 days
- **Priority**: P1
- **Type**: Quality Assurance
- **Deliverables**:
  - Unit tests for all new features (>90% coverage maintained)
  - Integration tests for PKCE, revocation, refresh flows
  - Update API documentation
  - Migration guide: v1.0.0 → v1.1.0
  - Update deployment documentation
  - Changelog for v1.1.0
## Deferred to Future Releases
### v1.2.0 Candidates
- **Rate Limiting** - Requires Redis, breaks single-process simplicity
  - Defer until scaling beyond single process is needed
  - Will require Redis dependency decision
- **Redis Session Storage** (TD-004) - Not critical yet
  - Current in-memory storage works for single process
  - Codes are short-lived (10-15 min), minimal impact from restarts
- **Admin Dashboard** - Lower priority operational feature
- **PostgreSQL Support** - SQLite sufficient for target scale
### v2.0.0 Considerations
Not committing to v2.0.0 scope yet. Will evaluate after v1.1.0 and v1.2.0 to determine if breaking changes are needed.
Potential v2.0.0 candidates (breaking changes):
- Scope-based authorization (full OAuth 2.0 authz server)
- JWT tokens (instead of opaque tokens)
- Required PKCE (breaking for non-PKCE clients)
## Technical Debt Status
After v1.1.0, remaining technical debt:
- **TD-004: Redis for Session Storage** (deferred to when scaling needed)
All other critical technical debt will be resolved.
## Dependencies
### External Dependencies Added
- `aiosmtplib` - Async SMTP client
- `alembic` - Database migration tool
- `prometheus_client` - Metrics library
### Breaking Changes
**None** - v1.1.0 is fully backward compatible with v1.0.0
## Release Checklist
- [ ] Phase 1: Technical debt cleanup complete
  - [ ] TD-001: FastAPI lifespan migration
  - [ ] TD-002: Alembic integration
  - [ ] TD-003: Async email support
- [ ] Phase 2: Security features complete
  - [ ] PKCE support implemented and tested
  - [ ] Token revocation endpoint functional
  - [ ] Token refresh flow working
- [ ] Phase 3: Operational features complete
  - [ ] Prometheus metrics endpoint
  - [ ] Documentation updated
  - [ ] Migration guide written
- [ ] All tests passing (>90% coverage)
- [ ] Security audit passed
- [ ] Real client testing (PKCE-enabled clients)
- [ ] Performance testing (async email, metrics overhead)
- [ ] Docker image built and tested
- [ ] Release notes written
- [ ] Tag v1.1.0 and push to registry
## Timeline
- **Week 1**: Technical debt cleanup (TD-001, TD-002, TD-003)
- **Week 2**: PKCE support + Token revocation
- **Week 3**: Token refresh implementation
- **Week 4**: Prometheus metrics + Testing + Documentation
## Risk Assessment
**Low Risk** - All changes are additive and backward compatible
Potential risks:
- Alembic migration conversion complexity (mitigation: thorough testing)
- PKCE validation edge cases (mitigation: comprehensive test suite)
- Refresh token security (mitigation: implement rotation best practices)
## Version Compatibility
- **v1.0.0 clients**: Fully compatible, no changes required
- **New features**: Opt-in (PKCE, refresh tokens)
- **Deployment**: Drop-in replacement, run migrations, no config changes required (unless using new features)
## References
- ADR-003: PKCE Deferred to v1.1.0
- RFC 7636: Proof Key for Code Exchange (PKCE)
- RFC 7009: OAuth 2.0 Token Revocation
- RFC 6749: OAuth 2.0 Framework (Refresh Tokens)
- Technical Debt Backlog: `/docs/roadmap/backlog.md`