# v1.1.0 Release Plan: Security & Production Hardening

**Status**: Planning

**Target Release**: Q1 2026

**Duration**: 3-4 weeks (12-18 days)

**Theme**: Mixed approach: 30% technical debt cleanup, 70% new features

**Compatibility**: Backward compatible with v1.0.0; maintains single-process simplicity

## Goals

1. Resolve critical technical debt before it compounds
2. Implement security best practices (PKCE, token revocation, refresh tokens)
3. Add production observability (Prometheus metrics)
4. Maintain backward compatibility with v1.0.0
5. Keep deployment simple (no Redis requirement)

## Success Criteria

- All technical debt items TD-001, TD-002, TD-003 resolved
- PKCE support implemented per ADR-003
- Token revocation and refresh functional
- Prometheus metrics available
- All tests passing with >90% coverage
- Zero breaking changes for v1.0.0 clients
- Documentation complete with migration guide

## Features & Technical Debt

### Phase 1: Technical Debt Cleanup (30%, 4-5 days)

#### TD-001: FastAPI Lifespan Migration

- **Effort**: <1 day
- **Priority**: P2
- **Type**: Technical Debt
- **Description**: Replace the deprecated `@app.on_event()` startup/shutdown decorators with a lifespan handler passed to `FastAPI(lifespan=...)`
- **Rationale**: The current implementation uses a deprecated API that will break in a future FastAPI version
- **Impact**: Removes deprecation warnings, future-proofs codebase
- **Files Affected**: `src/gondulf/main.py`
#### TD-002: Alembic Database Migration System

- **Effort**: 1-2 days
- **Priority**: P2
- **Type**: Technical Debt
- **Description**: Replace the custom migration system with Alembic
- **Rationale**: Current migrations are one-way only, with no rollback capability
- **Impact**: Safer production deployments, standard migration tooling
- **Deliverables**:
  - Alembic configuration
  - Existing migrations converted to Alembic format
  - Migration rollback capability
  - Updated deployment documentation
#### TD-003: Async Email Support

- **Effort**: 1-2 days
- **Priority**: P2
- **Type**: Technical Debt
- **Description**: Replace synchronous SMTP with aiosmtplib
- **Rationale**: The current synchronous SMTP call blocks the request thread, adding 1-5 second delays while email is sent
- **Impact**: Improved UX, non-blocking email operations
- **Files Affected**: `src/gondulf/services/email_service.py`
### Phase 2: Security Features (40%, 5-7 days)

#### PKCE Support (RFC 7636)

- **Effort**: 1-2 days
- **Priority**: P1
- **Type**: Feature
- **ADR**: ADR-003 explicitly defers PKCE to v1.1.0
- **Description**: Implement Proof Key for Code Exchange
- **Rationale**: OAuth 2.0 security best practice; protects against authorization code interception
- **Backward Compatible**: Yes (PKCE is optional; non-PKCE clients continue working)
- **Implementation**:
  - Accept `code_challenge` and `code_challenge_method` parameters in /authorize
  - Store the code challenge with the authorization code
  - Accept the `code_verifier` parameter in the /token endpoint
  - Validate that BASE64URL(SHA256(code_verifier)) matches the stored `code_challenge`
  - Update the metadata endpoint to advertise PKCE support
- **Testing**: Comprehensive tests for the S256 method, optional PKCE, and validation failures
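
The /token check can be sketched with the standard library alone. This sketch assumes only the S256 method is accepted. It also assumes the stored challenge is `None` when the client skipped PKCE. `s256_challenge` and `verify_pkce` are hypothetical helper names.

```python
import base64
import hashlib
import hmac
import re
from typing import Optional

# RFC 7636 section 4.1: 43-128 characters from the unreserved set.
_VERIFIER_RE = re.compile(r"[A-Za-z0-9\-._~]{43,128}")


def s256_challenge(code_verifier: str) -> str:
    """Return BASE64URL(SHA256(ASCII(code_verifier))) with '=' padding removed."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def verify_pkce(code_verifier: Optional[str], stored_challenge: Optional[str]) -> bool:
    """Decide whether a /token request satisfies PKCE for the code being redeemed."""
    if stored_challenge is None:
        # PKCE stays optional in v1.1.0: a code issued without a challenge
        # is redeemable only if no verifier is sent either.
        return code_verifier is None
    if code_verifier is None or not _VERIFIER_RE.fullmatch(code_verifier):
        return False
    # Constant-time comparison of the recomputed and stored challenges.
    return hmac.compare_digest(
        s256_challenge(code_verifier).encode("ascii"),
        stored_challenge.encode("utf-8"),
    )
```

RFC 7636 Appendix B gives a fixed pair that works well as a unit-test vector: `s256_challenge("dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk")` returns `"E9Melhoa2OwvFrEMTJguCHaoeK1t8URWbuGJSstw-cM"`. The sketch also rejects a verifier sent for a code that was issued without a challenge, which guards against PKCE downgrade attempts. For the metadata item, RFC 8414 defines `code_challenge_methods_supported`, which would be `["S256"]` here.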
#### Token Revocation Endpoint (RFC 7009)

- **Effort**: 1-2 days
- **Priority**: P1
- **Type**: Feature
- **Description**: POST /token/revoke endpoint for revoking access and refresh tokens
- **Rationale**: Security improvement: allows clients to invalidate tokens before they expire
- **Backward Compatible**: Yes (new endpoint)
- **Implementation**:
  - POST /token/revoke endpoint
  - Accept `token` and (optional) `token_type_hint` parameters
  - Mark tokens as revoked in the database
  - Update token verification to check revocation status
- **Testing**: Revoke access tokens, refresh tokens, invalid tokens, already-revoked tokens
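
This is a sketch of the revocation bookkeeping. It assumes a hypothetical `tokens` table keyed by a SHA-256 hash of each token. The helpers `hash_token`, `revoke`, and `is_token_active` are illustrative, and HTTP handling and client authentication are left out. Per RFC 7009, the endpoint answers 200 even for unknown or already-revoked tokens. The server may also ignore `token_type_hint` when it can locate the token on its own.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone
from typing import Optional

# Hypothetical schema: tokens are stored as SHA-256 hashes, never in plain text.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tokens (
    token_hash TEXT PRIMARY KEY,
    token_type TEXT NOT NULL,
    revoked_at TEXT
)
"""


def hash_token(token: str) -> str:
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def revoke(conn: sqlite3.Connection, token: str, token_type_hint: Optional[str] = None) -> int:
    """Process a POST /token/revoke body and return the HTTP status code.

    Lookup by hash finds access and refresh tokens alike, so token_type_hint
    is accepted but not needed. Unknown or already-revoked tokens are a no-op.
    """
    conn.execute(
        "UPDATE tokens SET revoked_at = ? WHERE token_hash = ? AND revoked_at IS NULL",
        (datetime.now(timezone.utc).isoformat(), hash_token(token)),
    )
    conn.commit()
    return 200  # RFC 7009: success response even when nothing was revoked


def is_token_active(conn: sqlite3.Connection, token: str) -> bool:
    """Revocation check for token verification (expiry checks omitted)."""
    row = conn.execute(
        "SELECT revoked_at FROM tokens WHERE token_hash = ?", (hash_token(token),)
    ).fetchone()
    return row is not None and row[0] is None
```

RFC 7009 also says that revoking a refresh token SHOULD invalidate the access tokens issued under the same grant. This sketch does not model that relationship.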
#### Token Refresh (RFC 6749 Section 6)

- **Effort**: 3-5 days
- **Priority**: P1
- **Type**: Feature
- **Description**: Implement the refresh token grant type for long-lived sessions
- **Rationale**: Standard OAuth 2.0 feature; enables long-lived sessions without re-authentication
- **Backward Compatible**: Yes (optional feature; clients must opt in)
- **Implementation**:
  - Generate refresh tokens alongside access tokens
  - Store refresh tokens in the database with an expiration (30-90 days)
  - Accept `grant_type=refresh_token` in the /token endpoint
  - Implement refresh token rotation (security best practice)
  - Update the metadata endpoint to advertise the `refresh_token` grant type
- **Testing**: Token refresh flow, rotation, expiration, revocation
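
This in-memory sketch of rotation assumes each refresh token belongs to a "family" created by one authorization. The reuse rule is a common hardening choice layered on plain rotation, not something this plan specifies: a rotated token presented again revokes its whole family. `RefreshStore`, `subject`, and the 30-day TTL are illustrative.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional, Set, Tuple

REFRESH_TTL = timedelta(days=30)  # lower end of the plan's 30-90 day range


@dataclass
class RefreshRecord:
    family_id: str  # shared by every token rotated from one authorization
    subject: str  # hypothetical: the identity the tokens were issued for
    expires_at: datetime
    used: bool = False


class RefreshStore:
    """In-memory stand-in for the database table the plan calls for."""

    def __init__(self) -> None:
        self._tokens: Dict[str, RefreshRecord] = {}
        self._revoked_families: Set[str] = set()

    def issue(self, subject: str, family_id: Optional[str] = None) -> str:
        token = secrets.token_urlsafe(32)
        self._tokens[token] = RefreshRecord(
            family_id=family_id or secrets.token_hex(8),
            subject=subject,
            expires_at=datetime.now(timezone.utc) + REFRESH_TTL,
        )
        return token

    def rotate(self, token: str) -> Optional[Tuple[str, str]]:
        """Handle grant_type=refresh_token; return (access, refresh) or None."""
        record = self._tokens.get(token)
        if record is None or record.family_id in self._revoked_families:
            return None
        if record.used:
            # A rotated token came back: assume it leaked and revoke the family.
            self._revoked_families.add(record.family_id)
            return None
        if record.expires_at <= datetime.now(timezone.utc):
            return None
        record.used = True  # each refresh token is single-use
        new_refresh = self.issue(record.subject, record.family_id)
        new_access = secrets.token_urlsafe(32)  # placeholder for real access-token issuance
        return new_access, new_refresh
```

With rotation on every use, a stolen refresh token stays valid only until the legitimate client refreshes. After that, whichever party presents the old token triggers revocation of the entire family.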
### Phase 3: Operational Features (30%, 3-4 days)

#### Prometheus Metrics Endpoint

- **Effort**: 1-2 days
- **Priority**: P2
- **Type**: Feature
- **Description**: /metrics endpoint exposing Prometheus-compatible metrics
- **Rationale**: Production observability, monitoring, alerting
- **Backward Compatible**: Yes (new endpoint)
- **Metrics**:
  - HTTP request counters (by endpoint, method, status code)
  - Response time histograms
  - Active authorization sessions
  - Token issuance/verification counters
  - Error rates by type
  - Database connection pool stats
- **Implementation**: Use the `prometheus_client` library
- **Testing**: Metrics accuracy, format compliance
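
The sketch below covers a subset of the listed metrics using `prometheus_client`. Metric names, label sets, and the `record_request`/`render_metrics` helpers are assumptions, not an agreed naming scheme.

```python
from typing import Tuple

from prometheus_client import (
    CONTENT_TYPE_LATEST,
    CollectorRegistry,
    Counter,
    Gauge,
    Histogram,
    generate_latest,
)

# A dedicated registry keeps these metrics separate from the process defaults.
REGISTRY = CollectorRegistry()

HTTP_REQUESTS = Counter(
    "gondulf_http_requests",  # exposed as gondulf_http_requests_total
    "HTTP requests by endpoint, method and status code",
    ["endpoint", "method", "status"],
    registry=REGISTRY,
)
HTTP_LATENCY = Histogram(
    "gondulf_http_request_duration_seconds",
    "HTTP response time in seconds",
    ["endpoint"],
    registry=REGISTRY,
)
ACTIVE_SESSIONS = Gauge(
    "gondulf_active_authorization_sessions",
    "Authorization sessions currently held in memory",
    registry=REGISTRY,
)
TOKENS_ISSUED = Counter(
    "gondulf_tokens_issued",
    "Tokens issued by grant type",
    ["grant_type"],
    registry=REGISTRY,
)


def record_request(endpoint: str, method: str, status: int, seconds: float) -> None:
    # Pass the route template (e.g. "/token"), not the raw URL, so that
    # label cardinality stays bounded.
    HTTP_REQUESTS.labels(endpoint=endpoint, method=method, status=str(status)).inc()
    HTTP_LATENCY.labels(endpoint=endpoint).observe(seconds)


def render_metrics() -> Tuple[bytes, str]:
    """Body and media type for a GET /metrics response."""
    return generate_latest(REGISTRY), CONTENT_TYPE_LATEST
```

A /metrics route would return the bytes from `render_metrics()` with the accompanying media type. Request timing could be fed into `record_request` from a middleware.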
#### Testing & Documentation

- **Effort**: 2-3 days
- **Priority**: P1
- **Type**: Quality Assurance
- **Deliverables**:
  - Unit tests for all new features (>90% coverage maintained)
  - Integration tests for PKCE, revocation, and refresh flows
  - Updated API documentation
  - Migration guide: v1.0.0 → v1.1.0
  - Updated deployment documentation
  - Changelog for v1.1.0

## Deferred to Future Releases

### v1.2.0 Candidates

- **Rate Limiting**: requires Redis, which breaks single-process simplicity
  - Defer until scaling beyond a single process is needed
  - Will require a decision on the Redis dependency
- **Redis Session Storage** (TD-004): not critical yet
  - Current in-memory storage works for a single process
  - Codes are short-lived (10-15 min), so restarts have minimal impact
- **Admin Dashboard**: lower-priority operational feature
- **PostgreSQL Support**: SQLite is sufficient for the target scale

### v2.0.0 Considerations

The v2.0.0 scope is not yet committed. It will be evaluated after v1.1.0 and v1.2.0 to determine whether breaking changes are needed.

Potential v2.0.0 candidates (breaking changes):

- Scope-based authorization (full OAuth 2.0 authorization server)
- JWT tokens (instead of opaque tokens)
- Required PKCE (breaking for non-PKCE clients)

## Technical Debt Status

After v1.1.0, the remaining technical debt will be:

- **TD-004: Redis for Session Storage** (deferred until scaling is needed)

All other critical technical debt will be resolved.

## Dependencies

### External Dependencies Added

- `aiosmtplib`: async SMTP client
- `alembic`: database migration tool
- `prometheus_client`: metrics library

### Breaking Changes

**None**: v1.1.0 is fully backward compatible with v1.0.0

## Release Checklist

- [ ] Phase 1: Technical debt cleanup complete
  - [ ] TD-001: FastAPI lifespan migration
  - [ ] TD-002: Alembic integration
  - [ ] TD-003: Async email support
- [ ] Phase 2: Security features complete
  - [ ] PKCE support implemented and tested
  - [ ] Token revocation endpoint functional
  - [ ] Token refresh flow working
- [ ] Phase 3: Operational features complete
  - [ ] Prometheus metrics endpoint
- [ ] Documentation updated
- [ ] Migration guide written
- [ ] All tests passing (>90% coverage)
- [ ] Security audit passed
- [ ] Real client testing (PKCE-enabled clients)
- [ ] Performance testing (async email, metrics overhead)
- [ ] Docker image built and tested
- [ ] Release notes written
- [ ] Tag v1.1.0 and push to registry

## Timeline

**Week 1**: Technical debt cleanup (TD-001, TD-002, TD-003)

**Week 2**: PKCE support + token revocation

**Week 3**: Token refresh implementation

**Week 4**: Prometheus metrics + testing + documentation

## Risk Assessment

**Low Risk**: all changes are additive and backward compatible.

Potential risks:

- Alembic migration conversion complexity (mitigation: thorough testing)
- PKCE validation edge cases (mitigation: comprehensive test suite)
- Refresh token security (mitigation: implement rotation best practices)

## Version Compatibility

- **v1.0.0 clients**: Fully compatible, no changes required
- **New features**: Opt-in (PKCE, refresh tokens)
- **Deployment**: Drop-in replacement; run database migrations; no configuration changes are required unless new features are used

## References

- ADR-003: PKCE Deferred to v1.1.0
- RFC 7636: Proof Key for Code Exchange (PKCE)
- RFC 7009: OAuth 2.0 Token Revocation
- RFC 6749: The OAuth 2.0 Authorization Framework (Section 6, Refreshing an Access Token)
- Technical Debt Backlog: `/docs/roadmap/backlog.md`