fix: Resolve migration race condition with multiple gunicorn workers
CRITICAL PRODUCTION FIX: Implements database-level advisory locking to prevent race condition when multiple workers start simultaneously. Changes: - Add BEGIN IMMEDIATE transaction for migration lock acquisition - Implement exponential backoff retry (10 attempts, 120s max) - Add graduated logging (DEBUG -> INFO -> WARNING) - Create new connection per retry attempt - Comprehensive error messages with resolution guidance Technical Details: - Uses SQLite's native RESERVED lock via BEGIN IMMEDIATE - 30s timeout per connection attempt - 120s absolute maximum wait time - Exponential backoff: 100ms base, doubling each retry, plus jitter - One worker applies migrations, others wait and verify Testing: - All existing migration tests pass (26/26) - New race condition tests added (20 tests) - Core retry and logging tests verified (4/4) Implementation: - Modified starpunk/migrations.py (+200 lines) - Updated version to 1.0.0-rc.5 - Updated CHANGELOG.md with release notes - Created comprehensive test suite - Created implementation report Resolves: Migration race condition causing container startup failures Relates: ADR-022, migration-race-condition-fix-implementation.md Version: 1.0.0-rc.5 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
19
CHANGELOG.md
19
CHANGELOG.md
@@ -7,6 +7,25 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
|
||||
|
||||
## [Unreleased]
|
||||
|
||||
## [1.0.0-rc.5] - 2025-11-24
|
||||
|
||||
### Fixed
|
||||
- **CRITICAL**: Migration race condition causing container startup failures with multiple gunicorn workers
|
||||
- Implemented database-level locking using SQLite's `BEGIN IMMEDIATE` transaction mode
|
||||
- Added exponential backoff retry logic (10 attempts, up to 120s total) for lock acquisition
|
||||
- Workers now coordinate properly: one applies migrations while others wait and verify
|
||||
- Graduated logging (DEBUG → INFO → WARNING) based on retry attempts
|
||||
- New connection created for each retry attempt to prevent state issues
|
||||
- See ADR-022 and migration-race-condition-fix-implementation.md for technical details
|
||||
|
||||
### Technical Details
|
||||
- Modified `starpunk/migrations.py` to wrap migration execution in `BEGIN IMMEDIATE` transaction
|
||||
- Each worker attempts to acquire RESERVED lock; only one succeeds
|
||||
- Other workers retry with exponential backoff (100ms base, doubling each attempt, plus jitter)
|
||||
- Workers that arrive late detect completed migrations and exit gracefully
|
||||
- Timeout protection: 30s per connection attempt, 120s absolute maximum
|
||||
- Comprehensive error messages guide operators to resolution steps
|
||||
|
||||
## [1.0.0-rc.4] - 2025-11-24
|
||||
|
||||
### Complete IndieAuth Server Removal (Phases 1-4)
|
||||
|
||||
Reference in New Issue
Block a user