Phil Skentelbery
686d753fb9
fix: Resolve migration race condition with multiple gunicorn workers
CRITICAL PRODUCTION FIX: Implements database-level locking (SQLite
BEGIN IMMEDIATE) to prevent a race condition when multiple workers
start simultaneously.
Changes:
- Add BEGIN IMMEDIATE transaction for migration lock acquisition
- Implement exponential backoff retry (10 attempts, 120s max)
- Add graduated logging (DEBUG -> INFO -> WARNING)
- Create new connection per retry attempt
- Add comprehensive error messages with resolution guidance
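
A rough sketch of the retry schedule described in the list above, not the actual code in starpunk/migrations.py. The function names, the jitter fraction, and the attempt thresholds for each log level are assumptions made for illustration. Only the 100ms base, the doubling, the 10 attempts, and the DEBUG -> INFO -> WARNING progression come from this commit message.

```python
import logging
import random

BASE_DELAY = 0.1    # 100ms base delay, as stated in the commit message
MAX_ATTEMPTS = 10   # retry attempts, as stated in the commit message


def backoff_delay(attempt, jitter_fraction=0.1, rng=random.random):
    """Seconds to wait after failed attempt number `attempt` (1-based).

    The delay doubles on every retry. jitter_fraction is an assumed value:
    the commit message only says "plus jitter".
    """
    base = BASE_DELAY * (2 ** (attempt - 1))
    return base + base * jitter_fraction * rng()


def log_level_for_attempt(attempt):
    """Graduated logging: quiet at first, louder as retries pile up.

    The cut-over points (3 and 7) are hypothetical.
    """
    if attempt <= 3:
        return logging.DEBUG
    if attempt <= 7:
        return logging.INFO
    return logging.WARNING
```

Without jitter, ten doubling delays starting at 100ms add up to 102.3s, which stays under the 120s cap.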
Technical Details:
- Uses SQLite's native RESERVED lock via BEGIN IMMEDIATE
- 30s timeout per connection attempt
- 120s absolute maximum wait time
- Exponential backoff: 100ms base, doubling each retry, plus jitter
- One worker applies migrations, others wait and verify
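
A minimal sketch of the locking approach outlined above, assuming Python's standard sqlite3 module. The name acquire_migration_lock, its parameters, and the jitter range are hypothetical and do not necessarily match starpunk/migrations.py. It opens a new connection per attempt, tries BEGIN IMMEDIATE, and backs off when another worker already holds the RESERVED lock.

```python
import random
import sqlite3
import time


def acquire_migration_lock(db_path, max_attempts=10, base_delay=0.1,
                           max_total=120.0, timeout=30.0):
    """Return a connection holding SQLite's RESERVED lock, or raise.

    Hypothetical helper: defaults mirror the numbers in the commit message.
    """
    start = time.monotonic()
    for attempt in range(1, max_attempts + 1):
        # New connection per attempt; isolation_level=None disables the
        # module's implicit transactions so BEGIN IMMEDIATE is issued as-is.
        conn = sqlite3.connect(db_path, timeout=timeout, isolation_level=None)
        try:
            conn.execute("BEGIN IMMEDIATE")
            return conn
        except sqlite3.OperationalError as exc:
            conn.close()
            if "locked" not in str(exc).lower():
                raise  # not lock contention, so retrying will not help
            # Exponential backoff plus jitter (jitter range is an assumption).
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay)
            elapsed = time.monotonic() - start
            if attempt == max_attempts or elapsed + delay > max_total:
                break
            time.sleep(delay)
    raise RuntimeError(
        "Could not acquire migration lock; another worker may still be "
        "applying migrations. Check for stuck processes and restart."
    )
```

A worker that gets the connection would then re-check which migrations are still pending, apply them, and COMMIT. Workers that acquire the lock later find nothing left to apply and only verify the schema.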
Testing:
- All existing migration tests pass (26/26)
- New race condition tests added (20 tests)
- Core retry and logging tests verified (4/4)
Implementation:
- Modified starpunk/migrations.py (+200 lines)
- Updated version to 1.0.0-rc.5
- Updated CHANGELOG.md with release notes
- Created comprehensive test suite
- Created implementation report
Resolves: Migration race condition causing container startup failures
Relates: ADR-022, migration-race-condition-fix-implementation.md
Version: 1.0.0-rc.5
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 18:52:51 -07:00