# 0005. Database Migrations with Alembic

Date: 2025-12-22

## Status

Accepted

## Context

Sneaky Klaus uses SQLite as its database (see ADR-0001). As the application evolves, the database schema needs to change to support new features.

There are two primary approaches to managing database schema:

1. **db.create_all()**: SQLAlchemy's create_all() method creates tables based on current model definitions. Simple but has critical limitations:
   - Cannot modify existing tables (add/remove columns, change types)
   - Cannot migrate data during schema changes
   - No version tracking or rollback capability
   - Unsafe for databases with existing data

2. **Schema Migrations**: Tools like Alembic track schema changes as versioned migration files:
   - Supports incremental schema changes (add columns, modify constraints, etc.)
   - Enables data migrations during schema evolution
   - Provides version tracking and rollback capability
   - Safe for production databases with existing data

Key considerations:

- Phase 1 (v0.1.0) established Admin and Exchange models
- Phase 2 (v0.2.0) adds Participant and MagicToken models
- Future phases will continue evolving the schema
- Self-hosted deployments may have persistent data from day one
- Users may skip versions or upgrade incrementally

The question is: when should we start using proper database migrations?

## Decision

We will use **Alembic for all database schema changes starting from Phase 2 (v0.2.0)** onward.

Specifically:

1. **Alembic is already configured** in the codebase (alembic.ini, migrations/ directory)
2. **An initial migration already exists** for Admin and Exchange models (created in Phase 1)
3. **All new models and schema changes** will be managed through Alembic migrations
4. **db.create_all() must not be used** for schema creation in production environments

### Migration Workflow

For all schema changes:

1. Modify SQLAlchemy models in `src/models/`
2. Generate migration: `uv run alembic revision --autogenerate -m "description"`
3. Review the generated migration file in `migrations/versions/`
4. Test the migration (upgrade and downgrade paths)
5. Commit the migration file with model changes
6. Apply in deployments: `uv run alembic upgrade head`

### Naming Conventions

Migration messages should be:

- Descriptive and imperative: "Add Participant model", "Add email index to Participant"
- Under 80 characters
- Use lowercase except for model/table names
- Examples:
  - "Add Participant and MagicToken models"
  - "Add withdrawn_at column to Participant"
  - "Create composite index on exchange_id and email"

### Testing Requirements

Every migration must be tested for:

1. **Upgrade path**: `alembic upgrade head` succeeds
2. **Downgrade path**: `alembic downgrade -1` and `alembic upgrade head` both succeed
3. **Schema correctness**: Database schema matches SQLAlchemy model definitions
4. **Application compatibility**: All tests pass after migration

### Handling Existing Databases

For databases created with db.create_all() before migrations were established:

**Option 1 - Stamp (preserves data)**:

```bash
uv run alembic stamp head
```

This marks the database as being at the current migration version without running migrations.

**Option 2 - Recreate (development only)**:

```bash
rm data/sneaky-klaus.db
uv run alembic upgrade head
```

This creates a fresh database from migrations. Only suitable for development.

### Removing db.create_all()

The `db.create_all()` call currently in `src/app.py` should be:

- Removed from production code paths
- Only used in test fixtures where appropriate
- Never used for schema initialization in deployments

Production deployments must use `alembic upgrade head` for schema initialization and updates.

### Automatic Migrations for Self-Hosted Deployments

For self-hosted deployments using containers, migrations must be applied automatically when the container starts.
This ensures that:

- Users pulling new container images automatically get schema updates
- No manual migration commands required
- Schema is always in sync with application code
- First-run deployments get proper schema initialization

**Implementation Approach: Container Entrypoint Script**

An entrypoint script runs `alembic upgrade head` before starting the application server. This approach is chosen because:

- **Timing**: Migrations run before application starts, avoiding race conditions
- **Separation of concerns**: Database initialization is separate from application startup
- **Clear error handling**: Migration failures prevent application startup
- **Standard pattern**: Common practice for containerized applications with databases
- **Works with gunicorn**: Gunicorn workers don't need to coordinate migrations

**Entrypoint Script Responsibilities**:

1. Run `alembic upgrade head` to apply all pending migrations
2. Log migration status (success or failure)
3. Exit with error if migrations fail (preventing container startup)
4. Start application server (gunicorn) if migrations succeed

**Implementation**:

```bash
#!/bin/bash
set -e  # Exit on any error

echo "Running database migrations..."
if uv run alembic upgrade head; then
    echo "Database migrations completed successfully"
else
    echo "ERROR: Database migration failed!"
    echo "Please check the logs above for details."
    exit 1
fi

echo "Starting application server..."
exec gunicorn --bind 0.0.0.0:8000 --workers 2 --threads 4 main:app
```

**Error Handling**:

- Migration failures are logged to stderr
- Container exits with code 1 on migration failure
- Container orchestrator (podman/docker compose) will show failed state
- Users can inspect logs with `podman logs sneaky-klaus` or `docker logs sneaky-klaus`

**Containerfile Changes**:

- Copy entrypoint script: `COPY entrypoint.sh /app/entrypoint.sh`
- Make executable: `RUN chmod +x /app/entrypoint.sh`
- Change CMD to use entrypoint: `CMD ["/app/entrypoint.sh"]`

**First-Run Initialization**:

When no database exists, `alembic upgrade head` will:

1. Create the database file (SQLite)
2. Create the `alembic_version` table to track migration state
3. Run all migrations from scratch
4. Leave database in up-to-date state

**Update Scenarios**:

When updating to a new container image with schema changes:

1. Container starts and runs entrypoint script
2. Alembic detects current schema version from `alembic_version` table
3. Applies only new migrations (incremental upgrade)
4. Application starts with updated schema

**Development Workflow**:

For local development (non-containerized), developers continue to run migrations manually:

```bash
uv run alembic upgrade head
```

This gives developers explicit control over when migrations run during development.
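
**Illustration: Inspecting Migration State**

The first-run, update, and pre-Alembic (`db.create_all()`) cases described in this ADR differ only in what the SQLite file already contains. The sketch below is not part of the Sneaky Klaus codebase; it uses only Python's standard `sqlite3` module to tell the cases apart. The function name `classify_database` and the state labels are hypothetical and exist only for illustration. The `alembic_version` table and its `version_num` column are what Alembic itself maintains.

```python
import sqlite3
from pathlib import Path
from typing import Optional, Tuple


def classify_database(db_path: str) -> Tuple[str, Optional[str]]:
    """Return (state, revision) for a SQLite database file.

    The state labels are hypothetical, chosen for this illustration:
      "fresh"       - no file or no tables; `alembic upgrade head` builds everything
      "unversioned" - application tables exist but Alembic is not tracking them,
                      as after db.create_all(); a candidate for `alembic stamp head`
      "versioned"   - Alembic has recorded a revision for this database
    """
    # sqlite3.connect() would create a missing file, so check for it first.
    if not Path(db_path).exists():
        return ("fresh", None)

    conn = sqlite3.connect(db_path)
    try:
        # List tables, skipping SQLite's internal bookkeeping tables.
        tables = {
            name
            for (name,) in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
            )
        }

        if "alembic_version" in tables:
            row = conn.execute(
                "SELECT version_num FROM alembic_version"
            ).fetchone()
            if row is not None:
                return ("versioned", row[0])

        # No recorded revision: either nothing exists yet, or the
        # tables were created outside of Alembic.
        user_tables = tables - {"alembic_version"}
        return ("unversioned" if user_tables else "fresh", None)
    finally:
        conn.close()
```

Calling `classify_database("data/sneaky-klaus.db")` before an upgrade is one way to confirm which scenario a given deployment is in. An "unversioned" result suggests the database predates migrations and should be stamped rather than upgraded from scratch.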

**Alternative Considered: Application Startup Migrations**

Running migrations in `src/app.py` during Flask application startup was considered but rejected:

- **Race conditions**: Multiple gunicorn workers could try to run migrations simultaneously
- **Locking complexity**: Would need migration locks to prevent concurrent runs
- **Startup delays**: Application health checks might fail during migration
- **Error visibility**: Migration failures less visible than container startup failures
- **Not idiomatic**: Flask apps typically don't modify their own schema on startup

The entrypoint script approach is simpler, safer, and more aligned with containerized deployment best practices.

## Consequences

### Positive

- **Safe schema evolution**: Can modify existing tables without data loss
- **Version control**: Schema changes tracked in git alongside code changes
- **Rollback capability**: Can revert problematic schema changes
- **Data migrations**: Can transform data during schema changes (e.g., populate new required columns)
- **Production ready**: Proper migration strategy from the start avoids migration debt
- **Clear deployment process**: `alembic upgrade head` is explicit and auditable
- **Multi-environment support**: Same migrations work across dev, staging, and production

### Negative

- **Additional complexity**: Developers must learn Alembic workflow
- **Migration review required**: Auto-generated migrations must be reviewed for correctness
- **Migration discipline needed**: Schema changes require creating and testing migrations
- **Downgrade path maintenance**: Must write downgrade logic for each migration
- **Linear migration history**: Merge conflicts in migrations can require rebasing

### Neutral

- **Learning curve**: Alembic has good documentation but requires initial learning
- **Migration conflicts**: Multiple developers changing schema simultaneously may need coordination
- **Test database setup**: Tests may need to apply migrations rather than using create_all()

## Implementation Notes

### Phase 2 Implementation

For Phase 2 (v0.2.0), the developer should:

1. Create Participant and MagicToken models in `src/models/`

2. Generate migration:

   ```bash
   uv run alembic revision --autogenerate -m "Add Participant and MagicToken models"
   ```

3. Review the generated migration file:
   - Verify all new tables, columns, and indexes are included
   - Check foreign key constraints are correct
   - Ensure indexes are created for performance-critical queries

4. Test the migration:

   ```bash
   # Test upgrade
   uv run alembic upgrade head

   # Test downgrade (optional but recommended)
   uv run alembic downgrade -1
   uv run alembic upgrade head
   ```

5. Run application tests to verify compatibility

6. Commit migration file with model changes

### Migration File Structure

Migration files are in `migrations/versions/` and follow this structure:

```python
"""Add Participant and MagicToken models

Revision ID: abc123def456
Revises: eeff6e1a89cd
Create Date: 2025-12-22 10:30:00.000000

"""
from alembic import op
import sqlalchemy as sa

# revision identifiers
revision = 'abc123def456'
down_revision = 'eeff6e1a89cd'
branch_labels = None
depends_on = None


def upgrade():
    # Schema changes for upgrade
    op.create_table('participant',
        sa.Column('id', sa.Integer(), nullable=False),
        # ... other columns
    )


def downgrade():
    # Schema changes for rollback
    op.drop_table('participant')
```

### Alembic Configuration

Alembic is configured via `alembic.ini`:

- Migration directory: `migrations/`
- SQLAlchemy URL: Configured dynamically from Flask config in `migrations/env.py`
- Auto-generate support: Enabled

### Documentation Updates

The following documentation has been updated to reflect this decision:

- Phase 2 Implementation Decisions (section 9.1)
- Data Model v0.2.0 (Migration Strategy section)
- System Architecture Overview v0.2.0 (Database Layer section)

## Alternatives Considered

### Continue using db.create_all()

**Rejected**: While simpler initially, db.create_all() cannot handle schema evolution. Since:

- Alembic infrastructure already exists in the codebase
- We expect ongoing schema evolution across multiple phases
- Self-hosted deployments may have persistent data
- Production-ready approach prevents migration debt

Starting with Alembic now is the right choice despite the added complexity.

### Manual SQL migrations

**Rejected**: Writing raw SQL migrations is error-prone and doesn't integrate with SQLAlchemy models. Alembic's autogenerate feature significantly reduces migration creation effort while maintaining safety.

### Django-style migrations

**Rejected**: Django's migration system is tightly coupled to Django ORM. Alembic is the standard for SQLAlchemy-based applications and integrates well with Flask.

### Defer migrations until schema is stable

**Rejected**: The schema will evolve continuously as new features are added. Deferring migrations creates migration debt and makes it harder to support existing deployments. Starting with migrations from Phase 2 establishes good patterns early.

## References

- Alembic documentation: https://alembic.sqlalchemy.org/
- SQLAlchemy documentation: https://docs.sqlalchemy.org/
- ADR-0001: Core Technology Stack
- Phase 2 Implementation Decisions (section 9.1)
- Data Model v0.2.0