# 0005. Database Migrations with Alembic

Date: 2025-12-22

## Status

Accepted

## Context

Sneaky Klaus uses SQLite as its database (see ADR-0001). As the application evolves, the database schema needs to change to support new features. There are two primary approaches to managing database schema:

1. **db.create_all()**: SQLAlchemy's create_all() method creates tables based on current model definitions. Simple but has critical limitations:
   - Cannot modify existing tables (add/remove columns, change types)
   - Cannot migrate data during schema changes
   - No version tracking or rollback capability
   - Unsafe for databases with existing data

2. **Schema Migrations**: Tools like Alembic track schema changes as versioned migration files:
   - Supports incremental schema changes (add columns, modify constraints, etc.)
   - Enables data migrations during schema evolution
   - Provides version tracking and rollback capability
   - Safe for production databases with existing data

Key considerations:

- Phase 1 (v0.1.0) established Admin and Exchange models
- Phase 2 (v0.2.0) adds Participant and MagicToken models
- Future phases will continue evolving the schema
- Self-hosted deployments may have persistent data from day one
- Users may skip versions or upgrade incrementally

The question is: when should we start using proper database migrations?

## Decision

We will use **Alembic for all database schema changes starting from Phase 2 (v0.2.0)** onward.

Specifically:

1. **Alembic is already configured** in the codebase (alembic.ini, migrations/ directory)
2. **An initial migration already exists** for Admin and Exchange models (created in Phase 1)
3. **All new models and schema changes** will be managed through Alembic migrations
4. **db.create_all() must not be used** for schema creation in production environments

### Migration Workflow

For all schema changes:

1. Modify SQLAlchemy models in `src/models/`
2. Generate migration: `uv run alembic revision --autogenerate -m "description"`
3. Review the generated migration file in `migrations/versions/`
4. Test the migration (upgrade and downgrade paths)
5. Commit the migration file with model changes
6. Apply in deployments: `uv run alembic upgrade head`

### Naming Conventions

Migration messages should be:
- Descriptive and imperative: "Add Participant model", "Add email index to Participant"
- Under 80 characters
- Use lowercase except for model/table names
- Examples:
  - "Add Participant and MagicToken models"
  - "Add withdrawn_at column to Participant"
  - "Create composite index on exchange_id and email"

### Testing Requirements

Every migration must be tested for:
1. **Upgrade path**: `alembic upgrade head` succeeds
2. **Downgrade path**: `alembic downgrade -1` and `alembic upgrade head` both succeed
3. **Schema correctness**: Database schema matches SQLAlchemy model definitions
4. **Application compatibility**: All tests pass after migration

### Handling Existing Databases

For databases created with db.create_all() before migrations were established:

**Option 1 - Stamp (preserves data)**:
```bash
uv run alembic stamp head
```
This marks the database as being at the current migration version without running migrations.

**Option 2 - Recreate (development only)**:
```bash
rm data/sneaky-klaus.db
uv run alembic upgrade head
```
This creates a fresh database from migrations. Only suitable for development.

### Removing db.create_all()

The `db.create_all()` call currently in `src/app.py` should be:
- Removed from production code paths
- Only used in test fixtures where appropriate
- Never used for schema initialization in deployments

Production deployments must use `alembic upgrade head` for schema initialization and updates.

### Automatic Migrations for Self-Hosted Deployments

For self-hosted deployments using containers, migrations must be applied automatically when the container starts. This ensures that:
- Users pulling new container images automatically get schema updates
- No manual migration commands required
- Schema is always in sync with application code
- First-run deployments get proper schema initialization

**Implementation Approach: Container Entrypoint Script**

An entrypoint script runs `alembic upgrade head` before starting the application server. This approach is chosen because:
- **Timing**: Migrations run before application starts, avoiding race conditions
- **Separation of concerns**: Database initialization is separate from application startup
- **Clear error handling**: Migration failures prevent application startup
- **Standard pattern**: Common practice for containerized applications with databases
- **Works with gunicorn**: Gunicorn workers don't need to coordinate migrations

**Entrypoint Script Responsibilities**:
1. Run `alembic upgrade head` to apply all pending migrations
2. Log migration status (success or failure)
3. Exit with error if migrations fail (preventing container startup)
4. Start application server (gunicorn) if migrations succeed

**Implementation**:
```bash
#!/bin/bash
set -e  # Exit on any error

echo "Running database migrations..."
if uv run alembic upgrade head; then
    echo "Database migrations completed successfully"
else
    echo "ERROR: Database migration failed!"
    echo "Please check the logs above for details."
    exit 1
fi

echo "Starting application server..."
exec gunicorn --bind 0.0.0.0:8000 --workers 2 --threads 4 main:app
```

**Error Handling**:
- Migration failures are logged to stderr
- Container exits with code 1 on migration failure
- Container orchestrator (podman/docker compose) will show failed state
- Users can inspect logs with `podman logs sneaky-klaus` or `docker logs sneaky-klaus`

**Containerfile Changes**:
- Copy entrypoint script: `COPY entrypoint.sh /app/entrypoint.sh`
- Make executable: `RUN chmod +x /app/entrypoint.sh`
- Change CMD to use entrypoint: `CMD ["/app/entrypoint.sh"]`

**First-Run Initialization**:
When no database exists, `alembic upgrade head` will:
1. Create the database file (SQLite)
2. Create the `alembic_version` table to track migration state
3. Run all migrations from scratch
4. Leave database in up-to-date state

**Update Scenarios**:
When updating to a new container image with schema changes:
1. Container starts and runs entrypoint script
2. Alembic detects current schema version from `alembic_version` table
3. Applies only new migrations (incremental upgrade)
4. Application starts with updated schema

**Development Workflow**:
For local development (non-containerized), developers continue to run migrations manually:
```bash
uv run alembic upgrade head
```

This gives developers explicit control over when migrations run during development.

**Alternative Considered: Application Startup Migrations**

Running migrations in `src/app.py` during Flask application startup was considered but rejected:
- **Race conditions**: Multiple gunicorn workers could try to run migrations simultaneously
- **Locking complexity**: Would need migration locks to prevent concurrent runs
- **Startup delays**: Application health checks might fail during migration
- **Error visibility**: Migration failures less visible than container startup failures
- **Not idiomatic**: Flask apps typically don't modify their own schema on startup

The entrypoint script approach is simpler, safer, and more aligned with containerized deployment best practices.

## Consequences

### Positive

- **Safe schema evolution**: Can modify existing tables without data loss
- **Version control**: Schema changes tracked in git alongside code changes
- **Rollback capability**: Can revert problematic schema changes
- **Data migrations**: Can transform data during schema changes (e.g., populate new required columns)
- **Production ready**: Proper migration strategy from the start avoids migration debt
- **Clear deployment process**: `alembic upgrade head` is explicit and auditable
- **Multi-environment support**: Same migrations work across dev, staging, and production

### Negative

- **Additional complexity**: Developers must learn Alembic workflow
- **Migration review required**: Auto-generated migrations must be reviewed for correctness
- **Migration discipline needed**: Schema changes require creating and testing migrations
- **Downgrade path maintenance**: Must write downgrade logic for each migration
- **Linear migration history**: Merge conflicts in migrations can require rebasing

### Neutral

- **Learning curve**: Alembic has good documentation but requires initial learning
- **Migration conflicts**: Multiple developers changing schema simultaneously may need coordination
- **Test database setup**: Tests may need to apply migrations rather than using create_all()

## Implementation Notes

### Phase 2 Implementation

For Phase 2 (v0.2.0), the developer should:

1. Create Participant and MagicToken models in `src/models/`
2. Generate migration:
   ```bash
   uv run alembic revision --autogenerate -m "Add Participant and MagicToken models"
   ```
3. Review the generated migration file:
   - Verify all new tables, columns, and indexes are included
   - Check foreign key constraints are correct
   - Ensure indexes are created for performance-critical queries
4. Test the migration:
   ```bash
   # Test upgrade
   uv run alembic upgrade head

   # Test downgrade (optional but recommended)
   uv run alembic downgrade -1
   uv run alembic upgrade head
   ```
5. Run application tests to verify compatibility
6. Commit migration file with model changes

### Migration File Structure

Migration files are in `migrations/versions/` and follow this structure:

```python
"""Add Participant and MagicToken models

Revision ID: abc123def456
Revises: eeff6e1a89cd
Create Date: 2025-12-22 10:30:00.000000

"""
from alembic import op
import sqlalchemy as sa

# revision identifiers
revision = 'abc123def456'
down_revision = 'eeff6e1a89cd'
branch_labels = None
depends_on = None

def upgrade():
    # Schema changes for upgrade
    op.create_table('participant',
        sa.Column('id', sa.Integer(), nullable=False),
        # ... other columns
    )

def downgrade():
    # Schema changes for rollback
    op.drop_table('participant')
```

### Alembic Configuration

Alembic is configured via `alembic.ini`:
- Migration directory: `migrations/`
- SQLAlchemy URL: Configured dynamically from Flask config in `migrations/env.py`
- Auto-generate support: Enabled

### Documentation Updates

The following documentation has been updated to reflect this decision:
- Phase 2 Implementation Decisions (section 9.1)
- Data Model v0.2.0 (Migration Strategy section)
- System Architecture Overview v0.2.0 (Database Layer section)

## Alternatives Considered

### Continue using db.create_all()

**Rejected**: While simpler initially, db.create_all() cannot handle schema evolution. Since:
- Alembic infrastructure already exists in the codebase
- We expect ongoing schema evolution across multiple phases
- Self-hosted deployments may have persistent data
- Production-ready approach prevents migration debt

Starting with Alembic now is the right choice despite the added complexity.

### Manual SQL migrations

**Rejected**: Writing raw SQL migrations is error-prone and doesn't integrate with SQLAlchemy models. Alembic's autogenerate feature significantly reduces migration creation effort while maintaining safety.

### Django-style migrations

**Rejected**: Django's migration system is tightly coupled to Django ORM. Alembic is the standard for SQLAlchemy-based applications and integrates well with Flask.

### Defer migrations until schema is stable

**Rejected**: The schema will evolve continuously as new features are added. Deferring migrations creates migration debt and makes it harder to support existing deployments. Starting with migrations from Phase 2 establishes good patterns early.

## References

- Alembic documentation: https://alembic.sqlalchemy.org/
- SQLAlchemy documentation: https://docs.sqlalchemy.org/
- ADR-0001: Core Technology Stack
- Phase 2 Implementation Decisions (section 9.1)
- Data Model v0.2.0