
# ADR-031: Database Migration System Redesign

## Status
Proposed

## Context

The v1.0.0-rc.1 release exposed a critical flaw in our database initialization and migration system. The system fails when upgrading existing production databases because:

1. `SCHEMA_SQL` represents the current (latest) schema structure
2. `SCHEMA_SQL` is executed BEFORE migrations run
3. Existing databases have old table structures that conflict with SCHEMA_SQL's expectations
4. The system tries to create indexes on columns that don't exist yet

The result is that initialization succeeds or fails depending on the age of the database:
- Fresh databases work fine (SCHEMA_SQL creates the latest structure)
- Existing databases fail (SCHEMA_SQL conflicts with old structure)
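
The failure mode can be reproduced in isolation. The following is a minimal sketch, not the project's actual schema: the `tokens` column layout, `OLD_TOKENS_SQL`, `LATEST_SCHEMA_SQL`, and `apply_latest_schema` are assumptions made for illustration. Only the index statement is taken from this document.

```python
import sqlite3

# Hypothetical pre-upgrade table: it has no token_hash column yet.
OLD_TOKENS_SQL = "CREATE TABLE tokens (id INTEGER PRIMARY KEY, token TEXT NOT NULL);"

# Hypothetical excerpt of a "latest" SCHEMA_SQL.
LATEST_SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS tokens (
    id INTEGER PRIMARY KEY,
    token_hash TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_tokens_hash ON tokens(token_hash);
"""


def apply_latest_schema(conn):
    """Return None on success, or the SQLite error message on failure."""
    try:
        conn.executescript(LATEST_SCHEMA_SQL)
    except sqlite3.Error as exc:
        return str(exc)
    return None


# Fresh database: the table is created with token_hash, so the index succeeds.
fresh = sqlite3.connect(":memory:")
fresh_error = apply_latest_schema(fresh)

# Existing database: CREATE TABLE IF NOT EXISTS is a no-op on the old table,
# so the index statement then refers to a column that does not exist.
existing = sqlite3.connect(":memory:")
existing.executescript(OLD_TOKENS_SQL)
existing_error = apply_latest_schema(existing)
```

Here `fresh_error` stays `None`, while `existing_error` holds an SQLite message naming the missing `token_hash` column.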

## Decision

Redesign the database initialization system to follow these principles:

1. **SCHEMA_SQL represents the initial v0.1.0 schema**, not the current schema
2. **All schema evolution happens through migrations**
3. **Existing databases run migrations only**; the schema script is never re-applied on top of an old structure
4. **Fresh databases get the initial schema then run ALL migrations**

### Implementation Strategy

#### Phase 1: Immediate Fix (v1.0.1)
Remove the conflicting index creation statements from `SCHEMA_SQL`, since the migrations already create those indexes:
```python
# Remove from SCHEMA_SQL:
# CREATE INDEX IF NOT EXISTS idx_tokens_hash ON tokens(token_hash);
# Let migration 002 handle this
```

#### Phase 2: Proper Redesign (v1.1.0)
1. Create `INITIAL_SCHEMA_SQL` with the v0.1.0 database structure
2. Modify `init_db()` logic:
   ```python
   def init_db(app=None):
       # get_connection() is a hypothetical helper returning a sqlite3 connection
       conn = get_connection(app)
       if database_exists_with_tables(conn):
           # Existing database: apply only the migrations not yet recorded
           run_migrations(conn)
       else:
           # Fresh database: create the v0.1.0 schema, then apply every migration
           conn.executescript(INITIAL_SCHEMA_SQL)
           run_migrations(conn)
   ```

3. Add explicit schema versioning:
   ```sql
   CREATE TABLE schema_info (
       version TEXT PRIMARY KEY,
       upgraded_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
   );
   ```
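
The two steps above can be combined into a runnable sketch. Everything below is an assumption made for illustration: the table definitions, the `MIGRATIONS` list, and the helper names `has_user_tables` and `run_migrations` are hypothetical, and the real migration runner may record versions differently. The sketch also treats an existing database without a `schema_info` table as being at v0.1.0, which would need a proper baseline check before use on production data.

```python
import sqlite3

# Placeholder standing in for the reconstructed v0.1.0 schema (hypothetical).
INITIAL_SCHEMA_SQL = """
CREATE TABLE notes (id INTEGER PRIMARY KEY, content TEXT NOT NULL);
CREATE TABLE tokens (id INTEGER PRIMARY KEY, token TEXT NOT NULL);
"""

# Ordered (version, sql) pairs; the contents are illustrative only.
MIGRATIONS = [
    ("001", "ALTER TABLE notes ADD COLUMN slug TEXT;"),
    ("002", """
        ALTER TABLE tokens ADD COLUMN token_hash TEXT;
        CREATE INDEX idx_tokens_hash ON tokens(token_hash);
    """),
]

SCHEMA_INFO_SQL = """
CREATE TABLE IF NOT EXISTS schema_info (
    version TEXT PRIMARY KEY,
    upgraded_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
"""


def has_user_tables(conn):
    """True if the database holds any table other than the bookkeeping ones."""
    row = conn.execute(
        "SELECT COUNT(*) FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "AND name != 'schema_info'"
    ).fetchone()
    return row[0] > 0


def run_migrations(conn):
    """Apply every migration not yet recorded in schema_info, in order."""
    conn.executescript(SCHEMA_INFO_SQL)
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_info")}
    newly_applied = []
    for version, sql in MIGRATIONS:
        if version in applied:
            continue
        conn.executescript(sql)
        conn.execute("INSERT INTO schema_info (version) VALUES (?)", (version,))
        conn.commit()
        newly_applied.append(version)
    return newly_applied


def init_db(conn):
    """Create the initial schema on a fresh database, then migrate."""
    if not has_user_tables(conn):
        conn.executescript(INITIAL_SCHEMA_SQL)
    return run_migrations(conn)
```

Calling `init_db` twice on the same connection applies each migration once: the second call finds every version already recorded and returns an empty list.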

## Rationale

### Why Initial Schema + Migrations?

1. **Predictable upgrade path**: Every database follows the same evolution
2. **Testable**: Can test upgrades from any version to any version
3. **Auditable**: Migration history shows exact evolution path
4. **Reversible**: Can potentially support rollbacks
5. **Industry standard**: Follows patterns from Rails, Django, Alembic

### Why Current Approach Failed

1. **Dual source of truth**: Schema defined in both SCHEMA_SQL and migrations
2. **Temporal coupling**: SCHEMA_SQL assumes post-migration state
3. **No upgrade path**: Can't get from old state to new state
4. **Hidden dependencies**: Index creation depends on migration execution

## Consequences

### Positive
- Reliable database upgrades from any version
- Clear separation of concerns (initial vs evolution)
- Easier to test migration paths
- Follows established patterns
- Supports future rollback capabilities

### Negative
- Requires maintaining historical schema (INITIAL_SCHEMA_SQL)
- Fresh databases take longer to initialize (run all migrations)
- More complex initialization logic
- Need to reconstruct v0.1.0 schema

### Migration Path
1. v1.0.1: Quick fix - remove conflicting indexes from SCHEMA_SQL
2. v1.0.1: Add manual upgrade instructions for production
3. v1.1.0: Implement full redesign with INITIAL_SCHEMA_SQL
4. v1.1.0: Add comprehensive migration testing
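
One possible shape for the migration tests in step 4 is sketched below. It builds a database at each historical version, inserts a row, upgrades it, and checks that every path ends with the same schema and keeps its data. The schema, the `MIGRATIONS` list, and the helpers `schema_fingerprint` and `upgrade_from` are hypothetical stand-ins, not the project's real test suite.

```python
import sqlite3

# Hypothetical v0.1.0 schema and migrations, kept small for illustration.
INITIAL_SCHEMA_SQL = "CREATE TABLE tokens (id INTEGER PRIMARY KEY, token TEXT NOT NULL);"
MIGRATIONS = [
    "ALTER TABLE tokens ADD COLUMN token_hash TEXT;",
    "CREATE INDEX idx_tokens_hash ON tokens(token_hash);",
]


def schema_fingerprint(conn):
    """Return the sorted (type, name, sql) rows that define the schema."""
    return sorted(
        conn.execute(
            "SELECT type, name, sql FROM sqlite_master "
            "WHERE name NOT LIKE 'sqlite_%'"
        ).fetchall()
    )


def upgrade_from(start_version):
    """Simulate a database at start_version with data, then upgrade it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(INITIAL_SCHEMA_SQL)
    # Bring the database to the historical version under test.
    for sql in MIGRATIONS[:start_version]:
        conn.executescript(sql)
    # Data written by the old release must survive the upgrade.
    conn.execute("INSERT INTO tokens (token) VALUES (?)", ("legacy-token",))
    conn.commit()
    # Apply the remaining migrations, as an upgrade would.
    for sql in MIGRATIONS[start_version:]:
        conn.executescript(sql)
    return conn


def check_all_upgrade_paths():
    """Assert that every starting version converges on one schema."""
    expected = schema_fingerprint(upgrade_from(0))
    for version in range(len(MIGRATIONS) + 1):
        conn = upgrade_from(version)
        assert schema_fingerprint(conn) == expected
        assert conn.execute("SELECT token FROM tokens").fetchall() == [("legacy-token",)]
    return True
```

In a real suite, the starting points would be schema dumps captured from each released version, not prefixes of the migration list, so that the tests also catch drift between releases and migrations.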

## Alternatives Considered

### 1. Dynamic Schema Detection
**Approach**: Detect existing table structure and conditionally apply indexes

**Rejected because**:
- Complex conditional logic
- Fragile heuristics
- Doesn't solve root cause
- Hard to test all paths
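
For context, the rejected approach would look roughly like the sketch below. The helper names `column_exists` and `create_index_if_possible` are hypothetical. Every index in the schema would need a guard of this kind, which is where the conditional complexity comes from.

```python
import sqlite3


def column_exists(conn, table, column):
    """Check a column via PRAGMA table_info; row[1] is the column name."""
    # Identifiers are interpolated directly, so they must be trusted constants.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)


def create_index_if_possible(conn, index, table, column):
    """Create the index only when its column exists; report whether it ran."""
    if not column_exists(conn, table, column):
        return False
    conn.execute(f"CREATE INDEX IF NOT EXISTS {index} ON {table}({column})")
    return True
```

The guard avoids the crash, but it leaves the old table structure in place, so the database still depends on a migration to reach the current schema.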

### 2. Schema Snapshots
**Approach**: Maintain schema snapshots for each version, apply appropriate one

**Rejected because**:
- Maintenance burden
- Storage overhead
- Complex version detection
- Still doesn't provide upgrade path

### 3. Migration-Only Schema
**Approach**: No SCHEMA_SQL at all, everything through migrations

**Rejected because**:
- Slower fresh installations
- Need to maintain migration 000 as "initial schema"
- Harder to see current schema structure
- Goes against SQLite's lightweight philosophy

## References

- [Rails Database Migrations](https://guides.rubyonrails.org/active_record_migrations.html)
- [Django Migrations](https://docs.djangoproject.com/en/stable/topics/migrations/)
- [Alembic Documentation](https://alembic.sqlalchemy.org/)
- Production incident: v1.0.0-rc.1 deployment failure
- `/docs/reports/migration-failure-diagnosis-v1.0.0-rc.1.md`

## Implementation Checklist

- [ ] Create INITIAL_SCHEMA_SQL from v0.1.0 structure
- [ ] Modify init_db() to check database state
- [ ] Update migration runner to handle fresh databases
- [ ] Add schema_info table for version tracking
- [ ] Create migration test suite
- [ ] Document upgrade procedures
- [ ] Test upgrade paths from all released versions