# Database Migration Architecture

## Overview

StarPunk uses a dual-strategy database initialization system that combines immediate schema creation (SCHEMA_SQL) with evolutionary migrations. This architecture provides both fast fresh installations and safe upgrades for existing databases.

## Components

### 1. SCHEMA_SQL (database.py)

**Purpose**: Define the current complete database schema for fresh installations

**Location**: `/starpunk/database.py` lines 11-87

**Responsibilities**:
- Create all tables with current structure
- Create all columns with current types
- Create base indexes for performance
- Provide instant database initialization for new installations

**Design Principle**: Always represents the latest schema version

### 2. Migration Files

**Purpose**: Transform existing databases from one version to another

**Location**: `/migrations/*.sql`

**Format**: `{number}_{description}.sql`
- Number: Three-digit zero-padded sequence (001, 002, etc.)
- Description: Clear indication of changes

**Responsibilities**:
- Add new tables/columns to existing databases
- Modify existing structures safely
- Create indexes and constraints
- Handle breaking changes with data preservation

### 3. Migration Runner (migrations.py)

**Purpose**: Intelligent application of migrations based on database state

**Location**: `/starpunk/migrations.py`

**Key Features**:
- Fresh database detection
- Partial schema recognition
- Smart migration skipping
- Index-only application
- Transaction safety

## Architecture Patterns

### Fresh Database Flow

```
1. init_db() called
2. SCHEMA_SQL executed (creates all current tables/columns)
3. run_migrations() called
4. Detects fresh database (empty schema_migrations)
5. Checks if schema is current (is_schema_current())
6. If current: marks all migrations as applied (no execution)
7. If partial: applies only needed migrations
```

### Existing Database Flow

```
1. init_db() called
2. SCHEMA_SQL executed (CREATE IF NOT EXISTS - no-op for existing tables)
3. run_migrations() called
4. Reads schema_migrations table
5. Discovers migration files
6. Applies only unapplied migrations in sequence
```

### Hybrid Database Flow (Production Issue Case)

```
1. Database has tables from SCHEMA_SQL but no migration records
2. run_migrations() detects migration_count == 0
3. For each migration, calls is_migration_needed()
4. Migration 002: detects tables exist, indexes missing
5. Creates only missing indexes
6. Marks migration as applied without full execution
```

## State Detection Logic

### is_schema_current() Function

Determines if database matches current schema version completely.

**Checks**:
1. Table existence (authorization_codes)
2. Column existence (token_hash in tokens)
3. Index existence (idx_tokens_hash, etc.)

**Returns**:
- True: Schema is completely current (all migrations applied)
- False: Schema needs migrations

### is_migration_needed() Function

Determines if a specific migration should be applied.

**For Migration 002**:
1. Check if authorization_codes table exists
2. Check if token_hash column exists in tokens
3. Check if indexes exist
4. Return True only if tables/columns are missing
5. Return False if only indexes are missing (handled separately)

## Design Decisions

### Why Dual Strategy?

1. **Fresh Install Speed**: SCHEMA_SQL provides instant, complete schema
2. **Upgrade Safety**: Migrations provide controlled, versioned changes
3. **Flexibility**: Can handle various database states gracefully

### Why Smart Detection?

1. **Idempotency**: Same code works for any database state
2. **Self-Healing**: Can fix partial schemas automatically
3. **No Data Loss**: Never drops tables unnecessarily

### Why Check Indexes Separately?

1. **SCHEMA_SQL Evolution**: As SCHEMA_SQL includes migration changes, we avoid conflicts
2. **Granular Control**: Can apply just missing pieces
3. **Performance**: Indexes can be added without table locks

## Migration Guidelines

### Writing Migrations

1. **Never use IF NOT EXISTS in migrations**: Migrations should fail if preconditions aren't met
2. **Always provide rollback path**: Document how to reverse changes
3. **One logical change per migration**: Keep migrations focused
4. **Test with various database states**: Fresh, existing, and hybrid

### SCHEMA_SQL Updates

When updating SCHEMA_SQL after a migration:
1. Include all changes from the migration
2. Remove indexes that migrations will create (avoid conflicts)
3. Keep CREATE IF NOT EXISTS for idempotency
4. Test fresh installations

## Error Recovery

### Common Issues

#### "Table already exists" Error

**Cause**: Migration tries to create table that SCHEMA_SQL already created

**Solution**: Smart detection should prevent this. If it fails:
1. Check if migration is already in schema_migrations
2. Verify is_migration_needed() logic
3. Manually mark migration as applied if needed

#### Missing Indexes

**Cause**: Tables exist from SCHEMA_SQL but indexes weren't created

**Solution**: Migration system creates missing indexes separately

#### Partial Migration Application

**Cause**: Migration failed partway through

**Solution**: Transactions ensure all-or-nothing. Rollback and retry.

## State Verification Queries

### Check Migration Status

```sql
SELECT * FROM schema_migrations ORDER BY id;
```

### Check Table Existence

```sql
SELECT name FROM sqlite_master WHERE type='table' ORDER BY name;
```

### Check Index Existence

```sql
SELECT name FROM sqlite_master WHERE type='index' ORDER BY name;
```

### Check Column Structure

```sql
PRAGMA table_info(tokens);
PRAGMA table_info(authorization_codes);
```

## Future Improvements

### Potential Enhancements

1. **Migration Rollback**: Add down() migrations for reversibility
2. **Schema Versioning**: Add version table for faster state detection
3. **Migration Validation**: Pre-flight checks before application
4. **Dry Run Mode**: Test migrations without applying

### Considered Alternatives

1. **Migrations-Only**: Rejected - slow fresh installs
2. **SCHEMA_SQL-Only**: Rejected - no upgrade path
3. **ORM-Based**: Rejected - unnecessary complexity for single-user system
4. **External Tools**: Rejected - additional dependencies

## Security Considerations

### Migration Safety

1. All migrations run in transactions
2. Rollback on any error
3. No data destruction without explicit user action
4. Token invalidation documented when necessary

### Schema Security

1. Tokens stored as SHA256 hashes
2. Proper indexes for timing attack prevention
3. Expiration columns for automatic cleanup
4. Soft deletion support
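
The Schema Security points on hashed tokens can be illustrated with a short sketch using Python's standard `hashlib`, `secrets`, and `sqlite3` modules. It assumes a `tokens` table with an integer `id` plus the `token_hash` column named in this document. The helper names `hash_token`, `issue_token`, and `find_token_id` and the token length are hypothetical and are not StarPunk's actual API. Only the SHA-256 hex digest is persisted. A presented token is hashed and looked up through `token_hash`, which `idx_tokens_hash` indexes.

```python
import hashlib
import secrets
import sqlite3
from typing import Optional


def hash_token(token: str) -> str:
    """Return the SHA-256 hex digest that is stored instead of the token."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def issue_token(conn: sqlite3.Connection) -> str:
    """Create a random token, persist only its hash, and return the plaintext once."""
    token = secrets.token_urlsafe(32)  # hypothetical length choice
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO tokens (token_hash) VALUES (?)", (hash_token(token),))
    return token


def find_token_id(conn: sqlite3.Connection, token: str) -> Optional[int]:
    """Look a presented token up by its hash; the plaintext is never queried."""
    row = conn.execute(
        "SELECT id FROM tokens WHERE token_hash = ?", (hash_token(token),)
    ).fetchone()
    return None if row is None else row[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical minimal table; the real tokens table has more columns.
    conn.executescript(
        "CREATE TABLE tokens (id INTEGER PRIMARY KEY, token_hash TEXT NOT NULL);"
        "CREATE INDEX idx_tokens_hash ON tokens(token_hash);"
    )
    token = issue_token(conn)
    print(find_token_id(conn, token))          # 1
    print(find_token_id(conn, "not-a-token"))  # None
```

Storing only the digest means a leaked database does not expose usable tokens. This sketch does not add expiration or soft-deletion columns.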
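
The rules under State Detection Logic, together with the Hybrid Database Flow, can be sketched with Python's standard `sqlite3` module. This is a minimal sketch under stated assumptions, not the code in `migrations.py`. Only `authorization_codes`, `tokens.token_hash`, and `idx_tokens_hash` come from this document. The following are all hypothetical:

- the `CREATE INDEX` statement;
- the `migration_name` column of `schema_migrations`;
- the file name `002_example.sql`;
- the helper names `missing_indexes` and `mark_hybrid_002`.

```python
import sqlite3

# Objects migration 002 is expected to leave behind. The names come from this
# document; the real checks in migrations.py may cover more ("etc.").
REQUIRED_TABLES = ("authorization_codes",)
REQUIRED_COLUMNS = (("tokens", "token_hash"),)
# Hypothetical index DDL; the real statements belong to the migration file.
INDEX_DDL = {"idx_tokens_hash": "CREATE INDEX idx_tokens_hash ON tokens(token_hash)"}
MIGRATION_002 = "002_example.sql"  # hypothetical file name


def _exists(conn, kind, name):
    """Check sqlite_master for a table or index with this name."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = ? AND name = ?", (kind, name)
    ).fetchone()
    return row is not None


def _has_column(conn, table, column):
    # PRAGMA table_info returns one row per column (field 1 is the name)
    # and no rows at all when the table does not exist.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)


def missing_indexes(conn):
    return [name for name in INDEX_DDL if not _exists(conn, "index", name)]


def is_migration_needed(conn):
    """True only when tables or columns are missing; missing indexes alone do not count."""
    tables_ok = all(_exists(conn, "table", t) for t in REQUIRED_TABLES)
    columns_ok = all(_has_column(conn, t, c) for t, c in REQUIRED_COLUMNS)
    return not (tables_ok and columns_ok)


def is_schema_current(conn):
    """Current means no structural gaps and every expected index present."""
    return not is_migration_needed(conn) and not missing_indexes(conn)


def mark_hybrid_002(conn):
    """Hybrid case: create only the missing indexes, then record 002 as applied."""
    conn.execute("BEGIN")
    try:
        for name in missing_indexes(conn):
            conn.execute(INDEX_DDL[name])
        # Assumed schema_migrations column; the document only shows `id`.
        conn.execute(
            "INSERT INTO schema_migrations (migration_name) VALUES (?)",
            (MIGRATION_002,),
        )
        conn.execute("COMMIT")
    except sqlite3.Error:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # A "hybrid" database: SCHEMA_SQL-style tables, no indexes, no migration rows.
    conn.executescript(
        "CREATE TABLE schema_migrations (id INTEGER PRIMARY KEY, migration_name TEXT);"
        "CREATE TABLE tokens (id INTEGER PRIMARY KEY, token_hash TEXT);"
        "CREATE TABLE authorization_codes (id INTEGER PRIMARY KEY);"
    )
    print(is_migration_needed(conn))  # False: tables and columns exist
    print(is_schema_current(conn))    # False: idx_tokens_hash is missing
    mark_hybrid_002(conn)
    print(is_schema_current(conn))    # True
```

Index creation and the `schema_migrations` row share one transaction. A failure therefore leaves neither a stray index nor a false "applied" record.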
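
The Existing Database Flow applies only unapplied migrations in sequence, and Partial Migration Application expects all-or-nothing behaviour. Both can be sketched in a few functions. This is an illustrative sketch rather than StarPunk's `run_migrations()`. The `migration_name` and `applied_at` columns and the helper names are hypothetical, as is wrapping each file in its own `BEGIN`/`COMMIT` script. The wrapping is needed because, under default transaction settings, Python's `Connection.executescript()` commits any pending transaction before running, so the transaction has to be opened inside the script itself.

```python
import re
import sqlite3
import tempfile
from pathlib import Path

# {number}_{description}.sql with a three-digit, zero-padded number.
MIGRATION_FILE = re.compile(r"^\d{3}_[A-Za-z0-9_-]+\.sql$")


def discover_migrations(directory):
    """Return migration files in sequence (zero padding makes name order numeric)."""
    return sorted(
        (p for p in Path(directory).glob("*.sql") if MIGRATION_FILE.match(p.name)),
        key=lambda p: p.name,
    )


def applied_migrations(conn):
    # Assumed table layout; the document only shows that it has an id column.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "migration_name TEXT UNIQUE NOT NULL, "
        "applied_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return {row[0] for row in conn.execute("SELECT migration_name FROM schema_migrations")}


def apply_migration(conn, path):
    """Run one migration file and record it, inside a single transaction."""
    body = path.read_text().strip()
    if not body.endswith(";"):
        body += ";"  # terminate the last statement if the file omitted it
    # path.name matched MIGRATION_FILE, so it contains no quote characters
    # and is safe to embed as a string literal.
    script = (
        "BEGIN;\n"
        f"{body}\n"
        f"INSERT INTO schema_migrations (migration_name) VALUES ('{path.name}');\n"
        "COMMIT;"
    )
    try:
        conn.executescript(script)
    except sqlite3.Error:
        # A failed statement leaves the transaction open: undo all of it.
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise


def run_pending(conn, directory):
    """Apply migrations not yet recorded, in order; return the names applied."""
    done = applied_migrations(conn)
    applied = []
    for path in discover_migrations(directory):
        if path.name not in done:
            apply_migration(conn, path)
            applied.append(path.name)
    return applied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "001_create_notes.sql").write_text("CREATE TABLE notes (id INTEGER PRIMARY KEY);")
        Path(tmp, "002_add_tokens.sql").write_text("CREATE TABLE tokens (token_hash TEXT)")
        conn = sqlite3.connect(":memory:")
        print(run_pending(conn, tmp))  # ['001_create_notes.sql', '002_add_tokens.sql']
        print(run_pending(conn, tmp))  # []
```

Each file gets its own transaction. If any statement fails, SQLite rolls back the file's partial schema changes together with its `schema_migrations` row. A later run then retries the same file, matching the "Rollback and retry" guidance under Error Recovery.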