Documents the diagnosis and resolution of database migration detection conflicts
Database Migration Architecture
Overview
StarPunk uses a dual-strategy database initialization system that combines immediate schema creation (SCHEMA_SQL) with evolutionary migrations. This architecture provides both fast fresh installations and safe upgrades for existing databases.
Components
1. SCHEMA_SQL (database.py)
Purpose: Define the current complete database schema for fresh installations
Location: /starpunk/database.py lines 11-87
Responsibilities:
- Create all tables with current structure
- Create all columns with current types
- Create base indexes for performance
- Provide instant database initialization for new installations
Design Principle: Always represents the latest schema version
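As a rough illustration of why SCHEMA_SQL is safe to run against any database, the sketch below executes a schema script twice. The table definitions are hypothetical stand-ins (only the table names tokens and authorization_codes and the token_hash column appear in this document; code_hash and init_schema are assumptions), not the actual contents of database.py.

```python
import sqlite3

# Hypothetical fragment for illustration; the real SCHEMA_SQL lives in
# starpunk/database.py and defines the full current schema.
SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS tokens (
    id INTEGER PRIMARY KEY,
    token_hash TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS authorization_codes (
    id INTEGER PRIMARY KEY,
    code_hash TEXT NOT NULL  -- assumed column name
);
"""


def init_schema(conn):
    """Create any missing tables; existing tables and their rows are untouched."""
    conn.executescript(SCHEMA_SQL)
```

Calling init_schema() on a database that already has these tables is a no-op, which is the property the existing-database flow relies on.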
2. Migration Files
Purpose: Transform existing databases from one version to another
Location: /migrations/*.sql
Format: {number}_{description}.sql
- Number: Three-digit zero-padded sequence (001, 002, etc.)
- Description: Clear indication of changes
Responsibilities:
- Add new tables/columns to existing databases
- Modify existing structures safely
- Create indexes and constraints
- Handle breaking changes with data preservation
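The naming convention above can be enforced when discovering files. This is a minimal sketch, assuming the description part uses only letters, digits, and underscores; discover_migrations and MIGRATION_NAME are illustrative names, not functions confirmed to exist in migrations.py.

```python
import re
from pathlib import Path

# Assumed pattern: three-digit sequence, underscore, description, ".sql".
MIGRATION_NAME = re.compile(r"^(\d{3})_([A-Za-z0-9_]+)\.sql$")


def discover_migrations(migrations_dir):
    """Return (number, filename) pairs sorted by sequence number.

    Files that do not follow the {number}_{description}.sql format are skipped.
    """
    found = []
    for path in Path(migrations_dir).glob("*.sql"):
        match = MIGRATION_NAME.match(path.name)
        if match:
            found.append((int(match.group(1)), path.name))
    return sorted(found)
```

Because the sequence numbers are zero-padded, sorting the filenames alphabetically gives the same order as sorting by number.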
3. Migration Runner (migrations.py)
Purpose: Intelligent application of migrations based on database state
Location: /starpunk/migrations.py
Key Features:
- Fresh database detection
- Partial schema recognition
- Smart migration skipping
- Index-only application
- Transaction safety
Architecture Patterns
Fresh Database Flow
1. init_db() called
2. SCHEMA_SQL executed (creates all current tables/columns)
3. run_migrations() called
4. Detects fresh database (empty schema_migrations)
5. Checks if schema is current (is_schema_current())
6. If current: marks all migrations as applied (no execution)
7. If partial: applies only the migrations that is_migration_needed() reports as needed
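Steps 4 to 6 of this flow might look roughly like the sketch below. The schema_migrations layout is an assumption (this document only shows the table name and an id column; migration_name is hypothetical), and the schema check is passed in as a callable so the example stays self-contained.

```python
import sqlite3

# Assumed tracking-table layout; "migration_name" is a hypothetical column.
TRACKING_SQL = """
CREATE TABLE IF NOT EXISTS schema_migrations (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    migration_name TEXT UNIQUE NOT NULL
)
"""


def mark_all_if_fresh_and_current(conn, migration_names, schema_is_current):
    """Return True when the database was fresh, current, and fully marked.

    "Fresh" means schema_migrations has no rows. When the schema is also
    current, every migration is recorded as applied without being executed.
    """
    conn.execute(TRACKING_SQL)
    count = conn.execute("SELECT COUNT(*) FROM schema_migrations").fetchone()[0]
    if count != 0 or not schema_is_current(conn):
        return False  # caller falls through to per-migration handling
    with conn:  # commit all records together, or none on error
        conn.executemany(
            "INSERT INTO schema_migrations (migration_name) VALUES (?)",
            [(name,) for name in migration_names],
        )
    return True
```

A False return covers both the partial-schema case and the existing-database case; in either one the runner would go on to evaluate migrations individually.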
Existing Database Flow
1. init_db() called
2. SCHEMA_SQL executed (CREATE TABLE IF NOT EXISTS is a no-op for existing tables)
3. run_migrations() called
4. Reads schema_migrations table
5. Discovers migration files
6. Applies only unapplied migrations in sequence
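Steps 4 to 6 reduce to a set difference between discovered files and recorded migrations. A minimal sketch, assuming schema_migrations stores the filename in a hypothetical migration_name column:

```python
import sqlite3


def pending_migrations(conn, discovered):
    """Return discovered migration filenames that have no applied record.

    The result is sorted; zero-padded prefixes make filename order equal to
    sequence order.
    """
    applied = {
        row[0]
        for row in conn.execute("SELECT migration_name FROM schema_migrations")
    }
    return [name for name in sorted(discovered) if name not in applied]
```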
Hybrid Database Flow (Production Issue Case)
1. Database has tables from SCHEMA_SQL but no migration records
2. run_migrations() detects migration_count == 0
3. For each migration, calls is_migration_needed()
4. Migration 002: detects tables exist, indexes missing
5. Creates only missing indexes
6. Marks migration as applied without full execution
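The index-only step in this flow could be sketched as follows. The index definition is an assumption: only the name idx_tokens_hash and the tokens.token_hash column appear in this document, and MIGRATION_002_INDEXES and create_missing_indexes are illustrative names.

```python
import sqlite3

# Hypothetical mapping of index name to the statement that creates it.
MIGRATION_002_INDEXES = {
    "idx_tokens_hash": "CREATE INDEX idx_tokens_hash ON tokens(token_hash)",
}


def index_exists(conn, name):
    """Check sqlite_master for an index with the given name."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'index' AND name = ?", (name,)
    ).fetchone()
    return row is not None


def create_missing_indexes(conn, index_sql):
    """Create only the indexes that are absent; return the names created."""
    created = []
    for name, statement in index_sql.items():
        if not index_exists(conn, name):
            conn.execute(statement)
            created.append(name)
    conn.commit()
    return created
```

Checking existence first keeps the CREATE INDEX statements free of IF NOT EXISTS while still making the step safe to repeat.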
State Detection Logic
is_schema_current() Function
Determines if database matches current schema version completely.
Checks:
- Table existence (authorization_codes)
- Column existence (token_hash in tokens)
- Index existence (idx_tokens_hash, etc.)
Returns:
- True: Schema is completely current (all migrations applied)
- False: Schema needs migrations
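One way to implement these checks with SQLite's catalog is sketched below. It covers only the three objects named above (the "etc." indexes are omitted because this document does not list them), and the helper names are assumptions rather than the actual functions in migrations.py.

```python
import sqlite3


def object_exists(conn, kind, name):
    """Return True if sqlite_master lists an object of this type and name."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = ? AND name = ?", (kind, name)
    ).fetchone()
    return row is not None


def column_exists(conn, table, column):
    """Return True if the table has the column.

    PRAGMA cannot take bound parameters, so `table` must be a trusted
    identifier. Each result row describes one column; index 1 is its name.
    """
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)


def is_schema_current(conn):
    """True only when the table, column, and index checks all pass."""
    return (
        object_exists(conn, "table", "authorization_codes")
        and column_exists(conn, "tokens", "token_hash")
        and object_exists(conn, "index", "idx_tokens_hash")
    )
```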
is_migration_needed() Function
Determines if a specific migration should be applied.
For Migration 002:
- Check if authorization_codes table exists
- Check if token_hash column exists in tokens
- Check if indexes exist
- Return True only if tables/columns are missing
- Return False if only indexes are missing (handled separately)
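The decision rule for Migration 002 can be sketched like this. Matching on the "002_" filename prefix and defaulting to True for other migrations are assumptions made for the example; the real function may dispatch differently.

```python
import sqlite3


def is_migration_needed(conn, migration_name):
    """Return True when the migration's tables or columns are missing."""
    if migration_name.startswith("002_"):
        has_table = conn.execute(
            "SELECT 1 FROM sqlite_master "
            "WHERE type = 'table' AND name = 'authorization_codes'"
        ).fetchone() is not None
        # PRAGMA table_info yields one row per column; index 1 is the name.
        columns = [row[1] for row in conn.execute("PRAGMA table_info(tokens)")]
        has_column = "token_hash" in columns
        # Missing indexes alone do not make the migration "needed";
        # they are created by the separate index-only step.
        return not (has_table and has_column)
    # Assumption: migrations without a dedicated detector always run.
    return True
```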
Design Decisions
Why Dual Strategy?
- Fresh Install Speed: SCHEMA_SQL provides instant, complete schema
- Upgrade Safety: Migrations provide controlled, versioned changes
- Flexibility: Can handle various database states gracefully
Why Smart Detection?
- Idempotency: Same code works for any database state
- Self-Healing: Can fix partial schemas automatically
- No Data Loss: Never drops tables unnecessarily
Why Check Indexes Separately?
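- SCHEMA_SQL Evolution: Because SCHEMA_SQL absorbs each migration's table and column changes, handling indexes separately avoids re-running statements that would conflict with objects that already exist
- Granular Control: Can apply just missing pieces
- Performance: Adding an index leaves table structure and existing rows untouched, so it is cheaper and lower-risk than re-running the full migration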
Migration Guidelines
Writing Migrations
- Never use IF NOT EXISTS in migrations: Migrations should fail if preconditions aren't met
- Always provide rollback path: Document how to reverse changes
- One logical change per migration: Keep migrations focused
- Test with various database states: Fresh, existing, and hybrid
SCHEMA_SQL Updates
When updating SCHEMA_SQL after a migration:
- Include all changes from the migration
- Remove indexes that migrations will create (avoid conflicts)
- Keep CREATE TABLE IF NOT EXISTS for idempotency
- Test fresh installations
Error Recovery
Common Issues
"Table already exists" Error
Cause: Migration tries to create table that SCHEMA_SQL already created
Solution: Smart detection should prevent this. If the error still occurs:
- Check if migration is already in schema_migrations
- Verify is_migration_needed() logic
- Manually mark migration as applied if needed
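For the last step, a small helper can record the migration without executing it. This is a sketch under the assumption that schema_migrations has a migration_name column (hypothetical; this document only shows the table name and id); mark_applied is an illustrative name.

```python
import sqlite3


def mark_applied(conn, migration_name):
    """Record a migration as applied without running it.

    Returns True if a record was added, False if one already existed.
    """
    already = conn.execute(
        "SELECT 1 FROM schema_migrations WHERE migration_name = ?",
        (migration_name,),
    ).fetchone()
    if already:
        return False
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO schema_migrations (migration_name) VALUES (?)",
            (migration_name,),
        )
    return True
```

Only do this after confirming, with the state verification queries, that the database already contains everything the migration would have created.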
Missing Indexes
Cause: Tables exist from SCHEMA_SQL but indexes weren't created
Solution: Migration system creates missing indexes separately
Partial Migration Application
Cause: Migration failed partway through
Solution: Each migration runs in a transaction, so a failure rolls back every statement in it. Fix the underlying cause, then run the migrations again.
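The all-or-nothing behavior can be sketched with Python's sqlite3 module as shown below. This is not necessarily how migrations.py does it; apply_migration and the migration_name column are assumptions. Note that executescript() does not wrap the script in a transaction on its own, so the sketch prepends an explicit BEGIN.

```python
import sqlite3


def apply_migration(conn, name, sql):
    """Run a migration script and record it, or roll everything back."""
    try:
        # The explicit BEGIN opens a transaction that stays open after the
        # script finishes, so the record below joins the same transaction.
        conn.executescript("BEGIN;\n" + sql)
        conn.execute(
            "INSERT INTO schema_migrations (migration_name) VALUES (?)", (name,)
        )
        conn.commit()
    except sqlite3.Error:
        # SQLite DDL is transactional, so tables and indexes created earlier
        # in the failed script are undone as well.
        conn.rollback()
        raise
```

Re-raising after the rollback lets the caller stop at the first failed migration instead of continuing with later ones.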
State Verification Queries
Check Migration Status
SELECT * FROM schema_migrations ORDER BY id;
Check Table Existence
SELECT name FROM sqlite_master
WHERE type='table'
ORDER BY name;
Check Index Existence
SELECT name FROM sqlite_master
WHERE type='index'
ORDER BY name;
Check Column Structure
PRAGMA table_info(tokens);
PRAGMA table_info(authorization_codes);
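When diagnosing from Python rather than the sqlite3 shell, the table, index, and column queries above can be gathered into one snapshot. A minimal sketch; snapshot_state is an illustrative name, and the table names passed to PRAGMA must be trusted identifiers because PRAGMA does not accept bound parameters.

```python
import sqlite3


def snapshot_state(conn, tables=("tokens", "authorization_codes")):
    """Collect table names, index names, and per-table column names."""

    def names(kind):
        query = "SELECT name FROM sqlite_master WHERE type = ? ORDER BY name"
        return [row[0] for row in conn.execute(query, (kind,))]

    return {
        "tables": names("table"),
        "indexes": names("index"),
        # A table that does not exist yields an empty column list.
        "columns": {
            table: [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
            for table in tables
        },
    }
```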
Future Improvements
Potential Enhancements
- Migration Rollback: Add down() migrations for reversibility
- Schema Versioning: Add version table for faster state detection
- Migration Validation: Pre-flight checks before application
- Dry Run Mode: Test migrations without applying
Considered Alternatives
- Migrations-Only: Rejected - slow fresh installs
- SCHEMA_SQL-Only: Rejected - no upgrade path
- ORM-Based: Rejected - unnecessary complexity for single-user system
- External Tools: Rejected - additional dependencies
Security Considerations
Migration Safety
- All migrations run in transactions
- Rollback on any error
- No data destruction without explicit user action
- Token invalidation documented when necessary
Schema Security
- Tokens stored as SHA256 hashes
- Token lookups go through the indexed hash column (idx_tokens_hash) rather than comparing raw token values, which limits timing side channels
- Expiration columns for automatic cleanup
- Soft deletion support
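To illustrate the first point, storing a SHA256 hash instead of the raw token might look like the sketch below. The function names are assumptions, and hmac.compare_digest is an addition for constant-time comparison that this document does not confirm is used.

```python
import hashlib
import hmac


def hash_token(token: str) -> str:
    """Return the hex SHA256 digest that would be stored in tokens.token_hash."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def token_matches(presented: str, stored_hash: str) -> bool:
    """Compare a presented token against a stored hash in constant time."""
    return hmac.compare_digest(hash_token(presented), stored_hash)
```

Since only the hash is stored, a leaked database does not reveal usable tokens, and a migration that changes the hashing scheme necessarily invalidates existing ones.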