docs: Add database migration architecture and conflict resolution documentation
Documents the diagnosis and resolution of database migration detection conflicts
212
docs/architecture/database-migration-architecture.md
Normal file
@@ -0,0 +1,212 @@
# Database Migration Architecture

## Overview

StarPunk uses a dual-strategy database initialization system that combines immediate schema creation (`SCHEMA_SQL`) with incremental migrations. Fresh installations get the complete schema in one step, while existing databases are upgraded safely through versioned migrations.

## Components

### 1. SCHEMA_SQL (database.py)

**Purpose**: Define the complete current database schema for fresh installations

**Location**: `/starpunk/database.py` lines 11-87

**Responsibilities**:
- Create all tables with their current structure
- Create all columns with their current types
- Create base indexes for performance
- Provide instant database initialization for new installations

**Design Principle**: Always represents the latest schema version
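
As a rough illustration of this component, the sketch below runs a trimmed-down schema script through `sqlite3`. The table definitions, the `code_hash` column, and the `create_schema` helper are assumptions made for the example; the real `SCHEMA_SQL` in `database.py` is larger and may differ.

```python
import sqlite3

# Hypothetical, trimmed-down schema for illustration only; the real
# SCHEMA_SQL in database.py defines more tables, columns, and indexes.
SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS tokens (
    id INTEGER PRIMARY KEY,
    token_hash TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS authorization_codes (
    id INTEGER PRIMARY KEY,
    code_hash TEXT NOT NULL
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create every table in one pass; existing tables are left untouched."""
    conn.executescript(SCHEMA_SQL)


conn = sqlite3.connect(":memory:")
create_schema(conn)
create_schema(conn)  # safe to repeat because of IF NOT EXISTS

tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
    )
]
print(tables)
```

Running the script twice without error is what makes it usable on both fresh and existing databases.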

### 2. Migration Files

**Purpose**: Transform existing databases from one version to another

**Location**: `/migrations/*.sql`

**Format**: `{number}_{description}.sql`
- Number: Three-digit, zero-padded sequence (001, 002, etc.)
- Description: Clear indication of the changes

**Responsibilities**:
- Add new tables/columns to existing databases
- Modify existing structures safely
- Create indexes and constraints
- Handle breaking changes with data preservation
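
The naming convention above could be enforced when discovering migration files, for example as sketched here. The `discover_migrations` helper, the regular expression, and every file name in the demo are hypothetical; the actual runner may discover files differently.

```python
import re
import tempfile
from pathlib import Path

# Three-digit prefix, underscore, description, .sql extension.
MIGRATION_PATTERN = re.compile(r"^(\d{3})_(.+)\.sql$")


def discover_migrations(directory):
    """Return matching migration file names, ordered by sequence number."""
    numbered = []
    for path in Path(directory).iterdir():
        match = MIGRATION_PATTERN.match(path.name)
        if match:
            numbered.append((int(match.group(1)), path.name))
    return [name for _, name in sorted(numbered)]


# Demo with throwaway files (all file names here are made up).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("002_add_indexes.sql", "001_initial.sql", "notes.txt", "3_bad.sql"):
        (Path(tmp) / name).write_text("-- placeholder\n")
    migrations = discover_migrations(tmp)

print(migrations)
```

Files that do not follow the `{number}_{description}.sql` pattern are ignored rather than applied in an unpredictable position.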

### 3. Migration Runner (migrations.py)

**Purpose**: Apply migrations intelligently based on the detected database state

**Location**: `/starpunk/migrations.py`

**Key Features**:
- Fresh database detection
- Partial schema recognition
- Smart migration skipping
- Index-only application
- Transaction safety

## Architecture Patterns

### Fresh Database Flow

```
1. init_db() called
2. SCHEMA_SQL executed (creates all current tables/columns)
3. run_migrations() called
4. Detects fresh database (empty schema_migrations)
5. Checks if schema is current (is_schema_current())
6. If current: marks all migrations as applied (no execution)
7. If partial: applies only needed migrations
```
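
Step 6 of the flow above (recording migrations without executing them) might look roughly like this. The `schema_migrations` column names (`migration_name`, `applied_at`), the `mark_all_applied` helper, and the file names are assumptions for illustration, not the actual StarPunk schema.

```python
import sqlite3


def mark_all_applied(conn, migration_names):
    """Record migrations as applied without executing their SQL."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO schema_migrations (migration_name) VALUES (?)",
            [(name,) for name in migration_names],
        )


conn = sqlite3.connect(":memory:")
# Assumed layout of the tracking table.
conn.execute(
    "CREATE TABLE schema_migrations ("
    " id INTEGER PRIMARY KEY,"
    " migration_name TEXT UNIQUE NOT NULL,"
    " applied_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP)"
)

names = ["001_initial.sql", "002_add_indexes.sql"]
mark_all_applied(conn, names)
mark_all_applied(conn, names)  # repeating is harmless: UNIQUE + OR IGNORE

recorded = [
    row[0]
    for row in conn.execute("SELECT migration_name FROM schema_migrations ORDER BY id")
]
print(recorded)
```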

### Existing Database Flow

```
1. init_db() called
2. SCHEMA_SQL executed (CREATE TABLE IF NOT EXISTS - no-op for existing tables)
3. run_migrations() called
4. Reads schema_migrations table
5. Discovers migration files
6. Applies only unapplied migrations in sequence
```
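
Steps 4 to 6 above amount to a set difference between the discovered files and the recorded ones. A minimal sketch, assuming a `migration_name` column in `schema_migrations` and using made-up file names:

```python
import sqlite3


def pending_migrations(conn, available):
    """Return the migrations in `available` that are not yet recorded, in order."""
    applied = {
        row[0] for row in conn.execute("SELECT migration_name FROM schema_migrations")
    }
    # Zero-padded prefixes make plain string sorting match numeric order.
    return [name for name in sorted(available) if name not in applied]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE schema_migrations ("
    " id INTEGER PRIMARY KEY, migration_name TEXT UNIQUE NOT NULL)"
)
conn.execute("INSERT INTO schema_migrations (migration_name) VALUES ('001_initial.sql')")

available = ["003_more.sql", "001_initial.sql", "002_add_indexes.sql"]
pending = pending_migrations(conn, available)
print(pending)
```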

### Hybrid Database Flow (Production Issue Case)

```
1. Database has tables from SCHEMA_SQL but no migration records
2. run_migrations() detects migration_count == 0
3. For each migration, calls is_migration_needed()
4. Migration 002: detects tables exist, indexes missing
5. Creates only missing indexes
6. Marks migration as applied without full execution
```
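
The index-only path in steps 4 and 5 can be sketched as follows. The index name `idx_tokens_hash` appears in this document, but its definition, the `MIGRATION_002_INDEXES` mapping, and both helper functions are assumptions for illustration.

```python
import sqlite3

# Assumed definition; only the index name comes from this document.
MIGRATION_002_INDEXES = {
    "idx_tokens_hash": "CREATE INDEX idx_tokens_hash ON tokens(token_hash)",
}


def index_exists(conn, name):
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type='index' AND name=?", (name,)
    ).fetchone()
    return row is not None


def create_missing_indexes(conn, indexes):
    """Create only the indexes that are absent; return the names created."""
    created = []
    for name, ddl in indexes.items():
        if not index_exists(conn, name):
            conn.execute(ddl)
            created.append(name)
    return created


# Hybrid state: the table exists (as if from SCHEMA_SQL) but has no index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tokens (id INTEGER PRIMARY KEY, token_hash TEXT)")

first_run = create_missing_indexes(conn, MIGRATION_002_INDEXES)
second_run = create_missing_indexes(conn, MIGRATION_002_INDEXES)
print(first_run, second_run)
```

Checking for each index explicitly keeps the migration itself free of `IF NOT EXISTS` while still tolerating the hybrid state.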

## State Detection Logic

### is_schema_current() Function

Determines whether the database fully matches the current schema version.

**Checks**:
1. Table existence (authorization_codes)
2. Column existence (token_hash in tokens)
3. Index existence (idx_tokens_hash, etc.)

**Returns**:
- True: Schema is completely current (all migrations applied)
- False: Schema needs migrations
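
A minimal sketch of the three checks listed above, using `sqlite_master` and `PRAGMA table_info`. The helper functions are hypothetical, and only one index is checked here even though the real function checks more ("etc." above).

```python
import sqlite3


def table_exists(conn, name):
    return conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type='table' AND name=?", (name,)
    ).fetchone() is not None


def column_exists(conn, table, column):
    # `table` is a trusted constant here; PRAGMA arguments cannot be bound.
    # Each row describes one column; index 1 holds the column name.
    return any(row[1] == column for row in conn.execute(f"PRAGMA table_info({table})"))


def index_exists(conn, name):
    return conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type='index' AND name=?", (name,)
    ).fetchone() is not None


def is_schema_current(conn):
    """True only when the table, column, and index checks all pass."""
    return (
        table_exists(conn, "authorization_codes")
        and column_exists(conn, "tokens", "token_hash")
        and index_exists(conn, "idx_tokens_hash")
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tokens (id INTEGER PRIMARY KEY, token_hash TEXT)")
before = is_schema_current(conn)  # table and index still missing

conn.execute("CREATE TABLE authorization_codes (id INTEGER PRIMARY KEY)")
conn.execute("CREATE INDEX idx_tokens_hash ON tokens(token_hash)")
after = is_schema_current(conn)
print(before, after)
```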

### is_migration_needed() Function

Determines whether a specific migration should be applied.

**For Migration 002**:
1. Check if authorization_codes table exists
2. Check if token_hash column exists in tokens
3. Check if indexes exist
4. Return True only if tables/columns are missing
5. Return False if only indexes are missing (handled separately)
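
The decision rule for Migration 002 could be written along these lines. The `002_` prefix match, the helper names, and the fallback for other migrations are assumptions; the index check is left out because missing indexes are handled on a separate path.

```python
import sqlite3


def _has_table(conn, name):
    return conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type='table' AND name=?", (name,)
    ).fetchone() is not None


def _has_column(conn, table, column):
    # `table` is a trusted constant; index 1 of each row is the column name.
    return any(row[1] == column for row in conn.execute(f"PRAGMA table_info({table})"))


def is_migration_needed(conn, migration_name):
    """True when the migration's tables/columns are missing from the database."""
    if migration_name.startswith("002_"):
        structure_present = _has_table(conn, "authorization_codes") and _has_column(
            conn, "tokens", "token_hash"
        )
        # Missing indexes alone do not require the full migration.
        return not structure_present
    return True  # assumed default for migrations without special handling


# Hybrid database: structure present, indexes absent.
hybrid = sqlite3.connect(":memory:")
hybrid.execute("CREATE TABLE tokens (id INTEGER PRIMARY KEY, token_hash TEXT)")
hybrid.execute("CREATE TABLE authorization_codes (id INTEGER PRIMARY KEY)")

# Older database: tokens table predates the token_hash column.
legacy = sqlite3.connect(":memory:")
legacy.execute("CREATE TABLE tokens (id INTEGER PRIMARY KEY, token TEXT)")

needed_hybrid = is_migration_needed(hybrid, "002_example.sql")
needed_legacy = is_migration_needed(legacy, "002_example.sql")
print(needed_hybrid, needed_legacy)
```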

## Design Decisions

### Why Dual Strategy?

1. **Fresh Install Speed**: SCHEMA_SQL creates the complete schema in a single step
2. **Upgrade Safety**: Migrations provide controlled, versioned changes
3. **Flexibility**: Handles fresh, existing, and hybrid database states gracefully

### Why Smart Detection?

1. **Idempotency**: The same code works for any database state
2. **Self-Healing**: Partial schemas can be repaired automatically
3. **No Data Loss**: Tables are never dropped unnecessarily

### Why Check Indexes Separately?

1. **SCHEMA_SQL Evolution**: Because SCHEMA_SQL absorbs the changes made by migrations, checking indexes separately avoids conflicts between the two
2. **Granular Control**: Only the missing pieces are applied
3. **Performance**: Indexes can be added without rewriting or rebuilding the underlying table

## Migration Guidelines

### Writing Migrations

1. **Never use IF NOT EXISTS in migrations**: Migrations should fail if preconditions aren't met
2. **Always provide a rollback path**: Document how to reverse the changes
3. **One logical change per migration**: Keep migrations focused
4. **Test with various database states**: Fresh, existing, and hybrid

### SCHEMA_SQL Updates

When updating SCHEMA_SQL after a migration:
1. Include all changes from the migration
2. Remove indexes that migrations will create (avoid conflicts)
3. Keep `CREATE TABLE IF NOT EXISTS` for idempotency
4. Test fresh installations

## Error Recovery

### Common Issues

#### "Table already exists" Error

**Cause**: A migration tries to create a table that SCHEMA_SQL already created

**Solution**: Smart detection should prevent this. If it does not:
1. Check whether the migration is already recorded in schema_migrations
2. Verify the is_migration_needed() logic
3. Manually mark the migration as applied if needed

#### Missing Indexes

**Cause**: Tables exist from SCHEMA_SQL but their indexes weren't created

**Solution**: The migration system creates missing indexes separately

#### Partial Migration Application

**Cause**: A migration failed partway through

**Solution**: Each migration runs in a transaction, so a failure rolls back the whole migration. Fix the cause and rerun.
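
One way to get this all-or-nothing behavior with Python's `sqlite3` module is sketched below. The `apply_migration` helper, the `migration_name` column, and the migration SQL are assumptions; the sketch also assumes migration files contain no `BEGIN`/`COMMIT` statements of their own.

```python
import sqlite3


def apply_migration(conn, name, sql):
    """Run one migration and record it; roll everything back on failure."""
    conn.isolation_level = None  # manage transactions explicitly
    try:
        # The script opens the transaction and leaves it open on success.
        conn.executescript("BEGIN;\n" + sql)
        conn.execute(
            "INSERT INTO schema_migrations (migration_name) VALUES (?)", (name,)
        )
        conn.execute("COMMIT")
    except sqlite3.Error:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE schema_migrations ("
    " id INTEGER PRIMARY KEY, migration_name TEXT UNIQUE NOT NULL)"
)
conn.execute("CREATE TABLE tokens (id INTEGER PRIMARY KEY)")

# Second statement fails because tokens already exists.
broken_sql = (
    "CREATE TABLE authorization_codes (id INTEGER PRIMARY KEY);\n"
    "CREATE TABLE tokens (id INTEGER PRIMARY KEY);"
)
try:
    apply_migration(conn, "002_example.sql", broken_sql)
    failed = False
except sqlite3.OperationalError:
    failed = True

leftover = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE name='authorization_codes'"
).fetchone()[0]
recorded = conn.execute("SELECT COUNT(*) FROM schema_migrations").fetchone()[0]
print(failed, leftover, recorded)
```

The explicit `BEGIN` matters because `executescript()` in the module's default transaction mode commits any pending transaction before running the script, so relying on an implicit transaction would not protect the DDL statements.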

## State Verification Queries

### Check Migration Status

```sql
SELECT * FROM schema_migrations ORDER BY id;
```

### Check Table Existence

```sql
SELECT name FROM sqlite_master
WHERE type='table'
ORDER BY name;
```

### Check Index Existence

```sql
SELECT name FROM sqlite_master
WHERE type='index'
ORDER BY name;
```

### Check Column Structure

```sql
PRAGMA table_info(tokens);
PRAGMA table_info(authorization_codes);
```
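
For scripted diagnostics, the queries above can be gathered into a single snapshot. This is a sketch only: `describe_database` is a hypothetical helper, and it relies on the `pragma_table_info` table-valued function, which needs SQLite 3.16 or newer.

```python
import sqlite3


def describe_database(conn):
    """Collect table names, index names, and per-table column names."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        )
    ]
    indexes = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='index' ORDER BY name"
        )
    ]
    columns = {
        table: [
            row[0]
            for row in conn.execute(
                "SELECT name FROM pragma_table_info(?) ORDER BY cid", (table,)
            )
        ]
        for table in tables
    }
    return {"tables": tables, "indexes": indexes, "columns": columns}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tokens (id INTEGER PRIMARY KEY, token_hash TEXT)")
conn.execute("CREATE INDEX idx_tokens_hash ON tokens(token_hash)")

state = describe_database(conn)
print(state)
```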

## Future Improvements

### Potential Enhancements

1. **Migration Rollback**: Add down() migrations for reversibility
2. **Schema Versioning**: Add a version table for faster state detection
3. **Migration Validation**: Run pre-flight checks before applying a migration
4. **Dry Run Mode**: Test migrations without applying them

### Considered Alternatives

1. **Migrations-Only**: Rejected - slow fresh installs
2. **SCHEMA_SQL-Only**: Rejected - no upgrade path
3. **ORM-Based**: Rejected - unnecessary complexity for a single-user system
4. **External Tools**: Rejected - additional dependencies

## Security Considerations

### Migration Safety

1. All migrations run in transactions
2. Any error triggers a rollback
3. No data is destroyed without explicit user action
4. Token invalidation is documented whenever a migration requires it

### Schema Security

1. Tokens stored as SHA-256 hashes
2. Indexed lookups on token hashes, intended to limit timing side channels
3. Expiration columns for automatic cleanup
4. Soft deletion support
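
Storing a hash instead of the raw token (item 1 above) can be illustrated as follows. The `hash_token` helper and the way the token is generated are assumptions; only the use of SHA-256 comes from this document.

```python
import hashlib
import secrets


def hash_token(token: str) -> str:
    """Return the hex SHA-256 digest that would be stored in place of the token."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


raw_token = secrets.token_urlsafe(32)  # shown to the client once, never stored
stored_hash = hash_token(raw_token)    # persisted, e.g. in tokens.token_hash

# Verification recomputes the hash and looks it up; the raw value is not kept.
print(len(stored_hash), hash_token(raw_token) == stored_hash)
```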