
# Database Migration Architecture

## Overview
StarPunk uses a dual-strategy database initialization system that combines immediate schema creation (SCHEMA_SQL) with evolutionary migrations. This architecture provides both fast fresh installations and safe upgrades for existing databases.

## Components

### 1. SCHEMA_SQL (database.py)
**Purpose**: Define the current complete database schema for fresh installations

**Location**: `/starpunk/database.py` lines 11-87

**Responsibilities**:
- Create all tables with current structure
- Create all columns with current types
- Create base indexes for performance
- Provide instant database initialization for new installations

**Design Principle**: Always represents the latest schema version

### 2. Migration Files
**Purpose**: Transform existing databases from one version to another

**Location**: `/migrations/*.sql`

**Format**: `{number}_{description}.sql`
- Number: Three-digit zero-padded sequence (001, 002, etc.)
- Description: Clear indication of changes
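The naming convention can be checked and used for ordering with a few lines of Python. This is a minimal sketch, not the actual runner code: the helper names, the example filename `002_secure_tokens.sql`, and the set of characters allowed in the description are assumptions made for illustration.

```python
import re
from pathlib import Path

# Three digits, an underscore, a description, then ".sql".
# The characters allowed in the description are an assumption.
MIGRATION_NAME = re.compile(r"^(\d{3})_([A-Za-z0-9_]+)\.sql$")


def parse_migration_name(filename: str) -> tuple[int, str]:
    """Return (sequence number, description) or raise ValueError."""
    match = MIGRATION_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a valid migration filename: {filename!r}")
    return int(match.group(1)), match.group(2)


def discover_migrations(directory: Path) -> list[Path]:
    """Return well-named migration files, ordered by sequence number."""
    files = [p for p in directory.glob("*.sql") if MIGRATION_NAME.match(p.name)]
    return sorted(files, key=lambda p: parse_migration_name(p.name)[0])
```

For example, `parse_migration_name("002_secure_tokens.sql")` yields the number `2` and the description `secure_tokens`, while a name without zero padding is rejected.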

**Responsibilities**:
- Add new tables/columns to existing databases
- Modify existing structures safely
- Create indexes and constraints
- Handle breaking changes with data preservation

### 3. Migration Runner (migrations.py)
**Purpose**: Apply migrations selectively, based on the detected state of the database

**Location**: `/starpunk/migrations.py`

**Key Features**:
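- Fresh database detection (empty `schema_migrations` table)
- Partial schema recognition (tables present without migration records)
- Migration skipping when the schema already contains a migration's changes
- Index-only application when tables exist but indexes are missing
- Transaction safety (each migration is applied all-or-nothing)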

## Architecture Patterns

### Fresh Database Flow
```
1. init_db() called
2. SCHEMA_SQL executed (creates all current tables/columns)
3. run_migrations() called
4. Detects fresh database (empty schema_migrations)
5. Checks if schema is current (is_schema_current())
6. If current: marks all migrations as applied (no execution)
7. If partial: applies only needed migrations
```
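The "mark as applied without executing" path (steps 2 to 6) can be sketched roughly as follows. This is an illustration under stated assumptions, not StarPunk's actual code: the stand-in `SCHEMA_SQL`, the `migration_name` column, and the `init_fresh` helper are hypothetical, and the currency check is passed in as a callable so the sketch stays self-contained.

```python
import sqlite3

# Hypothetical stand-in for SCHEMA_SQL; the real schema lives in database.py.
SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS schema_migrations (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    migration_name TEXT UNIQUE NOT NULL  -- column name is an assumption
);
CREATE TABLE IF NOT EXISTS tokens (
    id INTEGER PRIMARY KEY,
    token_hash TEXT NOT NULL
);
"""


def init_fresh(conn, migration_names, schema_is_current):
    """Create the schema, then record migrations as applied without running them."""
    conn.executescript(SCHEMA_SQL)
    applied = conn.execute("SELECT COUNT(*) FROM schema_migrations").fetchone()[0]
    # Fresh database (no migration records) whose schema is already current:
    # record every migration so that none of them is executed later.
    if applied == 0 and schema_is_current(conn):
        with conn:  # commits on success, rolls back on error
            conn.executemany(
                "INSERT INTO schema_migrations (migration_name) VALUES (?)",
                [(name,) for name in migration_names],
            )
    rows = conn.execute("SELECT migration_name FROM schema_migrations ORDER BY id")
    return [row[0] for row in rows]
```

The partial case (step 7) is not covered here; it would fall through to per-migration checks instead of recording everything at once.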

### Existing Database Flow
```
1. init_db() called
2. SCHEMA_SQL executed (CREATE TABLE IF NOT EXISTS - no-op for existing tables)
3. run_migrations() called
4. Reads schema_migrations table
5. Discovers migration files
6. Applies only unapplied migrations in sequence
```
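Steps 4 to 6 amount to a set difference followed by a sort. The sketch below assumes a `migration_name` column in `schema_migrations` and a hypothetical `pending_migrations` helper; neither is confirmed by the actual runner.

```python
import sqlite3


def pending_migrations(conn: sqlite3.Connection, available: list[str]) -> list[str]:
    """Return migration filenames that have no record yet, in sequence order."""
    applied = {
        row[0]
        for row in conn.execute("SELECT migration_name FROM schema_migrations")
    }
    # Zero-padded prefixes (001, 002, ...) make lexicographic order match
    # numeric order, so a plain sort is sufficient.
    return [name for name in sorted(available) if name not in applied]
```

This is also why the three-digit zero padding matters: without it, `10_...` would sort before `2_...`.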

### Hybrid Database Flow (Production Issue Case)
```
1. Database has tables from SCHEMA_SQL but no migration records
2. run_migrations() detects migration_count == 0
3. For each migration, calls is_migration_needed()
4. Migration 002: detects tables exist, indexes missing
5. Creates only missing indexes
6. Marks migration as applied without full execution
```
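The index-only step (steps 4 and 5) could look something like the sketch below. The `create_missing_indexes` helper and the shape of its `index_statements` argument are assumptions for illustration; only the index name `idx_tokens_hash` comes from this document.

```python
import sqlite3


def create_missing_indexes(conn: sqlite3.Connection,
                           index_statements: dict[str, str]) -> list[str]:
    """Run CREATE INDEX only for indexes that are not in the database yet.

    index_statements maps an index name to the statement that creates it.
    Returns the names of the indexes that were created.
    """
    existing = {
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
    }
    created = []
    for name, statement in index_statements.items():
        if name not in existing:
            conn.execute(statement)
            created.append(name)
    return created
```

Calling it twice with the same mapping creates the index the first time and returns an empty list the second time, which is the idempotent behavior the hybrid flow relies on.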

## State Detection Logic

### is_schema_current() Function
Determines whether the database fully matches the current schema version.

**Checks**:
1. Table existence (authorization_codes)
2. Column existence (token_hash in tokens)
3. Index existence (idx_tokens_hash, etc.)

**Returns**:
- True: Schema is completely current (all migrations applied)
- False: Schema needs migrations
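A check of this kind can be built from `sqlite_master` and `PRAGMA table_info`. The following is a minimal sketch rather than the real function: the helper names are hypothetical, and it inspects only the three objects named above, whereas the actual check may cover more indexes.

```python
import sqlite3


def table_exists(conn: sqlite3.Connection, name: str) -> bool:
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None


def column_exists(conn: sqlite3.Connection, table: str, column: str) -> bool:
    # PRAGMA arguments cannot be bound as parameters, so `table` must be a
    # trusted, hard-coded identifier. Each result row is
    # (cid, name, type, notnull, dflt_value, pk).
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)


def index_exists(conn: sqlite3.Connection, name: str) -> bool:
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'index' AND name = ?", (name,)
    ).fetchone()
    return row is not None


def is_schema_current(conn: sqlite3.Connection) -> bool:
    """True only if the table, the column, and the index are all present."""
    return (
        table_exists(conn, "authorization_codes")
        and column_exists(conn, "tokens", "token_hash")
        and index_exists(conn, "idx_tokens_hash")
    )
```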

### is_migration_needed() Function
Determines whether a specific migration should be applied.

**For Migration 002**:
1. Check if authorization_codes table exists
2. Check if token_hash column exists in tokens
3. Check if indexes exist
4. Return True only if tables/columns are missing
5. Return False if only indexes are missing (handled separately)
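The decision for migration 002 can be sketched as below. The function name `is_migration_002_needed` is hypothetical, and the sketch deliberately ignores indexes, since the steps above hand missing indexes to a separate path.

```python
import sqlite3


def is_migration_002_needed(conn: sqlite3.Connection) -> bool:
    """True if the table or column that migration 002 introduces is missing.

    Missing indexes alone do not make the migration "needed"; they are
    assumed to be created by a separate index-only step.
    """
    tables = {
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    # PRAGMA table_info returns no rows when the table does not exist.
    token_columns = {row[1] for row in conn.execute("PRAGMA table_info(tokens)")}
    return "authorization_codes" not in tables or "token_hash" not in token_columns
```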

## Design Decisions

### Why Dual Strategy?
1. **Fresh Install Speed**: SCHEMA_SQL provides instant, complete schema
2. **Upgrade Safety**: Migrations provide controlled, versioned changes
3. **Flexibility**: Can handle various database states gracefully

### Why Smart Detection?
1. **Idempotency**: Same code works for any database state
2. **Self-Healing**: Can fix partial schemas automatically
3. **No Data Loss**: Never drops tables unnecessarily

### Why Check Indexes Separately?
1. **SCHEMA_SQL Evolution**: As SCHEMA_SQL absorbs changes from migrations, checking indexes separately avoids conflicts between the two
2. **Granular Control**: Can apply just missing pieces
3. **Performance**: Indexes can be added without altering or rewriting the tables themselves

## Migration Guidelines

### Writing Migrations
1. **Never use IF NOT EXISTS in migrations**: Migrations should fail if preconditions aren't met
2. **Always provide rollback path**: Document how to reverse changes
3. **One logical change per migration**: Keep migrations focused
4. **Test with various database states**: Fresh, existing, and hybrid

### SCHEMA_SQL Updates
When updating SCHEMA_SQL after a migration:
1. Include all changes from the migration
2. Remove indexes that migrations will create (avoid conflicts)
3. Keep CREATE TABLE IF NOT EXISTS for idempotency
4. Test fresh installations

## Error Recovery

### Common Issues

#### "Table already exists" Error
**Cause**: A migration tries to create a table that SCHEMA_SQL already created

**Solution**: Smart detection should prevent this. If it fails:
1. Check if migration is already in schema_migrations
2. Verify is_migration_needed() logic
3. Manually mark migration as applied if needed

#### Missing Indexes
**Cause**: Tables exist from SCHEMA_SQL but indexes weren't created

**Solution**: The migration runner detects this state and creates only the missing indexes, without re-running the rest of the migration

#### Partial Migration Application
**Cause**: Migration failed partway through

**Solution**: Each migration runs in a transaction, so a failure rolls back every change it made. Fix the underlying cause, then run the migrations again.
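The all-or-nothing behavior can be illustrated with an explicit transaction. This is a sketch under assumptions, not the runner's actual code: `apply_atomically` and the `migration_name` column are hypothetical, and the migration is passed as a list of single statements. Python's `sqlite3.Connection.executescript()` commits any pending transaction before it runs, so a script executed that way needs its own `BEGIN`/`COMMIT` to be atomic.

```python
import sqlite3


def apply_atomically(conn: sqlite3.Connection,
                     statements: list[str],
                     migration_name: str) -> None:
    """Run all statements and record the migration, or change nothing."""
    conn.execute("BEGIN")
    try:
        for statement in statements:
            conn.execute(statement)
        conn.execute(
            "INSERT INTO schema_migrations (migration_name) VALUES (?)",
            (migration_name,),
        )
        conn.execute("COMMIT")
    except sqlite3.Error:
        # SQLite DDL is transactional, so tables created earlier in this
        # migration disappear again on rollback.
        conn.execute("ROLLBACK")
        raise
```

Opening the connection with `isolation_level=None` keeps the `sqlite3` module from starting transactions on its own, which makes the explicit `BEGIN` and `COMMIT` the only transaction boundaries.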

## State Verification Queries

### Check Migration Status
```sql
SELECT * FROM schema_migrations ORDER BY id;
```

### Check Table Existence
```sql
SELECT name FROM sqlite_master
WHERE type='table'
ORDER BY name;
```

### Check Index Existence
```sql
SELECT name FROM sqlite_master
WHERE type='index'
ORDER BY name;
```

### Check Column Structure
```sql
PRAGMA table_info(tokens);
PRAGMA table_info(authorization_codes);
```
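When debugging from Python rather than the SQLite shell, the table, index, and column queries above can be combined into one snapshot. The `describe_schema` helper below is a hypothetical convenience, not part of StarPunk.

```python
import sqlite3


def describe_schema(conn: sqlite3.Connection) -> dict:
    """Collect table names, index names, and per-table column names."""

    def names(kind: str) -> list[str]:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = ? ORDER BY name", (kind,)
        )
        return [row[0] for row in rows]

    tables = names("table")
    return {
        "tables": tables,
        "indexes": names("index"),
        # Table names come from sqlite_master itself, so quoting them as
        # identifiers in the PRAGMA is safe here. Column name is field 1.
        "columns": {
            table: [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
            for table in tables
        },
    }
```

Note that the index list also includes automatic indexes that SQLite creates for `UNIQUE` and `PRIMARY KEY` constraints, which appear with a `sqlite_autoindex_` prefix.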

## Future Improvements

### Potential Enhancements
1. **Migration Rollback**: Add down() migrations for reversibility
2. **Schema Versioning**: Add version table for faster state detection
3. **Migration Validation**: Pre-flight checks before application
4. **Dry Run Mode**: Test migrations without applying

### Considered Alternatives
1. **Migrations-Only**: Rejected - slow fresh installs
2. **SCHEMA_SQL-Only**: Rejected - no upgrade path
3. **ORM-Based**: Rejected - unnecessary complexity for single-user system
4. **External Tools**: Rejected - additional dependencies

## Security Considerations

### Migration Safety
1. All migrations run in transactions
2. Rollback on any error
3. No data destruction without explicit user action
4. Token invalidation documented when necessary

### Schema Security
1. Tokens stored as SHA256 hashes
2. Proper indexes for timing attack prevention
3. Expiration columns for automatic cleanup
4. Soft deletion support
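
Storing only a SHA256 hash means the plaintext token never needs to be written to the database. A minimal sketch of the idea, assuming hex-encoded digests and hypothetical helper names (`hash_token`, `issue_token`) that are not taken from the StarPunk code:

```python
import hashlib
import secrets


def hash_token(token: str) -> str:
    """Return the hex-encoded SHA256 digest that would be stored in token_hash."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def issue_token() -> tuple[str, str]:
    """Generate a random token and the hash to persist.

    The plaintext goes to the client once; only the hash is stored.
    """
    token = secrets.token_urlsafe(32)
    return token, hash_token(token)
```

Verification then hashes the presented token and looks up the digest in the `token_hash` column, so an index on that column keeps the lookup fast.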