StarPunk/docs/architecture/database-migration-architecture.md
Phil Skentelbery 2b2849a58d docs: Add database migration architecture and conflict resolution documentation
Documents the diagnosis and resolution of database migration detection conflicts
2025-11-24 13:27:19 -07:00

Database Migration Architecture

Overview

StarPunk uses a dual-strategy database initialization system that combines immediate schema creation (SCHEMA_SQL) with evolutionary migrations. This architecture provides both fast fresh installations and safe upgrades for existing databases.

Components

1. SCHEMA_SQL (database.py)

Purpose: Define the current complete database schema for fresh installations

Location: /starpunk/database.py lines 11-87

Responsibilities:

  • Create all tables with current structure
  • Create all columns with current types
  • Create base indexes for performance
  • Provide instant database initialization for new installations

Design Principle: Always represents the latest schema version
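
As a minimal sketch of this idea, the block below uses a heavily trimmed, hypothetical stand-in for SCHEMA_SQL. Only the table names tokens and authorization_codes and the token_hash column come from this document; the code_hash column, the name SCHEMA_SQL_SKETCH, and the init_schema() helper are assumptions for illustration.

```python
import sqlite3

# Hypothetical, trimmed stand-in for SCHEMA_SQL. The real definition lives in
# starpunk/database.py and contains more tables and columns than shown here.
SCHEMA_SQL_SKETCH = """
CREATE TABLE IF NOT EXISTS tokens (
    id INTEGER PRIMARY KEY,
    token_hash TEXT UNIQUE NOT NULL
);
CREATE TABLE IF NOT EXISTS authorization_codes (
    id INTEGER PRIMARY KEY,
    code_hash TEXT UNIQUE NOT NULL  -- assumed column, for illustration only
);
"""


def init_schema(conn: sqlite3.Connection) -> None:
    """Create any missing tables; tables that already exist are left untouched."""
    conn.executescript(SCHEMA_SQL_SKETCH)


conn = sqlite3.connect(":memory:")
init_schema(conn)
init_schema(conn)  # running it again is a no-op because of IF NOT EXISTS

tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
    )
]
```

Because every statement uses IF NOT EXISTS, the same script can run on every startup without disturbing an existing database.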

2. Migration Files

Purpose: Transform existing databases from one version to another

Location: /migrations/*.sql

Format: {number}_{description}.sql

  • Number: Three-digit zero-padded sequence (001, 002, etc.)
  • Description: Clear indication of changes
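
The naming convention can be checked mechanically. The following sketch is not taken from migrations.py; the regular expression, the function names, and the example filename are assumptions that follow the format described above.

```python
import re
from pathlib import Path

# Three-digit zero-padded number, an underscore, a description, then ".sql".
MIGRATION_NAME = re.compile(r"^(\d{3})_(.+)\.sql$")


def parse_migration_name(filename: str) -> tuple[int, str]:
    """Split a migration filename into (sequence number, description)."""
    match = MIGRATION_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a valid migration filename: {filename}")
    return int(match.group(1)), match.group(2)


def discover_migrations(directory: Path) -> list[Path]:
    """Return well-formed migration files ordered by their numeric prefix."""
    files = [p for p in directory.glob("*.sql") if MIGRATION_NAME.match(p.name)]
    return sorted(files, key=lambda p: parse_migration_name(p.name)[0])


# "002_secure_tokens.sql" is a made-up filename used only for this example.
number, description = parse_migration_name("002_secure_tokens.sql")
```

Zero-padding means a plain alphabetical sort would give the same order, but sorting on the parsed number keeps the intent explicit.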

Responsibilities:

  • Add new tables/columns to existing databases
  • Modify existing structures safely
  • Create indexes and constraints
  • Handle breaking changes with data preservation

3. Migration Runner (migrations.py)

Purpose: Intelligent application of migrations based on database state

Location: /starpunk/migrations.py

Key Features:

  • Fresh database detection
  • Partial schema recognition
  • Smart migration skipping
  • Index-only application
  • Transaction safety

Architecture Patterns

Fresh Database Flow

1. init_db() called
2. SCHEMA_SQL executed (creates all current tables/columns)
3. run_migrations() called
4. Detects fresh database (empty schema_migrations)
5. Checks if schema is current (is_schema_current())
6. If current: marks all migrations as applied (no execution)
7. If partial: applies only needed migrations
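
Steps 4 to 6 of this flow might look roughly like the sketch below. The column layout of schema_migrations, the helper names, and the migration filenames are assumptions; the schema check is passed in as a callable so the example stays self-contained.

```python
import sqlite3


def ensure_tracking_table(conn: sqlite3.Connection) -> None:
    # Assumed layout for schema_migrations; the real columns may differ.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "migration_name TEXT UNIQUE NOT NULL, "
        "applied_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP)"
    )


def handle_fresh_database(conn, migration_names, schema_is_current):
    """Record every migration as applied when nothing is recorded yet and
    the schema is already current. Returns the names that were recorded."""
    ensure_tracking_table(conn)
    count = conn.execute("SELECT COUNT(*) FROM schema_migrations").fetchone()[0]
    if count != 0 or not schema_is_current(conn):
        return []  # not the fresh-and-current case; normal processing applies
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany(
            "INSERT INTO schema_migrations (migration_name) VALUES (?)",
            [(name,) for name in migration_names],
        )
    return list(migration_names)


conn = sqlite3.connect(":memory:")
names = ["001_first.sql", "002_second.sql"]  # hypothetical filenames
marked = handle_fresh_database(conn, names, lambda c: True)
second_call = handle_fresh_database(conn, names, lambda c: True)
```

The second call records nothing because schema_migrations is no longer empty, so the database is no longer treated as fresh.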

Existing Database Flow

1. init_db() called
2. SCHEMA_SQL executed (CREATE TABLE IF NOT EXISTS is a no-op for existing tables)
3. run_migrations() called
4. Reads schema_migrations table
5. Discovers migration files
6. Applies only unapplied migrations in sequence
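
Steps 4 to 6 can be sketched as a loop over the migration files. This is an illustration under assumptions, not the code in migrations.py: the schema_migrations columns, the apply_pending() name, and the two example migrations are made up, and transaction handling is left out for brevity.

```python
import sqlite3
import tempfile
from pathlib import Path


def apply_pending(conn: sqlite3.Connection, migrations_dir: Path) -> list[str]:
    """Apply, in filename order, every migration not yet recorded as applied."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "migration_name TEXT UNIQUE NOT NULL)"  # assumed column layout
    )
    applied = {
        row[0] for row in conn.execute("SELECT migration_name FROM schema_migrations")
    }
    newly_applied = []
    # Zero-padded prefixes make alphabetical order equal to sequence order.
    for path in sorted(migrations_dir.glob("*.sql")):
        if path.name in applied:
            continue
        conn.executescript(path.read_text())
        conn.execute(
            "INSERT INTO schema_migrations (migration_name) VALUES (?)", (path.name,)
        )
        conn.commit()
        newly_applied.append(path.name)
    return newly_applied


with tempfile.TemporaryDirectory() as tmp:
    directory = Path(tmp)
    (directory / "001_create_notes.sql").write_text(
        "CREATE TABLE notes (id INTEGER PRIMARY KEY);"
    )
    (directory / "002_add_slug.sql").write_text(
        "ALTER TABLE notes ADD COLUMN slug TEXT;"
    )
    conn = sqlite3.connect(":memory:")
    first_run = apply_pending(conn, directory)
    second_run = apply_pending(conn, directory)
```

The second run applies nothing, which is the behavior an already-upgraded database should see on every startup.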

Hybrid Database Flow (Production Issue Case)

1. Database has tables from SCHEMA_SQL but no migration records
2. run_migrations() detects migration_count == 0
3. For each migration, calls is_migration_needed()
4. Migration 002: detects tables exist, indexes missing
5. Creates only missing indexes
6. Marks migration as applied without full execution
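
The index-only path in steps 4 to 6 could be sketched as follows. The index name idx_tokens_hash appears in this document, but its column list, the MIGRATION_002_INDEXES mapping, the helper names, and the migration filename are assumptions.

```python
import sqlite3

# Assumed definition: only the index name is taken from this document.
MIGRATION_002_INDEXES = {
    "idx_tokens_hash": "CREATE INDEX idx_tokens_hash ON tokens(token_hash)",
}


def index_exists(conn: sqlite3.Connection, name: str) -> bool:
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type='index' AND name=?", (name,)
    ).fetchone()
    return row is not None


def apply_indexes_only(conn, migration_name, index_sql):
    """Create whichever indexes are missing, then record the migration as
    applied without running its table or column statements."""
    created = []
    with conn:  # commits at the end; rolls back an open transaction on error
        for name, sql in index_sql.items():
            if not index_exists(conn, name):
                conn.execute(sql)
                created.append(name)
        conn.execute(
            "INSERT INTO schema_migrations (migration_name) VALUES (?)",
            (migration_name,),
        )
    return created


# Simulate the hybrid state: tables exist, no indexes, no migration records.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tokens (id INTEGER PRIMARY KEY, token_hash TEXT NOT NULL);
    CREATE TABLE schema_migrations (
        id INTEGER PRIMARY KEY, migration_name TEXT UNIQUE NOT NULL
    );
    """
)
created = apply_indexes_only(conn, "002_hypothetical.sql", MIGRATION_002_INDEXES)
recorded = [r[0] for r in conn.execute("SELECT migration_name FROM schema_migrations")]
```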

State Detection Logic

is_schema_current() Function

Determines if database matches current schema version completely.

Checks:

  1. Table existence (authorization_codes)
  2. Column existence (token_hash in tokens)
  3. Index existence (idx_tokens_hash, etc.)

Returns:

  • True: Schema is completely current (all migrations applied)
  • False: Schema needs migrations
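
A minimal sketch of these three checks, assuming SQLite's sqlite_master catalog and PRAGMA table_info are used for introspection. The helper names are illustrative, and the real function may check more objects than the three named above.

```python
import sqlite3


def object_exists(conn: sqlite3.Connection, kind: str, name: str) -> bool:
    """Return True if sqlite_master lists an object of this kind and name."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = ? AND name = ?", (kind, name)
    ).fetchone()
    return row is not None


def column_exists(conn: sqlite3.Connection, table: str, column: str) -> bool:
    """Return True if the table has the column (False if the table is missing)."""
    # PRAGMA arguments cannot be bound as parameters, so `table` must be a
    # trusted, hard-coded identifier. Field 1 of each row is the column name.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)


def is_schema_current(conn: sqlite3.Connection) -> bool:
    return (
        object_exists(conn, "table", "authorization_codes")
        and column_exists(conn, "tokens", "token_hash")
        and object_exists(conn, "index", "idx_tokens_hash")
    )


conn = sqlite3.connect(":memory:")
before = is_schema_current(conn)  # empty database
conn.executescript(
    """
    CREATE TABLE tokens (id INTEGER PRIMARY KEY, token_hash TEXT NOT NULL);
    CREATE TABLE authorization_codes (id INTEGER PRIMARY KEY);
    """
)
without_index = is_schema_current(conn)  # tables present, index missing
conn.execute("CREATE INDEX idx_tokens_hash ON tokens(token_hash)")
after = is_schema_current(conn)
```

The schema only counts as current once the table, the column, and the index are all present.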

is_migration_needed() Function

Determines if a specific migration should be applied.

For Migration 002:

  1. Check if authorization_codes table exists
  2. Check if token_hash column exists in tokens
  3. Check if indexes exist
  4. Return True only if tables/columns are missing
  5. Return False if only indexes are missing (handled separately)
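
The decision for migration 002 can be sketched as below. The function signature and the filename prefix test are assumptions; the key point from the steps above is that a missing index alone does not make the migration "needed".

```python
import sqlite3


def is_migration_needed(conn: sqlite3.Connection, migration_name: str) -> bool:
    """Return True when the migration's tables or columns are missing."""
    if not migration_name.startswith("002_"):
        return True  # this sketch only special-cases migration 002
    tables = {
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
    }
    if "authorization_codes" not in tables:
        return True
    # table_info yields no rows for a missing table, so a missing tokens table
    # is treated the same way as a missing token_hash column.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(tokens)")}
    # Indexes are deliberately not consulted here; they are handled separately.
    return "token_hash" not in columns


name = "002_hypothetical.sql"  # made-up filename for illustration
conn = sqlite3.connect(":memory:")
needed_when_empty = is_migration_needed(conn, name)
conn.executescript(
    """
    CREATE TABLE tokens (id INTEGER PRIMARY KEY, token_hash TEXT NOT NULL);
    CREATE TABLE authorization_codes (id INTEGER PRIMARY KEY);
    """
)
needed_when_only_indexes_missing = is_migration_needed(conn, name)
```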

Design Decisions

Why Dual Strategy?

  1. Fresh Install Speed: SCHEMA_SQL provides instant, complete schema
  2. Upgrade Safety: Migrations provide controlled, versioned changes
  3. Flexibility: Can handle various database states gracefully

Why Smart Detection?

  1. Idempotency: Same code works for any database state
  2. Self-Healing: Can fix partial schemas automatically
  3. No Data Loss: Never drops tables unnecessarily

Why Check Indexes Separately?

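  1. SCHEMA_SQL Evolution: As SCHEMA_SQL absorbs the table and column changes from migrations, handling indexes separately avoids conflicts between the two
  2. Granular Control: Can apply just missing pieces
  3. Performance: Indexes can be added without recreating or altering the tables they cover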

Migration Guidelines

Writing Migrations

  1. Never use IF NOT EXISTS in migrations: Migrations should fail if preconditions aren't met
  2. Always provide rollback path: Document how to reverse changes
  3. One logical change per migration: Keep migrations focused
  4. Test with various database states: Fresh, existing, and hybrid

SCHEMA_SQL Updates

When updating SCHEMA_SQL after a migration:

  1. Include all changes from the migration
  2. Remove indexes that migrations will create (avoid conflicts)
  3. Keep CREATE TABLE IF NOT EXISTS for idempotency
  4. Test fresh installations

Error Recovery

Common Issues

"Table already exists" Error

Cause: Migration tries to create table that SCHEMA_SQL already created

Solution: Smart detection should prevent this. If it fails:

  1. Check if migration is already in schema_migrations
  2. Verify is_migration_needed() logic
  3. Manually mark migration as applied if needed
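
For step 3, a small helper along these lines could record a migration by hand. It assumes schema_migrations has a migration_name column, which this document does not confirm, and the filename is hypothetical. Back up the database before running anything like this against production.

```python
import sqlite3


def mark_migration_applied(conn: sqlite3.Connection, migration_name: str) -> bool:
    """Record a migration as applied. Returns False if it was already recorded."""
    already = conn.execute(
        "SELECT 1 FROM schema_migrations WHERE migration_name = ?",
        (migration_name,),
    ).fetchone()
    if already is not None:
        return False
    with conn:  # commit the bookkeeping row, or roll back on error
        conn.execute(
            "INSERT INTO schema_migrations (migration_name) VALUES (?)",
            (migration_name,),
        )
    return True


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE schema_migrations ("
    "id INTEGER PRIMARY KEY, migration_name TEXT NOT NULL)"  # assumed layout
)
first = mark_migration_applied(conn, "002_hypothetical.sql")
second = mark_migration_applied(conn, "002_hypothetical.sql")
```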

Missing Indexes

Cause: Tables exist from SCHEMA_SQL but indexes weren't created

Solution: Migration system creates missing indexes separately

Partial Migration Application

Cause: Migration failed partway through

Solution: Transactions ensure all-or-nothing application. Roll back and retry.
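
The all-or-nothing behavior can be demonstrated with an explicit transaction. This is a sketch under assumptions rather than the runner's actual code: apply_atomically() is a hypothetical helper, the migration is given as a list of statements instead of a .sql file, and the connection is opened with isolation_level=None so that BEGIN, COMMIT, and ROLLBACK are issued by hand.

```python
import sqlite3


def apply_atomically(conn, migration_name, statements):
    """Run all statements and the bookkeeping insert in one transaction."""
    conn.execute("BEGIN")
    try:
        for statement in statements:
            conn.execute(statement)
        conn.execute(
            "INSERT INTO schema_migrations (migration_name) VALUES (?)",
            (migration_name,),
        )
        conn.execute("COMMIT")
    except sqlite3.Error:
        if conn.in_transaction:
            conn.execute("ROLLBACK")  # undo every statement, including DDL
        raise


# isolation_level=None disables the module's implicit transaction handling.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE tokens (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE schema_migrations (id INTEGER PRIMARY KEY, migration_name TEXT)"
)

try:
    apply_atomically(
        conn,
        "002_hypothetical.sql",
        [
            "ALTER TABLE tokens ADD COLUMN token_hash TEXT",
            "CREATE TABLE tokens (id INTEGER)",  # fails: table already exists
        ],
    )
    failed = False
except sqlite3.OperationalError:
    failed = True

columns = [row[1] for row in conn.execute("PRAGMA table_info(tokens)")]
recorded = conn.execute("SELECT COUNT(*) FROM schema_migrations").fetchone()[0]
```

Because SQLite schema changes are transactional, the column added by the first statement disappears again when the second statement fails, and no migration record is written.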

State Verification Queries

Check Migration Status

SELECT * FROM schema_migrations ORDER BY id;

Check Table Existence

SELECT name FROM sqlite_master
WHERE type='table'
ORDER BY name;

Check Index Existence

SELECT name FROM sqlite_master
WHERE type='index'
ORDER BY name;

Check Column Structure

PRAGMA table_info(tokens);
PRAGMA table_info(authorization_codes);
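
When debugging from Python rather than the sqlite3 shell, the queries above can be bundled into one helper. The snapshot() function below is a hypothetical convenience, not part of StarPunk.

```python
import sqlite3


def snapshot(conn: sqlite3.Connection) -> dict:
    """Collect table names, index names, and per-table column names."""

    def names(kind: str) -> list[str]:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = ? ORDER BY name", (kind,)
        )
        return [row[0] for row in rows]

    tables = names("table")
    return {
        "tables": tables,
        "indexes": names("index"),
        # Table names come from sqlite_master itself, so quoting them as
        # identifiers is enough here. Field 1 of table_info is the column name.
        "columns": {
            table: [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
            for table in tables
        },
    }


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tokens (id INTEGER PRIMARY KEY, token_hash TEXT NOT NULL);
    CREATE INDEX idx_tokens_hash ON tokens(token_hash);
    """
)
state = snapshot(conn)
```

Comparing the snapshot of a failing database with one from a fresh installation shows at a glance which tables, columns, or indexes are missing.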

Future Improvements

Potential Enhancements

  1. Migration Rollback: Add down() migrations for reversibility
  2. Schema Versioning: Add version table for faster state detection
  3. Migration Validation: Pre-flight checks before application
  4. Dry Run Mode: Test migrations without applying

Considered Alternatives

  1. Migrations-Only: Rejected - slow fresh installs
  2. SCHEMA_SQL-Only: Rejected - no upgrade path
  3. ORM-Based: Rejected - unnecessary complexity for single-user system
  4. External Tools: Rejected - additional dependencies

Security Considerations

Migration Safety

  1. All migrations run in transactions
  2. Rollback on any error
  3. No data destruction without explicit user action
  4. Token invalidation documented when necessary

Schema Security

  1. Tokens stored as SHA-256 hashes
  2. Indexes on token hashes, so lookups match on the hash rather than comparing raw token values (limits timing side channels)
  3. Expiration columns for automatic cleanup
  4. Soft deletion support
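
To illustrate the first point, the sketch below hashes a token with SHA-256 before it would be stored in token_hash. The function names, the token length, and the choice of a hex digest are assumptions; the exact encoding StarPunk uses is not stated in this document.

```python
import hashlib
import secrets


def hash_token(token: str) -> str:
    """Return the SHA-256 hex digest that would be stored instead of the token."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def generate_token() -> tuple[str, str]:
    """Create a random token and the hash to persist. The plaintext token is
    shown to the client once and never written to the database."""
    token = secrets.token_urlsafe(32)
    return token, hash_token(token)


token, stored_hash = generate_token()
# Verification hashes the presented token and looks the hash up in the database.
matches = hash_token(token) == stored_hash
```

Since only the hash is stored, tokens issued before such a scheme is introduced cannot be converted in place, which is one reason a migration may need to invalidate existing tokens.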