StarPunk/docs/reports/migration-failure-diagnosis-v1.0.0-rc.1.md
Phil Skentelbery 3ed77fd45f fix: Resolve database migration failure on existing databases
Fixes critical issue where migration 002 indexes already existed in SCHEMA_SQL,
causing 'index already exists' errors on databases created before v1.0.0-rc.1.

Changes:
- Removed duplicate index definitions from SCHEMA_SQL (database.py)
- Enhanced migration system to detect and handle indexes properly
- Added comprehensive documentation of the fix

Version bumped to 1.0.0-rc.2 with full changelog entry.

Refs: docs/reports/2025-11-24-migration-fix-v1.0.0-rc.2.md
2025-11-24 13:11:14 -07:00

Migration Failure Diagnosis - v1.0.0-rc.1

Executive Summary

The v1.0.0-rc.1 container fails at startup because of an execution-order problem in the database initialization and migration system. The error sqlite3.OperationalError: no such column: token_hash occurs when SCHEMA_SQL tries to create an index on the token_hash column of an existing tokens table that predates that column. Migration 002, which drops and recreates the table with the new structure, never gets a chance to run.

Root Cause Analysis

The Execution Order Problem

  1. Database Initialization (init_db() in database.py:94-127)

    • Line 115: conn.executescript(SCHEMA_SQL) - Creates initial schema
    • Line 126: run_migrations() - Applies pending migrations
  2. SCHEMA_SQL Definition (database.py:46-60)

    • Creates tokens table WITH token_hash column (lines 46-56)
    • Creates indexes including idx_tokens_hash (line 58)
  3. Migration 002 (002_secure_tokens_and_authorization_codes.sql)

    • Line 17: DROP TABLE IF EXISTS tokens;
    • Lines 20-30: Creates a NEW tokens table with the same structure that SCHEMA_SQL defines
    • Lines 49-51: Creates indexes again

The Critical Issue

For an existing production database (v0.9.5):

  1. Database already has an OLD tokens table (without token_hash column)
  2. init_db() runs SCHEMA_SQL which includes:
    CREATE TABLE IF NOT EXISTS tokens (
        ...
        token_hash TEXT UNIQUE NOT NULL,
        ...
    );
    CREATE INDEX IF NOT EXISTS idx_tokens_hash ON tokens(token_hash);
    
  3. The CREATE TABLE IF NOT EXISTS is a no-op (table exists)
  4. The CREATE INDEX then tries to create an index on the token_hash column
  5. ERROR: Column token_hash doesn't exist in the old table structure
  6. Container crashes before migrations can run
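The sequence above can be reproduced in isolation with Python's sqlite3 module. This is a minimal sketch: the OLD_SCHEMA and NEW_SCHEMA strings are abbreviated, hypothetical stand-ins for the real v0.9.5 table and for SCHEMA_SQL, since the full column lists are not shown in this report.

```python
import sqlite3

# Hypothetical stand-in for the v0.9.5 tokens table (no token_hash column).
OLD_SCHEMA = """
CREATE TABLE tokens (
    token TEXT UNIQUE NOT NULL,
    me TEXT NOT NULL
);
"""

# Abbreviated stand-in for the token-related part of SCHEMA_SQL.
NEW_SCHEMA = """
CREATE TABLE IF NOT EXISTS tokens (
    token_hash TEXT UNIQUE NOT NULL,
    me TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_tokens_hash ON tokens(token_hash);
"""


def reproduce() -> str:
    """Return the error message raised when NEW_SCHEMA runs on an old database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(OLD_SCHEMA)
        try:
            # CREATE TABLE IF NOT EXISTS is skipped because the table exists;
            # CREATE INDEX then references a column the old table lacks.
            conn.executescript(NEW_SCHEMA)
        except sqlite3.OperationalError as exc:
            return str(exc)
        return ""
    finally:
        conn.close()


if __name__ == "__main__":
    print(reproduce())
```

The exact wording of the message depends on the SQLite version, but it names the missing token_hash column in either case.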

Why This Wasn't Caught Earlier

  • Fresh databases work fine - SCHEMA_SQL creates the correct structure
  • Test environments likely started fresh or had the new schema
  • Production has an existing v0.9.5 database with the old tokens table structure

The Schema Evolution Mismatch

Original tokens table (v0.9.5)

The old structure likely had columns like:

  • token (plain text - security issue)
  • me
  • client_id
  • scope
  • etc.

New tokens table (v1.0.0-rc.1)

  • token_hash (SHA-256 hash - secure)
  • The other columns are unchanged

The Problem

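SCHEMA_SQL was updated to match the POST-migration structure, but it runs BEFORE migrations. On an existing database, its index statements reference a column that only exists after migration 002 has run, so initialization fails before the migration can create that column.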

Migration System Design Flaw

The current system has a fundamental ordering issue:

  1. SCHEMA_SQL should represent the INITIAL schema (v0.1.0)
  2. Migrations should evolve from that base
  3. Current Reality: SCHEMA_SQL represents the LATEST schema

This works for fresh databases but fails for existing ones that need migration.

Option 1: Conditional Index Creation (Quick Fix)

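Either guard the index creation so that it only runs when the token_hash column exists, or remove the token indexes from SCHEMA_SQL entirely, since migration 002 creates them anyway. SQLite's CREATE INDEX IF NOT EXISTS only checks for the index name, not for the column, so any guard has to live in application code.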
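One way to guard the index creation is to inspect the table with PRAGMA table_info before issuing the statement. The sketch below is illustrative only; column_exists and create_token_hash_index are hypothetical helper names, not functions from database.py.

```python
import sqlite3


def column_exists(conn: sqlite3.Connection, table: str, column: str) -> bool:
    """Check for a column via PRAGMA table_info.

    The table name is interpolated into the statement, so it must come
    from trusted code, never from user input.
    """
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return any(row[1] == column for row in rows)


def create_token_hash_index(conn: sqlite3.Connection) -> bool:
    """Create idx_tokens_hash only if tokens.token_hash exists.

    Returns True if the CREATE INDEX statement was issued, False if it
    was skipped and left for the migration to handle.
    """
    if not column_exists(conn, "tokens", "token_hash"):
        return False
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_tokens_hash ON tokens(token_hash)"
    )
    return True
```

On an old database the helper returns False and startup can proceed to the migrations; on a fresh or already migrated database it behaves like the original statement.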

Option 2: Fix Execution Order (Better)

  1. Run migrations BEFORE attempting schema creation
  2. Only use SCHEMA_SQL for truly fresh databases
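A rough sketch of this ordering follows. The function names are hypothetical and the real init_db() may be structured differently; the migration runner is passed in as a callable so the example stays self-contained. It also leaves out one piece of bookkeeping a real implementation would need: after applying SCHEMA_SQL to a fresh database, the migrations that schema already covers must be recorded as applied.

```python
import sqlite3
from typing import Callable


def is_fresh_database(conn: sqlite3.Connection) -> bool:
    """Treat a database with no user tables as fresh."""
    row = conn.execute(
        "SELECT COUNT(*) FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchone()
    return row[0] == 0


def init_db_migration_first(
    conn: sqlite3.Connection,
    schema_sql: str,
    run_migrations: Callable[[sqlite3.Connection], None],
) -> str:
    """Apply schema_sql only to an empty database; otherwise migrate.

    Returns "fresh" or "migrated" to show which path was taken.
    """
    if is_fresh_database(conn):
        conn.executescript(schema_sql)
        return "fresh"
    # Existing database: never run the latest schema against old tables.
    run_migrations(conn)
    return "migrated"
```

With this split, the latest-schema statements never execute against a v0.9.5 table, so the index on token_hash is only created once migration 002 has rebuilt the table.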

Option 3: Proper Schema Versioning (Best)

  1. SCHEMA_SQL should be the v0.1.0 schema
  2. All evolution happens through migrations
  3. Fresh databases run all migrations from the beginning

Immediate Workaround

For the production deployment:

  1. Manual intervention before upgrade:

    -- Connect to production database
    -- Manually add the column before v1.0.0-rc.1 starts
    ALTER TABLE tokens ADD COLUMN token_hash TEXT;
    
  2. Then deploy v1.0.0-rc.1:

    • SCHEMA_SQL will succeed (column exists)
    • Migration 002 will drop and recreate the table properly
    • System will work correctly

Verification Steps

  1. Check production database structure:

    PRAGMA table_info(tokens);
    
  2. Verify migration status:

    SELECT * FROM schema_migrations;
    
  3. Test with a v0.9.5 database locally to reproduce
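The first two checks can also be scripted, which is convenient when inspecting a copy of the production database. This is a sketch under assumptions: describe_database is a hypothetical helper, and because the column layout of schema_migrations is not shown in this report, its rows are returned unmodified.

```python
import sqlite3


def describe_database(conn: sqlite3.Connection) -> dict:
    """Summarize the tokens table columns and the recorded migrations."""
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    columns = [row[1] for row in conn.execute("PRAGMA table_info(tokens)")]
    try:
        migrations = conn.execute("SELECT * FROM schema_migrations").fetchall()
    except sqlite3.OperationalError:
        # The tracking table does not exist in this database.
        migrations = None
    return {
        "token_columns": columns,
        "has_token_hash": "token_hash" in columns,
        "migrations": migrations,
    }
```

A result with has_token_hash set to False and no record of migration 002 matches the failing scenario described in this report.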

Long-term Architecture Recommendations

  1. Separate Initial Schema from Current Schema

    • INITIAL_SCHEMA_SQL - The v0.1.0 starting point
    • Migrations handle ALL evolution
  2. Migration-First Initialization

    • Check for existing database
    • Run migrations first if database exists
    • Only apply SCHEMA_SQL to truly empty databases
  3. Schema Version Tracking

    • Add a schema_version table
    • Track the current schema version explicitly
    • Make decisions based on version, not heuristics
  4. Testing Strategy

    • Always test upgrades from previous production version
    • Include migration testing in CI/CD pipeline
    • Maintain database snapshots for each released version
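To illustrate recommendations 1 and 3 together, the sketch below tracks an explicit version in a schema_version table and applies every migration above that version, so a fresh database and an old one follow the same path. The MIGRATIONS list, its SQL, and the function names are hypothetical and heavily abbreviated; they are not the project's real migration files.

```python
import sqlite3

# Hypothetical ordered migrations as (version, SQL) pairs.
MIGRATIONS = [
    (1, "CREATE TABLE tokens (token TEXT UNIQUE NOT NULL, me TEXT NOT NULL);"),
    (2, """
        DROP TABLE IF EXISTS tokens;
        CREATE TABLE tokens (token_hash TEXT UNIQUE NOT NULL, me TEXT NOT NULL);
        CREATE INDEX idx_tokens_hash ON tokens(token_hash);
     """),
]


def current_version(conn: sqlite3.Connection) -> int:
    """Return the highest recorded schema version, or 0 if none."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)"
    )
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0


def upgrade(conn: sqlite3.Connection) -> int:
    """Apply all migrations newer than the recorded version, in order."""
    version = current_version(conn)
    for target, sql in MIGRATIONS:
        if target > version:
            conn.executescript(sql)
            conn.execute(
                "INSERT INTO schema_version (version) VALUES (?)", (target,)
            )
            conn.commit()
            version = target
    return version
```

Calling upgrade() on an empty database runs both steps; calling it again is a no-op because the recorded version already matches. An upgrade test in CI could start from a snapshot at version 1 and assert that token_hash exists afterwards.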

Conclusion

This is a critical architectural issue in the migration system that affects all existing production deployments. The immediate fix is straightforward, but the system needs architectural changes to prevent similar issues in future releases.

The core principle violated: SCHEMA_SQL should represent the beginning, not the end state.