Fixes critical issue where migration 002 indexes already existed in SCHEMA_SQL, causing 'index already exists' errors on databases created before v1.0.0-rc.1.
Changes:
- Removed duplicate index definitions from SCHEMA_SQL (database.py)
- Enhanced migration system to detect and handle indexes properly
- Added comprehensive documentation of the fix
Version bumped to 1.0.0-rc.2 with full changelog entry.
Refs: docs/reports/2025-11-24-migration-fix-v1.0.0-rc.2.md
Migration Failure Diagnosis - v1.0.0-rc.1
Executive Summary
The v1.0.0-rc.1 container fails at startup because of an ordering problem in the database initialization and migration system. The error sqlite3.OperationalError: no such column: token_hash occurs when SCHEMA_SQL tries to create an index on token_hash against an existing tokens table that predates that column. Migration 002, which drops and recreates the table with the new structure, never gets the chance to run.
Root Cause Analysis
The Execution Order Problem
- Database Initialization (init_db() in database.py:94-127)
  - Line 115: conn.executescript(SCHEMA_SQL) - Creates initial schema
  - Line 126: run_migrations() - Applies pending migrations
- SCHEMA_SQL Definition (database.py:46-60)
  - Creates tokens table WITH token_hash column (lines 46-56)
  - Creates indexes including idx_tokens_hash (line 58)
- Migration 002 (002_secure_tokens_and_authorization_codes.sql)
  - Line 17: DROP TABLE IF EXISTS tokens;
  - Lines 20-30: Creates NEW tokens table with same structure
  - Lines 49-51: Creates indexes again
The Critical Issue
For an existing production database (v0.9.5):
- Database already has an OLD tokens table (without token_hash column)
- init_db() runs SCHEMA_SQL which includes:
    CREATE TABLE IF NOT EXISTS tokens (
        ...
        token_hash TEXT UNIQUE NOT NULL,
        ...
    );
    CREATE INDEX IF NOT EXISTS idx_tokens_hash ON tokens(token_hash);
- The CREATE TABLE IF NOT EXISTS is a no-op (table exists)
- The CREATE INDEX tries to create an index on token_hash column
- ERROR: Column token_hash doesn't exist in the old table structure
- Container crashes before migrations can run
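The sequence above can be reproduced in isolation with an in-memory SQLite database. This is a minimal sketch, not the project's code: OLD_SCHEMA is a hypothetical stand-in for the v0.9.5 table (the real column list is not shown in this report), and SCHEMA_SQL here is abbreviated to the two statements that matter.

```python
import sqlite3

# Hypothetical stand-in for the v0.9.5 table: plain-text token, no token_hash.
OLD_SCHEMA = """
CREATE TABLE tokens (
    token TEXT PRIMARY KEY,
    me TEXT NOT NULL
);
"""

# Abbreviated stand-in for the rc.1 SCHEMA_SQL (assumed shape, not the real file).
SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS tokens (
    id INTEGER PRIMARY KEY,
    token_hash TEXT UNIQUE NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_tokens_hash ON tokens(token_hash);
"""


def reproduce() -> str:
    """Return the error message raised when SCHEMA_SQL meets the old table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(OLD_SCHEMA)
        try:
            # CREATE TABLE IF NOT EXISTS is skipped; CREATE INDEX then fails.
            conn.executescript(SCHEMA_SQL)
        except sqlite3.OperationalError as exc:
            return str(exc)
        return ""
    finally:
        conn.close()


if __name__ == "__main__":
    print(reproduce())
```

Running the same script against a fresh connection (without OLD_SCHEMA) succeeds, which matches the observation that fresh databases were unaffected.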
Why This Wasn't Caught Earlier
- Fresh databases work fine - SCHEMA_SQL creates the correct structure
- Test environments likely started fresh or had the new schema
- Production has an existing v0.9.5 database with the old tokens table structure
The Schema Evolution Mismatch
Original tokens table (v0.9.5)
The old structure likely had columns like:
- token (plain text - security issue)
- me
- client_id
- scope
- etc.
New tokens table (v1.0.0-rc.1)
- token_hash (SHA256 hash - secure)
- Same other columns
The Problem
SCHEMA_SQL was updated to match the POST-migration structure, but it runs BEFORE migrations. This creates an impossible situation for existing databases.
Migration System Design Flaw
The current system has a fundamental ordering issue:
- SCHEMA_SQL should represent the INITIAL schema (v0.1.0)
- Migrations should evolve from that base
- Current Reality: SCHEMA_SQL represents the LATEST schema
This works for fresh databases but fails for existing ones that need migration.
Recommended Fix
Option 1: Conditional Index Creation (Quick Fix)
Modify SCHEMA_SQL to use conditional logic or remove problematic indexes from SCHEMA_SQL since migration 002 creates them anyway.
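One way the "conditional logic" variant of Option 1 might look is to check for the column before creating the index. The helper names below (column_exists, create_token_hash_index) are hypothetical and do not come from database.py; this is a sketch of the idea only.

```python
import sqlite3


def column_exists(conn: sqlite3.Connection, table: str, column: str) -> bool:
    """Return True if `table` currently has a column named `column`."""
    # PRAGMA table_info yields one row per column; index 1 holds the name.
    # PRAGMA arguments cannot be bound as parameters, so `table` must be a
    # trusted, hard-coded identifier.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)


def create_token_hash_index(conn: sqlite3.Connection) -> bool:
    """Create idx_tokens_hash only when tokens.token_hash exists.

    Returns True if the index statement ran, False if it was skipped.
    """
    if not column_exists(conn, "tokens", "token_hash"):
        # Old table layout: leave index creation to migration 002.
        return False
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_tokens_hash ON tokens(token_hash)"
    )
    return True
```

Simply deleting the index statements from SCHEMA_SQL is the smaller change; the conditional form only makes sense if fresh databases must still get the index without running migration 002.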
Option 2: Fix Execution Order (Better)
- Run migrations BEFORE attempting schema creation
- Only use SCHEMA_SQL for truly fresh databases
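Option 2 could be sketched roughly as follows. The functions is_fresh_database and init_db_sketch are illustrative names, and run_migrations is passed in as a callable because its real signature is not shown in this report. The sketch also assumes the migration runner can tell which migrations a freshly created database still needs.

```python
import sqlite3
from typing import Callable


def is_fresh_database(conn: sqlite3.Connection) -> bool:
    """Return True when the database contains no user-defined tables."""
    row = conn.execute(
        "SELECT COUNT(*) FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchone()
    return row[0] == 0


def init_db_sketch(
    conn: sqlite3.Connection,
    schema_sql: str,
    run_migrations: Callable[[sqlite3.Connection], None],
) -> str:
    """Apply schema_sql only to empty databases, then run migrations."""
    if is_fresh_database(conn):
        # Nothing exists yet, so the latest schema cannot conflict.
        conn.executescript(schema_sql)
        path = "fresh"
    else:
        # Existing data: never replay schema_sql over an older layout.
        path = "existing"
    run_migrations(conn)
    return path
```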
Option 3: Proper Schema Versioning (Best)
- SCHEMA_SQL should be the v0.1.0 schema
- All evolution happens through migrations
- Fresh databases run all migrations from the beginning
Immediate Workaround
For the production deployment:
- Manual intervention before upgrade:
    -- Connect to production database
    -- Manually add the column before v1.0.0-rc.1 starts
    ALTER TABLE tokens ADD COLUMN token_hash TEXT;
- Then deploy v1.0.0-rc.1:
  - SCHEMA_SQL will succeed (column exists)
  - Migration 002 will drop and recreate the table properly
  - System will work correctly
Verification Steps
- Check production database structure:
    PRAGMA table_info(tokens);
- Verify migration status:
    SELECT * FROM schema_migrations;
- Test with a v0.9.5 database locally to reproduce
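The first two checks can also be scripted so they are easy to run against a copy of the production file. The function names here are hypothetical, and the layout of schema_migrations is not assumed beyond the table name used above.

```python
import sqlite3
from typing import List


def tokens_columns(conn: sqlite3.Connection) -> List[str]:
    """Return the column names of the tokens table (empty if it is missing)."""
    # Index 1 of each PRAGMA table_info row is the column name.
    return [row[1] for row in conn.execute("PRAGMA table_info(tokens)")]


def applied_migrations(conn: sqlite3.Connection) -> list:
    """Return all rows of schema_migrations, or [] if the table is absent."""
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master "
        "WHERE type = 'table' AND name = 'schema_migrations'"
    ).fetchone()
    if exists is None:
        return []
    return conn.execute("SELECT * FROM schema_migrations").fetchall()


def needs_token_migration(conn: sqlite3.Connection) -> bool:
    """True when a tokens table exists but lacks the token_hash column."""
    columns = tokens_columns(conn)
    return bool(columns) and "token_hash" not in columns
```

A database for which needs_token_migration returns True is one that would hit the startup failure described in this report.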
Long-term Architecture Recommendations
- Separate Initial Schema from Current Schema
  - INITIAL_SCHEMA_SQL - The v0.1.0 starting point
  - Migrations handle ALL evolution
- Migration-First Initialization
  - Check for existing database
  - Run migrations first if database exists
  - Only apply SCHEMA_SQL to truly empty databases
- Schema Version Tracking
  - Add a schema_version table
  - Track the current schema version explicitly
  - Make decisions based on version, not heuristics
- Testing Strategy
  - Always test upgrades from previous production version
  - Include migration testing in CI/CD pipeline
  - Maintain database snapshots for each released version
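To illustrate explicit version tracking, here is a minimal sketch in which every schema change, including the initial one, is a numbered migration. The schema_version layout (a single version column) and the function names are assumptions made for this example, not the project's design.

```python
import sqlite3
from typing import List, Mapping


def get_schema_version(conn: sqlite3.Connection) -> int:
    """Return the highest recorded version, or 0 for an untracked database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)"
    )
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] if row[0] is not None else 0


def apply_pending(
    conn: sqlite3.Connection, migrations: Mapping[int, str]
) -> List[int]:
    """Run every migration newer than the recorded version, in order."""
    current = get_schema_version(conn)
    applied = []
    for version in sorted(migrations):
        if version <= current:
            continue  # already applied on this database
        conn.executescript(migrations[version])
        conn.execute(
            "INSERT INTO schema_version (version) VALUES (?)", (version,)
        )
        conn.commit()
        applied.append(version)
    return applied
```

With this shape, a fresh database and an upgraded one reach the latest schema through the same code path, so an upgrade test from a previous release snapshot exercises exactly what production will run.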
Conclusion
This is a critical architectural issue in the migration system that affects all existing production deployments. The immediate fix is straightforward, but the system needs architectural changes to prevent similar issues in future releases.
The core principle violated: SCHEMA_SQL should represent the beginning, not the end state.