StarPunk/docs/decisions/ADR-041-database-migration-conflict-resolution.md
Phil Skentelbery 2b2849a58d docs: Add database migration architecture and conflict resolution documentation
Documents the diagnosis and resolution of database migration detection conflicts
2025-11-24 13:27:19 -07:00


ADR-041: Database Migration Conflict Resolution

Status

Accepted

Context

The v1.0.0-rc.2 container deployment is failing with the error:

Migration 002_secure_tokens_and_authorization_codes.sql failed: table authorization_codes already exists

The production database is in a hybrid state:

  1. v1.0.0-rc.1 Impact: The authorization_codes table was already created by SCHEMA_SQL in database.py
  2. Missing Elements: The production database lacks the indexes that migration 002 would create
  3. Migration Tracking: The schema_migrations table most likely does not record migration 002 as applied
  4. Partial Schema: The database has the tables and columns defined in SCHEMA_SQL, but not the full set of objects the migrations create

Root Cause Analysis

The conflict arose from an architectural mismatch between two database initialization strategies:

  1. SCHEMA_SQL Approach: Creates complete schema upfront (including authorization_codes table)
  2. Migration Approach: Expects to create tables that don't exist yet

In v1.0.0-rc.1, SCHEMA_SQL included the authorization_codes table creation (lines 58-76 in database.py). When migration 002 runs, its CREATE TABLE authorization_codes statement fails because SCHEMA_SQL has already created that table.
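The conflict can be reproduced in isolation with Python's built-in sqlite3 module. This is a minimal sketch, assuming the database is SQLite; SCHEMA_SQL and MIGRATION_002_SQL below are hypothetical, heavily trimmed stand-ins, not the real definitions from database.py or the migration file.

```python
import sqlite3

# Hypothetical, trimmed stand-ins for the real SQL (assumption: SQLite backend).
SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS authorization_codes (
    id INTEGER PRIMARY KEY,
    code_hash TEXT NOT NULL
);
"""

MIGRATION_002_SQL = """
CREATE TABLE authorization_codes (
    id INTEGER PRIMARY KEY,
    code_hash TEXT NOT NULL
);
"""


def reproduce_conflict() -> str:
    """Initialize the schema as rc.1 did, then run migration 002."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA_SQL)  # rc.1: table created at startup
        try:
            # The migration assumes the table does not exist yet.
            conn.executescript(MIGRATION_002_SQL)
        except sqlite3.OperationalError as exc:
            return str(exc)
        return "no error"
    finally:
        conn.close()


if __name__ == "__main__":
    print(reproduce_conflict())  # table authorization_codes already exists
```

The SQLite error text matches the one reported by the rc.2 container.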

Current Migration System Logic

The migrations.py file already contains detection logic for this scenario:

  1. Fresh Database Detection (lines 352-368): If schema_migrations is empty and the schema is current, all migrations are marked as applied
  2. Partial Schema Handling (lines 176-211): For migration 002, it checks whether the tables exist and creates only the missing indexes
  3. Smart Migration Application (lines 383-410): It can create just the missing indexes without running the full migration SQL
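The three branches above can be modeled as a small decision function. This is a simplified sketch, not the code in migrations.py: the name choose_path, the MigrationPath enum, and the boolean inputs are hypothetical, and the real implementation derives those inputs from the database itself.

```python
from enum import Enum


class MigrationPath(Enum):
    SKIP = "skip"                          # already recorded in schema_migrations
    MARK_ALL_APPLIED = "mark_all_applied"  # empty tracking, schema already current
    INDEXES_ONLY = "indexes_only"          # migration 002: tables exist, indexes missing
    RUN_FULL_SQL = "run_full_sql"          # ordinary case: execute the migration file


def choose_path(
    migration_name: str,
    applied: set[str],
    schema_current: bool,
    tables_exist: bool,
    indexes_exist: bool,
) -> MigrationPath:
    """Pick a strategy for one pending migration (simplified model)."""
    if migration_name in applied:
        return MigrationPath.SKIP
    # Fresh database detection: nothing tracked, but every object already exists.
    if not applied and schema_current:
        return MigrationPath.MARK_ALL_APPLIED
    # Partial schema handling: only migration 002 gets the index-only shortcut.
    if migration_name.startswith("002_") and tables_exist and not indexes_exist:
        return MigrationPath.INDEXES_ONLY
    return MigrationPath.RUN_FULL_SQL


if __name__ == "__main__":
    # The production hybrid state: nothing tracked, tables present, indexes missing.
    print(choose_path("002_secure_tokens_and_authorization_codes.sql",
                      applied=set(), schema_current=False,
                      tables_exist=True, indexes_exist=False))
    # MigrationPath.INDEXES_ONLY
```

The ordering matters: the fresh-database check runs first, so a hybrid database only reaches the index-only branch because is_schema_current() returned False.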

However, the production database doesn't trigger the "fresh database" path because:

  • The schema is NOT fully current (missing indexes)
  • The is_schema_current() check (lines 89-95) requires ALL indexes to exist
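A sketch of why the hybrid database fails this check, assuming SQLite and a hypothetical, much shorter list of required objects than the real is_schema_current() uses (only idx_tokens_hash is taken from this ADR):

```python
import sqlite3

# Hypothetical requirement lists; the real function checks its own, longer list.
REQUIRED_TABLES = ("tokens", "authorization_codes")
REQUIRED_INDEXES = ("idx_tokens_hash",)


def _exists(conn: sqlite3.Connection, kind: str, name: str) -> bool:
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = ? AND name = ?", (kind, name)
    ).fetchone()
    return row is not None


def is_schema_current(conn: sqlite3.Connection) -> bool:
    """True only if every required table AND every required index exists."""
    return all(_exists(conn, "table", t) for t in REQUIRED_TABLES) and all(
        _exists(conn, "index", i) for i in REQUIRED_INDEXES
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hybrid state: tables created by SCHEMA_SQL, indexes never created.
    conn.executescript(
        """
        CREATE TABLE tokens (id INTEGER PRIMARY KEY, token_hash TEXT);
        CREATE TABLE authorization_codes (id INTEGER PRIMARY KEY, code_hash TEXT);
        """
    )
    print(is_schema_current(conn))  # False: idx_tokens_hash is missing
    conn.execute("CREATE INDEX idx_tokens_hash ON tokens(token_hash)")
    print(is_schema_current(conn))  # True
```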

Decision

The architecture already has the correct solution implemented. The issue is that the production database falls into an edge case where:

  1. Tables exist (from SCHEMA_SQL)
  2. Indexes don't exist (never created)
  3. Migration tracking is empty or partial

The migrations.py file already handles this case correctly in lines 383-410:

  • If migration 002's tables exist but indexes don't, it creates just the indexes
  • Then marks the migration as applied without running the full SQL

Rationale

The existing architecture is sound and handles the hybrid state correctly. The migration system's detection logic can:

  1. Identify when tables already exist
  2. Create only the missing pieces (indexes)
  3. Mark migrations as applied appropriately

This approach:

  • Avoids data loss
  • Handles partial schemas gracefully
  • Maintains idempotency
  • Provides clear logging

Consequences

Positive

  1. Zero Data Loss: Existing tables are preserved
  2. Graceful Recovery: System can heal partial schemas automatically
  3. Clear Audit Trail: Migration tracking shows what was applied
  4. Future-Proof: Handles fresh, partially migrated, and fully migrated databases

Negative

  1. Complexity: The detection logic is intricate, and maintainers must understand it before changing migrations.py
  2. Edge Cases: Each database state (fresh, partial, fully migrated) needs its own tests

Implementation Notes

Database State Detection

The system uses multiple checks to determine database state:

```python
# Check for tables
table_exists(conn, 'authorization_codes')

# Check for columns
column_exists(conn, 'tokens', 'token_hash')

# Check for indexes (critical for determining if migration 002 ran)
index_exists(conn, 'idx_tokens_hash')
```
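Assuming the database is SQLite, the three helpers can be implemented by querying sqlite_master for tables and indexes and PRAGMA table_info for columns. The bodies below are a sketch under that assumption, not the actual code in migrations.py:

```python
import sqlite3


def table_exists(conn: sqlite3.Connection, table_name: str) -> bool:
    """Return True if a table with this name is present in sqlite_master."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table_name,),
    ).fetchone()
    return row is not None


def column_exists(conn: sqlite3.Connection, table_name: str, column_name: str) -> bool:
    """Return True if table_name has a column called column_name."""
    # PRAGMA arguments cannot be bound as parameters, so quote the identifier.
    quoted = '"' + table_name.replace('"', '""') + '"'
    rows = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk); a missing table
    # yields no rows, so this returns False instead of raising.
    return any(row[1] == column_name for row in rows)


def index_exists(conn: sqlite3.Connection, index_name: str) -> bool:
    """Return True if an index with this name is present in sqlite_master."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'index' AND name = ?",
        (index_name,),
    ).fetchone()
    return row is not None
```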

Hybrid State Resolution

When a database has tables but not indexes:

  1. Migration 002 is detected as "not needed" for table creation
  2. System creates missing indexes individually
  3. Migration is marked as applied
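The index-only path can be sketched as follows, assuming SQLite and a schema_migrations table with a unique migration_name column (a hypothetical column name). MIGRATION_002_INDEXES is also hypothetical: only idx_tokens_hash comes from this ADR, and the authoritative index list is in the migration file. The indexes and the tracking row are written in one explicit transaction, since SQLite DDL is transactional.

```python
import sqlite3

MIGRATION_NAME = "002_secure_tokens_and_authorization_codes.sql"

# Hypothetical: only idx_tokens_hash is named in this ADR; the real list of
# indexes (and their exact definitions) lives in the migration file.
MIGRATION_002_INDEXES = {
    "idx_tokens_hash": "CREATE INDEX idx_tokens_hash ON tokens(token_hash)",
}


def resolve_hybrid_state(conn: sqlite3.Connection) -> list[str]:
    """Create only the missing 002 indexes, then record 002 as applied.

    Assumes schema_migrations has a UNIQUE (or PRIMARY KEY) migration_name
    column; that column name is hypothetical. Returns the indexes created.
    """
    created: list[str] = []
    conn.execute("BEGIN")  # indexes and tracking row commit or roll back together
    try:
        for name, ddl in MIGRATION_002_INDEXES.items():
            exists = conn.execute(
                "SELECT 1 FROM sqlite_master WHERE type = 'index' AND name = ?",
                (name,),
            ).fetchone()
            if exists is None:
                conn.execute(ddl)
                created.append(name)
        # Record the migration without executing its CREATE TABLE statements.
        conn.execute(
            "INSERT OR IGNORE INTO schema_migrations (migration_name) VALUES (?)",
            (MIGRATION_NAME,),
        )
    except Exception:
        conn.rollback()
        raise
    conn.commit()
    return created
```

Calling it a second time creates nothing and leaves a single tracking row, which is the idempotency the Rationale section relies on.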

Production Fix Path

For the current production issue:

  1. The v1.0.0-rc.2 container should work correctly
  2. The migration system will detect the hybrid state
  3. It will create only the missing indexes
  4. Migration 002 will be marked as applied

If the error persists, it suggests the migration system isn't detecting the state correctly, which would require investigation of:

  • The exact schema_migrations table contents
  • Which tables/columns/indexes actually exist
  • The execution path through migrations.py
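A read-only diagnostic along these lines collects the first two items in one pass. It is a sketch assuming SQLite and a schema_migrations.migration_name column (a hypothetical name); the database path is passed on the command line, for example `python describe_db.py /path/to/database.db` (script name and path are illustrative).

```python
import json
import sqlite3
import sys


def _quote(identifier: str) -> str:
    """Quote an SQLite identifier for use in a PRAGMA statement."""
    return '"' + identifier.replace('"', '""') + '"'


def describe_database(conn: sqlite3.Connection) -> dict:
    """Collect tracking rows, tables with their columns, and indexes."""
    def names(kind: str) -> list[str]:
        # Skip SQLite's internal objects such as sqlite_autoindex_* entries.
        rows = conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = ? AND name NOT LIKE 'sqlite_%' ORDER BY name",
            (kind,),
        )
        return [row[0] for row in rows]

    tables = names("table")
    return {
        # Assumption: the tracking table stores names in a migration_name column.
        "schema_migrations": (
            [row[0] for row in conn.execute(
                "SELECT migration_name FROM schema_migrations ORDER BY migration_name")]
            if "schema_migrations" in tables else None
        ),
        "tables": {
            t: [col[1] for col in conn.execute(f"PRAGMA table_info({_quote(t)})")]
            for t in tables
        },
        "indexes": names("index"),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Open read-only so the diagnostic cannot modify the production database.
    conn = sqlite3.connect(f"file:{sys.argv[1]}?mode=ro", uri=True)
    try:
        print(json.dumps(describe_database(conn), indent=2))
    finally:
        conn.close()
```

Comparing its output with the conditions in the Decision section shows which branch of migrations.py the database should have taken.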

Alternatives Considered

Alternative 1: Remove Tables from SCHEMA_SQL

Rejected: Would break fresh installations

Alternative 2: Make Migration 002 Idempotent

Use CREATE TABLE IF NOT EXISTS in the migration. Rejected: This would hide partial-application problems and would not handle the migration's DROP TABLE statement correctly
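The hiding effect is easy to demonstrate: if a table already exists with a different definition, CREATE TABLE IF NOT EXISTS succeeds silently and leaves the old definition in place. The column names below are hypothetical and only illustrate the mechanism; they are not the real authorization_codes schema.

```python
import sqlite3

# Hypothetical definitions chosen only to show the mechanism: the existing
# table lacks columns that the migration's definition would add.
EXISTING_TABLE = "CREATE TABLE authorization_codes (id INTEGER PRIMARY KEY, code TEXT)"
IDEMPOTENT_MIGRATION = (
    "CREATE TABLE IF NOT EXISTS authorization_codes ("
    "id INTEGER PRIMARY KEY, code_hash TEXT NOT NULL, expires_at TIMESTAMP)"
)


def columns_after_idempotent_migration() -> list[str]:
    """Apply the IF NOT EXISTS migration over an older table; list the columns."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(EXISTING_TABLE)
        conn.execute(IDEMPOTENT_MIGRATION)  # no error, and no schema change either
        return [row[1] for row in conn.execute("PRAGMA table_info(authorization_codes)")]
    finally:
        conn.close()


if __name__ == "__main__":
    print(columns_after_idempotent_migration())  # ['id', 'code']
```

The migration reports success, yet code_hash and expires_at never appear. Explicit state detection surfaces this kind of mismatch instead of masking it.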

Alternative 3: Version-Specific SCHEMA_SQL

Use a separate SCHEMA_SQL for each version. Rejected: Too complex to maintain

Alternative 4: Manual Intervention

Require manual database fixes. Rejected: Goes against the self-healing architecture principle

References

  • migrations.py lines 176-211 (migration 002 detection)
  • migrations.py lines 383-410 (index-only creation)
  • database.py lines 58-76 (authorization_codes in SCHEMA_SQL)
  • Migration file: 002_secure_tokens_and_authorization_codes.sql