docs: Add database migration architecture and conflict resolution documentation

Documents the diagnosis and resolution of database migration detection conflicts
2025-11-24 13:27:19 -07:00
parent 605681de42
commit 2b2849a58d
3 changed files with 526 additions and 0 deletions
--- a/docs/architecture/database-migration-architecture.md
+++ b/docs/architecture/database-migration-architecture.md
@@ -0,0 +1,212 @@
+# Database Migration Architecture
+
+## Overview
+StarPunk uses a dual-strategy database initialization system that combines immediate schema creation (SCHEMA_SQL) with evolutionary migrations. This architecture provides both fast fresh installations and safe upgrades for existing databases.
+
+## Components
+
+### 1. SCHEMA_SQL (database.py)
+**Purpose**: Define the current complete database schema for fresh installations
+
+**Location**: `/starpunk/database.py` lines 11-87
+
+**Responsibilities**:
+- Create all tables with current structure
+- Create all columns with current types
+- Create base indexes for performance
+- Provide instant database initialization for new installations
+
+**Design Principle**: Always represents the latest schema version
+
+### 2. Migration Files
+**Purpose**: Transform existing databases from one version to another
+
+**Location**: `/migrations/*.sql`
+
+**Format**: `{number}_{description}.sql`
+- Number: Three-digit zero-padded sequence (001, 002, etc.)
+- Description: Clear indication of changes
+
+**Responsibilities**:
+- Add new tables/columns to existing databases
+- Modify existing structures safely
+- Create indexes and constraints
+- Handle breaking changes with data preservation
+
+### 3. Migration Runner (migrations.py)
+**Purpose**: Intelligent application of migrations based on database state
+
+**Location**: `/starpunk/migrations.py`
+
+**Key Features**:
+- Fresh database detection
+- Partial schema recognition
+- Smart migration skipping
+- Index-only application
+- Transaction safety
+
+## Architecture Patterns
+
+### Fresh Database Flow
+```
+1. init_db() called
+2. SCHEMA_SQL executed (creates all current tables/columns)
+3. run_migrations() called
+4. Detects fresh database (empty schema_migrations)
+5. Checks if schema is current (is_schema_current())
+6. If current: marks all migrations as applied (no execution)
+7. If partial: applies only needed migrations
+```
+
+### Existing Database Flow
+```
+1. init_db() called
+2. SCHEMA_SQL executed (CREATE IF NOT EXISTS - no-op for existing tables)
+3. run_migrations() called
+4. Reads schema_migrations table
+5. Discovers migration files
+6. Applies only unapplied migrations in sequence
+```
+
+### Hybrid Database Flow (Production Issue Case)
+```
+1. Database has tables from SCHEMA_SQL but no migration records
+2. run_migrations() detects migration_count == 0
+3. For each migration, calls is_migration_needed()
+4. Migration 002: detects tables exist, indexes missing
+5. Creates only missing indexes
+6. Marks migration as applied without full execution
+```
+
+## State Detection Logic
+
+### is_schema_current() Function
+Determines if database matches current schema version completely.
+
+**Checks**:
+1. Table existence (authorization_codes)
+2. Column existence (token_hash in tokens)
+3. Index existence (idx_tokens_hash, etc.)
+
+**Returns**:
+- True: Schema is completely current (all migrations applied)
+- False: Schema needs migrations
+
+### is_migration_needed() Function
+Determines if a specific migration should be applied.
+
+**For Migration 002**:
+1. Check if authorization_codes table exists
+2. Check if token_hash column exists in tokens
+3. Check if indexes exist
+4. Return True only if tables/columns are missing
+5. Return False if only indexes are missing (handled separately)
+
+## Design Decisions
+
+### Why Dual Strategy?
+1. **Fresh Install Speed**: SCHEMA_SQL provides instant, complete schema
+2. **Upgrade Safety**: Migrations provide controlled, versioned changes
+3. **Flexibility**: Can handle various database states gracefully
+
+### Why Smart Detection?
+1. **Idempotency**: Same code works for any database state
+2. **Self-Healing**: Can fix partial schemas automatically
+3. **No Data Loss**: Never drops tables unnecessarily
+
+### Why Check Indexes Separately?
+1. **SCHEMA_SQL Evolution**: As SCHEMA_SQL includes migration changes, we avoid conflicts
+2. **Granular Control**: Can apply just missing pieces
+3. **Performance**: Indexes can be added without table locks
+
+## Migration Guidelines
+
+### Writing Migrations
+1. **Never use IF NOT EXISTS in migrations**: Migrations should fail if preconditions aren't met
+2. **Always provide rollback path**: Document how to reverse changes
+3. **One logical change per migration**: Keep migrations focused
+4. **Test with various database states**: Fresh, existing, and hybrid
+
+### SCHEMA_SQL Updates
+When updating SCHEMA_SQL after a migration:
+1. Include all changes from the migration
+2. Remove indexes that migrations will create (avoid conflicts)
+3. Keep CREATE IF NOT EXISTS for idempotency
+4. Test fresh installations
+
+## Error Recovery
+
+### Common Issues
+
+#### "Table already exists" Error
+**Cause**: Migration tries to create table that SCHEMA_SQL already created
+
+**Solution**: Smart detection should prevent this. If it fails:
+1. Check if migration is already in schema_migrations
+2. Verify is_migration_needed() logic
+3. Manually mark migration as applied if needed
+
+#### Missing Indexes
+**Cause**: Tables exist from SCHEMA_SQL but indexes weren't created
+
+**Solution**: Migration system creates missing indexes separately
+
+#### Partial Migration Application
+**Cause**: Migration failed partway through
+
+**Solution**: Transactions ensure all-or-nothing. Rollback and retry.
+
+## State Verification Queries
+
+### Check Migration Status
+```sql
+SELECT * FROM schema_migrations ORDER BY id;
+```
+
+### Check Table Existence
+```sql
+SELECT name FROM sqlite_master
+WHERE type='table'
+ORDER BY name;
+```
+
+### Check Index Existence
+```sql
+SELECT name FROM sqlite_master
+WHERE type='index'
+ORDER BY name;
+```
+
+### Check Column Structure
+```sql
+PRAGMA table_info(tokens);
+PRAGMA table_info(authorization_codes);
+```
+
+## Future Improvements
+
+### Potential Enhancements
+1. **Migration Rollback**: Add down() migrations for reversibility
+2. **Schema Versioning**: Add version table for faster state detection
+3. **Migration Validation**: Pre-flight checks before application
+4. **Dry Run Mode**: Test migrations without applying
+
+### Considered Alternatives
+1. **Migrations-Only**: Rejected - slow fresh installs
+2. **SCHEMA_SQL-Only**: Rejected - no upgrade path
+3. **ORM-Based**: Rejected - unnecessary complexity for single-user system
+4. **External Tools**: Rejected - additional dependencies
+
+## Security Considerations
+
+### Migration Safety
+1. All migrations run in transactions
+2. Rollback on any error
+3. No data destruction without explicit user action
+4. Token invalidation documented when necessary
+
+### Schema Security
+1. Tokens stored as SHA256 hashes
+2. Proper indexes for timing attack prevention
+3. Expiration columns for automatic cleanup
+4. Soft deletion support
--- a/docs/decisions/ADR-041-database-migration-conflict-resolution.md
+++ b/docs/decisions/ADR-041-database-migration-conflict-resolution.md
@@ -0,0 +1,123 @@
+# ADR-041: Database Migration Conflict Resolution
+
+## Status
+Accepted
+
+## Context
+The v1.0.0-rc.2 container deployment is failing with the error:
+```
+Migration 002_secure_tokens_and_authorization_codes.sql failed: table authorization_codes already exists
+```
+
+The production database is in a hybrid state:
+1. **v1.0.0-rc.1 Impact**: The `authorization_codes` table was created by SCHEMA_SQL in database.py
+2. **Missing Elements**: The production database lacks the proper indexes that migration 002 would create
+3. **Migration Tracking**: The schema_migrations table likely shows migration 002 hasn't been applied
+4. **Partial Schema**: The database has tables/columns from SCHEMA_SQL but not the complete migration features
+
+### Root Cause Analysis
+The conflict arose from an architectural mismatch between two database initialization strategies:
+1. **SCHEMA_SQL Approach**: Creates complete schema upfront (including authorization_codes table)
+2. **Migration Approach**: Expects to create tables that don't exist yet
+
+In v1.0.0-rc.1, SCHEMA_SQL included the `authorization_codes` table creation (lines 58-76 in database.py). When migration 002 tries to run, it attempts to CREATE TABLE authorization_codes, which already exists.
+
+### Current Migration System Logic
+The migrations.py file has sophisticated logic to handle this scenario:
+1. **Fresh Database Detection** (lines 352-368): If schema_migrations is empty and schema is current, mark all migrations as applied
+2. **Partial Schema Handling** (lines 176-211): For migration 002, it checks if tables exist and creates only missing indexes
+3. **Smart Migration Application** (lines 383-410): Can apply just indexes without running full migration
+
+However, the production database doesn't trigger the "fresh database" path because:
+- The schema is NOT fully current (missing indexes)
+- The is_schema_current() check (lines 89-95) requires ALL indexes to exist
+
+## Decision
+The architecture already has the correct solution implemented. The issue is that the production database falls into an edge case where:
+1. Tables exist (from SCHEMA_SQL)
+2. Indexes don't exist (never created)
+3. Migration tracking is empty or partial
+
+The migrations.py file already handles this case correctly in lines 383-410:
+- If migration 002's tables exist but indexes don't, it creates just the indexes
+- Then marks the migration as applied without running the full SQL
+
+## Rationale
+The existing architecture is sound and handles the hybrid state correctly. The migration system's sophisticated detection logic can:
+1. Identify when tables already exist
+2. Create only the missing pieces (indexes)
+3. Mark migrations as applied appropriately
+
+This approach:
+- Avoids data loss
+- Handles partial schemas gracefully
+- Maintains idempotency
+- Provides clear logging
+
+## Consequences
+
+### Positive
+1. **Zero Data Loss**: Existing tables are preserved
+2. **Graceful Recovery**: System can heal partial schemas automatically
+3. **Clear Audit Trail**: Migration tracking shows what was applied
+4. **Future-Proof**: Handles various database states correctly
+
+### Negative
+1. **Complexity**: The migration logic is sophisticated and must be understood
+2. **Edge Cases**: Requires careful testing of various database states
+
+## Implementation Notes
+
+### Database State Detection
+The system uses multiple checks to determine database state:
+```python
+# Check for tables
+table_exists(conn, 'authorization_codes')
+
+# Check for columns
+column_exists(conn, 'tokens', 'token_hash')
+
+# Check for indexes (critical for determining if migration 002 ran)
+index_exists(conn, 'idx_tokens_hash')
+```
+
+### Hybrid State Resolution
+When a database has tables but not indexes:
+1. Migration 002 is detected as "not needed" for table creation
+2. System creates missing indexes individually
+3. Migration is marked as applied
+
+### Production Fix Path
+For the current production issue:
+1. The v1.0.0-rc.2 container should work correctly
+2. The migration system will detect the hybrid state
+3. It will create only the missing indexes
+4. Migration 002 will be marked as applied
+
+If the error persists, it suggests the migration system isn't detecting the state correctly, which would require investigation of:
+- The exact schema_migrations table contents
+- Which tables/columns/indexes actually exist
+- The execution path through migrations.py
+
+## Alternatives Considered
+
+### Alternative 1: Remove Tables from SCHEMA_SQL
+**Rejected**: Would break fresh installations
+
+### Alternative 2: Make Migration 002 Idempotent
+Use CREATE TABLE IF NOT EXISTS in the migration.
+**Rejected**: Would hide partial application issues and not handle the DROP TABLE statement correctly
+
+### Alternative 3: Version-Specific SCHEMA_SQL
+Have different SCHEMA_SQL for different versions.
+**Rejected**: Too complex to maintain
+
+### Alternative 4: Manual Intervention
+Require manual database fixes.
+**Rejected**: Goes against the self-healing architecture principle
+
+## References
+- migrations.py lines 176-211 (migration 002 detection)
+- migrations.py lines 383-410 (index-only creation)
+- database.py lines 58-76 (authorization_codes in SCHEMA_SQL)
+- Migration file: 002_secure_tokens_and_authorization_codes.sql
--- a/docs/reports/database-migration-conflict-diagnosis.md
+++ b/docs/reports/database-migration-conflict-diagnosis.md
@@ -0,0 +1,191 @@
+# Database Migration Conflict Diagnosis Report
+
+## Executive Summary
+The v1.0.0-rc.2 container is failing because migration 002 attempts to CREATE TABLE authorization_codes, but this table already exists in the production database (created by v1.0.0-rc.1's SCHEMA_SQL).
+
+## Issue Details
+
+### Error Message
+```
+Migration 002_secure_tokens_and_authorization_codes.sql failed: table authorization_codes already exists
+```
+
+### Root Cause
+**Conflicting Database Initialization Strategies**
+
+1. **SCHEMA_SQL in database.py (lines 58-76)**: Creates the `authorization_codes` table directly
+2. **Migration 002 (line 33)**: Also attempts to CREATE TABLE authorization_codes
+
+The production database was initialized with v1.0.0-rc.1's SCHEMA_SQL, which created the table. When v1.0.0-rc.2 runs, migration 002 fails because the table already exists.
+
+## Database State Analysis
+
+### What v1.0.0-rc.1 Created (via SCHEMA_SQL)
+```sql
+-- From database.py lines 58-76
+CREATE TABLE IF NOT EXISTS authorization_codes (
+    id INTEGER PRIMARY KEY AUTOINCREMENT,
+    code_hash TEXT UNIQUE NOT NULL,
+    me TEXT NOT NULL,
+    client_id TEXT NOT NULL,
+    redirect_uri TEXT NOT NULL,
+    scope TEXT,
+    state TEXT,
+    code_challenge TEXT,
+    code_challenge_method TEXT,
+    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
+    expires_at TIMESTAMP NOT NULL,
+    used_at TIMESTAMP
+);
+
+CREATE INDEX IF NOT EXISTS idx_auth_codes_hash ON authorization_codes(code_hash);
+CREATE INDEX IF NOT EXISTS idx_auth_codes_expires ON authorization_codes(expires_at);
+```
+
+### What Migration 002 Tries to Do
+```sql
+-- From migration 002 lines 33-46
+CREATE TABLE authorization_codes (  -- NO "IF NOT EXISTS" clause!
+    -- Same structure as above
+);
+```
+
+The migration uses CREATE TABLE without IF NOT EXISTS, causing it to fail when the table already exists.
+
+## The Good News: System Already Has the Solution
+
+The migrations.py file has sophisticated logic to handle this exact scenario:
+
+### Detection Logic (migrations.py lines 176-211)
+```python
+def is_migration_needed(conn, migration_name):
+    if migration_name == "002_secure_tokens_and_authorization_codes.sql":
+        # Check if tables exist
+        if not table_exists(conn, 'authorization_codes'):
+            return True  # Run full migration
+        if not column_exists(conn, 'tokens', 'token_hash'):
+            return True  # Run full migration
+
+        # Check if indexes exist
+        has_all_indexes = (
+            index_exists(conn, 'idx_tokens_hash') and
+            index_exists(conn, 'idx_tokens_me') and
+            # ... other index checks
+        )
+
+        if not has_all_indexes:
+            # Tables exist but indexes missing
+            # Don't run full migration, handle separately
+            return False
+```
+
+### Resolution Logic (migrations.py lines 383-410)
+When tables exist but indexes are missing:
+```python
+if migration_name == "002_secure_tokens_and_authorization_codes.sql":
+    # Create only missing indexes
+    indexes_to_create = []
+    if not index_exists(conn, 'idx_tokens_hash'):
+        indexes_to_create.append("CREATE INDEX idx_tokens_hash ON tokens(token_hash)")
+    # ... check and create other indexes
+
+    # Apply indexes without running full migration
+    for index_sql in indexes_to_create:
+        conn.execute(index_sql)
+
+    # Mark migration as applied
+    conn.execute(
+        "INSERT INTO schema_migrations (migration_name) VALUES (?)",
+        (migration_name,)
+    )
+```
+
+## Why Is It Still Failing?
+
+The error suggests the smart detection logic isn't being triggered. Possible reasons:
+
+1. **Migration Already Marked as Applied**: Check if schema_migrations table already has migration 002 listed
+2. **Different Code Path**: The production container might not be using the smart detection path
+3. **Transaction Rollback**: An earlier error might have left the database in an inconsistent state
+
+## Immediate Solution
+
+### Option 1: Verify Smart Detection Is Working
+The system SHOULD handle this automatically. If it's not, check:
+1. Is migrations.py line 378 being reached? (migration_count == 0 check)
+2. Is is_migration_needed() being called for migration 002?
+3. Are the table existence checks working correctly?
+
+### Option 2: Manual Database Fix (if smart detection fails)
+```sql
+-- Check current state
+SELECT * FROM schema_migrations WHERE migration_name LIKE '%002%';
+
+-- If migration 002 is NOT listed, mark it as applied
+INSERT INTO schema_migrations (migration_name)
+VALUES ('002_secure_tokens_and_authorization_codes.sql');
+
+-- Ensure indexes exist (if missing)
+CREATE INDEX IF NOT EXISTS idx_tokens_hash ON tokens(token_hash);
+CREATE INDEX IF NOT EXISTS idx_tokens_me ON tokens(me);
+CREATE INDEX IF NOT EXISTS idx_tokens_expires ON tokens(expires_at);
+CREATE INDEX IF NOT EXISTS idx_auth_codes_hash ON authorization_codes(code_hash);
+CREATE INDEX IF NOT EXISTS idx_auth_codes_expires ON authorization_codes(expires_at);
+```
+
+## Long-term Architecture Fix
+
+### Current Issue
+SCHEMA_SQL and migrations have overlapping responsibilities:
+- SCHEMA_SQL creates authorization_codes table (v1.0.0-rc.1+)
+- Migration 002 also creates authorization_codes table
+
+### Recommended Solution
+**Already Implemented!** The smart detection in migrations.py handles this correctly.
+
+### Why It Should Work
+1. When database has tables from SCHEMA_SQL but no migration records:
+   - is_migration_needed() detects tables exist
+   - Returns False to skip full migration
+   - Creates only missing indexes
+   - Marks migration as applied
+
+2. The system is designed to be self-healing and handle partial schemas
+
+## Verification Steps
+
+1. **Check Migration Status**:
+   ```sql
+   SELECT * FROM schema_migrations;
+   ```
+
+2. **Check Table Existence**:
+   ```sql
+   SELECT name FROM sqlite_master
+   WHERE type='table' AND name='authorization_codes';
+   ```
+
+3. **Check Index Existence**:
+   ```sql
+   SELECT name FROM sqlite_master
+   WHERE type='index' AND name LIKE 'idx_%';
+   ```
+
+4. **Check Schema Version Detection**:
+   - The is_schema_current() function should return False (missing indexes)
+   - This should trigger the smart migration path
+
+## Conclusion
+
+The architecture already has the correct solution implemented in migrations.py. The smart detection logic should:
+1. Detect that authorization_codes table exists
+2. Skip the table creation
+3. Create only missing indexes
+4. Mark migration 002 as applied
+
+If this isn't working, the issue is likely:
+- A bug in the detection logic execution path
+- The production database already has migration 002 marked as applied (check schema_migrations)
+- A transaction rollback leaving the database in an inconsistent state
+
+The system is designed to handle this exact scenario. If it's failing, we need to debug why the smart detection isn't being triggered.