StarPunk/docs/decisions/ADR-032-initial-schema-sql-implementation.md
Phil Skentelbery 3ed77fd45f fix: Resolve database migration failure on existing databases
Fixes critical issue where migration 002 indexes already existed in SCHEMA_SQL,
causing 'index already exists' errors on databases created before v1.0.0-rc.1.

Changes:
- Removed duplicate index definitions from SCHEMA_SQL (database.py)
- Enhanced migration system to detect and handle indexes properly
- Added comprehensive documentation of the fix

Version bumped to 1.0.0-rc.2 with full changelog entry.

Refs: docs/reports/2025-11-24-migration-fix-v1.0.0-rc.2.md
2025-11-24 13:11:14 -07:00


ADR-032: Initial Schema SQL Implementation for Migration System

Status

Accepted

Context

As documented in ADR-031, the current database migration system has a critical design flaw: SCHEMA_SQL represents the current (latest) schema structure rather than the initial v0.1.0 schema. This causes upgrade failures for existing databases because:

  1. The system tries to create indexes on columns that don't exist yet
  2. Schema creation happens BEFORE migrations run
  3. There's no clear upgrade path from old to new database structures
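The first failure mode can be reproduced in isolation. The sketch below is an illustration, not StarPunk's actual code: it creates an old-style tokens table, then applies a stand-in "latest schema" script to it. The tokens table and the token_hash column are named in this ADR; the idx_tokens_hash index name and the shortened column lists are assumptions made for brevity.

```python
import sqlite3

# Old-style table, reduced to two columns for brevity (assumption).
OLD_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS tokens (
    token TEXT PRIMARY KEY,
    me TEXT NOT NULL
);
"""

# Stand-in for a "latest schema" script. The index name is hypothetical.
LATEST_SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS tokens (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    token_hash TEXT UNIQUE NOT NULL,
    me TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_tokens_hash ON tokens(token_hash);
"""


def apply_latest_schema_to_old_db() -> str:
    """Return "ok" on success, otherwise the SQLite error message."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(OLD_TABLE_SQL)
        try:
            # CREATE TABLE IF NOT EXISTS is a no-op because the table exists,
            # so the old column layout stays in place. The CREATE INDEX that
            # follows then refers to a column the old table does not have.
            conn.executescript(LATEST_SCHEMA_SQL)
        except sqlite3.OperationalError as exc:
            return str(exc)
        return "ok"
    finally:
        conn.close()
```

Calling apply_latest_schema_to_old_db() returns an error message rather than "ok", because IF NOT EXISTS protects the table definition but not the index that depends on the newer columns.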

Phase 2 of ADR-031's redesign requires creating an INITIAL_SCHEMA_SQL constant that represents the v0.1.0 baseline schema, allowing all schema evolution to happen through migrations.

Decision

Create an INITIAL_SCHEMA_SQL constant that represents the exact database schema from the initial v0.1.0 release (commit a68fd57). This baseline schema will be used for:

  1. Fresh database initialization: Create initial schema then run ALL migrations
  2. Existing database detection: Skip initial schema if tables already exist
  3. Clear upgrade path: Every database follows the same evolution through migrations

INITIAL_SCHEMA_SQL Design

Based on analysis of the initial commit (a68fd57), the INITIAL_SCHEMA_SQL should contain:

-- Notes metadata (content is in files)
CREATE TABLE IF NOT EXISTS notes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    slug TEXT UNIQUE NOT NULL,
    file_path TEXT UNIQUE NOT NULL,
    published BOOLEAN DEFAULT 0,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    deleted_at TIMESTAMP,
    content_hash TEXT
);

CREATE INDEX IF NOT EXISTS idx_notes_created_at ON notes(created_at DESC);
CREATE INDEX IF NOT EXISTS idx_notes_published ON notes(published);
CREATE INDEX IF NOT EXISTS idx_notes_slug ON notes(slug);
CREATE INDEX IF NOT EXISTS idx_notes_deleted_at ON notes(deleted_at);

-- Authentication sessions (IndieLogin)
CREATE TABLE IF NOT EXISTS sessions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_token TEXT UNIQUE NOT NULL,
    me TEXT NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP NOT NULL,
    last_used_at TIMESTAMP
);

CREATE INDEX IF NOT EXISTS idx_sessions_token ON sessions(session_token);
CREATE INDEX IF NOT EXISTS idx_sessions_expires ON sessions(expires_at);

-- Micropub access tokens (original insecure version)
CREATE TABLE IF NOT EXISTS tokens (
    token TEXT PRIMARY KEY,
    me TEXT NOT NULL,
    client_id TEXT,
    scope TEXT,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP
);

CREATE INDEX IF NOT EXISTS idx_tokens_me ON tokens(me);

-- CSRF state tokens (for IndieAuth flow)
CREATE TABLE IF NOT EXISTS auth_state (
    state TEXT PRIMARY KEY,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP NOT NULL
);

CREATE INDEX IF NOT EXISTS idx_auth_state_expires ON auth_state(expires_at);
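One way to check a baseline script like the one above is to load it into an in-memory SQLite database and list what it created. The following is a minimal sketch: list_schema_objects is a hypothetical helper, and SCHEMA_EXCERPT copies only the auth_state portion of the schema so the example stays short. In practice the full INITIAL_SCHEMA_SQL would be passed instead.

```python
import sqlite3

# Excerpt of the baseline schema (auth_state only), copied from this ADR.
SCHEMA_EXCERPT = """
CREATE TABLE IF NOT EXISTS auth_state (
    state TEXT PRIMARY KEY,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP NOT NULL
);

CREATE INDEX IF NOT EXISTS idx_auth_state_expires ON auth_state(expires_at);
"""


def list_schema_objects(schema_sql: str) -> dict:
    """Apply schema_sql to a scratch database and group object names by type."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_sql)
        # Skip SQLite's internal objects (names starting with "sqlite_"),
        # such as the automatic index backing a TEXT PRIMARY KEY.
        rows = conn.execute(
            "SELECT type, name FROM sqlite_master "
            "WHERE name NOT LIKE 'sqlite_%' ORDER BY type, name"
        ).fetchall()
    finally:
        conn.close()

    objects = {"table": [], "index": []}
    for obj_type, name in rows:
        objects.setdefault(obj_type, []).append(name)
    return objects
```

For the excerpt, the result contains one table (auth_state) and one explicit index (idx_auth_state_expires).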

Key Differences from Current SCHEMA_SQL

  1. sessions table: Uses session_token (plain text) instead of session_token_hash
  2. tokens table: Original insecure structure with plain text tokens as PRIMARY KEY
  3. auth_state table: No code_verifier column (added in migration 001)
  4. No authorization_codes table: Added in migration 002
  5. No secure token columns: token_hash, last_used_at, revoked_at added later
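To make the third difference concrete, the sketch below compares the auth_state columns before and after a migration-001-style change. The ALTER TABLE statement is an assumed shape for that migration, not its actual contents, and column_names is a hypothetical helper.

```python
import sqlite3

# Baseline auth_state definition, copied from this ADR.
BASELINE_AUTH_STATE = """
CREATE TABLE IF NOT EXISTS auth_state (
    state TEXT PRIMARY KEY,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP NOT NULL
);
"""

# Assumption: migration 001 adds the column roughly like this.
ASSUMED_MIGRATION_001 = (
    "ALTER TABLE auth_state ADD COLUMN code_verifier TEXT NOT NULL DEFAULT ''"
)


def column_names(conn: sqlite3.Connection, table: str) -> list:
    """Return column names in declaration order (table must be a trusted name)."""
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def columns_before_and_after() -> tuple:
    """Show the auth_state columns in the baseline and after the assumed migration."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(BASELINE_AUTH_STATE)
        before = column_names(conn, "auth_state")
        conn.execute(ASSUMED_MIGRATION_001)
        after = column_names(conn, "auth_state")
        return before, after
    finally:
        conn.close()
```

The same PRAGMA-based check could be used in tests to confirm which schema generation a given database file is on.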

Implementation Architecture

# database.py structure
import sqlite3

INITIAL_SCHEMA_SQL = """
-- v0.1.0 baseline schema (see ADR-032)
-- [SQL content as shown above]
"""

CURRENT_SCHEMA_SQL = """
-- Current complete schema for reference
-- NOT used for database initialization
-- [Current SCHEMA_SQL content - for documentation only]
"""

def init_db(app=None):
    """Initialize database with proper migration handling"""
    # NOTE: db_path and logger are assumed to be resolved from `app`
    # (or from defaults) before this point; this ADR does not specify how.

    # 1. Check if database exists and has tables
    if database_exists_with_tables(db_path):
        # Existing database - skip the baseline schema, only run migrations
        run_migrations(db_path, logger)
    else:
        # Fresh database - create initial schema then migrate
        conn = sqlite3.connect(db_path)
        try:
            # Create v0.1.0 baseline schema
            conn.executescript(INITIAL_SCHEMA_SQL)
            conn.commit()
            logger.info("Created initial v0.1.0 database schema")
        finally:
            conn.close()

        # Run all migrations to bring to current version
        run_migrations(db_path, logger)
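The structure above relies on a database_exists_with_tables() helper that this ADR names but does not define. A minimal sketch of one possible implementation follows. Taking the database path as an explicit parameter is an assumption, as is the choice to treat a file without user tables as a fresh database.

```python
import sqlite3
from pathlib import Path


def database_exists_with_tables(db_path) -> bool:
    """Return True if db_path is an existing SQLite file with at least one user table."""
    path = Path(db_path)
    if not path.exists():
        # Check before connecting: sqlite3.connect() would otherwise
        # create an empty database file as a side effect.
        return False

    conn = sqlite3.connect(str(path))
    try:
        # Ignore SQLite's internal tables (for example sqlite_sequence,
        # which AUTOINCREMENT creates) so they do not count as user tables.
        row = conn.execute(
            "SELECT COUNT(*) FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        ).fetchone()
        return row[0] > 0
    finally:
        conn.close()
```

A stricter variant could look for a specific table such as notes instead of counting any user table; which check is appropriate depends on how partially initialized databases should be handled.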

Migration Evolution Path

Starting from INITIAL_SCHEMA_SQL, the database evolves through:

  1. Migration 001: Add code_verifier to auth_state (PKCE support)
  2. Migration 002: Secure token storage (complete tokens table rebuild)
  3. Future migrations: Continue evolution from this baseline
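The evolution path depends on each migration being applied exactly once. The sketch below shows one common way to get that property: a tracking table that records applied migration names. It is not StarPunk's run_migrations implementation; apply_pending, the schema_migrations table, and the migration name and SQL in EXAMPLE_MIGRATIONS are all assumptions for illustration.

```python
import sqlite3

# Hypothetical ordered migration list: (name, SQL script).
EXAMPLE_MIGRATIONS = [
    (
        "001_add_code_verifier",
        "ALTER TABLE auth_state ADD COLUMN code_verifier TEXT NOT NULL DEFAULT '';",
    ),
]


def apply_pending(conn: sqlite3.Connection, migrations: list) -> list:
    """Apply migrations not yet recorded; return the names applied in this run."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations ("
        "name TEXT PRIMARY KEY, "
        "applied_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    applied = {row[0] for row in conn.execute("SELECT name FROM schema_migrations")}

    newly_applied = []
    for name, sql in migrations:
        if name in applied:
            continue  # already recorded, so skip it on replay
        conn.executescript(sql)
        conn.execute("INSERT INTO schema_migrations (name) VALUES (?)", (name,))
        conn.commit()
        newly_applied.append(name)
    return newly_applied
```

On a database holding the baseline auth_state table, the first call with EXAMPLE_MIGRATIONS applies the one migration and a second call returns an empty list. A production runner would also need to handle a migration that fails partway through, which this sketch does not attempt.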

Rationale

Why This Specific Schema?

  1. Historical accuracy: Represents the actual v0.1.0 release state
  2. Clean evolution: All changes tracked through migrations
  3. Testable upgrades: Can test upgrade path from any version
  4. No ambiguity: Clear separation between initial and evolved state

Why Not Alternative Approaches?

  1. Not using migration 000: Migrations should represent changes, not initial state
  2. Not using current schema: Would skip migration history for new databases
  3. Not detecting schema dynamically: Too complex and fragile

Consequences

Positive

  • Reliable upgrades: Any database created at v0.1.0 or later can be upgraded to any later version
  • Clear history: Migration path shows exact evolution
  • Testable: Can verify upgrade paths in CI/CD
  • Standard pattern: Follows Rails/Django migration patterns
  • Maintainable: Single source of truth for initial schema

Negative

  • Historical maintenance: Must preserve v0.1.0 schema forever
  • Slower fresh installs: Must run all migrations on new databases
  • Documentation burden: Need to explain two schema constants

Implementation Requirements

  1. Code Changes:

    • Add INITIAL_SCHEMA_SQL constant to database.py
    • Modify init_db() to use new initialization logic
    • Add database_exists_with_tables() helper function
    • Rename current SCHEMA_SQL to CURRENT_SCHEMA_SQL (documentation only)

  2. Testing Requirements:

    • Test fresh database initialization
    • Test upgrade from v0.1.0 schema
    • Test upgrade from each released version
    • Test migration replay detection
    • Verify all indexes created correctly

  3. Documentation Updates:

    • Update database.py docstrings
    • Document schema evolution in architecture docs
    • Add upgrade guide for production systems
    • Update deployment documentation

Migration Strategy

For v1.1.0 Release

  1. Implement INITIAL_SCHEMA_SQL as designed above
  2. Update init_db() with new logic
  3. Comprehensive testing of upgrade paths
  4. Documentation of upgrade procedures
  5. Release notes explaining the change

For Existing Production Systems

After v1.1.0 deployment:

  1. Existing databases will skip INITIAL_SCHEMA_SQL (tables exist)
  2. Migrations run normally to update schema
  3. No manual intervention required
  4. Full backward compatibility maintained

Testing Checklist

  • Fresh database gets v0.1.0 schema then migrations
  • Existing v0.1.0 database upgrades correctly
  • Existing v1.0.0 database upgrades correctly
  • All indexes created in correct order
  • No duplicate table/index creation errors
  • Migration history tracked correctly
  • Performance acceptable for fresh installs
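The "no duplicate table/index creation errors" item can be covered with two small tests, sketched here in a pytest-compatible style. The schema excerpt is copied from the tokens portion of the baseline. The test names are hypothetical, and a real suite would run against the full INITIAL_SCHEMA_SQL plus the migrations.

```python
import sqlite3

# Excerpt of the baseline schema (tokens only), copied from this ADR.
BASELINE_EXCERPT = """
CREATE TABLE IF NOT EXISTS tokens (
    token TEXT PRIMARY KEY,
    me TEXT NOT NULL,
    client_id TEXT,
    scope TEXT,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP
);

CREATE INDEX IF NOT EXISTS idx_tokens_me ON tokens(me);
"""


def test_baseline_is_idempotent():
    """Applying the baseline twice must not raise or duplicate the index."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(BASELINE_EXCERPT)
        conn.executescript(BASELINE_EXCERPT)  # replay: guarded by IF NOT EXISTS
        count = conn.execute(
            "SELECT COUNT(*) FROM sqlite_master "
            "WHERE type = 'index' AND name = 'idx_tokens_me'"
        ).fetchone()[0]
        assert count == 1
    finally:
        conn.close()


def test_unguarded_index_fails_on_replay():
    """Without IF NOT EXISTS, re-creating an existing index is an error."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(BASELINE_EXCERPT)
        try:
            conn.execute("CREATE INDEX idx_tokens_me ON tokens(me)")
        except sqlite3.OperationalError as exc:
            assert "already exists" in str(exc)
        else:
            raise AssertionError("expected a duplicate index error")
    finally:
        conn.close()
```

The second test documents the failure that occurs when an index is defined both in a schema constant and in a migration without the guard.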

References

  • ADR-031: Database Migration System Redesign
  • Original v0.1.0 schema (commit a68fd57)
  • Migration 001: Add code_verifier to auth_state
  • Migration 002: Secure tokens and authorization codes
  • SQLite documentation on schema management
  • Rails/Django migration patterns

Implementation Notes

Priority: HIGH - Required for v1.1.0 release
Complexity: Medium - Clear requirements but needs careful testing
Risk: Low - Backward compatible, well-understood pattern
Effort: 4-6 hours including testing