Phil Skentelbery 3ed77fd45f fix: Resolve database migration failure on existing databases

Fixes critical issue where migration 002 indexes already existed in SCHEMA_SQL,
causing 'index already exists' errors on databases created before v1.0.0-rc.1.

Changes:
- Removed duplicate index definitions from SCHEMA_SQL (database.py)
- Enhanced migration system to detect and handle indexes properly
- Added comprehensive documentation of the fix

Version bumped to 1.0.0-rc.2 with full changelog entry.

Refs: docs/reports/2025-11-24-migration-fix-v1.0.0-rc.2.md

2025-11-24 13:11:14 -07:00

ADR-032: Initial Schema SQL Implementation for Migration System

Status

Accepted

Context

As documented in ADR-031, the current database migration system has a critical design flaw: SCHEMA_SQL represents the current (latest) schema structure rather than the initial v0.1.0 schema. This causes upgrade failures for existing databases because:

The system tries to create indexes on columns that don't exist yet
Schema creation happens BEFORE migrations run
There's no clear upgrade path from old to new database structures
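The first failure mode can be reproduced in isolation. The sketch below is a minimal illustration, not the project's code: it assumes a v0.1.0-style tokens table already exists and then applies a latest-style schema script on top of it. The trimmed column lists and the index name idx_tokens_hash are hypothetical; token_hash is one of the columns this ADR describes as added later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# An "existing" database: v0.1.0-style tokens table without token_hash.
conn.execute("CREATE TABLE tokens (token TEXT PRIMARY KEY, me TEXT NOT NULL)")

# Hypothetical latest-style schema script (trimmed for illustration).
LATEST_STYLE_SCHEMA = """
CREATE TABLE IF NOT EXISTS tokens (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    token_hash TEXT UNIQUE NOT NULL,
    me TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_tokens_hash ON tokens(token_hash);
"""

try:
    # CREATE TABLE IF NOT EXISTS is a no-op here, so the old table survives
    # and the index statement then refers to a column that is not there.
    conn.executescript(LATEST_STYLE_SCHEMA)
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)

print(error)
```

Because the table statement silently does nothing on an existing database, the index statement is the first one to notice the mismatch, and SQLite rejects it before any migration has had a chance to add the column.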

Phase 2 of ADR-031's redesign requires creating an INITIAL_SCHEMA_SQL constant that represents the v0.1.0 baseline schema, allowing all schema evolution to happen through migrations.

Decision

Create an INITIAL_SCHEMA_SQL constant that represents the exact database schema from the initial v0.1.0 release (commit a68fd57). This baseline schema will be used for:

Fresh database initialization: Create initial schema then run ALL migrations
Existing database detection: Skip initial schema if tables already exist
Clear upgrade path: Every database follows the same evolution through migrations

INITIAL_SCHEMA_SQL Design

Based on analysis of the initial commit (a68fd57), the INITIAL_SCHEMA_SQL should contain:

-- Notes metadata (content is in files)
CREATE TABLE IF NOT EXISTS notes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    slug TEXT UNIQUE NOT NULL,
    file_path TEXT UNIQUE NOT NULL,
    published BOOLEAN DEFAULT 0,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    deleted_at TIMESTAMP,
    content_hash TEXT
);

CREATE INDEX IF NOT EXISTS idx_notes_created_at ON notes(created_at DESC);
CREATE INDEX IF NOT EXISTS idx_notes_published ON notes(published);
CREATE INDEX IF NOT EXISTS idx_notes_slug ON notes(slug);
CREATE INDEX IF NOT EXISTS idx_notes_deleted_at ON notes(deleted_at);

-- Authentication sessions (IndieLogin)
CREATE TABLE IF NOT EXISTS sessions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_token TEXT UNIQUE NOT NULL,
    me TEXT NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP NOT NULL,
    last_used_at TIMESTAMP
);

CREATE INDEX IF NOT EXISTS idx_sessions_token ON sessions(session_token);
CREATE INDEX IF NOT EXISTS idx_sessions_expires ON sessions(expires_at);

-- Micropub access tokens (original insecure version)
CREATE TABLE IF NOT EXISTS tokens (
    token TEXT PRIMARY KEY,
    me TEXT NOT NULL,
    client_id TEXT,
    scope TEXT,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP
);

CREATE INDEX IF NOT EXISTS idx_tokens_me ON tokens(me);

-- CSRF state tokens (for IndieAuth flow)
CREATE TABLE IF NOT EXISTS auth_state (
    state TEXT PRIMARY KEY,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP NOT NULL
);

CREATE INDEX IF NOT EXISTS idx_auth_state_expires ON auth_state(expires_at);

Key Differences from Current SCHEMA_SQL

sessions table: Uses session_token (plain text) instead of session_token_hash
tokens table: Original insecure structure with plain text tokens as PRIMARY KEY
auth_state table: No code_verifier column (added in migration 001)
No authorization_codes table: Added in migration 002
No secure token columns: token_hash, last_used_at, revoked_at added later

Implementation Architecture

# database.py structure
import sqlite3

INITIAL_SCHEMA_SQL = """
-- v0.1.0 baseline schema (see ADR-032)
-- [SQL content as shown above]
"""

CURRENT_SCHEMA_SQL = """
-- Current complete schema for reference
-- NOT used for database initialization
-- [Current SCHEMA_SQL content - for documentation only]
"""

def init_db(app=None):
    """Initialize database with proper migration handling"""
    # NOTE: db_path and logger are assumed to be resolved from the app
    # configuration before this point; that lookup is omitted here.

    # 1. Fresh database (missing or without tables): create the v0.1.0 baseline
    if not database_exists_with_tables(db_path):
        conn = sqlite3.connect(db_path)
        try:
            conn.executescript(INITIAL_SCHEMA_SQL)
            conn.commit()
            logger.info("Created initial v0.1.0 database schema")
        finally:
            conn.close()

    # 2. Fresh or existing database: run migrations to reach the current version
    run_migrations(db_path, logger)
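The database_exists_with_tables() helper is named in this ADR but not defined. One way it might look is sketched below; the db_path parameter and the choice to ignore SQLite's internal sqlite_* tables are assumptions, not the project's confirmed implementation.

```python
import sqlite3
from pathlib import Path


def database_exists_with_tables(db_path):
    """Return True if db_path exists and contains at least one user table."""
    path = Path(db_path)
    # Check the file first: sqlite3.connect() would create a missing file.
    if not path.exists():
        return False
    conn = sqlite3.connect(path)
    try:
        row = conn.execute(
            "SELECT COUNT(*) FROM sqlite_master "
            "WHERE type = 'table' AND substr(name, 1, 7) <> 'sqlite_'"
        ).fetchone()
        return row[0] > 0
    finally:
        conn.close()
```

Checking for the file before connecting matters here: otherwise the check itself would create an empty database file as a side effect.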

Migration Evolution Path

Starting from INITIAL_SCHEMA_SQL, the database evolves through:

Migration 001: Add code_verifier to auth_state (PKCE support)
Migration 002: Secure token storage (complete tokens table rebuild)
Future migrations: Continue evolution from this baseline
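How this evolution path stays repeatable can be sketched with a small runner. This is an illustration under assumptions, not the project's run_migrations(): the schema_migrations tracking table, the MIGRATIONS list, the apply_pending() function, and the exact SQL for migration 001 (including the TEXT column type and default) are all hypothetical.

```python
import sqlite3

# auth_state as defined in the v0.1.0 baseline in this ADR.
BASELINE = """
CREATE TABLE IF NOT EXISTS auth_state (
    state TEXT PRIMARY KEY,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP NOT NULL
);
"""

# Hypothetical migration list; the real migration files are not shown here.
MIGRATIONS = [
    (
        "001_add_code_verifier",
        "ALTER TABLE auth_state ADD COLUMN code_verifier TEXT NOT NULL DEFAULT ''",
    ),
]


def apply_pending(conn):
    """Apply migrations not yet recorded; return the names that ran."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations ("
        "name TEXT PRIMARY KEY, "
        "applied_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    applied = {row[0] for row in conn.execute("SELECT name FROM schema_migrations")}
    ran = []
    for name, sql in MIGRATIONS:
        if name in applied:
            continue  # already recorded, so a replay skips it
        conn.execute(sql)
        conn.execute("INSERT INTO schema_migrations (name) VALUES (?)", (name,))
        ran.append(name)
    conn.commit()
    return ran


conn = sqlite3.connect(":memory:")
conn.executescript(BASELINE)
first_run = apply_pending(conn)
second_run = apply_pending(conn)
columns = [row[1] for row in conn.execute("PRAGMA table_info(auth_state)")]
```

In this sketch the first call applies migration 001 and records it, and the second call finds it in the tracking table and does nothing, which is the behaviour a replay-detection test would check.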

Rationale

Why This Specific Schema?

Historical accuracy: Represents the actual v0.1.0 release state
Clean evolution: All changes tracked through migrations
Testable upgrades: Can test upgrade path from any version
No ambiguity: Clear separation between initial and evolved state

Why Not Alternative Approaches?

Not using migration 000: Migrations should represent changes, not initial state
Not using current schema: Would skip migration history for new databases
Not detecting schema dynamically: Too complex and fragile

Consequences

Positive

Reliable upgrades: Any database, from v0.1.0 onward, can upgrade to the latest version
Clear history: Migration path shows exact evolution
Testable: Can verify upgrade paths in CI/CD
Standard pattern: Follows Rails/Django migration patterns
Maintainable: Single source of truth for initial schema

Negative

Historical maintenance: Must preserve v0.1.0 schema forever
Slower fresh installs: Must run all migrations on new databases
Documentation burden: Need to explain two schema constants

Implementation Requirements

Code Changes:
- Add INITIAL_SCHEMA_SQL constant to database.py
- Modify init_db() to use new initialization logic
- Add database_exists_with_tables() helper function
- Rename current SCHEMA_SQL to CURRENT_SCHEMA_SQL (documentation only)

Testing Requirements:
- Test fresh database initialization
- Test upgrade from v0.1.0 schema
- Test upgrade from each released version
- Test migration replay detection
- Verify all indexes created correctly

Documentation Updates:
- Update database.py docstrings
- Document schema evolution in architecture docs
- Add upgrade guide for production systems
- Update deployment documentation

Migration Strategy

For v1.1.0 Release

Implement INITIAL_SCHEMA_SQL as designed above
Update init_db() with new logic
Comprehensive testing of upgrade paths
Documentation of upgrade procedures
Release notes explaining the change

For Existing Production Systems

After v1.1.0 deployment:

Existing databases will skip INITIAL_SCHEMA_SQL (tables exist)
Migrations run normally to update schema
No manual intervention required
Full backward compatibility maintained

Testing Checklist

Fresh database gets v0.1.0 schema then migrations
Existing v0.1.0 database upgrades correctly
Existing v1.0.0 database upgrades correctly
All indexes created in correct order
No duplicate table/index creation errors
Migration history tracked correctly
Performance acceptable for fresh installs
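Two of the checklist items, index creation and the absence of duplicate-creation errors, could be checked along these lines. This is a sketch rather than the project's test suite: the notes table and its indexes are copied from the baseline above, while explicit_indexes() and the in-memory database are assumptions made for illustration.

```python
import sqlite3

# notes table and indexes from the v0.1.0 baseline in this ADR.
NOTES_BASELINE = """
CREATE TABLE IF NOT EXISTS notes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    slug TEXT UNIQUE NOT NULL,
    file_path TEXT UNIQUE NOT NULL,
    published BOOLEAN DEFAULT 0,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    deleted_at TIMESTAMP,
    content_hash TEXT
);
CREATE INDEX IF NOT EXISTS idx_notes_created_at ON notes(created_at DESC);
CREATE INDEX IF NOT EXISTS idx_notes_published ON notes(published);
CREATE INDEX IF NOT EXISTS idx_notes_slug ON notes(slug);
CREATE INDEX IF NOT EXISTS idx_notes_deleted_at ON notes(deleted_at);
"""

EXPECTED_INDEXES = {
    "idx_notes_created_at",
    "idx_notes_published",
    "idx_notes_slug",
    "idx_notes_deleted_at",
}


def explicit_indexes(conn, table):
    """Names of indexes created by CREATE INDEX on the given table."""
    # Automatic indexes backing UNIQUE constraints have a NULL sql column.
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = ? AND sql IS NOT NULL",
        (table,),
    )
    return {row[0] for row in rows}


conn = sqlite3.connect(":memory:")
conn.executescript(NOTES_BASELINE)
conn.executescript(NOTES_BASELINE)  # re-applying must not raise
found = explicit_indexes(conn, "notes")
```

Comparing found against EXPECTED_INDEXES covers the index check, and the second executescript() call covers the duplicate-creation case for the baseline itself.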

References

ADR-031: Database Migration System Redesign
Original v0.1.0 schema (commit a68fd57)
Migration 001: Add code_verifier to auth_state
Migration 002: Secure tokens and authorization codes
SQLite documentation on schema management
Rails/Django migration patterns

Implementation Notes

Priority: HIGH - Required for v1.1.0 release
Complexity: Medium - Clear requirements but needs careful testing
Risk: Low - Backward compatible, well-understood pattern
Effort: 4-6 hours including testing
