ADR-032: Initial Schema SQL Implementation for Migration System
Status
Accepted
Context
As documented in ADR-031, the current database migration system has a critical design flaw: SCHEMA_SQL represents the current (latest) schema structure rather than the initial v0.1.0 schema. This causes upgrade failures for existing databases because:
- The system tries to create indexes on columns that don't exist yet
- Schema creation happens BEFORE migrations run
- There's no clear upgrade path from old to new database structures
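The first two bullets combine into one concrete failure. The sketch below uses Python's standard sqlite3 module and an in-memory database. It simulates an old database whose tokens table predates the secure-token columns, then runs schema SQL written against the latest structure. The latest-style table definition and the idx_tokens_hash index name are hypothetical stand-ins, not the project's actual SCHEMA_SQL.

```python
import sqlite3

# Illustrative "old" tokens table: the v0.1.0 shape, with no token_hash column.
OLD_TOKENS_TABLE = """
CREATE TABLE tokens (
    token TEXT PRIMARY KEY,
    me TEXT NOT NULL,
    client_id TEXT,
    scope TEXT,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP
);
"""

# Hypothetical "latest schema" fragment in the style of SCHEMA_SQL.
# Column and index names are assumptions for illustration only.
LATEST_STYLE_SCHEMA = """
CREATE TABLE IF NOT EXISTS tokens (
    token_hash TEXT UNIQUE NOT NULL,
    me TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_tokens_hash ON tokens(token_hash);
"""


def apply_latest_schema_to_old_db():
    """Run latest-style schema SQL against an old database; return the error text."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(OLD_TOKENS_TABLE)
        try:
            # CREATE TABLE IF NOT EXISTS is silently skipped because tokens exists,
            # so the CREATE INDEX that follows targets a column the old table lacks.
            conn.executescript(LATEST_STYLE_SCHEMA)
        except sqlite3.OperationalError as exc:
            return str(exc)
        return None
    finally:
        conn.close()


if __name__ == "__main__":
    print(apply_latest_schema_to_old_db())
```

IF NOT EXISTS protects the table statement but not the assumption that the table already has the new columns. This is why the baseline must match what old databases actually contain.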
Phase 2 of ADR-031's redesign requires creating an INITIAL_SCHEMA_SQL constant that represents the v0.1.0 baseline schema, allowing all schema evolution to happen through migrations.
Decision
Create an INITIAL_SCHEMA_SQL constant that represents the exact database schema from the initial v0.1.0 release (commit a68fd57). This baseline schema will be used for:
- Fresh database initialization: Create initial schema then run ALL migrations
- Existing database detection: Skip initial schema if tables already exist
- Clear upgrade path: Every database follows the same evolution through migrations
INITIAL_SCHEMA_SQL Design
Based on analysis of the initial commit (a68fd57), the INITIAL_SCHEMA_SQL should contain:
```sql
-- Notes metadata (content is in files)
CREATE TABLE IF NOT EXISTS notes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    slug TEXT UNIQUE NOT NULL,
    file_path TEXT UNIQUE NOT NULL,
    published BOOLEAN DEFAULT 0,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    deleted_at TIMESTAMP,
    content_hash TEXT
);

CREATE INDEX IF NOT EXISTS idx_notes_created_at ON notes(created_at DESC);
CREATE INDEX IF NOT EXISTS idx_notes_published ON notes(published);
CREATE INDEX IF NOT EXISTS idx_notes_slug ON notes(slug);
CREATE INDEX IF NOT EXISTS idx_notes_deleted_at ON notes(deleted_at);

-- Authentication sessions (IndieLogin)
CREATE TABLE IF NOT EXISTS sessions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_token TEXT UNIQUE NOT NULL,
    me TEXT NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP NOT NULL,
    last_used_at TIMESTAMP
);

CREATE INDEX IF NOT EXISTS idx_sessions_token ON sessions(session_token);
CREATE INDEX IF NOT EXISTS idx_sessions_expires ON sessions(expires_at);

-- Micropub access tokens (original insecure version)
CREATE TABLE IF NOT EXISTS tokens (
    token TEXT PRIMARY KEY,
    me TEXT NOT NULL,
    client_id TEXT,
    scope TEXT,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP
);

CREATE INDEX IF NOT EXISTS idx_tokens_me ON tokens(me);

-- CSRF state tokens (for IndieAuth flow)
CREATE TABLE IF NOT EXISTS auth_state (
    state TEXT PRIMARY KEY,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP NOT NULL
);

CREATE INDEX IF NOT EXISTS idx_auth_state_expires ON auth_state(expires_at);
```
Key Differences from Current SCHEMA_SQL
- sessions table: Uses session_token (plain text) instead of session_token_hash
- tokens table: Original insecure structure with plain-text tokens as PRIMARY KEY
- auth_state table: No code_verifier column (added in migration 001)
- No authorization_codes table: Added in migration 002
- No secure token columns: the tokens table has no token_hash, last_used_at, or revoked_at (added later)
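These differences also give a practical way to tell an un-migrated database from an upgraded one. PRAGMA table_info lists a table's columns and sqlite_master lists its tables, so each difference becomes a simple check. This is a minimal sketch that assumes direct sqlite3 access. The helper names and the trimmed V010_SUBSET schema (copied from the tokens and auth_state definitions in the baseline) are illustrative.

```python
import sqlite3

# Trimmed copy of two v0.1.0 baseline tables (illustrative subset).
V010_SUBSET = """
CREATE TABLE tokens (
    token TEXT PRIMARY KEY,
    me TEXT NOT NULL,
    client_id TEXT,
    scope TEXT,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP
);
CREATE TABLE auth_state (
    state TEXT PRIMARY KEY,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP NOT NULL
);
"""


def column_names(conn, table):
    """Return a table's column names via PRAGMA table_info (row[1] is the name)."""
    # PRAGMA arguments cannot be bound as parameters, so table must be trusted.
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}


def table_exists(conn, table):
    """Return True if a table with this name exists."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
    ).fetchone()
    return row is not None


def missing_post_baseline_features(conn):
    """List the post-v0.1.0 schema features that are absent from this database."""
    missing = []
    if "code_verifier" not in column_names(conn, "auth_state"):
        missing.append("auth_state.code_verifier")
    if not table_exists(conn, "authorization_codes"):
        missing.append("authorization_codes")
    if "token_hash" not in column_names(conn, "tokens"):
        missing.append("tokens.token_hash")
    return missing


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(V010_SUBSET)
    print(missing_post_baseline_features(conn))
    conn.close()
```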
Implementation Architecture
```python
# database.py structure
import sqlite3

INITIAL_SCHEMA_SQL = """
-- v0.1.0 baseline schema (see ADR-032)
-- [SQL content as shown above]
"""

CURRENT_SCHEMA_SQL = """
-- Current complete schema for reference
-- NOT used for database initialization
-- [Current SCHEMA_SQL content - for documentation only]
"""


def init_db(app=None):
    """Initialize database with proper migration handling"""
    # db_path and logger are resolved from the app configuration (not shown);
    # run_migrations() and database_exists_with_tables() are defined elsewhere.
    # 1. Check if database exists and has tables
    if database_exists_with_tables(db_path):
        # Existing database - only run migrations
        run_migrations(db_path, logger)
    else:
        # Fresh database - create initial schema then migrate
        conn = sqlite3.connect(db_path)
        try:
            # Create v0.1.0 baseline schema
            conn.executescript(INITIAL_SCHEMA_SQL)
            conn.commit()
            logger.info("Created initial v0.1.0 database schema")
        finally:
            conn.close()
        # Run all migrations to bring to current version
        run_migrations(db_path, logger)
```
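The outline calls a database_exists_with_tables() helper. The ADR lists that helper under Implementation Requirements but does not define it. One possible version follows, assuming "has tables" means at least one non-internal table in sqlite_master. The sketch passes db_path and run_migrations in explicitly so it runs standalone. Those parameters and the one-table INITIAL_SCHEMA_SQL stand-in are assumptions, not the project's code.

```python
import logging
import os
import sqlite3

logger = logging.getLogger(__name__)

# Trimmed stand-in for the real INITIAL_SCHEMA_SQL (one table, for the demo).
INITIAL_SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS auth_state (
    state TEXT PRIMARY KEY,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP NOT NULL
);
"""


def database_exists_with_tables(db_path):
    """Return True if db_path is an existing SQLite file with at least one user table."""
    if not os.path.exists(db_path):
        return False  # avoid sqlite3.connect() creating an empty file as a side effect
    conn = sqlite3.connect(db_path)
    try:
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        ).fetchone()
    finally:
        conn.close()
    return count > 0


def init_db(db_path, run_migrations):
    """Create the v0.1.0 baseline only for new databases, then always migrate."""
    if not database_exists_with_tables(db_path):
        conn = sqlite3.connect(db_path)
        try:
            conn.executescript(INITIAL_SCHEMA_SQL)
            conn.commit()
            logger.info("Created initial v0.1.0 database schema")
        finally:
            conn.close()
    # Both branches of the outline end in run_migrations(), so the call is hoisted here.
    run_migrations(db_path, logger)
```

On a second call, the baseline is skipped and only run_migrations() runs. That is the behavior the ADR expects for existing databases.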
Migration Evolution Path
Starting from INITIAL_SCHEMA_SQL, the database evolves through:
- Migration 001: Add code_verifier to auth_state (PKCE support)
- Migration 002: Secure token storage (complete tokens table rebuild)
- Future migrations: Continue evolution from this baseline
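The ADR does not specify how applied migrations are recorded. A common approach, sketched here as an assumption, keeps a tracking table (named schema_migrations in this example). Each pending migration is applied inside an explicit transaction, so a failure rolls back both the schema change and its record. Only migration 001 is modeled. Its SQL is inferred from the description "Add code_verifier to auth_state" and is not the project's actual migration file.

```python
import sqlite3

# Hypothetical ordered migrations: (name, list of SQL statements).
MIGRATIONS = [
    ("001_add_code_verifier", ["ALTER TABLE auth_state ADD COLUMN code_verifier TEXT"]),
]

# auth_state as defined in the v0.1.0 baseline.
BASELINE_AUTH_STATE = """
CREATE TABLE IF NOT EXISTS auth_state (
    state TEXT PRIMARY KEY,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP NOT NULL
);
"""


def run_migrations(conn, migrations=MIGRATIONS):
    """Apply unrecorded migrations in order and return the names applied.

    Expects a connection opened with isolation_level=None so that the
    explicit BEGIN/COMMIT/ROLLBACK statements control each transaction.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations ("
        "name TEXT PRIMARY KEY, "
        "applied_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    applied = {name for (name,) in conn.execute("SELECT name FROM schema_migrations")}
    newly_applied = []
    for name, statements in migrations:
        if name in applied:
            continue  # already recorded: skip instead of replaying
        conn.execute("BEGIN")
        try:
            for sql in statements:
                conn.execute(sql)
            conn.execute("INSERT INTO schema_migrations (name) VALUES (?)", (name,))
            conn.execute("COMMIT")
        except sqlite3.Error:
            conn.execute("ROLLBACK")
            raise
        newly_applied.append(name)
    return newly_applied


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript(BASELINE_AUTH_STATE)
    print(run_migrations(conn))  # first run applies migration 001
    print(run_migrations(conn))  # second run finds nothing pending
    conn.close()
```

Migrations are stored as statement lists rather than one script because executescript() commits any open transaction before running. That would defeat the per-migration rollback. A multi-statement table rebuild such as migration 002 fits the same list shape.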
Rationale
Why This Specific Schema?
- Historical accuracy: Represents the actual v0.1.0 release state
- Clean evolution: All changes tracked through migrations
- Testable upgrades: Can test upgrade path from any version
- No ambiguity: Clear separation between initial and evolved state
Why Not Alternative Approaches?
- Not using migration 000: Migrations should represent changes, not initial state
- Not using current schema: Would skip migration history for new databases
- Not detecting schema dynamically: Too complex and fragile
Consequences
Positive
- Reliable upgrades: Any database can upgrade to any version
- Clear history: Migration path shows exact evolution
- Testable: Can verify upgrade paths in CI/CD
- Standard pattern: Follows Rails/Django migration patterns
- Maintainable: Single source of truth for initial schema
Negative
- Historical maintenance: Must preserve v0.1.0 schema forever
- Slower fresh installs: Must run all migrations on new databases
- Documentation burden: Need to explain two schema constants
Implementation Requirements
Code Changes:
- Add INITIAL_SCHEMA_SQL constant to database.py
- Modify init_db() to use the new initialization logic
- Add database_exists_with_tables() helper function
- Rename current SCHEMA_SQL to CURRENT_SCHEMA_SQL (documentation only)
Testing Requirements:
- Test fresh database initialization
- Test upgrade from v0.1.0 schema
- Test upgrade from each released version
- Test migration replay detection
- Verify all indexes created correctly
Documentation Updates:
- Update database.py docstrings
- Document schema evolution in architecture docs
- Add upgrade guide for production systems
- Update deployment documentation
Migration Strategy
For v1.1.0 Release
- Implement INITIAL_SCHEMA_SQL as designed above
- Update init_db() with new logic
- Comprehensive testing of upgrade paths
- Documentation of upgrade procedures
- Release notes explaining the change
For Existing Production Systems
After v1.1.0 deployment:
- Existing databases will skip INITIAL_SCHEMA_SQL (tables exist)
- Migrations run normally to update schema
- No manual intervention required
- Full backward compatibility maintained
Testing Checklist
- Fresh database gets v0.1.0 schema then migrations
- Existing v0.1.0 database upgrades correctly
- Existing v1.0.0 database upgrades correctly
- All indexes created in correct order
- No duplicate table/index creation errors
- Migration history tracked correctly
- Performance acceptable for fresh installs
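Several checklist items reduce to one property: applying the migrations to the baseline must yield the structure documented as current. The sketch below is a minimal drift check that compares column names and declared types via PRAGMA table_info. It uses a one-table baseline, one migration, and a matching reference schema, all trimmed illustrations. A real test would also exclude the migration tracking table and compare indexes.

```python
import sqlite3

# Trimmed illustrations: v0.1.0 auth_state, migration 001, and the expected result.
BASELINE_SQL = """
CREATE TABLE auth_state (
    state TEXT PRIMARY KEY,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP NOT NULL
);
"""
MIGRATION_STATEMENTS = ["ALTER TABLE auth_state ADD COLUMN code_verifier TEXT"]
REFERENCE_SQL = """
CREATE TABLE auth_state (
    state TEXT PRIMARY KEY,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP NOT NULL,
    code_verifier TEXT
);
"""


def schema_shape(conn):
    """Map each user table to its ordered (column name, declared type) pairs."""
    tables = [
        name
        for (name,) in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' "
            "AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    return {
        table: [(row[1], row[2]) for row in conn.execute(f"PRAGMA table_info({table})")]
        for table in tables
    }


def shape_after(*scripts):
    """Build an in-memory database from SQL scripts and return its shape."""
    conn = sqlite3.connect(":memory:")
    try:
        for script in scripts:
            conn.executescript(script)
        return schema_shape(conn)
    finally:
        conn.close()


if __name__ == "__main__":
    migrated = shape_after(BASELINE_SQL, *MIGRATION_STATEMENTS)
    reference = shape_after(REFERENCE_SQL)
    print("schemas match" if migrated == reference else "schema drift detected")
```

Comparing ordered column lists is deliberately strict. ALTER TABLE ... ADD COLUMN appends columns at the end, so a reference schema that lists them elsewhere is reported as drift. Comparing sets of columns instead relaxes that.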
References
- ADR-031: Database Migration System Redesign
- Original v0.1.0 schema (commit a68fd57)
- Migration 001: Add code_verifier to auth_state
- Migration 002: Secure tokens and authorization codes
- SQLite documentation on schema management
- Rails/Django migration patterns
Implementation Notes
Priority: HIGH - Required for v1.1.0 release
Complexity: Medium - Clear requirements but needs careful testing
Risk: Low - Backward compatible, well-understood pattern
Effort: 4-6 hours including testing