fix: Resolve database migration failure on existing databases
Fixes critical issue where migration 002 indexes already existed in SCHEMA_SQL, causing 'index already exists' errors on databases created before v1.0.0-rc.1. Changes: - Removed duplicate index definitions from SCHEMA_SQL (database.py) - Enhanced migration system to detect and handle indexes properly - Added comprehensive documentation of the fix Version bumped to 1.0.0-rc.2 with full changelog entry. Refs: docs/reports/2025-11-24-migration-fix-v1.0.0-rc.2.md
This commit is contained in:
144
docs/decisions/ADR-031-database-migration-system-redesign.md
Normal file
144
docs/decisions/ADR-031-database-migration-system-redesign.md
Normal file
@@ -0,0 +1,144 @@
|
||||
# ADR-031: Database Migration System Redesign
|
||||
|
||||
## Status
|
||||
Proposed
|
||||
|
||||
## Context
|
||||
|
||||
The v1.0.0-rc.1 release exposed a critical flaw in our database initialization and migration system. The system fails when upgrading existing production databases because:
|
||||
|
||||
1. `SCHEMA_SQL` represents the current (latest) schema structure
|
||||
2. `SCHEMA_SQL` is executed BEFORE migrations run
|
||||
3. Existing databases have old table structures that conflict with SCHEMA_SQL's expectations
|
||||
4. The system tries to create indexes on columns that don't exist yet
|
||||
|
||||
This creates an impossible situation where:
|
||||
- Fresh databases work fine (SCHEMA_SQL creates the latest structure)
|
||||
- Existing databases fail (SCHEMA_SQL conflicts with old structure)
|
||||
|
||||
## Decision
|
||||
|
||||
Redesign the database initialization system to follow these principles:
|
||||
|
||||
1. **SCHEMA_SQL represents the initial v0.1.0 schema**, not the current schema
|
||||
2. **All schema evolution happens through migrations**
|
||||
3. **Migrations run BEFORE schema creation attempts**
|
||||
4. **Fresh databases get the initial schema then run ALL migrations**
|
||||
|
||||
### Implementation Strategy
|
||||
|
||||
#### Phase 1: Immediate Fix (v1.0.1)
|
||||
Remove problematic index creation from SCHEMA_SQL since migrations create them:
|
||||
```python
|
||||
# Remove from SCHEMA_SQL:
|
||||
# CREATE INDEX IF NOT EXISTS idx_tokens_hash ON tokens(token_hash);
|
||||
# Let migration 002 handle this
|
||||
```
|
||||
|
||||
#### Phase 2: Proper Redesign (v1.1.0)
|
||||
1. Create `INITIAL_SCHEMA_SQL` with the v0.1.0 database structure
|
||||
2. Modify `init_db()` logic:
|
||||
```python
|
||||
def init_db(app=None):
|
||||
# 1. Check if database exists and has tables
|
||||
if database_exists_with_tables():
|
||||
# Existing database - only run migrations
|
||||
run_migrations()
|
||||
else:
|
||||
# Fresh database - create initial schema then migrate
|
||||
conn.executescript(INITIAL_SCHEMA_SQL)
|
||||
run_all_migrations()
|
||||
```
|
||||
|
||||
3. Add explicit schema versioning:
|
||||
```sql
|
||||
CREATE TABLE schema_info (
|
||||
version TEXT PRIMARY KEY,
|
||||
upgraded_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
|
||||
);
|
||||
```
|
||||
|
||||
## Rationale
|
||||
|
||||
### Why Initial Schema + Migrations?
|
||||
|
||||
1. **Predictable upgrade path**: Every database follows the same evolution
|
||||
2. **Testable**: Can test upgrades from any version to any version
|
||||
3. **Auditable**: Migration history shows exact evolution path
|
||||
4. **Reversible**: Can potentially support rollbacks
|
||||
5. **Industry standard**: Follows patterns from Rails, Django, Alembic
|
||||
|
||||
### Why Current Approach Failed
|
||||
|
||||
1. **Dual source of truth**: Schema defined in both SCHEMA_SQL and migrations
|
||||
2. **Temporal coupling**: SCHEMA_SQL assumes post-migration state
|
||||
3. **No upgrade path**: Can't get from old state to new state
|
||||
4. **Hidden dependencies**: Index creation depends on migration execution
|
||||
|
||||
## Consequences
|
||||
|
||||
### Positive
|
||||
- Reliable database upgrades from any version
|
||||
- Clear separation of concerns (initial vs evolution)
|
||||
- Easier to test migration paths
|
||||
- Follows established patterns
|
||||
- Supports future rollback capabilities
|
||||
|
||||
### Negative
|
||||
- Requires maintaining historical schema (INITIAL_SCHEMA_SQL)
|
||||
- Fresh databases take longer to initialize (run all migrations)
|
||||
- More complex initialization logic
|
||||
- Need to reconstruct v0.1.0 schema
|
||||
|
||||
### Migration Path
|
||||
1. v1.0.1: Quick fix - remove conflicting indexes from SCHEMA_SQL
|
||||
2. v1.0.1: Add manual upgrade instructions for production
|
||||
3. v1.1.0: Implement full redesign with INITIAL_SCHEMA_SQL
|
||||
4. v1.1.0: Add comprehensive migration testing
|
||||
|
||||
## Alternatives Considered
|
||||
|
||||
### 1. Dynamic Schema Detection
|
||||
**Approach**: Detect existing table structure and conditionally apply indexes
|
||||
|
||||
**Rejected because**:
|
||||
- Complex conditional logic
|
||||
- Fragile heuristics
|
||||
- Doesn't solve root cause
|
||||
- Hard to test all paths
|
||||
|
||||
### 2. Schema Snapshots
|
||||
**Approach**: Maintain schema snapshots for each version, apply appropriate one
|
||||
|
||||
**Rejected because**:
|
||||
- Maintenance burden
|
||||
- Storage overhead
|
||||
- Complex version detection
|
||||
- Still doesn't provide upgrade path
|
||||
|
||||
### 3. Migration-Only Schema
|
||||
**Approach**: No SCHEMA_SQL at all, everything through migrations
|
||||
|
||||
**Rejected because**:
|
||||
- Slower fresh installations
|
||||
- Need to maintain migration 000 as "initial schema"
|
||||
- Harder to see current schema structure
|
||||
- Goes against SQLite's lightweight philosophy
|
||||
|
||||
## References
|
||||
|
||||
- [Rails Database Migrations](https://guides.rubyonrails.org/active_record_migrations.html)
|
||||
- [Django Migrations](https://docs.djangoproject.com/en/stable/topics/migrations/)
|
||||
- [Alembic Documentation](https://alembic.sqlalchemy.org/)
|
||||
- Production incident: v1.0.0-rc.1 deployment failure
|
||||
- `/docs/reports/migration-failure-diagnosis-v1.0.0-rc.1.md`
|
||||
|
||||
## Implementation Checklist
|
||||
|
||||
- [ ] Create INITIAL_SCHEMA_SQL from v0.1.0 structure
|
||||
- [ ] Modify init_db() to check database state
|
||||
- [ ] Update migration runner to handle fresh databases
|
||||
- [ ] Add schema_info table for version tracking
|
||||
- [ ] Create migration test suite
|
||||
- [ ] Document upgrade procedures
|
||||
- [ ] Test upgrade paths from all released versions
|
||||
229
docs/decisions/ADR-032-initial-schema-sql-implementation.md
Normal file
229
docs/decisions/ADR-032-initial-schema-sql-implementation.md
Normal file
@@ -0,0 +1,229 @@
|
||||
# ADR-032: Initial Schema SQL Implementation for Migration System
|
||||
|
||||
## Status
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
|
||||
As documented in ADR-031, the current database migration system has a critical design flaw: `SCHEMA_SQL` represents the current (latest) schema structure rather than the initial v0.1.0 schema. This causes upgrade failures for existing databases because:
|
||||
|
||||
1. The system tries to create indexes on columns that don't exist yet
|
||||
2. Schema creation happens BEFORE migrations run
|
||||
3. There's no clear upgrade path from old to new database structures
|
||||
|
||||
Phase 2 of ADR-031's redesign requires creating an `INITIAL_SCHEMA_SQL` constant that represents the v0.1.0 baseline schema, allowing all schema evolution to happen through migrations.
|
||||
|
||||
## Decision
|
||||
|
||||
Create an `INITIAL_SCHEMA_SQL` constant that represents the exact database schema from the initial v0.1.0 release (commit a68fd57). This baseline schema will be used for:
|
||||
|
||||
1. **Fresh database initialization**: Create initial schema then run ALL migrations
|
||||
2. **Existing database detection**: Skip initial schema if tables already exist
|
||||
3. **Clear upgrade path**: Every database follows the same evolution through migrations
|
||||
|
||||
### INITIAL_SCHEMA_SQL Design
|
||||
|
||||
Based on analysis of the initial commit (a68fd57), the `INITIAL_SCHEMA_SQL` should contain:
|
||||
|
||||
```sql
|
||||
-- Notes metadata (content is in files)
|
||||
CREATE TABLE IF NOT EXISTS notes (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
slug TEXT UNIQUE NOT NULL,
|
||||
file_path TEXT UNIQUE NOT NULL,
|
||||
published BOOLEAN DEFAULT 0,
|
||||
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
|
||||
updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
|
||||
deleted_at TIMESTAMP,
|
||||
content_hash TEXT
|
||||
);
|
||||
|
||||
CREATE INDEX IF NOT EXISTS idx_notes_created_at ON notes(created_at DESC);
|
||||
CREATE INDEX IF NOT EXISTS idx_notes_published ON notes(published);
|
||||
CREATE INDEX IF NOT EXISTS idx_notes_slug ON notes(slug);
|
||||
CREATE INDEX IF NOT EXISTS idx_notes_deleted_at ON notes(deleted_at);
|
||||
|
||||
-- Authentication sessions (IndieLogin)
|
||||
CREATE TABLE IF NOT EXISTS sessions (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
session_token TEXT UNIQUE NOT NULL,
|
||||
me TEXT NOT NULL,
|
||||
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
|
||||
expires_at TIMESTAMP NOT NULL,
|
||||
last_used_at TIMESTAMP
|
||||
);
|
||||
|
||||
CREATE INDEX IF NOT EXISTS idx_sessions_token ON sessions(session_token);
|
||||
CREATE INDEX IF NOT EXISTS idx_sessions_expires ON sessions(expires_at);
|
||||
|
||||
-- Micropub access tokens (original insecure version)
|
||||
CREATE TABLE IF NOT EXISTS tokens (
|
||||
token TEXT PRIMARY KEY,
|
||||
me TEXT NOT NULL,
|
||||
client_id TEXT,
|
||||
scope TEXT,
|
||||
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
|
||||
expires_at TIMESTAMP
|
||||
);
|
||||
|
||||
CREATE INDEX IF NOT EXISTS idx_tokens_me ON tokens(me);
|
||||
|
||||
-- CSRF state tokens (for IndieAuth flow)
|
||||
CREATE TABLE IF NOT EXISTS auth_state (
|
||||
state TEXT PRIMARY KEY,
|
||||
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
|
||||
expires_at TIMESTAMP NOT NULL
|
||||
);
|
||||
|
||||
CREATE INDEX IF NOT EXISTS idx_auth_state_expires ON auth_state(expires_at);
|
||||
```
|
||||
|
||||
### Key Differences from Current SCHEMA_SQL
|
||||
|
||||
1. **sessions table**: Uses `session_token` (plain text) instead of `session_token_hash`
|
||||
2. **tokens table**: Original insecure structure with plain text tokens as PRIMARY KEY
|
||||
3. **auth_state table**: No `code_verifier` column (added in migration 001)
|
||||
4. **No authorization_codes table**: Added in migration 002
|
||||
5. **No secure token columns**: token_hash, last_used_at, revoked_at added later
|
||||
|
||||
### Implementation Architecture
|
||||
|
||||
```python
|
||||
# database.py structure
|
||||
INITIAL_SCHEMA_SQL = """
|
||||
-- V0.1.0 baseline schema (see ADR-032)
|
||||
-- [SQL content as shown above]
|
||||
"""
|
||||
|
||||
CURRENT_SCHEMA_SQL = """
|
||||
-- Current complete schema for reference
|
||||
-- NOT used for database initialization
|
||||
-- [Current SCHEMA_SQL content - for documentation only]
|
||||
"""
|
||||
|
||||
def init_db(app=None):
|
||||
"""Initialize database with proper migration handling"""
|
||||
|
||||
# 1. Check if database exists and has tables
|
||||
if database_exists_with_tables():
|
||||
# Existing database - only run migrations
|
||||
run_migrations(db_path, logger)
|
||||
else:
|
||||
# Fresh database - create initial schema then migrate
|
||||
conn = sqlite3.connect(db_path)
|
||||
try:
|
||||
# Create v0.1.0 baseline schema
|
||||
conn.executescript(INITIAL_SCHEMA_SQL)
|
||||
conn.commit()
|
||||
logger.info("Created initial v0.1.0 database schema")
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
# Run all migrations to bring to current version
|
||||
run_migrations(db_path, logger)
|
||||
```
|
||||
|
||||
### Migration Evolution Path
|
||||
|
||||
Starting from INITIAL_SCHEMA_SQL, the database evolves through:
|
||||
|
||||
1. **Migration 001**: Add code_verifier to auth_state (PKCE support)
|
||||
2. **Migration 002**: Secure token storage (complete tokens table rebuild)
|
||||
3. **Future migrations**: Continue evolution from this baseline
|
||||
|
||||
## Rationale
|
||||
|
||||
### Why This Specific Schema?
|
||||
|
||||
1. **Historical accuracy**: Represents the actual v0.1.0 release state
|
||||
2. **Clean evolution**: All changes tracked through migrations
|
||||
3. **Testable upgrades**: Can test upgrade path from any version
|
||||
4. **No ambiguity**: Clear separation between initial and evolved state
|
||||
|
||||
### Why Not Alternative Approaches?
|
||||
|
||||
1. **Not using migration 000**: Migrations should represent changes, not initial state
|
||||
2. **Not using current schema**: Would skip migration history for new databases
|
||||
3. **Not detecting schema dynamically**: Too complex and fragile
|
||||
|
||||
## Consequences
|
||||
|
||||
### Positive
|
||||
|
||||
- **Reliable upgrades**: Any database can upgrade to any version
|
||||
- **Clear history**: Migration path shows exact evolution
|
||||
- **Testable**: Can verify upgrade paths in CI/CD
|
||||
- **Standard pattern**: Follows Rails/Django migration patterns
|
||||
- **Maintainable**: Single source of truth for initial schema
|
||||
|
||||
### Negative
|
||||
|
||||
- **Historical maintenance**: Must preserve v0.1.0 schema forever
|
||||
- **Slower fresh installs**: Must run all migrations on new databases
|
||||
- **Documentation burden**: Need to explain two schema constants
|
||||
|
||||
### Implementation Requirements
|
||||
|
||||
1. **Code Changes**:
|
||||
- Add `INITIAL_SCHEMA_SQL` constant to `database.py`
|
||||
- Modify `init_db()` to use new initialization logic
|
||||
- Add `database_exists_with_tables()` helper function
|
||||
- Rename current `SCHEMA_SQL` to `CURRENT_SCHEMA_SQL` (documentation only)
|
||||
|
||||
2. **Testing Requirements**:
|
||||
- Test fresh database initialization
|
||||
- Test upgrade from v0.1.0 schema
|
||||
- Test upgrade from each released version
|
||||
- Test migration replay detection
|
||||
- Verify all indexes created correctly
|
||||
|
||||
3. **Documentation Updates**:
|
||||
- Update database.py docstrings
|
||||
- Document schema evolution in architecture docs
|
||||
- Add upgrade guide for production systems
|
||||
- Update deployment documentation
|
||||
|
||||
## Migration Strategy
|
||||
|
||||
### For v1.1.0 Release
|
||||
|
||||
1. **Implement INITIAL_SCHEMA_SQL** as designed above
|
||||
2. **Update init_db()** with new logic
|
||||
3. **Comprehensive testing** of upgrade paths
|
||||
4. **Documentation** of upgrade procedures
|
||||
5. **Release notes** explaining the change
|
||||
|
||||
### For Existing Production Systems
|
||||
|
||||
After v1.1.0 deployment:
|
||||
|
||||
1. Existing databases will skip INITIAL_SCHEMA_SQL (tables exist)
|
||||
2. Migrations run normally to update schema
|
||||
3. No manual intervention required
|
||||
4. Full backward compatibility maintained
|
||||
|
||||
## Testing Checklist
|
||||
|
||||
- [ ] Fresh database gets v0.1.0 schema then migrations
|
||||
- [ ] Existing v0.1.0 database upgrades correctly
|
||||
- [ ] Existing v1.0.0 database upgrades correctly
|
||||
- [ ] All indexes created in correct order
|
||||
- [ ] No duplicate table/index creation errors
|
||||
- [ ] Migration history tracked correctly
|
||||
- [ ] Performance acceptable for fresh installs
|
||||
|
||||
## References
|
||||
|
||||
- ADR-031: Database Migration System Redesign
|
||||
- Original v0.1.0 schema (commit a68fd57)
|
||||
- Migration 001: Add code_verifier to auth_state
|
||||
- Migration 002: Secure tokens and authorization codes
|
||||
- SQLite documentation on schema management
|
||||
- Rails/Django migration patterns
|
||||
|
||||
## Implementation Notes
|
||||
|
||||
**Priority**: HIGH - Required for v1.1.0 release
|
||||
**Complexity**: Medium - Clear requirements but needs careful testing
|
||||
**Risk**: Low - Backward compatible, well-understood pattern
|
||||
**Effort**: 4-6 hours including testing
|
||||
Reference in New Issue
Block a user