# ADR-032: Initial Schema SQL Implementation for Migration System

## Status
Accepted

## Context

As documented in ADR-031, the current database migration system has a critical design flaw: `SCHEMA_SQL` represents the current (latest) schema structure rather than the initial v0.1.0 schema. This causes upgrade failures for existing databases because:

1. The system tries to create indexes on columns that don't exist yet
2. Schema creation happens BEFORE migrations run
3. There's no clear upgrade path from old to new database structures
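
The first failure mode can be reproduced in isolation. The sketch below assumes an existing database with the v0.1.0 `auth_state` table and applies a "latest"-style script to it. The index `idx_auth_state_verifier` is hypothetical and stands in for whatever index the current `SCHEMA_SQL` declares on a newer column. The actual failing statement in the project may differ.

```python
import sqlite3

# v0.1.0 shape of auth_state (no code_verifier column yet)
OLD_AUTH_STATE = """
CREATE TABLE IF NOT EXISTS auth_state (
    state TEXT PRIMARY KEY,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP NOT NULL
);
"""

# Stand-in for a "latest" schema script. The index name is hypothetical
# and exists only to illustrate the failure.
LATEST_STYLE_SCHEMA = """
CREATE TABLE IF NOT EXISTS auth_state (
    state TEXT PRIMARY KEY,
    code_verifier TEXT,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_auth_state_verifier ON auth_state(code_verifier);
"""

def apply_latest_schema_to_old_db():
    """Return the error message raised when the latest schema hits an old database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(OLD_AUTH_STATE)  # simulate an existing v0.1.0 database
        try:
            # CREATE TABLE IF NOT EXISTS is skipped because the table exists,
            # so the old column set stays in place and CREATE INDEX fails.
            conn.executescript(LATEST_STYLE_SCHEMA)
        except sqlite3.OperationalError as exc:
            return str(exc)
        return None
    finally:
        conn.close()
```

Because `CREATE TABLE IF NOT EXISTS` never alters an existing table, the only way to add the missing column is a migration that runs before anything references it.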

Phase 2 of ADR-031's redesign requires creating an `INITIAL_SCHEMA_SQL` constant that represents the v0.1.0 baseline schema, allowing all schema evolution to happen through migrations.

## Decision

Create an `INITIAL_SCHEMA_SQL` constant that represents the exact database schema from the initial v0.1.0 release (commit a68fd57). This baseline schema will be used for:

1. **Fresh database initialization**: Create initial schema then run ALL migrations
2. **Existing database detection**: Skip initial schema if tables already exist
3. **Clear upgrade path**: Every database follows the same evolution through migrations

### INITIAL_SCHEMA_SQL Design

Based on analysis of the initial commit (a68fd57), the `INITIAL_SCHEMA_SQL` should contain:

```sql
-- Notes metadata (content is in files)
CREATE TABLE IF NOT EXISTS notes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    slug TEXT UNIQUE NOT NULL,
    file_path TEXT UNIQUE NOT NULL,
    published BOOLEAN DEFAULT 0,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    deleted_at TIMESTAMP,
    content_hash TEXT
);

CREATE INDEX IF NOT EXISTS idx_notes_created_at ON notes(created_at DESC);
CREATE INDEX IF NOT EXISTS idx_notes_published ON notes(published);
CREATE INDEX IF NOT EXISTS idx_notes_slug ON notes(slug);
CREATE INDEX IF NOT EXISTS idx_notes_deleted_at ON notes(deleted_at);

-- Authentication sessions (IndieLogin)
CREATE TABLE IF NOT EXISTS sessions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_token TEXT UNIQUE NOT NULL,
    me TEXT NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP NOT NULL,
    last_used_at TIMESTAMP
);

CREATE INDEX IF NOT EXISTS idx_sessions_token ON sessions(session_token);
CREATE INDEX IF NOT EXISTS idx_sessions_expires ON sessions(expires_at);

-- Micropub access tokens (original insecure version)
CREATE TABLE IF NOT EXISTS tokens (
    token TEXT PRIMARY KEY,
    me TEXT NOT NULL,
    client_id TEXT,
    scope TEXT,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP
);

CREATE INDEX IF NOT EXISTS idx_tokens_me ON tokens(me);

-- CSRF state tokens (for IndieAuth flow)
CREATE TABLE IF NOT EXISTS auth_state (
    state TEXT PRIMARY KEY,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP NOT NULL
);

CREATE INDEX IF NOT EXISTS idx_auth_state_expires ON auth_state(expires_at);
```

### Key Differences from Current SCHEMA_SQL

1. **sessions table**: Uses `session_token` (plain text) instead of `session_token_hash`
2. **tokens table**: Original insecure structure with plain text tokens as PRIMARY KEY
3. **auth_state table**: No `code_verifier` column (added in migration 001)
4. **No authorization_codes table**: Added in migration 002
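5. **No secure token columns**: `tokens` has no `token_hash`, `last_used_at`, or `revoked_at` columns; these are added later (`sessions.last_used_at` is already part of the baseline)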

### Implementation Architecture

```python
# database.py structure
import sqlite3

INITIAL_SCHEMA_SQL = """
-- V0.1.0 baseline schema (see ADR-032)
-- [SQL content as shown above]
"""

CURRENT_SCHEMA_SQL = """
-- Current complete schema for reference
-- NOT used for database initialization
-- [Current SCHEMA_SQL content - for documentation only]
"""

def init_db(app=None):
    """Initialize database with proper migration handling"""
    # db_path and logger are assumed to be resolved from `app`
    # (that lookup is omitted from this sketch)

    # 1. Fresh database (no file or no tables): create the v0.1.0 baseline
    if not database_exists_with_tables(db_path):
        conn = sqlite3.connect(db_path)
        try:
            conn.executescript(INITIAL_SCHEMA_SQL)
            conn.commit()
            logger.info("Created initial v0.1.0 database schema")
        finally:
            conn.close()

    # 2. Fresh and existing databases alike are brought to the current
    #    version by the migrations
    run_migrations(db_path, logger)
```
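
The `database_exists_with_tables()` helper is not specified in this ADR. One possible shape, as a minimal sketch, is shown below. Taking the path as an explicit `db_path` argument and treating "has tables" as "at least one non-internal table in `sqlite_master`" are both assumptions.

```python
import os
import sqlite3

def database_exists_with_tables(db_path):
    """Return True if db_path exists and contains at least one user table."""
    if not os.path.exists(db_path):
        # Avoid sqlite3.connect() here so the check itself never creates a file
        return False
    conn = sqlite3.connect(db_path)
    try:
        # Internal tables such as sqlite_sequence are excluded from the count
        row = conn.execute(
            "SELECT COUNT(*) FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        ).fetchone()
        return row[0] > 0
    finally:
        conn.close()
```

A stricter variant could look for a specific baseline table such as `notes` instead of counting any table, which would avoid a false positive if some other component creates a table first.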

### Migration Evolution Path

Starting from `INITIAL_SCHEMA_SQL`, the database evolves through:

1. **Migration 001**: Add `code_verifier` to `auth_state` (PKCE support)
2. **Migration 002**: Secure token storage (complete tokens table rebuild)
3. **Future migrations**: Continue evolution from this baseline
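
The first step of this path can be sketched as an idempotent column addition. The function names `column_exists` and `apply_migration_001` and the `TEXT` column type are assumptions made for illustration; the real migration may be written differently.

```python
import sqlite3

def column_exists(conn, table, column):
    """Check a table's columns via PRAGMA table_info (row[1] is the column name)."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)

def apply_migration_001(conn):
    """Add auth_state.code_verifier if it is missing.

    Returns True when the column was added and False when it already existed,
    so running the migration twice is harmless.
    """
    if column_exists(conn, "auth_state", "code_verifier"):
        return False
    conn.execute("ALTER TABLE auth_state ADD COLUMN code_verifier TEXT")
    conn.commit()
    return True
```

Guarding on the column's presence lets the same migration run safely against a fresh baseline database and against one that already gained the column through the old `SCHEMA_SQL`.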

## Rationale

### Why This Specific Schema?

1. **Historical accuracy**: Represents the actual v0.1.0 release state
2. **Clean evolution**: All changes tracked through migrations
3. **Testable upgrades**: Can test upgrade path from any version
4. **No ambiguity**: Clear separation between initial and evolved state

### Why Not Alternative Approaches?

1. **Not using migration 000**: Migrations should represent changes, not initial state
2. **Not using current schema**: Would skip migration history for new databases
3. **Not detecting schema dynamically**: Too complex and fragile

## Consequences

### Positive

- **Reliable upgrades**: Any database from v0.1.0 onward can be upgraded to the current version
- **Clear history**: Migration path shows exact evolution
- **Testable**: Can verify upgrade paths in CI/CD
- **Standard pattern**: Follows Rails/Django migration patterns
- **Maintainable**: Single source of truth for initial schema

### Negative

- **Historical maintenance**: Must preserve v0.1.0 schema forever
- **Slower fresh installs**: Must run all migrations on new databases
- **Documentation burden**: Need to explain two schema constants

### Implementation Requirements

1. **Code Changes**:
   - Add `INITIAL_SCHEMA_SQL` constant to `database.py`
   - Modify `init_db()` to use new initialization logic
   - Add `database_exists_with_tables()` helper function
   - Rename current `SCHEMA_SQL` to `CURRENT_SCHEMA_SQL` (documentation only)

2. **Testing Requirements**:
   - Test fresh database initialization
   - Test upgrade from v0.1.0 schema
   - Test upgrade from each released version
   - Test that already-applied migrations are detected and not replayed
   - Verify all indexes created correctly

3. **Documentation Updates**:
   - Update database.py docstrings
   - Document schema evolution in architecture docs
   - Add upgrade guide for production systems
   - Update deployment documentation

## Migration Strategy

### For v1.1.0 Release

1. **Implement `INITIAL_SCHEMA_SQL`** as designed above
2. **Update `init_db()`** with new logic
3. **Comprehensive testing** of upgrade paths
4. **Documentation** of upgrade procedures
5. **Release notes** explaining the change

### For Existing Production Systems

After v1.1.0 deployment:

1. Existing databases will skip `INITIAL_SCHEMA_SQL` (tables exist)
2. Migrations run normally to update schema
3. No manual intervention required
4. Full backward compatibility maintained

## Testing Checklist

- [ ] Fresh database gets v0.1.0 schema then migrations
- [ ] Existing v0.1.0 database upgrades correctly
- [ ] Existing v1.0.0 database upgrades correctly
- [ ] All indexes created in correct order
- [ ] No duplicate table/index creation errors
- [ ] Migration history tracked correctly
- [ ] Performance acceptable for fresh installs
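
Two of these items, index creation and the absence of duplicate-creation errors, can be checked with a small script. The sketch below uses only the `sessions` portion of the baseline to stay short. The names `BASELINE_SUBSET`, `explicit_indexes`, and `check_baseline_is_idempotent` are hypothetical, and a real test would run against the full `INITIAL_SCHEMA_SQL`.

```python
import sqlite3

# Subset of the baseline shown in this ADR, kept short for illustration
BASELINE_SUBSET = """
CREATE TABLE IF NOT EXISTS sessions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_token TEXT UNIQUE NOT NULL,
    me TEXT NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP NOT NULL,
    last_used_at TIMESTAMP
);
CREATE INDEX IF NOT EXISTS idx_sessions_token ON sessions(session_token);
CREATE INDEX IF NOT EXISTS idx_sessions_expires ON sessions(expires_at);
"""

def explicit_indexes(conn):
    """Names of indexes made by CREATE INDEX.

    Indexes that SQLite creates automatically for UNIQUE constraints have a
    NULL sql column in sqlite_master, so they are filtered out here.
    """
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND sql IS NOT NULL"
    ).fetchall()
    return {name for (name,) in rows}

def check_baseline_is_idempotent():
    """Apply the baseline twice and return the index sets seen after each run."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(BASELINE_SUBSET)
        first = explicit_indexes(conn)
        conn.executescript(BASELINE_SUBSET)  # second run must not raise
        second = explicit_indexes(conn)
        return first, second
    finally:
        conn.close()
```

If the second run raised or the two sets differed, the baseline would not be safe to apply when initialization is interrupted and retried.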

## References

- ADR-031: Database Migration System Redesign
- Original v0.1.0 schema (commit a68fd57)
- Migration 001: Add code_verifier to auth_state
- Migration 002: Secure tokens and authorization codes
- SQLite documentation on schema management
- Rails/Django migration patterns

## Implementation Notes

- **Priority**: HIGH - Required for v1.1.0 release
- **Complexity**: Medium - Clear requirements but needs careful testing
- **Risk**: Low - Backward compatible, well-understood pattern
- **Effort**: 4-6 hours including testing