ADR-020: Automatic Database Migration System

Status

Accepted

Context

StarPunk currently requires manual database migration execution before starting the application. This creates operational friction and is particularly problematic in containerized deployments where the database schema must be initialized automatically on startup.

Current State

  • Database: SQLite at data/starpunk.db
  • Initial Schema: Defined in starpunk/database.py as SCHEMA_SQL constant
  • Migrations: SQL files in migrations/ directory (e.g., 001_add_code_verifier_to_auth_state.sql)
  • Initialization: init_db() creates tables using CREATE TABLE IF NOT EXISTS
  • Problem: Schema changes require manual SQL execution, no tracking of applied migrations

Pain Points

  1. Manual Intervention Required: Deploying schema changes requires SSH access and manual SQL execution
  2. No Migration History: No way to know which migrations have been applied to a database
  3. Error-Prone: Easy to forget migrations or apply them out of order
  4. Container Unfriendly: Containers should be stateless and self-initializing
  5. Development Friction: Each developer must manually track and apply migrations
  6. Testing Complexity: Test databases require manual migration setup

Requirements

  1. Automatic Execution: Migrations run on application startup
  2. Idempotency: Safe to run multiple times, only applies pending migrations
  3. Order Preservation: Migrations applied in deterministic order
  4. Tracking: Record which migrations have been applied
  5. Safety: Clear errors, no partial application of migrations
  6. Simplicity: Minimal complexity, easy to understand and debug
  7. Container Compatible: Works in ephemeral container environments
  8. Developer Friendly: Easy to add new migrations

Decision

Implement an automatic sequential migration system that runs on application startup, using numbered SQL files and a migration tracking table.

Core Components

  1. Migration Tracking Table: schema_migrations table in SQLite
  2. Migration Files: Sequentially numbered .sql files in migrations/ directory
  3. Migration Runner: run_migrations() function in starpunk/migrations.py
  4. Integration Point: Called from init_db() in starpunk/database.py

Migration File Format

Naming Convention: {number:03d}_{description}.sql

Examples:

  • 001_add_code_verifier_to_auth_state.sql
  • 002_add_tags_table.sql
  • 003_add_note_syndication_urls.sql

Format Rules:

  • Three-digit zero-padded number (001, 002, 003, ...)
  • Underscore separator
  • Lowercase descriptive name with underscores
  • .sql extension
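
The {number:03d} convention maps directly onto Python string formatting; for illustration:

number, description = 2, "add_tags_table"
filename = f"{number:03d}_{description}.sql"
print(filename)  # 002_add_tags_table.sql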

File Content:

-- Migration: {Description}
-- Date: {YYYY-MM-DD}
-- ADR: {ADR reference if applicable}

{SQL statements}

-- Each statement should be idempotent where possible
-- Use IF NOT EXISTS for CREATE TABLE/INDEX
-- Use default values for ALTER TABLE ADD COLUMN

Migration Tracking Schema

CREATE TABLE IF NOT EXISTS schema_migrations (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    migration_name TEXT UNIQUE NOT NULL,
    applied_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX IF NOT EXISTS idx_schema_migrations_name
    ON schema_migrations(migration_name);

Fields:

  • id: Auto-increment primary key
  • migration_name: Filename of migration (e.g., 001_add_code_verifier_to_auth_state.sql)
  • applied_at: Timestamp when migration was applied
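
The tracking table doubles as an audit trail. A quick way to inspect it, as a sketch assuming the default database path:

import sqlite3

conn = sqlite3.connect("data/starpunk.db")
for name, applied_at in conn.execute(
    "SELECT migration_name, applied_at FROM schema_migrations ORDER BY id"
):
    print(f"{applied_at}  {name}")
conn.close()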

Migration Discovery and Execution

Algorithm:

  1. Initialize tracking table (if not exists)
  2. Discover migration files in migrations/ directory
  3. Sort by filename (numeric prefix ensures order)
  4. Check each migration against schema_migrations table
  5. Apply pending migrations in order
  6. Record successful migrations in tracking table
  7. Fail fast on any error with clear message

Execution Order:

  • Alphanumeric sort of filenames ensures correct order
  • 001_*.sql runs before 002_*.sql
  • New migrations added with next available number
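
Zero-padding is what makes the alphanumeric sort correct; a quick illustration of why unpadded numbers would break the order:

# Unpadded numbers sort lexicographically, not numerically
sorted(["1_a.sql", "10_b.sql", "2_c.sql"])
# ['10_b.sql', '1_a.sql', '2_c.sql']   (wrong order)

sorted(["001_a.sql", "010_b.sql", "002_c.sql"])
# ['001_a.sql', '002_c.sql', '010_b.sql']  (correct order)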

SQLite Transaction Handling

Approach: Execute each migration in a transaction

Implementation:

try:
    conn.execute("BEGIN")
    # Note: sqlite3's executescript() issues an implicit COMMIT before
    # running the script, so the BEGIN above does not protect the script
    # body (see the note below)
    conn.executescript(migration_sql)
    conn.execute(
        "INSERT INTO schema_migrations (migration_name) VALUES (?)",
        (migration_file,)
    )
    conn.commit()
except Exception as e:
    conn.rollback()
    raise MigrationError(f"Migration {migration_file} failed: {e}") from e

Note on transactions and DDL: SQLite itself supports transactional DDL, but Python's sqlite3 executescript() commits any pending transaction before running the script and executes its statements in autocommit mode. In practice:

  • The explicit BEGIN offers no protection for the script body
  • A migration that fails mid-script leaves partial state
  • Mitigation: Write idempotent migrations using IF NOT EXISTS, DEFAULT values, etc.
  • Recovery: Failed migrations must be manually fixed, then re-run (the tracking row is only inserted on success)

Integration Points

In starpunk/database.py:

def init_db(app=None):
    """
    Initialize database schema and run migrations

    Args:
        app: Flask application instance (optional, for config access)
    """
    if app:
        db_path = app.config["DATABASE_PATH"]
        logger = app.logger
    else:
        db_path = Path("./data/starpunk.db")
        logger = None

    # Ensure parent directory exists
    db_path.parent.mkdir(parents=True, exist_ok=True)

    # Create initial schema
    conn = sqlite3.connect(db_path)
    try:
        conn.executescript(SCHEMA_SQL)
        conn.commit()
        if logger:
            logger.info(f"Database initialized: {db_path}")
    finally:
        conn.close()

    # Run migrations
    from starpunk.migrations import run_migrations
    run_migrations(db_path, logger=logger)

Call Order:

  1. create_app() → init_db(app)
  2. init_db() → create base schema → run_migrations()
  3. run_migrations() → apply pending migrations
  4. Application starts serving requests
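
A minimal sketch of the integration point, assuming StarPunk's Flask application factory reads DATABASE_PATH from the environment (actual config loading may differ):

import os
from pathlib import Path

from flask import Flask
from starpunk.database import init_db

def create_app():
    app = Flask(__name__)
    # Assumption: DATABASE_PATH comes from the environment
    app.config["DATABASE_PATH"] = Path(
        os.environ.get("DATABASE_PATH", "./data/starpunk.db")
    )
    init_db(app)  # creates base schema, then applies pending migrations
    return app    # startup fails fast if any migration fails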

Error Handling

Error Types:

  1. Migration File Error: Invalid SQL syntax

    • Action: Log error with filename and line number
    • Result: Application fails to start
    • Recovery: Fix migration SQL, restart
  2. Migration Conflict: Two migrations with same number (detection sketched after this list)

    • Action: Log error listing conflicting files
    • Result: Application fails to start
    • Recovery: Renumber migrations, restart
  3. Database Lock: SQLite database locked

    • Action: Retry with exponential backoff (3 attempts; sketch below)
    • Result: Fail if still locked after retries
    • Recovery: Ensure no other processes accessing database
  4. Partial Migration: Migration fails mid-execution

    • Action: Log error with migration name and error details
    • Result: Application fails to start
    • Recovery: Fix issue manually, restart (migration will retry)
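
Error type 2 implies a duplicate-number check that the reference runner shown later does not yet include. A minimal sketch, using a hypothetical check_duplicate_numbers helper:

import re
from collections import defaultdict

from starpunk.migrations import MigrationError  # exception defined later in this ADR


def check_duplicate_numbers(migration_files, logger):
    """Fail fast if two migration files share a numeric prefix (hypothetical helper)"""
    by_number = defaultdict(list)
    for name, _path in migration_files:
        match = re.match(r"(\d+)_", name)
        if match:
            by_number[match.group(1)].append(name)
    for number, names in sorted(by_number.items()):
        if len(names) > 1:
            logger.error(f"Conflicting migration files for number {number}: {', '.join(names)}")
            raise MigrationError(f"Duplicate migration number {number}: {', '.join(names)}")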

Error Message Format:

[ERROR] Database migration failed: 002_add_tags_table.sql
Reason: near "CRATE": syntax error at line 5
Action required: Fix migration file and restart application
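
Error type 3 calls for retry with exponential backoff; a minimal sketch of a connection helper (hypothetical, not part of the runner shown later):

import sqlite3
import time

def connect_with_retry(db_path, attempts=3, base_delay=0.5):
    """Open the database, retrying with exponential backoff while it is locked"""
    for attempt in range(attempts):
        try:
            conn = sqlite3.connect(db_path, timeout=5.0)
            conn.execute("SELECT 1")  # locks surface on the first statement, not on connect
            return conn
        except sqlite3.OperationalError as e:
            if "locked" not in str(e) or attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1.0s, 2.0s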

Logging Strategy

Log Levels:

  • INFO: Migration discovery and successful application

    [INFO] Discovered 3 migration files
    [INFO] Applied migration: 001_add_code_verifier_to_auth_state.sql
    [INFO] Migrations complete: 1 applied, 3 total
    
  • DEBUG: Detailed migration execution

    [DEBUG] Migration file: 001_add_code_verifier_to_auth_state.sql
    [DEBUG] Migration status: pending
    [DEBUG] Executing migration SQL...
    [DEBUG] Migration recorded in schema_migrations
    
  • WARNING: Unusual but non-fatal conditions

    [WARNING] No migrations directory found, skipping migrations
    [WARNING] Migrations directory empty
    
  • ERROR: Migration failures

    [ERROR] Migration failed: 002_add_tags_table.sql
    [ERROR] Database error: near "CRATE": syntax error
    

Logging Output:

  • Development: Console (handled by Flask logger)
  • Production: Container logs (stdout/stderr)
  • Format: Timestamp, level, message

Developer Workflow

Adding a New Migration:

  1. Create migration file:

    # Determine next number
    ls migrations/ | tail -1
    # Output: 001_add_code_verifier_to_auth_state.sql, so the next number is 002

    touch migrations/002_add_tags_table.sql
    
  2. Write migration SQL:

    -- Migration: Add tags table
    -- Date: 2025-11-19
    -- ADR: ADR-025-tags-feature
    
    CREATE TABLE IF NOT EXISTS tags (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT UNIQUE NOT NULL,
        created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    
    CREATE INDEX IF NOT EXISTS idx_tags_name ON tags(name);
    
  3. Test migration:

    # Start application (migration runs automatically)
    flask --app app.py run
    
    # Check logs for migration success
    # [INFO] Applied migration: 002_add_tags_table.sql
    
  4. Commit migration:

    git add migrations/002_add_tags_table.sql
    git commit -m "Add tags table migration"
    

Migration Best Practices:

  1. Make migrations additive: Add columns, don't remove (mark as deprecated instead)
  2. Use defaults for new columns: ALTER TABLE ... ADD COLUMN ... DEFAULT ...
  3. Write idempotent SQL: Use IF NOT EXISTS where possible
  4. Test on copy of production database: Verify migration works with real data
  5. Keep migrations small: One logical change per migration
  6. Document purpose: Include header comment explaining change

Rationale

Why Sequential Numbers Instead of Timestamps?

Decision: Use sequential numbers (001, 002, 003)

Alternatives Considered:

  • Timestamps (20251119_143522_add_tags.sql)
  • UUIDs (a7b3c9d1-add-tags.sql)
  • Git SHAs (a7b3c9d-add-tags.sql)

Rationale:

  1. Simplicity: Easy to see order at a glance
  2. No Conflicts: Single developer unlikely to have conflicts
  3. Readability: Shorter filenames
  4. Team Compatible: Even with multiple developers, merge conflicts explicit
  5. Sortability: Lexicographic sort equals execution order

Trade-off: Two developers working on separate branches may create conflicting numbers. Resolution is simple (renumber before merge).

Why Run on Startup Instead of Manual Command?

Decision: Automatic execution on create_app()

Alternatives Considered:

  • CLI command: flask db migrate
  • Separate initialization script
  • Container entrypoint script

Rationale:

  1. Container Friendly: Containers self-initialize on startup
  2. Developer Friendly: git pull + flask run just works
  3. No Forgotten Migrations: Impossible to skip migrations
  4. Idempotent: Safe to run multiple times
  5. Fail Fast: Application won't start with incomplete schema

Trade-off: Application startup slightly slower (negligible for SQLite). Migrations must be fast (<1s each).

Why SQLite Transaction Per Migration?

Decision: Each migration executes in its own transaction

Alternatives Considered:

  • Single transaction for all migrations
  • No transaction (auto-commit)

Rationale:

  1. Isolation: Failed migration doesn't affect previously successful ones
  2. Resume: Can continue from last successful migration
  3. sqlite3 Behavior: executescript() runs statements in autocommit mode anyway
  4. Tracking: Each successful migration recorded immediately

Trade-off: Rollback protection is limited (see the transaction note above). A partial migration may leave inconsistent state requiring a manual fix.

Why No Down Migrations?

Decision: Only forward migrations, no rollback

Alternatives Considered:

  • Paired up/down migrations (Django, Rails style)
  • Snapshot-based rollback

Rationale:

  1. Simplicity: Half the code, half the complexity
  2. IndieWeb Philosophy: Own your data, fix forward
  3. SQLite Limitations: Limited ALTER TABLE support makes rollbacks difficult
  4. Production Reality: Rollbacks rarely used, risky
  5. Alternative: Restore from backup if needed

Trade-off: Cannot automatically rollback. Must fix forward or restore from backup.

Why In-Application Instead of External Tool?

Decision: Migration runner built into application

Alternatives Considered:

  • Alembic (SQLAlchemy migrations)
  • Flask-Migrate (Flask + Alembic)
  • Custom CLI tool

Rationale:

  1. No Dependencies: Alembic adds complexity and dependencies
  2. Perfect for SQLite: Simple file-based migrations sufficient
  3. Single Codebase: No separate migration tool to maintain
  4. Minimal Code: ~150 lines vs. thousands in Alembic
  5. Alignment: "Every line must justify its existence"

Trade-off: Less powerful than Alembic (no auto-generation, model diffing). For StarPunk's simple schema, this is acceptable.

Consequences

Positive

  1. Zero-Touch Deployment: Containers start with correct schema automatically
  2. Developer Productivity: No manual migration tracking or execution
  3. Safer Deployments: Migrations always applied in correct order
  4. Better Testing: Test databases automatically migrated
  5. Audit Trail: Clear history of schema changes in schema_migrations
  6. Idempotent: Safe to run migrations multiple times
  7. Simple: Easy to understand, debug, and maintain
  8. No Dependencies: Pure Python + SQLite, no external tools

Negative

  1. Startup Time: Migrations add ~50-200ms to startup (negligible)
  2. No Auto-Generation: Migrations must be written manually (acceptable for simple schema)
  3. No Rollback: Cannot automatically undo migrations (restore from backup instead)
  4. SQLite Limitations: Limited ALTER TABLE support, no full DDL transactions
  5. Sequential Conflicts: Multiple developers may create conflicting numbers (rare, easy to fix)

Neutral

  1. Migration File Management: Developers must number migrations correctly
  2. Testing Requirement: Migrations should be tested on production-like data
  3. Documentation Need: Migration best practices should be documented

Implementation Specification

File Structure

starpunk/
├── migrations.py          # NEW: Migration runner
├── database.py            # MODIFIED: Call run_migrations()
├── __init__.py            # No changes
└── config.py              # No changes

migrations/                # EXISTING DIRECTORY
├── 001_add_code_verifier_to_auth_state.sql  # EXISTING
└── 002_*.sql             # Future migrations

New File: starpunk/migrations.py

"""
Database migration runner for StarPunk

Automatically discovers and applies pending migrations on startup.
Migrations are numbered SQL files in the migrations/ directory.
"""

import sqlite3
from pathlib import Path
import logging


class MigrationError(Exception):
    """Raised when a migration fails to apply"""
    pass


def create_migrations_table(conn):
    """
    Create schema_migrations tracking table if it doesn't exist

    Args:
        conn: SQLite connection
    """
    conn.execute("""
        CREATE TABLE IF NOT EXISTS schema_migrations (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            migration_name TEXT UNIQUE NOT NULL,
            applied_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
        )
    """)

    conn.execute("""
        CREATE INDEX IF NOT EXISTS idx_schema_migrations_name
            ON schema_migrations(migration_name)
    """)

    conn.commit()


def get_applied_migrations(conn):
    """
    Get set of already-applied migration names

    Args:
        conn: SQLite connection

    Returns:
        set: Set of migration filenames that have been applied
    """
    cursor = conn.execute(
        "SELECT migration_name FROM schema_migrations ORDER BY id"
    )
    return set(row[0] for row in cursor.fetchall())


def discover_migration_files(migrations_dir):
    """
    Discover all migration files in migrations directory

    Args:
        migrations_dir: Path to migrations directory

    Returns:
        list: Sorted list of (filename, full_path) tuples
    """
    if not migrations_dir.exists():
        return []

    migration_files = []
    for file_path in migrations_dir.glob("*.sql"):
        migration_files.append((file_path.name, file_path))

    # Sort by filename (numeric prefix ensures correct order)
    migration_files.sort(key=lambda x: x[0])

    return migration_files


def apply_migration(conn, migration_name, migration_path, logger=None):
    """
    Apply a single migration file

    Args:
        conn: SQLite connection
        migration_name: Filename of migration
        migration_path: Full path to migration file
        logger: Optional logger for output

    Raises:
        MigrationError: If migration fails to apply
    """
    try:
        # Read migration SQL
        migration_sql = migration_path.read_text()

        if logger:
            logger.debug(f"Applying migration: {migration_name}")

        # Execute migration (note: executescript() commits any pending
        # transaction first, so the BEGIN offers limited protection; see
        # the SQLite transaction note above)
        conn.execute("BEGIN")
        conn.executescript(migration_sql)

        # Record migration as applied
        conn.execute(
            "INSERT INTO schema_migrations (migration_name) VALUES (?)",
            (migration_name,)
        )

        conn.commit()

        if logger:
            logger.info(f"Applied migration: {migration_name}")

    except Exception as e:
        conn.rollback()
        error_msg = f"Migration {migration_name} failed: {e}"
        if logger:
            logger.error(error_msg)
        raise MigrationError(error_msg) from e


def run_migrations(db_path, logger=None):
    """
    Run all pending database migrations

    Called automatically during database initialization.
    Discovers migration files, checks which have been applied,
    and applies any pending migrations in order.

    Args:
        db_path: Path to SQLite database file
        logger: Optional logger for output

    Raises:
        MigrationError: If any migration fails to apply
    """
    if logger is None:
        logger = logging.getLogger(__name__)

    # Determine migrations directory
    # Assumes migrations/ is in project root, sibling to starpunk/
    migrations_dir = Path(__file__).parent.parent / "migrations"

    if not migrations_dir.exists():
        logger.warning(f"Migrations directory not found: {migrations_dir}")
        return

    # Connect to database
    conn = sqlite3.connect(db_path)

    try:
        # Ensure migrations tracking table exists
        create_migrations_table(conn)

        # Get already-applied migrations
        applied = get_applied_migrations(conn)

        # Discover migration files
        migration_files = discover_migration_files(migrations_dir)

        if not migration_files:
            logger.info("No migration files found")
            return

        # Apply pending migrations
        pending_count = 0
        for migration_name, migration_path in migration_files:
            if migration_name not in applied:
                apply_migration(conn, migration_name, migration_path, logger)
                pending_count += 1

        # Summary
        total_count = len(migration_files)
        if pending_count > 0:
            logger.info(
                f"Migrations complete: {pending_count} applied, "
                f"{total_count} total"
            )
        else:
            logger.info(f"All migrations up to date ({total_count} total)")

    except MigrationError:
        # Re-raise migration errors (already logged)
        raise

    except Exception as e:
        error_msg = f"Migration system error: {e}"
        logger.error(error_msg)
        raise MigrationError(error_msg)

    finally:
        conn.close()

Modified File: starpunk/database.py

Changes:

  1. Import migration runner at top:

    from starpunk.migrations import run_migrations
    
  2. Modify init_db() to call migrations:

    def init_db(app=None):
        """
        Initialize database schema and run migrations
    
        Args:
            app: Flask application instance (optional, for config access)
        """
        if app:
            db_path = app.config["DATABASE_PATH"]
            logger = app.logger
        else:
            # Fallback to default path
            db_path = Path("./data/starpunk.db")
            logger = None
    
        # Ensure parent directory exists
        db_path.parent.mkdir(parents=True, exist_ok=True)
    
        # Create database and initial schema
        conn = sqlite3.connect(db_path)
        try:
            conn.executescript(SCHEMA_SQL)
            conn.commit()
            if logger:
                logger.info(f"Database initialized: {db_path}")
            else:
                print(f"Database initialized: {db_path}")
        finally:
            conn.close()
    
        # Run migrations
        run_migrations(db_path, logger=logger)
    

Migration Tracking Table SQL

Location: Created automatically by create_migrations_table()

CREATE TABLE IF NOT EXISTS schema_migrations (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    migration_name TEXT UNIQUE NOT NULL,
    applied_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX IF NOT EXISTS idx_schema_migrations_name
    ON schema_migrations(migration_name);

Example Migration File

File: migrations/002_add_tags_table.sql

-- Migration: Add tags table for note categorization
-- Date: 2025-11-19
-- ADR: ADR-025-tags-feature

-- Tags table
CREATE TABLE IF NOT EXISTS tags (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT UNIQUE NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX IF NOT EXISTS idx_tags_name ON tags(name);

-- Note-Tag junction table
CREATE TABLE IF NOT EXISTS note_tags (
    note_id INTEGER NOT NULL,
    tag_id INTEGER NOT NULL,
    PRIMARY KEY (note_id, tag_id),
    FOREIGN KEY (note_id) REFERENCES notes(id) ON DELETE CASCADE,
    FOREIGN KEY (tag_id) REFERENCES tags(id) ON DELETE CASCADE
);

CREATE INDEX IF NOT EXISTS idx_note_tags_note ON note_tags(note_id);
CREATE INDEX IF NOT EXISTS idx_note_tags_tag ON note_tags(tag_id);

Testing Strategy

Unit Tests

Test File: tests/test_migrations.py

Test Cases:

  1. test_create_migrations_table(): Verify table created with correct schema
  2. test_get_applied_migrations(): Verify retrieval of applied migrations
  3. test_discover_migration_files(): Verify discovery and sorting
  4. test_apply_migration_success(): Verify successful migration application
  5. test_apply_migration_failure(): Verify error handling and rollback
  6. test_run_migrations_empty(): Verify behavior with no migrations
  7. test_run_migrations_all_applied(): Verify idempotency
  8. test_run_migrations_partial(): Verify applying only pending migrations
  9. test_run_migrations_order(): Verify migrations applied in correct order
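
A minimal sketch of test case 9, assuming pytest's tmp_path and monkeypatch fixtures, and that discovery can be patched to point at a temporary migrations directory:

import sqlite3

import starpunk.migrations as m

def test_run_migrations_order(tmp_path, monkeypatch):
    migrations_dir = tmp_path / "migrations"
    migrations_dir.mkdir()
    (migrations_dir / "001_first.sql").write_text(
        "CREATE TABLE IF NOT EXISTS a (id INTEGER PRIMARY KEY);"
    )
    (migrations_dir / "002_second.sql").write_text(
        "ALTER TABLE a ADD COLUMN b TEXT DEFAULT '';"  # depends on 001
    )

    # Patch discovery to use the temporary directory (assumes the project's
    # real migrations/ directory exists so the runner's existence check passes)
    monkeypatch.setattr(
        m, "discover_migration_files",
        lambda _dir: sorted((p.name, p) for p in migrations_dir.glob("*.sql")),
    )

    m.run_migrations(tmp_path / "test.db")

    conn = sqlite3.connect(tmp_path / "test.db")
    applied = [row[0] for row in conn.execute(
        "SELECT migration_name FROM schema_migrations ORDER BY id"
    )]
    conn.close()
    assert applied == ["001_first.sql", "002_second.sql"]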

Integration Tests

Test File: tests/test_database_init.py

Test Cases:

  1. test_init_db_creates_schema_and_migrations(): Verify full initialization
  2. test_init_db_idempotent(): Verify safe to call multiple times (sketched after this list)
  3. test_migration_applied_on_startup(): Verify app startup applies migrations
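
A sketch of integration test 2 (idempotency), assuming DATABASE_PATH is the only config init_db needs:

import sqlite3

from flask import Flask
from starpunk.database import init_db

def test_init_db_idempotent(tmp_path):
    app = Flask(__name__)
    app.config["DATABASE_PATH"] = tmp_path / "starpunk.db"

    init_db(app)
    init_db(app)  # second run must be a no-op, not an error

    conn = sqlite3.connect(app.config["DATABASE_PATH"])
    # Tracking table exists and no migration was recorded twice
    rows = conn.execute(
        "SELECT migration_name, COUNT(*) FROM schema_migrations GROUP BY migration_name"
    ).fetchall()
    conn.close()
    assert all(count == 1 for _name, count in rows)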

Manual Testing

Procedure:

  1. Fresh Database:

    rm data/starpunk.db
    flask --app app.py run
    # Verify: [INFO] Applied migration: 001_add_code_verifier_to_auth_state.sql
    
  2. Existing Database:

    flask --app app.py run
    # Verify: [INFO] All migrations up to date (1 total)
    
  3. Add New Migration:

    echo "-- Test migration" > migrations/002_test.sql
    flask --app app.py run
    # Verify: [INFO] Applied migration: 002_test.sql
    
  4. Migration Failure:

    echo "INVALID SQL;" > migrations/003_fail.sql
    flask --app app.py run
    # Verify: [ERROR] Migration 003_fail.sql failed: near "INVALID": syntax error
    
  5. Container Startup:

    docker run -v $(pwd)/data:/app/data starpunk
    # Verify: Migrations applied automatically
    

Migration Management Guide

Adding a New Migration

Step-by-Step:

  1. Determine next number:

    ls migrations/ | tail -1
    # Output: 001_add_code_verifier_to_auth_state.sql
    # Next: 002
    
  2. Create migration file:

    touch migrations/002_add_tags_table.sql
    
  3. Write migration SQL:

    -- Migration: Add tags table
    -- Date: 2025-11-19
    -- ADR: ADR-025-tags-feature
    
    CREATE TABLE IF NOT EXISTS tags (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT UNIQUE NOT NULL,
        created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    
    CREATE INDEX IF NOT EXISTS idx_tags_name ON tags(name);
    
  4. Test migration locally:

    # Backup database
    cp data/starpunk.db data/starpunk.db.backup
    
    # Run application (migration auto-applies)
    flask --app app.py run
    
    # Check logs for success
    # Verify database schema
    sqlite3 data/starpunk.db ".schema tags"
    
  5. Commit migration:

    git add migrations/002_add_tags_table.sql
    git commit -m "Add tags table migration"
    

Handling Migration Conflicts

Scenario: Two developers create migration 002 on different branches

Resolution:

  1. Developer A: Created 002_add_tags_table.sql on feature/tags
  2. Developer B: Created 002_add_comments_table.sql on feature/comments
  3. Developer A merges first: 002_add_tags_table.sql is in main
  4. Developer B rebases:
    git checkout feature/comments
    git rebase main
    # Conflict: both have 002_*.sql
    
    # Renumber Developer B's migration
    git mv migrations/002_add_comments_table.sql \
            migrations/003_add_comments_table.sql
    
    git add migrations/003_add_comments_table.sql
    git rebase --continue
    

Rolling Back a Migration

Not Supported Automatically

Manual Procedure:

  1. Restore from backup:

    cp data/starpunk.db.backup data/starpunk.db
    
  2. OR Fix forward:

    -- Create new migration that undoes change
    -- migrations/004_remove_tags_table.sql
    DROP TABLE IF EXISTS tags;
    
  3. OR Manual SQL:

    sqlite3 data/starpunk.db
    sqlite> DROP TABLE tags;
    sqlite> DELETE FROM schema_migrations
            WHERE migration_name = '002_add_tags_table.sql';
    sqlite> .quit
    

Best Practices for Writing Migrations

  1. Always use IF NOT EXISTS for CREATE statements
  2. Always use DEFAULT for new NOT NULL columns
  3. Test on production data copy before deploying
  4. Keep migrations small - one logical change per file
  5. Document purpose in header comment
  6. Make migrations additive when possible
  7. Avoid data transformations in structure migrations
  8. Use descriptive names for migration files

Good Migration:

-- Migration: Add published_url column to notes
-- Date: 2025-11-19
-- ADR: ADR-026-syndication-tracking

ALTER TABLE notes
    ADD COLUMN published_url TEXT DEFAULT NULL;

CREATE INDEX IF NOT EXISTS idx_notes_published_url
    ON notes(published_url);

Bad Migration:

-- Migration: Updates
-- Date: 2025-11-19

ALTER TABLE notes ADD COLUMN url TEXT NOT NULL;  -- FAILS: no default!
DROP TABLE old_stuff;  -- Destructive!
UPDATE notes SET ...;  -- Data transformation, hard to debug

Version Impact

Change Type: Infrastructure improvement (new feature)

Semantic Versioning Analysis:

  • Adds new functionality: Automatic migrations
  • Backward compatible: Existing databases work, migrations optional
  • No breaking changes: API unchanged, behavior compatible
  • Infrastructure improvement: Developer experience enhancement

Recommended Version: MINOR increment (e.g., 0.8.0 → 0.9.0)

Rationale: Adds significant new functionality (automatic migrations) but maintains full backward compatibility.

Compliance

Project Standards

  • Minimal Code: ~150 lines for complete migration system
  • No Dependencies: Pure Python + SQLite, no external tools
  • Standards First: Follows standard migration patterns
  • Single Responsibility: Migration system does one thing well
  • Documentation as Code: Migrations self-document schema changes

Security Considerations

  • SQL Injection: Migration files are trusted code (not user input)
  • File Access: Only reads from trusted migrations/ directory
  • Database Access: Uses existing database connection patterns
  • Error Exposure: Logs sanitized error messages only

IndieWeb Compatibility

  • Data Ownership: Migration tracking stored in user's database
  • Portability: Standard SQL migrations, easily portable
  • Self-Hosting: No external services required
  • Transparency: Clear audit trail of schema changes

References

Migration System Patterns

  • Django Migrations: Inspiration for tracking table
  • Rails ActiveRecord Migrations: Inspiration for sequential numbering
  • Flyway: Inspiration for SQL-based migrations
  • Alembic: Considered but rejected (too complex for needs)

SQLite Documentation

  • ALTER TABLE: https://sqlite.org/lang_altertable.html
  • Transactions: https://sqlite.org/lang_transaction.html
  • PRAGMA statements: https://sqlite.org/pragma.html

Internal Documentation

  • ADR-004: File-based note storage (similar pattern)
  • ADR-008: Versioning strategy (migration impact)
  • docs/standards/versioning-strategy.md: Version management

Developer Questions & Architectural Responses

This section addresses critical implementation questions identified during developer review.

Q1: SCHEMA_SQL Chicken-and-Egg Problem

Question: Current SCHEMA_SQL (line 60 in database.py) already includes code_verifier TEXT NOT NULL DEFAULT '' in the auth_state table. Migration 001_add_code_verifier_to_auth_state.sql tries to add the same column. On fresh databases, this fails because the column already exists.

Decision: SCHEMA_SQL represents the complete target state (current schema after all migrations applied)

Rationale:

  • Fresh installs should get the latest schema immediately (no migration overhead)
  • Existing installs need migrations to reach the target state
  • This is the standard pattern used by Django, Rails, and other frameworks
  • Migrations are time-based snapshots, SCHEMA_SQL is the destination

Implementation:

  1. Keep code_verifier in SCHEMA_SQL - It's part of the current schema
  2. Migration 001 is for existing databases only - Databases created before PKCE feature
  3. Auto-skip migrations on fresh installs - Detect and skip migrations already in SCHEMA_SQL

Solution Pattern:

def run_migrations(db_path, logger=None):
    # ... existing code ...

    # Check whether any migrations have been recorded yet (the tracking table
    # itself was just created above, so the row count is the signal)
    cursor = conn.execute(
        "SELECT COUNT(*) FROM schema_migrations"
    )
    migration_count = cursor.fetchone()[0]

    # If fresh database (0 migrations recorded), mark all migrations as applied
    # since SCHEMA_SQL already contains all changes
    if migration_count == 0:
        for migration_name, _ in migration_files:
            conn.execute(
                "INSERT INTO schema_migrations (migration_name) VALUES (?)",
                (migration_name,)
            )
        conn.commit()
        logger.info(f"Fresh database: marked {len(migration_files)} migrations as applied")
        return

    # Otherwise, apply pending migrations normally
    # ... existing migration application code ...

Consequence: Fresh installs never run migrations (already at target state), existing installs run only pending migrations.

Q2: schema_migrations Table Location

Question: Should the schema_migrations table be in SCHEMA_SQL or only created by migrations.py?

Decision: Only in migrations.py - Do NOT add to SCHEMA_SQL

Rationale:

  1. Separation of Concerns: Migration tracking is infrastructure, not application schema
  2. Detection Mechanism: Absence of table indicates fresh database (see Q1 solution)
  3. Cleaner Schema: Application schema stays focused on application tables
  4. Migration System Ownership: Migration system creates its own tracking table

Implementation:

  • create_migrations_table() in migrations.py creates the table
  • SCHEMA_SQL remains unchanged (no schema_migrations table)
  • Fresh database detection relies on table non-existence

Q3: ALTER TABLE Idempotency

Question: SQLite doesn't support IF NOT EXISTS for ALTER TABLE ADD COLUMN. How do we make migrations idempotent?

Decision: Accept non-idempotency, rely on migration tracking

Rationale:

  1. SQL Limitation: SQLite ALTER TABLE operations are not inherently idempotent
  2. Tracking Is Sufficient: schema_migrations table prevents re-application
  3. Failure Handling: Failed migrations leave clear error messages
  4. Production Reality: Migrations rarely fail, and when they do, they need manual intervention anyway

Implementation:

For Fresh Databases (Q1 solution):

  • All migrations automatically marked as applied
  • Never actually executed (schema already complete)
  • No idempotency issue

For Existing Databases:

  • Migration tracking prevents re-running
  • If migration fails, manual intervention required:
    # Option 1: Fix the issue and re-run (migration will retry)
    sqlite3 data/starpunk.db "ALTER TABLE ..."
    # Then restart app - migration will succeed
    
    # Option 2: Mark as applied manually (if change already exists)
    sqlite3 data/starpunk.db \
      "INSERT INTO schema_migrations (migration_name) VALUES ('001_...');"
    

Helper Function (optional, if needed):

def column_exists(conn, table_name, column_name):
    """Check if column exists in table (helper for conditional migrations)"""
    cursor = conn.execute(f"PRAGMA table_info({table_name})")
    columns = [row[1] for row in cursor.fetchall()]
    return column_name in columns

Use in Runner Logic (if absolutely necessary):

# Migrations are plain SQL files, so conditional checks like this must live
# in Python (e.g., a special case in the runner), not in the .sql file itself
if not column_exists(conn, 'auth_state', 'code_verifier'):
    conn.execute("ALTER TABLE auth_state ADD COLUMN code_verifier TEXT NOT NULL DEFAULT ''")

Decision: Do NOT use helper functions by default. Only add if specific migration requires it. Prefer Q1 solution (fresh database detection).

Q4: Migration Filename Validation

Question: Should we enforce strict \d{3}_description.sql pattern or be flexible?

Decision: Flexible with strong convention

Pattern:

  • Recommended: \d{3}_lowercase_with_underscores.sql (e.g., 001_add_code_verifier.sql)
  • Required: Must be .sql file, must start with digits, must be sortable
  • Sorting: Alphanumeric sort determines execution order

Rationale:

  1. Simplicity: Glob pattern *.sql + alphanumeric sort is simplest
  2. Error Tolerance: Don't fail on filename format (warn instead)
  3. Developer Freedom: Allow variations (001.sql, 0001_desc.sql, etc.)
  4. Order Matters: Only requirement is deterministic sort order

Implementation:

def discover_migration_files(migrations_dir):
    """
    Discover all migration files in migrations directory
    Files must be .sql and sortable alphanumerically
    """
    if not migrations_dir.exists():
        return []

    migration_files = []
    for file_path in migrations_dir.glob("*.sql"):
        migration_files.append((file_path.name, file_path))

    # Sort alphanumerically (001_... before 002_...)
    migration_files.sort(key=lambda x: x[0])

    return migration_files

Validation (optional warning):

import re

RECOMMENDED_PATTERN = re.compile(r'^\d{3}_[a-z0-9_]+\.sql$')

for migration_name, _ in migration_files:
    if not RECOMMENDED_PATTERN.match(migration_name):
        logger.warning(
            f"Migration {migration_name} doesn't follow recommended pattern: "
            f"NNN_lowercase_description.sql"
        )

Decision: Implement glob + sort (required); skip strict validation, optionally warning on off-pattern names.

Q5: Existing Database Migration Path

Question: How do existing StarPunk users transition when they upgrade to the version with automatic migrations?

Decision: Automatic and transparent

Scenario Analysis:

Scenario A: Database created BEFORE code_verifier feature

  • Database exists, has auth_state table WITHOUT code_verifier column
  • User upgrades to version with automatic migrations
  • On startup: run_migrations() executes
  • Migration 001 runs: ALTER TABLE auth_state ADD COLUMN code_verifier...
  • Result: Database updated, migration tracked

Scenario B: Database created AFTER code_verifier feature but BEFORE automatic migrations

  • Database exists, has auth_state table WITH code_verifier column
  • schema_migrations table does NOT exist
  • User upgrades to version with automatic migrations
  • On startup: run_migrations() executes
  • create_migrations_table() creates tracking table
  • Problem: Migration 001 will try to add existing column and FAIL

Solution for Scenario B: Fresh database detection (Q1 solution)

# Detect if database has code_verifier but no migration tracking
# This indicates database created between PKCE feature and migration system

cursor = conn.execute("PRAGMA table_info(auth_state)")
columns = [row[1] for row in cursor.fetchall()]
has_code_verifier = 'code_verifier' in columns

cursor = conn.execute("SELECT COUNT(*) FROM schema_migrations")
migration_count = cursor.fetchone()[0]

if migration_count == 0 and has_code_verifier:
    # Database created after PKCE but before migrations
    # Mark all migrations as applied
    for migration_name, _ in migration_files:
        conn.execute(
            "INSERT INTO schema_migrations (migration_name) VALUES (?)",
            (migration_name,)
        )
    conn.commit()
    logger.info("Existing database: migrations marked as applied")
    return

Refined Solution: Check if SCHEMA_SQL is already applied

def is_schema_current(conn):
    """
    Check if database schema matches current SCHEMA_SQL
    Heuristic: Check for latest schema feature (code_verifier column)
    """
    cursor = conn.execute("PRAGMA table_info(auth_state)")
    columns = [row[1] for row in cursor.fetchall()]
    return 'code_verifier' in columns

def run_migrations(db_path, logger=None):
    # ... setup ...

    cursor = conn.execute("SELECT COUNT(*) FROM schema_migrations")
    migration_count = cursor.fetchone()[0]

    # If no migrations recorded AND schema is current, mark all as applied
    if migration_count == 0 and is_schema_current(conn):
        for migration_name, _ in migration_files:
            conn.execute(
                "INSERT INTO schema_migrations (migration_name) VALUES (?)",
                (migration_name,)
            )
        conn.commit()
        logger.info(f"Database up-to-date: marked {len(migration_files)} migrations as applied")
        return

    # Otherwise apply pending migrations...

User Impact: None. Upgrade is automatic and transparent.

Q6: Column Existence Helpers

Question: Should we provide helper functions for checking column/table existence, or keep it pure SQL?

Decision: Provide optional helper, but don't use by default

Rationale:

  • Primary solution is fresh database detection (Q1/Q5)
  • Helpers useful for edge cases only
  • Keep migration system simple by default
  • Document helpers for future use

Helpers to Provide (in migrations.py):

def table_exists(conn, table_name):
    """Check if table exists in database"""
    cursor = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' AND name=?",
        (table_name,)
    )
    return cursor.fetchone() is not None

def column_exists(conn, table_name, column_name):
    """Check if column exists in table"""
    cursor = conn.execute(f"PRAGMA table_info({table_name})")
    columns = [row[1] for row in cursor.fetchall()]
    return column_name in columns

def index_exists(conn, index_name):
    """Check if index exists in database"""
    cursor = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='index' AND name=?",
        (index_name,)
    )
    return cursor.fetchone() is not None

Documentation:

"""
Helper functions for conditional migrations (advanced usage only)

These are provided for edge cases where migrations need conditional logic.
In most cases, the migration system's fresh database detection handles
idempotency automatically.

Example usage in migration:
    from starpunk.migrations import column_exists

    if not column_exists(conn, 'notes', 'published_url'):
        conn.execute("ALTER TABLE notes ADD COLUMN published_url TEXT")
"""

Decision: Include helpers in migrations.py, document as "advanced usage", don't use in migration 001 (use fresh DB detection instead).

Q7: SCHEMA_SQL Purpose Clarification

Question: What should SCHEMA_SQL represent - initial state, current state, or minimal state?

Decision: SCHEMA_SQL is the complete current state (target schema after all migrations)

Definition:

SCHEMA_SQL = {Complete database schema as of the current version}
           = {Initial schema} + {All migrations applied}

Guidelines:

  1. When adding a new feature with schema changes:

    • Add new tables/columns to SCHEMA_SQL
    • Create migration file for existing databases
    • Example: PKCE feature added code_verifier to both SCHEMA_SQL and migration 001
  2. When creating a fresh database:

    • Execute SCHEMA_SQL → complete schema immediately
    • Mark all migrations as applied (never execute them)
  3. When upgrading existing database:

    • SCHEMA_SQL already executed (during original creation)
    • Run pending migrations to reach current state
    • Each migration is a "delta" to reach SCHEMA_SQL

Maintenance Rules:

DO:

  • Update SCHEMA_SQL when schema changes
  • Create migration for same change
  • Keep SCHEMA_SQL as single source of truth for "current state"

DON'T:

  • Remove changes from SCHEMA_SQL (only add)
  • Create migration without updating SCHEMA_SQL
  • Expect SCHEMA_SQL to be "minimal" or "initial"

Example Workflow:

Adding Tags Feature:

  1. Update SCHEMA_SQL: Add tags table and note_tags junction table
  2. Create migration: 002_add_tags_table.sql with same SQL
  3. Fresh installs: Get tags via SCHEMA_SQL, migration 002 marked as applied
  4. Existing installs: Migration 002 executes, adds tags table

Migration 002 SQL (mirrors SCHEMA_SQL):

-- Migration: Add tags table
-- Date: 2025-11-19

CREATE TABLE IF NOT EXISTS tags (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT UNIQUE NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX IF NOT EXISTS idx_tags_name ON tags(name);

CREATE TABLE IF NOT EXISTS note_tags (
    note_id INTEGER NOT NULL,
    tag_id INTEGER NOT NULL,
    PRIMARY KEY (note_id, tag_id),
    FOREIGN KEY (note_id) REFERENCES notes(id) ON DELETE CASCADE,
    FOREIGN KEY (tag_id) REFERENCES tags(id) ON DELETE CASCADE
);

CREATE INDEX IF NOT EXISTS idx_note_tags_note ON note_tags(note_id);
CREATE INDEX IF NOT EXISTS idx_note_tags_tag ON note_tags(tag_id);

SCHEMA_SQL Update (same content as migration):

SCHEMA_SQL = """
-- Notes metadata (content is in files)
CREATE TABLE IF NOT EXISTS notes (
    -- ... existing ...
);

-- ... existing tables ...

-- Tags table (added in migration 002)
CREATE TABLE IF NOT EXISTS tags (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT UNIQUE NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX IF NOT EXISTS idx_tags_name ON tags(name);

-- Note-Tag junction table (added in migration 002)
CREATE TABLE IF NOT EXISTS note_tags (
    note_id INTEGER NOT NULL,
    tag_id INTEGER NOT NULL,
    PRIMARY KEY (note_id, tag_id),
    FOREIGN KEY (note_id) REFERENCES notes(id) ON DELETE CASCADE,
    FOREIGN KEY (tag_id) REFERENCES tags(id) ON DELETE CASCADE
);

CREATE INDEX IF NOT EXISTS idx_note_tags_note ON note_tags(note_id);
CREATE INDEX IF NOT EXISTS idx_note_tags_tag ON note_tags(tag_id);
"""

Summary of Architectural Decisions

| Question | Decision | Implementation |
|----------|----------|----------------|
| Q1: Chicken-and-egg problem | SCHEMA_SQL is target state; auto-skip migrations on fresh DBs | Fresh database detection in run_migrations() |
| Q2: schema_migrations location | Only in migrations.py, NOT in SCHEMA_SQL | create_migrations_table() creates it |
| Q3: ALTER TABLE idempotency | Accept non-idempotency, rely on tracking | Migration tracking prevents re-runs |
| Q4: Filename validation | Flexible: *.sql + alphanumeric sort | No strict validation, warn if off-pattern |
| Q5: Existing database transition | Automatic via fresh DB detection | Check code_verifier existence heuristic |
| Q6: Column helpers | Provide but don't use by default | Include in migrations.py for advanced use |
| Q7: SCHEMA_SQL purpose | Complete current state (target schema) | Update SCHEMA_SQL with every schema change |

Implementation Specification Updates

Modified: starpunk/migrations.py

Add fresh database detection:

def is_schema_current(conn):
    """
    Check if database schema is current (matches SCHEMA_SQL)

    Uses heuristic: Check for presence of latest schema features
    Currently checks for code_verifier column in auth_state table

    Args:
        conn: SQLite connection

    Returns:
        bool: True if schema appears current, False if legacy
    """
    try:
        cursor = conn.execute("PRAGMA table_info(auth_state)")
        columns = [row[1] for row in cursor.fetchall()]
        return 'code_verifier' in columns
    except sqlite3.OperationalError:
        # Table doesn't exist - definitely not current
        return False


def table_exists(conn, table_name):
    """Check if table exists in database"""
    cursor = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' AND name=?",
        (table_name,)
    )
    return cursor.fetchone() is not None


def column_exists(conn, table_name, column_name):
    """Check if column exists in table"""
    try:
        cursor = conn.execute(f"PRAGMA table_info({table_name})")
        columns = [row[1] for row in cursor.fetchall()]
        return column_name in columns
    except sqlite3.OperationalError:
        return False


def index_exists(conn, index_name):
    """Check if index exists in database"""
    cursor = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='index' AND name=?",
        (index_name,)
    )
    return cursor.fetchone() is not None


def run_migrations(db_path, logger=None):
    """
    Run all pending database migrations

    Fresh Database Behavior:
    - If schema_migrations table is empty AND schema is current
    - Marks all migrations as applied (skip execution)
    - This handles databases created with current SCHEMA_SQL

    Existing Database Behavior:
    - Applies only pending migrations
    - Migrations already in schema_migrations are skipped

    Args:
        db_path: Path to SQLite database file
        logger: Optional logger for output

    Raises:
        MigrationError: If any migration fails to apply
    """
    if logger is None:
        logger = logging.getLogger(__name__)

    # Determine migrations directory
    migrations_dir = Path(__file__).parent.parent / "migrations"

    if not migrations_dir.exists():
        logger.warning(f"Migrations directory not found: {migrations_dir}")
        return

    # Connect to database
    conn = sqlite3.connect(db_path)

    try:
        # Ensure migrations tracking table exists
        create_migrations_table(conn)

        # Check if this is a fresh database with current schema
        cursor = conn.execute("SELECT COUNT(*) FROM schema_migrations")
        migration_count = cursor.fetchone()[0]

        # Discover migration files
        migration_files = discover_migration_files(migrations_dir)

        if not migration_files:
            logger.info("No migration files found")
            return

        # Fresh database detection
        if migration_count == 0:
            if is_schema_current(conn):
                # Schema is current - mark all migrations as applied
                for migration_name, _ in migration_files:
                    conn.execute(
                        "INSERT INTO schema_migrations (migration_name) VALUES (?)",
                        (migration_name,)
                    )
                conn.commit()
                logger.info(
                    f"Fresh database detected: marked {len(migration_files)} "
                    f"migrations as applied (schema already current)"
                )
                return
            else:
                logger.info("Legacy database detected: applying all migrations")

        # Get already-applied migrations
        applied = get_applied_migrations(conn)

        # Apply pending migrations
        pending_count = 0
        for migration_name, migration_path in migration_files:
            if migration_name not in applied:
                apply_migration(conn, migration_name, migration_path, logger)
                pending_count += 1

        # Summary
        total_count = len(migration_files)
        if pending_count > 0:
            logger.info(
                f"Migrations complete: {pending_count} applied, "
                f"{total_count} total"
            )
        else:
            logger.info(f"All migrations up to date ({total_count} total)")

    except MigrationError:
        raise

    except Exception as e:
        error_msg = f"Migration system error: {e}"
        logger.error(error_msg)
        raise MigrationError(error_msg)

    finally:
        conn.close()

Modified: SCHEMA_SQL Maintenance

SCHEMA_SQL does NOT change - it already includes code_verifier (correct).

Rule for Future Changes:

  1. Add new schema elements to SCHEMA_SQL
  2. Create corresponding migration file
  3. Migration contains same SQL as SCHEMA_SQL addition
  4. Fresh installs get it from SCHEMA_SQL
  5. Existing installs get it from migration

Migration 001 Status

No changes needed to 001_add_code_verifier_to_auth_state.sql

Behavior:

  • Fresh databases: Never executes (marked as applied via fresh DB detection)
  • Legacy databases (before PKCE): Executes successfully (column doesn't exist)
  • Mid-version databases (after PKCE, before migrations): Never executes (fresh DB detection)

This is correct and requires no changes.

What We Learned

  1. Simplicity Wins: 150 lines beats thousands in Alembic for our use case
  2. Container Requirements: Modern deployment requires automatic initialization
  3. SQLite Is Sufficient: No need for complex migration frameworks
  4. Sequential Works: Numbering beats timestamps for small teams
  5. Forward-Only Is OK: Rollback capability rarely needed in practice
  6. Fresh DB Detection Solves Bootstrap: Heuristic check prevents chicken-and-egg problems
  7. SCHEMA_SQL as Target State: Clearest mental model for developers
  8. Migration Tracking Is Primary Safety: SQL idempotency is secondary

Decided: 2025-11-19
Updated: 2025-11-19 (Developer Q&A section added)
Author: StarPunk Architect
Implements: Automatic database migration system
Version Impact: MINOR increment recommended