ADR-020: Automatic Database Migration System

Status

Accepted

Context

StarPunk currently requires manual database migration execution before starting the application. This creates operational friction and is particularly problematic in containerized deployments where the database schema must be initialized automatically on startup.

Current State

  • Database: SQLite at data/starpunk.db
  • Initial Schema: Defined in starpunk/database.py as SCHEMA_SQL constant
  • Migrations: SQL files in migrations/ directory (e.g., 001_add_code_verifier_to_auth_state.sql)
  • Initialization: init_db() creates tables using CREATE TABLE IF NOT EXISTS
  • Problem: Schema changes require manual SQL execution, no tracking of applied migrations

Pain Points

  1. Manual Intervention Required: Deploying schema changes requires SSH access and manual SQL execution
  2. No Migration History: No way to know which migrations have been applied to a database
  3. Error-Prone: Easy to forget migrations or apply them out of order
  4. Container Unfriendly: Containers should be stateless and self-initializing
  5. Development Friction: Each developer must manually track and apply migrations
  6. Testing Complexity: Test databases require manual migration setup

Requirements

  1. Automatic Execution: Migrations run on application startup
  2. Idempotency: Safe to run multiple times, only applies pending migrations
  3. Order Preservation: Migrations applied in deterministic order
  4. Tracking: Record which migrations have been applied
  5. Safety: Clear errors, no partial application of migrations
  6. Simplicity: Minimal complexity, easy to understand and debug
  7. Container Compatible: Works in ephemeral container environments
  8. Developer Friendly: Easy to add new migrations

Decision

Implement an automatic sequential migration system that runs on application startup, using numbered SQL files and a migration tracking table.

Core Components

  1. Migration Tracking Table: schema_migrations table in SQLite
  2. Migration Files: Sequentially numbered .sql files in migrations/ directory
  3. Migration Runner: run_migrations() function in starpunk/migrations.py
  4. Integration Point: Called from init_db() in starpunk/database.py

Migration File Format

Naming Convention: {number:03d}_{description}.sql

Examples:

  • 001_add_code_verifier_to_auth_state.sql
  • 002_add_tags_table.sql
  • 003_add_note_syndication_urls.sql

Format Rules:

  • Three-digit zero-padded number (001, 002, 003, ...)
  • Underscore separator
  • Lowercase descriptive name with underscores
  • .sql extension
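
The {number:03d} convention maps directly onto Python string formatting; for illustration:

number, description = 2, "add_tags_table"
filename = f"{number:03d}_{description}.sql"
print(filename)  # 002_add_tags_table.sql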

File Content:

-- Migration: {Description}
-- Date: {YYYY-MM-DD}
-- ADR: {ADR reference if applicable}

{SQL statements}

-- Each statement should be idempotent where possible
-- Use IF NOT EXISTS for CREATE TABLE/INDEX
-- Use default values for ALTER TABLE ADD COLUMN

Migration Tracking Schema

CREATE TABLE IF NOT EXISTS schema_migrations (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    migration_name TEXT UNIQUE NOT NULL,
    applied_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX IF NOT EXISTS idx_schema_migrations_name
    ON schema_migrations(migration_name);

Fields:

  • id: Auto-increment primary key
  • migration_name: Filename of migration (e.g., 001_add_code_verifier_to_auth_state.sql)
  • applied_at: Timestamp when migration was applied
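
The tracking table doubles as an audit trail. A quick way to inspect it, as a sketch assuming the default database path:

import sqlite3

conn = sqlite3.connect("data/starpunk.db")
for name, applied_at in conn.execute(
    "SELECT migration_name, applied_at FROM schema_migrations ORDER BY id"
):
    print(f"{applied_at}  {name}")
conn.close()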

Migration Discovery and Execution

Algorithm:

  1. Initialize tracking table (if not exists)
  2. Discover migration files in migrations/ directory
  3. Sort by filename (numeric prefix ensures order)
  4. Check each migration against schema_migrations table
  5. Apply pending migrations in order
  6. Record successful migrations in tracking table
  7. Fail fast on any error with clear message

Execution Order:

  • Alphanumeric sort of filenames ensures correct order
  • 001_*.sql runs before 002_*.sql
  • New migrations added with next available number
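
Zero-padding is what makes the alphanumeric sort correct; a quick illustration of why unpadded numbers would break the order:

# Unpadded numbers sort lexicographically, not numerically
sorted(["1_a.sql", "10_b.sql", "2_c.sql"])
# ['10_b.sql', '1_a.sql', '2_c.sql']   (wrong order)

sorted(["001_a.sql", "010_b.sql", "002_c.sql"])
# ['001_a.sql', '002_c.sql', '010_b.sql']  (correct order)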

SQLite Transaction Handling

Approach: Execute each migration in a transaction

Implementation:

try:
    conn.execute("BEGIN")
    # Note: sqlite3's executescript() issues an implicit COMMIT before
    # running the script, so the BEGIN above does not protect the script
    # body (see the note below)
    conn.executescript(migration_sql)
    conn.execute(
        "INSERT INTO schema_migrations (migration_name) VALUES (?)",
        (migration_file,)
    )
    conn.commit()
except Exception as e:
    conn.rollback()
    raise MigrationError(f"Migration {migration_file} failed: {e}") from e

Note on transactions and DDL: SQLite itself supports transactional DDL, but Python's sqlite3 executescript() commits any pending transaction before running the script and executes its statements in autocommit mode. In practice:

  • The explicit BEGIN offers no protection for the script body
  • A migration that fails mid-script leaves partial state
  • Mitigation: Write idempotent migrations using IF NOT EXISTS, DEFAULT values, etc.
  • Recovery: Failed migrations must be manually fixed, then re-run (the tracking row is only inserted on success)

Integration Points

In starpunk/database.py:

def init_db(app=None):
    """
    Initialize database schema and run migrations

    Args:
        app: Flask application instance (optional, for config access)
    """
    if app:
        db_path = app.config["DATABASE_PATH"]
        logger = app.logger
    else:
        db_path = Path("./data/starpunk.db")
        logger = None

    # Ensure parent directory exists
    db_path.parent.mkdir(parents=True, exist_ok=True)

    # Create initial schema
    conn = sqlite3.connect(db_path)
    try:
        conn.executescript(SCHEMA_SQL)
        conn.commit()
        if logger:
            logger.info(f"Database initialized: {db_path}")
    finally:
        conn.close()

    # Run migrations
    from starpunk.migrations import run_migrations
    run_migrations(db_path, logger=logger)

Call Order:

  1. create_app() → init_db(app)
  2. init_db() → create base schema → run_migrations()
  3. run_migrations() → apply pending migrations
  4. Application starts serving requests
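
A minimal sketch of the integration point, assuming StarPunk's Flask application factory reads DATABASE_PATH from the environment (actual config loading may differ):

import os
from pathlib import Path

from flask import Flask
from starpunk.database import init_db

def create_app():
    app = Flask(__name__)
    # Assumption: DATABASE_PATH comes from the environment
    app.config["DATABASE_PATH"] = Path(
        os.environ.get("DATABASE_PATH", "./data/starpunk.db")
    )
    init_db(app)  # creates base schema, then applies pending migrations
    return app    # startup fails fast if any migration fails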

Error Handling

Error Types:

  1. Migration File Error: Invalid SQL syntax

    • Action: Log error with filename and line number
    • Result: Application fails to start
    • Recovery: Fix migration SQL, restart
  2. Migration Conflict: Two migrations with same number (detection sketched after this list)

    • Action: Log error listing conflicting files
    • Result: Application fails to start
    • Recovery: Renumber migrations, restart
  3. Database Lock: SQLite database locked

    • Action: Retry with exponential backoff (3 attempts; sketch below)
    • Result: Fail if still locked after retries
    • Recovery: Ensure no other processes accessing database
  4. Partial Migration: Migration fails mid-execution

    • Action: Log error with migration name and error details
    • Result: Application fails to start
    • Recovery: Fix issue manually, restart (migration will retry)
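
Error type 2 implies a duplicate-number check that the reference runner shown later does not yet include. A minimal sketch, using a hypothetical check_duplicate_numbers helper:

import re
from collections import defaultdict

from starpunk.migrations import MigrationError  # exception defined later in this ADR


def check_duplicate_numbers(migration_files, logger):
    """Fail fast if two migration files share a numeric prefix (hypothetical helper)"""
    by_number = defaultdict(list)
    for name, _path in migration_files:
        match = re.match(r"(\d+)_", name)
        if match:
            by_number[match.group(1)].append(name)
    for number, names in sorted(by_number.items()):
        if len(names) > 1:
            logger.error(f"Conflicting migration files for number {number}: {', '.join(names)}")
            raise MigrationError(f"Duplicate migration number {number}: {', '.join(names)}")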

Error Message Format:

[ERROR] Database migration failed: 002_add_tags_table.sql
Reason: near "CRATE": syntax error at line 5
Action required: Fix migration file and restart application
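
Error type 3 calls for retry with exponential backoff; a minimal sketch of a connection helper (hypothetical, not part of the runner shown later):

import sqlite3
import time

def connect_with_retry(db_path, attempts=3, base_delay=0.5):
    """Open the database, retrying with exponential backoff while it is locked"""
    for attempt in range(attempts):
        try:
            conn = sqlite3.connect(db_path, timeout=5.0)
            conn.execute("SELECT 1")  # locks surface on the first statement, not on connect
            return conn
        except sqlite3.OperationalError as e:
            if "locked" not in str(e) or attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1.0s, 2.0s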

Logging Strategy

Log Levels:

  • INFO: Migration discovery and successful application

    [INFO] Discovered 3 migration files
    [INFO] Applied migration: 001_add_code_verifier_to_auth_state.sql
    [INFO] Migrations complete: 1 applied, 3 total
    
  • DEBUG: Detailed migration execution

    [DEBUG] Migration file: 001_add_code_verifier_to_auth_state.sql
    [DEBUG] Migration status: pending
    [DEBUG] Executing migration SQL...
    [DEBUG] Migration recorded in schema_migrations
    
  • WARNING: Unusual but non-fatal conditions

    [WARNING] No migrations directory found, skipping migrations
    [WARNING] Migrations directory empty
    
  • ERROR: Migration failures

    [ERROR] Migration failed: 002_add_tags_table.sql
    [ERROR] Database error: near "CRATE": syntax error
    

Logging Output:

  • Development: Console (handled by Flask logger)
  • Production: Container logs (stdout/stderr)
  • Format: Timestamp, level, message

Developer Workflow

Adding a New Migration:

  1. Create migration file:

    # Determine next number
    ls migrations/ | tail -1
    # Output: 001_add_code_verifier_to_auth_state.sql, so the next number is 002

    touch migrations/002_add_tags_table.sql
    
  2. Write migration SQL:

    -- Migration: Add tags table
    -- Date: 2025-11-19
    -- ADR: ADR-025-tags-feature
    
    CREATE TABLE IF NOT EXISTS tags (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT UNIQUE NOT NULL,
        created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    
    CREATE INDEX IF NOT EXISTS idx_tags_name ON tags(name);
    
  3. Test migration:

    # Start application (migration runs automatically)
    flask --app app.py run
    
    # Check logs for migration success
    # [INFO] Applied migration: 002_add_tags_table.sql
    
  4. Commit migration:

    git add migrations/002_add_tags_table.sql
    git commit -m "Add tags table migration"
    

Migration Best Practices:

  1. Make migrations additive: Add columns, don't remove (mark as deprecated instead)
  2. Use defaults for new columns: ALTER TABLE ... ADD COLUMN ... DEFAULT ...
  3. Write idempotent SQL: Use IF NOT EXISTS where possible
  4. Test on copy of production database: Verify migration works with real data
  5. Keep migrations small: One logical change per migration
  6. Document purpose: Include header comment explaining change

Rationale

Why Sequential Numbers Instead of Timestamps?

Decision: Use sequential numbers (001, 002, 003)

Alternatives Considered:

  • Timestamps (20251119_143522_add_tags.sql)
  • UUIDs (a7b3c9d1-add-tags.sql)
  • Git SHAs (a7b3c9d-add-tags.sql)

Rationale:

  1. Simplicity: Easy to see order at a glance
  2. No Conflicts: Single developer unlikely to have conflicts
  3. Readability: Shorter filenames
  4. Team Compatible: Even with multiple developers, merge conflicts explicit
  5. Sortability: Lexicographic sort equals execution order

Trade-off: Two developers working on separate branches may create conflicting numbers. Resolution is simple (renumber before merge).

Why Run on Startup Instead of Manual Command?

Decision: Automatic execution on create_app()

Alternatives Considered:

  • CLI command: flask db migrate
  • Separate initialization script
  • Container entrypoint script

Rationale:

  1. Container Friendly: Containers self-initialize on startup
  2. Developer Friendly: git pull + flask run just works
  3. No Forgotten Migrations: Impossible to skip migrations
  4. Idempotent: Safe to run multiple times
  5. Fail Fast: Application won't start with incomplete schema

Trade-off: Application startup slightly slower (negligible for SQLite). Migrations must be fast (<1s each).

Why SQLite Transaction Per Migration?

Decision: Each migration executes in its own transaction

Alternatives Considered:

  • Single transaction for all migrations
  • No transaction (auto-commit)

Rationale:

  1. Isolation: Failed migration doesn't affect previously successful ones
  2. Resume: Can continue from last successful migration
  3. sqlite3 Behavior: executescript() runs statements in autocommit mode anyway
  4. Tracking: Each successful migration recorded immediately

Trade-off: Rollback protection is limited (see the transaction note above). A partial migration may leave inconsistent state requiring a manual fix.

Why No Down Migrations?

Decision: Only forward migrations, no rollback

Alternatives Considered:

  • Paired up/down migrations (Django, Rails style)
  • Snapshot-based rollback

Rationale:

  1. Simplicity: Half the code, half the complexity
  2. IndieWeb Philosophy: Own your data, fix forward
  3. SQLite Limitations: Limited ALTER TABLE support makes rollbacks difficult
  4. Production Reality: Rollbacks rarely used, risky
  5. Alternative: Restore from backup if needed

Trade-off: Cannot automatically rollback. Must fix forward or restore from backup.

Why In-Application Instead of External Tool?

Decision: Migration runner built into application

Alternatives Considered:

  • Alembic (SQLAlchemy migrations)
  • Flask-Migrate (Flask + Alembic)
  • Custom CLI tool

Rationale:

  1. No Dependencies: Alembic adds complexity and dependencies
  2. Perfect for SQLite: Simple file-based migrations sufficient
  3. Single Codebase: No separate migration tool to maintain
  4. Minimal Code: ~150 lines vs. thousands in Alembic
  5. Alignment: "Every line must justify its existence"

Trade-off: Less powerful than Alembic (no auto-generation, model diffing). For StarPunk's simple schema, this is acceptable.

Consequences

Positive

  1. Zero-Touch Deployment: Containers start with correct schema automatically
  2. Developer Productivity: No manual migration tracking or execution
  3. Safer Deployments: Migrations always applied in correct order
  4. Better Testing: Test databases automatically migrated
  5. Audit Trail: Clear history of schema changes in schema_migrations
  6. Idempotent: Safe to run migrations multiple times
  7. Simple: Easy to understand, debug, and maintain
  8. No Dependencies: Pure Python + SQLite, no external tools

Negative

  1. Startup Time: Migrations add ~50-200ms to startup (negligible)
  2. No Auto-Generation: Migrations must be written manually (acceptable for simple schema)
  3. No Rollback: Cannot automatically undo migrations (restore from backup instead)
  4. SQLite Limitations: Limited ALTER TABLE support, no full DDL transactions
  5. Sequential Conflicts: Multiple developers may create conflicting numbers (rare, easy to fix)

Neutral

  1. Migration File Management: Developers must number migrations correctly
  2. Testing Requirement: Migrations should be tested on production-like data
  3. Documentation Need: Migration best practices should be documented

Implementation Specification

File Structure

starpunk/
├── migrations.py          # NEW: Migration runner
├── database.py            # MODIFIED: Call run_migrations()
├── __init__.py            # No changes
└── config.py              # No changes

migrations/                # EXISTING DIRECTORY
├── 001_add_code_verifier_to_auth_state.sql  # EXISTING
└── 002_*.sql             # Future migrations

New File: starpunk/migrations.py

"""
Database migration runner for StarPunk

Automatically discovers and applies pending migrations on startup.
Migrations are numbered SQL files in the migrations/ directory.
"""

import sqlite3
from pathlib import Path
import logging


class MigrationError(Exception):
    """Raised when a migration fails to apply"""
    pass


def create_migrations_table(conn):
    """
    Create schema_migrations tracking table if it doesn't exist

    Args:
        conn: SQLite connection
    """
    conn.execute("""
        CREATE TABLE IF NOT EXISTS schema_migrations (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            migration_name TEXT UNIQUE NOT NULL,
            applied_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
        )
    """)

    conn.execute("""
        CREATE INDEX IF NOT EXISTS idx_schema_migrations_name
            ON schema_migrations(migration_name)
    """)

    conn.commit()


def get_applied_migrations(conn):
    """
    Get set of already-applied migration names

    Args:
        conn: SQLite connection

    Returns:
        set: Set of migration filenames that have been applied
    """
    cursor = conn.execute(
        "SELECT migration_name FROM schema_migrations ORDER BY id"
    )
    return set(row[0] for row in cursor.fetchall())


def discover_migration_files(migrations_dir):
    """
    Discover all migration files in migrations directory

    Args:
        migrations_dir: Path to migrations directory

    Returns:
        list: Sorted list of (filename, full_path) tuples
    """
    if not migrations_dir.exists():
        return []

    migration_files = []
    for file_path in migrations_dir.glob("*.sql"):
        migration_files.append((file_path.name, file_path))

    # Sort by filename (numeric prefix ensures correct order)
    migration_files.sort(key=lambda x: x[0])

    return migration_files


def apply_migration(conn, migration_name, migration_path, logger=None):
    """
    Apply a single migration file

    Args:
        conn: SQLite connection
        migration_name: Filename of migration
        migration_path: Full path to migration file
        logger: Optional logger for output

    Raises:
        MigrationError: If migration fails to apply
    """
    try:
        # Read migration SQL
        migration_sql = migration_path.read_text()

        if logger:
            logger.debug(f"Applying migration: {migration_name}")

        # Execute migration (note: executescript() commits any pending
        # transaction first, so the BEGIN offers limited protection; see
        # the SQLite transaction note above)
        conn.execute("BEGIN")
        conn.executescript(migration_sql)

        # Record migration as applied
        conn.execute(
            "INSERT INTO schema_migrations (migration_name) VALUES (?)",
            (migration_name,)
        )

        conn.commit()

        if logger:
            logger.info(f"Applied migration: {migration_name}")

    except Exception as e:
        conn.rollback()
        error_msg = f"Migration {migration_name} failed: {e}"
        if logger:
            logger.error(error_msg)
        raise MigrationError(error_msg) from e


def run_migrations(db_path, logger=None):
    """
    Run all pending database migrations

    Called automatically during database initialization.
    Discovers migration files, checks which have been applied,
    and applies any pending migrations in order.

    Args:
        db_path: Path to SQLite database file
        logger: Optional logger for output

    Raises:
        MigrationError: If any migration fails to apply
    """
    if logger is None:
        logger = logging.getLogger(__name__)

    # Determine migrations directory
    # Assumes migrations/ is in project root, sibling to starpunk/
    migrations_dir = Path(__file__).parent.parent / "migrations"

    if not migrations_dir.exists():
        logger.warning(f"Migrations directory not found: {migrations_dir}")
        return

    # Connect to database
    conn = sqlite3.connect(db_path)

    try:
        # Ensure migrations tracking table exists
        create_migrations_table(conn)

        # Get already-applied migrations
        applied = get_applied_migrations(conn)

        # Discover migration files
        migration_files = discover_migration_files(migrations_dir)

        if not migration_files:
            logger.info("No migration files found")
            return

        # Apply pending migrations
        pending_count = 0
        for migration_name, migration_path in migration_files:
            if migration_name not in applied:
                apply_migration(conn, migration_name, migration_path, logger)
                pending_count += 1

        # Summary
        total_count = len(migration_files)
        if pending_count > 0:
            logger.info(
                f"Migrations complete: {pending_count} applied, "
                f"{total_count} total"
            )
        else:
            logger.info(f"All migrations up to date ({total_count} total)")

    except MigrationError:
        # Re-raise migration errors (already logged)
        raise

    except Exception as e:
        error_msg = f"Migration system error: {e}"
        logger.error(error_msg)
        raise MigrationError(error_msg)

    finally:
        conn.close()

Modified File: starpunk/database.py

Changes:

  1. Import migration runner at top:

    from starpunk.migrations import run_migrations
    
  2. Modify init_db() to call migrations:

    def init_db(app=None):
        """
        Initialize database schema and run migrations
    
        Args:
            app: Flask application instance (optional, for config access)
        """
        if app:
            db_path = app.config["DATABASE_PATH"]
            logger = app.logger
        else:
            # Fallback to default path
            db_path = Path("./data/starpunk.db")
            logger = None
    
        # Ensure parent directory exists
        db_path.parent.mkdir(parents=True, exist_ok=True)
    
        # Create database and initial schema
        conn = sqlite3.connect(db_path)
        try:
            conn.executescript(SCHEMA_SQL)
            conn.commit()
            if logger:
                logger.info(f"Database initialized: {db_path}")
            else:
                print(f"Database initialized: {db_path}")
        finally:
            conn.close()
    
        # Run migrations
        run_migrations(db_path, logger=logger)
    

Migration Tracking Table SQL

Location: Created automatically by create_migrations_table()

CREATE TABLE IF NOT EXISTS schema_migrations (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    migration_name TEXT UNIQUE NOT NULL,
    applied_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX IF NOT EXISTS idx_schema_migrations_name
    ON schema_migrations(migration_name);

Example Migration File

File: migrations/002_add_tags_table.sql

-- Migration: Add tags table for note categorization
-- Date: 2025-11-19
-- ADR: ADR-025-tags-feature

-- Tags table
CREATE TABLE IF NOT EXISTS tags (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT UNIQUE NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX IF NOT EXISTS idx_tags_name ON tags(name);

-- Note-Tag junction table
CREATE TABLE IF NOT EXISTS note_tags (
    note_id INTEGER NOT NULL,
    tag_id INTEGER NOT NULL,
    PRIMARY KEY (note_id, tag_id),
    FOREIGN KEY (note_id) REFERENCES notes(id) ON DELETE CASCADE,
    FOREIGN KEY (tag_id) REFERENCES tags(id) ON DELETE CASCADE
);

CREATE INDEX IF NOT EXISTS idx_note_tags_note ON note_tags(note_id);
CREATE INDEX IF NOT EXISTS idx_note_tags_tag ON note_tags(tag_id);

Testing Strategy

Unit Tests

Test File: tests/test_migrations.py

Test Cases:

  1. test_create_migrations_table(): Verify table created with correct schema
  2. test_get_applied_migrations(): Verify retrieval of applied migrations
  3. test_discover_migration_files(): Verify discovery and sorting
  4. test_apply_migration_success(): Verify successful migration application
  5. test_apply_migration_failure(): Verify error handling and rollback
  6. test_run_migrations_empty(): Verify behavior with no migrations
  7. test_run_migrations_all_applied(): Verify idempotency
  8. test_run_migrations_partial(): Verify applying only pending migrations
  9. test_run_migrations_order(): Verify migrations applied in correct order
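
A minimal sketch of test case 9, assuming pytest's tmp_path and monkeypatch fixtures, and that discovery can be patched to point at a temporary migrations directory:

import sqlite3

import starpunk.migrations as m

def test_run_migrations_order(tmp_path, monkeypatch):
    migrations_dir = tmp_path / "migrations"
    migrations_dir.mkdir()
    (migrations_dir / "001_first.sql").write_text(
        "CREATE TABLE IF NOT EXISTS a (id INTEGER PRIMARY KEY);"
    )
    (migrations_dir / "002_second.sql").write_text(
        "ALTER TABLE a ADD COLUMN b TEXT DEFAULT '';"  # depends on 001
    )

    # Patch discovery to use the temporary directory (assumes the project's
    # real migrations/ directory exists so the runner's existence check passes)
    monkeypatch.setattr(
        m, "discover_migration_files",
        lambda _dir: sorted((p.name, p) for p in migrations_dir.glob("*.sql")),
    )

    m.run_migrations(tmp_path / "test.db")

    conn = sqlite3.connect(tmp_path / "test.db")
    applied = [row[0] for row in conn.execute(
        "SELECT migration_name FROM schema_migrations ORDER BY id"
    )]
    conn.close()
    assert applied == ["001_first.sql", "002_second.sql"]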

Integration Tests

Test File: tests/test_database_init.py

Test Cases:

  1. test_init_db_creates_schema_and_migrations(): Verify full initialization
  2. test_init_db_idempotent(): Verify safe to call multiple times (sketched after this list)
  3. test_migration_applied_on_startup(): Verify app startup applies migrations
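
A sketch of integration test 2 (idempotency), assuming DATABASE_PATH is the only config init_db needs:

import sqlite3

from flask import Flask
from starpunk.database import init_db

def test_init_db_idempotent(tmp_path):
    app = Flask(__name__)
    app.config["DATABASE_PATH"] = tmp_path / "starpunk.db"

    init_db(app)
    init_db(app)  # second run must be a no-op, not an error

    conn = sqlite3.connect(app.config["DATABASE_PATH"])
    # Tracking table exists and no migration was recorded twice
    rows = conn.execute(
        "SELECT migration_name, COUNT(*) FROM schema_migrations GROUP BY migration_name"
    ).fetchall()
    conn.close()
    assert all(count == 1 for _name, count in rows)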

Manual Testing

Procedure:

  1. Fresh Database:

    rm data/starpunk.db
    flask --app app.py run
    # Verify: [INFO] Applied migration: 001_add_code_verifier_to_auth_state.sql
    
  2. Existing Database:

    flask --app app.py run
    # Verify: [INFO] All migrations up to date (1 total)
    
  3. Add New Migration:

    echo "-- Test migration" > migrations/002_test.sql
    flask --app app.py run
    # Verify: [INFO] Applied migration: 002_test.sql
    
  4. Migration Failure:

    echo "INVALID SQL;" > migrations/003_fail.sql
    flask --app app.py run
    # Verify: [ERROR] Migration 003_fail.sql failed: near "INVALID": syntax error
    
  5. Container Startup:

    docker run -v $(pwd)/data:/app/data starpunk
    # Verify: Migrations applied automatically
    

Migration Management Guide

Adding a New Migration

Step-by-Step:

  1. Determine next number:

    ls migrations/ | tail -1
    # Output: 001_add_code_verifier_to_auth_state.sql
    # Next: 002
    
  2. Create migration file:

    touch migrations/002_add_tags_table.sql
    
  3. Write migration SQL:

    -- Migration: Add tags table
    -- Date: 2025-11-19
    -- ADR: ADR-025-tags-feature
    
    CREATE TABLE IF NOT EXISTS tags (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT UNIQUE NOT NULL,
        created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    
    CREATE INDEX IF NOT EXISTS idx_tags_name ON tags(name);
    
  4. Test migration locally:

    # Backup database
    cp data/starpunk.db data/starpunk.db.backup
    
    # Run application (migration auto-applies)
    flask --app app.py run
    
    # Check logs for success
    # Verify database schema
    sqlite3 data/starpunk.db ".schema tags"
    
  5. Commit migration:

    git add migrations/002_add_tags_table.sql
    git commit -m "Add tags table migration"
    

Handling Migration Conflicts

Scenario: Two developers create migration 002 on different branches

Resolution:

  1. Developer A: Created 002_add_tags_table.sql on feature/tags
  2. Developer B: Created 002_add_comments_table.sql on feature/comments
  3. Developer A merges first: 002_add_tags_table.sql is in main
  4. Developer B rebases:
    git checkout feature/comments
    git rebase main
    # Conflict: both have 002_*.sql
    
    # Renumber Developer B's migration
    git mv migrations/002_add_comments_table.sql \
            migrations/003_add_comments_table.sql
    
    git add migrations/003_add_comments_table.sql
    git rebase --continue
    

Rolling Back a Migration

Not Supported Automatically

Manual Procedure:

  1. Restore from backup:

    cp data/starpunk.db.backup data/starpunk.db
    
  2. OR Fix forward:

    -- Create new migration that undoes change
    -- migrations/004_remove_tags_table.sql
    DROP TABLE IF EXISTS tags;
    
  3. OR Manual SQL:

    sqlite3 data/starpunk.db
    sqlite> DROP TABLE tags;
    sqlite> DELETE FROM schema_migrations
            WHERE migration_name = '002_add_tags_table.sql';
    sqlite> .quit
    

Best Practices for Writing Migrations

  1. Always use IF NOT EXISTS for CREATE statements
  2. Always use DEFAULT for new NOT NULL columns
  3. Test on production data copy before deploying
  4. Keep migrations small - one logical change per file
  5. Document purpose in header comment
  6. Make migrations additive when possible
  7. Avoid data transformations in structure migrations
  8. Use descriptive names for migration files

Good Migration:

-- Migration: Add published_url column to notes
-- Date: 2025-11-19
-- ADR: ADR-026-syndication-tracking

ALTER TABLE notes
    ADD COLUMN published_url TEXT DEFAULT NULL;

CREATE INDEX IF NOT EXISTS idx_notes_published_url
    ON notes(published_url);

Bad Migration:

-- Migration: Updates
-- Date: 2025-11-19

ALTER TABLE notes ADD COLUMN url TEXT NOT NULL;  -- FAILS: no default!
DROP TABLE old_stuff;  -- Destructive!
UPDATE notes SET ...;  -- Data transformation, hard to debug

Version Impact

Change Type: Infrastructure improvement (new feature)

Semantic Versioning Analysis:

  • Adds new functionality: Automatic migrations
  • Backward compatible: Existing databases work, migrations optional
  • No breaking changes: API unchanged, behavior compatible
  • Infrastructure improvement: Developer experience enhancement

Recommended Version: MINOR increment (e.g., 0.8.0 → 0.9.0)

Rationale: Adds significant new functionality (automatic migrations) but maintains full backward compatibility.

Compliance

Project Standards

  • Minimal Code: ~150 lines for complete migration system
  • No Dependencies: Pure Python + SQLite, no external tools
  • Standards First: Follows standard migration patterns
  • Single Responsibility: Migration system does one thing well
  • Documentation as Code: Migrations self-document schema changes

Security Considerations

  • SQL Injection: Migration files are trusted code (not user input)
  • File Access: Only reads from trusted migrations/ directory
  • Database Access: Uses existing database connection patterns
  • Error Exposure: Logs sanitized error messages only

IndieWeb Compatibility

  • Data Ownership: Migration tracking stored in user's database
  • Portability: Standard SQL migrations, easily portable
  • Self-Hosting: No external services required
  • Transparency: Clear audit trail of schema changes

References

Migration System Patterns

  • Django Migrations: Inspiration for tracking table
  • Rails ActiveRecord Migrations: Inspiration for sequential numbering
  • Flyway: Inspiration for SQL-based migrations
  • Alembic: Considered but rejected (too complex for needs)

SQLite Documentation

  • ALTER TABLE: https://sqlite.org/lang_altertable.html
  • Transactions: https://sqlite.org/lang_transaction.html
  • PRAGMA statements: https://sqlite.org/pragma.html

Internal Documentation

  • ADR-004: File-based note storage (similar pattern)
  • ADR-008: Versioning strategy (migration impact)
  • docs/standards/versioning-strategy.md: Version management

Developer Questions & Architectural Responses

This section addresses critical implementation questions identified during developer review.

Q1: SCHEMA_SQL Chicken-and-Egg Problem

Question: Current SCHEMA_SQL (line 60 in database.py) already includes code_verifier TEXT NOT NULL DEFAULT '' in the auth_state table. Migration 001_add_code_verifier_to_auth_state.sql tries to add the same column. On fresh databases, this fails because the column already exists.

Decision: SCHEMA_SQL represents the complete target state (current schema after all migrations applied)

Rationale:

  • Fresh installs should get the latest schema immediately (no migration overhead)
  • Existing installs need migrations to reach the target state
  • This is the standard pattern used by Django, Rails, and other frameworks
  • Migrations are time-based snapshots, SCHEMA_SQL is the destination

Implementation:

  1. Keep code_verifier in SCHEMA_SQL - It's part of the current schema
  2. Migration 001 is for existing databases only - Databases created before PKCE feature
  3. Auto-skip migrations on fresh installs - Detect and skip migrations already in SCHEMA_SQL

Solution Pattern:

def run_migrations(db_path, logger=None):
    # ... existing code ...

    # Check whether any migrations have been recorded yet (the tracking table
    # itself was just created above, so the row count is the signal)
    cursor = conn.execute(
        "SELECT COUNT(*) FROM schema_migrations"
    )
    migration_count = cursor.fetchone()[0]

    # If fresh database (0 migrations recorded), mark all migrations as applied
    # since SCHEMA_SQL already contains all changes
    if migration_count == 0:
        for migration_name, _ in migration_files:
            conn.execute(
                "INSERT INTO schema_migrations (migration_name) VALUES (?)",
                (migration_name,)
            )
        conn.commit()
        logger.info(f"Fresh database: marked {len(migration_files)} migrations as applied")
        return

    # Otherwise, apply pending migrations normally
    # ... existing migration application code ...

Consequence: Fresh installs never run migrations (already at target state), existing installs run only pending migrations.

Q2: schema_migrations Table Location

Question: Should the schema_migrations table be in SCHEMA_SQL or only created by migrations.py?

Decision: Only in migrations.py - Do NOT add to SCHEMA_SQL

Rationale:

  1. Separation of Concerns: Migration tracking is infrastructure, not application schema
  2. Detection Mechanism: Absence of table indicates fresh database (see Q1 solution)
  3. Cleaner Schema: Application schema stays focused on application tables
  4. Migration System Ownership: Migration system creates its own tracking table

Implementation:

  • create_migrations_table() in migrations.py creates the table
  • SCHEMA_SQL remains unchanged (no schema_migrations table)
  • Fresh database detection relies on table non-existence

Q3: ALTER TABLE Idempotency

Question: SQLite doesn't support IF NOT EXISTS for ALTER TABLE ADD COLUMN. How do we make migrations idempotent?

Decision: Accept non-idempotency, rely on migration tracking

Rationale:

  1. SQL Limitation: SQLite ALTER TABLE operations are not inherently idempotent
  2. Tracking Is Sufficient: schema_migrations table prevents re-application
  3. Failure Handling: Failed migrations leave clear error messages
  4. Production Reality: Migrations rarely fail, and when they do, they need manual intervention anyway

Implementation:

For Fresh Databases (Q1 solution):

  • All migrations automatically marked as applied
  • Never actually executed (schema already complete)
  • No idempotency issue

For Existing Databases:

  • Migration tracking prevents re-running
  • If migration fails, manual intervention required:
    # Option 1: Fix the issue and re-run (migration will retry)
    sqlite3 data/starpunk.db "ALTER TABLE ..."
    # Then restart app - migration will succeed
    
    # Option 2: Mark as applied manually (if change already exists)
    sqlite3 data/starpunk.db \
      "INSERT INTO schema_migrations (migration_name) VALUES ('001_...');"
    

Helper Function (optional, if needed):

def column_exists(conn, table_name, column_name):
    """Check if column exists in table (helper for conditional migrations)"""
    cursor = conn.execute(f"PRAGMA table_info({table_name})")
    columns = [row[1] for row in cursor.fetchall()]
    return column_name in columns

Use in Runner Logic (if absolutely necessary):

# Migrations are plain SQL files, so conditional checks like this must live
# in Python (e.g., a special case in the runner), not in the .sql file itself
if not column_exists(conn, 'auth_state', 'code_verifier'):
    conn.execute("ALTER TABLE auth_state ADD COLUMN code_verifier TEXT NOT NULL DEFAULT ''")

Decision: Do NOT use helper functions by default. Only add if specific migration requires it. Prefer Q1 solution (fresh database detection).

Q4: Migration Filename Validation

Question: Should we enforce strict \d{3}_description.sql pattern or be flexible?

Decision: Flexible with strong convention

Pattern:

  • Recommended: \d{3}_lowercase_with_underscores.sql (e.g., 001_add_code_verifier.sql)
  • Required: Must be .sql file, must start with digits, must be sortable
  • Sorting: Alphanumeric sort determines execution order

Rationale:

  1. Simplicity: Glob pattern *.sql + alphanumeric sort is simplest
  2. Error Tolerance: Don't fail on filename format (warn instead)
  3. Developer Freedom: Allow variations (001.sql, 0001_desc.sql, etc.)
  4. Order Matters: Only requirement is deterministic sort order

Implementation:

def discover_migration_files(migrations_dir):
    """
    Discover all migration files in migrations directory
    Files must be .sql and sortable alphanumerically
    """
    if not migrations_dir.exists():
        return []

    migration_files = []
    for file_path in migrations_dir.glob("*.sql"):
        migration_files.append((file_path.name, file_path))

    # Sort alphanumerically (001_... before 002_...)
    migration_files.sort(key=lambda x: x[0])

    return migration_files

Validation (optional warning):

import re

RECOMMENDED_PATTERN = re.compile(r'^\d{3}_[a-z0-9_]+\.sql$')

for migration_name, _ in migration_files:
    if not RECOMMENDED_PATTERN.match(migration_name):
        logger.warning(
            f"Migration {migration_name} doesn't follow recommended pattern: "
            f"NNN_lowercase_description.sql"
        )

Decision: Implement glob + sort (required); skip strict validation, optionally warning on off-pattern names.

Q5: Existing Database Migration Path

Question: How do existing StarPunk users transition when they upgrade to the version with automatic migrations?

Decision: Automatic and transparent

Scenario Analysis:

Scenario A: Database created BEFORE code_verifier feature

  • Database exists, has auth_state table WITHOUT code_verifier column
  • User upgrades to version with automatic migrations
  • On startup: run_migrations() executes
  • Migration 001 runs: ALTER TABLE auth_state ADD COLUMN code_verifier...
  • Result: Database updated, migration tracked

Scenario B: Database created AFTER code_verifier feature but BEFORE automatic migrations

  • Database exists, has auth_state table WITH code_verifier column
  • schema_migrations table does NOT exist
  • User upgrades to version with automatic migrations
  • On startup: run_migrations() executes
  • create_migrations_table() creates tracking table
  • Problem: Migration 001 will try to add existing column and FAIL

Solution for Scenario B: Fresh database detection (Q1 solution)

# Detect if database has code_verifier but no migration tracking
# This indicates database created between PKCE feature and migration system

cursor = conn.execute("PRAGMA table_info(auth_state)")
columns = [row[1] for row in cursor.fetchall()]
has_code_verifier = 'code_verifier' in columns

cursor = conn.execute("SELECT COUNT(*) FROM schema_migrations")
migration_count = cursor.fetchone()[0]

if migration_count == 0 and has_code_verifier:
    # Database created after PKCE but before migrations
    # Mark all migrations as applied
    for migration_name, _ in migration_files:
        conn.execute(
            "INSERT INTO schema_migrations (migration_name) VALUES (?)",
            (migration_name,)
        )
    conn.commit()
    logger.info("Existing database: migrations marked as applied")
    return

Refined Solution: Check if SCHEMA_SQL is already applied

def is_schema_current(conn):
    """
    Check if database schema matches current SCHEMA_SQL
    Heuristic: Check for latest schema feature (code_verifier column)
    """
    cursor = conn.execute("PRAGMA table_info(auth_state)")
    columns = [row[1] for row in cursor.fetchall()]
    return 'code_verifier' in columns

def run_migrations(db_path, logger=None):
    # ... setup ...

    cursor = conn.execute("SELECT COUNT(*) FROM schema_migrations")
    migration_count = cursor.fetchone()[0]

    # If no migrations recorded AND schema is current, mark all as applied
    if migration_count == 0 and is_schema_current(conn):
        for migration_name, _ in migration_files:
            conn.execute(
                "INSERT INTO schema_migrations (migration_name) VALUES (?)",
                (migration_name,)
            )
        conn.commit()
        logger.info(f"Database up-to-date: marked {len(migration_files)} migrations as applied")
        return

    # Otherwise apply pending migrations...

User Impact: None. Upgrade is automatic and transparent.

Q6: Column Existence Helpers

Question: Should we provide helper functions for checking column/table existence, or keep it pure SQL?

Decision: Provide optional helper, but don't use by default

Rationale:

  • Primary solution is fresh database detection (Q1/Q5)
  • Helpers useful for edge cases only
  • Keep migration system simple by default
  • Document helpers for future use

Helpers to Provide (in migrations.py):

def table_exists(conn, table_name):
    """Check if table exists in database"""
    cursor = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' AND name=?",
        (table_name,)
    )
    return cursor.fetchone() is not None

def column_exists(conn, table_name, column_name):
    """Check if column exists in table"""
    cursor = conn.execute(f"PRAGMA table_info({table_name})")
    columns = [row[1] for row in cursor.fetchall()]
    return column_name in columns

def index_exists(conn, index_name):
    """Check if index exists in database"""
    cursor = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='index' AND name=?",
        (index_name,)
    )
    return cursor.fetchone() is not None

Documentation:

"""
Helper functions for conditional migrations (advanced usage only)

These are provided for edge cases where migrations need conditional logic.
In most cases, the migration system's fresh database detection handles
idempotency automatically.

Example usage in migration:
    from starpunk.migrations import column_exists

    if not column_exists(conn, 'notes', 'published_url'):
        conn.execute("ALTER TABLE notes ADD COLUMN published_url TEXT")
"""

Decision: Include helpers in migrations.py, document as "advanced usage", don't use in migration 001 (use fresh DB detection instead).

Q7: SCHEMA_SQL Purpose Clarification

Question: What should SCHEMA_SQL represent - initial state, current state, or minimal state?

Decision: SCHEMA_SQL is the complete current state (target schema after all migrations)

Definition:

SCHEMA_SQL = {Complete database schema as of the current version}
           = {Initial schema} + {All migrations applied}

Guidelines:

  1. When adding a new feature with schema changes:

    • Add new tables/columns to SCHEMA_SQL
    • Create migration file for existing databases
    • Example: PKCE feature added code_verifier to both SCHEMA_SQL and migration 001
  2. When creating a fresh database:

    • Execute SCHEMA_SQL → complete schema immediately
    • Mark all migrations as applied (never execute them)
  3. When upgrading existing database:

    • SCHEMA_SQL already executed (during original creation)
    • Run pending migrations to reach current state
    • Each migration is a "delta" to reach SCHEMA_SQL

Maintenance Rules:

DO:

  • Update SCHEMA_SQL when schema changes
  • Create migration for same change
  • Keep SCHEMA_SQL as single source of truth for "current state"

DON'T:

  • Remove changes from SCHEMA_SQL (only add)
  • Create migration without updating SCHEMA_SQL
  • Expect SCHEMA_SQL to be "minimal" or "initial"

Example Workflow:

Adding Tags Feature:

  1. Update SCHEMA_SQL: Add tags table and note_tags junction table
  2. Create migration: 002_add_tags_table.sql with same SQL
  3. Fresh installs: Get tags via SCHEMA_SQL, migration 002 marked as applied
  4. Existing installs: Migration 002 executes, adds tags table

Migration 002 SQL (mirrors SCHEMA_SQL):

-- Migration: Add tags table
-- Date: 2025-11-19

CREATE TABLE IF NOT EXISTS tags (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT UNIQUE NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX IF NOT EXISTS idx_tags_name ON tags(name);

CREATE TABLE IF NOT EXISTS note_tags (
    note_id INTEGER NOT NULL,
    tag_id INTEGER NOT NULL,
    PRIMARY KEY (note_id, tag_id),
    FOREIGN KEY (note_id) REFERENCES notes(id) ON DELETE CASCADE,
    FOREIGN KEY (tag_id) REFERENCES tags(id) ON DELETE CASCADE
);

CREATE INDEX IF NOT EXISTS idx_note_tags_note ON note_tags(note_id);
CREATE INDEX IF NOT EXISTS idx_note_tags_tag ON note_tags(tag_id);

SCHEMA_SQL Update (same content as migration):

SCHEMA_SQL = """
-- Notes metadata (content is in files)
CREATE TABLE IF NOT EXISTS notes (
    -- ... existing ...
);

-- ... existing tables ...

-- Tags table (added in migration 002)
CREATE TABLE IF NOT EXISTS tags (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT UNIQUE NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX IF NOT EXISTS idx_tags_name ON tags(name);

-- Note-Tag junction table (added in migration 002)
CREATE TABLE IF NOT EXISTS note_tags (
    note_id INTEGER NOT NULL,
    tag_id INTEGER NOT NULL,
    PRIMARY KEY (note_id, tag_id),
    FOREIGN KEY (note_id) REFERENCES notes(id) ON DELETE CASCADE,
    FOREIGN KEY (tag_id) REFERENCES tags(id) ON DELETE CASCADE
);

CREATE INDEX IF NOT EXISTS idx_note_tags_note ON note_tags(note_id);
CREATE INDEX IF NOT EXISTS idx_note_tags_tag ON note_tags(tag_id);
"""

Summary of Architectural Decisions

| Question | Decision | Implementation |
|----------|----------|----------------|
| Q1: Chicken-and-egg problem | SCHEMA_SQL is target state; auto-skip migrations on fresh DBs | Fresh database detection in run_migrations() |
| Q2: schema_migrations location | Only in migrations.py, NOT in SCHEMA_SQL | create_migrations_table() creates it |
| Q3: ALTER TABLE idempotency | Accept non-idempotency, rely on tracking | Migration tracking prevents re-runs |
| Q4: Filename validation | Flexible: *.sql + alphanumeric sort | No strict validation, warn if off-pattern |
| Q5: Existing database transition | Automatic via fresh DB detection | Check code_verifier existence heuristic |
| Q6: Column helpers | Provide but don't use by default | Include in migrations.py for advanced use |
| Q7: SCHEMA_SQL purpose | Complete current state (target schema) | Update SCHEMA_SQL with every schema change |

Implementation Specification Updates

Modified: starpunk/migrations.py

Add fresh database detection:

def is_schema_current(conn):
    """
    Check if database schema is current (matches SCHEMA_SQL)

    Uses heuristic: Check for presence of latest schema features
    Currently checks for code_verifier column in auth_state table

    Args:
        conn: SQLite connection

    Returns:
        bool: True if schema appears current, False if legacy
    """
    try:
        cursor = conn.execute("PRAGMA table_info(auth_state)")
        columns = [row[1] for row in cursor.fetchall()]
        return 'code_verifier' in columns
    except sqlite3.OperationalError:
        # Table doesn't exist - definitely not current
        return False


def table_exists(conn, table_name):
    """Check if table exists in database"""
    cursor = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' AND name=?",
        (table_name,)
    )
    return cursor.fetchone() is not None


def column_exists(conn, table_name, column_name):
    """Check if column exists in table"""
    try:
        cursor = conn.execute(f"PRAGMA table_info({table_name})")
        columns = [row[1] for row in cursor.fetchall()]
        return column_name in columns
    except sqlite3.OperationalError:
        return False


def index_exists(conn, index_name):
    """Check if index exists in database"""
    cursor = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='index' AND name=?",
        (index_name,)
    )
    return cursor.fetchone() is not None


def run_migrations(db_path, logger=None):
    """
    Run all pending database migrations

    Fresh Database Behavior:
    - If schema_migrations table is empty AND schema is current
    - Marks all migrations as applied (skip execution)
    - This handles databases created with current SCHEMA_SQL

    Existing Database Behavior:
    - Applies only pending migrations
    - Migrations already in schema_migrations are skipped

    Args:
        db_path: Path to SQLite database file
        logger: Optional logger for output

    Raises:
        MigrationError: If any migration fails to apply
    """
    if logger is None:
        logger = logging.getLogger(__name__)

    # Determine migrations directory
    migrations_dir = Path(__file__).parent.parent / "migrations"

    if not migrations_dir.exists():
        logger.warning(f"Migrations directory not found: {migrations_dir}")
        return

    # Connect to database
    conn = sqlite3.connect(db_path)

    try:
        # Ensure migrations tracking table exists
        create_migrations_table(conn)

        # Check if this is a fresh database with current schema
        cursor = conn.execute("SELECT COUNT(*) FROM schema_migrations")
        migration_count = cursor.fetchone()[0]

        # Discover migration files
        migration_files = discover_migration_files(migrations_dir)

        if not migration_files:
            logger.info("No migration files found")
            return

        # Fresh database detection
        if migration_count == 0:
            if is_schema_current(conn):
                # Schema is current - mark all migrations as applied
                for migration_name, _ in migration_files:
                    conn.execute(
                        "INSERT INTO schema_migrations (migration_name) VALUES (?)",
                        (migration_name,)
                    )
                conn.commit()
                logger.info(
                    f"Fresh database detected: marked {len(migration_files)} "
                    f"migrations as applied (schema already current)"
                )
                return
            else:
                logger.info("Legacy database detected: applying all migrations")

        # Get already-applied migrations
        applied = get_applied_migrations(conn)

        # Apply pending migrations
        pending_count = 0
        for migration_name, migration_path in migration_files:
            if migration_name not in applied:
                apply_migration(conn, migration_name, migration_path, logger)
                pending_count += 1

        # Summary
        total_count = len(migration_files)
        if pending_count > 0:
            logger.info(
                f"Migrations complete: {pending_count} applied, "
                f"{total_count} total"
            )
        else:
            logger.info(f"All migrations up to date ({total_count} total)")

    except MigrationError:
        raise

    except Exception as e:
        error_msg = f"Migration system error: {e}"
        logger.error(error_msg)
        raise MigrationError(error_msg)

    finally:
        conn.close()

Modified: SCHEMA_SQL Maintenance

SCHEMA_SQL does NOT change - it already includes code_verifier (correct).

Rule for Future Changes:

  1. Add new schema elements to SCHEMA_SQL
  2. Create corresponding migration file
  3. Migration contains same SQL as SCHEMA_SQL addition
  4. Fresh installs get it from SCHEMA_SQL
  5. Existing installs get it from migration

Migration 001 Status

No changes needed to 001_add_code_verifier_to_auth_state.sql

Behavior:

  • Fresh databases: Never executes (marked as applied via fresh DB detection)
  • Legacy databases (before PKCE): Executes successfully (column doesn't exist)
  • Mid-version databases (after PKCE, before migrations): Never executes (fresh DB detection)

This is correct and requires no changes.

What We Learned

  1. Simplicity Wins: 150 lines beats thousands in Alembic for our use case
  2. Container Requirements: Modern deployment requires automatic initialization
  3. SQLite Is Sufficient: No need for complex migration frameworks
  4. Sequential Works: Numbering beats timestamps for small teams
  5. Forward-Only Is OK: Rollback capability rarely needed in practice
  6. Fresh DB Detection Solves Bootstrap: Heuristic check prevents chicken-and-egg problems
  7. SCHEMA_SQL as Target State: Clearest mental model for developers
  8. Migration Tracking Is Primary Safety: SQL idempotency is secondary

Decided: 2025-11-19
Updated: 2025-11-19 (Developer Q&A section added)
Author: StarPunk Architect
Implements: Automatic database migration system
Version Impact: MINOR increment recommended