feat(tags): Add database schema and tags module (v1.3.0 Phase 1)

Implements tag/category system backend following microformats2 p-category specification.

Database changes:
- Migration 008: Add tags and note_tags tables
- Normalized tag storage (case-insensitive lookup, display name preserved)
- Indexes for performance

New module:
- starpunk/tags.py: Tag management functions
  - normalize_tag: Normalize tag strings
  - get_or_create_tag: Get or create tag records
  - add_tags_to_note: Associate tags with notes (replaces existing)
  - get_note_tags: Retrieve note tags (alphabetically ordered)
  - get_tag_by_name: Lookup tag by normalized name
  - get_notes_by_tag: Get all notes with specific tag
  - parse_tag_input: Parse comma-separated tag input

Model updates:
- Note.tags property (lazy-loaded, prefer pre-loading in routes)
- Note.to_dict() add include_tags parameter

CRUD updates:
- create_note() accepts tags parameter
- update_note() accepts tags parameter (None = no change, [] = remove all)

Micropub integration:
- Pass tags to create_note() (tags already extracted by extract_tags())
- Return tags in q=source response

Per design doc: docs/design/v1.3.0/microformats-tags-design.md

Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-10 11:24:23 -07:00
parent 927db4aea0
commit f10d0679da
188 changed files with 601 additions and 945 deletions
--- a/docs/design/v1.0.0/migration-failure-diagnosis-v1.0.0-rc.1.md
+++ b/docs/design/v1.0.0/migration-failure-diagnosis-v1.0.0-rc.1.md
@@ -0,0 +1,145 @@
+# Migration Failure Diagnosis - v1.0.0-rc.1
+
+## Executive Summary
+
+The v1.0.0-rc.1 container fails at startup because of an **ordering bug in database initialization and migration**. The error `sqlite3.OperationalError: no such column: token_hash` occurs when `SCHEMA_SQL` tries to create an index on `tokens.token_hash` in an existing database whose `tokens` table predates that column. Migration 002, which drops and recreates `tokens` with the new structure, never gets a chance to run.
+
+## Root Cause Analysis
+
+### The Execution Order Problem
+
+1. **Database Initialization** (`init_db()` in `database.py:94-127`)
+   - Line 115: `conn.executescript(SCHEMA_SQL)` - Creates initial schema
+   - Line 126: `run_migrations()` - Applies pending migrations
+
+2. **SCHEMA_SQL Definition** (`database.py:46-60`)
+   - Creates `tokens` table WITH `token_hash` column (lines 46-56)
+   - Creates indexes including `idx_tokens_hash` (line 58)
+
+3. **Migration 002** (`002_secure_tokens_and_authorization_codes.sql`)
+   - Line 17: `DROP TABLE IF EXISTS tokens;`
+   - Lines 20-30: Creates a NEW `tokens` table with the same structure as `SCHEMA_SQL` (including `token_hash`)
+   - Lines 49-51: Creates indexes again
+
+### The Critical Issue
+
+For an **existing production database** (v0.9.5):
+
+1. Database already has an OLD `tokens` table (without `token_hash` column)
+2. `init_db()` runs `SCHEMA_SQL` which includes:
+   ```sql
+   CREATE TABLE IF NOT EXISTS tokens (
+       ...
+       token_hash TEXT UNIQUE NOT NULL,
+       ...
+   );
+   CREATE INDEX IF NOT EXISTS idx_tokens_hash ON tokens(token_hash);
+   ```
+3. The `CREATE TABLE IF NOT EXISTS` is a no-op (table exists)
+4. The `CREATE INDEX` tries to create an index on the `token_hash` column
+5. **ERROR**: Column `token_hash` doesn't exist in the old table structure
+6. Container crashes before migrations can run
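+
+The failure reproduces without the container. The following is a minimal sketch using Python's standard `sqlite3` module and in-memory databases; both `tokens` column lists are assumptions (the real definitions are elided above), and only `token_hash` and `idx_tokens_hash` come from this analysis.
+
+```python
+import sqlite3
+
+# Assumed v0.9.5-era shape: plain-text token, no token_hash column.
+OLD_TOKENS_SQL = """
+CREATE TABLE tokens (
+    token TEXT UNIQUE NOT NULL,
+    me TEXT NOT NULL,
+    client_id TEXT,
+    scope TEXT
+);
+"""
+
+# Simplified stand-in for the tokens part of SCHEMA_SQL (columns assumed).
+SCHEMA_SQL = """
+CREATE TABLE IF NOT EXISTS tokens (
+    token_hash TEXT UNIQUE NOT NULL,
+    me TEXT NOT NULL,
+    client_id TEXT,
+    scope TEXT
+);
+CREATE INDEX IF NOT EXISTS idx_tokens_hash ON tokens(token_hash);
+"""
+
+
+def apply_schema(conn):
+    """Run SCHEMA_SQL; return None on success, else the SQLite error message."""
+    try:
+        conn.executescript(SCHEMA_SQL)
+        return None
+    except sqlite3.OperationalError as exc:
+        return str(exc)
+
+
+fresh = sqlite3.connect(":memory:")
+print("fresh:", apply_schema(fresh))
+
+upgraded = sqlite3.connect(":memory:")
+upgraded.executescript(OLD_TOKENS_SQL)  # simulate the existing v0.9.5 database
+print("v0.9.5:", apply_schema(upgraded))
+```
+
+On the fresh connection `apply_schema()` returns `None`; on the connection holding the old table, `CREATE TABLE IF NOT EXISTS` is skipped and the index statement should fail with the same `no such column: token_hash` error seen in production.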
+
+### Why This Wasn't Caught Earlier
+
+- **Fresh databases** work fine - SCHEMA_SQL creates the correct structure
+- **Test environments** likely started fresh or had the new schema
+- **Production** has an existing v0.9.5 database with the old `tokens` table structure
+
+## The Schema Evolution Mismatch
+
+### Original tokens table (v0.9.5)
+The old structure likely had columns like:
+- `token` (plain text - security issue)
+- `me`
+- `client_id`
+- `scope`
+- etc.
+
+### New tokens table (v1.0.0-rc.1)
+- `token_hash` (SHA-256 hash - secure)
+- Otherwise the same columns
+
+### The Problem
+SCHEMA_SQL was updated to match the POST-migration structure, but it runs BEFORE migrations. On an existing database, its index statements reference columns that only a migration can add, so initialization fails before that migration runs.
+
+## Migration System Design Flaw
+
+The current system has a fundamental ordering issue:
+
+1. **SCHEMA_SQL** should represent the INITIAL schema (v0.1.0)
+2. **Migrations** should evolve from that base
+3. **Current Reality**: SCHEMA_SQL represents the LATEST schema
+
+This works for fresh databases but fails for existing ones that need migration.
+
+## Recommended Fix
+
+### Option 1: Conditional Index Creation (Quick Fix)
+Remove the problematic indexes (such as `idx_tokens_hash`) from SCHEMA_SQL, since migration 002 creates them anyway, or create them conditionally in application code.
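+
+SQLite's DDL cannot make a statement depend on whether a column exists (`IF NOT EXISTS` only checks the index name), so conditional creation has to live in Python. A minimal sketch, where `column_exists()` and `create_index_if_column_exists()` are hypothetical helpers that `init_db()` could call in place of the index statement in SCHEMA_SQL:
+
+```python
+import sqlite3
+
+
+def column_exists(conn, table, column):
+    """Return True if `table` currently has a column named `column`."""
+    # PRAGMA table_info yields one row per column; field 1 is the column name.
+    # Table/column names are trusted constants here, never user input.
+    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
+    return any(row[1] == column for row in rows)
+
+
+def create_index_if_column_exists(conn, index, table, column):
+    """Create the index when the column exists; return False if it was skipped."""
+    if not column_exists(conn, table, column):
+        return False  # leave the index to the migration that adds the column
+    conn.execute(f"CREATE INDEX IF NOT EXISTS {index} ON {table}({column})")
+    return True
+
+
+conn = sqlite3.connect(":memory:")
+conn.execute("CREATE TABLE tokens (token TEXT, me TEXT)")  # old shape (assumed)
+print(create_index_if_column_exists(conn, "idx_tokens_hash", "tokens", "token_hash"))  # False
+```
+
+Deleting the index from SCHEMA_SQL outright needs no code at all, since migration 002 creates `idx_tokens_hash` itself.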
+
+### Option 2: Fix Execution Order (Better)
+1. Run migrations BEFORE attempting schema creation
+2. Only use SCHEMA_SQL for truly fresh databases
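+
+The deciding step in Option 2 is recognizing a fresh database. A minimal sketch, assuming "fresh" means no user tables in `sqlite_master` and that the real migration runner is passed in as a callable; this `init_db()` signature is hypothetical, not the one in `database.py`. It also assumes that a fresh database separately gets every migration recorded as already applied (not shown), otherwise a later start would re-run them against the new schema.
+
+```python
+import sqlite3
+
+
+def is_fresh_database(conn):
+    """True when the database contains no user tables yet."""
+    (count,) = conn.execute(
+        "SELECT COUNT(*) FROM sqlite_master "
+        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
+    ).fetchone()
+    return count == 0
+
+
+def init_db(conn, schema_sql, run_migrations):
+    """Migration-first initialization; returns which path was taken."""
+    if is_fresh_database(conn):
+        # Empty database: build the latest schema in one step.
+        conn.executescript(schema_sql)
+        return "schema"
+    # Existing database: only migrations may change it, and they run first.
+    run_migrations(conn)
+    return "migrations"
+
+
+calls = []
+conn = sqlite3.connect(":memory:")
+print(init_db(conn, "CREATE TABLE tokens (token_hash TEXT);", calls.append))  # schema
+print(init_db(conn, "CREATE TABLE tokens (token_hash TEXT);", calls.append))  # migrations
+```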
+
+### Option 3: Proper Schema Versioning (Best)
+1. SCHEMA_SQL should be the v0.1.0 schema
+2. All evolution happens through migrations
+3. Fresh databases run all migrations from the beginning
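+
+Under Option 3 the migration runner becomes the only way to reach the current schema. A minimal sketch, assuming migrations are kept as an ordered list of `(name, sql)` pairs and tracked in a `schema_migrations` table with a `migration_name` column; that column, the `001_initial_schema.sql` name, and both SQL bodies are hypothetical stand-ins.
+
+```python
+import sqlite3
+
+# Ordered migrations; 001 plays the role of the v0.1.0 base schema (contents assumed).
+MIGRATIONS = [
+    ("001_initial_schema.sql",
+     "CREATE TABLE tokens (token TEXT UNIQUE NOT NULL, me TEXT NOT NULL);"),
+    ("002_secure_tokens_and_authorization_codes.sql",
+     "DROP TABLE IF EXISTS tokens;"
+     "CREATE TABLE tokens (token_hash TEXT UNIQUE NOT NULL, me TEXT NOT NULL);"
+     "CREATE INDEX IF NOT EXISTS idx_tokens_hash ON tokens(token_hash);"),
+]
+
+
+def run_migrations(conn):
+    """Apply every migration not yet recorded; return the names applied this run."""
+    conn.execute(
+        "CREATE TABLE IF NOT EXISTS schema_migrations ("
+        "migration_name TEXT PRIMARY KEY, "
+        "applied_at TEXT DEFAULT CURRENT_TIMESTAMP)"
+    )
+    done = {row[0] for row in conn.execute("SELECT migration_name FROM schema_migrations")}
+    applied = []
+    for name, sql in MIGRATIONS:
+        if name in done:
+            continue
+        # Not atomic: a crash between the script and the INSERT re-runs it next start.
+        conn.executescript(sql)
+        conn.execute("INSERT INTO schema_migrations (migration_name) VALUES (?)", (name,))
+        conn.commit()
+        applied.append(name)
+    return applied
+
+
+conn = sqlite3.connect(":memory:")
+print(run_migrations(conn))  # both names: a fresh database is built from 001 upward
+print(run_migrations(conn))  # [] on the second run
+```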
+
+## Immediate Workaround
+
+For the production deployment:
+
+1. **Manual intervention before upgrade**:
+   ```sql
+   -- Connect to production database
+   -- Manually add the column before v1.0.0-rc.1 starts
+   ALTER TABLE tokens ADD COLUMN token_hash TEXT;
+   ```
+
+2. **Then deploy v1.0.0-rc.1**:
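+   - SCHEMA_SQL will succeed (the column now exists)
+   - Migration 002 will drop and recreate the table with the correct structure; existing tokens are discarded, so clients will need to re-authorize
+   - Startup will then complete normally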
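+
+Before touching production, the workaround can be rehearsed on a copy. A minimal sketch using the standard `sqlite3` module's `Connection.backup()` to snapshot the database into memory; `rehearse_workaround()` is a hypothetical helper, the database path is supplied by the operator, and `schema_sql` must be the actual SCHEMA_SQL text.
+
+```python
+import sqlite3
+
+
+def rehearse_workaround(db_path, schema_sql):
+    """Apply the workaround plus schema_sql to an in-memory copy of db_path.
+
+    Returns None if schema_sql succeeds, otherwise the SQLite error message.
+    The source file is opened read-only and never modified.
+    """
+    source = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
+    scratch = sqlite3.connect(":memory:")
+    source.backup(scratch)  # copy the whole database into memory
+    source.close()
+    try:
+        cols = [row[1] for row in scratch.execute("PRAGMA table_info(tokens)")]
+        if "token_hash" not in cols:
+            # Same statement as the manual workaround above.
+            scratch.execute("ALTER TABLE tokens ADD COLUMN token_hash TEXT")
+        scratch.executescript(schema_sql)
+        return None
+    except sqlite3.OperationalError as exc:
+        return str(exc)
+    finally:
+        scratch.close()
+```
+
+A `None` result suggests the real upgrade will get past SCHEMA_SQL; an error message points at another statement in SCHEMA_SQL that the single `ALTER TABLE` does not cover.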
+
+## Verification Steps
+
+1. Check production database structure:
+   ```sql
+   PRAGMA table_info(tokens);
+   ```
+
+2. Verify migration status:
+   ```sql
+   SELECT * FROM schema_migrations;
+   ```
+
+3. Test with a v0.9.5 database locally to reproduce
+
+## Long-term Architecture Recommendations
+
+1. **Separate Initial Schema from Current Schema**
+   - `INITIAL_SCHEMA_SQL` - The v0.1.0 starting point
+   - Migrations handle ALL evolution
+
+2. **Migration-First Initialization**
+   - Check for existing database
+   - Run migrations first if database exists
+   - Only apply SCHEMA_SQL to truly empty databases
+
+3. **Schema Version Tracking**
+   - Add a `schema_version` table
+   - Track the current schema version explicitly
+   - Make decisions based on version, not heuristics
+
+4. **Testing Strategy**
+   - Always test upgrades from previous production version
+   - Include migration testing in CI/CD pipeline
+   - Maintain database snapshots for each released version
+
+## Conclusion
+
+This is a **critical architectural issue** in the migration system that affects every existing deployment whose database predates migration 002, such as the v0.9.5 production database. The immediate fix is straightforward, but the system needs architectural changes to prevent similar issues in future releases.
+
+The core principle violated: **SCHEMA_SQL should represent the beginning, not the end state**.