# Migration Race Condition Fix - Quick Implementation Reference
## Implementation Checklist

### Code Changes - `/home/phil/Projects/starpunk/starpunk/migrations.py`
```python
# 1. Add imports at top
import time
import random

# 2. Replace entire run_migrations function (lines 304-462)
# See full implementation in migration-race-condition-fix-implementation.md

# Key patterns to implement:

# A. Retry loop structure
max_retries = 10
retry_count = 0
base_delay = 0.1
start_time = time.time()
max_total_time = 120  # 2 minute absolute max

while retry_count < max_retries and (time.time() - start_time) < max_total_time:
    conn = None  # NEW connection each iteration
    try:
        conn = sqlite3.connect(db_path, timeout=30.0)
        conn.execute("BEGIN IMMEDIATE")  # Lock acquisition
        # ... migration logic ...
        conn.commit()
        return  # Success
    except sqlite3.OperationalError as e:
        if "database is locked" in str(e).lower():
            retry_count += 1
            if retry_count < max_retries:
                # Exponential backoff with jitter
                delay = base_delay * (2 ** retry_count) + random.uniform(0, 0.1)
                # Graduated logging
                if retry_count <= 3:
                    logger.debug(f"Database locked by another worker, retry {retry_count}/{max_retries} in {delay:.2f}s")
                elif retry_count <= 7:
                    logger.info(f"Database locked by another worker, retry {retry_count}/{max_retries} in {delay:.2f}s")
                else:
                    logger.warning(f"Database locked by another worker, retry {retry_count}/{max_retries} in {delay:.2f}s")
                time.sleep(delay)
                continue
        else:
            raise  # Not a lock error: surface it immediately
    finally:
        if conn:
            try:
                conn.close()
            except Exception:
                pass

# B. Error handling pattern (attaches to the migration-logic try above)
except Exception as e:
    try:
        conn.rollback()
    except Exception as rollback_error:
        logger.critical(f"FATAL: Rollback failed: {rollback_error}")
        raise SystemExit(1)
    raise MigrationError(f"Migration failed: {e}")

# C. Final error message (after the retry loop exhausts)
elapsed = time.time() - start_time
raise MigrationError(
    f"Failed to acquire migration lock after {max_retries} attempts over {elapsed:.1f}s. "
    f"Possible causes:\n"
    f"1. Another process is stuck in migration (check logs)\n"
    f"2. Database file permissions issue\n"
    f"3. Disk I/O problems\n"
    f"Action: Restart container with single worker to diagnose"
)
```
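
Sanity check on the timeout budget: with `base_delay = 0.1`, the nominal delays are 0.2s, 0.4s, 0.8s, ... up to 51.2s on the 9th retry, roughly 102s total before jitter, which fits just under the 120s `max_total_time` cap. A quick sketch to verify the schedule (illustrative only, not part of `migrations.py`):

```python
# Nominal backoff schedule (ignores the random 0-0.1s jitter added per retry)
base_delay = 0.1
max_retries = 10

delays = [base_delay * (2 ** retry) for retry in range(1, max_retries)]
print([f"{d:.1f}s" for d in delays])  # ['0.2s', '0.4s', ..., '51.2s']
print(f"worst-case total sleep: {sum(delays):.1f}s (must stay under max_total_time = 120s)")
```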
### Testing Requirements

#### 1. Unit Test File: `test_migration_race_condition.py`
|
|
```python
|
|
import multiprocessing
|
|
from multiprocessing import Barrier, Process
|
|
import time
|
|
|
|
def test_concurrent_migrations():
|
|
"""Test 4 workers starting simultaneously"""
|
|
barrier = Barrier(4)
|
|
|
|
def worker(worker_id):
|
|
barrier.wait() # Synchronize start
|
|
from starpunk import create_app
|
|
app = create_app()
|
|
return True
|
|
|
|
with multiprocessing.Pool(4) as pool:
|
|
results = pool.map(worker, range(4))
|
|
|
|
assert all(results), "Some workers failed"
|
|
|
|
def test_lock_retry():
|
|
"""Test retry logic with mock"""
|
|
with patch('sqlite3.connect') as mock:
|
|
mock.side_effect = [
|
|
sqlite3.OperationalError("database is locked"),
|
|
sqlite3.OperationalError("database is locked"),
|
|
MagicMock() # Success on 3rd try
|
|
]
|
|
run_migrations(db_path)
|
|
assert mock.call_count == 3
|
|
```
#### 2. Integration Test: `test_integration.sh`

```bash
#!/bin/bash
# Test with actual gunicorn

# Clean start
rm -f test.db

# Start gunicorn with 4 workers
timeout 10 gunicorn --workers 4 --bind 127.0.0.1:8001 app:app &
PID=$!

# Wait for startup
sleep 3

# Check if running
if ! kill -0 $PID 2>/dev/null; then
    echo "FAILED: Gunicorn crashed"
    exit 1
fi

# Check health endpoint
curl -f http://127.0.0.1:8001/health || exit 1

# Cleanup
kill $PID

echo "SUCCESS: All workers started without race condition"
```
#### 3. Container Test: `test_container.sh`

```bash
#!/bin/bash
# Test in container environment

# Build
podman build -t starpunk:race-test -f Containerfile .

# Run with fresh database
podman run --rm -d --name race-test \
    -v $(pwd)/test-data:/data \
    starpunk:race-test

# Check logs for success patterns
sleep 5
podman logs race-test | grep -E "(Applied migration|already applied by another worker)"

# Cleanup
podman stop race-test
```
### Verification Patterns in Logs

#### Successful Migration (One Worker Wins)

```
Worker 0: Applying migration: 001_initial_schema.sql
Worker 1: Database locked by another worker, retry 1/10 in 0.21s
Worker 2: Database locked by another worker, retry 1/10 in 0.23s
Worker 3: Database locked by another worker, retry 1/10 in 0.19s
Worker 0: Applied migration: 001_initial_schema.sql
Worker 1: All migrations already applied by another worker
Worker 2: All migrations already applied by another worker
Worker 3: All migrations already applied by another worker
```
#### Performance Metrics to Check

- Single worker: < 100ms total
- 4 workers: < 500ms total
- 10 workers (stress): < 2000ms total

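
To spot-check these targets locally, a minimal timing harness along these lines can be used. It assumes `run_migrations(db_path)` from `starpunk/migrations.py` (per the Code Changes section) and a throwaway SQLite file; the `time_migrations` helper is illustrative, not part of the codebase.

```python
import tempfile
import time
from pathlib import Path

from starpunk.migrations import run_migrations  # module path per "Code Changes" above


def time_migrations(db_path: str) -> float:
    """Run all migrations against db_path and return elapsed seconds."""
    start = time.perf_counter()
    run_migrations(db_path)
    return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db_path = str(Path(tmp) / "timing-test.db")
        print(f"Single worker: {time_migrations(db_path) * 1000:.0f}ms (target: < 100ms)")
```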
### Rollback Plan if Issues

1. **Immediate Workaround**

   ```bash
   # Change to single worker temporarily
   gunicorn --workers 1 --bind 0.0.0.0:8000 app:app
   ```

2. **Revert Code**

   ```bash
   git revert HEAD
   ```

3. **Emergency Patch**

   ```python
   # In app.py temporarily
   import os
   if os.getenv('GUNICORN_WORKER_ID', '1') == '1':
       init_db()  # Only first worker runs migrations
   ```
### Deployment Commands
```bash
# 1. Run tests
python -m pytest test_migration_race_condition.py -v

# 2. Build container
podman build -t starpunk:v1.0.0-rc.3.1 -f Containerfile .

# 3. Tag for release
podman tag starpunk:v1.0.0-rc.3.1 git.philmade.com/starpunk:v1.0.0-rc.3.1

# 4. Push
podman push git.philmade.com/starpunk:v1.0.0-rc.3.1

# 5. Deploy
kubectl rollout restart deployment/starpunk
```
---
## Critical Points to Remember

1. **NEW CONNECTION EACH RETRY** - Don't reuse connections
2. **BEGIN IMMEDIATE** - Not EXCLUSIVE, not DEFERRED
3. **30s per attempt, 120s total max** - Two different timeouts
4. **Graduated logging** - DEBUG → INFO → WARNING based on retry count
5. **Test at multiple levels** - Unit, integration, container
6. **Fresh database state** between tests

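
On point 2: `BEGIN IMMEDIATE` takes SQLite's write lock at `BEGIN` time, so a second worker fails fast with "database is locked" (and can retry) instead of deferring lock acquisition to its first write. A standalone sketch demonstrating the behavior (uses a temporary database, independent of `migrations.py`):

```python
import sqlite3
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    db_path = str(Path(tmp) / "lock-demo.db")

    # Worker A grabs the write lock up front
    a = sqlite3.connect(db_path)
    a.execute("BEGIN IMMEDIATE")

    # Worker B times out quickly instead of blocking for the default 5s
    b = sqlite3.connect(db_path, timeout=0.1)
    try:
        b.execute("BEGIN IMMEDIATE")
    except sqlite3.OperationalError as e:
        print(e)  # "database is locked" -> this is what triggers the retry loop
    finally:
        b.close()

    a.rollback()
    a.close()
```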
## Support

If issues arise, check:

1. `/home/phil/Projects/starpunk/docs/architecture/migration-race-condition-answers.md` - Full Q&A
2. `/home/phil/Projects/starpunk/docs/reports/migration-race-condition-fix-implementation.md` - Detailed implementation
3. SQLite lock states: `PRAGMA lock_status` during issue
---
*Quick Reference v1.0 - 2025-11-24*