# ADR-004: File-Based Note Storage Architecture

## Status
Accepted

## Context
The user explicitly requires notes to be stored as files on disk rather than as database records. This is critical for:

1. Data portability - notes can be backed up, moved, and read without the application
2. User ownership - direct access to content in human-readable format
3. Simplicity - text files are the simplest storage mechanism
4. Future-proofing - markdown files will be readable forever

However, we also need SQLite for:
- Metadata (timestamps, slugs, published status)
- Authentication tokens
- Fast querying and indexing
- Relational data

The challenge is designing how file-based storage and database metadata work together efficiently.

## Decision

### Hybrid Architecture: Files + Database Metadata

- **Notes Content**: Stored as markdown files on disk
- **Notes Metadata**: Stored in SQLite database
- **Source of Truth**: Files are authoritative for content; database is authoritative for metadata

### File Storage Strategy

#### Directory Structure
```
data/
├── notes/
│   ├── 2024/
│   │   ├── 11/
│   │   │   ├── my-first-note.md
│   │   │   └── another-note.md
│   │   └── 12/
│   │       └── december-note.md
│   └── 2025/
│       └── 01/
│           └── new-year-note.md
├── starpunk.db          # SQLite database
└── .backups/            # Optional backup directory
```

#### File Naming Convention
- **Format**: `{slug}.md`
- **Slug rules**: lowercase, alphanumeric, hyphens only, no spaces
- **Example**: `my-first-note.md`
- **Uniqueness**: Enforced globally by the `UNIQUE` constraint on `slug` in the database; the filesystem only prevents duplicate names within a single `YYYY/MM/` directory

#### File Organization
- **Pattern**: Year/Month subdirectories (`YYYY/MM/`)
- **Rationale**:
  - Keeps directories manageable (roughly 30 files per month at one note per day)
  - Easy chronological browsing
  - Matches natural mental model
  - Scalable to thousands of notes
- **Example path**: `data/notes/2024/11/my-first-note.md`

### Database Schema

```sql
CREATE TABLE notes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    slug TEXT UNIQUE NOT NULL,          -- URL identifier
    file_path TEXT UNIQUE NOT NULL,     -- Relative path from data/notes/
    published BOOLEAN DEFAULT 0,        -- Publication status
    created_at TIMESTAMP NOT NULL,      -- Creation timestamp
    updated_at TIMESTAMP NOT NULL,      -- Last modification timestamp
    content_hash TEXT                   -- SHA-256 of file content for change detection
);

CREATE INDEX idx_notes_created_at ON notes(created_at DESC);
CREATE INDEX idx_notes_published ON notes(published);
CREATE INDEX idx_notes_slug ON notes(slug);
```

### File Format

#### Markdown File Structure
```markdown
[Content of the note in markdown format]
```

**That's it.** No frontmatter, no metadata in file. Keep it pure.

**Rationale**:
- Maximum portability
- Readable by any markdown editor
- No custom parsing required
- Metadata belongs in database (timestamps, slugs, etc.)
- User sees just their content when opening file

#### Optional Future Enhancement (V2+)
If frontmatter becomes necessary, use standard YAML:

```markdown
---
title: Optional Title
tags: [tag1, tag2]
---

[Content here]
```

But for V1: **NO frontmatter**.

## Rationale

### File Storage Benefits

**Simplicity Score: 10/10**
- Text files are the simplest storage
- No binary formats
- Human-readable
- Easy to backup (rsync, git, Dropbox, etc.)

**Portability Score: 10/10**
- Standard markdown format
- Readable without application
- Can be edited in any text editor
- Easy to migrate to other systems

**Ownership Score: 10/10**
- User has direct access to their content
- No vendor lock-in
- Can grep their own notes
- Backup is simple file copy

### Hybrid Approach Benefits

- **Performance**: Database indexes enable fast queries
- **Flexibility**: Rich metadata without cluttering files
- **Integrity**: Database enforces uniqueness and relationships
- **Simplicity**: Each system does what it's best at

## Consequences

### Positive
- Notes are portable markdown files
- User can edit notes directly in filesystem if desired
- Easy backup (just copy data/ directory)
- Database provides fast metadata queries
- Can rebuild database from files if needed (content, slugs, and paths; see Restore Strategy for what is lost)
- Git-friendly (can version control notes)
- Maximum data ownership

### Negative
- Must keep file and database in sync
- Potential for orphaned database records
- Potential for orphaned files
- File operations are slower than database queries
- Must handle file system errors

### Mitigation Strategies

#### Sync Strategy
1. **On note creation**: Write file FIRST, then database record
2. **On note update**: Update file FIRST, then database record (update timestamp, content_hash)
3. **On note delete**: Mark as deleted in database, optionally move file to .trash/
4. **On startup**: Optional integrity check to detect orphans

#### Orphan Detection
```python
import logging
import sqlite3
from pathlib import Path

NOTES_ROOT = Path("data/notes")

# Integrity check; conn is an open sqlite3 connection to starpunk.db
def check_integrity(conn: sqlite3.Connection) -> None:
    known = {path: slug for slug, path in conn.execute("SELECT slug, file_path FROM notes")}
    # Paths on disk, relative to data/notes/ to match the file_path column
    on_disk = {p.relative_to(NOTES_ROOT).as_posix() for p in NOTES_ROOT.rglob("*.md")}
    # Find database records without files
    for path in sorted(known.keys() - on_disk):
        logging.error("Orphaned database record: %s", known[path])
    # Find files without database records
    for path in sorted(on_disk - known.keys()):
        logging.error("Orphaned file: %s", path)
```

#### Content Hash Strategy
- Calculate SHA-256 hash of file content on write
- Store hash in database
- On read, recompute the hash and compare it with the stored value to detect external modification
- Enables change detection and cache invalidation

## Data Flow Patterns

### Creating a Note

1. Generate slug from content or timestamp
2. Determine file path: `data/notes/{YYYY}/{MM}/{slug}.md`
3. Create directories if needed
4. Write markdown content to file
5. Calculate content hash
6. Insert record into database
7. Return success

**Transaction Safety**: If database insert fails, delete file and raise error

### Reading a Note

**By Slug**:
1. Query database for file_path by slug
2. Read file content from disk
3. Return content + metadata

**For List**:
1. Query database for metadata (sorted, filtered)
2. Optionally read file content for each note
3. Return list with metadata and content

### Updating a Note

1. Query database for existing file_path
2. Write new content to file (atomic write to temp, then rename)
3. Calculate new content hash
4. Update database record (timestamp, content_hash)
5. Return success

**Transaction Safety**: Keep backup of original file until database update succeeds

### Deleting a Note

**Soft Delete (Recommended)**:
1. Update database: set `deleted_at` timestamp (this column is not part of the V1 `notes` schema in this ADR and would need to be added)
2. Optionally move file to `.trash/` subdirectory
3. Return success

**Hard Delete**:
1. Delete database record
2. Delete file from filesystem
3. Return success

## File System Operations

### Atomic Writes
```python
import os

# Atomic file write: write a temp file next to the target, then rename over it
def write_note_safely(path, content):
    temp_path = f"{path}.tmp"
    with open(temp_path, "w", encoding="utf-8") as fh:
        fh.write(content)
        fh.flush()
        os.fsync(fh.fileno())  # Push data to disk before the rename
    os.replace(temp_path, path)  # Atomic on POSIX systems (same filesystem)
```

### Directory Creation
```python
import os

# Ensure directory exists before writing (year and month are ints)
def ensure_note_directory(year, month):
    path = f"data/notes/{year:04d}/{month:02d}"  # Zero-padded to match YYYY/MM
    os.makedirs(path, exist_ok=True)
    return path
```

### Slug Generation
```python
import re
import secrets
from datetime import datetime

# Generate URL-safe slug; slug_exists is a caller-supplied lookup (e.g. a database query)
def generate_slug(content=None, timestamp=None, slug_exists=lambda s: False):
    slug = ""
    if content:
        # Take the first five words, lowercase, collapse other characters to hyphens
        words = " ".join(content.split()[:5]).lower()
        slug = re.sub(r"[^a-z0-9]+", "-", words).strip("-")
    if not slug:
        # Fallback: timestamp-based (also used when content has no usable characters)
        slug = (timestamp or datetime.now()).strftime("%Y%m%d-%H%M%S")
    # Ensure uniqueness: retry until the suffixed slug is free
    candidate = slug
    while slug_exists(candidate):
        candidate = f"{slug}-{secrets.token_hex(2)}"
    return candidate
```

## Backup Strategy

### Simple Backup
```bash
# User can backup with simple copy
cp -r data/ backup/

# Or with rsync
rsync -av data/ backup/

# Or with git (after a one-time `git init` inside data/)
cd data/ && git add . && git commit -m "Backup"
```

### Restore Strategy
1. Copy data/ directory to new location
2. Application reads database
3. If database missing or corrupt, rebuild from files:

```python
import hashlib
import sqlite3
from datetime import datetime, timezone
from pathlib import Path

NOTES_ROOT = Path("data/notes")

def rebuild_database_from_files(conn: sqlite3.Connection) -> None:
    for path in sorted(NOTES_ROOT.rglob("*.md")):
        data = path.read_bytes()
        # File mtime stands in for both timestamps; the original created_at
        # and the published flag cannot be recovered from the file alone
        modified = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc).isoformat()
        conn.execute(
            "INSERT INTO notes (slug, file_path, created_at, updated_at, content_hash)"
            " VALUES (?, ?, ?, ?, ?)",
            (path.stem, path.relative_to(NOTES_ROOT).as_posix(), modified, modified,
             hashlib.sha256(data).hexdigest()),
        )
    conn.commit()
```

A rebuild recovers slugs, paths, and content hashes. Publication status and exact creation times live only in the database, so rebuilt notes default to unpublished with approximate timestamps.

## Standards Compliance

### Markdown Standard
- CommonMark specification
- No custom extensions in V1
- Standard markdown processors can read files

### File System Compatibility
- ASCII-safe filenames
- No special characters in paths
- Maximum path length under 255 characters
- POSIX-compatible directory structure

## Alternatives Considered

### All-Database Storage (Rejected)
- **Simplicity**: 8/10 - Simpler code, single source of truth
- **Portability**: 2/10 - Requires database export
- **Ownership**: 3/10 - User doesn't have direct access
- **Verdict**: Violates user requirement for file-based storage

### Flat File Directory (Rejected)
```
data/notes/
├── note-1.md
├── note-2.md
├── note-3.md
...
└── note-9999.md
```
- **Simplicity**: 10/10 - Simplest possible structure
- **Scalability**: 3/10 - Thousands of files in one directory are slow to list and hard to browse
- **Verdict**: Not scalable, poor performance with many notes

### Git-Based Storage (Rejected for V1)
- **Simplicity**: 6/10 - Requires git integration
- **Portability**: 9/10 - Excellent versioning
- **Performance**: 7/10 - Git operations have overhead
- **Verdict**: Interesting for V2, but adds complexity to V1

### Frontmatter in Files (Rejected for V1)
```markdown
---
slug: my-note
created: 2024-11-18
published: true
---

Note content here
```
- **Simplicity**: 7/10 - Requires YAML parsing
- **Portability**: 8/10 - Common pattern, but not pure markdown
- **Single Source**: 10/10 - All data in one place
- **Verdict**: Deferred to V2; V1 keeps files pure

### JSON Metadata Sidecar (Rejected)
```
notes/
├── my-note.md
└── my-note.json   # Metadata
```
- **Simplicity**: 6/10 - Doubles number of files
- **Portability**: 7/10 - Markdown still clean, but extra files
- **Sync Issues**: 5/10 - Must keep two files in sync
- **Verdict**: Database metadata is cleaner

## Implementation Checklist

- [ ] Create data/notes directory structure on initialization
- [ ] Implement slug generation algorithm
- [ ] Implement atomic file write operations
- [ ] Implement content hash calculation
- [ ] Create database schema with indexes
- [ ] Implement sync between files and database
- [ ] Implement orphan detection (optional for V1)
- [ ] Add file system error handling
- [ ] Create backup documentation for users
- [ ] Test with thousands of notes for performance

## References

- CommonMark Spec: https://spec.commonmark.org/
- POSIX File Operations: https://pubs.opengroup.org/onlinepubs/9699919799/
- File System Best Practices: https://www.pathname.com/fhs/
- Atomic File Operations: https://lwn.net/Articles/457667/
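
## Appendix: Note Creation Sketch

The "Creating a Note" flow and its transaction-safety rule (write the file first, insert the row second, delete the file if the insert fails) might look roughly like the following in Python. This is a minimal sketch, not the project's implementation: `create_note`, its parameters, and the use of the standard-library `sqlite3` module are assumptions made for illustration. Only the `notes` column names come from the schema in this ADR.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def create_note(conn, notes_root, slug, content):
    """Write the file first, then the row; remove the file if the insert fails.

    Hypothetical helper: conn is an open sqlite3 connection whose database
    already contains the notes table, notes_root is the data/notes directory.
    Returns the path stored in file_path (relative to notes_root).
    """
    now = datetime.now(timezone.utc)
    rel_path = Path(f"{now:%Y}") / f"{now:%m}" / f"{slug}.md"
    full_path = Path(notes_root) / rel_path
    full_path.parent.mkdir(parents=True, exist_ok=True)

    data = content.encode("utf-8")
    # Exclusive create: raises FileExistsError instead of overwriting a note
    with open(full_path, "xb") as fh:
        fh.write(data)

    try:
        with conn:  # Commits on success, rolls back if the insert raises
            conn.execute(
                "INSERT INTO notes (slug, file_path, created_at, updated_at, content_hash)"
                " VALUES (?, ?, ?, ?, ?)",
                (slug, rel_path.as_posix(), now.isoformat(), now.isoformat(),
                 hashlib.sha256(data).hexdigest()),
            )
    except sqlite3.Error:
        full_path.unlink(missing_ok=True)  # Undo the file write
        raise
    return rel_path.as_posix()
```

Opening the file in exclusive-create mode (`"xb"`) means a note with the same slug in the same month is never silently overwritten. The temp-file-and-rename write described under Atomic Writes matters more for updates, where the target file already exists.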