ADR-004: File-Based Note Storage Architecture
Status
Accepted
Context
The user explicitly requires notes to be stored as files on disk rather than as database records. This is critical for:
- Data portability - notes can be backed up, moved, and read without the application
- User ownership - direct access to content in human-readable format
- Simplicity - text files are the simplest storage mechanism
- Future-proofing - plain-text markdown stays readable without any particular application
However, we also need SQLite for:
- Metadata (timestamps, slugs, published status)
- Authentication tokens
- Fast querying and indexing
- Relational data
The challenge is designing how file-based storage and database metadata work together efficiently.
Decision
Hybrid Architecture: Files + Database Metadata
- Notes Content: Stored as markdown files on disk
- Notes Metadata: Stored in SQLite database
- Source of Truth: Files are authoritative for content; the database is authoritative for metadata
File Storage Strategy
Directory Structure
```
data/
├── notes/
│   ├── 2024/
│   │   ├── 11/
│   │   │   ├── my-first-note.md
│   │   │   └── another-note.md
│   │   └── 12/
│   │       └── december-note.md
│   └── 2025/
│       └── 01/
│           └── new-year-note.md
├── starpunk.db    # SQLite database
└── .backups/      # Optional backup directory
```
File Naming Convention
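- Format: {slug}.md
- Slug rules: lowercase, alphanumeric, hyphens only, no spaces
- Example: my-first-note.md
- Uniqueness: Enforced by the UNIQUE constraint on slug in the database. The filesystem alone only prevents two files with the same name in the same directory, so without the constraint the same slug could appear in two different month directories.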
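The slug rules above can be checked with one regular expression. A minimal sketch, assuming slugs also may not start or end with a hyphen or contain doubled hyphens (the rules above do not say either way); is_valid_slug and slug_to_filename are hypothetical helper names.

```python
import re

# Runs of lowercase letters/digits separated by single hyphens.
# Assumption: no leading, trailing, or doubled hyphens.
SLUG_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_slug(slug: str) -> bool:
    """Return True if slug follows the naming rules (hypothetical helper)."""
    return SLUG_PATTERN.fullmatch(slug) is not None


def slug_to_filename(slug: str) -> str:
    """Map a validated slug to its {slug}.md filename."""
    if not is_valid_slug(slug):
        raise ValueError(f"invalid slug: {slug!r}")
    return f"{slug}.md"
```

Using fullmatch rather than match ensures trailing characters such as spaces or a newline cause rejection.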
File Organization
- Pattern: Year/Month subdirectories (YYYY/MM/)
- Rationale:
  - Keeps directories small (about 30 files per month at one note per day)
  - Easy chronological browsing
  - Matches natural mental model
  - Scalable to thousands of notes
- Example path: data/notes/2024/11/my-first-note.md
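The Year/Month layout maps a note's creation time and slug to a relative path, which is also the value the file_path column stores. A minimal sketch, assuming zero-padded months as in the example tree (2025/01); note_relative_path and note_absolute_path are hypothetical names.

```python
from datetime import datetime
from pathlib import Path, PurePosixPath

NOTES_ROOT = Path("data/notes")


def note_relative_path(slug: str, created_at: datetime) -> PurePosixPath:
    # YYYY/MM/{slug}.md, relative to data/notes/ (the form stored in file_path)
    return PurePosixPath(f"{created_at:%Y}", f"{created_at:%m}", f"{slug}.md")


def note_absolute_path(slug: str, created_at: datetime) -> Path:
    # Full on-disk location under data/notes/
    return NOTES_ROOT / note_relative_path(slug, created_at)
```

Storing the POSIX-style relative path keeps database rows valid if the data/ directory is moved or copied to another machine.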
Database Schema
```sql
CREATE TABLE notes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    slug TEXT UNIQUE NOT NULL,       -- URL identifier
    file_path TEXT UNIQUE NOT NULL,  -- Relative path from data/notes/
    published BOOLEAN DEFAULT 0,     -- Publication status
    created_at TIMESTAMP NOT NULL,   -- Creation timestamp
    updated_at TIMESTAMP NOT NULL,   -- Last modification timestamp
    content_hash TEXT,               -- SHA-256 of file content for change detection
    deleted_at TIMESTAMP             -- Soft-delete marker (NULL = not deleted)
);
CREATE INDEX idx_notes_created_at ON notes(created_at DESC);
CREATE INDEX idx_notes_published ON notes(published);
-- No separate slug index: SQLite already creates one for the UNIQUE constraint.
```
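The schema can be applied from Python with a single sqlite3 executescript call. A minimal sketch, assuming the stock sqlite3 module; this copy adds the deleted_at column that soft delete relies on and omits a separate slug index, because SQLite already indexes UNIQUE columns. init_db and insert_note are hypothetical names. The example shows the database, not the filesystem, rejecting a duplicate slug filed under a different month.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS notes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    slug TEXT UNIQUE NOT NULL,
    file_path TEXT UNIQUE NOT NULL,
    published BOOLEAN DEFAULT 0,
    created_at TIMESTAMP NOT NULL,
    updated_at TIMESTAMP NOT NULL,
    content_hash TEXT,
    deleted_at TIMESTAMP  -- set by soft delete
);
CREATE INDEX IF NOT EXISTS idx_notes_created_at ON notes(created_at DESC);
CREATE INDEX IF NOT EXISTS idx_notes_published ON notes(published);
"""


def init_db(path: str = "data/starpunk.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)  # safe to rerun thanks to IF NOT EXISTS
    return conn


def insert_note(conn: sqlite3.Connection, slug: str, file_path: str, ts: str) -> None:
    conn.execute(
        "INSERT INTO notes (slug, file_path, created_at, updated_at) VALUES (?, ?, ?, ?)",
        (slug, file_path, ts, ts),
    )
```

For example, inserting "my-note" under 2024/11/ and again under 2024/12/ raises sqlite3.IntegrityError on the second insert.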
File Format
Markdown File Structure
[Content of the note in markdown format]
That's it: no frontmatter, no metadata in the file. Keep it pure.
Rationale:
- Maximum portability
- Readable by any markdown editor
- No custom parsing required
- Metadata belongs in database (timestamps, slugs, etc.)
- User sees just their content when opening file
Optional Future Enhancement (V2+)
If frontmatter becomes necessary, use standard YAML:
```
---
title: Optional Title
tags: [tag1, tag2]
---
[Content here]
```
But for V1: NO frontmatter.
Rationale
File Storage Benefits
Simplicity Score: 10/10
- Text files are the simplest storage
- No binary formats
- Human-readable
- Easy to backup (rsync, git, Dropbox, etc.)
Portability Score: 10/10
- Standard markdown format
- Readable without application
- Can be edited in any text editor
- Easy to migrate to other systems
Ownership Score: 10/10
- User has direct access to their content
- No vendor lock-in
- Can grep their own notes
- Backup is simple file copy
Hybrid Approach Benefits
- Performance: Database indexes enable fast queries
- Flexibility: Rich metadata without cluttering files
- Integrity: Database enforces uniqueness and relationships
- Simplicity: Each system does what it's best at
Consequences
Positive
- Notes are portable markdown files
- User can edit notes directly in filesystem if desired
- Easy backup (just copy data/ directory)
- Database provides fast metadata queries
- Can rebuild database from files if needed (published status is not stored in files, so it must be reset after a rebuild)
- Git-friendly (can version control notes)
- Maximum data ownership
Negative
- Must keep file and database in sync
- Potential for orphaned database records
- Potential for orphaned files
- File operations are slower than database queries
- Must handle file system errors
Mitigation Strategies
Sync Strategy
- On note creation: Write file FIRST, then database record
- On note update: Update file FIRST, then database record (update timestamp, content_hash)
- On note delete: Mark as deleted in database, optionally move file to .trash/
- On startup: Optional integrity check to detect orphans
Orphan Detection
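```python
# Integrity check: report database records without files and files without records
import logging
import sqlite3
from pathlib import Path

def check_integrity(db_path="data/starpunk.db", notes_dir="data/notes"):
    notes_root = Path(notes_dir)
    conn = sqlite3.connect(db_path)
    rows = conn.execute("SELECT slug, file_path FROM notes").fetchall()
    conn.close()
    known_paths = {file_path for _, file_path in rows}
    # Find database records without files
    for slug, file_path in rows:
        if not (notes_root / file_path).is_file():
            logging.error("Orphaned database record: %s", slug)
    # Find files without database records (file_path is relative to data/notes/)
    for md_file in notes_root.rglob("*.md"):
        rel_path = md_file.relative_to(notes_root).as_posix()
        if rel_path not in known_paths:
            logging.error("Orphaned file: %s", rel_path)
```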
Content Hash Strategy
- Calculate SHA-256 hash of file content on write
- Store hash in database
- On read, can verify content hasn't been externally modified
- Enables change detection and cache invalidation
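The hash can be computed with hashlib over the file's raw bytes, so it reflects exactly what is on disk. A minimal sketch; content_hash and is_externally_modified are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def content_hash(data: bytes) -> str:
    """SHA-256 hex digest of the raw file bytes."""
    return hashlib.sha256(data).hexdigest()


def is_externally_modified(path: Path, stored_hash: str) -> bool:
    # Re-hash the file on read and compare with notes.content_hash
    return content_hash(path.read_bytes()) != stored_hash
```

Hashing bytes rather than decoded text avoids false mismatches caused by newline translation or encoding differences when the file is read in text mode.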
Data Flow Patterns
Creating a Note
- Generate slug from content or timestamp
- Determine file path: data/notes/{YYYY}/{MM}/{slug}.md
- Create directories if needed
- Write markdown content to file
- Calculate content hash
- Insert record into database
- Return success
Transaction Safety: If database insert fails, delete file and raise error
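The creation steps and the transaction-safety rule can be sketched as follows, assuming a sqlite3 connection to the notes table defined above and a slug that has already been generated. create_note is a hypothetical name, and timestamps are assumed to be stored as UTC ISO 8601 strings.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def create_note(conn: sqlite3.Connection, notes_root: Path, slug: str, content: str) -> str:
    now = datetime.now(timezone.utc)
    rel_path = f"{now:%Y}/{now:%m}/{slug}.md"  # relative to data/notes/
    file_path = notes_root / rel_path
    file_path.parent.mkdir(parents=True, exist_ok=True)  # create YYYY/MM if needed
    data = content.encode("utf-8")
    # 1. Write the file first; mode "xb" refuses to overwrite an existing note
    with open(file_path, "xb") as f:
        f.write(data)
    try:
        # 2. Then insert the metadata record
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "INSERT INTO notes (slug, file_path, created_at, updated_at, content_hash) "
                "VALUES (?, ?, ?, ?, ?)",
                (slug, rel_path, now.isoformat(), now.isoformat(),
                 hashlib.sha256(data).hexdigest()),
            )
    except sqlite3.Error:
        # 3. Transaction safety: delete the file so no orphan is left, then re-raise
        file_path.unlink(missing_ok=True)
        raise
    return rel_path
```

Writing the file first means a crash between the two steps leaves at worst an orphaned file, which the rebuild and orphan-detection paths can recover, rather than a database row pointing at nothing.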
Reading a Note
By Slug:
- Query database for file_path by slug
- Read file content from disk
- Return content + metadata
For List:
- Query database for metadata (sorted, filtered)
- Optionally read file content for each note
- Return list with metadata and content
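Both read paths can be sketched with sqlite3 and pathlib, assuming file_path is stored relative to data/notes/ as in the schema. The function names, the default limit of 20, and the dict return shape are hypothetical; loading content for lists stays optional, as described above.

```python
import sqlite3
from pathlib import Path


def read_note(conn: sqlite3.Connection, notes_root: Path, slug: str):
    # By slug: metadata from the database, content from the file
    row = conn.execute(
        "SELECT file_path, published, created_at, updated_at FROM notes WHERE slug = ?",
        (slug,),
    ).fetchone()
    if row is None:
        return None
    file_path, published, created_at, updated_at = row
    content = (notes_root / file_path).read_text(encoding="utf-8")
    return {"slug": slug, "published": bool(published), "created_at": created_at,
            "updated_at": updated_at, "content": content}


def list_notes(conn: sqlite3.Connection, notes_root: Path,
               published_only=True, with_content=False, limit=20):
    # For lists: sorted/filtered metadata query, optional file reads
    sql = "SELECT slug, file_path, created_at FROM notes"
    if published_only:
        sql += " WHERE published = 1"
    sql += " ORDER BY created_at DESC LIMIT ?"  # newest first
    notes = []
    for slug, file_path, created_at in conn.execute(sql, (limit,)):
        note = {"slug": slug, "created_at": created_at}
        if with_content:  # one file read per note
            note["content"] = (notes_root / file_path).read_text(encoding="utf-8")
        notes.append(note)
    return notes
```

Keeping with_content off by default means a list page costs a single indexed query; file reads happen only when the caller asks for them.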
Updating a Note
- Query database for existing file_path
- Write new content to file (atomic write to temp, then rename)
- Calculate new content hash
- Update database record (timestamp, content_hash)
- Return success
Transaction Safety: Keep backup of original file until database update succeeds
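A sketch of the update flow, assuming the same sqlite3 connection and relative file_path convention. The original file is copied to a hypothetical ".bak" sibling before the temp-file-and-rename write, and restored if the database update fails; update_note is a hypothetical name.

```python
import hashlib
import os
import shutil
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def update_note(conn: sqlite3.Connection, notes_root: Path, slug: str, content: str) -> None:
    row = conn.execute("SELECT file_path FROM notes WHERE slug = ?", (slug,)).fetchone()
    if row is None:
        raise KeyError(slug)
    path = notes_root / row[0]
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)  # keep the original until the database update succeeds
    data = content.encode("utf-8")
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_bytes(data)
    os.replace(tmp, path)  # rename over the old file in one step
    try:
        with conn:
            conn.execute(
                "UPDATE notes SET updated_at = ?, content_hash = ? WHERE slug = ?",
                (datetime.now(timezone.utc).isoformat(),
                 hashlib.sha256(data).hexdigest(), slug),
            )
    except sqlite3.Error:
        os.replace(backup, path)  # restore the original content
        raise
    backup.unlink()  # database is consistent; the backup is no longer needed
```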
Deleting a Note
Soft Delete (Recommended):
- Update database: set deleted_at timestamp
- Optionally move file to .trash/ subdirectory
- Return success
Hard Delete:
- Delete database record
- Delete file from filesystem
- Return success
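Both delete variants can be sketched as below, assuming the notes table has the deleted_at column that soft delete implies, and assuming the trash lives at data/.trash/ beside .backups/ (outside data/notes/, so trashed files are not picked up by a scan of the notes tree). The function names and the trash_root parameter are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def soft_delete_note(conn: sqlite3.Connection, notes_root: Path, trash_root: Path,
                     slug: str, move_to_trash: bool = True) -> None:
    row = conn.execute("SELECT file_path FROM notes WHERE slug = ?", (slug,)).fetchone()
    if row is None:
        raise KeyError(slug)
    with conn:  # mark as deleted; the row is kept
        conn.execute("UPDATE notes SET deleted_at = ? WHERE slug = ?",
                     (datetime.now(timezone.utc).isoformat(), slug))
    if move_to_trash:
        src = notes_root / row[0]
        dst = trash_root / row[0]  # keep the YYYY/MM layout inside the trash
        dst.parent.mkdir(parents=True, exist_ok=True)
        src.replace(dst)


def hard_delete_note(conn: sqlite3.Connection, notes_root: Path, slug: str) -> None:
    row = conn.execute("SELECT file_path FROM notes WHERE slug = ?", (slug,)).fetchone()
    if row is None:
        raise KeyError(slug)
    with conn:  # delete the database record first
        conn.execute("DELETE FROM notes WHERE slug = ?", (slug,))
    (notes_root / row[0]).unlink(missing_ok=True)  # then the file
```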
File System Operations
Atomic Writes
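```python
# Atomic file write: write a temp file, flush it to disk, then rename over the target
import os

def write_note_safely(path, content):
    temp_path = f"{path}.tmp"
    with open(temp_path, "w", encoding="utf-8") as f:
        f.write(content)
        f.flush()
        os.fsync(f.fileno())  # make sure the data is on disk before the rename
    os.replace(temp_path, path)  # atomic on POSIX systems (same filesystem)
```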
Directory Creation
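```python
# Ensure directory exists before writing (year and month are ints)
import os

def ensure_note_directory(year, month):
    path = os.path.join("data", "notes", f"{year:04d}", f"{month:02d}")  # e.g. data/notes/2025/01
    os.makedirs(path, exist_ok=True)
    return path
```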
Slug Generation
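```python
# Generate URL-safe slug
import re
import secrets
from datetime import datetime

def generate_slug(content=None, timestamp=None, slug_exists=lambda slug: False):
    # slug_exists is a hypothetical stand-in for the database lookup
    slug = ""
    if content:
        # First five alphanumeric runs, lowercased and joined with hyphens
        words = re.findall(r"[a-z0-9]+", content.lower())[:5]
        slug = "-".join(words)
    if not slug:
        # Fallback: timestamp-based (also covers content with no usable words)
        slug = (timestamp or datetime.now()).strftime("%Y%m%d-%H%M%S")
    # Ensure uniqueness; retry in case a suffixed slug also collides
    candidate = slug
    while slug_exists(candidate):
        candidate = f"{slug}-{secrets.token_hex(3)}"
    return candidate
```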
Backup Strategy
Simple Backup
```shell
# User can back up with a simple copy
cp -r data/ backup/
# Or with rsync
rsync -av data/ backup/
# Or with git (after running "git init" in data/ once)
cd data/ && git add . && git commit -m "Backup"
```
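A plain file copy can catch starpunk.db in the middle of a write. A sketch of a safer backup from Python, assuming the data/ layout above: shutil.copytree (Python 3.8+ for dirs_exist_ok) copies everything except the live database and its -wal/-shm/-journal files, and sqlite3.Connection.backup takes a consistent snapshot of the database. backup_data_dir is a hypothetical helper name.

```python
import shutil
import sqlite3
from pathlib import Path


def backup_data_dir(data_dir: Path, backup_dir: Path, db_name: str = "starpunk.db") -> None:
    # Copy notes and other files, skipping the live database and its journal files
    shutil.copytree(
        data_dir,
        backup_dir,
        ignore=shutil.ignore_patterns(db_name, f"{db_name}-*"),
        dirs_exist_ok=True,
    )
    # Snapshot the database through SQLite's online backup API
    src = sqlite3.connect(data_dir / db_name)
    dst = sqlite3.connect(backup_dir / db_name)
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()
```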
Restore Strategy
- Copy data/ directory to new location
- Application reads database
- If database missing or corrupt, rebuild from files:
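```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path

def rebuild_database_from_files(conn, notes_dir="data/notes"):
    # conn: sqlite3 connection with an empty notes table
    notes_root = Path(notes_dir)
    for md_file in sorted(notes_root.rglob("*.md")):
        content = md_file.read_bytes()
        # Portable stat has no creation time, so mtime is used for both timestamps
        modified = datetime.fromtimestamp(md_file.stat().st_mtime, tz=timezone.utc).isoformat()
        # Slug comes from the filename; published is not stored in files, so it resets to 0
        conn.execute(
            "INSERT INTO notes (slug, file_path, published, created_at, updated_at, content_hash)"
            " VALUES (?, ?, 0, ?, ?, ?)",
            (md_file.stem, md_file.relative_to(notes_root).as_posix(),
             modified, modified, hashlib.sha256(content).hexdigest()),
        )
    conn.commit()
```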
Standards Compliance
Markdown Standard
- CommonMark specification
- No custom extensions in V1
- Standard markdown processors can read files
File System Compatibility
- ASCII-safe filenames
- No special characters in paths
- Maximum path length under 255 characters
- POSIX-compatible directory structure
Alternatives Considered
All-Database Storage (Rejected)
- Simplicity: 8/10 - Simpler code, single source of truth
- Portability: 2/10 - Requires database export
- Ownership: 3/10 - User doesn't have direct access
- Verdict: Violates user requirement for file-based storage
Flat File Directory (Rejected)
```
data/notes/
├── note-1.md
├── note-2.md
├── note-3.md
├── ...
└── note-9999.md
```
- Simplicity: 10/10 - Simplest possible structure
- Scalability: 3/10 - Thousands of files in one directory are slow to list and awkward to browse
- Verdict: Not scalable, poor performance with many notes
Git-Based Storage (Rejected for V1)
- Simplicity: 6/10 - Requires git integration
- Portability: 9/10 - Excellent versioning
- Performance: 7/10 - Git operations have overhead
- Verdict: Interesting for V2, but adds complexity to V1
Frontmatter in Files (Rejected for V1)
```
---
slug: my-note
created: 2024-11-18
published: true
---
Note content here
```
- Simplicity: 7/10 - Requires YAML parsing
- Portability: 8/10 - Common pattern, but not pure markdown
- Single Source: 10/10 - All data in one place
- Verdict: Deferred to V2; V1 keeps files pure
JSON Metadata Sidecar (Rejected)
```
notes/
├── my-note.md
└── my-note.json   # Metadata
```
- Simplicity: 6/10 - Doubles the number of files
- Portability: 7/10 - Markdown still clean, but extra files
- Sync Issues: 5/10 - Must keep two files in sync
- Verdict: Database metadata is cleaner
Implementation Checklist
- Create data/notes directory structure on initialization
- Implement slug generation algorithm
- Implement atomic file write operations
- Implement content hash calculation
- Create database schema with indexes
- Implement sync between files and database
- Implement orphan detection (optional for V1)
- Add file system error handling
- Create backup documentation for users
- Test with thousands of notes for performance
References
- CommonMark Spec: https://spec.commonmark.org/
- POSIX File Operations: https://pubs.opengroup.org/onlinepubs/9699919799/
- File System Best Practices: https://www.pathname.com/fhs/
- Atomic File Operations: https://lwn.net/Articles/457667/