# ADR-004: File-Based Note Storage Architecture

## Status
Accepted

## Context
The user explicitly requires notes to be stored as files on disk rather than as database records. This is critical for:
1. Data portability - notes can be backed up, moved, and read without the application
2. User ownership - direct access to content in human-readable format
3. Simplicity - text files are the simplest storage mechanism
4. Future-proofing - plain-text markdown files stay readable without any special software

However, we also need SQLite for:
- Metadata (timestamps, slugs, published status)
- Authentication tokens
- Fast querying and indexing
- Relational data

The challenge is designing how file-based storage and database metadata work together efficiently.

## Decision

### Hybrid Architecture: Files + Database Metadata

**Notes Content**: Stored as markdown files on disk
**Notes Metadata**: Stored in SQLite database
**Source of Truth**: Files are authoritative for content; database is authoritative for metadata

### File Storage Strategy

#### Directory Structure
```
data/
├── notes/
│   ├── 2024/
│   │   ├── 11/
│   │   │   ├── my-first-note.md
│   │   │   └── another-note.md
│   │   └── 12/
│   │       └── december-note.md
│   └── 2025/
│       └── 01/
│           └── new-year-note.md
├── starpunk.db           # SQLite database
└── .backups/             # Optional backup directory
```

#### File Naming Convention
- **Format**: `{slug}.md`
- **Slug rules**: lowercase, alphanumeric, hyphens only, no spaces
- **Example**: `my-first-note.md`
- **Uniqueness**: Enforced per directory by the filesystem (can't have two files with same name in same directory) and across all notes by the `UNIQUE` constraint on `slug` in the database

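The slug rules above can be checked with a short regular expression. This is a minimal sketch, not part of the decision itself: `SLUG_PATTERN`, `is_valid_slug`, and `note_filename` are hypothetical names, and rejecting leading, trailing, or doubled hyphens is an added assumption beyond the stated rules.

```python
import re

# Assumption: lowercase alphanumeric runs joined by single hyphens.
SLUG_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_slug(slug: str) -> bool:
    """Return True if the slug follows the naming rules."""
    return SLUG_PATTERN.fullmatch(slug) is not None


def note_filename(slug: str) -> str:
    """Build the `{slug}.md` file name, rejecting invalid slugs."""
    if not is_valid_slug(slug):
        raise ValueError(f"invalid slug: {slug!r}")
    return f"{slug}.md"


print(note_filename("my-first-note"))  # → my-first-note.md
print(is_valid_slug("My First Note"))  # → False
```
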
#### File Organization
- **Pattern**: Year/Month subdirectories (`YYYY/MM/`)
- **Rationale**:
  - Keeps directories manageable (about 30 files per month at one note per day)
  - Easy chronological browsing
  - Matches natural mental model
  - Scalable to thousands of notes
- **Example path**: `data/notes/2024/11/my-first-note.md`

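As an illustration of the `YYYY/MM/` pattern, a path can be derived from the slug and the creation date. The helper name `note_path` and the `NOTES_ROOT` constant are assumptions made for this sketch.

```python
from datetime import datetime
from pathlib import Path

NOTES_ROOT = Path("data/notes")  # Assumed root, taken from the example path


def note_path(slug: str, created_at: datetime, root: Path = NOTES_ROOT) -> Path:
    """Return root/YYYY/MM/slug.md for a note created at `created_at`."""
    return root / f"{created_at:%Y}" / f"{created_at:%m}" / f"{slug}.md"


print(note_path("my-first-note", datetime(2024, 11, 18)).as_posix())
# → data/notes/2024/11/my-first-note.md
```

Formatting the month with `%m` keeps it zero-padded, so January lands in `01/` rather than `1/`.
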
### Database Schema

```sql
CREATE TABLE notes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    slug TEXT UNIQUE NOT NULL,        -- URL identifier
    file_path TEXT UNIQUE NOT NULL,   -- Relative path from data/notes/
    published BOOLEAN DEFAULT 0,      -- Publication status
    created_at TIMESTAMP NOT NULL,    -- Creation timestamp
    updated_at TIMESTAMP NOT NULL,    -- Last modification timestamp
    content_hash TEXT                 -- SHA-256 of file content for change detection
);

CREATE INDEX idx_notes_created_at ON notes(created_at DESC);
CREATE INDEX idx_notes_published ON notes(published);
-- Redundant: the UNIQUE constraint on slug already creates an index
CREATE INDEX idx_notes_slug ON notes(slug);
```

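To show how the schema supports the "fast querying" goal, here is a small sketch using Python's built-in `sqlite3` module against an in-memory database. The `list_published` helper, the sample rows, and the ISO 8601 timestamp strings are assumptions for illustration only.

```python
import sqlite3

SCHEMA = """
CREATE TABLE notes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    slug TEXT UNIQUE NOT NULL,
    file_path TEXT UNIQUE NOT NULL,
    published BOOLEAN DEFAULT 0,
    created_at TIMESTAMP NOT NULL,
    updated_at TIMESTAMP NOT NULL,
    content_hash TEXT
);
CREATE INDEX idx_notes_created_at ON notes(created_at DESC);
CREATE INDEX idx_notes_published ON notes(published);
"""


def list_published(conn: sqlite3.Connection) -> list:
    """Return (slug, file_path) for published notes, newest first."""
    rows = conn.execute(
        "SELECT slug, file_path FROM notes "
        "WHERE published = 1 ORDER BY created_at DESC"
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO notes (slug, file_path, published, created_at, updated_at) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("my-first-note", "2024/11/my-first-note.md", 1,
         "2024-11-18T10:00:00", "2024-11-18T10:00:00"),
        ("another-note", "2024/11/another-note.md", 0,
         "2024-11-20T10:00:00", "2024-11-20T10:00:00"),
        ("december-note", "2024/12/december-note.md", 1,
         "2024-12-01T09:00:00", "2024-12-01T09:00:00"),
    ],
)
print([slug for slug, _ in list_published(conn)])
# → ['december-note', 'my-first-note']
```

Storing timestamps as ISO 8601 strings keeps lexicographic order equal to chronological order, which is what makes the `ORDER BY` work here.
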
### File Format

#### Markdown File Structure
```markdown
[Content of the note in markdown format]
```

**That's it.** No frontmatter, no metadata in file. Keep it pure.

**Rationale**:
- Maximum portability
- Readable by any markdown editor
- No custom parsing required
- Metadata belongs in database (timestamps, slugs, etc.)
- User sees just their content when opening file

#### Optional Future Enhancement (V2+)
If frontmatter becomes necessary, use standard YAML:
```markdown
---
title: Optional Title
tags: tag1, tag2
---
[Content here]
```

But for V1: **NO frontmatter**.

## Rationale

### File Storage Benefits
**Simplicity Score: 10/10**
- Text files are the simplest storage
- No binary formats
- Human-readable
- Easy to backup (rsync, git, Dropbox, etc.)

**Portability Score: 10/10**
- Standard markdown format
- Readable without application
- Can be edited in any text editor
- Easy to migrate to other systems

**Ownership Score: 10/10**
- User has direct access to their content
- No vendor lock-in
- Can grep their own notes
- Backup is simple file copy

### Hybrid Approach Benefits
**Performance**: Database indexes enable fast queries
**Flexibility**: Rich metadata without cluttering files
**Integrity**: Database enforces uniqueness and relationships
**Simplicity**: Each system does what it's best at

## Consequences

### Positive
- Notes are portable markdown files
- User can edit notes directly in filesystem if desired
- Easy backup (just copy data/ directory)
- Database provides fast metadata queries
- Can rebuild database from files if needed (slugs, paths, and content hashes; published status and exact timestamps are not stored in the files)
- Git-friendly (can version control notes)
- Maximum data ownership

### Negative
- Must keep file and database in sync
- Potential for orphaned database records
- Potential for orphaned files
- File operations are slower than database queries
- Must handle file system errors

### Mitigation Strategies

#### Sync Strategy
1. **On note creation**: Write file FIRST, then database record
2. **On note update**: Update file FIRST, then database record (update timestamp, content_hash)
3. **On note delete**: Mark as deleted in database, optionally move file to .trash/
4. **On startup**: Optional integrity check to detect orphans

#### Orphan Detection
```python
import logging
import sqlite3
from pathlib import Path

NOTES_DIR = Path("data/notes")

# Sketch of the integrity check; assumes file_path is stored relative to NOTES_DIR
def check_integrity(conn: sqlite3.Connection) -> None:
    db_paths = {row[0] for row in conn.execute("SELECT file_path FROM notes")}
    disk_paths = {p.relative_to(NOTES_DIR).as_posix() for p in NOTES_DIR.rglob("*.md")}

    # Find database records without files
    for path in sorted(db_paths - disk_paths):
        logging.error("Orphaned database record: %s", path)

    # Find files without database records
    for path in sorted(disk_paths - db_paths):
        logging.error("Orphaned file: %s", path)
```

#### Content Hash Strategy
- Calculate SHA-256 hash of file content on write
- Store hash in database
- On read, can verify content hasn't been externally modified
- Enables change detection and cache invalidation

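The hash strategy can be sketched with `hashlib` from the standard library. The function names are hypothetical; hashing the raw bytes (rather than decoded text) is an assumption made here so that line-ending translation cannot change the result.

```python
import hashlib
from pathlib import Path


def content_hash(data: bytes) -> str:
    """Return the SHA-256 hex digest of the given file content."""
    return hashlib.sha256(data).hexdigest()


def is_externally_modified(path: Path, stored_hash: str) -> bool:
    """Compare the file on disk with the hash stored in the database."""
    return content_hash(path.read_bytes()) != stored_hash
```

A caller would compute `content_hash` at write time, store it in `notes.content_hash`, and call `is_externally_modified` on read to decide whether the database record needs refreshing.
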
## Data Flow Patterns

### Creating a Note

1. Generate slug from content or timestamp
2. Determine file path: `data/notes/{YYYY}/{MM}/{slug}.md`
3. Create directories if needed
4. Write markdown content to file
5. Calculate content hash
6. Insert record into database
7. Return success

**Transaction Safety**: If database insert fails, delete file and raise error

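The creation steps and the rollback rule above can be sketched as follows. This is one possible shape under stated assumptions: `create_note` and its parameters are hypothetical, the slug is generated by the caller, `file_path` is stored relative to the notes root, and timestamps are stored as ISO 8601 strings.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def create_note(conn: sqlite3.Connection, notes_root: Path, slug: str,
                content: str, now: datetime = None) -> Path:
    """Write the file first, then insert the record; undo the file on failure."""
    now = now or datetime.now(timezone.utc)
    rel_path = f"{now:%Y}/{now:%m}/{slug}.md"
    path = notes_root / rel_path
    path.parent.mkdir(parents=True, exist_ok=True)

    data = content.encode("utf-8")
    # Mode "x" refuses to overwrite an existing file with the same name
    with open(path, "xb") as f:
        f.write(data)

    try:
        with conn:  # Commits on success, rolls back on error
            conn.execute(
                "INSERT INTO notes "
                "(slug, file_path, created_at, updated_at, content_hash) "
                "VALUES (?, ?, ?, ?, ?)",
                (slug, rel_path, now.isoformat(), now.isoformat(),
                 hashlib.sha256(data).hexdigest()),
            )
    except sqlite3.Error:
        path.unlink()  # Transaction safety: remove the file we just wrote
        raise
    return path
```

Writing the file before the insert means a crash between the two steps leaves an orphaned file rather than a record pointing at nothing, which the integrity check can report later.
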
### Reading a Note

**By Slug**:
1. Query database for file_path by slug
2. Read file content from disk
3. Return content + metadata

**For List**:
1. Query database for metadata (sorted, filtered)
2. Optionally read file content for each note
3. Return list with metadata and content

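Both read paths can be sketched in a few lines. The function names, the dictionary return shape, and the `include_content` flag are assumptions for illustration; `file_path` is taken to be relative to the notes root.

```python
import sqlite3
from pathlib import Path
from typing import Optional


def read_note(conn: sqlite3.Connection, notes_root: Path, slug: str) -> Optional[dict]:
    """Look up metadata by slug, then read the content from disk."""
    row = conn.execute(
        "SELECT file_path, published, created_at, updated_at "
        "FROM notes WHERE slug = ?",
        (slug,),
    ).fetchone()
    if row is None:
        return None
    file_path, published, created_at, updated_at = row
    content = (notes_root / file_path).read_text(encoding="utf-8")
    return {
        "slug": slug,
        "content": content,
        "published": bool(published),
        "created_at": created_at,
        "updated_at": updated_at,
    }


def list_notes(conn: sqlite3.Connection, notes_root: Path,
               include_content: bool = False) -> list:
    """Return metadata newest first; touch the filesystem only on request."""
    rows = conn.execute(
        "SELECT slug, file_path, created_at FROM notes ORDER BY created_at DESC"
    ).fetchall()
    notes = []
    for slug, file_path, created_at in rows:
        note = {"slug": slug, "created_at": created_at}
        if include_content:
            note["content"] = (notes_root / file_path).read_text(encoding="utf-8")
        notes.append(note)
    return notes
```

Leaving `include_content` off by default keeps list views on the database alone, so no files are opened unless the caller asks for content.
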
### Updating a Note

1. Query database for existing file_path
2. Write new content to file (atomic write to temp, then rename)
3. Calculate new content hash
4. Update database record (timestamp, content_hash)
5. Return success

**Transaction Safety**: Keep backup of original file until database update succeeds

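A possible shape for the update flow, including the backup rule, is sketched below. `update_note`, the `.bak` and `.tmp` suffixes, and the `KeyError` for unknown slugs are assumptions; only `os.replace`, `shutil.copy2`, and `sqlite3` from the standard library are used.

```python
import hashlib
import os
import shutil
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def update_note(conn: sqlite3.Connection, notes_root: Path, slug: str,
                new_content: str, now: datetime = None) -> None:
    """Replace the file atomically, then update the record; restore on failure."""
    now = now or datetime.now(timezone.utc)
    row = conn.execute(
        "SELECT file_path FROM notes WHERE slug = ?", (slug,)
    ).fetchone()
    if row is None:
        raise KeyError(slug)

    path = notes_root / row[0]
    backup = path.with_name(path.name + ".bak")
    temp = path.with_name(path.name + ".tmp")
    data = new_content.encode("utf-8")

    shutil.copy2(path, backup)   # Keep the original until the database agrees
    temp.write_bytes(data)
    os.replace(temp, path)       # Atomic when both paths share a filesystem

    try:
        with conn:
            conn.execute(
                "UPDATE notes SET updated_at = ?, content_hash = ? WHERE slug = ?",
                (now.isoformat(), hashlib.sha256(data).hexdigest(), slug),
            )
    except sqlite3.Error:
        os.replace(backup, path)  # Put the original content back
        raise
    backup.unlink()
```
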
### Deleting a Note

**Soft Delete (Recommended)**:
1. Update database: set `deleted_at` timestamp (this requires a `deleted_at` column, which the schema above does not yet define)
2. Optionally move file to `.trash/` subdirectory
3. Return success

**Hard Delete**:
1. Delete database record
2. Delete file from filesystem
3. Return success

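The soft-delete path might look like the sketch below. It assumes a nullable `deleted_at` column has been added to `notes`, and it takes the trash location as a parameter because the exact placement of `.trash/` is not specified; `soft_delete_note` and `trash_root` are hypothetical names.

```python
import os
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def soft_delete_note(conn: sqlite3.Connection, notes_root: Path,
                     trash_root: Path, slug: str, now: datetime = None) -> bool:
    """Mark the note as deleted, then move its file out of the notes tree."""
    now = now or datetime.now(timezone.utc)
    row = conn.execute(
        "SELECT file_path FROM notes WHERE slug = ? AND deleted_at IS NULL",
        (slug,),
    ).fetchone()
    if row is None:
        return False  # Unknown slug or already deleted

    with conn:
        conn.execute(
            "UPDATE notes SET deleted_at = ? WHERE slug = ?",
            (now.isoformat(), slug),
        )

    source = notes_root / row[0]
    if source.exists():
        target = trash_root / row[0]  # Keep the YYYY/MM layout inside the trash
        target.parent.mkdir(parents=True, exist_ok=True)
        os.replace(source, target)
    return True
```

Keeping the trash outside the notes tree means a recursive scan for `*.md` files under the notes root does not pick up deleted notes.
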
## File System Operations

### Atomic Writes
```python
import os

# Sketch of an atomic file write: write to a temp file, then rename over the target
def write_note_safely(path, content):
    temp_path = f"{path}.tmp"
    with open(temp_path, "w", encoding="utf-8") as f:
        f.write(content)
    # Atomic on POSIX systems when both paths are on the same filesystem
    os.replace(temp_path, path)
```

### Directory Creation
```python
import os

# Ensure directory exists before writing
def ensure_note_directory(year, month):
    # Assumes integer year and month; zero-pad the month to match YYYY/MM
    path = f"data/notes/{year}/{month:02d}"
    os.makedirs(path, exist_ok=True)
    return path
```

### Slug Generation
```python
import re
import secrets
from datetime import datetime

# Generate URL-safe slug
def generate_slug(content=None, timestamp=None, slug_exists=lambda slug: False):
    slug = ""
    if content:
        # Extract first five words, normalize: lowercase, hyphens, no special chars
        words = " ".join(content.split()[:5]).lower()
        slug = re.sub(r"[^a-z0-9]+", "-", words).strip("-")
    if not slug:
        # Fallback: timestamp-based
        slug = (timestamp or datetime.now()).strftime("%Y%m%d-%H%M%S")

    # Ensure uniqueness; slug_exists stands in for a database lookup
    if slug_exists(slug):
        slug = f"{slug}-{secrets.token_hex(2)}"

    return slug
```

## Backup Strategy

### Simple Backup
```bash
# User can backup with simple copy
cp -r data/ backup/

# Or with rsync
rsync -av data/ backup/

# Or with git
cd data/ && git add . && git commit -m "Backup"
```

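### Restore Strategy
1. Copy data/ directory to new location
2. Application reads database
3. If database missing or corrupt, rebuild from files. Only the slug, path, and content hash can be recovered exactly; `published` falls back to its default and both timestamps are taken from the file's modification time:
```python
import hashlib
import sqlite3
from datetime import datetime, timezone
from pathlib import Path

NOTES_DIR = Path("data/notes")

def rebuild_database_from_files(conn: sqlite3.Connection) -> None:
    for path in sorted(NOTES_DIR.rglob("*.md")):
        data = path.read_bytes()
        # Creation time is not portable across platforms, so use mtime for both
        modified = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc).isoformat()
        conn.execute(
            "INSERT INTO notes (slug, file_path, created_at, updated_at, content_hash) "
            "VALUES (?, ?, ?, ?, ?)",
            (path.stem, path.relative_to(NOTES_DIR).as_posix(), modified, modified,
             hashlib.sha256(data).hexdigest()),
        )
    conn.commit()
```
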
## Standards Compliance

### Markdown Standard
- CommonMark specification
- No custom extensions in V1
- Standard markdown processors can read files

### File System Compatibility
- ASCII-safe filenames
- No special characters in paths
- Maximum path length under 255 characters
- POSIX-compatible directory structure

## Alternatives Considered

### All-Database Storage (Rejected)
- **Simplicity**: 8/10 - Simpler code, single source of truth
- **Portability**: 2/10 - Requires database export
- **Ownership**: 3/10 - User doesn't have direct access
- **Verdict**: Violates user requirement for file-based storage

### Flat File Directory (Rejected)
```
data/notes/
├── note-1.md
├── note-2.md
├── note-3.md
...
└── note-9999.md
```
- **Simplicity**: 10/10 - Simplest possible structure
- **Scalability**: 3/10 - Thousands of files in one directory is slow
- **Verdict**: Not scalable, poor performance with many notes

### Git-Based Storage (Rejected for V1)
- **Simplicity**: 6/10 - Requires git integration
- **Portability**: 9/10 - Excellent versioning
- **Performance**: 7/10 - Git operations have overhead
- **Verdict**: Interesting for V2, but adds complexity to V1

### Frontmatter in Files (Rejected for V1)
```markdown
---
slug: my-note
created: 2024-11-18
published: true
---
Note content here
```
- **Simplicity**: 7/10 - Requires YAML parsing
- **Portability**: 8/10 - Common pattern, but not pure markdown
- **Single Source**: 10/10 - All data in one place
- **Verdict**: Deferred to V2; V1 keeps files pure

### JSON Metadata Sidecar (Rejected)
```
notes/
├── my-note.md
└── my-note.json       # Metadata
```
- **Simplicity**: 6/10 - Doubles number of files
- **Portability**: 7/10 - Markdown still clean, but extra files
- **Sync Issues**: 5/10 - Must keep two files in sync
- **Verdict**: Database metadata is cleaner

## Implementation Checklist

- [ ] Create data/notes directory structure on initialization
- [ ] Implement slug generation algorithm
- [ ] Implement atomic file write operations
- [ ] Implement content hash calculation
- [ ] Create database schema with indexes
- [ ] Implement sync between files and database
- [ ] Implement orphan detection (optional for V1)
- [ ] Add file system error handling
- [ ] Create backup documentation for users
- [ ] Test with thousands of notes for performance

## References
- CommonMark Spec: https://spec.commonmark.org/
- POSIX File Operations: https://pubs.opengroup.org/onlinepubs/9699919799/
- File System Best Practices: https://www.pathname.com/fhs/
- Atomic File Operations: https://lwn.net/Articles/457667/