# ADR-004: File-Based Note Storage Architecture

## Status
Accepted

## Context
The user explicitly requires notes to be stored as files on disk rather than as database records. This is critical for:
1. Data portability - notes can be backed up, moved, and read without the application
2. User ownership - direct access to content in human-readable format
3. Simplicity - text files are the simplest storage mechanism
4. Future-proofing - plain-text markdown files stay readable without any particular application

However, we also need SQLite for:
- Metadata (timestamps, slugs, published status)
- Authentication tokens
- Fast querying and indexing
- Relational data

The challenge is designing how file-based storage and database metadata work together efficiently.

## Decision

### Hybrid Architecture: Files + Database Metadata

- **Notes Content**: Stored as markdown files on disk
- **Notes Metadata**: Stored in SQLite database
- **Source of Truth**: Files are authoritative for content; database is authoritative for metadata

### File Storage Strategy

#### Directory Structure
```
data/
├── notes/
│   ├── 2024/
│   │   ├── 11/
│   │   │   ├── my-first-note.md
│   │   │   └── another-note.md
│   │   └── 12/
│   │       └── december-note.md
│   └── 2025/
│       └── 01/
│           └── new-year-note.md
├── starpunk.db              # SQLite database
└── .backups/                # Optional backup directory
```

#### File Naming Convention
- **Format**: `{slug}.md`
- **Slug rules**: lowercase, alphanumeric, hyphens only, no spaces
- **Example**: `my-first-note.md`
- **Uniqueness**: The filesystem prevents two files with the same name in one directory; uniqueness across month directories is enforced by the `UNIQUE` constraint on `slug` in the database
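
The slug rules above can be checked with a small validator. This is a minimal sketch, not the project's implementation: the names `SLUG_PATTERN`, `is_valid_slug`, and `note_filename` are hypothetical, and rejecting leading, trailing, or doubled hyphens is an assumption that goes slightly beyond the stated rules.

```python
import re

# Lowercase letters and digits, separated by single hyphens (assumed stricter reading of the rules).
SLUG_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_slug(slug: str) -> bool:
    """Return True if the slug uses only lowercase alphanumerics and hyphens."""
    return SLUG_PATTERN.fullmatch(slug) is not None


def note_filename(slug: str) -> str:
    """Map a validated slug to its file name, following the `{slug}.md` format."""
    if not is_valid_slug(slug):
        raise ValueError(f"invalid slug: {slug!r}")
    return f"{slug}.md"
```

Validating before building the file name keeps spaces and uppercase characters out of paths, which also helps with the filesystem compatibility goals of this ADR.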

#### File Organization
- **Pattern**: Year/Month subdirectories (`YYYY/MM/`)
- **Rationale**:
  - Keeps directories manageable (about 30 files per month at one note per day)
  - Easy chronological browsing
  - Matches natural mental model
  - Scalable to thousands of notes
- **Example path**: `data/notes/2024/11/my-first-note.md`
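
As a rough illustration of the `YYYY/MM/` pattern, the relative path for a note could be derived from its slug and creation time. The function name `note_relative_path` is hypothetical; the result is relative to `data/notes/`.

```python
from datetime import datetime


def note_relative_path(slug: str, created_at: datetime) -> str:
    """Build the path relative to data/notes/, with a zero-padded month."""
    return f"{created_at:%Y/%m}/{slug}.md"
```

For example, `note_relative_path("my-first-note", datetime(2024, 11, 18))` yields `2024/11/my-first-note.md`, matching the example path above.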

### Database Schema

```sql
CREATE TABLE notes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    slug TEXT UNIQUE NOT NULL,              -- URL identifier
    file_path TEXT UNIQUE NOT NULL,         -- Relative path from data/notes/
    published BOOLEAN DEFAULT 0,            -- Publication status
    created_at TIMESTAMP NOT NULL,          -- Creation timestamp
    updated_at TIMESTAMP NOT NULL,          -- Last modification timestamp
    deleted_at TIMESTAMP,                   -- Soft-delete timestamp (NULL while the note is active)
    content_hash TEXT                       -- SHA-256 of file content for change detection
);

CREATE INDEX idx_notes_created_at ON notes(created_at DESC);
CREATE INDEX idx_notes_published ON notes(published);
CREATE INDEX idx_notes_slug ON notes(slug);
```

### File Format

#### Markdown File Structure
```markdown
[Content of the note in markdown format]
```

**That's it.** No frontmatter, no metadata in file. Keep it pure.

**Rationale**:
- Maximum portability
- Readable by any markdown editor
- No custom parsing required
- Metadata belongs in database (timestamps, slugs, etc.)
- User sees just their content when opening file

#### Optional Future Enhancement (V2+)
If frontmatter becomes necessary, use standard YAML:
```markdown
---
title: Optional Title
tags: tag1, tag2
---
[Content here]
```

But for V1: **NO frontmatter**.

## Rationale

### File Storage Benefits
**Simplicity Score: 10/10**
- Text files are the simplest storage
- No binary formats
- Human-readable
- Easy to back up (rsync, git, Dropbox, etc.)

**Portability Score: 10/10**
- Standard markdown format
- Readable without application
- Can be edited in any text editor
- Easy to migrate to other systems

**Ownership Score: 10/10**
- User has direct access to their content
- No vendor lock-in
- Can grep their own notes
- Backup is simple file copy

### Hybrid Approach Benefits
- **Performance**: Database indexes enable fast queries
- **Flexibility**: Rich metadata without cluttering files
- **Integrity**: Database enforces uniqueness and relationships
- **Simplicity**: Each system does what it's best at

## Consequences

### Positive
- Notes are portable markdown files
- User can edit notes directly in filesystem if desired
- Easy backup (just copy `data/` directory)
- Database provides fast metadata queries
- Can rebuild database from files if needed
- Git-friendly (can version control notes)
- Maximum data ownership

### Negative
- Must keep file and database in sync
- Potential for orphaned database records
- Potential for orphaned files
- File operations are slower than database queries
- Must handle file system errors

### Mitigation Strategies

#### Sync Strategy
1. **On note creation**: Write file FIRST, then database record
2. **On note update**: Update file FIRST, then database record (`updated_at`, `content_hash`)
3. **On note delete**: Mark as deleted in database, optionally move file to `.trash/`
4. **On startup**: Optional integrity check to detect orphans

#### Orphan Detection
```python
# Integrity check sketch; `database` stands in for the application's data-access layer
import logging
from pathlib import Path

NOTES_ROOT = Path("data/notes")

def check_integrity(database):
    # Find database records without files (file_path is relative to data/notes/)
    for note in database.all_notes():
        if not (NOTES_ROOT / note.file_path).is_file():
            logging.error("Orphaned database record: %s", note.slug)

    # Find files without database records
    for file in NOTES_ROOT.rglob("*.md"):
        relative_path = file.relative_to(NOTES_ROOT).as_posix()
        if not database.has_note(file_path=relative_path):
            logging.error("Orphaned file: %s", relative_path)
```

#### Content Hash Strategy
- Calculate SHA-256 hash of file content on write
- Store hash in database
- On read, can verify content hasn't been externally modified
- Enables change detection and cache invalidation
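
A minimal sketch of the hash strategy, using Python's standard `hashlib`. The function names `content_hash` and `is_externally_modified` are hypothetical; hashing the raw bytes rather than decoded text is an assumption made here so that newline translation cannot change the result.

```python
import hashlib
from pathlib import Path


def content_hash(data: bytes) -> str:
    """Return the SHA-256 hex digest of the note's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def is_externally_modified(path: Path, stored_hash: str) -> bool:
    """Compare the file's current hash with the hash stored in the database."""
    return content_hash(path.read_bytes()) != stored_hash
```

A mismatch means the file changed outside the application, for instance when the user edited it in a text editor, so the stored `content_hash` and `updated_at` would need refreshing.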

## Data Flow Patterns

### Creating a Note

1. Generate slug from content or timestamp
2. Determine file path: `data/notes/{YYYY}/{MM}/{slug}.md`
3. Create directories if needed
4. Write markdown content to file
5. Calculate content hash
6. Insert record into database
7. Return success

**Transaction Safety**: If database insert fails, delete file and raise error
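
The creation flow and its rollback rule might look roughly like the sketch below. It assumes a `notes` table with the columns named in this ADR and a caller-supplied slug; `create_note` and its parameters are hypothetical names, not the project's API.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def create_note(conn: sqlite3.Connection, notes_root: Path, slug: str, content: str) -> Path:
    """Write the file first, then the database record; undo the file on failure."""
    now = datetime.now(timezone.utc)
    relative_path = f"{now:%Y/%m}/{slug}.md"
    path = notes_root / relative_path
    path.parent.mkdir(parents=True, exist_ok=True)

    data = content.encode("utf-8")
    # Mode "x" fails if the file already exists, so an existing note is never overwritten.
    with open(path, "xb") as f:
        f.write(data)

    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO notes (slug, file_path, created_at, updated_at, content_hash) "
                "VALUES (?, ?, ?, ?, ?)",
                (slug, relative_path, now.isoformat(), now.isoformat(),
                 hashlib.sha256(data).hexdigest()),
            )
    except sqlite3.Error:
        path.unlink()  # the insert failed, so remove the file written above
        raise
    return path
```

Writing the file before the insert means a crash between the two steps leaves at most an orphaned file, which an integrity check can report, rather than a database record pointing at nothing.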

### Reading a Note

**By Slug**:
1. Query database for file_path by slug
2. Read file content from disk
3. Return content + metadata

**For List**:
1. Query database for metadata (sorted, filtered)
2. Optionally read file content for each note
3. Return list with metadata and content
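
Both read paths can be sketched with the standard `sqlite3` module. The function names `read_note` and `list_notes`, the dictionary return shape, and the `with_content` flag are assumptions for illustration only.

```python
import sqlite3
from pathlib import Path
from typing import Optional


def read_note(conn: sqlite3.Connection, notes_root: Path, slug: str) -> Optional[dict]:
    """Look up metadata by slug, then load the content from disk."""
    columns = ("slug", "file_path", "published", "created_at", "updated_at")
    row = conn.execute(
        "SELECT slug, file_path, published, created_at, updated_at "
        "FROM notes WHERE slug = ?",
        (slug,),
    ).fetchone()
    if row is None:
        return None
    note = dict(zip(columns, row))
    note["content"] = (notes_root / note["file_path"]).read_text(encoding="utf-8")
    return note


def list_notes(conn: sqlite3.Connection, notes_root: Path, with_content: bool = False) -> list:
    """Return newest-first metadata; files are read only when requested."""
    rows = conn.execute(
        "SELECT slug, file_path, created_at FROM notes ORDER BY created_at DESC"
    ).fetchall()
    notes = [dict(zip(("slug", "file_path", "created_at"), row)) for row in rows]
    if with_content:
        for note in notes:
            note["content"] = (notes_root / note["file_path"]).read_text(encoding="utf-8")
    return notes
```

Keeping file reads optional in the list path lets index pages be served from the database alone, which is where the hybrid design gets its query speed.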

### Updating a Note

1. Query database for existing file_path
2. Write new content to file (atomic write to temp, then rename)
3. Calculate new content hash
4. Update database record (`updated_at`, `content_hash`)
5. Return success

**Transaction Safety**: Keep backup of original file until database update succeeds
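
One way to combine the atomic write with the backup rule is sketched below. The function name `update_note` and the `.bak` and `.tmp` suffixes are assumptions, and the sketch presumes a `notes` table with `slug`, `file_path`, `updated_at`, and `content_hash` columns.

```python
import hashlib
import os
import shutil
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def update_note(conn: sqlite3.Connection, notes_root: Path, slug: str, new_content: str) -> None:
    """Replace a note's content, restoring the original file if anything fails."""
    row = conn.execute("SELECT file_path FROM notes WHERE slug = ?", (slug,)).fetchone()
    if row is None:
        raise KeyError(slug)

    path = notes_root / row[0]
    backup = path.with_name(path.name + ".bak")
    temp = path.with_name(path.name + ".tmp")
    data = new_content.encode("utf-8")

    shutil.copy2(path, backup)  # keep the original until the database update succeeds
    try:
        temp.write_bytes(data)
        os.replace(temp, path)  # atomic rename within one filesystem
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "UPDATE notes SET updated_at = ?, content_hash = ? WHERE slug = ?",
                (datetime.now(timezone.utc).isoformat(),
                 hashlib.sha256(data).hexdigest(), slug),
            )
    except Exception:
        os.replace(backup, path)  # put the original content back
        temp.unlink(missing_ok=True)
        raise
    backup.unlink()
```

The backup is removed only after the database commit, so a failure at any earlier point leaves the previous content in place.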

### Deleting a Note

**Soft Delete (Recommended)**:
1. Update database: set `deleted_at` timestamp
2. Optionally move file to `.trash/` subdirectory
3. Return success

**Hard Delete**:
1. Delete database record
2. Delete file from filesystem
3. Return success
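
A soft delete could be sketched as follows. This assumes the `notes` table has a nullable `deleted_at` column and that `.trash/` lives under the notes root and mirrors the `YYYY/MM/` layout; both are assumptions, as is the function name `soft_delete_note`.

```python
import os
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def soft_delete_note(conn: sqlite3.Connection, notes_root: Path, slug: str,
                     move_to_trash: bool = True) -> bool:
    """Mark a note as deleted; return False if it is missing or already deleted."""
    row = conn.execute(
        "SELECT file_path FROM notes WHERE slug = ? AND deleted_at IS NULL",
        (slug,),
    ).fetchone()
    if row is None:
        return False

    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE notes SET deleted_at = ? WHERE slug = ?",
            (datetime.now(timezone.utc).isoformat(), slug),
        )

    if move_to_trash:
        source = notes_root / row[0]
        target = notes_root / ".trash" / row[0]
        target.parent.mkdir(parents=True, exist_ok=True)
        # file_path in the database is left unchanged, so moving the file back restores the note.
        os.replace(source, target)
    return True
```

Because the record survives, a soft-deleted note can be restored by clearing `deleted_at` and moving the file back out of `.trash/`.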

## File System Operations

### Atomic Writes
```python
# Atomic file write: write to a temp file, then rename it over the target
import os

def write_note_safely(path, content):
    temp_path = f"{path}.tmp"
    with open(temp_path, "w", encoding="utf-8") as f:
        f.write(content)
        f.flush()
        os.fsync(f.fileno())  # Push data to disk before the rename
    os.replace(temp_path, path)  # Atomic on POSIX when both paths are on the same filesystem
```

### Directory Creation
```python
# Ensure directory exists before writing
import os

def ensure_note_directory(year, month):
    path = f"data/notes/{year}/{int(month):02d}"  # Zero-pad the month to match YYYY/MM
    os.makedirs(path, exist_ok=True)
    return path
```

### Slug Generation
```python
# Generate URL-safe slug; `database` stands in for the application's data-access layer
import re
import secrets
from datetime import datetime

def generate_slug(database, content=None, timestamp=None):
    slug = ""
    if content:
        # Extract first few words, normalize: lowercase, hyphens, no special chars
        words = content.split()[:5]
        slug = re.sub(r"[^a-z0-9]+", "-", " ".join(words).lower()).strip("-")
    if not slug:
        # Fallback: timestamp-based
        slug = (timestamp or datetime.now()).strftime("%Y%m%d-%H%M%S")

    # Ensure uniqueness; retry in case the suffixed slug is also taken
    base = slug
    while database.slug_exists(slug):
        slug = f"{base}-{secrets.token_hex(2)}"

    return slug
```

## Backup Strategy

### Simple Backup
```bash
# User can back up with a simple copy
cp -r data/ backup/

# Or with rsync
rsync -av data/ backup/

# Or with git (assumes data/ is already a git repository)
cd data/ && git add . && git commit -m "Backup"
```
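
### Restore Strategy
1. Copy `data/` directory to new location
2. Application reads database
3. If database missing or corrupt, rebuild from files:
```python
# Rebuild sketch; `database` stands in for the application's data-access layer
import hashlib
from datetime import datetime, timezone
from pathlib import Path

NOTES_ROOT = Path("data/notes")

def rebuild_database_from_files(database):
    for file_path in NOTES_ROOT.rglob("*.md"):
        content = file_path.read_bytes()
        stat = file_path.stat()
        # Creation time is not available on every platform; fall back to modification time
        created = getattr(stat, "st_birthtime", stat.st_mtime)
        database.insert_note(
            slug=file_path.stem,
            file_path=file_path.relative_to(NOTES_ROOT).as_posix(),
            created_at=datetime.fromtimestamp(created, tz=timezone.utc),
            updated_at=datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc),
            content_hash=hashlib.sha256(content).hexdigest(),
        )
```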

## Standards Compliance

### Markdown Standard
- CommonMark specification
- No custom extensions in V1
- Standard markdown processors can read files

### File System Compatibility
- ASCII-safe filenames
- No special characters in paths
- Maximum path length under 255 characters
- POSIX-compatible directory structure
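
These constraints can be checked before a path is written. The sketch below is one possible reading: the list above does not say which characters count as special, so the allowed set here (ASCII letters, digits, `.`, `_`, `-`, and `/` as separator) is an assumption, as is the name `is_portable_path`.

```python
import re

# Assumed allow-list: ASCII letters, digits, dot, underscore, hyphen, and "/" as separator.
PORTABLE_PATH = re.compile(r"[A-Za-z0-9._/-]+")


def is_portable_path(relative_path: str, max_length: int = 255) -> bool:
    """Return True if the path is ASCII-safe and shorter than max_length characters."""
    return (
        len(relative_path) < max_length
        and PORTABLE_PATH.fullmatch(relative_path) is not None
    )
```

A path such as `2024/11/my-first-note.md` passes, while one containing a space or a non-ASCII character is rejected.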

## Alternatives Considered

### All-Database Storage (Rejected)
- **Simplicity**: 8/10 - Simpler code, single source of truth
- **Portability**: 2/10 - Requires database export
- **Ownership**: 3/10 - User doesn't have direct access
- **Verdict**: Violates user requirement for file-based storage

### Flat File Directory (Rejected)
```
data/notes/
├── note-1.md
├── note-2.md
├── note-3.md
...
└── note-9999.md
```
- **Simplicity**: 10/10 - Simplest possible structure
- **Scalability**: 3/10 - Thousands of files in one directory is slow
- **Verdict**: Not scalable, poor performance with many notes

### Git-Based Storage (Rejected for V1)
- **Simplicity**: 6/10 - Requires git integration
- **Portability**: 9/10 - Excellent versioning
- **Performance**: 7/10 - Git operations have overhead
- **Verdict**: Interesting for V2, but adds complexity to V1

### Frontmatter in Files (Rejected for V1)
```markdown
---
slug: my-note
created: 2024-11-18
published: true
---
Note content here
```
- **Simplicity**: 7/10 - Requires YAML parsing
- **Portability**: 8/10 - Common pattern, but not pure markdown
- **Single Source**: 10/10 - All data in one place
- **Verdict**: Deferred to V2; V1 keeps files pure

### JSON Metadata Sidecar (Rejected)
```
notes/
├── my-note.md
└── my-note.json     # Metadata
```
- **Simplicity**: 6/10 - Doubles number of files
- **Portability**: 7/10 - Markdown still clean, but extra files
- **Sync Issues**: 5/10 - Must keep two files in sync
- **Verdict**: Database metadata is cleaner

## Implementation Checklist

- [ ] Create data/notes directory structure on initialization
- [ ] Implement slug generation algorithm
- [ ] Implement atomic file write operations
- [ ] Implement content hash calculation
- [ ] Create database schema with indexes
- [ ] Implement sync between files and database
- [ ] Implement orphan detection (optional for V1)
- [ ] Add file system error handling
- [ ] Create backup documentation for users
- [ ] Test with thousands of notes for performance

## References
- CommonMark Spec: https://spec.commonmark.org/
- POSIX File Operations: https://pubs.opengroup.org/onlinepubs/9699919799/
- File System Best Practices: https://www.pathname.com/fhs/
- Atomic File Operations: https://lwn.net/Articles/457667/