StarPunk/docs/decisions/ADR-004-file-based-note-storage.md
2025-11-18 19:21:31 -07:00


ADR-004: File-Based Note Storage Architecture

Status

Accepted

Context

The user explicitly requires notes to be stored as files on disk rather than as database records. This is critical for:

  1. Data portability - notes can be backed up, moved, and read without the application
  2. User ownership - direct access to content in human-readable format
  3. Simplicity - text files are the simplest storage mechanism
  4. Future-proofing - plain-text markdown does not depend on any particular application to stay readable

However, we also need SQLite for:

  • Metadata (timestamps, slugs, published status)
  • Authentication tokens
  • Fast querying and indexing
  • Relational data

The challenge is designing how file-based storage and database metadata work together efficiently.

Decision

Hybrid Architecture: Files + Database Metadata

Notes Content: Stored as markdown files on disk
Notes Metadata: Stored in SQLite database
Source of Truth: Files are authoritative for content; database is authoritative for metadata

File Storage Strategy

Directory Structure

data/
├── notes/
│   ├── 2024/
│   │   ├── 11/
│   │   │   ├── my-first-note.md
│   │   │   └── another-note.md
│   │   └── 12/
│   │       └── december-note.md
│   └── 2025/
│       └── 01/
│           └── new-year-note.md
├── starpunk.db          # SQLite database
└── .backups/            # Optional backup directory

File Naming Convention

  • Format: {slug}.md
  • Slug rules: lowercase, alphanumeric, hyphens only, no spaces
  • Example: my-first-note.md
  • Uniqueness: Enforced by the database (UNIQUE constraint on slug); the filesystem alone only prevents duplicate names within the same YYYY/MM directory
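A minimal sketch of checking these rules in Python, the language the ADR's pseudo-code resembles. `is_valid_slug` and `SLUG_PATTERN` are hypothetical names, and rejecting leading, trailing, or doubled hyphens is an assumption the rules above do not spell out.

```python
import re

# Hypothetical validator for the V1 slug rules. Assumption: hyphens only join
# words, so leading, trailing, or doubled hyphens are rejected.
SLUG_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_slug(slug: str) -> bool:
    """Return True if slug is lowercase ASCII alphanumerics joined by single hyphens."""
    return SLUG_PATTERN.fullmatch(slug) is not None


print(is_valid_slug("my-first-note"))  # True
print(is_valid_slug("My First Note"))  # False
```

Using `fullmatch` rather than `match` with `$` avoids accepting a slug with a trailing newline.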

File Organization

  • Pattern: Year/Month subdirectories (YYYY/MM/)
  • Rationale:
    • Keeps directories manageable (around 30 files per month at one note per day)
    • Easy chronological browsing
    • Matches natural mental model
    • Scalable to thousands of notes
  • Example path: data/notes/2024/11/my-first-note.md
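The year/month directory can be derived from the note's creation time. This sketch assumes `created_at` is a `datetime` and returns the path relative to data/notes/, which is what the schema's file_path column stores; `note_relative_path` and `NOTES_ROOT` are hypothetical names.

```python
from datetime import datetime
from pathlib import PurePosixPath

NOTES_ROOT = PurePosixPath("data/notes")  # assumed root, per the directory layout


def note_relative_path(slug: str, created_at: datetime) -> PurePosixPath:
    """YYYY/MM/{slug}.md, i.e. the value stored in notes.file_path."""
    return PurePosixPath(f"{created_at:%Y}", f"{created_at:%m}", f"{slug}.md")


rel = note_relative_path("my-first-note", datetime(2024, 11, 18))
print(rel)               # 2024/11/my-first-note.md
print(NOTES_ROOT / rel)  # data/notes/2024/11/my-first-note.md
```

`%m` is zero-padded, so January notes land in 01/ rather than 1/, keeping directory names sortable.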

Database Schema

CREATE TABLE notes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    slug TEXT UNIQUE NOT NULL,           -- URL identifier
    file_path TEXT UNIQUE NOT NULL,      -- Relative path from data/notes/
    published BOOLEAN DEFAULT 0,         -- Publication status
    created_at TIMESTAMP NOT NULL,       -- Creation timestamp
    updated_at TIMESTAMP NOT NULL,       -- Last modification timestamp
    content_hash TEXT,                   -- SHA-256 of file content for change detection
    deleted_at TIMESTAMP                 -- Soft-delete timestamp (NULL while the note is live)
);

CREATE INDEX idx_notes_created_at ON notes(created_at DESC);
CREATE INDEX idx_notes_published ON notes(published);
CREATE INDEX idx_notes_slug ON notes(slug);
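One way to apply this schema with Python's built-in sqlite3 module, as a sketch. It leaves out idx_notes_slug because SQLite already maintains an automatic index for the UNIQUE constraint on slug, and it assumes timestamps are stored as ISO 8601 UTC strings so that ordering by text matches ordering by time. `init_db`, `published_slugs`, and the default database path are hypothetical choices.

```python
import sqlite3

# Same table and indexes as the schema above, minus idx_notes_slug: SQLite
# already creates an index to enforce the UNIQUE constraint on slug.
SCHEMA = """
CREATE TABLE IF NOT EXISTS notes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    slug TEXT UNIQUE NOT NULL,
    file_path TEXT UNIQUE NOT NULL,
    published BOOLEAN DEFAULT 0,
    created_at TIMESTAMP NOT NULL,
    updated_at TIMESTAMP NOT NULL,
    content_hash TEXT
);
CREATE INDEX IF NOT EXISTS idx_notes_created_at ON notes(created_at DESC);
CREATE INDEX IF NOT EXISTS idx_notes_published ON notes(published);
"""


def init_db(path: str = "data/starpunk.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def published_slugs(conn: sqlite3.Connection, limit: int = 20) -> list[str]:
    # Assumes ISO 8601 UTC timestamp strings, so text order equals time order.
    rows = conn.execute(
        "SELECT slug FROM notes WHERE published = 1 ORDER BY created_at DESC LIMIT ?",
        (limit,),
    )
    return [slug for (slug,) in rows]
```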

File Format

Markdown File Structure

[Content of the note in markdown format]

That's it. No frontmatter, no metadata in file. Keep it pure.

Rationale:

  • Maximum portability
  • Readable by any markdown editor
  • No custom parsing required
  • Metadata belongs in database (timestamps, slugs, etc.)
  • User sees just their content when opening file

Optional Future Enhancement (V2+)

If frontmatter becomes necessary, use standard YAML:

---
title: Optional Title
tags: [tag1, tag2]
---
[Content here]

But for V1: NO frontmatter.

Rationale

File Storage Benefits

Simplicity Score: 10/10

  • Text files are the simplest storage
  • No binary formats
  • Human-readable
  • Easy to back up (rsync, git, Dropbox, etc.)

Portability Score: 10/10

  • Standard markdown format
  • Readable without application
  • Can be edited in any text editor
  • Easy to migrate to other systems

Ownership Score: 10/10

  • User has direct access to their content
  • No vendor lock-in
  • Can grep their own notes
  • Backup is simple file copy

Hybrid Approach Benefits

Performance: Database indexes enable fast queries
Flexibility: Rich metadata without cluttering files
Integrity: Database enforces uniqueness and relationships
Simplicity: Each system does what it's best at

Consequences

Positive

  • Notes are portable markdown files
  • User can edit notes directly in filesystem if desired
  • Easy backup (just copy data/ directory)
  • Database provides fast metadata queries
  • Can rebuild database from files if needed
  • Git-friendly (can version control notes)
  • Maximum data ownership

Negative

  • Must keep file and database in sync
  • Potential for orphaned database records
  • Potential for orphaned files
  • File operations are slower than database queries
  • Must handle file system errors

Mitigation Strategies

Sync Strategy

  1. On note creation: Write file FIRST, then database record
  2. On note update: Update file FIRST, then database record (update timestamp, content_hash)
  3. On note delete: Mark as deleted in database, optionally move file to .trash/
  4. On startup: Optional integrity check to detect orphans

Orphan Detection

# Pseudo-code for integrity check
def check_integrity():
    # Find database records without files
    for note in database.all_notes():
        if not file_exists(note.file_path):
            log_error(f"Orphaned database record: {note.slug}")

    # Find files without database records
    for file in filesystem.all_markdown_files():
        if not database.has_note(file_path=file):
            log_error(f"Orphaned file: {file}")
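A runnable variant of the check above, assuming a sqlite3 connection to the notes table and file_path values stored relative to data/notes/ with forward slashes. It treats every row as live, so soft-deleted notes whose files were moved to .trash/ would need to be filtered out first. `find_orphans` is a hypothetical name.

```python
import sqlite3
from pathlib import Path


def find_orphans(conn: sqlite3.Connection, notes_root: Path) -> tuple[list[str], list[str]]:
    """Return (records without files, files without records) as paths relative to notes_root."""
    db_paths = {file_path for (file_path,) in conn.execute("SELECT file_path FROM notes")}
    disk_paths = {
        p.relative_to(notes_root).as_posix() for p in notes_root.rglob("*.md")
    }
    missing_files = sorted(db_paths - disk_paths)    # orphaned database records
    untracked_files = sorted(disk_paths - db_paths)  # orphaned files
    return missing_files, untracked_files
```

Comparing two sets keeps the check to one query and one directory walk instead of a query per file.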

Content Hash Strategy

  • Calculate SHA-256 hash of file content on write
  • Store hash in database
  • On read, can verify content hasn't been externally modified
  • Enables change detection and cache invalidation
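A sketch of the hash calculation and the external-edit check. It hashes the raw bytes written to disk rather than decoded text, an assumption that avoids false mismatches from newline translation; `content_hash` and `is_modified_externally` are hypothetical names.

```python
import hashlib
from pathlib import Path


def content_hash(data: bytes) -> str:
    """SHA-256 hex digest (64 characters) of the exact bytes written to disk."""
    return hashlib.sha256(data).hexdigest()


def is_modified_externally(path: Path, stored_hash: str) -> bool:
    """True if the file on disk no longer matches notes.content_hash."""
    return content_hash(path.read_bytes()) != stored_hash
```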

Data Flow Patterns

Creating a Note

  1. Generate slug from content or timestamp
  2. Determine file path: data/notes/{YYYY}/{MM}/{slug}.md
  3. Create directories if needed
  4. Write markdown content to file
  5. Calculate content hash
  6. Insert record into database
  7. Return success

Transaction Safety: If database insert fails, delete file and raise error
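The creation steps and the transaction-safety rule might look like this with the standard library. This is a sketch under several assumptions: the slug has already been generated (steps 1 and 7 are left to the caller), timestamps are ISO 8601 UTC strings, and opening the file in exclusive mode ("x") is an added guard against overwriting an existing file. `create_note` is a hypothetical name.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def create_note(conn: sqlite3.Connection, notes_root: Path, slug: str,
                content: str, published: bool = False) -> str:
    now = datetime.now(timezone.utc)
    rel_path = f"{now:%Y}/{now:%m}/{slug}.md"            # step 2
    full_path = notes_root / rel_path
    full_path.parent.mkdir(parents=True, exist_ok=True)  # step 3
    data = content.encode("utf-8")
    with open(full_path, "xb") as f:                     # step 4; "x" never overwrites
        f.write(data)
    digest = hashlib.sha256(data).hexdigest()            # step 5
    try:
        with conn:                                       # step 6; commit or roll back
            conn.execute(
                "INSERT INTO notes (slug, file_path, published, created_at,"
                " updated_at, content_hash) VALUES (?, ?, ?, ?, ?, ?)",
                (slug, rel_path, int(published), now.isoformat(),
                 now.isoformat(), digest),
            )
    except sqlite3.Error:
        full_path.unlink()  # transaction safety: no file without a record
        raise
    return rel_path
```

A slug collision surfaces as `sqlite3.IntegrityError` from the UNIQUE constraint, and the just-written file is removed before the error propagates.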

Reading a Note

By Slug:

  1. Query database for file_path by slug
  2. Read file content from disk
  3. Return content + metadata

For List:

  1. Query database for metadata (sorted, filtered)
  2. Optionally read file content for each note
  3. Return list with metadata and content
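A sketch of both read paths, assuming the notes table from the schema and file_path values relative to data/notes/. The list query returns metadata only and leaves reading file content to the caller, which covers the optional step 2. `get_note` and `list_notes` are hypothetical names.

```python
import sqlite3
from pathlib import Path
from typing import Optional


def get_note(conn: sqlite3.Connection, notes_root: Path, slug: str) -> Optional[dict]:
    row = conn.execute(
        "SELECT file_path, published, created_at, updated_at FROM notes WHERE slug = ?",
        (slug,),
    ).fetchone()                       # step 1: metadata and file_path by slug
    if row is None:
        return None
    file_path, published, created_at, updated_at = row
    return {
        "slug": slug,
        "content": (notes_root / file_path).read_text(encoding="utf-8"),  # step 2
        "published": bool(published),
        "created_at": created_at,
        "updated_at": updated_at,
    }


def list_notes(conn: sqlite3.Connection, published_only: bool = True,
               limit: int = 20) -> list[tuple]:
    # Metadata only; the caller decides whether to read each file.
    sql = "SELECT slug, file_path, created_at FROM notes"
    if published_only:
        sql += " WHERE published = 1"
    sql += " ORDER BY created_at DESC LIMIT ?"
    return conn.execute(sql, (limit,)).fetchall()
```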

Updating a Note

  1. Query database for existing file_path
  2. Write new content to file (atomic write to temp, then rename)
  3. Calculate new content hash
  4. Update database record (timestamp, content_hash)
  5. Return success

Transaction Safety: Keep backup of original file until database update succeeds
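A sketch of the update flow that keeps a .bak copy of the original file until the database update commits, and restores it if the update fails. The .bak and .tmp suffixes and `update_note` are assumptions; neither suffix matches `*.md`, so a scan for markdown files would not pick them up.

```python
import hashlib
import os
import shutil
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def update_note(conn: sqlite3.Connection, notes_root: Path, slug: str,
                new_content: str) -> None:
    row = conn.execute("SELECT file_path FROM notes WHERE slug = ?", (slug,)).fetchone()  # step 1
    if row is None:
        raise KeyError(slug)
    path = notes_root / row[0]
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)             # keep the original until the DB commit succeeds
    data = new_content.encode("utf-8")
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_bytes(data)
    os.replace(tmp, path)                  # step 2: temp file, then atomic rename
    try:
        with conn:                         # step 4: commit, or roll back on error
            conn.execute(
                "UPDATE notes SET updated_at = ?, content_hash = ? WHERE slug = ?",
                (datetime.now(timezone.utc).isoformat(),
                 hashlib.sha256(data).hexdigest(), slug),  # step 3: new hash
            )
    except sqlite3.Error:
        os.replace(backup, path)           # restore the original content
        raise
    backup.unlink()                        # update succeeded; drop the backup
```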

Deleting a Note

Soft Delete (Recommended):

  1. Update database: set deleted_at timestamp
  2. Optionally move file to .trash/ subdirectory
  3. Return success

Hard Delete:

  1. Delete database record
  2. Delete file from filesystem
  3. Return success
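A sketch of the recommended soft delete. It assumes the notes table has a nullable deleted_at column, which step 1 relies on, and that the trash directory is passed in, since the ADR does not pin down where .trash/ lives. Read queries would then also need `WHERE deleted_at IS NULL`. `soft_delete_note` is a hypothetical name.

```python
import shutil
import sqlite3
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def soft_delete_note(conn: sqlite3.Connection, notes_root: Path, slug: str,
                     trash_root: Optional[Path] = None) -> bool:
    """Mark a note deleted; optionally move its file under trash_root."""
    with conn:  # step 1: set deleted_at in one committed transaction
        cur = conn.execute(
            "UPDATE notes SET deleted_at = ? WHERE slug = ? AND deleted_at IS NULL",
            (datetime.now(timezone.utc).isoformat(), slug),
        )
    if cur.rowcount == 0:
        return False  # unknown slug, or already deleted
    if trash_root is not None:  # step 2 (optional): mirror the YYYY/MM layout in the trash
        (rel,) = conn.execute("SELECT file_path FROM notes WHERE slug = ?", (slug,)).fetchone()
        dest = trash_root / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(notes_root / rel), str(dest))
    return True
```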

File System Operations

Atomic Writes

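# Atomic file write: write a temp file, then rename it over the target
import os

def write_note_safely(path, content):
    temp_path = f"{path}.tmp"
    with open(temp_path, "w", encoding="utf-8") as f:
        f.write(content)
        f.flush()
        os.fsync(f.fileno())         # make sure the bytes are on disk before renaming
    os.replace(temp_path, path)      # atomic on POSIX when both paths are on the same filesystem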

Directory Creation

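# Ensure the YYYY/MM directory exists before writing
import os

def ensure_note_directory(year, month):
    # Zero-pad so month 1 becomes "01", matching the YYYY/MM layout
    path = os.path.join("data", "notes", f"{year:04d}", f"{month:02d}")
    os.makedirs(path, exist_ok=True)
    return path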

Slug Generation

# Generate URL-safe slug
def generate_slug(content=None, timestamp=None):
    slug = ""
    if content:
        # Extract first few words, normalize
        words = extract_first_words(content, max=5)
        slug = normalize(words)  # lowercase, hyphens, no special chars
    if not slug:
        # Fallback: timestamp-based (also covers content that normalizes to nothing)
        slug = (timestamp or now()).strftime("%Y%m%d-%H%M%S")

    # Ensure uniqueness; loop in case the suffixed slug is also taken
    base = slug
    while database.slug_exists(slug):
        slug = f"{base}-{random_suffix()}"

    return slug
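A runnable version of the slug logic, as a sketch: `slugify` stands in for the pseudo-code's `extract_first_words` and `normalize`, `secrets.token_hex(2)` stands in for `random_suffix`, and the `exists` callback stands in for `database.slug_exists`. Folding accents to ASCII is an assumption chosen to match the ASCII-safe filename rule.

```python
import re
import secrets
import unicodedata
from datetime import datetime
from typing import Callable, Optional


def slugify(text: str, max_words: int = 5) -> str:
    """Lowercase ASCII words from the start of text, joined by hyphens."""
    # Fold accents (é -> e), then drop anything that is still non-ASCII.
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    words = re.findall(r"[a-z0-9]+", ascii_text.lower())
    return "-".join(words[:max_words])


def generate_slug(content: Optional[str] = None,
                  timestamp: Optional[datetime] = None,
                  exists: Callable[[str], bool] = lambda s: False) -> str:
    slug = slugify(content) if content else ""
    if not slug:  # no content, or nothing usable left after normalization
        slug = (timestamp or datetime.now()).strftime("%Y%m%d-%H%M%S")
    candidate = slug
    while exists(candidate):  # stand-in for database.slug_exists
        candidate = f"{slug}-{secrets.token_hex(2)}"  # 4 lowercase hex characters
    return candidate
```

The existence check and the later INSERT are separate steps, so two concurrent requests could still pick the same slug; the UNIQUE constraint on slug remains the final guard.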

Backup Strategy

Simple Backup

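# User can back up with a simple copy
# (stop the application first so starpunk.db is not copied mid-write)
cp -r data/ backup/

# Or with rsync
rsync -av data/ backup/

# Or with git (run "git init" in data/ once beforehand)
cd data/ && git add . && git commit -m "Backup"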
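Copying starpunk.db while the application is writing to it can capture a half-written file. A sketch of a safer programmatic backup, assuming the data/ layout above: it copies notes/ as plain files and uses sqlite3's `Connection.backup` (Python 3.7+) for a consistent database snapshot. `backup_data_dir` is a hypothetical name, and `dirs_exist_ok` needs Python 3.8+.

```python
import shutil
import sqlite3
from pathlib import Path


def backup_data_dir(data_dir: Path, dest_dir: Path) -> None:
    """Copy notes/ as plain files and snapshot starpunk.db with SQLite's online backup API."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    shutil.copytree(data_dir / "notes", dest_dir / "notes", dirs_exist_ok=True)
    src = sqlite3.connect(data_dir / "starpunk.db")
    dst = sqlite3.connect(dest_dir / "starpunk.db")
    try:
        src.backup(dst)  # consistent snapshot, unlike a raw copy taken mid-write
    finally:
        src.close()
        dst.close()
```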

Restore Strategy

  1. Copy data/ directory to new location
  2. Application reads database
  3. If database missing or corrupt, rebuild from files:
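    def rebuild_database_from_files():
        notes_root = "data/notes"
        # recursive=True is required for ** to descend into YYYY/MM/
        for file_path in glob(f"{notes_root}/**/*.md", recursive=True):
            content = read_file(file_path)
            file_stat = os.stat(file_path)
            database.insert_note(
                slug=Path(file_path).stem,                         # filename is {slug}.md
                file_path=os.path.relpath(file_path, notes_root),  # relative to data/notes/
                published=False,                # not stored in files; must be restored by hand
                created_at=file_stat.st_mtime,  # creation time is not portable; mtime is the fallback
                updated_at=file_stat.st_mtime,
                content_hash=sha256(content)
            )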
    

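A runnable sketch of the rebuild with the standard library, assuming the notes table already exists. Because files hold no metadata, it can only approximate: file modification time stands in for both timestamps and every rebuilt note comes back unpublished. `INSERT OR IGNORE` skips a file whose slug or path already has a row. `rebuild_notes_table` is a hypothetical name.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def rebuild_notes_table(conn: sqlite3.Connection, notes_root: Path) -> int:
    """Insert one row per notes_root/**/*.md file; return the number of rows added."""
    added = 0
    with conn:
        for path in sorted(notes_root.rglob("*.md")):
            data = path.read_bytes()
            # Files carry no metadata: mtime stands in for both timestamps, and
            # published defaults to 0 because publication status is not recoverable.
            mtime = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc).isoformat()
            cur = conn.execute(
                "INSERT OR IGNORE INTO notes (slug, file_path, published, created_at,"
                " updated_at, content_hash) VALUES (?, ?, 0, ?, ?, ?)",
                (path.stem, path.relative_to(notes_root).as_posix(), mtime, mtime,
                 hashlib.sha256(data).hexdigest()),
            )
            added += cur.rowcount
    return added
```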
Standards Compliance

Markdown Standard

  • CommonMark specification
  • No custom extensions in V1
  • Standard markdown processors can read files

File System Compatibility

  • ASCII-safe filenames
  • No special characters in paths
  • Maximum path length under 255 characters
  • POSIX-compatible directory structure

Alternatives Considered

All-Database Storage (Rejected)

  • Simplicity: 8/10 - Simpler code, single source of truth
  • Portability: 2/10 - Requires database export
  • Ownership: 3/10 - User doesn't have direct access
  • Verdict: Violates user requirement for file-based storage

Flat File Directory (Rejected)

data/notes/
├── note-1.md
├── note-2.md
├── note-3.md
...
├── note-9999.md
  • Simplicity: 10/10 - Simplest possible structure
  • Scalability: 3/10 - Thousands of files in one directory are slow to list and awkward to browse
  • Verdict: Not scalable, poor performance with many notes

Git-Based Storage (Rejected for V1)

  • Simplicity: 6/10 - Requires git integration
  • Portability: 9/10 - Excellent versioning
  • Performance: 7/10 - Git operations have overhead
  • Verdict: Interesting for V2, but adds complexity to V1

Frontmatter in Files (Rejected for V1)

---
slug: my-note
created: 2024-11-18
published: true
---
Note content here
  • Simplicity: 7/10 - Requires YAML parsing
  • Portability: 8/10 - Common pattern, but not pure markdown
  • Single Source: 10/10 - All data in one place
  • Verdict: Deferred to V2; V1 keeps files pure

JSON Metadata Sidecar (Rejected)

notes/
├── my-note.md
├── my-note.json  # Metadata
  • Simplicity: 6/10 - Doubles number of files
  • Portability: 7/10 - Markdown still clean, but extra files
  • Sync Issues: 5/10 - Must keep two files in sync
  • Verdict: Database metadata is cleaner

Implementation Checklist

  • Create data/notes directory structure on initialization
  • Implement slug generation algorithm
  • Implement atomic file write operations
  • Implement content hash calculation
  • Create database schema with indexes
  • Implement sync between files and database
  • Implement orphan detection (optional for V1)
  • Add file system error handling
  • Create backup documentation for users
  • Test with thousands of notes for performance

References