phil/StarPunk

Phil Skentelbery a68fd570c7 that initial commit

2025-11-18 19:21:31 -07:00

ADR-004: File-Based Note Storage Architecture

Status

Accepted

Context

The user explicitly requires notes to be stored as files on disk rather than as database records. This is critical for:

Data portability - notes can be backed up, moved, and read without the application
User ownership - direct access to content in human-readable format
Simplicity - text files are the simplest storage mechanism
Future-proofing - markdown files will be readable forever

However, we also need SQLite for:

Metadata (timestamps, slugs, published status)
Authentication tokens
Fast querying and indexing
Relational data

The challenge is designing how file-based storage and database metadata work together efficiently.

Decision

Hybrid Architecture: Files + Database Metadata

Notes Content: Stored as markdown files on disk
Notes Metadata: Stored in SQLite database
Source of Truth: Files are authoritative for content; database is authoritative for metadata

File Storage Strategy

Directory Structure

data/
├── notes/
│   ├── 2024/
│   │   ├── 11/
│   │   │   ├── my-first-note.md
│   │   │   └── another-note.md
│   │   └── 12/
│   │       └── december-note.md
│   └── 2025/
│       └── 01/
│           └── new-year-note.md
├── starpunk.db          # SQLite database
└── .backups/            # Optional backup directory

File Naming Convention

Format: {slug}.md
Slug rules: lowercase, alphanumeric, hyphens only, no spaces
Example: my-first-note.md
Uniqueness: The filesystem prevents duplicate names only within one directory; the UNIQUE constraint on slug in the database enforces uniqueness across all notes
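
The slug rules above can be checked with a single regular expression. This is a minimal sketch: the names `SLUG_PATTERN` and `is_valid_slug` are illustrative and not part of StarPunk, and the pattern additionally assumes that hyphens may not lead, trail, or repeat, which the rules above do not state.

```python
import re

# Assumption: hyphens only separate alphanumeric runs (no leading, trailing, or doubled hyphens).
SLUG_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_slug(slug: str) -> bool:
    """Return True if slug is lowercase ASCII alphanumerics separated by single hyphens."""
    # fullmatch requires the whole string to match, so a trailing newline is rejected too.
    return SLUG_PATTERN.fullmatch(slug) is not None
```

A caller would typically validate a user-supplied slug with `is_valid_slug` before building the `{slug}.md` filename.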

File Organization

Pattern: Year/Month subdirectories (YYYY/MM/)
Rationale:
- Keeps directories manageable (about 30 files per month at one note per day)
- Easy chronological browsing
- Matches natural mental model
- Scalable to thousands of notes
Example path: data/notes/2024/11/my-first-note.md
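
As an illustration of the YYYY/MM pattern, the path of a note can be derived from its slug and creation time. This is a sketch under assumptions: `note_relative_path` is a hypothetical helper name, and the result is relative to data/notes/.

```python
from datetime import datetime
from pathlib import PurePosixPath


def note_relative_path(slug: str, created_at: datetime) -> PurePosixPath:
    """Build YYYY/MM/{slug}.md, relative to the notes directory."""
    # %m is always two digits, so January becomes "01" rather than "1".
    return PurePosixPath(f"{created_at:%Y}", f"{created_at:%m}", f"{slug}.md")
```

For a note created on 18 November 2024 with the slug my-first-note, this yields 2024/11/my-first-note.md.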

Database Schema

CREATE TABLE notes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    slug TEXT UNIQUE NOT NULL,           -- URL identifier
    file_path TEXT UNIQUE NOT NULL,      -- Relative path from data/notes/
    published BOOLEAN DEFAULT 0,         -- Publication status
    created_at TIMESTAMP NOT NULL,       -- Creation timestamp
    updated_at TIMESTAMP NOT NULL,       -- Last modification timestamp
    content_hash TEXT,                   -- SHA-256 of file content for change detection
    deleted_at TIMESTAMP                 -- Soft-delete timestamp; NULL while the note is live
);

CREATE INDEX idx_notes_created_at ON notes(created_at DESC);
CREATE INDEX idx_notes_published ON notes(published);
CREATE INDEX idx_notes_slug ON notes(slug);
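
To show how the metadata table might be queried from application code, here is a small sketch using Python's standard sqlite3 module. The function name `list_published` and the `limit` parameter are assumptions made for illustration; only the table and column names come from the schema above.

```python
import sqlite3


def list_published(conn: sqlite3.Connection, limit: int = 20) -> list[tuple[str, str]]:
    """Return (slug, file_path) for published notes, newest first."""
    rows = conn.execute(
        "SELECT slug, file_path FROM notes "
        "WHERE published = 1 ORDER BY created_at DESC LIMIT ?",
        (limit,),
    )
    return rows.fetchall()
```

Only metadata is touched here; the markdown content would be read from disk separately, using the returned file_path.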

File Format

Markdown File Structure

[Content of the note in markdown format]

That's it. No frontmatter, no metadata in file. Keep it pure.

Rationale:

Maximum portability
Readable by any markdown editor
No custom parsing required
Metadata belongs in database (timestamps, slugs, etc.)
User sees just their content when opening file

Optional Future Enhancement (V2+)

If frontmatter becomes necessary, use standard YAML:

---
title: Optional Title
tags: [tag1, tag2]
---
[Content here]

But for V1: NO frontmatter.

Rationale

File Storage Benefits

Simplicity Score: 10/10

Text files are the simplest storage
No binary formats
Human-readable
Easy to backup (rsync, git, Dropbox, etc.)

Portability Score: 10/10

Standard markdown format
Readable without application
Can be edited in any text editor
Easy to migrate to other systems

Ownership Score: 10/10

User has direct access to their content
No vendor lock-in
Can grep their own notes
Backup is simple file copy

Hybrid Approach Benefits

Performance: Database indexes enable fast queries
Flexibility: Rich metadata without cluttering files
Integrity: Database enforces uniqueness and relationships
Simplicity: Each system does what it's best at

Consequences

Positive

Notes are portable markdown files
User can edit notes directly in filesystem if desired
Easy backup (just copy data/ directory)
Database provides fast metadata queries
Can rebuild database from files if needed
Git-friendly (can version control notes)
Maximum data ownership

Negative

Must keep file and database in sync
Potential for orphaned database records
Potential for orphaned files
File operations are slower than database queries
Must handle file system errors

Mitigation Strategies

Sync Strategy

On note creation: Write file FIRST, then database record
On note update: Update file FIRST, then database record (update timestamp, content_hash)
On note delete: Mark as deleted in database, optionally move file to .trash/
On startup: Optional integrity check to detect orphans

Orphan Detection

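# Integrity check: report records and files that have lost their counterpart
import logging
from pathlib import Path

def check_integrity(conn, notes_dir=Path("data/notes")):
    rows = conn.execute("SELECT slug, file_path FROM notes").fetchall()

    # Find database records without files
    for slug, file_path in rows:
        if not (notes_dir / file_path).is_file():
            logging.error("Orphaned database record: %s", slug)

    # Find files without database records (file_path is stored relative to notes_dir)
    known_paths = {file_path for _, file_path in rows}
    for file in notes_dir.rglob("*.md"):
        if file.relative_to(notes_dir).as_posix() not in known_paths:
            logging.error("Orphaned file: %s", file)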

Content Hash Strategy

Calculate SHA-256 hash of file content on write
Store hash in database
On read, compare the stored hash with the hash of the file to detect external modification
Enables change detection and cache invalidation
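
The hash strategy above can be sketched with Python's hashlib. The function names `content_hash` and `file_matches_hash` are illustrative assumptions. Hashing raw bytes rather than decoded text is a choice made here so that line-ending translation cannot change the result.

```python
import hashlib
from pathlib import Path


def content_hash(data: bytes) -> str:
    """Return the SHA-256 hex digest of the raw file bytes."""
    return hashlib.sha256(data).hexdigest()


def file_matches_hash(path: Path, stored_hash: str) -> bool:
    """Return True if the file on disk still matches the hash stored in the database."""
    return content_hash(path.read_bytes()) == stored_hash
```

A mismatch means the file was changed outside the application, at which point the stored hash and updated_at could be refreshed.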

Data Flow Patterns

Creating a Note

Generate slug from content or timestamp
Determine file path: data/notes/{YYYY}/{MM}/{slug}.md
Create directories if needed
Write markdown content to file
Calculate content hash
Insert record into database
Return success

Transaction Safety: If database insert fails, delete file and raise error
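
The creation flow and its rollback rule might look like the following sketch. It assumes a sqlite3 connection to a database that already has the notes table, takes the slug as given, and uses the current UTC time for the path and timestamps. `create_note` and its parameters are hypothetical names.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def create_note(conn: sqlite3.Connection, notes_dir: Path, slug: str, content: str) -> Path:
    """Write the file first, then the database record; remove the file if the insert fails."""
    now = datetime.now(timezone.utc)
    relative = f"{now:%Y}/{now:%m}/{slug}.md"  # stored relative to notes_dir
    path = notes_dir / relative
    path.parent.mkdir(parents=True, exist_ok=True)
    data = content.encode("utf-8")

    # Mode "xb" raises FileExistsError instead of overwriting an existing note.
    with open(path, "xb") as handle:
        handle.write(data)

    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO notes (slug, file_path, created_at, updated_at, content_hash) "
                "VALUES (?, ?, ?, ?, ?)",
                (slug, relative, now.isoformat(), now.isoformat(),
                 hashlib.sha256(data).hexdigest()),
            )
    except sqlite3.Error:
        path.unlink()  # undo the file write so no orphaned file is left behind
        raise
    return path
```

Opening the file in exclusive mode is a design choice of this sketch: a slug collision inside the same month fails before any existing note is touched.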

Reading a Note

By Slug:

Query database for file_path by slug
Read file content from disk
Return content + metadata

For List:

Query database for metadata (sorted, filtered)
Optionally read file content for each note
Return list with metadata and content
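
Reading by slug combines one metadata query with one file read. The sketch below assumes a sqlite3 connection and returns a plain dict; `read_note` and the dict keys are illustrative, not an established StarPunk API.

```python
import sqlite3
from pathlib import Path
from typing import Optional


def read_note(conn: sqlite3.Connection, notes_dir: Path, slug: str) -> Optional[dict]:
    """Look up metadata by slug, then load the markdown content from disk."""
    row = conn.execute(
        "SELECT file_path, published, created_at, updated_at FROM notes WHERE slug = ?",
        (slug,),
    ).fetchone()
    if row is None:
        return None  # no such note in the database
    file_path, published, created_at, updated_at = row
    content = (notes_dir / file_path).read_text(encoding="utf-8")
    return {
        "slug": slug,
        "content": content,
        "published": bool(published),
        "created_at": created_at,
        "updated_at": updated_at,
    }
```

If the record exists but the file is missing, `read_text` raises FileNotFoundError, which is the orphaned-record case described under Mitigation Strategies.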

Updating a Note

Query database for existing file_path
Write new content to file (atomic write to temp, then rename)
Calculate new content hash
Update database record (timestamp, content_hash)
Return success

Transaction Safety: Keep backup of original file until database update succeeds
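
One way to honor the backup rule during an update is sketched below. It assumes a sqlite3 connection, and the `.bak` and `.tmp` suffixes and the name `update_note` are assumptions made for this example.

```python
import hashlib
import os
import shutil
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def update_note(conn: sqlite3.Connection, notes_dir: Path, slug: str, content: str) -> None:
    """Replace a note's content, restoring the original file if any step fails."""
    row = conn.execute("SELECT file_path FROM notes WHERE slug = ?", (slug,)).fetchone()
    if row is None:
        raise KeyError(slug)
    path = notes_dir / row[0]
    backup = path.with_name(path.name + ".bak")
    temp = path.with_name(path.name + ".tmp")
    data = content.encode("utf-8")

    shutil.copy2(path, backup)  # keep the original until the database update succeeds
    try:
        temp.write_bytes(data)
        os.replace(temp, path)  # atomic swap within one filesystem
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "UPDATE notes SET updated_at = ?, content_hash = ? WHERE slug = ?",
                (datetime.now(timezone.utc).isoformat(),
                 hashlib.sha256(data).hexdigest(), slug),
            )
    except Exception:
        os.replace(backup, path)  # put the original content back
        temp.unlink(missing_ok=True)
        raise
    backup.unlink()
```

The backup is removed only after the database commit, so a failure at any earlier point leaves the original file in place.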

Deleting a Note

Soft Delete (Recommended):

Update database: set deleted_at timestamp
Optionally move file to .trash/ subdirectory
Return success

Hard Delete:

Delete database record
Delete file from filesystem
Return success
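
Both delete variants can be sketched as follows. The example assumes that the notes table has a nullable `deleted_at` column and that the trash directory is passed in by the caller, since its exact location is not fixed above. The function names are hypothetical.

```python
import os
import sqlite3
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def _file_path_for(conn: sqlite3.Connection, slug: str) -> Optional[str]:
    row = conn.execute("SELECT file_path FROM notes WHERE slug = ?", (slug,)).fetchone()
    return row[0] if row else None


def soft_delete_note(conn: sqlite3.Connection, notes_dir: Path, trash_dir: Path, slug: str) -> bool:
    """Mark the note as deleted, then move its file into the trash directory."""
    relative = _file_path_for(conn, slug)
    if relative is None:
        return False
    with conn:
        conn.execute(
            "UPDATE notes SET deleted_at = ? WHERE slug = ?",
            (datetime.now(timezone.utc).isoformat(), slug),
        )
    target = trash_dir / relative  # keep the YYYY/MM layout inside the trash
    target.parent.mkdir(parents=True, exist_ok=True)
    os.replace(notes_dir / relative, target)
    return True


def hard_delete_note(conn: sqlite3.Connection, notes_dir: Path, slug: str) -> bool:
    """Delete the database record, then the file."""
    relative = _file_path_for(conn, slug)
    if relative is None:
        return False
    with conn:
        conn.execute("DELETE FROM notes WHERE slug = ?", (slug,))
    (notes_dir / relative).unlink(missing_ok=True)
    return True
```

Keeping the trash outside the notes directory is a choice of this sketch; it stops a recursive scan for markdown files from reporting trashed notes as orphans.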

File System Operations

Atomic Writes

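# Atomic file write: write to a temp file, then rename it over the target
import os

def write_note_safely(path, content):
    temp_path = f"{path}.tmp"
    with open(temp_path, "w", encoding="utf-8") as temp_file:
        temp_file.write(content)
        temp_file.flush()
        os.fsync(temp_file.fileno())  # Flush to disk before the rename
    os.replace(temp_path, path)  # Atomic on POSIX when both paths share a filesystem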

Directory Creation

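# Ensure directory exists before writing
import os

def ensure_note_directory(year, month):
    # Zero-pad so that January becomes "01", matching the YYYY/MM layout
    path = f"data/notes/{int(year):04d}/{int(month):02d}"
    os.makedirs(path, exist_ok=True)
    return path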

Slug Generation

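# Generate URL-safe slug
import re
import secrets
from datetime import datetime

def generate_slug(conn, content=None, timestamp=None):
    # Extract the first five words, normalized: lowercase, hyphens, no special chars
    words = re.findall(r"[a-z0-9]+", (content or "").lower())[:5]
    if words:
        slug = "-".join(words)
    else:
        # Fallback: timestamp-based (also used when content has no usable characters)
        slug = (timestamp or datetime.now()).strftime("%Y%m%d-%H%M%S")

    # Ensure uniqueness: keep trying random suffixes until one is free
    candidate = slug
    while conn.execute("SELECT 1 FROM notes WHERE slug = ?", (candidate,)).fetchone():
        candidate = f"{slug}-{secrets.token_hex(2)}"

    return candidate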

Backup Strategy

Simple Backup

# User can backup with simple copy
cp -r data/ backup/

# Or with rsync
rsync -av data/ backup/

# Or with git
cd data/ && git add . && git commit -m "Backup"

Restore Strategy

Copy data/ directory to new location
Application reads database

If database missing or corrupt, rebuild from files:

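import hashlib
from datetime import datetime, timezone
from pathlib import Path

def rebuild_database_from_files(conn, notes_dir=Path("data/notes")):
    # Assumes the notes table already exists in the (new) database
    for file in sorted(notes_dir.rglob("*.md")):
        # File creation time is not portable, so the modification time is used for both timestamps
        modified = datetime.fromtimestamp(file.stat().st_mtime, tz=timezone.utc).isoformat()
        conn.execute(
            "INSERT INTO notes (slug, file_path, created_at, updated_at, content_hash) "
            "VALUES (?, ?, ?, ?, ?)",
            (
                file.stem,  # Slug is the filename without .md
                file.relative_to(notes_dir).as_posix(),  # Relative path from data/notes/
                modified,
                modified,
                hashlib.sha256(file.read_bytes()).hexdigest(),
            ),
        )
    conn.commit()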

Standards Compliance

Markdown Standard

CommonMark specification
No custom extensions in V1
Standard markdown processors can read files

File System Compatibility

ASCII-safe filenames
No special characters in paths
Maximum path length under 255 characters
POSIX-compatible directory structure

Alternatives Considered

All-Database Storage (Rejected)

Simplicity: 8/10 - Simpler code, single source of truth
Portability: 2/10 - Requires database export
Ownership: 3/10 - User doesn't have direct access
Verdict: Violates user requirement for file-based storage

Flat File Directory (Rejected)

data/notes/
├── note-1.md
├── note-2.md
├── note-3.md
...
├── note-9999.md

Simplicity: 10/10 - Simplest possible structure
Scalability: 3/10 - Thousands of files in one directory is slow
Verdict: Not scalable, poor performance with many notes

Git-Based Storage (Rejected for V1)

Simplicity: 6/10 - Requires git integration
Portability: 9/10 - Excellent versioning
Performance: 7/10 - Git operations have overhead
Verdict: Interesting for V2, but adds complexity to V1

Frontmatter in Files (Rejected for V1)

---
slug: my-note
created: 2024-11-18
published: true
---
Note content here

Simplicity: 7/10 - Requires YAML parsing
Portability: 8/10 - Common pattern, but not pure markdown
Single Source: 10/10 - All data in one place
Verdict: Deferred to V2; V1 keeps files pure

JSON Metadata Sidecar (Rejected)

notes/
├── my-note.md
├── my-note.json  # Metadata

Simplicity: 6/10 - Doubles number of files
Portability: 7/10 - Markdown still clean, but extra files
Sync Issues: 5/10 - Must keep two files in sync
Verdict: Database metadata is cleaner

Implementation Checklist

Create data/notes directory structure on initialization
Implement slug generation algorithm
Implement atomic file write operations
Implement content hash calculation
Create database schema with indexes
Implement sync between files and database
Implement orphan detection (optional for V1)
Add file system error handling
Create backup documentation for users
Test with thousands of notes for performance

References

CommonMark Spec: https://spec.commonmark.org/
POSIX File Operations: https://pubs.opengroup.org/onlinepubs/9699919799/
File System Best Practices: https://www.pathname.com/fhs/
Atomic File Operations: https://lwn.net/Articles/457667/
