Phil Skentelbery 82bb1499d5 docs: Add v1.1.0 architecture and validation documentation
- ADR-033: Database migration redesign
- ADR-034: Full-text search with FTS5
- ADR-035: Custom slugs in Micropub
- ADR-036: IndieAuth token verification method
- ADR-039: Micropub URL construction fix
- Implementation plan and decisions
- Architecture specifications
- Validation reports for implementation and search UI

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 10:39:58 -07:00


ADR-034: Full-Text Search with SQLite FTS5

Status

Proposed

Context

Users need the ability to search through their notes efficiently. Currently, finding specific content requires manually browsing through notes or using external tools. A built-in search capability is essential for any content management system, especially as the number of notes grows.

Requirements:

  • Fast search across all note content
  • Support for phrase searching and boolean operators
  • Ranking by relevance
  • Minimal performance impact on write operations
  • No external dependencies (Elasticsearch, Solr, etc.)
  • Works with existing SQLite database

Decision

Implement full-text search using SQLite's FTS5 (Full-Text Search version 5) extension:

  1. FTS5 Virtual Table: Create an FTS5 virtual table that indexes note content alongside the notes table
  2. Synchronized Updates: Keep FTS index in sync with note operations
  3. Search Endpoint: New /api/search endpoint for queries
  4. Search UI: Simple search interface in the web UI
  5. Advanced Operators: Support FTS5's query syntax for power users

Database schema:

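-- FTS5 virtual table for note content
CREATE VIRTUAL TABLE IF NOT EXISTS notes_fts USING fts5(
    slug UNINDEXED,  -- Stored for result retrieval, not searchable
    title,           -- Note title (first line)
    content,         -- Full markdown content
    tokenize='porter unicode61'  -- Porter stemming on top of unicode61
);

-- Triggers to keep FTS in sync with the notes table.
-- title_from_content() is not built into SQLite: it is an application-defined
-- function that must be registered on every connection that writes notes.
CREATE TRIGGER IF NOT EXISTS notes_fts_insert AFTER INSERT ON notes
BEGIN
    INSERT INTO notes_fts (rowid, slug, title, content)
    VALUES (NEW.id, NEW.slug, title_from_content(NEW.content), NEW.content);
END;

CREATE TRIGGER IF NOT EXISTS notes_fts_update AFTER UPDATE ON notes
BEGIN
    DELETE FROM notes_fts WHERE rowid = OLD.id;
    INSERT INTO notes_fts (rowid, slug, title, content)
    VALUES (NEW.id, NEW.slug, title_from_content(NEW.content), NEW.content);
END;

CREATE TRIGGER IF NOT EXISTS notes_fts_delete AFTER DELETE ON notes
BEGIN
    DELETE FROM notes_fts WHERE rowid = OLD.id;
END;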
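The triggers depend on an application-defined `title_from_content()` function, which must be registered before any write. The minimal Python sketch below shows that wiring using the standard `sqlite3` module. It makes these assumptions:

- a `notes` table with `id`, `slug`, and `content` columns;
- INSERT, UPDATE, and DELETE triggers;
- a hypothetical title rule (first non-empty line with leading `#` removed), which may differ from StarPunk's real helper.

```python
import sqlite3

# Assumed notes schema plus the FTS table and sync triggers.
SCHEMA = """
CREATE TABLE IF NOT EXISTS notes (
    id INTEGER PRIMARY KEY,
    slug TEXT UNIQUE NOT NULL,
    content TEXT NOT NULL
);
CREATE VIRTUAL TABLE IF NOT EXISTS notes_fts USING fts5(
    slug UNINDEXED, title, content, tokenize='porter unicode61'
);
CREATE TRIGGER IF NOT EXISTS notes_fts_insert AFTER INSERT ON notes BEGIN
    INSERT INTO notes_fts (rowid, slug, title, content)
    VALUES (NEW.id, NEW.slug, title_from_content(NEW.content), NEW.content);
END;
CREATE TRIGGER IF NOT EXISTS notes_fts_update AFTER UPDATE ON notes BEGIN
    DELETE FROM notes_fts WHERE rowid = OLD.id;
    INSERT INTO notes_fts (rowid, slug, title, content)
    VALUES (NEW.id, NEW.slug, title_from_content(NEW.content), NEW.content);
END;
CREATE TRIGGER IF NOT EXISTS notes_fts_delete AFTER DELETE ON notes BEGIN
    DELETE FROM notes_fts WHERE rowid = OLD.id;
END;
"""


def title_from_content(content):
    """Hypothetical title rule: first non-empty line, leading '#' removed."""
    for line in (content or "").splitlines():
        if line.strip():
            return line.strip().lstrip("#").strip()
    return ""


def connect(path=":memory:"):
    conn = sqlite3.connect(path)
    # The triggers call title_from_content(), so register it before any write.
    conn.create_function("title_from_content", 1, title_from_content)
    conn.executescript(SCHEMA)
    return conn


def fts_slugs(conn, query):
    """Slugs of notes matching an FTS5 query, best match first."""
    rows = conn.execute(
        "SELECT slug FROM notes_fts WHERE notes_fts MATCH ? ORDER BY rank",
        (query,),
    )
    return [slug for (slug,) in rows]
```

With this in place, inserting, updating, or deleting a row in `notes` is reflected in `fts_slugs()`. No application code touches `notes_fts` directly.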

Rationale

SQLite FTS5 is the optimal choice because:

  1. Native Integration: Built into SQLite, no external dependencies
  2. Performance: Highly optimized C implementation
  3. Features: Rich query syntax (phrases, NEAR, boolean operators, prefix queries)
  4. Ranking: Built-in BM25 ranking algorithm
  5. Simplicity: Just another table in our existing database
  6. Maintenance-free: No separate search service to manage
  7. Size: Minimal storage overhead (~30% of original text)

Query capabilities:

  • Simple terms: indieweb
  • Phrases: "static site"
  • Prefix search: micro*
  • Boolean: micropub OR websub
  • Exclusions: indieweb NOT wordpress
  • Field-specific: title:announcement
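Each of these forms can be tried against an in-memory FTS5 table. The Python sketch below uses made-up sample notes and a table laid out like `notes_fts`. It only illustrates FTS5's query syntax and is not the endpoint's code.

FTS5 operators are case-sensitive. `OR`, `NOT`, and `AND` must be uppercase; in lowercase they are treated as ordinary search terms.

```python
import sqlite3

# Made-up sample notes: (slug, title, content)
SAMPLE_NOTES = [
    ("announce", "Announcement", "Launching a static site with IndieWeb support"),
    ("micropub", "Micropub notes", "Implementing the micropub endpoint"),
    ("websub", "WebSub", "Pushing updates with websub"),
    ("wordpress", "Leaving WordPress", "Moving from wordpress to the indieweb"),
]


def build_index(notes):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE notes_fts USING fts5("
        "slug UNINDEXED, title, content, tokenize='porter unicode61')"
    )
    conn.executemany(
        "INSERT INTO notes_fts (slug, title, content) VALUES (?, ?, ?)", notes
    )
    return conn


def search(conn, query):
    """Matching slugs, sorted alphabetically so results are easy to compare."""
    rows = conn.execute(
        "SELECT slug FROM notes_fts WHERE notes_fts MATCH ? ORDER BY slug",
        (query,),
    )
    return [slug for (slug,) in rows]


conn = build_index(SAMPLE_NOTES)
for q in ['indieweb', '"static site"', 'micro*', 'micropub OR websub',
          'indieweb NOT wordpress', 'title:announcement']:
    print(f"{q!r:28} -> {search(conn, q)}")
```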

Consequences

Positive

  • Powerful search with zero external dependencies
  • Fast queries even with thousands of notes
  • Rich query syntax for power users
  • Automatic stemming (search "running" finds "run", "runs")
  • Unicode support for international content
  • Integrates seamlessly with existing SQLite database
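The stemming and Unicode behaviour comes from the `porter unicode61` tokenizer:

- `unicode61` folds case and, by default, removes diacritics from Latin characters.
- `porter` reduces English words to a common stem.

A small Python check with made-up notes (`demo_fts` is an illustrative table name):

```python
import sqlite3

DEMO_NOTES = [
    ("a", "I run every morning"),
    ("b", "She runs daily"),
    ("c", "A café in Zürich"),
    ("d", "Nothing relevant here"),
]


def make_index(notes):
    conn = sqlite3.connect(":memory:")
    # Same tokenizer as notes_fts: Porter stemming over unicode61.
    conn.execute(
        "CREATE VIRTUAL TABLE demo_fts USING fts5("
        "slug UNINDEXED, content, tokenize='porter unicode61')"
    )
    conn.executemany("INSERT INTO demo_fts (slug, content) VALUES (?, ?)", notes)
    return conn


def slugs(conn, query):
    rows = conn.execute(
        "SELECT slug FROM demo_fts WHERE demo_fts MATCH ? ORDER BY slug", (query,)
    )
    return [slug for (slug,) in rows]
```

Here `running`, `runs`, and `run` all reduce to the stem `run`, so any of them matches the first two notes. Likewise, `cafe` matches `café` because the diacritic is removed at both index and query time.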

Negative

  • FTS index increases database size by ~30%
  • Initial indexing of existing notes required
  • Must maintain sync triggers for consistency
  • FTS5 requires SQLite 3.9.0+ (2015, widely available), and the library must be built with FTS5 enabled (a compile-time option)
  • Cannot search in encrypted/binary content

Performance Characteristics

  • Index build: ~1ms per note
  • Search query: <10ms for 10,000 notes
  • Index size: ~30% of indexed text
  • Write overhead: ~5% increase in note creation time

Alternatives Considered

Alternative 1: Simple LIKE Queries

SELECT * FROM notes WHERE content LIKE '%search term%'
  • Pros: No setup, works today
  • Cons: Extremely slow on large datasets, no ranking, no advanced features
  • Rejected because: Performance degrades quickly with scale

Alternative 2: External Search Service (Elasticsearch/Meilisearch)

  • Pros: More features, dedicated search infrastructure
  • Cons: External dependency, complex setup, overkill for single-user CMS
  • Rejected because: Violates minimal philosophy, adds operational complexity

Alternative 3: Client-Side Search (Lunr.js)

  • Pros: No server changes needed
  • Cons: Must download all content to browser, doesn't scale
  • Rejected because: Impractical beyond a few hundred notes

Alternative 4: Pattern Matching over Note Files

  • Pros: Powerful pattern matching
  • Cons: Slow, no ranking, must read all files from disk
  • Rejected because: Poor performance, no relevance ranking

Implementation Plan

Phase 1: Database Schema (2 hours)

  1. Add FTS5 table creation to migrations
  2. Create sync triggers for INSERT/UPDATE/DELETE
  3. Build initial index from existing notes
  4. Test sync on note operations
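Step 3 (building the initial index) can be a single `INSERT ... SELECT` once the table exists. This minimal Python sketch makes two assumptions: a `notes` table with `id`, `slug`, and `content`, and the same hypothetical `title_from_content()` rule (first non-empty line, leading `#` removed). It clears the index first, so the migration can be re-run safely.

```python
import sqlite3


def title_from_content(content):
    """Hypothetical title rule: first non-empty line, leading '#' removed."""
    for line in (content or "").splitlines():
        if line.strip():
            return line.strip().lstrip("#").strip()
    return ""


FTS_DDL = """CREATE VIRTUAL TABLE IF NOT EXISTS notes_fts USING fts5(
    slug UNINDEXED, title, content, tokenize='porter unicode61')"""


def build_initial_index(conn):
    """Create notes_fts if needed, (re)populate it from notes, return row count."""
    conn.create_function("title_from_content", 1, title_from_content)
    conn.execute(FTS_DDL)
    # Clearing first makes the backfill idempotent if the migration re-runs.
    conn.execute("DELETE FROM notes_fts")
    conn.execute(
        "INSERT INTO notes_fts (rowid, slug, title, content) "
        "SELECT id, slug, title_from_content(content), content FROM notes"
    )
    conn.commit()
    return conn.execute("SELECT count(*) FROM notes_fts").fetchone()[0]
```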

Phase 2: Search API (2 hours)

  1. Create /api/search endpoint
  2. Implement query parser and validation
  3. Add result ranking and pagination
  4. Return structured results with snippets
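FTS5 parses the query itself, so the endpoint mainly needs to validate `q`, `limit`, and `offset`. The minimal sketch below makes these assumptions:

- request arguments arrive as a string mapping, such as a web framework's query-args object;
- hypothetical caps of 100 results per page and 200 characters per query;
- out-of-range paging values are clamped rather than rejected.

```python
from collections.abc import Mapping
from dataclasses import dataclass

DEFAULT_LIMIT = 20
MAX_LIMIT = 100          # hypothetical page-size cap
MAX_QUERY_LENGTH = 200   # hypothetical query-length cap


class SearchParamError(ValueError):
    """Invalid search parameters (map to an HTTP 400 response)."""


@dataclass(frozen=True)
class SearchParams:
    query: str
    limit: int
    offset: int


def parse_search_params(args: Mapping) -> SearchParams:
    query = (args.get("q") or "").strip()
    if not query:
        raise SearchParamError("q is required")
    if len(query) > MAX_QUERY_LENGTH:
        raise SearchParamError(f"q must be at most {MAX_QUERY_LENGTH} characters")
    try:
        limit = int(args.get("limit", DEFAULT_LIMIT))
        offset = int(args.get("offset", 0))
    except (TypeError, ValueError):
        raise SearchParamError("limit and offset must be integers") from None
    # Clamp paging values into a sane range instead of rejecting them.
    limit = max(1, min(limit, MAX_LIMIT))
    offset = max(0, offset)
    return SearchParams(query=query, limit=limit, offset=offset)
```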

Phase 3: Search UI (1 hour)

  1. Add search box to navigation
  2. Create search results page
  3. Highlight matching terms in results
  4. Add search query syntax help

Phase 4: Testing (1 hour)

  1. Test with various query types
  2. Benchmark with large datasets
  3. Verify sync triggers work correctly
  4. Test Unicode and special characters

API Design

Search Endpoint

GET /api/search?q={query}&limit=20&offset=0

Response:
{
    "query": "indieweb micropub",
    "total": 15,
    "results": [
        {
            "slug": "implementing-micropub",
            "title": "Implementing Micropub",
            "snippet": "...the <mark>IndieWeb</mark> <mark>Micropub</mark> specification...",
            "rank": 2.4,
            "published": true,
            "created_at": "2024-01-15T10:00:00Z"
        }
    ]
}
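One way to produce that response shape is a single JOIN between `notes_fts` and `notes`, ranked with FTS5's `bm25()` and paged with `LIMIT`/`OFFSET`. This Python sketch assumes the following:

- `notes` has `title`, `published`, and `created_at` columns;
- drafts are hidden unless an illustrative `include_drafts` flag is set.

Because `bm25()` returns lower values for better matches, the sketch negates it so that a larger `rank` means more relevant. Snippets are built with temporary private-use marker characters. These markers become `<mark>` tags only after the note text has been HTML-escaped.

```python
import html
import sqlite3

# Private-use characters as temporary highlight markers (assumed never to
# appear in note text); html.escape() leaves them unchanged.
MARK_OPEN, MARK_CLOSE = "\ue000", "\ue001"


def highlight(raw_snippet):
    """Escape the note text first, then turn the markers into <mark> tags."""
    escaped = html.escape(raw_snippet)
    return escaped.replace(MARK_OPEN, "<mark>").replace(MARK_CLOSE, "</mark>")


def search_notes(conn, query, limit=20, offset=0, include_drafts=False):
    """Return a dict shaped like the /api/search response."""
    where = "notes_fts MATCH ?"
    if not include_drafts:
        where += " AND n.published = 1"  # anonymous users: published notes only
    # Only fixed SQL fragments are interpolated; the user query is always bound.
    base = f"FROM notes_fts JOIN notes AS n ON n.id = notes_fts.rowid WHERE {where}"
    total = conn.execute(f"SELECT count(*) {base}", (query,)).fetchone()[0]
    rows = conn.execute(
        "SELECT n.slug, n.title, "
        "snippet(notes_fts, 2, ?, ?, '...', 16), "  # column 2 = content
        "bm25(notes_fts), n.published, n.created_at "
        f"{base} ORDER BY bm25(notes_fts) LIMIT ? OFFSET ?",
        (MARK_OPEN, MARK_CLOSE, query, limit, offset),
    ).fetchall()
    results = [
        {
            "slug": slug,
            "title": title,
            "snippet": highlight(snip),
            "rank": -score,  # bm25() is lower-is-better; negate for display
            "published": bool(published),
            "created_at": created_at,
        }
        for slug, title, snip, score, published, created_at in rows
    ]
    return {"query": query, "total": total, "results": results}


def demo_db():
    """In-memory sample data; the notes columns here are assumptions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE notes (id INTEGER PRIMARY KEY, slug TEXT, title TEXT,
                            content TEXT, published INTEGER, created_at TEXT);
        CREATE VIRTUAL TABLE notes_fts USING fts5(
            slug UNINDEXED, title, content, tokenize='porter unicode61');
        INSERT INTO notes VALUES
          (1, 'implementing-micropub', 'Implementing Micropub',
           'Notes on the IndieWeb Micropub spec & <script> tags', 1,
           '2024-01-15T10:00:00Z'),
          (2, 'draft-idea', 'Draft idea', 'An unpublished IndieWeb draft', 0,
           '2024-02-01T09:00:00Z');
        INSERT INTO notes_fts (rowid, slug, title, content)
          SELECT id, slug, title, content FROM notes;
        """
    )
    return conn
```

`total` counts all matches, while `results` holds only the requested page. This costs a second query but keeps pagination controls accurate.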

Query Syntax Examples

  • indieweb - Find notes containing "indieweb"
  • "static site" - Exact phrase
  • micro* - Prefix search
  • title:announcement - Search in title only
  • micropub OR websub - Boolean operators (must be uppercase: AND, OR, NOT)
  • indieweb NOT wordpress - Exclusion (FTS5 has no -term shorthand)

Security Considerations

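  1. Pass the search string to MATCH as a bound parameter to prevent SQL injection (FTS5 does not do this for us), and return a 400 error when FTS5 rejects malformed query syntax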
  2. Rate limit search endpoint to prevent abuse
  3. Only search published notes for anonymous users
  4. Escape HTML in note text before adding <mark> highlight tags to snippets, to prevent XSS
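Binding the query as a parameter is what prevents SQL injection. FTS5 still parses the bound string as its own query language and raises an error for malformed input, such as an unterminated quote or a stray `-`. A minimal Python sketch that maps those errors to a client error (the `InvalidQuery` exception and `match_slugs()` helper are hypothetical names):

```python
import sqlite3


class InvalidQuery(ValueError):
    """Malformed FTS5 query syntax; report it to the client as HTTP 400."""


def match_slugs(conn, query):
    """Run a user query safely: bound as a parameter, errors mapped to InvalidQuery."""
    try:
        rows = conn.execute(
            # The query is bound (?), never formatted into the SQL string, so it
            # can only ever be interpreted as FTS5 query syntax.
            "SELECT slug FROM notes_fts WHERE notes_fts MATCH ? ORDER BY rank",
            (query,),
        ).fetchall()
    except sqlite3.OperationalError as exc:
        # Simplification: treats any OperationalError here as a bad query.
        raise InvalidQuery(str(exc)) from exc
    return [slug for (slug,) in rows]


def demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE notes (id INTEGER PRIMARY KEY, slug TEXT, content TEXT);
        CREATE VIRTUAL TABLE notes_fts USING fts5(slug UNINDEXED, content);
        INSERT INTO notes VALUES (1, 'hello', 'Hello IndieWeb');
        INSERT INTO notes_fts (rowid, slug, content)
          SELECT id, slug, content FROM notes;
        """
    )
    return conn
```

An endpoint would catch `InvalidQuery` and return a 400 with a pointer to the syntax help, rather than letting the error surface as a 500.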

Migration Strategy

  1. Check that SQLite is 3.9.0+ and built with the FTS5 module
  2. Create FTS table and triggers in migration
  3. Build initial index from existing notes
  4. Monitor index size and performance
  5. Document search syntax for users

References