# ADR-034: Full-Text Search with SQLite FTS5

## Status

Proposed

## Context

Users need the ability to search through their notes efficiently. Currently, finding specific content requires manually browsing through notes or using external tools. A built-in search capability is essential for any content management system, especially as the number of notes grows.

Requirements:

- Fast search across all note content
- Support for phrase searching and boolean operators
- Ranking by relevance
- Minimal performance impact on write operations
- No external dependencies (Elasticsearch, Solr, etc.)
- Works with existing SQLite database

## Decision

Implement full-text search using SQLite's FTS5 (Full-Text Search version 5) extension:

1. **FTS5 Virtual Table**: Create a shadow FTS table that indexes note content
2. **Synchronized Updates**: Keep FTS index in sync with note operations
3. **Search Endpoint**: New `/api/search` endpoint for queries
4. **Search UI**: Simple search interface in the web UI
5. **Advanced Operators**: Support FTS5's query syntax for power users

Database schema:

```sql
-- FTS5 virtual table for note content
CREATE VIRTUAL TABLE IF NOT EXISTS notes_fts USING fts5(
    slug UNINDEXED,              -- For result retrieval, not searchable
    title,                       -- Note title (first line)
    content,                     -- Full markdown content
    tokenize='porter unicode61'  -- Stem words, handle unicode
);

-- Trigger to keep FTS in sync with notes table.
-- title_from_content() is not a SQLite built-in; it must be registered as an
-- application-defined function on every connection that writes to notes.
CREATE TRIGGER notes_fts_insert AFTER INSERT ON notes
BEGIN
    INSERT INTO notes_fts (rowid, slug, title, content)
    SELECT id, slug, title_from_content(content), content
    FROM notes WHERE id = NEW.id;
END;

-- Similar triggers for UPDATE and DELETE
```

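The UPDATE and DELETE triggers are only mentioned above. A minimal sketch of how the full trigger set might look, run through Python's `sqlite3` module against an in-memory database, is shown here. The three-column `notes` table and the body of `title_from_content` are assumptions made for illustration; the real schema and helper may differ.

```python
import sqlite3


def title_from_content(content):
    """Assumed behaviour: first non-empty line, without leading '#' markers."""
    for line in (content or "").splitlines():
        if line.strip():
            return line.lstrip("# ").strip()
    return ""


SCHEMA = """
-- Assumed minimal shape of the notes table, for illustration only.
CREATE TABLE notes (id INTEGER PRIMARY KEY, slug TEXT UNIQUE, content TEXT);

CREATE VIRTUAL TABLE notes_fts USING fts5(
    slug UNINDEXED, title, content, tokenize='porter unicode61'
);

CREATE TRIGGER notes_fts_insert AFTER INSERT ON notes BEGIN
    INSERT INTO notes_fts (rowid, slug, title, content)
    VALUES (NEW.id, NEW.slug, title_from_content(NEW.content), NEW.content);
END;

-- Replace the indexed row: drop the old entry, then index the new values.
CREATE TRIGGER notes_fts_update AFTER UPDATE ON notes BEGIN
    DELETE FROM notes_fts WHERE rowid = OLD.id;
    INSERT INTO notes_fts (rowid, slug, title, content)
    VALUES (NEW.id, NEW.slug, title_from_content(NEW.content), NEW.content);
END;

CREATE TRIGGER notes_fts_delete AFTER DELETE ON notes BEGIN
    DELETE FROM notes_fts WHERE rowid = OLD.id;
END;
"""


def open_db():
    """Open an in-memory database with the helper registered and schema applied."""
    conn = sqlite3.connect(":memory:")
    # The triggers call this function, so it must exist on the connection.
    conn.create_function("title_from_content", 1, title_from_content)
    conn.executescript(SCHEMA)
    return conn
```

Because the triggers call an application-defined function, any connection that writes to `notes` without registering it would fail on insert or update; that is a design cost of computing the title inside the trigger.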
## Rationale

SQLite FTS5 is the optimal choice because:

1. **Native Integration**: Built into SQLite, no external dependencies
2. **Performance**: Highly optimized C implementation
3. **Features**: Rich query syntax (phrases, NEAR, boolean, prefix matching)
4. **Ranking**: Built-in BM25 ranking algorithm
5. **Simplicity**: Just another table in our existing database
6. **Maintenance-free**: No separate search service to manage
7. **Size**: Minimal storage overhead (~30% of original text)

Query capabilities:

- Simple terms: `indieweb`
- Phrases: `"static site"`
- Prefix matching: `micro*`
- Boolean: `micropub OR websub`
- Exclusions: `indieweb NOT wordpress`
- Field-specific: `title:announcement`

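To make the query forms above concrete, the following sketch runs each of them against a throwaway in-memory FTS5 table. The three sample notes are invented for illustration, and `match` is a hypothetical helper, not part of the planned API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE notes_fts USING fts5("
    "slug UNINDEXED, title, content, tokenize='porter unicode61')"
)
# Invented sample data, only here to exercise the query syntax.
conn.executemany(
    "INSERT INTO notes_fts (slug, title, content) VALUES (?, ?, ?)",
    [
        ("a", "Announcement", "Launching a static site for the indieweb"),
        ("b", "Micropub notes", "Micropub and micro formats on wordpress, part of the indieweb"),
        ("c", "Running log", "Notes on websub while running tests"),
    ],
)


def match(query):
    """Return the slugs of notes matching an FTS5 query, sorted by slug."""
    rows = conn.execute(
        "SELECT slug FROM notes_fts WHERE notes_fts MATCH ? ORDER BY slug",
        (query,),
    )
    return [slug for (slug,) in rows]


simple = match("indieweb")                  # simple term
phrase = match('"static site"')             # exact phrase
prefix = match("micro*")                    # prefix matching
either = match("micropub OR websub")        # boolean OR
without = match("indieweb NOT wordpress")   # exclusion
in_title = match("title:announcement")      # column filter
stemmed = match("run")                      # porter stemming also matches "running"
```

The query string is always passed as a bound parameter; only the FTS5 expression itself varies between calls.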
## Consequences

### Positive

- Powerful search with zero external dependencies
- Fast queries even with thousands of notes
- Rich query syntax for power users
- Automatic stemming (search "running" finds "run", "runs")
- Unicode support for international content
- Integrates seamlessly with existing SQLite database

### Negative

- FTS index increases database size by ~30%
- Initial indexing of existing notes required
- Must maintain sync triggers for consistency
- FTS5 requires SQLite 3.9.0+ (2015, widely available) in a build with FTS5 enabled
- Cannot search in encrypted/binary content

### Performance Characteristics

- Index build: ~1ms per note
- Search query: <10ms for 10,000 notes
- Index size: ~30% of indexed text
- Write overhead: ~5% increase in note creation time

## Alternatives Considered

### Alternative 1: Simple LIKE Queries

```sql
SELECT * FROM notes WHERE content LIKE '%search term%'
```

- **Pros**: No setup, works today
- **Cons**: Extremely slow on large datasets, no ranking, no advanced features
- **Rejected because**: Performance degrades quickly with scale

### Alternative 2: External Search Service (Elasticsearch/Meilisearch)

- **Pros**: More features, dedicated search infrastructure
- **Cons**: External dependency, complex setup, overkill for single-user CMS
- **Rejected because**: Violates minimal philosophy, adds operational complexity

### Alternative 3: Client-Side Search (Lunr.js)

- **Pros**: No server changes needed
- **Cons**: Must download all content to browser, doesn't scale
- **Rejected because**: Impractical beyond a few hundred notes

### Alternative 4: Regex/Grep-based Search

- **Pros**: Powerful pattern matching
- **Cons**: Slow, no ranking, must read all files from disk
- **Rejected because**: Poor performance, no relevance ranking

## Implementation Plan

### Phase 1: Database Schema (2 hours)

1. Add FTS5 table creation to migrations
2. Create sync triggers for INSERT/UPDATE/DELETE
3. Build initial index from existing notes
4. Test sync on note operations

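Step 3 of this phase, building the initial index, could look roughly like the sketch below. It assumes a `notes` table with `id`, `slug` and `content` columns and computes the title in Python; `first_line_title` and `rebuild_search_index` are hypothetical names standing in for whatever the migration actually uses.

```python
import sqlite3


def first_line_title(content):
    """Hypothetical stand-in for title_from_content: first non-empty line."""
    for line in (content or "").splitlines():
        if line.strip():
            return line.lstrip("# ").strip()
    return ""


def rebuild_search_index(conn):
    """Re-populate notes_fts from notes and return the number of notes indexed.

    Clearing the index first makes the backfill safe to run more than once.
    """
    rows = conn.execute("SELECT id, slug, content FROM notes").fetchall()
    with conn:  # single transaction: commit on success, roll back on error
        conn.execute("DELETE FROM notes_fts")
        conn.executemany(
            "INSERT INTO notes_fts (rowid, slug, title, content) VALUES (?, ?, ?, ?)",
            [(note_id, slug, first_line_title(body), body) for note_id, slug, body in rows],
        )
    return len(rows)
```

Running the backfill inside one transaction means a failure partway through leaves the previous index untouched.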
### Phase 2: Search API (2 hours)

1. Create `/api/search` endpoint
2. Implement query parser and validation
3. Add result ranking and pagination
4. Return structured results with snippets

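The validation and pagination steps might be handled by a small parser like the one sketched here. The caps `MAX_LIMIT` and `MAX_QUERY_LENGTH`, the `SearchParams` type, and the function name are all assumptions; the ADR only fixes the parameter names `q`, `limit` and `offset`.

```python
from dataclasses import dataclass

MAX_LIMIT = 100          # assumed upper bound on page size
MAX_QUERY_LENGTH = 200   # assumed upper bound on query length


@dataclass(frozen=True)
class SearchParams:
    query: str
    limit: int
    offset: int


def parse_search_params(args):
    """Validate raw query-string values (a dict of strings) into SearchParams."""
    query = (args.get("q") or "").strip()
    if not query:
        raise ValueError("q must not be empty")
    if len(query) > MAX_QUERY_LENGTH:
        raise ValueError("q is too long")
    try:
        limit = int(args.get("limit", 20))
        offset = int(args.get("offset", 0))
    except (TypeError, ValueError):
        raise ValueError("limit and offset must be integers") from None
    # Clamp instead of rejecting, so out-of-range paging still returns a page.
    limit = max(1, min(limit, MAX_LIMIT))
    offset = max(0, offset)
    return SearchParams(query, limit, offset)
```

Clamping out-of-range paging values rather than rejecting them is one possible choice; returning a 400 response instead would be equally defensible.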
### Phase 3: Search UI (1 hour)

1. Add search box to navigation
2. Create search results page
3. Highlight matching terms in results
4. Add search query syntax help

### Phase 4: Testing (1 hour)

1. Test with various query types
2. Benchmark with large datasets
3. Verify sync triggers work correctly
4. Test Unicode and special characters

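For the benchmarking step, a rough harness along these lines could be used to check the performance estimates on real hardware. The synthetic vocabulary and the `benchmark_search` name are invented; timings from an in-memory database with random text are only indicative.

```python
import random
import sqlite3
import time


def benchmark_search(note_count=10_000, query="indieweb", seed=0):
    """Index synthetic notes in memory and time one MATCH query.

    Returns (number of hits returned, elapsed milliseconds).
    """
    rng = random.Random(seed)
    vocab = ["indieweb", "micropub", "websub", "static", "site", "note", "search", "sqlite"]
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE notes_fts USING fts5("
        "slug UNINDEXED, title, content, tokenize='porter unicode61')"
    )
    notes = (
        (f"note-{i}", f"Note {i}", " ".join(rng.choices(vocab, k=50)))
        for i in range(note_count)
    )
    with conn:
        conn.executemany(
            "INSERT INTO notes_fts (slug, title, content) VALUES (?, ?, ?)", notes
        )
    start = time.perf_counter()
    hits = conn.execute(
        "SELECT slug FROM notes_fts WHERE notes_fts MATCH ? ORDER BY rank LIMIT 20",
        (query,),
    ).fetchall()
    elapsed_ms = (time.perf_counter() - start) * 1000
    conn.close()
    return len(hits), elapsed_ms
```

Calling `benchmark_search()` with the default of 10,000 notes gives a number to compare against the query-time estimate listed under Performance Characteristics.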
## API Design

### Search Endpoint

```
GET /api/search?q={query}&limit=20&offset=0

Response:
{
  "query": "indieweb micropub",
  "total": 15,
  "results": [
    {
      "slug": "implementing-micropub",
      "title": "Implementing Micropub",
      "snippet": "...the <mark>IndieWeb</mark> <mark>Micropub</mark> specification...",
      "rank": 2.4,
      "published": true,
      "created_at": "2024-01-15T10:00:00Z"
    }
  ]
}
```

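A sketch of how the query behind this response could be assembled follows, using FTS5's `snippet()` function and its built-in `rank` column (BM25 by default). The `search_notes` name is hypothetical, and the `published` and `created_at` fields are omitted because they would come from a join with `notes` that is not specified here. FTS5 rank values are lower-is-better and never positive, so this sketch negates them to get a larger-is-better number like the one in the example response; that mapping is an assumption.

```python
import sqlite3


def search_notes(conn, query, limit=20, offset=0):
    """Run an FTS5 query and shape the rows like the response body above."""
    total = conn.execute(
        "SELECT count(*) FROM notes_fts WHERE notes_fts MATCH ?", (query,)
    ).fetchone()[0]
    rows = conn.execute(
        """
        SELECT slug,
               title,
               -- column 2 is content; wrap matches in <mark>, up to 16 tokens
               snippet(notes_fts, 2, '<mark>', '</mark>', '...', 16),
               rank
        FROM notes_fts
        WHERE notes_fts MATCH ?
        ORDER BY rank            -- best match first
        LIMIT ? OFFSET ?
        """,
        (query, limit, offset),
    ).fetchall()
    return {
        "query": query,
        "total": total,
        "results": [
            {"slug": slug, "title": title, "snippet": snip, "rank": -score}
            for slug, title, snip, score in rows
        ],
    }
```

The snippet here is returned unescaped; HTML escaping still has to happen before it reaches the browser.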
### Query Syntax Examples

- `indieweb` - Find notes containing "indieweb"
- `"static site"` - Exact phrase
- `micro*` - Prefix search
- `title:announcement` - Search in title only
- `micropub OR websub` - Boolean operators
- `indieweb NOT wordpress` - Exclusion

## Security Considerations

1. Bind the search query as a SQL parameter to prevent SQL injection; FTS5 does not sanitize input itself, and malformed `MATCH` expressions raise errors that the endpoint must handle
2. Rate limit search endpoint to prevent abuse
3. Only search published notes for anonymous users
4. Escape HTML in snippets to prevent XSS

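Points 1 and 4 can be sketched together. The query is passed only as a bound parameter, FTS5 syntax errors are caught, and the snippet is HTML-escaped before the highlight tags are added. The private-use sentinel characters and the names `safe_snippet` and `safe_search` are assumptions for illustration.

```python
import html
import sqlite3

# Private-use code points used as temporary highlight markers; assumed not to
# occur in note text.
OPEN, CLOSE = "\ue000", "\ue001"


def safe_snippet(raw):
    """Escape note text first, then turn the sentinels into <mark> tags."""
    return html.escape(raw).replace(OPEN, "<mark>").replace(CLOSE, "</mark>")


def safe_search(conn, query, limit=20):
    """Return (slug, snippet_html) pairs, or [] if the query is not valid FTS5."""
    sql = (
        "SELECT slug, snippet(notes_fts, 2, ?, ?, '...', 16) "
        "FROM notes_fts WHERE notes_fts MATCH ? ORDER BY rank LIMIT ?"
    )
    try:
        # The user's text is only ever a bound parameter, never part of the SQL.
        rows = conn.execute(sql, (OPEN, CLOSE, query, limit)).fetchall()
    except sqlite3.OperationalError:
        # Malformed MATCH expression, for example an unterminated quote.
        return []
    return [(slug, safe_snippet(snip)) for slug, snip in rows]
```

Returning an empty list for a malformed query is one option; the endpoint could instead report a 400 error with a pointer to the syntax help.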
## Migration Strategy

1. Check SQLite version supports FTS5 (3.9.0+)
2. Create FTS table and triggers in migration
3. Build initial index from existing notes
4. Monitor index size and performance
5. Document search syntax for users

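The version check in step 1 is necessary but not sufficient, because FTS5 is a compile-time option. A sketch of a stricter check that also tries to create a temporary FTS5 table is shown below; `fts5_available` and the probe table name are hypothetical.

```python
import sqlite3


def fts5_available(conn):
    """Return True if this SQLite library is new enough and has FTS5 compiled in."""
    if sqlite3.sqlite_version_info < (3, 9, 0):
        return False
    try:
        # Creating a throwaway table in the temp schema leaves the main
        # database file untouched.
        conn.execute("CREATE VIRTUAL TABLE temp.fts5_probe USING fts5(x)")
        conn.execute("DROP TABLE temp.fts5_probe")
    except sqlite3.OperationalError:
        return False
    return True
```

A migration could call this first and skip the search feature, or abort with a clear message, when it returns `False`.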
## References

- SQLite FTS5 Documentation: https://www.sqlite.org/fts5.html
- BM25 Ranking: https://en.wikipedia.org/wiki/Okapi_BM25
- FTS5 Performance: https://www.sqlite.org/fts5.html#performance