# StarPunk Architecture Overview

## Executive Summary

StarPunk is a minimal, single-user IndieWeb CMS designed around the principle: "Every line of code must justify its existence." The architecture prioritizes simplicity, standards compliance, and user data ownership through careful technology selection and hybrid data storage.

**Core Architecture**: API-first Flask application with hybrid file+database storage, server-side rendering, and delegated authentication.

## System Architecture

### High-Level Components

```
┌─────────────────────────────────────────────────────────────┐
│                        User Browser                         │
└───────────────┬─────────────────────────────────────────────┘
                │
                │ HTTP/HTTPS
                ↓
┌─────────────────────────────────────────────────────────────┐
│                      Flask Application                      │
│  ┌──────────────────────────────────────────────────────────┤
│  │             Web Interface (Jinja2 Templates)             │
│  │  - Public: Homepage, Note Permalinks                     │
│  │  - Admin: Dashboard, Note Editor                         │
│  └──────────────────────────────┬───────────────────────────┘
│  ┌──────────────────────────────┴───────────────────────────┐
│  │              API Layer (RESTful + Micropub)              │
│  │  - Notes CRUD API                                        │
│  │  - Micropub Endpoint                                     │
│  │  - RSS Feed Generator                                    │
│  │  - Authentication Handlers                               │
│  └──────────────────────────────┬───────────────────────────┘
│  ┌──────────────────────────────┴───────────────────────────┐
│  │                      Business Logic                      │
│  │  - Note Management (create, read, update, delete)        │
│  │  - File/Database Sync                                    │
│  │  - Markdown Rendering                                    │
│  │  - Slug Generation                                       │
│  │  - Session Management                                    │
│  └──────────────────────────────┬───────────────────────────┘
│  ┌──────────────────────────────┴───────────────────────────┐
│  │                        Data Layer                        │
│  │    ┌──────────────────┐  ┌─────────────────────────┐     │
│  │    │  File Storage    │  │  SQLite Database        │     │
│  │    │                  │  │                         │     │
│  │    │  Markdown Files  │  │  - Note Metadata        │     │
│  │    │  (Pure Content)  │  │  - Sessions             │     │
│  │    │                  │  │  - Tokens               │     │
│  │    │  data/notes/     │  │  - Auth State           │     │
│  │    │    YYYY/MM/      │  │                         │     │
│  │    │      slug.md     │  │  data/starpunk.db       │     │
│  │    └──────────────────┘  └─────────────────────────┘     │
│  └──────────────────────────────────────────────────────────┘
└─────────────────────────────────────────────────────────────┘
                │
                │ HTTPS
                ↓
┌─────────────────────────────────────────────────────────────┐
│                      External Services                      │
│  - IndieLogin.com (Authentication)                          │
│  - User's Website (Identity Verification)                   │
│  - Micropub Clients (Publishing)                            │
└─────────────────────────────────────────────────────────────┘
```

## Core Principles

### 1. Radical Simplicity

- Total dependencies: 6 direct packages
- No build tools, no npm, no bundlers
- Server-side rendering eliminates frontend complexity
- Single-file SQLite database
- Zero-configuration frameworks

### 2. Hybrid Data Architecture

**Files for Content**: Markdown notes stored as plain text files

- Maximum portability
- Human-readable
- Direct user access
- Easy backup (copy, rsync, git)

**Database for Metadata**: SQLite stores structured data

- Fast queries and indexes
- Referential integrity
- Efficient filtering and sorting
- Transaction support

**Sync Strategy**: Files are authoritative for content; database is authoritative for metadata. Both must stay in sync.

### 3. Standards-First Design

- IndieWeb: Microformats2, IndieAuth, Micropub
- Web: HTML5, RSS 2.0, HTTP standards
- Security: OAuth 2.0, HTTPS, secure cookies
- Data: CommonMark markdown

### 4. API-First Architecture

All functionality is exposed via the API, and the web interface consumes that API. This enables:

- Micropub client support
- Future client applications
- Scriptable automation
- Clean separation of concerns

### 5. Progressive Enhancement

- Core functionality works without JavaScript
- JavaScript adds optional enhancements (markdown preview)
- Server-side rendering for fast initial loads
- Mobile-responsive from the start

## Component Descriptions

### Web Layer

#### Public Interface

**Purpose**: Display published notes to the world

**Technology**: Server-side rendered HTML (Jinja2)

**Routes**:

- `/` - Homepage with recent notes
- `/note/{slug}` - Individual note permalink
- `/feed.xml` - RSS feed

**Features**:

- Microformats2 markup (h-entry, h-card)
- Reverse chronological note list
- Clean, minimal design
- Mobile-responsive
- No JavaScript required

#### Admin Interface

**Purpose**: Manage notes (create, edit, publish)

**Technology**: Server-side rendered HTML (Jinja2) + optional vanilla JS

**Routes**:

- `/admin/login` - Authentication
- `/admin` - Dashboard (list of all notes)
- `/admin/new` - Create new note
- `/admin/edit/{id}` - Edit existing note

**Features**:

- Markdown editor
- Optional real-time preview (JS enhancement)
- Publish/draft toggle
- Protected by session authentication

### API Layer

#### Notes API

**Purpose**: CRUD operations for notes

**Authentication**: Session-based (admin interface)

**Routes**:

```
GET    /api/notes          List published notes
POST   /api/notes          Create new note
GET    /api/notes/{id}     Get single note
PUT    /api/notes/{id}     Update note
DELETE /api/notes/{id}     Delete note
```

**Response Format**: JSON

#### Micropub Endpoint

**Purpose**: Accept posts from external Micropub clients

**Authentication**: IndieAuth bearer tokens

**Routes**:

```
POST /api/micropub             Create note (h-entry)
GET  /api/micropub?q=config    Query configuration
GET  /api/micropub?q=source    Query note source
```

**Content Types**:

- application/json
- application/x-www-form-urlencoded

**Compliance**: Full Micropub specification

#### RSS Feed

**Purpose**: Syndicate published notes

**Technology**: feedgen library

**Route**: `/feed.xml`

**Format**: Valid RSS 2.0 XML

**Caching**: 5 minutes

**Features**:

- All published notes
- RFC-822 date formatting
- CDATA-wrapped HTML content
- Proper GUID for each item

### Business Logic Layer

#### Note Management

**Operations**:

1. **Create**: Generate slug → write file → insert database record
2. **Read**: Query database for path → read file → render markdown
3. **Update**: Write file atomically → update database timestamp and content hash
4. **Delete**: Mark deleted in database → optionally archive file

**Key Components**:

- Slug generation (URL-safe, unique)
- Markdown rendering (markdown library)
- Content hashing (integrity verification)
- Atomic file operations (prevent corruption)

#### File/Database Sync

**Strategy**: Write files first, then database

**Rollback**: If database operation fails, delete/restore file

**Verification**: Content hash detects external modifications

**Integrity Check**: Optional scan for orphaned files/records

#### Authentication

**Admin Auth**: IndieLogin.com OAuth 2.0 flow

- User enters website URL
- Redirect to indielogin.com
- Verify identity via RelMeAuth or email
- Return verified "me" URL
- Create session token
- Store in HttpOnly cookie

**Micropub Auth**: IndieAuth token verification

- Client obtains token via IndieAuth flow
- Token sent as Bearer in Authorization header
- Verify token exists and has not expired
- Check scope permissions

### Data Layer

#### File Storage

**Location**: `data/notes/`

**Structure**: `YYYY/MM/slug.md`

**Format**: Pure markdown, no frontmatter

**Operations**:

- Atomic writes (temp file → rename)
- Directory creation (makedirs)
- Content reading (UTF-8 encoding)

**Example**:

```
data/notes/
├── 2024/
│   ├── 11/
│   │   ├── my-first-note.md
│   │   └── another-note.md
│   └── 12/
│       └── december-note.md
```

#### Database Storage

**Location**: `data/starpunk.db`

**Engine**: SQLite3

**Tables**:

- `notes` - Metadata (slug, file_path, published, timestamps, hash)
- `sessions` - Auth sessions (token, me, expiry)
- `tokens` - Micropub tokens (token, me, client_id, scope)
- `auth_state` - CSRF tokens (state, expiry)

**Indexes**:

- `notes.created_at` (DESC) - Fast chronological queries
- `notes.published` - Fast filtering
- `notes.slug` - Fast lookup by slug
- `sessions.session_token` - Fast auth checks

**Queries**: Direct SQL using Python sqlite3 module (no ORM)

## Data Flow Examples

### Creating a Note (via Admin Interface)

```
1. User fills out form at /admin/new
   ↓
2. POST to /api/notes with markdown content
   ↓
3. Verify user session (check session cookie)
   ↓
4. Generate unique slug from content or timestamp
   ↓
5. Determine file path: data/notes/2024/11/slug.md
   ↓
6. Create directories if needed (makedirs)
   ↓
7. Write markdown content to file (atomic write)
   ↓
8. Calculate SHA-256 hash of content
   ↓
9. Begin database transaction
   ↓
10. Insert record into notes table:
    - slug
    - file_path
    - published (from form)
    - created_at (now)
    - updated_at (now)
    - content_hash
   ↓
11. If database insert fails:
    - Delete file
    - Return error to user
   ↓
12. If database insert succeeds:
    - Commit transaction
    - Return success with note URL
   ↓
13. Redirect user to /admin (dashboard)
```

### Reading a Note (via Public Interface)

```
1. User visits /note/my-first-note
   ↓
2. Extract slug from URL
   ↓
3. Query database:
   SELECT file_path, created_at, published
   FROM notes
   WHERE slug = 'my-first-note' AND published = 1
   ↓
4. If not found → 404 error
   ↓
5. Read markdown content from file:
   - Open data/notes/2024/11/my-first-note.md
   - Read UTF-8 content
   ↓
6. Render markdown to HTML (markdown.markdown())
   ↓
7. Render Jinja2 template with:
   - content_html (rendered HTML)
   - created_at (timestamp)
   - slug (for permalink)
   ↓
8. Return HTML with microformats markup
```

### Publishing via Micropub

```
1. Micropub client POSTs to /api/micropub
   Headers: Authorization: Bearer {token}
   Body: {"type": ["h-entry"], "properties": {"content": ["..."]}}
   ↓
2. Extract bearer token from Authorization header
   ↓
3. Query database:
   SELECT me, scope FROM tokens
   WHERE token = {token} AND expires_at > now()
   ↓
4. If token invalid → 401 Unauthorized
   ↓
5. Parse Micropub JSON payload
   ↓
6. Extract content from properties.content[0]
   ↓
7. Create note (same flow as admin interface):
   - Generate slug
   - Write file
   - Insert database record
   ↓
8. If successful:
   - Return 201 Created
   - Set Location header to note URL
   ↓
9. Client receives note URL, displays success
```

### IndieLogin Authentication Flow

```
1. User visits /admin/login
   ↓
2. User enters their website: https://alice.example.com
   ↓
3. POST to /admin/login with "me" parameter
   ↓
4. Validate URL format
   ↓
5. Generate random state token (CSRF protection)
   ↓
6. Store state in database with 5-minute expiry
   ↓
7. Build IndieLogin authorization URL:
   https://indielogin.com/auth?
     me=https://alice.example.com
     client_id=https://starpunk.example.com
     redirect_uri=https://starpunk.example.com/auth/callback
     state={random_state}
   ↓
8. Redirect user to IndieLogin
   ↓
9. IndieLogin verifies user's identity:
   - Checks rel="me" links on alice.example.com
   - Or sends email verification
   - User authenticates via chosen method
   ↓
10. IndieLogin redirects back:
    /auth/callback?code={auth_code}&state={state}
   ↓
11. Verify state matches stored value (CSRF check)
   ↓
12. Exchange code for verified identity:
    POST https://indielogin.com/auth
      code={auth_code}
      client_id=https://starpunk.example.com
      redirect_uri=https://starpunk.example.com/auth/callback
   ↓
13. IndieLogin returns: {"me": "https://alice.example.com"}
   ↓
14. Verify me == ADMIN_ME (config)
   ↓
15. If match:
    - Generate session token
    - Insert into sessions table
    - Set HttpOnly, Secure cookie
    - Redirect to /admin
   ↓
16. If no match:
    - Return "Unauthorized" error
    - Log attempt
```

## Security Architecture

### Authentication Security

#### Session Management

- **Token Generation**: `secrets.token_urlsafe(32)` (256-bit entropy)
- **Storage**: Hash before storing in database
- **Cookies**: HttpOnly, Secure, SameSite=Lax
- **Expiry**: 30 days, extendable on use
- **Validation**: Every protected route checks session

#### CSRF Protection

- **State Tokens**: Random tokens for OAuth flows
- **Expiry**: 5 minutes (short-lived)
- **Single-Use**: Deleted after verification
- **SameSite**: Cookies set to Lax mode

#### Access Control

- **Admin Routes**: Require valid session
- **Micropub Routes**: Require valid bearer token
- **Public Routes**: No authentication needed
- **Identity Verification**: Only ADMIN_ME can authenticate

### Input Validation

#### User Input

- **Markdown**: Sanitize to prevent XSS in rendered HTML
- **URLs**: Validate format and scheme (https://)
- **Slugs**: Alphanumeric + hyphens only
- **JSON**: Parse and validate structure
- **File Paths**: Prevent directory traversal (validate against base path)

#### Micropub Payloads

- **Content-Type**: Verify matches expected format
- **Required Fields**: Validate h-entry structure
- **Size Limits**: Prevent DoS via large payloads
- **Scope Verification**: Check token has required permissions

### Database Security

#### SQL Injection Prevention

- **Parameterized Queries**: Always use parameter substitution
- **No String Interpolation**: Never build SQL with f-strings
- **Input Sanitization**: Validate before database operations

Example:

```python
# GOOD
cursor.execute("SELECT * FROM notes WHERE slug = ?", (slug,))

# BAD (SQL injection vulnerable)
cursor.execute(f"SELECT * FROM notes WHERE slug = '{slug}'")
```

#### Data Integrity

- **Transactions**: Use for multi-step operations
- **Constraints**: UNIQUE on slugs, file_paths
- **Foreign Keys**: Enforce relationships (if applicable)
- **Content Hashing**: Detect unauthorized file
modifications

### Network Security

#### HTTPS

- **Production Requirement**: TLS 1.2+ required
- **Reverse Proxy**: Nginx/Caddy handles SSL termination
- **Certificate Validation**: Verify SSL certs on outbound requests
- **HSTS**: Set Strict-Transport-Security header

#### Security Headers

```
# Set on all responses
Content-Security-Policy: default-src 'self'
X-Frame-Options: DENY
X-Content-Type-Options: nosniff
Referrer-Policy: strict-origin-when-cross-origin
```

#### Rate Limiting

- **Implementation**: Reverse proxy (nginx/Caddy)
- **Admin Routes**: Stricter limits
- **API Routes**: Moderate limits
- **Public Routes**: Permissive limits

### File System Security

#### Atomic Operations

```python
import os

# Write to temp file, then atomic rename
temp_path = f"{target_path}.tmp"
with open(temp_path, 'w', encoding='utf-8') as f:
    f.write(content)
# os.replace is atomic on POSIX and also overwrites an existing target on Windows
os.replace(temp_path, target_path)
```

#### Path Validation

```python
import os

# Prevent directory traversal: resolve the path, then confirm it is still inside the data directory
base_path = os.path.realpath(DATA_PATH)
requested_path = os.path.realpath(os.path.join(base_path, user_input))
# A plain startswith() check would also accept sibling paths such as "<base_path>-evil"
if os.path.commonpath([base_path, requested_path]) != base_path:
    raise SecurityError("Path traversal detected")  # SecurityError: application-defined exception
```

#### File Permissions

- **Data Directory**: 700 (owner only)
- **Database File**: 600 (owner read/write)
- **Note Files**: 600 (owner read/write)
- **Application User**: Dedicated non-root user

## Performance Considerations

### Response Time Targets

- **API Responses**: < 100ms (database + file read)
- **Page Renders**: < 200ms (template rendering)
- **RSS Feed**: < 300ms (query + file reads + XML generation)

### Optimization Strategies

#### Database

- **Indexes**: On frequently queried columns (created_at, slug, published)
- **Connection Pooling**: Not needed; one connection per worker process is enough (single-user, little contention)
- **Query Optimization**: SELECT only needed columns
- **Prepared Statements**: Reuse compiled queries

#### File System

- **Caching**: Consider caching rendered HTML in memory (optional)
- **Directory Structure**: Year/Month prevents large
directories
- **Atomic Reads**: Fast sequential reads, no locking needed

#### HTTP

- **Static Assets**: Cache headers on CSS/JS (1 year)
- **RSS Feed**: Cache for 5 minutes (Cache-Control)
- **Compression**: gzip/brotli via reverse proxy
- **ETags**: For conditional requests

#### Rendering

- **Template Compilation**: Jinja2 compiles templates automatically
- **Minimal Templating**: Simple templates render fast
- **Server-Side**: No client-side rendering overhead

### Resource Usage

#### Memory

- **Flask Process**: ~50MB base
- **SQLite**: ~10MB typical working set
- **Total**: < 100MB under normal load

#### Disk

- **Application**: ~5MB (code + dependencies)
- **Database**: ~1MB per 1000 notes
- **Notes**: ~5KB average per markdown file
- **Total**: Scales linearly with note count

#### CPU

- **Idle**: Near zero
- **Request Handling**: Minimal (no heavy processing)
- **Markdown Rendering**: Fast (pure Python)
- **Database Queries**: Indexed, sub-millisecond

## Deployment Architecture

### Single-Server Deployment

```
┌─────────────────────────────────────────────────┐
│                    Internet                     │
└────────────────┬────────────────────────────────┘
                 │
                 │ Port 443 (HTTPS)
                 ↓
┌─────────────────────────────────────────────────┐
│           Nginx/Caddy (Reverse Proxy)           │
│  - SSL/TLS termination                          │
│  - Static file serving                          │
│  - Rate limiting                                │
│  - Compression                                  │
└────────────────┬────────────────────────────────┘
                 │
                 │ Port 8000 (HTTP)
                 ↓
┌─────────────────────────────────────────────────┐
│             Gunicorn (WSGI Server)              │
│  - 4 worker processes                           │
│  - Process management                           │
│  - Load distribution across workers             │
└────────────────┬────────────────────────────────┘
                 │
                 │ WSGI
                 ↓
┌─────────────────────────────────────────────────┐
│                Flask Application                │
│  - Request handling                             │
│  - Business logic                               │
│  - Template rendering                           │
└────────────────┬────────────────────────────────┘
                 │
                 ↓
┌────────────────────────────┬────────────────────┐
│  File System               │  SQLite Database   │
│  data/notes/               │  data/starpunk.db  │
│  YYYY/MM/slug.md           │                    │
└────────────────────────────┴────────────────────┘
```

### Process Management (systemd)

```ini
[Unit]
Description=StarPunk CMS
After=network.target

[Service]
Type=notify
User=starpunk
WorkingDirectory=/opt/starpunk
Environment="PATH=/opt/starpunk/venv/bin"
ExecStart=/opt/starpunk/venv/bin/gunicorn -w 4 -b 127.0.0.1:8000 app:app
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```

### Backup Strategy

#### Automated Daily Backup

```bash
#!/bin/bash
# backup.sh - Run daily via cron
DATE=$(date +%Y%m%d)
BACKUP_DIR="/backup/starpunk"

# Backup data directory (notes + database)
mkdir -p "$BACKUP_DIR/$DATE"
rsync -av /opt/starpunk/data/ "$BACKUP_DIR/$DATE/"

# Keep last 30 days (-mindepth 1 keeps find from ever matching $BACKUP_DIR itself)
find "$BACKUP_DIR" -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} \;
```

#### Manual Backup

```bash
# Simple copy
cp -r /opt/starpunk/data /backup/starpunk-$(date +%Y%m%d)

# Or with compression
tar -czf starpunk-backup-$(date +%Y%m%d).tar.gz /opt/starpunk/data
```

### Restore Process

1. Stop application: `sudo systemctl stop starpunk`
2. Restore data directory: `rsync -av /backup/starpunk/20241118/ /opt/starpunk/data/`
3. Fix permissions: `chown -R starpunk:starpunk /opt/starpunk/data`
4. Start application: `sudo systemctl start starpunk`
5. Verify: Visit site, check recent notes

## Testing Strategy

### Test Pyramid

```
            ┌─────────────┐
           /               \
          /  Manual Tests   \           Validation, Real Services
         /───────────────────\
        /                     \
       /   Integration Tests   \        API Flows, Database + Files
      /─────────────────────────\
     /                           \
    /         Unit Tests          \     Functions, Logic, Parsing
   /───────────────────────────────\
```

### Unit Tests (pytest)

**Coverage**: Business logic, utilities, models

**Examples**:

- Slug generation and uniqueness
- Markdown rendering with various inputs
- Content hash calculation
- File path validation
- Token generation and verification
- Date formatting for RSS
- Micropub payload parsing

### Integration Tests

**Coverage**: Component interactions, full flows

**Examples**:

- Create note: file write + database insert
- Read note: database query + file read
- IndieLogin flow with mocked API
- Micropub creation with token validation
- RSS feed generation with multiple notes
- Session authentication on protected routes

### End-to-End Tests

**Coverage**: Full user workflows

**Examples**:

- Admin login via IndieLogin (mocked)
- Create note via web interface
- Publish note via Micropub client (mocked)
- View note on public site
- Verify RSS feed includes note

### Validation Tests

**Coverage**: Standards compliance

**Tools**:

- W3C HTML Validator (validate templates)
- W3C Feed Validator (validate RSS output)
- IndieWebify.me (verify microformats)
- Micropub.rocks (test Micropub compliance)

### Manual Tests

**Coverage**: Real-world usage

**Examples**:

- Authenticate with real indielogin.com
- Publish from actual Micropub client (Quill, Indigenous)
- Subscribe to feed in actual RSS reader
- Browser compatibility (Chrome, Firefox, Safari, mobile)
- Accessibility with screen reader

## Monitoring and Observability

### Logging Strategy

#### Application Logs

```python
# Structured logging
import logging

logger = logging.getLogger(__name__)

# Info: Normal operations
logger.info("Note created", extra={
    "slug": slug,
    "published": published,
    "user": session.me
})

# Warning: Recoverable issues
logger.warning("State token expired", extra={
    "state": state,
    "age": age_seconds
})

# Error: Failed operations
logger.error("File write failed", extra={
    "path": file_path,
    "error": str(e)
})
```

#### Log Levels

- **DEBUG**: Development only (verbose)
- **INFO**: Normal operations (note creation, auth success)
- **WARNING**: Unusual but handled (expired tokens, invalid input)
- **ERROR**: Failed operations (file I/O errors, database errors)
- **CRITICAL**: System failures (database unreachable)

#### Log Destinations

- **Development**: Console (stdout)
- **Production**: File rotation (logrotate) + optional syslog

### Metrics (Optional for V2)

**Simple Metrics** (if desired):

- Note count (query database)
- Request count (nginx logs)
- Error rate (grep application logs)
- Response times (nginx logs)

**Advanced Metrics** (V2):

- Prometheus exporter
- Grafana dashboard
- Alert on error rate spike

### Health Checks

```python
import os

@app.route('/health')
def health_check():
    """Simple health check for monitoring"""
    try:
        # Check database
        db.execute("SELECT 1").fetchone()
        # Check file system (os.path.exists returns a bool, so test it explicitly)
        if not os.path.exists(DATA_PATH):
            return {"status": "error", "detail": "data directory missing"}, 500
        return {"status": "ok"}, 200
    except Exception as e:
        return {"status": "error", "detail": str(e)}, 500
```

## Migration and Evolution

### V1 to V2 Migration

#### Database Schema Changes

```sql
-- Add new column with default
ALTER TABLE notes ADD COLUMN tags TEXT DEFAULT '';

-- Create new table
CREATE TABLE tags (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);

-- Migration script updates existing notes
```

#### File Format Evolution

**V1**: Pure markdown

**V2** (if needed): Add optional frontmatter

```markdown
---
tags: indieweb, cms
---
Note content here
```

**Backward Compatibility**: Parser checks for frontmatter, falls back to pure markdown.

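The backward-compatibility rule above can be sketched as a small parser. This is only an illustration under stated assumptions, not StarPunk's actual implementation: the function name `split_frontmatter` is hypothetical, and the frontmatter is assumed to be simple `key: value` lines between two `---` delimiters, as in the V2 example.

```python
def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Return (metadata, body). Hypothetical helper, not part of StarPunk.

    Metadata is empty for V1 notes, which are pure markdown.
    """
    lines = text.splitlines(keepends=True)
    # Frontmatter only counts if the very first line is the '---' delimiter.
    if not lines or lines[0].strip() != "---":
        return {}, text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            meta = {}
            # Assumption: each frontmatter line is a flat "key: value" pair.
            for raw in lines[1:i]:
                key, sep, value = raw.partition(":")
                if sep:
                    meta[key.strip()] = value.strip()
            return meta, "".join(lines[i + 1:])
    # No closing delimiter found: fall back to treating the file as pure markdown.
    return {}, text


meta, body = split_frontmatter("---\ntags: indieweb, cms\n---\nNote content here\n")
legacy_meta, legacy_body = split_frontmatter("Just a V1 note\n")
```

One caveat with this approach: a V1 note that happens to open with a `---` horizontal rule and contains a second `---` later would be misread as frontmatter, so a real migration would probably want a stricter marker or a per-note format flag in the database.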
#### API Versioning

```
# V1 (current)
GET /api/notes

# V2 (future)
GET /api/v2/notes    # New features
GET /api/notes       # Still works, returns V1 response
```

### Data Export/Import

#### Export Formats

1. **Markdown Bundle**: Zip of all notes (already portable)
2. **JSON Export**: Notes + metadata

   ```json
   {
     "version": "1.0",
     "exported_at": "2024-11-18T12:00:00Z",
     "notes": [
       {
         "slug": "my-note",
         "content": "Note content...",
         "created_at": "2024-11-01T12:00:00Z",
         "published": true
       }
     ]
   }
   ```

3. **RSS Archive**: Existing feed.xml

#### Import (V2)

- From JSON export
- From WordPress XML
- From markdown directory
- From other IndieWeb CMSs

## Success Metrics

The architecture is successful if it enables:

1. **Fast Development**: < 1 week to implement V1
2. **Easy Deployment**: < 5 minutes to get running
3. **Low Maintenance**: Runs for months without intervention
4. **High Performance**: All responses < 300ms
5. **Data Ownership**: User has direct access to all content
6. **Standards Compliance**: Passes all validators
7. **Extensibility**: Can add V2 features without rewrite

## References

### Internal Documentation

- [Technology Stack](/home/phil/Projects/starpunk/docs/architecture/technology-stack.md)
- [ADR-001: Python Web Framework](/home/phil/Projects/starpunk/docs/decisions/ADR-001-python-web-framework.md)
- [ADR-002: Flask Extensions](/home/phil/Projects/starpunk/docs/decisions/ADR-002-flask-extensions.md)
- [ADR-003: Frontend Technology](/home/phil/Projects/starpunk/docs/decisions/ADR-003-frontend-technology.md)
- [ADR-004: File-Based Storage](/home/phil/Projects/starpunk/docs/decisions/ADR-004-file-based-note-storage.md)
- [ADR-005: IndieLogin Authentication](/home/phil/Projects/starpunk/docs/decisions/ADR-005-indielogin-authentication.md)

### External Standards

- [IndieWeb](https://indieweb.org/)
- [IndieAuth Spec](https://indieauth.spec.indieweb.org/)
- [Micropub Spec](https://micropub.spec.indieweb.org/)
- [Microformats2](http://microformats.org/wiki/h-entry)
- [RSS 2.0](https://www.rssboard.org/rss-specification)
- [Flask Documentation](https://flask.palletsprojects.com/)