StarPunk Architecture Overview
Executive Summary
StarPunk is a minimal, single-user IndieWeb CMS designed around the principle: "Every line of code must justify its existence." The architecture prioritizes simplicity, standards compliance, and user data ownership through careful technology selection and hybrid data storage.
Core Architecture: API-first Flask application with hybrid file+database storage, server-side rendering, and delegated authentication.
System Architecture
High-Level Components
┌─────────────────────────────────────────────────────────────┐
│ User Browser │
└───────────────┬─────────────────────────────────────────────┘
│
│ HTTP/HTTPS
↓
┌─────────────────────────────────────────────────────────────┐
│ Flask Application │
│ ┌─────────────────────────────────────────────────────────┤
│ │ Web Interface (Jinja2 Templates) │
│ │ - Public: Homepage, Note Permalinks │
│ │ - Admin: Dashboard, Note Editor │
│ └──────────────────────────────┬──────────────────────────┘
│ ┌──────────────────────────────┴──────────────────────────┐
│ │ API Layer (RESTful + Micropub) │
│ │ - Notes CRUD API │
│ │ - Micropub Endpoint │
│ │ - RSS Feed Generator │
│ │ - Authentication Handlers │
│ └──────────────────────────────┬──────────────────────────┘
│ ┌──────────────────────────────┴──────────────────────────┐
│ │ Business Logic │
│ │ - Note Management (create, read, update, delete) │
│ │ - File/Database Sync │
│ │ - Markdown Rendering │
│ │ - Slug Generation │
│ │ - Session Management │
│ └──────────────────────────────┬──────────────────────────┘
│ ┌──────────────────────────────┴──────────────────────────┐
│ │ Data Layer │
│ │ ┌──────────────────┐ ┌─────────────────────────┐ │
│ │ │ File Storage │ │ SQLite Database │ │
│ │ │ │ │ │ │
│ │ │ Markdown Files │ │ - Note Metadata │ │
│ │ │ (Pure Content) │ │ - Sessions │ │
│ │ │ │ │ - Tokens │ │
│ │ │ data/notes/ │ │ - Auth State │ │
│ │ │ YYYY/MM/ │ │ │ │
│ │ │ slug.md │ │ data/starpunk.db │ │
│ │ └──────────────────┘ └─────────────────────────┘ │
│ └─────────────────────────────────────────────────────────┘
└─────────────────────────────────────────────────────────────┘
│
│ HTTPS
↓
┌─────────────────────────────────────────────────────────────┐
│ External Services │
│ - IndieLogin.com (Authentication) │
│ - User's Website (Identity Verification) │
│ - Micropub Clients (Publishing) │
└─────────────────────────────────────────────────────────────┘
Core Principles
1. Radical Simplicity
- Total dependencies: 6 direct packages
- No build tools, no npm, no bundlers
- Server-side rendering eliminates frontend complexity
- Single file SQLite database
- Zero configuration frameworks
2. Hybrid Data Architecture
Files for Content: Markdown notes stored as plain text files
- Maximum portability
- Human-readable
- Direct user access
- Easy backup (copy, rsync, git)
Database for Metadata: SQLite stores structured data
- Fast queries and indexes
- Referential integrity
- Efficient filtering and sorting
- Transaction support
Sync Strategy: Files are authoritative for content; database is authoritative for metadata. Both must stay in sync.
3. Standards-First Design
- IndieWeb: Microformats2, IndieAuth, Micropub
- Web: HTML5, RSS 2.0, HTTP standards
- Security: OAuth 2.0, HTTPS, secure cookies
- Data: CommonMark markdown
4. API-First Architecture
All functionality exposed via API, web interface consumes API. This enables:
- Micropub client support
- Future client applications
- Scriptable automation
- Clean separation of concerns
5. Progressive Enhancement
- Core functionality works without JavaScript
- JavaScript adds optional enhancements (markdown preview)
- Server-side rendering for fast initial loads
- Mobile-responsive from the start
Component Descriptions
Web Layer
Public Interface
Purpose: Display published notes to the world
Technology: Server-side rendered HTML (Jinja2)
Routes:
- / - Homepage with recent notes
- /note/{slug} - Individual note permalink
- /feed.xml - RSS feed
Features:
- Microformats2 markup (h-entry, h-card)
- Reverse chronological note list
- Clean, minimal design
- Mobile-responsive
- No JavaScript required
Admin Interface
Purpose: Manage notes (create, edit, publish)
Technology: Server-side rendered HTML (Jinja2) + optional vanilla JS
Routes:
- /admin/login - Authentication
- /admin - Dashboard (list of all notes)
- /admin/new - Create new note
- /admin/edit/{id} - Edit existing note
Features:
- Markdown editor
- Optional real-time preview (JS enhancement)
- Publish/draft toggle
- Protected by session authentication
API Layer
Notes API
Purpose: CRUD operations for notes
Authentication: Session-based (admin interface)
Routes:
GET /api/notes List published notes
POST /api/notes Create new note
GET /api/notes/{id} Get single note
PUT /api/notes/{id} Update note
DELETE /api/notes/{id} Delete note
Response Format: JSON
Micropub Endpoint
Purpose: Accept posts from external Micropub clients
Authentication: IndieAuth bearer tokens
Routes:
POST /api/micropub Create note (h-entry)
GET /api/micropub?q=config Query configuration
GET /api/micropub?q=source Query note source
Content Types:
- application/json
- application/x-www-form-urlencoded
Compliance: Full Micropub specification
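Both accepted content types carry the same h-entry in different shapes. The sketch below normalizes them into the note text, assuming only the `content` property matters for note creation; `extract_content` is a hypothetical helper, and a complete Micropub implementation would also handle other properties, actions, and error responses.

```python
import json
from urllib.parse import parse_qs


def extract_content(content_type: str, body: bytes) -> str:
    """Return the note text from a Micropub create request (hypothetical helper).

    Handles only the 'content' property of an h-entry.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()

    if media_type == "application/json":
        payload = json.loads(body.decode("utf-8"))
        if payload.get("type") != ["h-entry"]:
            raise ValueError("only h-entry is supported")
        values = payload.get("properties", {}).get("content", [])
        if not values:
            raise ValueError("missing content")
        value = values[0]
        # JSON content may be a plain string or an object such as {"html": "..."}
        if isinstance(value, dict):
            value = value.get("html") or value.get("value", "")
        return value

    if media_type == "application/x-www-form-urlencoded":
        form = parse_qs(body.decode("utf-8"))
        if form.get("h") != ["entry"]:
            raise ValueError("only h=entry is supported")
        # Form posts may send the property as 'content' or 'content[]'
        values = form.get("content") or form.get("content[]")
        if not values:
            raise ValueError("missing content")
        return values[0]

    raise ValueError(f"unsupported content type: {media_type}")
```

Raising `ValueError` keeps the parser independent of Flask; the route handler would translate it into a 400 response.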
RSS Feed
Purpose: Syndicate published notes
Technology: feedgen library
Route: /feed.xml
Format: Valid RSS 2.0 XML
Caching: 5 minutes
Features:
- All published notes
- RFC-822 date formatting
- CDATA-wrapped HTML content
- Proper GUID for each item
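feedgen formats `<pubDate>` itself and expects timezone-aware datetimes, so the main job on StarPunk's side is turning stored timestamps into aware UTC values. The sketch below, assuming timestamps are stored as ISO 8601 strings (as in the JSON export format), shows that conversion and the RFC-822 form it produces using only the standard library; `rss_date` is a hypothetical name.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def rss_date(created_at: str) -> str:
    """Convert a stored ISO 8601 timestamp into an RFC-822 date string."""
    # fromisoformat() only accepts a trailing 'Z' from Python 3.11 on, so normalize it
    dt = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive values are UTC
    return format_datetime(dt.astimezone(timezone.utc))
```

For example, `rss_date("2024-11-01T12:00:00Z")` yields `"Fri, 01 Nov 2024 12:00:00 +0000"`.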
Business Logic Layer
Note Management
Operations:
- Create: Generate slug → write file → insert database record
- Read: Query database for path → read file → render markdown
- Update: Write file atomically → update database timestamp
- Delete: Mark deleted in database → optionally archive file
Key Components:
- Slug generation (URL-safe, unique)
- Markdown rendering (markdown library)
- Content hashing (integrity verification)
- Atomic file operations (prevent corruption)
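Slug generation can be sketched as: normalize the first few words of the note to ASCII, join them with hyphens, fall back to a timestamp when nothing usable remains, and append a counter on collision. The word limit, the timestamp format, and the names `slugify`/`unique_slug` are assumptions; `existing` stands in for a lookup against the notes table, whose UNIQUE constraint on slug remains the final guard.

```python
import re
import unicodedata
from datetime import datetime, timezone
from typing import Optional, Set


def slugify(text: str, max_words: int = 5) -> str:
    """Build a URL-safe slug (lowercase letters, digits, hyphens) from note text."""
    # Decompose accented characters, then drop anything that is not ASCII
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    words = re.findall(r"[a-z0-9]+", ascii_text.lower())
    return "-".join(words[:max_words])


def unique_slug(content: str, existing: Set[str], now: Optional[datetime] = None) -> str:
    """Return a slug not present in `existing`, falling back to a timestamp."""
    now = now or datetime.now(timezone.utc)
    base = slugify(content) or now.strftime("%Y%m%d%H%M%S")
    slug, n = base, 2
    while slug in existing:
        slug = f"{base}-{n}"
        n += 1
    return slug
```

For instance, `unique_slug("Hello world", {"hello-world"})` returns `"hello-world-2"`.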
File/Database Sync
Strategy: Write files first, then database
Rollback: If database operation fails, delete/restore file
Verification: Content hash detects external modifications
Integrity Check: Optional scan for orphaned files/records
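A minimal sketch of the file-first strategy with rollback and hash verification, assuming a `notes` table with the columns used in the note-creation flow (slug, file_path, published, created_at, updated_at, content_hash). `create_note` and `file_matches_hash` are hypothetical names, and exclusive-create mode stands in for the temp-file-and-rename write described under File System Security, to keep the example short.

```python
import hashlib
import os
import sqlite3
from datetime import datetime, timezone


def create_note(conn: sqlite3.Connection, data_dir: str, slug: str,
                content: str, published: bool) -> str:
    """Write the markdown file first, then insert metadata; undo the file on failure."""
    now = datetime.now(timezone.utc)
    rel_path = os.path.join(now.strftime("%Y"), now.strftime("%m"), f"{slug}.md")
    abs_path = os.path.join(data_dir, rel_path)
    os.makedirs(os.path.dirname(abs_path), exist_ok=True)

    # 1. File first ('x' refuses to overwrite; newline="" avoids newline translation)
    with open(abs_path, "x", encoding="utf-8", newline="") as f:
        f.write(content)
    content_hash = hashlib.sha256(content.encode("utf-8")).hexdigest()

    # 2. Database second; 'with conn' commits on success and rolls back on error
    try:
        with conn:
            conn.execute(
                "INSERT INTO notes (slug, file_path, published, created_at,"
                " updated_at, content_hash) VALUES (?, ?, ?, ?, ?, ?)",
                (slug, rel_path, int(published), now.isoformat(),
                 now.isoformat(), content_hash),
            )
    except sqlite3.Error:
        os.remove(abs_path)  # rollback: the file must not outlive a failed insert
        raise
    return rel_path


def file_matches_hash(conn: sqlite3.Connection, data_dir: str, slug: str) -> bool:
    """Detect external edits by re-hashing the file and comparing to the stored hash."""
    row = conn.execute("SELECT file_path, content_hash FROM notes WHERE slug = ?",
                       (slug,)).fetchone()
    if row is None:
        return False
    try:
        with open(os.path.join(data_dir, row[0]), "rb") as f:
            return hashlib.sha256(f.read()).hexdigest() == row[1]
    except FileNotFoundError:
        return False  # orphaned record: the file is gone
```

Ordering the writes this way means a crash can leave at worst an orphaned file, never a database row pointing at missing content, which is what the optional integrity scan would clean up.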
Authentication
Admin Auth: IndieLogin.com OAuth 2.0 flow
- User enters website URL
- Redirect to indielogin.com
- Verify identity via RelMeAuth or email
- Return verified "me" URL
- Create session token
- Store in HttpOnly cookie
Micropub Auth: IndieAuth token verification
- Client obtains token via IndieAuth flow
- Token sent as Bearer in Authorization header
- Verify token exists and not expired
- Check scope permissions
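The four Micropub checks above can be sketched as one lookup, assuming a `tokens` table with `token, me, client_id, scope, expires_at` columns, where `scope` is space-separated and `expires_at` is an ISO 8601 string with a UTC offset. `verify_bearer` is a hypothetical helper; it returns `None` for every failure, whereas a fuller version would separate 401 (bad token) from 403 (insufficient scope).

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional


def verify_bearer(conn: sqlite3.Connection, authorization: Optional[str],
                  required_scope: str = "create") -> Optional[str]:
    """Return the token's 'me' URL if it is valid and has the scope, else None."""
    if not authorization:
        return None
    scheme, _, token = authorization.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return None
    row = conn.execute(
        "SELECT me, scope, expires_at FROM tokens WHERE token = ?", (token,)
    ).fetchone()
    if row is None:
        return None  # unknown token
    me, scope, expires_at = row
    if datetime.fromisoformat(expires_at) <= datetime.now(timezone.utc):
        return None  # expired
    if required_scope not in scope.split():
        return None  # token lacks the required scope
    return me
```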
Data Layer
File Storage
Location: data/notes/
Structure: YYYY/MM/slug.md
Format: Pure markdown, no frontmatter
Operations:
- Atomic writes (temp file → rename)
- Directory creation (makedirs)
- Content reading (UTF-8 encoding)
Example:
data/notes/
├── 2024/
│ ├── 11/
│ │ ├── my-first-note.md
│ │ └── another-note.md
│ └── 12/
│ └── december-note.md
Database Storage
Location: data/starpunk.db
Engine: SQLite3
Tables:
- notes - Metadata (slug, file_path, published, timestamps, hash)
- sessions - Auth sessions (token, me, expiry)
- tokens - Micropub tokens (token, me, client_id, scope)
- auth_state - CSRF tokens (state, expiry)
Indexes:
- notes.created_at (DESC) - Fast chronological queries
- notes.published - Fast filtering
- notes.slug - Fast lookup by slug
- sessions.session_token - Fast auth checks
Queries: Direct SQL using Python sqlite3 module (no ORM)
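The tables and indexes listed above could be expressed as the DDL below, applied with the plain `sqlite3` module. Column types, the `expires_at` column on `tokens` (implied by the Micropub flow's expiry check), and the `deleted_at` soft-delete column are assumptions, not a published schema. The UNIQUE constraints on `notes.slug` and `sessions.session_token` already create indexes, so those two need no separate `CREATE INDEX`.

```python
import sqlite3

# Hypothetical schema matching the tables and indexes listed above
SCHEMA = """
CREATE TABLE IF NOT EXISTS notes (
    id           INTEGER PRIMARY KEY,
    slug         TEXT UNIQUE NOT NULL,
    file_path    TEXT UNIQUE NOT NULL,
    published    INTEGER NOT NULL DEFAULT 0,
    created_at   TEXT NOT NULL,
    updated_at   TEXT NOT NULL,
    deleted_at   TEXT,                       -- set when a note is marked deleted
    content_hash TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS sessions (
    id            INTEGER PRIMARY KEY,
    session_token TEXT UNIQUE NOT NULL,      -- SHA-256 hash of the cookie value
    me            TEXT NOT NULL,
    expires_at    TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS tokens (
    token      TEXT PRIMARY KEY,
    me         TEXT NOT NULL,
    client_id  TEXT,
    scope      TEXT NOT NULL,
    expires_at TEXT
);
CREATE TABLE IF NOT EXISTS auth_state (
    state      TEXT PRIMARY KEY,
    expires_at TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_notes_created_at ON notes (created_at DESC);
CREATE INDEX IF NOT EXISTS idx_notes_published ON notes (published);
"""


def init_db(path: str) -> sqlite3.Connection:
    """Open (or create) the database and ensure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

Because every statement uses `IF NOT EXISTS`, `init_db("data/starpunk.db")` is safe to call on every application start.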
Data Flow Examples
Creating a Note (via Admin Interface)
1. User fills out form at /admin/new
↓
2. POST to /api/notes with markdown content
↓
3. Verify user session (check session cookie)
↓
4. Generate unique slug from content or timestamp
↓
5. Determine file path: data/notes/2024/11/slug.md
↓
6. Create directories if needed (makedirs)
↓
7. Write markdown content to file (atomic write)
↓
8. Calculate SHA-256 hash of content
↓
9. Begin database transaction
↓
10. Insert record into notes table:
- slug
- file_path
- published (from form)
- created_at (now)
- updated_at (now)
- content_hash
↓
11. If database insert fails:
- Delete file
- Return error to user
↓
12. If database insert succeeds:
- Commit transaction
- Return success with note URL
↓
13. Redirect user to /admin (dashboard)
Reading a Note (via Public Interface)
1. User visits /note/my-first-note
↓
2. Extract slug from URL
↓
3. Query database:
SELECT file_path, created_at, published
FROM notes
WHERE slug = 'my-first-note' AND published = 1
↓
4. If not found → 404 error
↓
5. Read markdown content from file:
- Open data/notes/2024/11/my-first-note.md
- Read UTF-8 content
↓
6. Render markdown to HTML (markdown.markdown())
↓
7. Render Jinja2 template with:
- content_html (rendered HTML)
- created_at (timestamp)
- slug (for permalink)
↓
8. Return HTML with microformats markup
Publishing via Micropub
1. Micropub client POSTs to /api/micropub
Headers: Authorization: Bearer {token}
Body: {"type": ["h-entry"], "properties": {"content": ["..."]}}
↓
2. Extract bearer token from Authorization header
↓
3. Query database:
SELECT me, scope FROM tokens
WHERE token = ? AND expires_at > ?   (bound parameters: token, current UTC time)
↓
4. If token invalid → 401 Unauthorized
↓
5. Parse Micropub JSON payload
↓
6. Extract content from properties.content[0]
↓
7. Create note (same flow as admin interface):
- Generate slug
- Write file
- Insert database record
↓
8. If successful:
- Return 201 Created
- Set Location header to note URL
↓
9. Client receives note URL, displays success
IndieLogin Authentication Flow
1. User visits /admin/login
↓
2. User enters their website: https://alice.example.com
↓
3. POST to /admin/login with "me" parameter
↓
4. Validate URL format
↓
5. Generate random state token (CSRF protection)
↓
6. Store state in database with 5-minute expiry
↓
7. Build IndieLogin authorization URL:
https://indielogin.com/auth?
me=https://alice.example.com
client_id=https://starpunk.example.com
redirect_uri=https://starpunk.example.com/auth/callback
state={random_state}
↓
8. Redirect user to IndieLogin
↓
9. IndieLogin verifies user's identity:
- Checks rel="me" links on alice.example.com
- Or sends email verification
- User authenticates via chosen method
↓
10. IndieLogin redirects back:
/auth/callback?code={auth_code}&state={state}
↓
11. Verify state matches stored value (CSRF check)
↓
12. Exchange code for verified identity:
POST https://indielogin.com/auth
code={auth_code}
client_id=https://starpunk.example.com
redirect_uri=https://starpunk.example.com/auth/callback
↓
13. IndieLogin returns: {"me": "https://alice.example.com"}
↓
14. Verify me == ADMIN_ME (config)
↓
15. If match:
- Generate session token
- Insert into sessions table
- Set HttpOnly, Secure cookie
- Redirect to /admin
↓
16. If no match:
- Return "Unauthorized" error
- Log attempt
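Steps 5-7 and step 11 of this flow can be sketched with the standard library: generate a state token, store it with a 5-minute expiry, build the redirect URL, and later accept that state exactly once. The constants reuse the example URLs from the flow; the `auth_state (state, expires_at)` columns and the function names `begin_login`/`consume_state` are assumptions.

```python
import secrets
import sqlite3
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

INDIELOGIN_AUTH = "https://indielogin.com/auth"
CLIENT_ID = "https://starpunk.example.com"                 # example values from the flow
REDIRECT_URI = "https://starpunk.example.com/auth/callback"


def begin_login(conn: sqlite3.Connection, me: str) -> str:
    """Steps 5-7: store a CSRF state token for 5 minutes and build the redirect URL."""
    state = secrets.token_urlsafe(32)
    expires = datetime.now(timezone.utc) + timedelta(minutes=5)
    with conn:
        conn.execute("INSERT INTO auth_state (state, expires_at) VALUES (?, ?)",
                     (state, expires.isoformat()))
    query = urlencode({"me": me, "client_id": CLIENT_ID,
                       "redirect_uri": REDIRECT_URI, "state": state})
    return f"{INDIELOGIN_AUTH}?{query}"


def consume_state(conn: sqlite3.Connection, state: str) -> bool:
    """Step 11: accept a state token once, and only before it expires."""
    with conn:
        row = conn.execute("SELECT expires_at FROM auth_state WHERE state = ?",
                           (state,)).fetchone()
        # Delete unconditionally so the same state can never be replayed
        conn.execute("DELETE FROM auth_state WHERE state = ?", (state,))
    if row is None:
        return False
    return datetime.fromisoformat(row[0]) > datetime.now(timezone.utc)
```

`urlencode` percent-encodes the `me` and `redirect_uri` values, which is what turns the parameter list in step 7 into a valid query string.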
Security Architecture
Authentication Security
Session Management
- Token Generation: secrets.token_urlsafe(32) (256-bit entropy)
- Storage: Hash before storing in database
- Cookies: HttpOnly, Secure, SameSite=Lax
- Expiry: 30 days, extendable on use
- Validation: Every protected route checks session
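A sketch of the token-generation and hash-before-storing rules, assuming SHA-256 as the hash (a fast hash is adequate here because the token itself carries 256 bits of entropy). The function names are hypothetical; in practice the session would be found with `WHERE session_token = ?` using the hash, and the cookie would be set with Flask's `response.set_cookie(..., httponly=True, secure=True, samesite="Lax")`.

```python
import hashlib
import secrets
from datetime import datetime, timedelta, timezone
from typing import Tuple

SESSION_LIFETIME = timedelta(days=30)


def hash_token(token: str) -> str:
    """Hex SHA-256 of a token; only this value is stored in the sessions table."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def new_session_token() -> Tuple[str, str, datetime]:
    """Return (cookie_value, stored_hash, expires_at) for a new session."""
    token = secrets.token_urlsafe(32)  # 32 random bytes = 256 bits of entropy
    return token, hash_token(token), datetime.now(timezone.utc) + SESSION_LIFETIME


def token_matches(cookie_value: str, stored_hash: str) -> bool:
    """Hash the presented cookie and compare in constant time."""
    return secrets.compare_digest(hash_token(cookie_value), stored_hash)
```

If the database leaks, the stored hashes cannot be replayed as cookies, which is the point of hashing before storage.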
CSRF Protection
- State Tokens: Random tokens for OAuth flows
- Expiry: 5 minutes (short-lived)
- Single-Use: Deleted after verification
- SameSite: Cookies set to Lax mode
Access Control
- Admin Routes: Require valid session
- Micropub Routes: Require valid bearer token
- Public Routes: No authentication needed
- Identity Verification: Only ADMIN_ME can authenticate
Input Validation
User Input
- Markdown: Sanitize to prevent XSS in rendered HTML
- URLs: Validate format and scheme (https://)
- Slugs: Alphanumeric + hyphens only
- JSON: Parse and validate structure
- File Paths: Prevent directory traversal (validate against base path)
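Two of these checks as a sketch. The slug pattern is slightly stricter than "alphanumeric + hyphens" (lowercase only, no leading, trailing, or doubled hyphens), and the 100-character limit is an assumption; the URL check rejects embedded credentials, which IndieAuth does not allow in profile URLs.

```python
import re
from urllib.parse import urlparse

SLUG_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
MAX_SLUG_LENGTH = 100  # assumption: no limit is specified above


def is_valid_slug(slug: str) -> bool:
    """Lowercase alphanumerics separated by single hyphens, e.g. 'my-first-note'."""
    return len(slug) <= MAX_SLUG_LENGTH and SLUG_RE.fullmatch(slug) is not None


def is_valid_me_url(url: str) -> bool:
    """Accept only absolute https:// URLs with a host and no user:password part."""
    parts = urlparse(url)
    return parts.scheme == "https" and bool(parts.hostname) and parts.username is None
```

Because a valid slug can never contain `/` or `.`, slug validation also closes off the most direct path-traversal route before any file path is built.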
Micropub Payloads
- Content-Type: Verify matches expected format
- Required Fields: Validate h-entry structure
- Size Limits: Prevent DoS via large payloads
- Scope Verification: Check token has required permissions
Database Security
SQL Injection Prevention
- Parameterized Queries: Always use parameter substitution
- No String Interpolation: Never build SQL with f-strings
- Input Sanitization: Validate before database operations
Example:
# GOOD
cursor.execute("SELECT * FROM notes WHERE slug = ?", (slug,))
# BAD (SQL injection vulnerable)
cursor.execute(f"SELECT * FROM notes WHERE slug = '{slug}'")
Data Integrity
- Transactions: Use for multi-step operations
- Constraints: UNIQUE on slugs, file_paths
- Foreign Keys: Enforce relationships (if applicable)
- Content Hashing: Detect unauthorized file modifications
Network Security
HTTPS
- Production Requirement: TLS 1.2+ required
- Reverse Proxy: Nginx/Caddy handles SSL termination
- Certificate Validation: Verify SSL certs on outbound requests
- HSTS: Set Strict-Transport-Security header
Security Headers
# Set on all responses
Content-Security-Policy: default-src 'self'
X-Frame-Options: DENY
X-Content-Type-Options: nosniff
Referrer-Policy: strict-origin-when-cross-origin
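One way to guarantee these headers on every response is a small WSGI middleware, sketched below under the assumption that the app should not override a header a view sets deliberately. With Flask it would be installed as `app.wsgi_app = SecurityHeadersMiddleware(app.wsgi_app)`; an `@app.after_request` hook is an equally valid choice, and the reverse proxy could set them instead.

```python
from typing import Callable, Iterable

SECURITY_HEADERS = [
    ("Content-Security-Policy", "default-src 'self'"),
    ("X-Frame-Options", "DENY"),
    ("X-Content-Type-Options", "nosniff"),
    ("Referrer-Policy", "strict-origin-when-cross-origin"),
]


class SecurityHeadersMiddleware:
    """WSGI middleware that adds the security headers to every response."""

    def __init__(self, app: Callable):
        self.app = app

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        def start_with_headers(status, headers, exc_info=None):
            present = {name.lower() for name, _ in headers}
            # Keep any value the application already set for one of these headers
            extra = [(n, v) for n, v in SECURITY_HEADERS if n.lower() not in present]
            return start_response(status, list(headers) + extra, exc_info)

        return self.app(environ, start_with_headers)
```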
Rate Limiting
- Implementation: Reverse proxy (nginx/Caddy)
- Admin Routes: Stricter limits
- API Routes: Moderate limits
- Public Routes: Permissive limits
File System Security
Atomic Operations
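# Write to a temp file in the same directory, then atomically replace the target
import os, tempfile
fd, temp_path = tempfile.mkstemp(dir=os.path.dirname(target_path), suffix=".tmp")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    f.write(content)
    f.flush()
    os.fsync(f.fileno())  # make sure the data is on disk before the rename
os.replace(temp_path, target_path)  # atomic on POSIX; overwrites an existing file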
Path Validation
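# Prevent directory traversal
import os
base_path = os.path.realpath(DATA_PATH)
requested_path = os.path.realpath(os.path.join(base_path, user_input))
# commonpath avoids the prefix bug where "/data/notes-evil".startswith("/data/notes") is True
if os.path.commonpath([base_path, requested_path]) != base_path:
    raise SecurityError("Path traversal detected")  # SecurityError: application-defined exception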
File Permissions
- Data Directory: 700 (owner only)
- Database File: 600 (owner read/write)
- Note Files: 600 (owner read/write)
- Application User: Dedicated non-root user
Performance Considerations
Response Time Targets
- API Responses: < 100ms (database + file read)
- Page Renders: < 200ms (template rendering)
- RSS Feed: < 300ms (query + file reads + XML generation)
Optimization Strategies
Database
- Indexes: On frequently queried columns (created_at, slug, published)
- Connection Pooling: Not needed; one SQLite connection per worker process (single user, minimal contention)
- Query Optimization: SELECT only needed columns
- Prepared Statements: Reuse compiled queries
File System
- Caching: Consider caching rendered HTML in memory (optional)
- Directory Structure: Year/Month prevents large directories
- Atomic Reads: Fast sequential reads, no locking needed
HTTP
- Static Assets: Cache headers on CSS/JS (1 year)
- RSS Feed: Cache for 5 minutes (Cache-Control)
- Compression: gzip/brotli via reverse proxy
- ETags: For conditional requests
Rendering
- Template Compilation: Jinja2 compiles templates automatically
- Minimal Templating: Simple templates render fast
- Server-Side: No client-side rendering overhead
Resource Usage
Memory
- Flask Process: ~50MB base
- SQLite: ~10MB typical working set
- Total: < 100MB under normal load
Disk
- Application: ~5MB (code + dependencies)
- Database: ~1MB per 1000 notes
- Notes: ~5KB average per markdown file
- Total: Scales linearly with note count
CPU
- Idle: Near zero
- Request Handling: Minimal (no heavy processing)
- Markdown Rendering: Fast (pure Python)
- Database Queries: Indexed, sub-millisecond
Deployment Architecture
Single-Server Deployment
┌─────────────────────────────────────────────────┐
│ Internet │
└────────────────┬────────────────────────────────┘
│
│ Port 443 (HTTPS)
↓
┌─────────────────────────────────────────────────┐
│ Nginx/Caddy (Reverse Proxy) │
│ - SSL/TLS termination │
│ - Static file serving │
│ - Rate limiting │
│ - Compression │
└────────────────┬────────────────────────────────┘
│
│ Port 8000 (HTTP)
↓
┌─────────────────────────────────────────────────┐
│ Gunicorn (WSGI Server) │
│ - 4 worker processes │
│ - Process management │
│ - Load balancing (round-robin) │
└────────────────┬────────────────────────────────┘
│
│ WSGI
↓
┌─────────────────────────────────────────────────┐
│ Flask Application │
│ - Request handling │
│ - Business logic │
│ - Template rendering │
└────────────────┬────────────────────────────────┘
│
↓
┌────────────────────────────┬────────────────────┐
│ File System │ SQLite Database │
│ data/notes/ │ data/starpunk.db │
│ YYYY/MM/slug.md │ │
└────────────────────────────┴────────────────────┘
Process Management (systemd)
[Unit]
Description=StarPunk CMS
After=network.target
[Service]
Type=notify
User=starpunk
WorkingDirectory=/opt/starpunk
Environment="PATH=/opt/starpunk/venv/bin"
ExecStart=/opt/starpunk/venv/bin/gunicorn -w 4 -b 127.0.0.1:8000 app:app
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
Backup Strategy
Automated Daily Backup
#!/bin/bash
# backup.sh - Run daily via cron
DATE=$(date +%Y%m%d)
BACKUP_DIR="/backup/starpunk"
# Backup data directory (notes + database)
rsync -av /opt/starpunk/data/ "$BACKUP_DIR/$DATE/"
# Keep last 30 days
# -mindepth 1 stops find from matching (and deleting) $BACKUP_DIR itself
find "$BACKUP_DIR" -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +
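Copying `starpunk.db` with rsync while the application is writing can capture a half-written database. A sketch of a consistent snapshot using the online backup API in Python's `sqlite3` module (Python 3.7+), intended to run just before the rsync step; `backup_database` and the destination path are assumptions.

```python
import sqlite3


def backup_database(src_path: str, dest_path: str) -> None:
    """Copy a live SQLite database to dest_path as a consistent snapshot."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        with dest:
            # Default pages=-1 copies everything in one step under a read lock,
            # so concurrent writers cannot leave the copy half-updated
            src.backup(dest)
    finally:
        dest.close()
        src.close()
```

For example, `backup_database("/opt/starpunk/data/starpunk.db", "/backup/starpunk/starpunk.db")` could be invoked from the cron job before the notes directory is synced.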
Manual Backup
# Simple copy
cp -r /opt/starpunk/data /backup/starpunk-$(date +%Y%m%d)
# Or with compression
tar -czf starpunk-backup-$(date +%Y%m%d).tar.gz /opt/starpunk/data
Restore Process
- Stop application:
  sudo systemctl stop starpunk
- Restore data directory:
  rsync -av /backup/starpunk/20241118/ /opt/starpunk/data/
- Fix permissions:
  chown -R starpunk:starpunk /opt/starpunk/data
- Start application:
  sudo systemctl start starpunk
- Verify: Visit site, check recent notes
Testing Strategy
Test Pyramid
┌─────────────┐
/ \
/ Manual Tests \ Validation, Real Services
/───────────────── \
/ \
/ Integration Tests \ API Flows, Database + Files
/─────────────────────── \
/ \
/ Unit Tests \ Functions, Logic, Parsing
/───────────────────────────────\
Unit Tests (pytest)
Coverage: Business logic, utilities, models
Examples:
- Slug generation and uniqueness
- Markdown rendering with various inputs
- Content hash calculation
- File path validation
- Token generation and verification
- Date formatting for RSS
- Micropub payload parsing
Integration Tests
Coverage: Component interactions, full flows
Examples:
- Create note: file write + database insert
- Read note: database query + file read
- IndieLogin flow with mocked API
- Micropub creation with token validation
- RSS feed generation with multiple notes
- Session authentication on protected routes
End-to-End Tests
Coverage: Full user workflows
Examples:
- Admin login via IndieLogin (mocked)
- Create note via web interface
- Publish note via Micropub client (mocked)
- View note on public site
- Verify RSS feed includes note
Validation Tests
Coverage: Standards compliance
Tools:
- W3C HTML Validator (validate templates)
- W3C Feed Validator (validate RSS output)
- IndieWebify.me (verify microformats)
- Micropub.rocks (test Micropub compliance)
Manual Tests
Coverage: Real-world usage
Examples:
- Authenticate with real indielogin.com
- Publish from actual Micropub client (Quill, Indigenous)
- Subscribe to feed in actual RSS reader
- Browser compatibility (Chrome, Firefox, Safari, mobile)
- Accessibility with screen reader
Monitoring and Observability
Logging Strategy
Application Logs
# Structured logging (fields passed via extra= only appear in output
# if the log formatter includes them, e.g. a JSON formatter)
import logging
from flask import session
logger = logging.getLogger(__name__)
# Info: Normal operations
logger.info("Note created", extra={
    "slug": slug,
    "published": published,
    "user": session.get("me")  # Flask's session is dict-like
})
# Warning: Recoverable issues
logger.warning("State token expired", extra={
    "state": state,
    "age": age_seconds
})
# Error: Failed operations
logger.error("File write failed", extra={
    "path": file_path,
    "error": str(e)
})
Log Levels
- DEBUG: Development only (verbose)
- INFO: Normal operations (note creation, auth success)
- WARNING: Unusual but handled (expired tokens, invalid input)
- ERROR: Failed operations (file I/O errors, database errors)
- CRITICAL: System failures (database unreachable)
Log Destinations
- Development: Console (stdout)
- Production: File rotation (logrotate) + optional syslog
Metrics (Optional for V2)
Simple Metrics (if desired):
- Note count (query database)
- Request count (nginx logs)
- Error rate (grep application logs)
- Response times (nginx logs)
Advanced Metrics (V2):
- Prometheus exporter
- Grafana dashboard
- Alert on error rate spike
Health Checks
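import os

@app.route('/health')
def health_check():
    """Simple health check for monitoring (app, db and DATA_PATH come from the application)"""
    try:
        # Check database
        db.execute("SELECT 1").fetchone()
        # Check file system: fail if the data directory is missing
        if not os.path.isdir(DATA_PATH):
            raise RuntimeError(f"data directory missing: {DATA_PATH}")
        return {"status": "ok"}, 200
    except Exception as e:
        return {"status": "error", "detail": str(e)}, 500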
Migration and Evolution
V1 to V2 Migration
Database Schema Changes
-- Add new column with default
ALTER TABLE notes ADD COLUMN tags TEXT DEFAULT '';
-- Create new table
CREATE TABLE tags (
id INTEGER PRIMARY KEY,
name TEXT UNIQUE NOT NULL
);
-- Migration script updates existing notes
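A migration script along these lines could track the applied schema version in SQLite's built-in `PRAGMA user_version`, so each statement runs once. The sketch reuses the two V2 examples above; `MIGRATIONS` and `migrate` are hypothetical names, and each step runs in an explicit transaction so that a failed statement leaves the version number unchanged.

```python
import sqlite3

# Ordered migrations: position N (1-based) produces schema version N
MIGRATIONS = [
    "ALTER TABLE notes ADD COLUMN tags TEXT DEFAULT ''",
    "CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL)",
]


def migrate(conn: sqlite3.Connection) -> int:
    """Apply pending migrations and return the resulting schema version."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version, statement in enumerate(MIGRATIONS, start=1):
        if version <= current:
            continue  # already applied
        conn.execute("BEGIN")  # SQLite DDL is transactional
        try:
            conn.execute(statement)
            # PRAGMA does not accept bound parameters; version is an int we control
            conn.execute(f"PRAGMA user_version = {version}")
            conn.execute("COMMIT")
        except sqlite3.Error:
            conn.execute("ROLLBACK")
            raise
    return conn.execute("PRAGMA user_version").fetchone()[0]
```

Running `migrate` at startup is idempotent: a database already at the latest version is left untouched.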
File Format Evolution
V1: Pure markdown
V2 (if needed): Add optional frontmatter
---
tags: indieweb, cms
---
Note content here
Backward Compatibility: Parser checks for frontmatter, falls back to pure markdown.
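The fallback rule can be sketched as: treat a file as having frontmatter only if its first line is `---` and a closing `---` follows; otherwise return the whole file as content. Only simple `key: value` lines are parsed here (a real implementation might use a YAML parser), and `split_frontmatter` is a hypothetical name. One caveat: a V1 note that happens to begin with a `---` horizontal rule and contain a later `---` line would be misread, so V2 files may need a stricter marker.

```python
from typing import Dict, Tuple


def split_frontmatter(text: str) -> Tuple[Dict[str, str], str]:
    """Return (metadata, body); files without frontmatter are pure markdown."""
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return {}, text  # V1 file: no frontmatter
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            meta = {}
            for line in lines[1:i]:
                key, sep, value = line.partition(":")
                if sep:
                    meta[key.strip()] = value.strip()
            return meta, "".join(lines[i + 1:])
    return {}, text  # no closing delimiter: treat the whole file as content
```

For the example above, the result is `({"tags": "indieweb, cms"}, "Note content here\n")`.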
API Versioning
# V1 (current)
GET /api/notes
# V2 (future)
GET /api/v2/notes # New features
GET /api/notes # Still works, returns V1 response
Data Export/Import
Export Formats
- Markdown Bundle: Zip of all notes (already portable)
- JSON Export: Notes + metadata
  {
    "version": "1.0",
    "exported_at": "2024-11-18T12:00:00Z",
    "notes": [
      {
        "slug": "my-note",
        "content": "Note content...",
        "created_at": "2024-11-01T12:00:00Z",
        "published": true
      }
    ]
  }
- RSS Archive: Existing feed.xml
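Producing the JSON export format is a join of database metadata with file content. A sketch, assuming the `notes` columns slug, file_path (relative to the data directory), created_at, and published; `export_json` is a hypothetical name, and filtering out notes marked deleted would depend on how the schema records deletion.

```python
import json
import os
import sqlite3
from datetime import datetime, timezone


def export_json(conn: sqlite3.Connection, data_dir: str) -> str:
    """Serialize every note (content + metadata) into the export format above."""
    rows = conn.execute(
        "SELECT slug, file_path, created_at, published FROM notes ORDER BY created_at"
    ).fetchall()
    notes = []
    for slug, file_path, created_at, published in rows:
        with open(os.path.join(data_dir, file_path), encoding="utf-8") as f:
            notes.append({"slug": slug, "content": f.read(),
                          "created_at": created_at, "published": bool(published)})
    exported_at = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return json.dumps({"version": "1.0", "exported_at": exported_at, "notes": notes},
                      indent=2, ensure_ascii=False)
```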
Import (V2)
- From JSON export
- From WordPress XML
- From markdown directory
- From other IndieWeb CMSs
Success Metrics
The architecture is successful if it enables:
- Fast Development: < 1 week to implement V1
- Easy Deployment: < 5 minutes to get running
- Low Maintenance: Runs for months without intervention
- High Performance: All responses < 300ms
- Data Ownership: User has direct access to all content
- Standards Compliance: Passes all validators
- Extensibility: Can add V2 features without rewrite
References
Internal Documentation
- Technology Stack
- ADR-001: Python Web Framework
- ADR-002: Flask Extensions
- ADR-003: Frontend Technology
- ADR-004: File-Based Storage
- ADR-005: IndieLogin Authentication