StarPunk/docs/architecture/overview.md
2025-11-18 19:21:31 -07:00

StarPunk Architecture Overview

Executive Summary

StarPunk is a minimal, single-user IndieWeb CMS designed around the principle: "Every line of code must justify its existence." The architecture prioritizes simplicity, standards compliance, and user data ownership through careful technology selection and hybrid data storage.

Core Architecture: API-first Flask application with hybrid file+database storage, server-side rendering, and delegated authentication.

System Architecture

High-Level Components

┌─────────────────────────────────────────────────────────────┐
│                         User Browser                         │
└───────────────┬─────────────────────────────────────────────┘
                │
                │ HTTP/HTTPS
                ↓
┌─────────────────────────────────────────────────────────────┐
│                      Flask Application                       │
│  ┌─────────────────────────────────────────────────────────┤
│  │ Web Interface (Jinja2 Templates)                         │
│  │  - Public: Homepage, Note Permalinks                     │
│  │  - Admin: Dashboard, Note Editor                         │
│  └──────────────────────────────┬──────────────────────────┘
│  ┌──────────────────────────────┴──────────────────────────┐
│  │ API Layer (RESTful + Micropub)                           │
│  │  - Notes CRUD API                                        │
│  │  - Micropub Endpoint                                     │
│  │  - RSS Feed Generator                                    │
│  │  - Authentication Handlers                               │
│  └──────────────────────────────┬──────────────────────────┘
│  ┌──────────────────────────────┴──────────────────────────┐
│  │ Business Logic                                           │
│  │  - Note Management (create, read, update, delete)        │
│  │  - File/Database Sync                                    │
│  │  - Markdown Rendering                                    │
│  │  - Slug Generation                                       │
│  │  - Session Management                                    │
│  └──────────────────────────────┬──────────────────────────┘
│  ┌──────────────────────────────┴──────────────────────────┐
│  │ Data Layer                                               │
│  │  ┌──────────────────┐    ┌─────────────────────────┐   │
│  │  │ File Storage     │    │ SQLite Database         │   │
│  │  │                  │    │                         │   │
│  │  │ Markdown Files   │    │ - Note Metadata         │   │
│  │  │ (Pure Content)   │    │ - Sessions              │   │
│  │  │                  │    │ - Tokens                │   │
│  │  │ data/notes/      │    │ - Auth State            │   │
│  │  │   YYYY/MM/       │    │                         │   │
│  │  │     slug.md      │    │ data/starpunk.db        │   │
│  │  └──────────────────┘    └─────────────────────────┘   │
│  └─────────────────────────────────────────────────────────┘
└─────────────────────────────────────────────────────────────┘
                │
                │ HTTPS
                ↓
┌─────────────────────────────────────────────────────────────┐
│               External Services                              │
│  - IndieLogin.com (Authentication)                           │
│  - User's Website (Identity Verification)                    │
│  - Micropub Clients (Publishing)                             │
└─────────────────────────────────────────────────────────────┘

Core Principles

1. Radical Simplicity

  • Total dependencies: 6 direct packages
  • No build tools, no npm, no bundlers
  • Server-side rendering eliminates frontend complexity
  • Single file SQLite database
  • Zero configuration frameworks

2. Hybrid Data Architecture

Files for Content: Markdown notes stored as plain text files

  • Maximum portability
  • Human-readable
  • Direct user access
  • Easy backup (copy, rsync, git)

Database for Metadata: SQLite stores structured data

  • Fast queries and indexes
  • Referential integrity
  • Efficient filtering and sorting
  • Transaction support

Sync Strategy: Files are authoritative for content; database is authoritative for metadata. Both must stay in sync.

3. Standards-First Design

  • IndieWeb: Microformats2, IndieAuth, Micropub
  • Web: HTML5, RSS 2.0, HTTP standards
  • Security: OAuth 2.0, HTTPS, secure cookies
  • Data: CommonMark markdown

4. API-First Architecture

All functionality is exposed via the API, and the web interface consumes that same API. This enables:

  • Micropub client support
  • Future client applications
  • Scriptable automation
  • Clean separation of concerns

5. Progressive Enhancement

  • Core functionality works without JavaScript
  • JavaScript adds optional enhancements (markdown preview)
  • Server-side rendering for fast initial loads
  • Mobile-responsive from the start

Component Descriptions

Web Layer

Public Interface

Purpose: Display published notes to the world

Technology: Server-side rendered HTML (Jinja2)

Routes:

  • / - Homepage with recent notes
  • /note/{slug} - Individual note permalink
  • /feed.xml - RSS feed

Features:

  • Microformats2 markup (h-entry, h-card)
  • Reverse chronological note list
  • Clean, minimal design
  • Mobile-responsive
  • No JavaScript required

Admin Interface

Purpose: Manage notes (create, edit, publish)

Technology: Server-side rendered HTML (Jinja2) + optional vanilla JS

Routes:

  • /admin/login - Authentication
  • /admin - Dashboard (list of all notes)
  • /admin/new - Create new note
  • /admin/edit/{id} - Edit existing note

Features:

  • Markdown editor
  • Optional real-time preview (JS enhancement)
  • Publish/draft toggle
  • Protected by session authentication

API Layer

Notes API

Purpose: CRUD operations for notes

Authentication: Session-based (admin interface)

Routes:

GET    /api/notes           List published notes
POST   /api/notes           Create new note
GET    /api/notes/{id}      Get single note
PUT    /api/notes/{id}      Update note
DELETE /api/notes/{id}      Delete note

Response Format: JSON

Micropub Endpoint

Purpose: Accept posts from external Micropub clients

Authentication: IndieAuth bearer tokens

Routes:

POST /api/micropub          Create note (h-entry)
GET  /api/micropub?q=config Query configuration
GET  /api/micropub?q=source Query note source

Content Types:

  • application/json
  • application/x-www-form-urlencoded

Compliance: Full Micropub specification

RSS Feed

Purpose: Syndicate published notes

Technology: feedgen library

Route: /feed.xml

Format: Valid RSS 2.0 XML

Caching: 5 minutes

Features:

  • All published notes
  • RFC-822 date formatting
  • CDATA-wrapped HTML content
  • Proper GUID for each item
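The RFC-822 date formatting listed above can be produced with the standard library alone. The following is a minimal sketch, assuming timestamps are handled as UTC datetimes; the helper name `rss_date` is hypothetical, and the feed library may well handle this step itself.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def rss_date(moment: datetime) -> str:
    """Format a datetime as an RFC-822 style date for an RSS <pubDate>."""
    if moment.tzinfo is None:
        # Assumption: naive timestamps are stored in UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    # usegmt=True emits "GMT" instead of "+0000"; it requires a UTC datetime.
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


print(rss_date(datetime(2024, 11, 18, 12, 0, tzinfo=timezone.utc)))
# Mon, 18 Nov 2024 12:00:00 GMT
```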

Business Logic Layer

Note Management

Operations:

  1. Create: Generate slug → write file → insert database record
  2. Read: Query database for path → read file → render markdown
  3. Update: Write file atomically → update database timestamp
  4. Delete: Mark deleted in database → optionally archive file

Key Components:

  • Slug generation (URL-safe, unique)
  • Markdown rendering (markdown library)
  • Content hashing (integrity verification)
  • Atomic file operations (prevent corruption)
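Slug generation and content hashing might look roughly like the sketch below. The function names, the five-word limit, the numeric suffix scheme, and the "note" fallback are all assumptions made for illustration; the document only requires slugs to be URL-safe and unique and hashes to support integrity checks.

```python
import hashlib
import re
import unicodedata


def slugify(text: str, max_words: int = 5) -> str:
    """Build a URL-safe slug (lowercase alphanumerics and hyphens)."""
    # Fold accented characters to ASCII and drop anything that cannot be mapped.
    ascii_text = (
        unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    )
    words = re.findall(r"[a-z0-9]+", ascii_text.lower())
    return "-".join(words[:max_words])


def unique_slug(base: str, existing: set[str]) -> str:
    """Append -2, -3, ... until the slug is not already taken."""
    if not base:
        base = "note"  # Assumption: fallback when the content has no usable words
    candidate, counter = base, 2
    while candidate in existing:
        candidate = f"{base}-{counter}"
        counter += 1
    return candidate


def content_hash(content: str) -> str:
    """SHA-256 hex digest of the UTF-8 encoded note content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()
```

In a real implementation the `existing` set would be replaced by a lookup against the notes table, with the UNIQUE constraint on the slug column as the final safeguard.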

File/Database Sync

Strategy: Write files first, then database

Rollback: If database operation fails, delete/restore file

Verification: Content hash detects external modifications

Integrity Check: Optional scan for orphaned files/records
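The file-first ordering with rollback can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the function name `create_note`, its parameters, and the reduced set of columns are hypothetical, and the file path is stored relative to the data directory.

```python
import os
import sqlite3
from datetime import datetime, timezone


def create_note(db: sqlite3.Connection, data_dir: str, slug: str, content: str) -> str:
    """Write the note file first, then insert metadata; undo the file on failure."""
    now = datetime.now(timezone.utc)
    rel_path = os.path.join("notes", f"{now:%Y}", f"{now:%m}", f"{slug}.md")
    abs_path = os.path.join(data_dir, rel_path)
    os.makedirs(os.path.dirname(abs_path), exist_ok=True)

    # Mode "x" refuses to overwrite an existing note with the same slug.
    with open(abs_path, "x", encoding="utf-8") as handle:
        handle.write(content)
    try:
        with db:  # commits on success, rolls back on an exception
            db.execute(
                "INSERT INTO notes (slug, file_path, created_at) VALUES (?, ?, ?)",
                (slug, rel_path, now.isoformat()),
            )
    except sqlite3.Error:
        os.remove(abs_path)  # rollback: leave no orphaned file behind
        raise
    return rel_path
```

For updates, the same idea applies with a restore step in place of the delete: keep the previous file contents until the database update has committed.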

Authentication

Admin Auth: IndieLogin.com OAuth 2.0 flow

  • User enters website URL
  • Redirect to indielogin.com
  • Verify identity via RelMeAuth or email
  • Return verified "me" URL
  • Create session token
  • Store in HttpOnly cookie

Micropub Auth: IndieAuth token verification

  • Client obtains token via IndieAuth flow
  • Token sent as Bearer in Authorization header
  • Verify the token exists and has not expired
  • Check scope permissions
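The header parsing and scope check in the Micropub steps above could be handled by two small helpers. This is a sketch only: the helper names are hypothetical, and it assumes scopes are stored as a space-separated string such as "create update".

```python
from typing import Optional


def bearer_token(authorization: Optional[str]) -> Optional[str]:
    """Extract the token from an 'Authorization: Bearer <token>' header value."""
    if not authorization:
        return None
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return None
    return token.strip()


def has_scope(granted: str, required: str) -> bool:
    """Check a required scope against a space-separated list of granted scopes."""
    return required in granted.split()
```

A request with a missing or malformed header yields `None`, which the endpoint can map directly to a 401 response before touching the database.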

Data Layer

File Storage

Location: data/notes/

Structure: YYYY/MM/slug.md

Format: Pure markdown, no frontmatter

Operations:

  • Atomic writes (temp file → rename)
  • Directory creation (makedirs)
  • Content reading (UTF-8 encoding)

Example:

data/notes/
├── 2024/
│   ├── 11/
│   │   ├── my-first-note.md
│   │   └── another-note.md
│   └── 12/
│       └── december-note.md

Database Storage

Location: data/starpunk.db

Engine: SQLite3

Tables:

  • notes - Metadata (slug, file_path, published, timestamps, hash)
  • sessions - Auth sessions (token, me, expiry)
  • tokens - Micropub tokens (token, me, client_id, scope, expiry)
  • auth_state - CSRF tokens (state, expiry)

Indexes:

  • notes.created_at (DESC) - Fast chronological queries
  • notes.published - Fast filtering
  • notes.slug - Fast lookup by slug
  • sessions.session_token - Fast auth checks

Queries: Direct SQL using Python sqlite3 module (no ORM)
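A schema along these lines would cover the tables and indexes listed above using only the sqlite3 module. Column names follow the lists in this section; the column types, constraints, index names, and the helpers `init_db` and `recent_published` are assumptions made for this sketch.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS notes (
    id           INTEGER PRIMARY KEY,
    slug         TEXT NOT NULL UNIQUE,   -- UNIQUE also provides the slug index
    file_path    TEXT NOT NULL UNIQUE,
    published    INTEGER NOT NULL DEFAULT 0,
    created_at   TEXT NOT NULL,          -- assumed ISO 8601, UTC
    updated_at   TEXT NOT NULL,
    content_hash TEXT
);
CREATE TABLE IF NOT EXISTS sessions (
    session_token TEXT PRIMARY KEY,      -- primary key doubles as the lookup index
    me            TEXT NOT NULL,
    expires_at    TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS tokens (
    token      TEXT PRIMARY KEY,
    me         TEXT NOT NULL,
    client_id  TEXT NOT NULL,
    scope      TEXT NOT NULL,
    expires_at TEXT
);
CREATE TABLE IF NOT EXISTS auth_state (
    state      TEXT PRIMARY KEY,
    expires_at TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_notes_created_at ON notes (created_at DESC);
CREATE INDEX IF NOT EXISTS idx_notes_published ON notes (published);
"""


def init_db(path: str) -> sqlite3.Connection:
    """Open the database and create any missing tables and indexes."""
    db = sqlite3.connect(path)
    db.row_factory = sqlite3.Row  # allows row["slug"] style access
    db.executescript(SCHEMA)
    return db


def recent_published(db: sqlite3.Connection, limit: int = 20) -> list:
    """Newest published notes first, selecting only the columns needed."""
    return db.execute(
        "SELECT slug, file_path, created_at FROM notes "
        "WHERE published = 1 ORDER BY created_at DESC LIMIT ?",
        (limit,),
    ).fetchall()
```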

Data Flow Examples

Creating a Note (via Admin Interface)

1. User fills out form at /admin/new
   ↓
2. POST to /api/notes with markdown content
   ↓
3. Verify user session (check session cookie)
   ↓
4. Generate unique slug from content or timestamp
   ↓
5. Determine file path: data/notes/2024/11/slug.md
   ↓
6. Create directories if needed (makedirs)
   ↓
7. Write markdown content to file (atomic write)
   ↓
8. Calculate SHA-256 hash of content
   ↓
9. Begin database transaction
   ↓
10. Insert record into notes table:
    - slug
    - file_path
    - published (from form)
    - created_at (now)
    - updated_at (now)
    - content_hash
   ↓
11. If database insert fails:
    - Delete file
    - Return error to user
   ↓
12. If database insert succeeds:
    - Commit transaction
    - Return success with note URL
   ↓
13. Redirect user to /admin (dashboard)

Reading a Note (via Public Interface)

1. User visits /note/my-first-note
   ↓
2. Extract slug from URL
   ↓
3. Query database:
    SELECT file_path, created_at, published
    FROM notes
    WHERE slug = 'my-first-note' AND published = 1
   ↓
4. If not found → 404 error
   ↓
5. Read markdown content from file:
    - Open data/notes/2024/11/my-first-note.md
    - Read UTF-8 content
   ↓
6. Render markdown to HTML (markdown.markdown())
   ↓
7. Render Jinja2 template with:
    - content_html (rendered HTML)
    - created_at (timestamp)
    - slug (for permalink)
   ↓
8. Return HTML with microformats markup

Publishing via Micropub

1. Micropub client POSTs to /api/micropub
   Headers: Authorization: Bearer {token}
   Body: {"type": ["h-entry"], "properties": {"content": ["..."]}}
   ↓
2. Extract bearer token from Authorization header
   ↓
3. Query database:
    SELECT me, scope FROM tokens
    WHERE token = :token AND expires_at > :now
   ↓
4. If token invalid → 401 Unauthorized
   ↓
5. Parse Micropub JSON payload
   ↓
6. Extract content from properties.content[0]
   ↓
7. Create note (same flow as admin interface):
    - Generate slug
    - Write file
    - Insert database record
   ↓
8. If successful:
    - Return 201 Created
    - Set Location header to note URL
   ↓
9. Client receives note URL, displays success
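Steps 5 and 6 of this flow, extended to the form-encoded content type the endpoint also accepts, might be sketched like this. The function name `extract_content` is hypothetical, and the sketch deliberately handles only plain-string content for h-entry posts; it is not a full Micropub parser.

```python
import json
from urllib.parse import parse_qs


def extract_content(body: bytes, content_type: str) -> str:
    """Return the note content from a Micropub create request body."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/json":
        payload = json.loads(body)
        if not isinstance(payload, dict) or payload.get("type") != ["h-entry"]:
            raise ValueError("only h-entry posts are supported")
        properties = payload.get("properties")
        values = properties.get("content", []) if isinstance(properties, dict) else []
    elif media_type == "application/x-www-form-urlencoded":
        form = parse_qs(body.decode("utf-8"))
        if form.get("h") != ["entry"]:
            raise ValueError("only h=entry posts are supported")
        values = form.get("content", [])
    else:
        raise ValueError(f"unsupported content type: {media_type}")
    # Structured content (for example an object with an "html" key) is rejected here.
    if not values or not isinstance(values[0], str) or not values[0].strip():
        raise ValueError("missing content")
    return values[0]
```

Each `ValueError` would typically be translated into a 400 response by the endpoint.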

IndieLogin Authentication Flow

1. User visits /admin/login
   ↓
2. User enters their website: https://alice.example.com
   ↓
3. POST to /admin/login with "me" parameter
   ↓
4. Validate URL format
   ↓
5. Generate random state token (CSRF protection)
   ↓
6. Store state in database with 5-minute expiry
   ↓
7. Build IndieLogin authorization URL:
    https://indielogin.com/auth?
      me=https://alice.example.com
      &client_id=https://starpunk.example.com
      &redirect_uri=https://starpunk.example.com/auth/callback
      &state={random_state}
   ↓
8. Redirect user to IndieLogin
   ↓
9. IndieLogin verifies user's identity:
    - Checks rel="me" links on alice.example.com
    - Or sends email verification
    - User authenticates via chosen method
   ↓
10. IndieLogin redirects back:
    /auth/callback?code={auth_code}&state={state}
   ↓
11. Verify state matches stored value (CSRF check)
   ↓
12. Exchange code for verified identity:
    POST https://indielogin.com/auth
      code={auth_code}
      client_id=https://starpunk.example.com
      redirect_uri=https://starpunk.example.com/auth/callback
   ↓
13. IndieLogin returns: {"me": "https://alice.example.com"}
   ↓
14. Verify me == ADMIN_ME (config)
   ↓
15. If match:
    - Generate session token
    - Insert into sessions table
    - Set HttpOnly, Secure cookie
    - Redirect to /admin
   ↓
16. If no match:
    - Return "Unauthorized" error
    - Log attempt
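Steps 5 and 7 of this flow (generating the state token and building the authorization URL) can be sketched with the standard library. The function name `build_login_url` is hypothetical; the endpoint and parameter names are taken from the flow above. Using `urlencode` ensures the URL-valued parameters are percent-encoded.

```python
import secrets
from urllib.parse import urlencode

INDIELOGIN_AUTH_URL = "https://indielogin.com/auth"


def build_login_url(me: str, client_id: str, redirect_uri: str) -> tuple[str, str]:
    """Return (authorization_url, state); the caller stores the state server-side."""
    state = secrets.token_urlsafe(32)
    query = urlencode(
        {
            "me": me,
            "client_id": client_id,
            "redirect_uri": redirect_uri,
            "state": state,
        }
    )
    return f"{INDIELOGIN_AUTH_URL}?{query}", state
```

The returned state must be persisted (with its 5-minute expiry) before redirecting, so the callback in step 11 has something to compare against.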

Security Architecture

Authentication Security

Session Management

  • Token Generation: secrets.token_urlsafe(32) (256-bit entropy)
  • Storage: Hash before storing in database
  • Cookies: HttpOnly, Secure, SameSite=Lax
  • Expiry: 30 days, extendable on use
  • Validation: Every protected route checks session
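The token handling described above could be sketched as follows. The function names are hypothetical, and SHA-256 is an assumed choice of hash; an unsalted fast hash is generally considered acceptable here because the token itself carries 256 bits of randomness, unlike a password.

```python
import hashlib
import hmac
import secrets
from datetime import datetime, timedelta, timezone

SESSION_LIFETIME = timedelta(days=30)


def hash_token(token: str) -> str:
    """Hash a session token so the database never holds the raw value."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def new_session(now=None):
    """Return (cookie_token, stored_hash, expires_at) for a fresh session."""
    now = now or datetime.now(timezone.utc)
    token = secrets.token_urlsafe(32)  # 32 random bytes, 43 URL-safe characters
    return token, hash_token(token), now + SESSION_LIFETIME


def session_is_valid(presented_token, stored_hash, expires_at, now=None) -> bool:
    """Compare hashes in constant time and reject expired sessions."""
    now = now or datetime.now(timezone.utc)
    matches = hmac.compare_digest(hash_token(presented_token), stored_hash)
    return matches and now < expires_at
```

Only `cookie_token` goes into the HttpOnly cookie; `stored_hash` and `expires_at` go into the sessions table.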

CSRF Protection

  • State Tokens: Random tokens for OAuth flows
  • Expiry: 5 minutes (short-lived)
  • Single-Use: Deleted after verification
  • SameSite: Cookies set to Lax mode

Access Control

  • Admin Routes: Require valid session
  • Micropub Routes: Require valid bearer token
  • Public Routes: No authentication needed
  • Identity Verification: Only ADMIN_ME can authenticate

Input Validation

User Input

  • Markdown: Sanitize to prevent XSS in rendered HTML
  • URLs: Validate format and scheme (https://)
  • Slugs: Alphanumeric + hyphens only
  • JSON: Parse and validate structure
  • File Paths: Prevent directory traversal (validate against base path)
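The slug and URL rules above translate into small validators. In this sketch the function names are hypothetical, and the slug pattern is slightly stricter than "alphanumeric + hyphens" by assumption: lowercase only, with no leading, trailing, or doubled hyphens.

```python
import re
from urllib.parse import urlsplit

SLUG_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_slug(slug: str) -> bool:
    """Accept only lowercase alphanumeric words joined by single hyphens."""
    return SLUG_RE.fullmatch(slug) is not None


def is_valid_me_url(url: str) -> bool:
    """Accept only https:// URLs that include a hostname."""
    try:
        parts = urlsplit(url)
    except ValueError:  # raised for some malformed inputs, e.g. a bad IPv6 literal
        return False
    return parts.scheme == "https" and bool(parts.hostname)
```

Because a valid slug can never contain a dot or a slash, slug validation also serves as a first line of defense against directory traversal before the path check is applied.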

Micropub Payloads

  • Content-Type: Verify matches expected format
  • Required Fields: Validate h-entry structure
  • Size Limits: Prevent DoS via large payloads
  • Scope Verification: Check token has required permissions

Database Security

SQL Injection Prevention

  • Parameterized Queries: Always use parameter substitution
  • No String Interpolation: Never build SQL with f-strings
  • Input Sanitization: Validate before database operations

Example:

# GOOD
cursor.execute("SELECT * FROM notes WHERE slug = ?", (slug,))

# BAD (SQL injection vulnerable)
cursor.execute(f"SELECT * FROM notes WHERE slug = '{slug}'")

Data Integrity

  • Transactions: Use for multi-step operations
  • Constraints: UNIQUE on slugs, file_paths
  • Foreign Keys: Enforce relationships (if applicable)
  • Content Hashing: Detect unauthorized file modifications
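An integrity scan that uses the stored hashes to find modified, orphaned, and missing files could look like the sketch below. The function name `integrity_report` and the result keys are hypothetical; it assumes `file_path` is stored relative to the data directory and `content_hash` is a SHA-256 hex digest of the file bytes.

```python
import hashlib
import os
import sqlite3


def integrity_report(db: sqlite3.Connection, data_dir: str) -> dict:
    """Compare rows in the notes table against markdown files on disk."""
    recorded = {
        row[0]: row[1]
        for row in db.execute("SELECT file_path, content_hash FROM notes")
    }
    on_disk = set()
    for root, _dirs, files in os.walk(os.path.join(data_dir, "notes")):
        for name in files:
            if name.endswith(".md"):
                on_disk.add(os.path.relpath(os.path.join(root, name), data_dir))

    modified = []
    for rel_path in sorted(on_disk & recorded.keys()):
        with open(os.path.join(data_dir, rel_path), "rb") as handle:
            if hashlib.sha256(handle.read()).hexdigest() != recorded[rel_path]:
                modified.append(rel_path)

    return {
        "orphaned_files": sorted(on_disk - recorded.keys()),   # file without a row
        "missing_files": sorted(recorded.keys() - on_disk),    # row without a file
        "modified_files": modified,                            # hash mismatch
    }
```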

Network Security

HTTPS

  • Production Requirement: TLS 1.2+ required
  • Reverse Proxy: Nginx/Caddy handles SSL termination
  • Certificate Validation: Verify SSL certs on outbound requests
  • HSTS: Set Strict-Transport-Security header

Security Headers

# Set on all responses
Content-Security-Policy: default-src 'self'
X-Frame-Options: DENY
X-Content-Type-Options: nosniff
Referrer-Policy: strict-origin-when-cross-origin
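One way to set these headers on every response is a small WSGI middleware, sketched below with no framework dependency. The class name is hypothetical; the header values are the ones listed above. The middleware leaves alone any header the application has already set.

```python
SECURITY_HEADERS = [
    ("Content-Security-Policy", "default-src 'self'"),
    ("X-Frame-Options", "DENY"),
    ("X-Content-Type-Options", "nosniff"),
    ("Referrer-Policy", "strict-origin-when-cross-origin"),
]


class SecurityHeadersMiddleware:
    """Wrap a WSGI app and append security headers to every response."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            # Header names are case-insensitive, so compare in lowercase.
            present = {name.lower() for name, _value in headers}
            extra = [
                (name, value)
                for name, value in SECURITY_HEADERS
                if name.lower() not in present
            ]
            return start_response(status, list(headers) + extra, exc_info)

        return self.app(environ, patched_start_response)
```

With Flask, the middleware would be attached as `app.wsgi_app = SecurityHeadersMiddleware(app.wsgi_app)`; setting the same headers at the reverse proxy is an equally valid alternative.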

Rate Limiting

  • Implementation: Reverse proxy (nginx/Caddy)
  • Admin Routes: Stricter limits
  • API Routes: Moderate limits
  • Public Routes: Permissive limits

File System Security

Atomic Operations

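import os

# Write to a temp file, flush it to disk, then atomically swap it into place
temp_path = f"{target_path}.tmp"
with open(temp_path, "w", encoding="utf-8") as f:
    f.write(content)
    f.flush()
    os.fsync(f.fileno())  # Make sure the data is on disk before the rename
os.replace(temp_path, target_path)  # Atomic on POSIX; also overwrites on Windows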

Path Validation

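import os

# Prevent directory traversal (realpath also resolves symlinks)
# SecurityError is assumed to be an application-defined exception
base_path = os.path.realpath(DATA_PATH)
requested_path = os.path.realpath(os.path.join(base_path, user_input))
# commonpath avoids the prefix pitfall: "/data-evil".startswith("/data") is True
if os.path.commonpath([base_path, requested_path]) != base_path:
    raise SecurityError("Path traversal detected")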

File Permissions

  • Data Directory: 700 (owner only)
  • Database File: 600 (owner read/write)
  • Note Files: 600 (owner read/write)
  • Application User: Dedicated non-root user

Performance Considerations

Response Time Targets

  • API Responses: < 100ms (database + file read)
  • Page Renders: < 200ms (template rendering)
  • RSS Feed: < 300ms (query + file reads + XML generation)

Optimization Strategies

Database

  • Indexes: On frequently queried columns (created_at, slug, published)
  • Connection Pooling: None needed; one connection per worker process (single-user, minimal contention)
  • Query Optimization: SELECT only needed columns
  • Prepared Statements: Reuse compiled queries

File System

  • Caching: Consider caching rendered HTML in memory (optional)
  • Directory Structure: Year/Month prevents large directories
  • Atomic Reads: Fast sequential reads, no locking needed

HTTP

  • Static Assets: Cache headers on CSS/JS (1 year)
  • RSS Feed: Cache for 5 minutes (Cache-Control)
  • Compression: gzip/brotli via reverse proxy
  • ETags: For conditional requests

Rendering

  • Template Compilation: Jinja2 compiles templates automatically
  • Minimal Templating: Simple templates render fast
  • Server-Side: No client-side rendering overhead

Resource Usage

Memory

  • Flask Process: ~50MB base
  • SQLite: ~10MB typical working set
  • Total: < 100MB under normal load

Disk

  • Application: ~5MB (code + dependencies)
  • Database: ~1MB per 1000 notes
  • Notes: ~5KB average per markdown file
  • Total: Scales linearly with note count

CPU

  • Idle: Near zero
  • Request Handling: Minimal (no heavy processing)
  • Markdown Rendering: Fast (pure Python)
  • Database Queries: Indexed, sub-millisecond

Deployment Architecture

Single-Server Deployment

┌─────────────────────────────────────────────────┐
│ Internet                                        │
└────────────────┬────────────────────────────────┘
                 │
                 │ Port 443 (HTTPS)
                 ↓
┌─────────────────────────────────────────────────┐
│ Nginx/Caddy (Reverse Proxy)                     │
│  - SSL/TLS termination                          │
│  - Static file serving                          │
│  - Rate limiting                                │
│  - Compression                                  │
└────────────────┬────────────────────────────────┘
                 │
                 │ Port 8000 (HTTP)
                 ↓
┌─────────────────────────────────────────────────┐
│ Gunicorn (WSGI Server)                          │
│  - 4 worker processes                           │
│  - Process management                           │
│  - Load balancing across workers                │
└────────────────┬────────────────────────────────┘
                 │
                 │ WSGI
                 ↓
┌─────────────────────────────────────────────────┐
│ Flask Application                               │
│  - Request handling                             │
│  - Business logic                               │
│  - Template rendering                           │
└────────────────┬────────────────────────────────┘
                 │
                 ↓
┌────────────────────────────┬────────────────────┐
│ File System                │ SQLite Database    │
│  data/notes/               │  data/starpunk.db  │
│    YYYY/MM/slug.md         │                    │
└────────────────────────────┴────────────────────┘

Process Management (systemd)

[Unit]
Description=StarPunk CMS
After=network.target

[Service]
Type=notify
User=starpunk
WorkingDirectory=/opt/starpunk
Environment="PATH=/opt/starpunk/venv/bin"
ExecStart=/opt/starpunk/venv/bin/gunicorn -w 4 -b 127.0.0.1:8000 app:app
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

Backup Strategy

Automated Daily Backup

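#!/bin/bash
# backup.sh - Run daily via cron
set -euo pipefail

DATE=$(date +%Y%m%d)
BACKUP_DIR="/backup/starpunk"

# Backup data directory (notes + database)
# Note: copying a live SQLite file can capture a mid-write state; stop the
# service first if a strictly consistent copy is required
mkdir -p "$BACKUP_DIR/$DATE"
rsync -av /opt/starpunk/data/ "$BACKUP_DIR/$DATE/"

# rsync -a preserves the source directory's mtime; reset it so the retention
# check below reflects when the backup was taken
touch "$BACKUP_DIR/$DATE"

# Keep last 30 days (-mindepth 1 stops find from matching $BACKUP_DIR itself)
find "$BACKUP_DIR" -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +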

Manual Backup

# Simple copy
cp -r /opt/starpunk/data /backup/starpunk-$(date +%Y%m%d)

# Or with compression
tar -czf starpunk-backup-$(date +%Y%m%d).tar.gz /opt/starpunk/data

Restore Process

  1. Stop application: sudo systemctl stop starpunk
  2. Restore data directory: rsync -av /backup/starpunk/20241118/ /opt/starpunk/data/
  3. Fix permissions: chown -R starpunk:starpunk /opt/starpunk/data
  4. Start application: sudo systemctl start starpunk
  5. Verify: Visit site, check recent notes

Testing Strategy

Test Pyramid

           ┌─────────────┐
          /               \
         /   Manual Tests  \      Validation, Real Services
        /─────────────────  \
       /                     \
      /  Integration Tests    \   API Flows, Database + Files
     /───────────────────────  \
    /                           \
   /        Unit Tests            \  Functions, Logic, Parsing
  /───────────────────────────────\

Unit Tests (pytest)

Coverage: Business logic, utilities, models

Examples:

  • Slug generation and uniqueness
  • Markdown rendering with various inputs
  • Content hash calculation
  • File path validation
  • Token generation and verification
  • Date formatting for RSS
  • Micropub payload parsing

Integration Tests

Coverage: Component interactions, full flows

Examples:

  • Create note: file write + database insert
  • Read note: database query + file read
  • IndieLogin flow with mocked API
  • Micropub creation with token validation
  • RSS feed generation with multiple notes
  • Session authentication on protected routes

End-to-End Tests

Coverage: Full user workflows

Examples:

  • Admin login via IndieLogin (mocked)
  • Create note via web interface
  • Publish note via Micropub client (mocked)
  • View note on public site
  • Verify RSS feed includes note

Validation Tests

Coverage: Standards compliance

Tools:

  • W3C HTML Validator (validate templates)
  • W3C Feed Validator (validate RSS output)
  • IndieWebify.me (verify microformats)
  • Micropub.rocks (test Micropub compliance)

Manual Tests

Coverage: Real-world usage

Examples:

  • Authenticate with real indielogin.com
  • Publish from actual Micropub client (Quill, Indigenous)
  • Subscribe to feed in actual RSS reader
  • Browser compatibility (Chrome, Firefox, Safari, mobile)
  • Accessibility with screen reader

Monitoring and Observability

Logging Strategy

Application Logs

# Structured logging
import logging

logger = logging.getLogger(__name__)

# Info: Normal operations
logger.info("Note created", extra={
    "slug": slug,
    "published": published,
    "user": session.me
})

# Warning: Recoverable issues
logger.warning("State token expired", extra={
    "state": state,
    "age": age_seconds
})

# Error: Failed operations
logger.error("File write failed", extra={
    "path": file_path,
    "error": str(e)
})

Log Levels

  • DEBUG: Development only (verbose)
  • INFO: Normal operations (note creation, auth success)
  • WARNING: Unusual but handled (expired tokens, invalid input)
  • ERROR: Failed operations (file I/O errors, database errors)
  • CRITICAL: System failures (database unreachable)

Log Destinations

  • Development: Console (stdout)
  • Production: File rotation (logrotate) + optional syslog

Metrics (Optional for V2)

Simple Metrics (if desired):

  • Note count (query database)
  • Request count (nginx logs)
  • Error rate (grep application logs)
  • Response times (nginx logs)

Advanced Metrics (V2):

  • Prometheus exporter
  • Grafana dashboard
  • Alert on error rate spike

Health Checks

@app.route('/health')
def health_check():
    """Simple health check for monitoring"""
    try:
        # Check database
        db.execute("SELECT 1").fetchone()

        # Check file system (the result must be tested, not just computed)
        if not os.path.isdir(DATA_PATH):
            raise FileNotFoundError(f"Data directory missing: {DATA_PATH}")

        return {"status": "ok"}, 200
    except Exception as e:
        return {"status": "error", "detail": str(e)}, 500

Migration and Evolution

V1 to V2 Migration

Database Schema Changes

-- Add new column with default
ALTER TABLE notes ADD COLUMN tags TEXT DEFAULT '';

-- Create new table
CREATE TABLE tags (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);

-- Migration script updates existing notes

File Format Evolution

V1: Pure markdown

V2 (if needed): Add optional frontmatter

---
tags: indieweb, cms
---
Note content here

Backward Compatibility: Parser checks for frontmatter, falls back to pure markdown.
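The fallback behavior could be implemented roughly as below. This is a sketch only: the function name `split_frontmatter` is hypothetical, and it parses simple `key: value` lines instead of full YAML. One caveat is that a V1 note which happens to begin with a `---` rule followed by a later `---` line would be misread as frontmatter.

```python
def split_frontmatter(text: str) -> tuple[dict, str]:
    """Return (metadata, body); pure-markdown files yield ({}, text)."""
    lines = text.split("\n")
    if lines[0].strip() != "---":
        return {}, text  # V1 file: no frontmatter at all

    # Find the closing delimiter; without one, treat the file as plain markdown.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return {}, text

    metadata = {}
    for line in lines[1:end]:
        key, separator, value = line.partition(":")
        if separator:
            metadata[key.strip()] = value.strip()
    return metadata, "\n".join(lines[end + 1:])


meta, body = split_frontmatter("---\ntags: indieweb, cms\n---\nNote content here")
print(meta)  # {'tags': 'indieweb, cms'}
print(body)  # Note content here
```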

API Versioning

# V1 (current)
GET /api/notes

# V2 (future)
GET /api/v2/notes  # New features
GET /api/notes     # Still works, returns V1 response

Data Export/Import

Export Formats

  1. Markdown Bundle: Zip of all notes (already portable)
  2. JSON Export: Notes + metadata
    {
      "version": "1.0",
      "exported_at": "2024-11-18T12:00:00Z",
      "notes": [
        {
          "slug": "my-note",
          "content": "Note content...",
          "created_at": "2024-11-01T12:00:00Z",
          "published": true
        }
      ]
    }
    
  3. RSS Archive: Existing feed.xml

Import (V2)

  • From JSON export
  • From WordPress XML
  • From markdown directory
  • From other IndieWeb CMSs

Success Metrics

The architecture is successful if it enables:

  1. Fast Development: < 1 week to implement V1
  2. Easy Deployment: < 5 minutes to get running
  3. Low Maintenance: Runs for months without intervention
  4. High Performance: All responses < 300ms
  5. Data Ownership: User has direct access to all content
  6. Standards Compliance: Passes all validators
  7. Extensibility: Can add V2 features without rewrite
