Phil Skentelbery a68fd570c7 that initial commit

2025-11-18 19:21:31 -07:00

StarPunk Architecture Overview

Executive Summary

StarPunk is a minimal, single-user IndieWeb CMS designed around the principle: "Every line of code must justify its existence." The architecture prioritizes simplicity, standards compliance, and user data ownership through careful technology selection and hybrid data storage.

Core Architecture: API-first Flask application with hybrid file+database storage, server-side rendering, and delegated authentication.

System Architecture

High-Level Components

┌─────────────────────────────────────────────────────────────┐
│                         User Browser                         │
└───────────────┬─────────────────────────────────────────────┘
                │
                │ HTTP/HTTPS
                ↓
┌─────────────────────────────────────────────────────────────┐
│                      Flask Application                       │
│  ┌─────────────────────────────────────────────────────────┤
│  │ Web Interface (Jinja2 Templates)                         │
│  │  - Public: Homepage, Note Permalinks                     │
│  │  - Admin: Dashboard, Note Editor                         │
│  └──────────────────────────────┬──────────────────────────┘
│  ┌──────────────────────────────┴──────────────────────────┐
│  │ API Layer (RESTful + Micropub)                           │
│  │  - Notes CRUD API                                        │
│  │  - Micropub Endpoint                                     │
│  │  - RSS Feed Generator                                    │
│  │  - Authentication Handlers                               │
│  └──────────────────────────────┬──────────────────────────┘
│  ┌──────────────────────────────┴──────────────────────────┐
│  │ Business Logic                                           │
│  │  - Note Management (create, read, update, delete)        │
│  │  - File/Database Sync                                    │
│  │  - Markdown Rendering                                    │
│  │  - Slug Generation                                       │
│  │  - Session Management                                    │
│  └──────────────────────────────┬──────────────────────────┘
│  ┌──────────────────────────────┴──────────────────────────┐
│  │ Data Layer                                               │
│  │  ┌──────────────────┐    ┌─────────────────────────┐   │
│  │  │ File Storage     │    │ SQLite Database         │   │
│  │  │                  │    │                         │   │
│  │  │ Markdown Files   │    │ - Note Metadata         │   │
│  │  │ (Pure Content)   │    │ - Sessions              │   │
│  │  │                  │    │ - Tokens                │   │
│  │  │ data/notes/      │    │ - Auth State            │   │
│  │  │   YYYY/MM/       │    │                         │   │
│  │  │     slug.md      │    │ data/starpunk.db        │   │
│  │  └──────────────────┘    └─────────────────────────┘   │
│  └─────────────────────────────────────────────────────────┘
└─────────────────────────────────────────────────────────────┘
                │
                │ HTTPS
                ↓
┌─────────────────────────────────────────────────────────────┐
│               External Services                              │
│  - IndieLogin.com (Authentication)                           │
│  - User's Website (Identity Verification)                    │
│  - Micropub Clients (Publishing)                             │
└─────────────────────────────────────────────────────────────┘

Core Principles

1. Radical Simplicity

Total dependencies: 6 direct packages
No build tools, no npm, no bundlers
Server-side rendering eliminates frontend complexity
Single file SQLite database
Zero configuration frameworks

2. Hybrid Data Architecture

Files for Content: Markdown notes stored as plain text files

Maximum portability
Human-readable
Direct user access
Easy backup (copy, rsync, git)

Database for Metadata: SQLite stores structured data

Fast queries and indexes
Referential integrity
Efficient filtering and sorting
Transaction support

Sync Strategy: Files are authoritative for content; database is authoritative for metadata. Both must stay in sync.

3. Standards-First Design

IndieWeb: Microformats2, IndieAuth, Micropub
Web: HTML5, RSS 2.0, HTTP standards
Security: OAuth 2.0, HTTPS, secure cookies
Data: CommonMark markdown

4. API-First Architecture

All functionality is exposed via the API, and the web interface consumes that API. This enables:

Micropub client support
Future client applications
Scriptable automation
Clean separation of concerns

5. Progressive Enhancement

Core functionality works without JavaScript
JavaScript adds optional enhancements (markdown preview)
Server-side rendering for fast initial loads
Mobile-responsive from the start

Component Descriptions

Web Layer

Public Interface

Purpose: Display published notes to the world
Technology: Server-side rendered HTML (Jinja2)
Routes:

/ - Homepage with recent notes
/note/{slug} - Individual note permalink
/feed.xml - RSS feed

Features:

Microformats2 markup (h-entry, h-card)
Reverse chronological note list
Clean, minimal design
Mobile-responsive
No JavaScript required

Admin Interface

Purpose: Manage notes (create, edit, publish)
Technology: Server-side rendered HTML (Jinja2) + optional vanilla JS
Routes:

/admin/login - Authentication
/admin - Dashboard (list of all notes)
/admin/new - Create new note
/admin/edit/{id} - Edit existing note

Features:

Markdown editor
Optional real-time preview (JS enhancement)
Publish/draft toggle
Protected by session authentication

API Layer

Notes API

Purpose: CRUD operations for notes
Authentication: Session-based (admin interface)
Routes:

GET    /api/notes           List published notes
POST   /api/notes           Create new note
GET    /api/notes/{id}      Get single note
PUT    /api/notes/{id}      Update note
DELETE /api/notes/{id}      Delete note

Response Format: JSON

Micropub Endpoint

Purpose: Accept posts from external Micropub clients
Authentication: IndieAuth bearer tokens
Routes:

POST /api/micropub          Create note (h-entry)
GET  /api/micropub?q=config Query configuration
GET  /api/micropub?q=source Query note source

Content Types:

application/json
application/x-www-form-urlencoded

Compliance: Full Micropub specification
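
As a rough illustration of handling both content types, the sketch below extracts the note text from either a JSON or a form-encoded create request. The function name `extract_content` is hypothetical, and the sketch assumes plain-string content only (Micropub JSON also allows an object such as {"html": ...}, which is rejected here).

```python
import json
from urllib.parse import parse_qs


def extract_content(body: bytes, content_type: str) -> str:
    """Return the note text from a Micropub create request body (hypothetical helper)."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/json":
        payload = json.loads(body.decode("utf-8"))
        if not isinstance(payload, dict) or payload.get("type") != ["h-entry"]:
            raise ValueError("only h-entry posts are supported")
        values = payload.get("properties", {}).get("content", [])
    elif media_type == "application/x-www-form-urlencoded":
        form = parse_qs(body.decode("utf-8"))
        if form.get("h") != ["entry"]:
            raise ValueError("only h=entry posts are supported")
        values = form.get("content", [])
    else:
        raise ValueError(f"unsupported content type: {media_type}")
    # Assumption: content must be a non-empty plain string.
    if not values or not isinstance(values[0], str) or not values[0].strip():
        raise ValueError("content is required")
    return values[0]
```

A caller would map `ValueError` to a 400 response; that mapping is left out of the sketch.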

RSS Feed

Purpose: Syndicate published notes
Technology: feedgen library
Route: /feed.xml
Format: Valid RSS 2.0 XML
Caching: 5 minutes
Features:

All published notes
RFC-822 date formatting
CDATA-wrapped HTML content
Proper GUID for each item
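
The feed library is expected to format dates itself, so the following is only a sketch of what an RFC-822 style pubDate looks like, using the standard library. The helper name `rss_pub_date` and the "naive timestamps are UTC" rule are assumptions for illustration.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def rss_pub_date(created_at: datetime) -> str:
    """Format a timestamp for an RSS <pubDate> element (hypothetical helper)."""
    if created_at.tzinfo is None:
        # Assumption: naive timestamps are stored in UTC.
        created_at = created_at.replace(tzinfo=timezone.utc)
    return format_datetime(created_at)


print(rss_pub_date(datetime(2024, 11, 18, 12, 0, tzinfo=timezone.utc)))
# Mon, 18 Nov 2024 12:00:00 +0000
```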

Business Logic Layer

Note Management

Operations:

Create: Generate slug → write file → insert database record
Read: Query database for path → read file → render markdown
Update: Write file atomically → update database timestamp
Delete: Mark deleted in database → optionally archive file

Key Components:

Slug generation (URL-safe, unique)
Markdown rendering (markdown library)
Content hashing (integrity verification)
Atomic file operations (prevent corruption)
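
One possible shape for slug generation is sketched below. The names `slugify` and `unique_slug`, the 50-character limit, the "note" fallback, and the numeric suffix scheme are all assumptions, not the project's confirmed behavior.

```python
import re
import unicodedata


def slugify(text: str, max_length: int = 50) -> str:
    """Reduce text to lowercase ASCII letters, digits, and single hyphens."""
    normalized = unicodedata.normalize("NFKD", text)
    ascii_text = normalized.encode("ascii", "ignore").decode("ascii").lower()
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text).strip("-")
    return slug[:max_length].strip("-")


def unique_slug(text: str, existing: set[str]) -> str:
    """Append -2, -3, ... until the slug is not already taken."""
    base = slugify(text) or "note"  # assumption: fallback for empty input
    candidate, counter = base, 2
    while candidate in existing:
        candidate = f"{base}-{counter}"
        counter += 1
    return candidate
```

In practice `existing` would come from a query against the notes table, and the UNIQUE constraint on the slug column remains the final guard.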

File/Database Sync

Strategy: Write files first, then database
Rollback: If database operation fails, delete/restore file
Verification: Content hash detects external modifications
Integrity Check: Optional scan for orphaned files/records
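
The file-first ordering with rollback might look like the sketch below. The function name `create_note`, the column names, and storing file_path relative to the data directory are assumptions made for illustration.

```python
import hashlib
import os
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def create_note(db: sqlite3.Connection, data_dir: Path, slug: str,
                content: str, published: bool) -> Path:
    """Write the markdown file first, then insert metadata; undo the file on failure."""
    now = datetime.now(timezone.utc)
    path = data_dir / "notes" / f"{now:%Y}" / f"{now:%m}" / f"{slug}.md"
    if path.exists():
        # Refuse to overwrite, so the rollback below can never delete an older note.
        raise FileExistsError(path)
    path.parent.mkdir(parents=True, exist_ok=True)

    data = content.encode("utf-8")
    tmp = path.parent / (path.name + ".tmp")
    tmp.write_bytes(data)
    os.replace(tmp, path)  # atomic rename into place

    try:
        with db:  # commits on success, rolls back on exception
            db.execute(
                "INSERT INTO notes (slug, file_path, published, created_at,"
                " updated_at, content_hash) VALUES (?, ?, ?, ?, ?, ?)",
                (slug, str(path.relative_to(data_dir)), int(published),
                 now.isoformat(), now.isoformat(), hashlib.sha256(data).hexdigest()),
            )
    except sqlite3.Error:
        path.unlink(missing_ok=True)  # rollback: remove the orphaned file
        raise
    return path
```

The existence check matters: without it, a duplicate slug would overwrite an existing note and the rollback would then delete it.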

Authentication

Admin Auth: IndieLogin.com OAuth 2.0 flow

User enters website URL
Redirect to indielogin.com
Verify identity via RelMeAuth or email
Return verified "me" URL
Create session token
Store in HttpOnly cookie

Micropub Auth: IndieAuth token verification

Client obtains token via IndieAuth flow
Token sent as Bearer in Authorization header
Verify the token exists and has not expired
Check scope permissions
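
A minimal sketch of the Micropub token check follows. The helper names, the space-separated scope column, and an integer `expires_at` column holding a Unix timestamp are assumptions; the real schema may differ.

```python
import sqlite3
import time
from typing import Optional


def extract_bearer_token(authorization_header: Optional[str]) -> Optional[str]:
    """Return the token from 'Authorization: Bearer <token>', or None."""
    if not authorization_header:
        return None
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return None
    return token.strip()


def authorize(db: sqlite3.Connection, authorization_header: Optional[str],
              required_scope: str, now: Optional[int] = None) -> Optional[str]:
    """Return the token owner's 'me' URL if the token is valid and has the scope."""
    token = extract_bearer_token(authorization_header)
    if token is None:
        return None
    now = int(time.time()) if now is None else now
    row = db.execute(
        "SELECT me, scope FROM tokens WHERE token = ? AND expires_at > ?",
        (token, now),
    ).fetchone()
    if row is None:
        return None
    me, scope = row
    return me if required_scope in (scope or "").split() else None
```

A `None` result would translate to 401 for a missing or expired token; distinguishing an insufficient scope (403) would need a slightly richer return value.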

Data Layer

File Storage

Location: data/notes/
Structure: YYYY/MM/slug.md
Format: Pure markdown, no frontmatter
Operations:

Atomic writes (temp file → rename)
Directory creation (makedirs)
Content reading (UTF-8 encoding)

Example:

data/notes/
├── 2024/
│   ├── 11/
│   │   ├── my-first-note.md
│   │   └── another-note.md
│   └── 12/
│       └── december-note.md

Database Storage

Location: data/starpunk.db
Engine: SQLite3
Tables:

notes - Metadata (slug, file_path, published, timestamps, hash)
sessions - Auth sessions (token, me, expiry)
tokens - Micropub tokens (token, me, client_id, scope)
auth_state - CSRF tokens (state, expiry)

Indexes:

notes.created_at (DESC) - Fast chronological queries
notes.published - Fast filtering
notes.slug - Fast lookup by slug
sessions.session_token - Fast auth checks

Queries: Direct SQL using Python sqlite3 module (no ORM)
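
For orientation, the tables and indexes listed above could be created roughly as follows. Column types, the `id` column, and the index names are assumptions; only the table names and listed fields come from this document.

```python
import sqlite3

# Hypothetical schema: types and constraints are illustrative guesses.
SCHEMA = """
CREATE TABLE IF NOT EXISTS notes (
    id INTEGER PRIMARY KEY,
    slug TEXT UNIQUE NOT NULL,          -- UNIQUE also provides the slug index
    file_path TEXT UNIQUE NOT NULL,
    published INTEGER NOT NULL DEFAULT 0,
    created_at TEXT NOT NULL,
    updated_at TEXT NOT NULL,
    content_hash TEXT
);
CREATE TABLE IF NOT EXISTS sessions (
    session_token TEXT PRIMARY KEY,     -- primary key provides the lookup index
    me TEXT NOT NULL,
    expires_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS tokens (
    token TEXT PRIMARY KEY,
    me TEXT NOT NULL,
    client_id TEXT,
    scope TEXT
);
CREATE TABLE IF NOT EXISTS auth_state (
    state TEXT PRIMARY KEY,
    expires_at TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_notes_created_at ON notes (created_at DESC);
CREATE INDEX IF NOT EXISTS idx_notes_published ON notes (published);
"""


def init_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create any missing tables and indexes."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```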

Data Flow Examples

Creating a Note (via Admin Interface)

1. User fills out form at /admin/new
   ↓
2. POST to /api/notes with markdown content
   ↓
3. Verify user session (check session cookie)
   ↓
4. Generate unique slug from content or timestamp
   ↓
5. Determine file path: data/notes/2024/11/slug.md
   ↓
6. Create directories if needed (makedirs)
   ↓
7. Write markdown content to file (atomic write)
   ↓
8. Calculate SHA-256 hash of content
   ↓
9. Begin database transaction
   ↓
10. Insert record into notes table:
    - slug
    - file_path
    - published (from form)
    - created_at (now)
    - updated_at (now)
    - content_hash
   ↓
11. If database insert fails:
    - Delete file
    - Return error to user
   ↓
12. If database insert succeeds:
    - Commit transaction
    - Return success with note URL
   ↓
13. Redirect user to /admin (dashboard)

Reading a Note (via Public Interface)

1. User visits /note/my-first-note
   ↓
2. Extract slug from URL
   ↓
3. Query database:
    SELECT file_path, created_at, published
    FROM notes
    WHERE slug = 'my-first-note' AND published = 1
   ↓
4. If not found → 404 error
   ↓
5. Read markdown content from file:
    - Open data/notes/2024/11/my-first-note.md
    - Read UTF-8 content
   ↓
6. Render markdown to HTML (markdown.markdown())
   ↓
7. Render Jinja2 template with:
    - content_html (rendered HTML)
    - created_at (timestamp)
    - slug (for permalink)
   ↓
8. Return HTML with microformats markup

Publishing via Micropub

1. Micropub client POSTs to /api/micropub
   Headers: Authorization: Bearer {token}
   Body: {"type": ["h-entry"], "properties": {"content": ["..."]}}
   ↓
2. Extract bearer token from Authorization header
   ↓
3. Query database:
    SELECT me, scope FROM tokens
    WHERE token = ? AND expires_at > CURRENT_TIMESTAMP   (token bound as a parameter)
   ↓
4. If token invalid → 401 Unauthorized
   ↓
5. Parse Micropub JSON payload
   ↓
6. Extract content from properties.content[0]
   ↓
7. Create note (same flow as admin interface):
    - Generate slug
    - Write file
    - Insert database record
   ↓
8. If successful:
    - Return 201 Created
    - Set Location header to note URL
   ↓
9. Client receives note URL, displays success

IndieLogin Authentication Flow

1. User visits /admin/login
   ↓
2. User enters their website: https://alice.example.com
   ↓
3. POST to /admin/login with "me" parameter
   ↓
4. Validate URL format
   ↓
5. Generate random state token (CSRF protection)
   ↓
6. Store state in database with 5-minute expiry
   ↓
7. Build IndieLogin authorization URL:
    https://indielogin.com/auth?
      me=https://alice.example.com
      client_id=https://starpunk.example.com
      redirect_uri=https://starpunk.example.com/auth/callback
      state={random_state}
   ↓
8. Redirect user to IndieLogin
   ↓
9. IndieLogin verifies user's identity:
    - Checks rel="me" links on alice.example.com
    - Or sends email verification
    - User authenticates via chosen method
   ↓
10. IndieLogin redirects back:
    /auth/callback?code={auth_code}&state={state}
   ↓
11. Verify state matches stored value (CSRF check)
   ↓
12. Exchange code for verified identity:
    POST https://indielogin.com/auth
      code={auth_code}
      client_id=https://starpunk.example.com
      redirect_uri=https://starpunk.example.com/auth/callback
   ↓
13. IndieLogin returns: {"me": "https://alice.example.com"}
   ↓
14. Verify me == ADMIN_ME (config)
   ↓
15. If match:
    - Generate session token
    - Insert into sessions table
    - Set HttpOnly, Secure cookie
    - Redirect to /admin
   ↓
16. If no match:
    - Return "Unauthorized" error
    - Log attempt
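
Steps 5 to 8 of the flow can be sketched with the standard library alone. The function name `build_authorization_url` is hypothetical; the endpoint and parameter names are taken from the flow above.

```python
import secrets
from urllib.parse import urlencode

INDIELOGIN_AUTH_URL = "https://indielogin.com/auth"


def build_authorization_url(me: str, client_id: str,
                            redirect_uri: str, state: str) -> str:
    """Build the redirect target for step 7, with every value percent-encoded."""
    query = urlencode({
        "me": me,
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "state": state,
    })
    return f"{INDIELOGIN_AUTH_URL}?{query}"


state = secrets.token_urlsafe(32)  # step 5: random CSRF state token
url = build_authorization_url(
    "https://alice.example.com",
    "https://starpunk.example.com",
    "https://starpunk.example.com/auth/callback",
    state,
)
```

Using `urlencode` rather than string concatenation keeps a user-supplied "me" value from injecting extra query parameters.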

Security Architecture

Authentication Security

Session Management

Token Generation: secrets.token_urlsafe(32) (256-bit entropy)
Storage: Hash before storing in database
Cookies: HttpOnly, Secure, SameSite=Lax
Expiry: 30 days, extendable on use
Validation: Every protected route checks session
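
Generating a token and hashing it before storage could look like this sketch. The function names are hypothetical, and SHA-256 is an assumed choice of hash; it is reasonable here because the token is already a high-entropy random value.

```python
import hashlib
import secrets


def hash_token(token: str) -> str:
    """Hash a session token for storage; only this digest goes in the database."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def new_session_token() -> tuple[str, str]:
    """Return (token for the cookie, digest for the sessions table)."""
    token = secrets.token_urlsafe(32)  # 32 random bytes = 256 bits
    return token, hash_token(token)
```

On each request the cookie value is hashed again and looked up by digest, so a leaked database does not reveal usable session tokens.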

CSRF Protection

State Tokens: Random tokens for OAuth flows
Expiry: 5 minutes (short-lived)
Single-Use: Deleted after verification
SameSite: Cookies set to Lax mode
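
Short-lived, single-use state tokens can be enforced with a single DELETE, as in the sketch below. The function names and the numeric `expires_at` column (Unix time) are assumptions.

```python
import secrets
import sqlite3
import time
from typing import Optional

STATE_TTL_SECONDS = 300  # 5 minutes


def issue_state(db: sqlite3.Connection, now: Optional[float] = None) -> str:
    """Create and store a random state token."""
    now = time.time() if now is None else now
    state = secrets.token_urlsafe(32)
    with db:
        db.execute(
            "INSERT INTO auth_state (state, expires_at) VALUES (?, ?)",
            (state, now + STATE_TTL_SECONDS),
        )
    return state


def consume_state(db: sqlite3.Connection, state: str,
                  now: Optional[float] = None) -> bool:
    """Return True once for a valid, unexpired state; the token is deleted on use."""
    now = time.time() if now is None else now
    with db:
        cursor = db.execute(
            "DELETE FROM auth_state WHERE state = ? AND expires_at > ?",
            (state, now),
        )
        valid = cursor.rowcount == 1
        # Opportunistic cleanup of expired tokens.
        db.execute("DELETE FROM auth_state WHERE expires_at <= ?", (now,))
    return valid
```

Deleting and validating in one statement avoids a window in which the same state could be accepted twice.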

Access Control

Admin Routes: Require valid session
Micropub Routes: Require valid bearer token
Public Routes: No authentication needed
Identity Verification: Only ADMIN_ME can authenticate

Input Validation

User Input

Markdown: Sanitize to prevent XSS in rendered HTML
URLs: Validate format and scheme (https://)
Slugs: Alphanumeric + hyphens only
JSON: Parse and validate structure
File Paths: Prevent directory traversal (validate against base path)
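
Two of these checks are sketched below. The function names are hypothetical, and the slug pattern is deliberately stricter than "alphanumeric + hyphens": it assumes lowercase only and no leading, trailing, or doubled hyphens.

```python
import re
from urllib.parse import urlsplit

# Assumption: lowercase letters and digits, separated by single hyphens.
SLUG_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_slug(slug: str) -> bool:
    """Accept only slugs that are safe to use in URLs and file names."""
    return SLUG_RE.fullmatch(slug) is not None


def is_valid_me_url(url: str) -> bool:
    """Accept only https:// URLs that include a host name."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    return parts.scheme == "https" and bool(parts.hostname)
```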

Micropub Payloads

Content-Type: Verify matches expected format
Required Fields: Validate h-entry structure
Size Limits: Prevent DoS via large payloads
Scope Verification: Check token has required permissions

Database Security

SQL Injection Prevention

Parameterized Queries: Always use parameter substitution
No String Interpolation: Never build SQL with f-strings
Input Sanitization: Validate before database operations

Example:

# GOOD
cursor.execute("SELECT * FROM notes WHERE slug = ?", (slug,))

# BAD (SQL injection vulnerable)
cursor.execute(f"SELECT * FROM notes WHERE slug = '{slug}'")

Data Integrity

Transactions: Use for multi-step operations
Constraints: UNIQUE on slugs, file_paths
Foreign Keys: Enforce relationships (if applicable)
Content Hashing: Detect unauthorized file modifications
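
Detecting a modified file amounts to recomputing the hash and comparing it to the stored value. The sketch assumes SHA-256 over the raw file bytes; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def content_hash(data: bytes) -> str:
    """SHA-256 hex digest of the raw file bytes."""
    return hashlib.sha256(data).hexdigest()


def file_matches_hash(path: Path, stored_hash: str) -> bool:
    """Return False if the file on disk no longer matches the recorded hash."""
    return content_hash(path.read_bytes()) == stored_hash
```

Hashing bytes rather than decoded text keeps the result independent of platform newline translation.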

Network Security

HTTPS

Production Requirement: TLS 1.2+ required
Reverse Proxy: Nginx/Caddy handles SSL termination
Certificate Validation: Verify SSL certs on outbound requests
HSTS: Set Strict-Transport-Security header

Security Headers

# Set on all responses
Content-Security-Policy: default-src 'self'
X-Frame-Options: DENY
X-Content-Type-Options: nosniff
Referrer-Policy: strict-origin-when-cross-origin
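
One framework-neutral way to set these headers on every response is a small WSGI middleware, sketched below. The class name is hypothetical, and the choice not to override headers already set by a view is an assumption.

```python
SECURITY_HEADERS = [
    ("Content-Security-Policy", "default-src 'self'"),
    ("X-Frame-Options", "DENY"),
    ("X-Content-Type-Options", "nosniff"),
    ("Referrer-Policy", "strict-origin-when-cross-origin"),
]


class SecurityHeadersMiddleware:
    """Wrap a WSGI app and add security headers the app did not set itself."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        def start_with_headers(status, headers, exc_info=None):
            present = {name.lower() for name, _ in headers}
            extra = [(name, value) for name, value in SECURITY_HEADERS
                     if name.lower() not in present]
            return start_response(status, list(headers) + extra, exc_info)

        return self.app(environ, start_with_headers)
```

With Flask this would typically be attached as `app.wsgi_app = SecurityHeadersMiddleware(app.wsgi_app)`; the same headers can instead be set at the reverse proxy.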

Rate Limiting

Implementation: Reverse proxy (nginx/Caddy)
Admin Routes: Stricter limits
API Routes: Moderate limits
Public Routes: Permissive limits

File System Security

Atomic Operations

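# Write to a temp file in the same directory, then atomically replace the target
import os

temp_path = f"{target_path}.tmp"
with open(temp_path, "w", encoding="utf-8") as f:
    f.write(content)
    f.flush()
    os.fsync(f.fileno())  # Make sure the data is on disk before the rename
os.replace(temp_path, target_path)  # Atomic on POSIX; also overwrites on Windows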

Path Validation

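# Prevent directory traversal
import os

base_path = os.path.abspath(DATA_PATH)
requested_path = os.path.abspath(os.path.join(base_path, user_input))
# Compare whole path components: a plain startswith() would accept
# "/data-evil" when base_path is "/data"
if os.path.commonpath([base_path, requested_path]) != base_path:
    raise SecurityError("Path traversal detected")  # application-defined exception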

File Permissions

Data Directory: 700 (owner only)
Database File: 600 (owner read/write)
Note Files: 600 (owner read/write)
Application User: Dedicated non-root user

Performance Considerations

Response Time Targets

API Responses: < 100ms (database + file read)
Page Renders: < 200ms (template rendering)
RSS Feed: < 300ms (query + file reads + XML generation)

Optimization Strategies

Database

Indexes: On frequently queried columns (created_at, slug, published)
Connection Pooling: Not needed; one connection per worker process (single-user, minimal contention)
Query Optimization: SELECT only needed columns
Prepared Statements: Reuse compiled queries

File System

Caching: Consider caching rendered HTML in memory (optional)
Directory Structure: Year/Month prevents large directories
Atomic Reads: Fast sequential reads, no locking needed

HTTP

Static Assets: Cache headers on CSS/JS (1 year)
RSS Feed: Cache for 5 minutes (Cache-Control)
Compression: gzip/brotli via reverse proxy
ETags: For conditional requests
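
Conditional requests with ETags can be sketched as below. The helper names, the SHA-256 based tag, and the 32-character truncation are assumptions for illustration.

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag value from the response body."""
    return '"' + hashlib.sha256(body).hexdigest()[:32] + '"'


def is_not_modified(if_none_match: Optional[str], etag: str) -> bool:
    """Return True if the client's If-None-Match header matches the current ETag."""
    if not if_none_match:
        return False
    candidates = [c.strip() for c in if_none_match.split(",")]
    # Weak comparison: a W/ prefix on the client's value is ignored.
    return "*" in candidates or any(
        c.removeprefix("W/") == etag for c in candidates
    )
```

When `is_not_modified` returns True the handler can answer 304 Not Modified with an empty body.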

Rendering

Template Compilation: Jinja2 compiles templates automatically
Minimal Templating: Simple templates render fast
Server-Side: No client-side rendering overhead

Resource Usage

Memory

Flask Process: ~50MB base
SQLite: ~10MB typical working set
Total: < 100MB under normal load

Disk

Application: ~5MB (code + dependencies)
Database: ~1MB per 1000 notes
Notes: ~5KB average per markdown file
Total: Scales linearly with note count

CPU

Idle: Near zero
Request Handling: Minimal (no heavy processing)
Markdown Rendering: Fast (pure Python)
Database Queries: Indexed, sub-millisecond

Deployment Architecture

Single-Server Deployment

┌─────────────────────────────────────────────────┐
│ Internet                                        │
└────────────────┬────────────────────────────────┘
                 │
                 │ Port 443 (HTTPS)
                 ↓
┌─────────────────────────────────────────────────┐
│ Nginx/Caddy (Reverse Proxy)                     │
│  - SSL/TLS termination                          │
│  - Static file serving                          │
│  - Rate limiting                                │
│  - Compression                                  │
└────────────────┬────────────────────────────────┘
                 │
                 │ Port 8000 (HTTP)
                 ↓
┌─────────────────────────────────────────────────┐
│ Gunicorn (WSGI Server)                          │
│  - 4 worker processes                           │
│  - Process management                           │
│  - Load balancing (round-robin)                 │
└────────────────┬────────────────────────────────┘
                 │
                 │ WSGI
                 ↓
┌─────────────────────────────────────────────────┐
│ Flask Application                               │
│  - Request handling                             │
│  - Business logic                               │
│  - Template rendering                           │
└────────────────┬────────────────────────────────┘
                 │
                 ↓
┌────────────────────────────┬────────────────────┐
│ File System                │ SQLite Database    │
│  data/notes/               │  data/starpunk.db  │
│    YYYY/MM/slug.md         │                    │
└────────────────────────────┴────────────────────┘

Process Management (systemd)

[Unit]
Description=StarPunk CMS
After=network.target

[Service]
Type=notify
User=starpunk
WorkingDirectory=/opt/starpunk
Environment="PATH=/opt/starpunk/venv/bin"
ExecStart=/opt/starpunk/venv/bin/gunicorn -w 4 -b 127.0.0.1:8000 app:app
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

Backup Strategy

Automated Daily Backup

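#!/bin/bash
# backup.sh - Run daily via cron
set -euo pipefail

DATE=$(date +%Y%m%d)
BACKUP_DIR="/backup/starpunk"

# Backup data directory (notes + database)
# Note: copying a live SQLite file can capture a mid-write state;
# run at a quiet time or stop the application first if that matters.
mkdir -p "$BACKUP_DIR/$DATE"
rsync -av /opt/starpunk/data/ "$BACKUP_DIR/$DATE/"
# rsync -a copies the source directory's mtime; reset it so retention counts from backup time
touch "$BACKUP_DIR/$DATE"

# Keep last 30 days (-mindepth 1 so the backup root itself is never matched)
find "$BACKUP_DIR" -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +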

Manual Backup

# Simple copy
cp -r /opt/starpunk/data /backup/starpunk-$(date +%Y%m%d)

# Or with compression
tar -czf starpunk-backup-$(date +%Y%m%d).tar.gz /opt/starpunk/data

Restore Process

Stop application: sudo systemctl stop starpunk
Restore data directory: rsync -av /backup/starpunk/20241118/ /opt/starpunk/data/
Fix permissions: chown -R starpunk:starpunk /opt/starpunk/data
Start application: sudo systemctl start starpunk
Verify: Visit site, check recent notes

Testing Strategy

Test Pyramid

           ┌─────────────┐
          /               \
         /   Manual Tests  \      Validation, Real Services
        /─────────────────  \
       /                     \
      /  Integration Tests    \   API Flows, Database + Files
     /───────────────────────  \
    /                           \
   /        Unit Tests            \  Functions, Logic, Parsing
  /───────────────────────────────\

Unit Tests (pytest)

Coverage: Business logic, utilities, models
Examples:

Slug generation and uniqueness
Markdown rendering with various inputs
Content hash calculation
File path validation
Token generation and verification
Date formatting for RSS
Micropub payload parsing

Integration Tests

Coverage: Component interactions, full flows
Examples:

Create note: file write + database insert
Read note: database query + file read
IndieLogin flow with mocked API
Micropub creation with token validation
RSS feed generation with multiple notes
Session authentication on protected routes

End-to-End Tests

Coverage: Full user workflows
Examples:

Admin login via IndieLogin (mocked)
Create note via web interface
Publish note via Micropub client (mocked)
View note on public site
Verify RSS feed includes note

Validation Tests

Coverage: Standards compliance
Tools:

W3C HTML Validator (validate templates)
W3C Feed Validator (validate RSS output)
IndieWebify.me (verify microformats)
Micropub.rocks (test Micropub compliance)

Manual Tests

Coverage: Real-world usage
Examples:

Authenticate with real indielogin.com
Publish from actual Micropub client (Quill, Indigenous)
Subscribe to feed in actual RSS reader
Browser compatibility (Chrome, Firefox, Safari, mobile)
Accessibility with screen reader

Monitoring and Observability

Logging Strategy

Application Logs

# Structured logging
import logging

logger = logging.getLogger(__name__)

# Info: Normal operations
logger.info("Note created", extra={
    "slug": slug,
    "published": published,
    "user": session.me
})

# Warning: Recoverable issues
logger.warning("State token expired", extra={
    "state": state,
    "age": age_seconds
})

# Error: Failed operations
logger.error("File write failed", extra={
    "path": file_path,
    "error": str(e)
})

Log Levels

DEBUG: Development only (verbose)
INFO: Normal operations (note creation, auth success)
WARNING: Unusual but handled (expired tokens, invalid input)
ERROR: Failed operations (file I/O errors, database errors)
CRITICAL: System failures (database unreachable)

Log Destinations

Development: Console (stdout)
Production: File rotation (logrotate) + optional syslog

Metrics (Optional for V2)

Simple Metrics (if desired):

Note count (query database)
Request count (nginx logs)
Error rate (grep application logs)
Response times (nginx logs)

Advanced Metrics (V2):

Prometheus exporter
Grafana dashboard
Alert on error rate spike

Health Checks

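import os

@app.route('/health')
def health_check():
    """Simple health check for monitoring"""
    try:
        # Check database
        db.execute("SELECT 1").fetchone()

        # Check file system (the result must be tested; a missing path does not raise)
        if not os.path.isdir(DATA_PATH):
            return {"status": "error", "detail": "data directory missing"}, 500

        return {"status": "ok"}, 200
    except Exception as e:
        return {"status": "error", "detail": str(e)}, 500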

Migration and Evolution

V1 to V2 Migration

Database Schema Changes

-- Add new column with default
ALTER TABLE notes ADD COLUMN tags TEXT DEFAULT '';

-- Create new table
CREATE TABLE tags (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);

-- Migration script updates existing notes

File Format Evolution

V1: Pure markdown
V2 (if needed): Add optional frontmatter

---
tags: indieweb, cms
---
Note content here

Backward Compatibility: Parser checks for frontmatter, falls back to pure markdown.
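
A backward-compatible parser along these lines might look like the following sketch. The function name `split_frontmatter` is hypothetical, and it assumes only simple "key: value" lines rather than full YAML.

```python
def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Return (metadata, body); files without frontmatter yield ({}, text)."""
    lines = text.split("\n")
    if lines[0].strip() != "---":
        return {}, text  # V1 file: pure markdown
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            meta = {}
            for line in lines[1:i]:
                key, sep, value = line.partition(":")
                if sep:
                    meta[key.strip()] = value.strip()
            return meta, "\n".join(lines[i + 1:])
    return {}, text  # no closing delimiter: treat as pure markdown
```

One caveat: a V1 note that opens with a "---" horizontal rule and contains a second one later would be misread as frontmatter, so a migration would need to account for that case.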

API Versioning

# V1 (current)
GET /api/notes

# V2 (future)
GET /api/v2/notes  # New features
GET /api/notes     # Still works, returns V1 response

Data Export/Import

Export Formats

Markdown Bundle: Zip of all notes (already portable)

JSON Export: Notes + metadata

{
  "version": "1.0",
  "exported_at": "2024-11-18T12:00:00Z",
  "notes": [
    {
      "slug": "my-note",
      "content": "Note content...",
      "created_at": "2024-11-01T12:00:00Z",
      "published": true
    }
  ]
}
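
Producing that export could be as simple as the sketch below. The function name `export_notes` is hypothetical, and it assumes file_path is stored relative to the data directory and created_at is already an ISO 8601 string.

```python
import json
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def export_notes(db: sqlite3.Connection, data_dir: Path) -> str:
    """Serialize all notes (content read from files, metadata from SQLite) to JSON."""
    rows = db.execute(
        "SELECT slug, file_path, created_at, published FROM notes ORDER BY created_at"
    ).fetchall()
    notes = [
        {
            "slug": slug,
            "content": (data_dir / file_path).read_text(encoding="utf-8"),
            "created_at": created_at,
            "published": bool(published),
        }
        for slug, file_path, created_at, published in rows
    ]
    export = {
        "version": "1.0",
        "exported_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "notes": notes,
    }
    return json.dumps(export, indent=2)
```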

RSS Archive: Existing feed.xml

Import (V2)

From JSON export
From WordPress XML
From markdown directory
From other IndieWeb CMSs

Success Metrics

The architecture is successful if it enables:

Fast Development: < 1 week to implement V1
Easy Deployment: < 5 minutes to get running
Low Maintenance: Runs for months without intervention
High Performance: All responses < 300ms
Data Ownership: User has direct access to all content
Standards Compliance: Passes all validators
Extensibility: Can add V2 features without rewrite
