Files
StarPunk/docs/architecture/overview.md
Phil Skentelbery 800bc1069d docs: Update architecture overview to reflect v0.9.5 implementation
Comprehensively updated docs/architecture/overview.md to document the
actual v0.9.5 implementation instead of aspirational V1 features.

Major Changes:

1. Executive Summary
   - Added version tag (v0.9.5) and status (Pre-V1 Release)
   - Updated tech stack: Python 3.11, uv, Gunicorn, Gitea Actions
   - Added deployment context (container-based, CI/CD)

2. Route Documentation
   - Public routes: Documented actual routes (/, /note/<slug>, /feed.xml, /health)
   - Admin routes: Updated from /admin/* to /auth/* (v0.9.2 change)
   - Added development routes (/dev/login)
   - Clearly marked implemented vs. planned routes

3. API Layer Reality Check
   - Notes API: Marked as NOT IMPLEMENTED (optional, deferred to V2)
   - Micropub endpoint: Marked as NOT IMPLEMENTED (critical V1 blocker)
   - RSS feed: Marked as IMPLEMENTED with full feature list (v0.6.0)

4. Authentication Flow Updates
   - Documented PKCE implementation (v0.8.0)
   - Updated IndieLogin flow to use /authorize endpoint (v0.9.4)
   - Added trailing slash normalization (v0.9.1)
   - Documented session token hashing (SHA-256)
   - Updated cookie name (starpunk_session, v0.5.1)
   - Corrected code verification endpoint usage

5. Database Schema
   - Added schema_migrations table (v0.9.0)
   - Added code_verifier to auth_state (v0.8.0)
   - Documented automatic migration system
   - Added session metadata fields (user_agent, ip_address)
   - Updated indexes for performance

6. Container Deployment (NEW)
   - Multi-stage Containerfile documentation
   - Gunicorn WSGI server configuration
   - Health check endpoint
   - CI/CD pipeline (Gitea Actions)
   - Volume persistence strategy

7. Implementation Status Section (NEW)
   - Comprehensive list of implemented features (v0.3.0-v0.9.5)
   - Clear documentation of unimplemented features
   - Micropub marked as critical V1 blocker
   - Standards validation status (partial)

8. Success Metrics
   - Updated with actual achievements
   - 70% complete toward V1
   - Container deployment working
   - Automated migrations implemented

Security documentation now accurately reflects PKCE implementation,
session token hashing, and correct IndieLogin.com API usage.

All route tables, data flow diagrams, and examples updated to match
v0.9.5 codebase reality.

Related: Architect validation report identified need to update
architecture docs to reflect actual implementation vs. planned features.
2025-11-24 11:03:44 -07:00

39 KiB

StarPunk Architecture Overview

Version: v0.9.5 (2025-11-24) Status: Pre-V1 Release (Micropub endpoint pending)

Executive Summary

StarPunk is a minimal, single-user IndieWeb CMS designed around the principle: "Every line of code must justify its existence." The architecture prioritizes simplicity, standards compliance, and user data ownership through careful technology selection and hybrid data storage.

Core Architecture: Flask web application with hybrid file+database storage, server-side rendering, delegated authentication (IndieLogin.com), and containerized deployment.

Technology Stack: Python 3.11, Flask, SQLite, Jinja2, Gunicorn, uv package manager Deployment: Container-based (Podman/Docker) with automated CI/CD (Gitea Actions) Authentication: IndieAuth via IndieLogin.com with PKCE security

System Architecture

High-Level Components

┌─────────────────────────────────────────────────────────────┐
│                         User Browser                         │
└───────────────┬─────────────────────────────────────────────┘
                │
                │ HTTP/HTTPS
                ↓
┌─────────────────────────────────────────────────────────────┐
│                      Flask Application                       │
│  ┌─────────────────────────────────────────────────────────┤
│  │ Web Interface (Jinja2 Templates)                         │
│  │  - Public: Homepage, Note Permalinks                     │
│  │  - Admin: Dashboard, Note Editor                         │
│  └──────────────────────────────┬──────────────────────────┘
│  ┌──────────────────────────────┴──────────────────────────┐
│  │ API Layer (RESTful + Micropub)                           │
│  │  - Notes CRUD API                                        │
│  │  - Micropub Endpoint                                     │
│  │  - RSS Feed Generator                                    │
│  │  - Authentication Handlers                               │
│  └──────────────────────────────┬──────────────────────────┘
│  ┌──────────────────────────────┴──────────────────────────┐
│  │ Business Logic                                           │
│  │  - Note Management (create, read, update, delete)        │
│  │  - File/Database Sync                                    │
│  │  - Markdown Rendering                                    │
│  │  - Slug Generation                                       │
│  │  - Session Management                                    │
│  └──────────────────────────────┬──────────────────────────┘
│  ┌──────────────────────────────┴──────────────────────────┐
│  │ Data Layer                                               │
│  │  ┌──────────────────┐    ┌─────────────────────────┐   │
│  │  │ File Storage     │    │ SQLite Database         │   │
│  │  │                  │    │                         │   │
│  │  │ Markdown Files   │    │ - Note Metadata         │   │
│  │  │ (Pure Content)   │    │ - Sessions              │   │
│  │  │                  │    │ - Tokens                │   │
│  │  │ data/notes/      │    │ - Auth State            │   │
│  │  │   YYYY/MM/       │    │                         │   │
│  │  │     slug.md      │    │ data/starpunk.db        │   │
│  │  └──────────────────┘    └─────────────────────────┘   │
│  └─────────────────────────────────────────────────────────┘
└─────────────────────────────────────────────────────────────┘
                │
                │ HTTPS
                ↓
┌─────────────────────────────────────────────────────────────┐
│               External Services                              │
│  - IndieLogin.com (Authentication)                           │
│  - User's Website (Identity Verification)                    │
│  - Micropub Clients (Publishing)                             │
└─────────────────────────────────────────────────────────────┘

Core Principles

1. Radical Simplicity

  • Total dependencies: 6 direct packages
  • No build tools, no npm, no bundlers
  • Server-side rendering eliminates frontend complexity
  • Single file SQLite database
  • Zero configuration frameworks

2. Hybrid Data Architecture

Files for Content: Markdown notes stored as plain text files

  • Maximum portability
  • Human-readable
  • Direct user access
  • Easy backup (copy, rsync, git)

Database for Metadata: SQLite stores structured data

  • Fast queries and indexes
  • Referential integrity
  • Efficient filtering and sorting
  • Transaction support

Sync Strategy: Files are authoritative for content; database is authoritative for metadata. Both must stay in sync.

3. Standards-First Design

  • IndieWeb: Microformats2, IndieAuth, Micropub
  • Web: HTML5, RSS 2.0, HTTP standards
  • Security: OAuth 2.0, HTTPS, secure cookies
  • Data: CommonMark markdown

4. API-First Architecture

All functionality exposed via API, web interface consumes API. This enables:

  • Micropub client support
  • Future client applications
  • Scriptable automation
  • Clean separation of concerns

5. Progressive Enhancement

  • Core functionality works without JavaScript
  • JavaScript adds optional enhancements (markdown preview)
  • Server-side rendering for fast initial loads
  • Mobile-responsive from the start

Component Descriptions

Web Layer

Public Interface

Purpose: Display published notes to the world Technology: Server-side rendered HTML (Jinja2) Status: IMPLEMENTED (v0.5.0)

Routes (Implemented):

  • GET / - Homepage with recent published notes
  • GET /note/<slug> - Individual note permalink
  • GET /feed.xml - RSS 2.0 feed (v0.6.0)
  • GET /health - Health check endpoint (v0.6.0)

Features:

  • Microformats2 markup (h-entry, h-card, h-feed) - ⚠️ Not validated
  • Reverse chronological note list
  • Clean, minimal responsive CSS
  • Mobile-responsive
  • No JavaScript required

Admin Interface

Purpose: Manage notes (create, edit, publish) Technology: Server-side rendered HTML (Jinja2) Status: IMPLEMENTED (v0.5.2)

Routes (Implemented):

  • GET /auth/login - Login form (v0.9.2: moved from /admin/login)
  • POST /auth/login - Initiate IndieLogin OAuth flow
  • GET /auth/callback - Handle IndieLogin callback
  • POST /auth/logout - Logout and destroy session
  • GET /admin - Dashboard (list of all notes, published + drafts)
  • GET /admin/new - Create note form
  • POST /admin/new - Create note handler
  • GET /admin/edit/<slug> - Edit note form
  • POST /admin/edit/<slug> - Update note handler
  • POST /admin/delete/<slug> - Delete note handler

Development Routes (DEV_MODE only):

  • GET /dev/login - Development authentication bypass (v0.5.0)

Features:

  • Markdown editor (textarea)
  • No real-time preview (deferred to V2)
  • Publish/draft toggle
  • Protected by session authentication
  • Flash messages for feedback
  • Note: Admin routes changed from /admin/* to /auth/* for auth in v0.9.2

API Layer

Notes API

Purpose: RESTful CRUD operations for notes Authentication: Session-based (admin interface) Status: NOT IMPLEMENTED (Optional for V1, deferred to V2)

Planned Routes (Not Implemented):

GET    /api/notes           List published notes (JSON)
POST   /api/notes           Create new note (JSON)
GET    /api/notes/<slug>    Get single note (JSON)
PUT    /api/notes/<slug>    Update note (JSON)
DELETE /api/notes/<slug>    Delete note (JSON)

Current Workaround: Admin interface uses HTML forms (POST), not JSON API Note: Not required for V1, admin interface is fully functional without REST API

Micropub Endpoint

Purpose: Accept posts from external Micropub clients (Quill, Indigenous, etc.) Authentication: IndieAuth bearer tokens Status: NOT IMPLEMENTED (Critical blocker for V1)

Planned Routes (Not Implemented):

POST /api/micropub          Create note (h-entry)
GET  /api/micropub?q=config Query configuration
GET  /api/micropub?q=source Query note source by URL

Planned Content Types:

  • application/json
  • application/x-www-form-urlencoded

Target Compliance: Micropub specification Current Status:

  • Token model exists in database
  • No endpoint implementation
  • No token validation logic
  • Will require IndieAuth token endpoint or external token service

RSS Feed

Purpose: Syndicate published notes Technology: feedgen library Status: IMPLEMENTED (v0.6.0)

Route: GET /feed.xml Format: Valid RSS 2.0 XML Caching: 5 minutes server-side (configurable via FEED_CACHE_SECONDS) Features:

  • Limit to 50 most recent published notes (configurable via FEED_MAX_ITEMS)
  • RFC-822 date formatting (pubDate)
  • CDATA-wrapped HTML content for feed readers
  • Proper GUID for each item (note permalink)
  • Auto-discovery link in HTML templates ()
  • Cache-Control headers for client caching
  • ETag support for conditional requests

Business Logic Layer

Note Management

Operations:

  1. Create: Generate slug → write file → insert database record
  2. Read: Query database for path → read file → render markdown
  3. Update: Write file atomically → update database timestamp
  4. Delete: Mark deleted in database → optionally archive file

Key Components:

  • Slug generation (URL-safe, unique)
  • Markdown rendering (markdown library)
  • Content hashing (integrity verification)
  • Atomic file operations (prevent corruption)

File/Database Sync

Strategy: Write files first, then database Rollback: If database operation fails, delete/restore file Verification: Content hash detects external modifications Integrity Check: Optional scan for orphaned files/records

Authentication

Admin Auth: IndieLogin.com OAuth 2.0 flow with PKCE Status: IMPLEMENTED (v0.8.0, refined through v0.9.5)

Flow:

  1. User enters website URL (their "me" identity)
  2. Generate PKCE code_verifier and code_challenge (SHA-256)
  3. Store state token + code_verifier in database (5 min expiry)
  4. Redirect to indielogin.com/authorize with:
    • client_id (SITE_URL with trailing slash)
    • redirect_uri (SITE_URL/auth/callback)
    • state (CSRF protection)
    • code_challenge + code_challenge_method (S256)
  5. IndieLogin.com verifies identity via RelMeAuth or email
  6. Callback to /auth/callback with code + state
  7. Verify state token (CSRF check)
  8. POST code + code_verifier to indielogin.com/authorize (NOT /token)
  9. Receive verified "me" URL
  10. Verify "me" matches ADMIN_ME config
  11. Create session with SHA-256 hashed token
  12. Store in HttpOnly, Secure, SameSite=Lax cookie named "starpunk_session"

Security Features (v0.8.0-v0.9.5):

  • PKCE prevents authorization code interception
  • State tokens prevent CSRF attacks
  • Session token hashing (SHA-256) before database storage
  • Single-use state tokens with short expiry
  • Automatic trailing slash normalization on SITE_URL (v0.9.1)
  • Uses authorization endpoint (not token endpoint) per IndieAuth spec (v0.9.4)
  • Session cookie renamed to avoid Flask session collision (v0.5.1)

Development Mode (v0.5.0):

  • /dev/login bypasses IndieLogin for local development
  • Requires DEV_MODE=true and DEV_ADMIN_ME configuration
  • Shows warning in logs

Micropub Auth: IndieAuth token verification Status: NOT IMPLEMENTED (Required for Micropub)

Planned Implementation:

  • Client obtains token via external IndieAuth token endpoint
  • Token sent as Bearer in Authorization header
  • Verify token exists in database and not expired
  • Check scope permissions (create, update, delete)
  • OR: Delegate token verification to external IndieAuth server

Data Layer

File Storage

Location: data/notes/ Structure: YYYY/MM/slug.md Format: Pure markdown, no frontmatter Operations:

  • Atomic writes (temp file → rename)
  • Directory creation (makedirs)
  • Content reading (UTF-8 encoding)

Example:

data/notes/
├── 2024/
│   ├── 11/
│   │   ├── my-first-note.md
│   │   └── another-note.md
│   └── 12/
│       └── december-note.md

Database Storage

Location: data/starpunk.db Engine: SQLite3 Status: IMPLEMENTED with automatic migration system (v0.9.0)

Tables:

  • notes - Note metadata (slug, file_path, published, created_at, updated_at, deleted_at, content_hash)
  • sessions - Admin auth sessions (session_token_hash, me, created_at, expires_at, last_used_at, user_agent, ip_address)
  • tokens - Micropub bearer tokens (token, me, client_id, scope, created_at, expires_at) - Table exists but unused
  • auth_state - CSRF state tokens (state, created_at, expires_at, redirect_uri, code_verifier)
  • schema_migrations - Migration tracking (migration_name, applied_at) - Added v0.9.0

Indexes:

  • notes.created_at (DESC) - Fast chronological queries
  • notes.published - Fast published note filtering
  • notes.slug (UNIQUE) - Fast lookup by slug, uniqueness enforcement
  • notes.deleted_at - Fast soft-delete filtering
  • sessions.session_token_hash (UNIQUE) - Fast auth checks
  • sessions.me - Fast user lookups
  • auth_state.state (UNIQUE) - Fast state token validation

Migration System (v0.9.0):

  • Automatic schema updates on application startup
  • Migration files in migrations/ directory (SQL format)
  • Executed in alphanumeric order (001, 002, 003...)
  • Fresh database detection (marks migrations as applied without execution)
  • Legacy database detection (applies pending migrations automatically)
  • Migration tracking in schema_migrations table
  • Fail-safe: Application refuses to start if migrations fail

Queries: Direct SQL using Python sqlite3 module (no ORM)

Data Flow Examples

Creating a Note (via Admin Interface)

1. User fills out form at /admin/new
   ↓
2. POST to /api/notes with markdown content
   ↓
3. Verify user session (check session cookie)
   ↓
4. Generate unique slug from content or timestamp
   ↓
5. Determine file path: data/notes/2024/11/slug.md
   ↓
6. Create directories if needed (makedirs)
   ↓
7. Write markdown content to file (atomic write)
   ↓
8. Calculate SHA-256 hash of content
   ↓
9. Begin database transaction
   ↓
10. Insert record into notes table:
    - slug
    - file_path
    - published (from form)
    - created_at (now)
    - updated_at (now)
    - content_hash
   ↓
11. If database insert fails:
    - Delete file
    - Return error to user
   ↓
12. If database insert succeeds:
    - Commit transaction
    - Return success with note URL
   ↓
13. Redirect user to /admin (dashboard)

Reading a Note (via Public Interface)

1. User visits /note/my-first-note
   ↓
2. Extract slug from URL
   ↓
3. Query database:
    SELECT file_path, created_at, published
    FROM notes
    WHERE slug = 'my-first-note' AND published = 1
   ↓
4. If not found → 404 error
   ↓
5. Read markdown content from file:
    - Open data/notes/2024/11/my-first-note.md
    - Read UTF-8 content
   ↓
6. Render markdown to HTML (markdown.markdown())
   ↓
7. Render Jinja2 template with:
    - content_html (rendered HTML)
    - created_at (timestamp)
    - slug (for permalink)
   ↓
8. Return HTML with microformats markup

Publishing via Micropub

1. Micropub client POSTs to /api/micropub
   Headers: Authorization: Bearer {token}
   Body: {"type": ["h-entry"], "properties": {"content": ["..."]}}
   ↓
2. Extract bearer token from Authorization header
   ↓
3. Query database:
    SELECT me, scope FROM tokens
    WHERE token = {token} AND expires_at > now()
   ↓
4. If token invalid → 401 Unauthorized
   ↓
5. Parse Micropub JSON payload
   ↓
6. Extract content from properties.content[0]
   ↓
7. Create note (same flow as admin interface):
    - Generate slug
    - Write file
    - Insert database record
   ↓
8. If successful:
    - Return 201 Created
    - Set Location header to note URL
   ↓
9. Client receives note URL, displays success

IndieLogin Authentication Flow (v0.9.5 with PKCE)

1. User visits /auth/login
   ↓
2. User enters their website: https://alice.example.com
   ↓
3. POST to /auth/login with "me" parameter
   ↓
4. Validate URL format (must be https://)
   ↓
5. Generate PKCE code_verifier (43 random bytes, base64-url encoded)
   ↓
6. Generate code_challenge from code_verifier (SHA256 hash, base64-url encoded)
   ↓
7. Generate random state token (CSRF protection)
   ↓
8. Store state + code_verifier in auth_state table (5-minute expiry)
   ↓
9. Normalize client_id by adding trailing slash if missing (v0.9.1)
   ↓
10. Build IndieLogin authorization URL:
    https://indielogin.com/authorize?
      me=https://alice.example.com
      client_id=https://starpunk.example.com/  (note trailing slash)
      redirect_uri=https://starpunk.example.com/auth/callback
      state={random_state}
      code_challenge={code_challenge}
      code_challenge_method=S256
   ↓
11. Redirect user to IndieLogin
   ↓
12. IndieLogin verifies user's identity:
    - Checks rel="me" links on alice.example.com
    - Or sends email verification
    - User authenticates via chosen method
   ↓
13. IndieLogin redirects back:
    /auth/callback?code={auth_code}&state={state}
   ↓
14. Verify state matches stored value (CSRF check, single-use)
   ↓
15. Retrieve code_verifier from database using state
   ↓
16. Delete state token (single-use enforcement)
   ↓
17. Exchange code for verified identity (v0.9.4: uses /authorize, not /token):
    POST https://indielogin.com/authorize
      code={auth_code}
      client_id=https://starpunk.example.com/
      redirect_uri=https://starpunk.example.com/auth/callback
      code_verifier={code_verifier}
   ↓
18. IndieLogin returns: {"me": "https://alice.example.com"}
   ↓
19. Verify me == ADMIN_ME (config)
   ↓
20. If match:
    - Generate session token (secrets.token_urlsafe(32))
    - Hash token with SHA-256
    - Insert into sessions table with hash (not plaintext)
    - Set cookie "starpunk_session" (HttpOnly, Secure, SameSite=Lax)
    - Redirect to /admin
   ↓
21. If no match:
    - Return "Unauthorized" error
    - Log attempt with WARNING level

Key Security Features:

  • PKCE prevents code interception attacks (v0.8.0)
  • State tokens prevent CSRF (v0.4.0)
  • Session token hashing prevents token exposure if database compromised (v0.4.0)
  • Single-use state tokens (deleted after verification)
  • Short-lived state tokens (5 minutes)
  • Trailing slash normalization fixes client_id validation (v0.9.1)
  • Correct endpoint usage (/authorize not /token) per IndieAuth spec (v0.9.4)

Security Architecture

Authentication Security

Session Management

  • Token Generation: secrets.token_urlsafe(32) (256-bit entropy)
  • Storage: SHA-256 hash stored in database (plaintext token NEVER stored)
  • Cookie Name: starpunk_session (v0.5.1: renamed to avoid Flask session collision)
  • Cookies: HttpOnly, Secure, SameSite=Lax
  • Expiry: 30 days, extendable on use
  • Validation: Every protected route checks session via @require_auth decorator
  • Metadata: Tracks user_agent and ip_address for audit purposes

CSRF Protection

  • State Tokens: Random tokens for OAuth flows
  • Expiry: 5 minutes (short-lived)
  • Single-Use: Deleted after verification
  • SameSite: Cookies set to Lax mode

Access Control

  • Admin Routes: Require valid session
  • Micropub Routes: Require valid bearer token
  • Public Routes: No authentication needed
  • Identity Verification: Only ADMIN_ME can authenticate

Input Validation

User Input

  • Markdown: Sanitize to prevent XSS in rendered HTML
  • URLs: Validate format and scheme (https://)
  • Slugs: Alphanumeric + hyphens only
  • JSON: Parse and validate structure
  • File Paths: Prevent directory traversal (validate against base path)

Micropub Payloads

  • Content-Type: Verify matches expected format
  • Required Fields: Validate h-entry structure
  • Size Limits: Prevent DoS via large payloads
  • Scope Verification: Check token has required permissions

Database Security

SQL Injection Prevention

  • Parameterized Queries: Always use parameter substitution
  • No String Interpolation: Never build SQL with f-strings
  • Input Sanitization: Validate before database operations

Example:

# GOOD
cursor.execute("SELECT * FROM notes WHERE slug = ?", (slug,))

# BAD (SQL injection vulnerable)
cursor.execute(f"SELECT * FROM notes WHERE slug = '{slug}'")

Data Integrity

  • Transactions: Use for multi-step operations
  • Constraints: UNIQUE on slugs, file_paths
  • Foreign Keys: Enforce relationships (if applicable)
  • Content Hashing: Detect unauthorized file modifications

Network Security

HTTPS

  • Production Requirement: TLS 1.2+ required
  • Reverse Proxy: Nginx/Caddy handles SSL termination
  • Certificate Validation: Verify SSL certs on outbound requests
  • HSTS: Set Strict-Transport-Security header

Security Headers

# Set on all responses
Content-Security-Policy: default-src 'self'
X-Frame-Options: DENY
X-Content-Type-Options: nosniff
Referrer-Policy: strict-origin-when-cross-origin

Rate Limiting

  • Implementation: Reverse proxy (nginx/Caddy)
  • Admin Routes: Stricter limits
  • API Routes: Moderate limits
  • Public Routes: Permissive limits

File System Security

Atomic Operations

# Write to temp file, then atomic rename
temp_path = f"{target_path}.tmp"
with open(temp_path, 'w') as f:
    f.write(content)
os.rename(temp_path, target_path)  # Atomic on POSIX

Path Validation

# Prevent directory traversal
base_path = os.path.abspath(DATA_PATH)
requested_path = os.path.abspath(os.path.join(base_path, user_input))
if not requested_path.startswith(base_path):
    raise SecurityError("Path traversal detected")

File Permissions

  • Data Directory: 700 (owner only)
  • Database File: 600 (owner read/write)
  • Note Files: 600 (owner read/write)
  • Application User: Dedicated non-root user

Performance Considerations

Response Time Targets

  • API Responses: < 100ms (database + file read)
  • Page Renders: < 200ms (template rendering)
  • RSS Feed: < 300ms (query + file reads + XML generation)

Optimization Strategies

Database

  • Indexes: On frequently queried columns (created_at, slug, published)
  • Connection Pooling: Single connection (single-user, no contention)
  • Query Optimization: SELECT only needed columns
  • Prepared Statements: Reuse compiled queries

File System

  • Caching: Consider caching rendered HTML in memory (optional)
  • Directory Structure: Year/Month prevents large directories
  • Atomic Reads: Fast sequential reads, no locking needed

HTTP

  • Static Assets: Cache headers on CSS/JS (1 year)
  • RSS Feed: Cache for 5 minutes (Cache-Control)
  • Compression: gzip/brotli via reverse proxy
  • ETags: For conditional requests

Rendering

  • Template Compilation: Jinja2 compiles templates automatically
  • Minimal Templating: Simple templates render fast
  • Server-Side: No client-side rendering overhead

Resource Usage

Memory

  • Flask Process: ~50MB base
  • SQLite: ~10MB typical working set
  • Total: < 100MB under normal load

Disk

  • Application: ~5MB (code + dependencies)
  • Database: ~1MB per 1000 notes
  • Notes: ~5KB average per markdown file
  • Total: Scales linearly with note count

CPU

  • Idle: Near zero
  • Request Handling: Minimal (no heavy processing)
  • Markdown Rendering: Fast (pure Python)
  • Database Queries: Indexed, sub-millisecond

Deployment Architecture

Current State: IMPLEMENTED (v0.6.0 - v0.9.5) Technology: Container-based with Gunicorn WSGI server CI/CD: Gitea Actions automated builds (v0.9.5)

Container Deployment (v0.6.0)

Containerfile: Multi-stage build using Python 3.11-slim base

  • Stage 1: Build dependencies with uv package manager
  • Stage 2: Production image with non-root user (starpunk:1000)
  • Final size: ~174MB

Features:

  • Health check endpoint: /health (validates database and filesystem)
  • Gunicorn WSGI server with 4 workers (configurable)
  • Log rotation (10MB max, 3 files)
  • Resource limits (memory, CPU)
  • SELinux compatibility (volume mount flags)
  • Automatic database initialization on first run

Container Orchestration:

  • Podman-compatible (rootless, userns=keep-id)
  • Docker Compose compatible
  • Volume mounts for data persistence (./data:/app/data)
  • Port mapping (8080:8000)
  • Environment variables for configuration

CI/CD Pipeline (v0.9.5):

  • Gitea Actions workflow (.gitea/workflows/build-container.yml)
  • Automated builds on push to main branch
  • Manual trigger support
  • Container registry push
  • Docker and git dependencies installed
  • Node.js support for GitHub Actions compatibility

Single-Server Deployment

┌─────────────────────────────────────────────────┐
│ Internet                                        │
└────────────────┬────────────────────────────────┘
                 │
                 │ Port 443 (HTTPS)
                 ↓
┌─────────────────────────────────────────────────┐
│ Nginx/Caddy (Reverse Proxy)                     │
│  - SSL/TLS termination                          │
│  - Static file serving                          │
│  - Rate limiting                                │
│  - Compression                                  │
└────────────────┬────────────────────────────────┘
                 │
                 │ Port 8000 (HTTP)
                 ↓
┌─────────────────────────────────────────────────┐
│ Gunicorn (WSGI Server)                          │
│  - 4 worker processes                           │
│  - Process management                           │
│  - Load balancing (round-robin)                 │
└────────────────┬────────────────────────────────┘
                 │
                 │ WSGI
                 ↓
┌─────────────────────────────────────────────────┐
│ Flask Application                               │
│  - Request handling                             │
│  - Business logic                               │
│  - Template rendering                           │
└────────────────┬────────────────────────────────┘
                 │
                 ↓
┌────────────────────────────┬────────────────────┐
│ File System                │ SQLite Database    │
│  data/notes/               │  data/starpunk.db  │
│    YYYY/MM/slug.md         │                    │
└────────────────────────────┴────────────────────┘

Process Management (systemd)

[Unit]
Description=StarPunk CMS
After=network.target

[Service]
Type=notify
User=starpunk
WorkingDirectory=/opt/starpunk
Environment="PATH=/opt/starpunk/venv/bin"
ExecStart=/opt/starpunk/venv/bin/gunicorn -w 4 -b 127.0.0.1:8000 app:app
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

Backup Strategy

Automated Daily Backup

#!/bin/bash
# backup.sh - Run daily via cron

DATE=$(date +%Y%m%d)
BACKUP_DIR="/backup/starpunk"

# Backup data directory (notes + database)
rsync -av /opt/starpunk/data/ "$BACKUP_DIR/$DATE/"

# Keep last 30 days
find "$BACKUP_DIR" -maxdepth 1 -type d -mtime +30 -exec rm -rf {} \;

Manual Backup

# Simple copy
cp -r /opt/starpunk/data /backup/starpunk-$(date +%Y%m%d)

# Or with compression
tar -czf starpunk-backup-$(date +%Y%m%d).tar.gz /opt/starpunk/data

Restore Process

  1. Stop application: sudo systemctl stop starpunk
  2. Restore data directory: rsync -av /backup/starpunk/20241118/ /opt/starpunk/data/
  3. Fix permissions: chown -R starpunk:starpunk /opt/starpunk/data
  4. Start application: sudo systemctl start starpunk
  5. Verify: Visit site, check recent notes

Testing Strategy

Test Pyramid

           ┌─────────────┐
          /               \
         /   Manual Tests  \      Validation, Real Services
        /─────────────────  \
       /                     \
      /  Integration Tests    \   API Flows, Database + Files
     /───────────────────────  \
    /                           \
   /        Unit Tests            \  Functions, Logic, Parsing
  /───────────────────────────────\

Unit Tests (pytest)

Coverage: Business logic, utilities, models Examples:

  • Slug generation and uniqueness
  • Markdown rendering with various inputs
  • Content hash calculation
  • File path validation
  • Token generation and verification
  • Date formatting for RSS
  • Micropub payload parsing

Integration Tests

Coverage: Component interactions, full flows Examples:

  • Create note: file write + database insert
  • Read note: database query + file read
  • IndieLogin flow with mocked API
  • Micropub creation with token validation
  • RSS feed generation with multiple notes
  • Session authentication on protected routes

End-to-End Tests

Coverage: Full user workflows Examples:

  • Admin login via IndieLogin (mocked)
  • Create note via web interface
  • Publish note via Micropub client (mocked)
  • View note on public site
  • Verify RSS feed includes note

Validation Tests

Coverage: Standards compliance Tools:

  • W3C HTML Validator (validate templates)
  • W3C Feed Validator (validate RSS output)
  • IndieWebify.me (verify microformats)
  • Micropub.rocks (test Micropub compliance)

Manual Tests

Coverage: Real-world usage Examples:

  • Authenticate with real indielogin.com
  • Publish from actual Micropub client (Quill, Indigenous)
  • Subscribe to feed in actual RSS reader
  • Browser compatibility (Chrome, Firefox, Safari, mobile)
  • Accessibility with screen reader

Monitoring and Observability

Logging Strategy

Application Logs

# Structured logging
import logging

logger = logging.getLogger(__name__)

# Info: Normal operations
logger.info("Note created", extra={
    "slug": slug,
    "published": published,
    "user": session.me
})

# Warning: Recoverable issues
logger.warning("State token expired", extra={
    "state": state,
    "age": age_seconds
})

# Error: Failed operations
logger.error("File write failed", extra={
    "path": file_path,
    "error": str(e)
})

Log Levels

  • DEBUG: Development only (verbose)
  • INFO: Normal operations (note creation, auth success)
  • WARNING: Unusual but handled (expired tokens, invalid input)
  • ERROR: Failed operations (file I/O errors, database errors)
  • CRITICAL: System failures (database unreachable)

Log Destinations

  • Development: Console (stdout)
  • Production: File rotation (logrotate) + optional syslog

Metrics (Optional for V2)

Simple Metrics (if desired):

  • Note count (query database)
  • Request count (nginx logs)
  • Error rate (grep application logs)
  • Response times (nginx logs)

Advanced Metrics (V2):

  • Prometheus exporter
  • Grafana dashboard
  • Alert on error rate spike

Health Checks

@app.route('/health')
def health_check():
    """Simple health check for monitoring"""
    try:
        # Check database
        db.execute("SELECT 1").fetchone()

        # Check file system
        os.path.exists(DATA_PATH)

        return {"status": "ok"}, 200
    except Exception as e:
        return {"status": "error", "detail": str(e)}, 500

Migration and Evolution

V1 to V2 Migration

Database Schema Changes

-- Add new column with default
ALTER TABLE notes ADD COLUMN tags TEXT DEFAULT '';

-- Create new table
CREATE TABLE tags (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);

-- Migration script updates existing notes

File Format Evolution

V1: Pure markdown V2 (if needed): Add optional frontmatter

---
tags: indieweb, cms
---
Note content here

Backward Compatibility: Parser checks for frontmatter, falls back to pure markdown.

API Versioning

# V1 (current)
GET /api/notes

# V2 (future)
GET /api/v2/notes  # New features
GET /api/notes     # Still works, returns V1 response

Data Export/Import

Export Formats

  1. Markdown Bundle: Zip of all notes (already portable)
  2. JSON Export: Notes + metadata
    {
      "version": "1.0",
      "exported_at": "2024-11-18T12:00:00Z",
      "notes": [
        {
          "slug": "my-note",
          "content": "Note content...",
          "created_at": "2024-11-01T12:00:00Z",
          "published": true
        }
      ]
    }
    
  3. RSS Archive: Existing feed.xml

Import (V2)

  • From JSON export
  • From WordPress XML
  • From markdown directory
  • From other IndieWeb CMSs

Implementation Status (v0.9.5)

Fully Implemented Features

  1. Note Management (v0.3.0)

    • Full CRUD operations (create, read, update, delete)
    • Hybrid file+database storage with sync
    • Soft and hard delete support
    • Markdown rendering
    • Slug generation with uniqueness
  2. Authentication (v0.8.0)

    • IndieLogin.com OAuth 2.0 with PKCE
    • Session management with token hashing
    • CSRF protection with state tokens
    • Development mode authentication bypass
  3. Web Interface (v0.5.2)

    • Public site: homepage and note permalinks
    • Admin dashboard with note management
    • Login/logout flows
    • Responsive design
    • Microformats2 markup (h-entry, h-card, h-feed)
  4. RSS Feed (v0.6.0)

    • RSS 2.0 compliant feed generation
    • Auto-discovery links
    • Server-side caching
    • ETag support
  5. Container Deployment (v0.6.0)

    • Multi-stage Containerfile
    • Gunicorn WSGI server
    • Health check endpoint
    • Volume persistence
  6. CI/CD Pipeline (v0.9.5)

    • Gitea Actions workflow
    • Automated container builds
    • Registry push
  7. Database Migrations (v0.9.0)

    • Automatic migration system
    • Fresh database detection
    • Legacy database migration
    • Migration tracking
  8. Development Tools

    • uv package manager for Python
    • Comprehensive test suite (87% coverage)
    • Black code formatting
    • Flake8 linting

Not Yet Implemented (Blocking V1)

  1. Micropub Endpoint

    • POST /api/micropub for creating notes
    • GET /api/micropub?q=config
    • GET /api/micropub?q=source
    • Token validation
    • Status: Critical blocker for V1 release
  2. IndieAuth Token Endpoint

    • Token issuance for Micropub clients
    • Alternative: May use external IndieAuth server

⚠️ Partially Implemented

  1. Standards Validation

    • HTML5: Markup exists, not validated
    • Microformats: Markup exists, not validated
    • RSS: Validated and compliant
    • Micropub: N/A (not implemented)
  2. REST API (Optional)

    • JSON API for notes CRUD
    • Status: Deferred to V2 (admin interface works without it)

Success Metrics

The architecture is successful if it enables:

  1. Fast Development: < 1 week to implement V1 - ACHIEVED (~35 hours, 70% complete)
  2. Easy Deployment: < 5 minutes to get running - ACHIEVED (containerized)
  3. Low Maintenance: Runs for months without intervention - ACHIEVED (automated migrations)
  4. High Performance: All responses < 300ms - ACHIEVED
  5. Data Ownership: User has direct access to all content - ACHIEVED (file-based storage)
  6. Standards Compliance: Passes all validators - ⚠️ PARTIAL (RSS yes, others pending)
  7. Extensibility: Can add V2 features without rewrite - ACHIEVED (migration system ready)

References

Internal Documentation

External Standards