StarPunk/docs/architecture/overview.md
Phil Skentelbery 2eaf67279d docs: Standardize all IndieAuth spec references to W3C URL
- Updated 42 references from indieauth.spec.indieweb.org to www.w3.org/TR/indieauth
- Ensures consistency across all documentation
- Points to the authoritative W3C specification
- No functional changes, documentation update only

Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 11:54:04 -07:00


# StarPunk Architecture Overview
**Version**: v0.9.5 (2025-11-24)
**Status**: Pre-V1 Release (Micropub endpoint pending)
## Executive Summary
StarPunk is a minimal, single-user IndieWeb CMS designed around the principle: "Every line of code must justify its existence." The architecture prioritizes simplicity, standards compliance, and user data ownership through careful technology selection and hybrid data storage.
**Core Architecture**: Flask web application with hybrid file+database storage, server-side rendering, delegated authentication (IndieLogin.com), and containerized deployment.
**Technology Stack**: Python 3.11, Flask, SQLite, Jinja2, Gunicorn, uv package manager
**Deployment**: Container-based (Podman/Docker) with automated CI/CD (Gitea Actions)
**Authentication**: IndieAuth via IndieLogin.com with PKCE security
## System Architecture
### High-Level Components
```
┌─────────────────────────────────────────────────────────────┐
│ User Browser │
└───────────────┬─────────────────────────────────────────────┘
│ HTTP/HTTPS
┌─────────────────────────────────────────────────────────────┐
│ Flask Application │
│ ┌─────────────────────────────────────────────────────────┤
│ │ Web Interface (Jinja2 Templates) │
│ │ - Public: Homepage, Note Permalinks │
│ │ - Admin: Dashboard, Note Editor │
│ └──────────────────────────────┬──────────────────────────┘
│ ┌──────────────────────────────┴──────────────────────────┐
│ │ API Layer (RESTful + Micropub) │
│ │ - Notes CRUD API │
│ │ - Micropub Endpoint │
│ │ - RSS Feed Generator │
│ │ - Authentication Handlers │
│ └──────────────────────────────┬──────────────────────────┘
│ ┌──────────────────────────────┴──────────────────────────┐
│ │ Business Logic │
│ │ - Note Management (create, read, update, delete) │
│ │ - File/Database Sync │
│ │ - Markdown Rendering │
│ │ - Slug Generation │
│ │ - Session Management │
│ └──────────────────────────────┬──────────────────────────┘
│ ┌──────────────────────────────┴──────────────────────────┐
│ │ Data Layer │
│ │ ┌──────────────────┐ ┌─────────────────────────┐ │
│ │ │ File Storage │ │ SQLite Database │ │
│ │ │ │ │ │ │
│ │ │ Markdown Files │ │ - Note Metadata │ │
│ │ │ (Pure Content) │ │ - Sessions │ │
│ │ │ │ │ - Tokens │ │
│ │ │ data/notes/ │ │ - Auth State │ │
│ │ │ YYYY/MM/ │ │ │ │
│ │ │ slug.md │ │ data/starpunk.db │ │
│ │ └──────────────────┘ └─────────────────────────┘ │
│ └─────────────────────────────────────────────────────────┘
└─────────────────────────────────────────────────────────────┘
│ HTTPS
┌─────────────────────────────────────────────────────────────┐
│ External Services │
│ - IndieLogin.com (Authentication) │
│ - User's Website (Identity Verification) │
│ - Micropub Clients (Publishing) │
└─────────────────────────────────────────────────────────────┘
```
## Core Principles
### 1. Radical Simplicity
- Total dependencies: 6 direct packages
- No build tools, no npm, no bundlers
- Server-side rendering eliminates frontend complexity
- Single file SQLite database
- Zero configuration frameworks
### 2. Hybrid Data Architecture
**Files for Content**: Markdown notes stored as plain text files
- Maximum portability
- Human-readable
- Direct user access
- Easy backup (copy, rsync, git)
**Database for Metadata**: SQLite stores structured data
- Fast queries and indexes
- Referential integrity
- Efficient filtering and sorting
- Transaction support
**Sync Strategy**: Files are authoritative for content; database is authoritative for metadata. Both must stay in sync.
### 3. Standards-First Design
- IndieWeb: Microformats2, IndieAuth, Micropub
- Web: HTML5, RSS 2.0, HTTP standards
- Security: OAuth 2.0, HTTPS, secure cookies
- Data: CommonMark markdown
### 4. API-First Architecture
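All functionality is meant to be exposed via an API that the web interface consumes (as of v0.9.5 the admin interface still posts HTML forms, and the JSON Notes API is deferred). This enables: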
- Micropub client support
- Future client applications
- Scriptable automation
- Clean separation of concerns
### 5. Progressive Enhancement
- Core functionality works without JavaScript
- JavaScript adds optional enhancements (markdown preview)
- Server-side rendering for fast initial loads
- Mobile-responsive from the start
## Component Descriptions
### Web Layer
#### Public Interface
**Purpose**: Display published notes to the world
**Technology**: Server-side rendered HTML (Jinja2)
**Status**: ✅ IMPLEMENTED (v0.5.0)
**Routes** (Implemented):
- `GET /` - Homepage with recent published notes
- `GET /note/<slug>` - Individual note permalink
- `GET /feed.xml` - RSS 2.0 feed (v0.6.0)
- `GET /health` - Health check endpoint (v0.6.0)
**Features**:
- Microformats2 markup (h-entry, h-card, h-feed) - ⚠️ Not validated
- Reverse chronological note list
- Clean, minimal responsive CSS
- Mobile-responsive
- No JavaScript required
#### Admin Interface
**Purpose**: Manage notes (create, edit, publish)
**Technology**: Server-side rendered HTML (Jinja2)
**Status**: ✅ IMPLEMENTED (v0.5.2)
**Routes** (Implemented):
- `GET /auth/login` - Login form (v0.9.2: moved from /admin/login)
- `POST /auth/login` - Initiate IndieLogin OAuth flow
- `GET /auth/callback` - Handle IndieLogin callback
- `POST /auth/logout` - Logout and destroy session
- `GET /admin` - Dashboard (list of all notes, published + drafts)
- `GET /admin/new` - Create note form
- `POST /admin/new` - Create note handler
- `GET /admin/edit/<slug>` - Edit note form
- `POST /admin/edit/<slug>` - Update note handler
- `POST /admin/delete/<slug>` - Delete note handler
**Development Routes** (DEV_MODE only):
- `GET /dev/login` - Development authentication bypass (v0.5.0)
**Features**:
- Markdown editor (textarea)
- No real-time preview (deferred to V2)
- Publish/draft toggle
- Protected by session authentication
- Flash messages for feedback
- Note: Authentication routes moved from `/admin/*` to `/auth/*` in v0.9.2
### API Layer
#### Notes API
**Purpose**: RESTful CRUD operations for notes
**Authentication**: Session-based (admin interface)
**Status**: ❌ NOT IMPLEMENTED (Optional for V1, deferred to V2)
**Planned Routes** (Not Implemented):
```
GET /api/notes List published notes (JSON)
POST /api/notes Create new note (JSON)
GET /api/notes/<slug> Get single note (JSON)
PUT /api/notes/<slug> Update note (JSON)
DELETE /api/notes/<slug> Delete note (JSON)
```
**Current Workaround**: Admin interface uses HTML forms (POST), not JSON API
**Note**: Not required for V1; the admin interface is fully functional without a REST API
#### Micropub Endpoint
**Purpose**: Accept posts from external Micropub clients (Quill, Indigenous, etc.)
**Authentication**: IndieAuth bearer tokens
**Status**: ❌ NOT IMPLEMENTED (Critical blocker for V1)
**Planned Routes** (Not Implemented):
```
POST /api/micropub Create note (h-entry)
GET /api/micropub?q=config Query configuration
GET /api/micropub?q=source Query note source by URL
```
**Planned Content Types**:
- application/json
- application/x-www-form-urlencoded
**Target Compliance**: Micropub specification
**Current Status**:
- Token model exists in database
- No endpoint implementation
- No token validation logic
- Will require IndieAuth token endpoint or external token service
#### RSS Feed
**Purpose**: Syndicate published notes
**Technology**: feedgen library
**Status**: ✅ IMPLEMENTED (v0.6.0)
**Route**: `GET /feed.xml`
**Format**: Valid RSS 2.0 XML
**Caching**: 5 minutes server-side (configurable via FEED_CACHE_SECONDS)
**Features**:
- Limit to 50 most recent published notes (configurable via FEED_MAX_ITEMS)
- RFC-822 date formatting (pubDate)
- CDATA-wrapped HTML content for feed readers
- Proper GUID for each item (note permalink)
- Auto-discovery link in HTML templates (`<link rel="alternate">`)
- Cache-Control headers for client caching
- ETag support for conditional requests
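The date formatting and conditional-request features listed above could be sketched with the standard library alone. The helper names `rfc822_date`, `feed_etag`, and `is_not_modified` are illustrative assumptions, not StarPunk's actual functions, and the SHA-256 based ETag is just one plausible scheme.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Optional


def rfc822_date(dt: datetime) -> str:
    """Format a timezone-aware datetime for an RSS <pubDate> element."""
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)


def feed_etag(feed_xml: bytes) -> str:
    """Derive a quoted ETag from the feed body (hypothetical scheme)."""
    return '"' + hashlib.sha256(feed_xml).hexdigest() + '"'


def is_not_modified(if_none_match: Optional[str], etag: str) -> bool:
    """Return True when the client's cached copy is current (respond 304)."""
    if not if_none_match:
        return False
    candidates = [tag.strip() for tag in if_none_match.split(",")]
    return "*" in candidates or etag in candidates
```

A route handler would compare the request's `If-None-Match` header against `feed_etag(...)` and return an empty 304 response on a match, otherwise the full feed with the `ETag` and `Cache-Control` headers set.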
### Business Logic Layer
#### Note Management
**Operations**:
1. **Create**: Generate slug → write file → insert database record
2. **Read**: Query database for path → read file → render markdown
3. **Update**: Write file atomically → update database timestamp
4. **Delete**: Mark deleted in database → optionally archive file
**Key Components**:
- Slug generation (URL-safe, unique)
- Markdown rendering (markdown library)
- Content hashing (integrity verification)
- Atomic file operations (prevent corruption)
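As a rough illustration of URL-safe, unique slug generation, the sketch below reduces text to lowercase ASCII and appends a numeric suffix on collision. The function names, the 60-character limit, and the `-2`, `-3` suffix scheme are assumptions made for this example and may differ from StarPunk's implementation.

```python
import re
import unicodedata


def slugify(text: str, max_length: int = 60) -> str:
    """Reduce arbitrary text to lowercase ASCII letters, digits, and hyphens."""
    ascii_text = (
        unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    )
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
    return slug[:max_length].rstrip("-")


def unique_slug(base: str, existing: set) -> str:
    """Append -2, -3, ... until the slug no longer collides with an existing one."""
    if base not in existing:
        return base
    counter = 2
    while f"{base}-{counter}" in existing:
        counter += 1
    return f"{base}-{counter}"
```

`slugify` returns an empty string when the input has no usable characters; a caller would then fall back to a timestamp-based slug.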
#### File/Database Sync
**Strategy**: Write files first, then database
**Rollback**: If database operation fails, delete/restore file
**Verification**: Content hash detects external modifications
**Integrity Check**: Optional scan for orphaned files/records
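The verification and integrity-check ideas above might look like the following sketch. It compares SHA-256 hashes recorded in the database with the files on disk. The `scan_notes` function and the shape of its `recorded` argument (relative path mapped to stored hash) are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def content_hash(content: str) -> str:
    """SHA-256 hex digest of the note's UTF-8 content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def scan_notes(notes_dir: Path, recorded: dict) -> dict:
    """Compare database records (relative path -> stored hash) with files on disk."""
    on_disk = {
        path.relative_to(notes_dir).as_posix(): path
        for path in notes_dir.rglob("*.md")
    }
    report = {"modified": [], "missing": [], "orphaned": []}
    for rel_path, stored_hash in recorded.items():
        path = on_disk.get(rel_path)
        if path is None:
            report["missing"].append(rel_path)  # record without a file
        elif content_hash(path.read_text(encoding="utf-8")) != stored_hash:
            report["modified"].append(rel_path)  # file edited outside the app
    # Files that exist on disk but have no database record
    report["orphaned"] = sorted(set(on_disk) - set(recorded))
    return report
```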
#### Authentication
**Admin Auth**: IndieLogin.com OAuth 2.0 flow with PKCE
**Status**: ✅ IMPLEMENTED (v0.8.0, refined through v0.9.5)
**Flow**:
1. User enters website URL (their "me" identity)
2. Generate PKCE code_verifier and code_challenge (SHA-256)
3. Store state token + code_verifier in database (5 min expiry)
4. Redirect to indielogin.com/authorize with:
   - client_id (SITE_URL with trailing slash)
   - redirect_uri (SITE_URL/auth/callback)
   - state (CSRF protection)
   - code_challenge + code_challenge_method (S256)
5. IndieLogin.com verifies identity via RelMeAuth or email
6. Callback to /auth/callback with code + state
7. Verify state token (CSRF check)
8. POST code + code_verifier to indielogin.com/authorize (NOT /token)
9. Receive verified "me" URL
10. Verify "me" matches ADMIN_ME config
11. Create session with SHA-256 hashed token
12. Store in HttpOnly, Secure, SameSite=Lax cookie named "starpunk_session"
**Security Features** (v0.8.0-v0.9.5):
- PKCE prevents authorization code interception
- State tokens prevent CSRF attacks
- Session token hashing (SHA-256) before database storage
- Single-use state tokens with short expiry
- Automatic trailing slash normalization on SITE_URL (v0.9.1)
- Uses authorization endpoint (not token endpoint) per IndieAuth spec (v0.9.4)
- Session cookie renamed to avoid Flask session collision (v0.5.1)
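Step 4 of the flow, together with the trailing-slash normalization, could be sketched as below. The function names are hypothetical, and only the query parameters named in this document are included.

```python
from urllib.parse import urlencode


def normalize_site_url(site_url: str) -> str:
    """Ensure the client_id ends with a trailing slash."""
    return site_url if site_url.endswith("/") else site_url + "/"


def build_authorize_url(site_url: str, me: str, state: str, code_challenge: str) -> str:
    """Assemble the IndieLogin authorization URL from the flow's parameters."""
    client_id = normalize_site_url(site_url)
    params = {
        "me": me,
        "client_id": client_id,
        "redirect_uri": client_id + "auth/callback",
        "state": state,
        "code_challenge": code_challenge,
        "code_challenge_method": "S256",
    }
    # urlencode percent-encodes each value so URLs can be nested safely
    return "https://indielogin.com/authorize?" + urlencode(params)
```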
**Development Mode** (v0.5.0):
- `/dev/login` bypasses IndieLogin for local development
- Requires DEV_MODE=true and DEV_ADMIN_ME configuration
- Shows warning in logs
**Micropub Auth**: IndieAuth token verification
**Status**: ❌ NOT IMPLEMENTED (Required for Micropub)
**Planned Implementation**:
- Client obtains token via external IndieAuth token endpoint
- Token sent as Bearer in Authorization header
- Verify token exists in database and not expired
- Check scope permissions (create, update, delete)
- OR: Delegate token verification to external IndieAuth server
### Data Layer
#### File Storage
**Location**: `data/notes/`
**Structure**: `YYYY/MM/slug.md`
**Format**: Pure markdown, no frontmatter
**Operations**:
- Atomic writes (temp file → rename)
- Directory creation (makedirs)
- Content reading (UTF-8 encoding)
**Example**:
```
data/notes/
├── 2024/
│ ├── 11/
│ │ ├── my-first-note.md
│ │ └── another-note.md
│ └── 12/
│ └── december-note.md
```
#### Database Storage
**Location**: `data/starpunk.db`
**Engine**: SQLite3
**Status**: ✅ IMPLEMENTED with automatic migration system (v0.9.0)
**Tables**:
- `notes` - Note metadata (slug, file_path, published, created_at, updated_at, deleted_at, content_hash)
- `sessions` - Admin auth sessions (session_token_hash, me, created_at, expires_at, last_used_at, user_agent, ip_address)
- `tokens` - Micropub bearer tokens (token, me, client_id, scope, created_at, expires_at) - **Table exists but unused**
- `auth_state` - CSRF state tokens (state, created_at, expires_at, redirect_uri, code_verifier)
- `schema_migrations` - Migration tracking (migration_name, applied_at) - **Added v0.9.0**
**Indexes**:
- `notes.created_at` (DESC) - Fast chronological queries
- `notes.published` - Fast published note filtering
- `notes.slug` (UNIQUE) - Fast lookup by slug, uniqueness enforcement
- `notes.deleted_at` - Fast soft-delete filtering
- `sessions.session_token_hash` (UNIQUE) - Fast auth checks
- `sessions.me` - Fast user lookups
- `auth_state.state` (UNIQUE) - Fast state token validation
**Migration System** (v0.9.0):
- Automatic schema updates on application startup
- Migration files in `migrations/` directory (SQL format)
- Executed in alphanumeric order (001, 002, 003...)
- Fresh database detection (marks migrations as applied without execution)
- Legacy database detection (applies pending migrations automatically)
- Migration tracking in schema_migrations table
- Fail-safe: Application refuses to start if migrations fail
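A simplified version of such a migration runner is sketched below, assuming `.sql` files in a directory and the `schema_migrations` columns listed above. It leaves out the fresh-database detection step, and `run_migrations` is a hypothetical name rather than StarPunk's actual entry point.

```python
import sqlite3
from pathlib import Path


def run_migrations(conn: sqlite3.Connection, migrations_dir: Path) -> list:
    """Apply pending .sql migrations in filename order; return the names applied."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations ("
        "migration_name TEXT PRIMARY KEY, "
        "applied_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    applied = {
        row[0] for row in conn.execute("SELECT migration_name FROM schema_migrations")
    }
    newly_applied = []
    for path in sorted(migrations_dir.glob("*.sql"), key=lambda p: p.name):
        if path.name in applied:
            continue
        # Any sqlite3.Error propagates, so the caller can refuse to start
        conn.executescript(path.read_text(encoding="utf-8"))
        conn.execute(
            "INSERT INTO schema_migrations (migration_name) VALUES (?)", (path.name,)
        )
        conn.commit()
        newly_applied.append(path.name)
    return newly_applied
```

In this sketch the migration script and its tracking row are not written in a single transaction, so a production runner would need to handle a crash between the two steps.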
**Queries**: Direct SQL using Python sqlite3 module (no ORM)
## Data Flow Examples
### Creating a Note (via Admin Interface)
```
1. User fills out form at /admin/new
2. POST to /admin/new with markdown content (HTML form)
3. Verify user session (check session cookie)
4. Generate unique slug from content or timestamp
5. Determine file path: data/notes/2024/11/slug.md
6. Create directories if needed (makedirs)
7. Write markdown content to file (atomic write)
8. Calculate SHA-256 hash of content
9. Begin database transaction
10. Insert record into notes table:
- slug
- file_path
- published (from form)
- created_at (now)
- updated_at (now)
- content_hash
11. If database insert fails:
- Delete file
- Return error to user
12. If database insert succeeds:
- Commit transaction
- Return success with note URL
13. Redirect user to /admin (dashboard)
```
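The file-first write with rollback described in steps 5 to 12 might be sketched as follows. The `create_note` signature and the assumption that `file_path` is stored relative to the data directory are illustrative and not confirmed by the source.

```python
import hashlib
import os
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def create_note(conn: sqlite3.Connection, data_dir: Path, slug: str,
                content: str, published: bool) -> Path:
    """Write the markdown file first, then insert metadata; undo the file on failure."""
    now = datetime.now(timezone.utc)
    rel_path = Path("notes") / f"{now:%Y}" / f"{now:%m}" / f"{slug}.md"
    target = data_dir / rel_path
    if target.exists():
        # Never overwrite (and later delete) a file that belongs to another note
        raise FileExistsError(str(target))
    target.parent.mkdir(parents=True, exist_ok=True)

    tmp = target.with_suffix(".md.tmp")
    tmp.write_text(content, encoding="utf-8")
    os.replace(tmp, target)  # atomic rename into place

    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    timestamp = now.isoformat()
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "INSERT INTO notes (slug, file_path, published, created_at, "
                "updated_at, content_hash) VALUES (?, ?, ?, ?, ?, ?)",
                (slug, rel_path.as_posix(), int(published), timestamp, timestamp, digest),
            )
    except sqlite3.Error:
        target.unlink(missing_ok=True)  # roll back the file write
        raise
    return target
```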
### Reading a Note (via Public Interface)
```
1. User visits /note/my-first-note
2. Extract slug from URL
3. Query database:
SELECT file_path, created_at, published
FROM notes
WHERE slug = 'my-first-note' AND published = 1
4. If not found → 404 error
5. Read markdown content from file:
- Open data/notes/2024/11/my-first-note.md
- Read UTF-8 content
6. Render markdown to HTML (markdown.markdown())
7. Render Jinja2 template with:
- content_html (rendered HTML)
- created_at (timestamp)
- slug (for permalink)
8. Return HTML with microformats markup
```
### Publishing via Micropub
```
1. Micropub client POSTs to /api/micropub
Headers: Authorization: Bearer {token}
Body: {"type": ["h-entry"], "properties": {"content": ["..."]}}
2. Extract bearer token from Authorization header
3. Query database:
SELECT me, scope FROM tokens
WHERE token = {token} AND expires_at > now()
4. If token invalid → 401 Unauthorized
5. Parse Micropub JSON payload
6. Extract content from properties.content[0]
7. Create note (same flow as admin interface):
- Generate slug
- Write file
- Insert database record
8. If successful:
- Return 201 Created
- Set Location header to note URL
9. Client receives note URL, displays success
```
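Since the Micropub endpoint is not implemented yet, the following is only a sketch of how steps 2, 5, and 6 might be handled. `MicropubError`, `extract_bearer_token`, and `parse_h_entry` are hypothetical names, and only plain-string content is handled (not the `{"html": ...}` object form that Micropub also allows).

```python
import json
from typing import Optional


class MicropubError(ValueError):
    """Malformed request (hypothetical; a handler would map this to HTTP 400)."""


def extract_bearer_token(authorization_header: Optional[str]) -> Optional[str]:
    """Return the token from 'Authorization: Bearer <token>', or None."""
    if not authorization_header:
        return None
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return None
    return token.strip()


def parse_h_entry(body: str) -> str:
    """Extract properties.content[0] from a JSON h-entry payload."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError as exc:
        raise MicropubError("invalid JSON") from exc
    if not isinstance(payload, dict) or payload.get("type") != ["h-entry"]:
        raise MicropubError("only h-entry is supported")
    properties = payload.get("properties")
    if not isinstance(properties, dict):
        raise MicropubError("missing properties")
    content = properties.get("content")
    if not isinstance(content, list) or not content or not isinstance(content[0], str):
        raise MicropubError("properties.content[0] must be a string")
    return content[0]
```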
### IndieLogin Authentication Flow (v0.9.5 with PKCE)
```
1. User visits /auth/login
2. User enters their website: https://alice.example.com
3. POST to /auth/login with "me" parameter
4. Validate URL format (must be https://)
5. Generate PKCE code_verifier (43 random bytes, base64-url encoded)
6. Generate code_challenge from code_verifier (SHA256 hash, base64-url encoded)
7. Generate random state token (CSRF protection)
8. Store state + code_verifier in auth_state table (5-minute expiry)
9. Normalize client_id by adding trailing slash if missing (v0.9.1)
10. Build IndieLogin authorization URL:
https://indielogin.com/authorize?
me=https://alice.example.com
client_id=https://starpunk.example.com/ (note trailing slash)
redirect_uri=https://starpunk.example.com/auth/callback
state={random_state}
code_challenge={code_challenge}
code_challenge_method=S256
11. Redirect user to IndieLogin
12. IndieLogin verifies user's identity:
- Checks rel="me" links on alice.example.com
- Or sends email verification
- User authenticates via chosen method
13. IndieLogin redirects back:
/auth/callback?code={auth_code}&state={state}
14. Verify state matches stored value (CSRF check, single-use)
15. Retrieve code_verifier from database using state
16. Delete state token (single-use enforcement)
17. Exchange code for verified identity (v0.9.4: uses /authorize, not /token):
POST https://indielogin.com/authorize
code={auth_code}
client_id=https://starpunk.example.com/
redirect_uri=https://starpunk.example.com/auth/callback
code_verifier={code_verifier}
18. IndieLogin returns: {"me": "https://alice.example.com"}
19. Verify me == ADMIN_ME (config)
20. If match:
- Generate session token (secrets.token_urlsafe(32))
- Hash token with SHA-256
- Insert into sessions table with hash (not plaintext)
- Set cookie "starpunk_session" (HttpOnly, Secure, SameSite=Lax)
- Redirect to /admin
21. If no match:
- Return "Unauthorized" error
- Log attempt with WARNING level
```
**Key Security Features**:
- PKCE prevents code interception attacks (v0.8.0)
- State tokens prevent CSRF (v0.4.0)
- Session token hashing prevents token exposure if database compromised (v0.4.0)
- Single-use state tokens (deleted after verification)
- Short-lived state tokens (5 minutes)
- Trailing slash normalization fixes client_id validation (v0.9.1)
- Correct endpoint usage (/authorize not /token) per IndieAuth spec (v0.9.4)
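The PKCE values from steps 5 and 6 can be derived with the standard library, as in the sketch below. It assumes a verifier built from 32 random bytes (which encodes to 43 URL-safe characters, the minimum RFC 7636 allows). The actual byte count StarPunk uses may differ, and the function names are illustrative.

```python
import base64
import hashlib
import secrets


def challenge_for(code_verifier: str) -> str:
    """S256 challenge: base64url(SHA-256(verifier)) without '=' padding."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def generate_pkce_pair() -> tuple:
    """Return (code_verifier, code_challenge) for one authorization attempt."""
    code_verifier = secrets.token_urlsafe(32)  # assumption: 32 bytes -> 43 chars
    return code_verifier, challenge_for(code_verifier)
```

The verifier stays on the server (stored alongside the state token) and only the challenge is sent in the redirect; the verifier is revealed later, in the code exchange at step 17.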
## Security Architecture
### Authentication Security
#### Session Management
- **Token Generation**: `secrets.token_urlsafe(32)` (256-bit entropy)
- **Storage**: SHA-256 hash stored in database (plaintext token NEVER stored)
- **Cookie Name**: `starpunk_session` (v0.5.1: renamed to avoid Flask session collision)
- **Cookies**: HttpOnly, Secure, SameSite=Lax
- **Expiry**: 30 days, extendable on use
- **Validation**: Every protected route checks session via `@require_auth` decorator
- **Metadata**: Tracks user_agent and ip_address for audit purposes
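A minimal sketch of the token generation and hashed storage described above: only the hash is written to the `sessions` table, and the plaintext goes into the cookie. The helper names are assumptions for illustration.

```python
import hashlib
import hmac
import secrets


def hash_token(token: str) -> str:
    """SHA-256 hex digest stored in place of the plaintext token."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def new_session_token() -> tuple:
    """Return (plaintext for the cookie, hash for the database)."""
    token = secrets.token_urlsafe(32)  # 32 random bytes = 256 bits of entropy
    return token, hash_token(token)


def token_matches(presented: str, stored_hash: str) -> bool:
    """Constant-time comparison of a presented cookie value with the stored hash."""
    return hmac.compare_digest(hash_token(presented), stored_hash)
```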
#### CSRF Protection
- **State Tokens**: Random tokens for OAuth flows
- **Expiry**: 5 minutes (short-lived)
- **Single-Use**: Deleted after verification
- **SameSite**: Cookies set to Lax mode
#### Access Control
- **Admin Routes**: Require valid session
- **Micropub Routes**: Require valid bearer token
- **Public Routes**: No authentication needed
- **Identity Verification**: Only ADMIN_ME can authenticate
### Input Validation
#### User Input
- **Markdown**: Sanitize to prevent XSS in rendered HTML
- **URLs**: Validate format and scheme (https://)
- **Slugs**: Alphanumeric + hyphens only
- **JSON**: Parse and validate structure
- **File Paths**: Prevent directory traversal (validate against base path)
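The slug and URL rules above could be checked along these lines. The lowercase-only slug pattern is an assumption (the source says only "alphanumeric + hyphens"), and both function names are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Assumption: lowercase letters and digits, single hyphens between groups
SLUG_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_slug(slug: str) -> bool:
    """Accept only alphanumeric groups separated by single hyphens."""
    return SLUG_RE.fullmatch(slug) is not None


def is_valid_me_url(url: str) -> bool:
    """Require an https:// URL with a hostname."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    return parts.scheme == "https" and bool(parts.hostname)
```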
#### Micropub Payloads
- **Content-Type**: Verify matches expected format
- **Required Fields**: Validate h-entry structure
- **Size Limits**: Prevent DoS via large payloads
- **Scope Verification**: Check token has required permissions
### Database Security
#### SQL Injection Prevention
- **Parameterized Queries**: Always use parameter substitution
- **No String Interpolation**: Never build SQL with f-strings
- **Input Sanitization**: Validate before database operations
Example:
```python
# GOOD
cursor.execute("SELECT * FROM notes WHERE slug = ?", (slug,))
# BAD (SQL injection vulnerable)
cursor.execute(f"SELECT * FROM notes WHERE slug = '{slug}'")
```
#### Data Integrity
- **Transactions**: Use for multi-step operations
- **Constraints**: UNIQUE on slugs, file_paths
- **Foreign Keys**: Enforce relationships (if applicable)
- **Content Hashing**: Detect unauthorized file modifications
### Network Security
#### HTTPS
- **Production Requirement**: TLS 1.2+ required
- **Reverse Proxy**: Nginx/Caddy handles SSL termination
- **Certificate Validation**: Verify SSL certs on outbound requests
- **HSTS**: Set Strict-Transport-Security header
#### Security Headers
```
# Set on all responses
Content-Security-Policy: default-src 'self'
X-Frame-Options: DENY
X-Content-Type-Options: nosniff
Referrer-Policy: strict-origin-when-cross-origin
```
#### Rate Limiting
- **Implementation**: Reverse proxy (nginx/Caddy)
- **Admin Routes**: Stricter limits
- **API Routes**: Moderate limits
- **Public Routes**: Permissive limits
### File System Security
#### Atomic Operations
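```python
# Write to a temp file in the same directory, then atomically replace the target
import os

temp_path = f"{target_path}.tmp"
with open(temp_path, "w", encoding="utf-8") as f:
    f.write(content)
    f.flush()
    os.fsync(f.fileno())  # Ensure the bytes reach disk before the rename
os.replace(temp_path, target_path)  # Atomic on POSIX; also overwrites on Windows
```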
#### Path Validation
```python
# Prevent directory traversal
import os

base_path = os.path.realpath(DATA_PATH)
requested_path = os.path.realpath(os.path.join(base_path, user_input))
# commonpath avoids the prefix bug where "/data-evil" passes startswith("/data")
if os.path.commonpath([base_path, requested_path]) != base_path:
    raise ValueError("Path traversal detected")
```
#### File Permissions
- **Data Directory**: 700 (owner only)
- **Database File**: 600 (owner read/write)
- **Note Files**: 600 (owner read/write)
- **Application User**: Dedicated non-root user
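The permission modes above could be applied with a small helper like the one below. This is a sketch for POSIX systems, and `harden_permissions` is a hypothetical name.

```python
import os
from pathlib import Path


def harden_permissions(data_dir: Path) -> None:
    """Set directories to 700 and files to 600 beneath the data directory."""
    os.chmod(data_dir, 0o700)
    for path in data_dir.rglob("*"):
        # Directories need the execute bit to remain traversable by the owner
        os.chmod(path, 0o700 if path.is_dir() else 0o600)
```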
## Performance Considerations
### Response Time Targets
- **API Responses**: < 100ms (database + file read)
- **Page Renders**: < 200ms (template rendering)
- **RSS Feed**: < 300ms (query + file reads + XML generation)
### Optimization Strategies
#### Database
- **Indexes**: On frequently queried columns (created_at, slug, published)
- **Connection Pooling**: Single connection (single-user, no contention)
- **Query Optimization**: SELECT only needed columns
- **Prepared Statements**: Reuse compiled queries
#### File System
- **Caching**: Consider caching rendered HTML in memory (optional)
- **Directory Structure**: Year/Month prevents large directories
- **Atomic Reads**: Fast sequential reads, no locking needed
#### HTTP
- **Static Assets**: Cache headers on CSS/JS (1 year)
- **RSS Feed**: Cache for 5 minutes (Cache-Control)
- **Compression**: gzip/brotli via reverse proxy
- **ETags**: For conditional requests
#### Rendering
- **Template Compilation**: Jinja2 compiles templates automatically
- **Minimal Templating**: Simple templates render fast
- **Server-Side**: No client-side rendering overhead
### Resource Usage
#### Memory
- **Flask Process**: ~50MB base
- **SQLite**: ~10MB typical working set
- **Total**: < 100MB under normal load
#### Disk
- **Application**: ~5MB (code + dependencies)
- **Database**: ~1MB per 1000 notes
- **Notes**: ~5KB average per markdown file
- **Total**: Scales linearly with note count
#### CPU
- **Idle**: Near zero
- **Request Handling**: Minimal (no heavy processing)
- **Markdown Rendering**: Fast (pure Python)
- **Database Queries**: Indexed, sub-millisecond
## Deployment Architecture
**Current State**: ✅ IMPLEMENTED (v0.6.0 - v0.9.5)
**Technology**: Container-based with Gunicorn WSGI server
**CI/CD**: Gitea Actions automated builds (v0.9.5)
### Container Deployment (v0.6.0)
**Containerfile**: Multi-stage build using Python 3.11-slim base
- Stage 1: Build dependencies with uv package manager
- Stage 2: Production image with non-root user (starpunk:1000)
- Final size: ~174MB
**Features**:
- Health check endpoint: `/health` (validates database and filesystem)
- Gunicorn WSGI server with 4 workers (configurable)
- Log rotation (10MB max, 3 files)
- Resource limits (memory, CPU)
- SELinux compatibility (volume mount flags)
- Automatic database initialization on first run
**Container Orchestration**:
- Podman-compatible (rootless, userns=keep-id)
- Docker Compose compatible
- Volume mounts for data persistence (`./data:/app/data`)
- Port mapping (8080:8000)
- Environment variables for configuration
**CI/CD Pipeline** (v0.9.5):
- Gitea Actions workflow (.gitea/workflows/build-container.yml)
- Automated builds on push to main branch
- Manual trigger support
- Container registry push
- Docker and git dependencies installed
- Node.js support for GitHub Actions compatibility
### Single-Server Deployment
```
┌─────────────────────────────────────────────────┐
│ Internet │
└────────────────┬────────────────────────────────┘
│ Port 443 (HTTPS)
┌─────────────────────────────────────────────────┐
│ Nginx/Caddy (Reverse Proxy) │
│ - SSL/TLS termination │
│ - Static file serving │
│ - Rate limiting │
│ - Compression │
└────────────────┬────────────────────────────────┘
│ Port 8000 (HTTP)
┌─────────────────────────────────────────────────┐
│ Gunicorn (WSGI Server) │
│ - 4 worker processes │
│ - Process management │
│ - Load balancing (round-robin) │
└────────────────┬────────────────────────────────┘
│ WSGI
┌─────────────────────────────────────────────────┐
│ Flask Application │
│ - Request handling │
│ - Business logic │
│ - Template rendering │
└────────────────┬────────────────────────────────┘
┌────────────────────────────┬────────────────────┐
│ File System │ SQLite Database │
│ data/notes/ │ data/starpunk.db │
│ YYYY/MM/slug.md │ │
└────────────────────────────┴────────────────────┘
```
### Process Management (systemd)
```ini
[Unit]
Description=StarPunk CMS
After=network.target
[Service]
Type=notify
User=starpunk
WorkingDirectory=/opt/starpunk
Environment="PATH=/opt/starpunk/venv/bin"
ExecStart=/opt/starpunk/venv/bin/gunicorn -w 4 -b 127.0.0.1:8000 app:app
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
```
### Backup Strategy
#### Automated Daily Backup
```bash
#!/bin/bash
# backup.sh - Run daily via cron
set -euo pipefail
DATE=$(date +%Y%m%d)
BACKUP_DIR="/backup/starpunk"
mkdir -p "$BACKUP_DIR/$DATE"
# Backup data directory (notes + database)
rsync -av /opt/starpunk/data/ "$BACKUP_DIR/$DATE/"
# Keep last 30 days (-mindepth 1 so the backup root itself is never removed)
find "$BACKUP_DIR" -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} \;
```
#### Manual Backup
```bash
# Simple copy
cp -r /opt/starpunk/data /backup/starpunk-$(date +%Y%m%d)
# Or with compression
tar -czf starpunk-backup-$(date +%Y%m%d).tar.gz /opt/starpunk/data
```
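Copying `starpunk.db` while the application is writing to it can capture a half-written file. One option, sketched here as an assumption rather than part of the current backup scripts, is to take the database snapshot through SQLite's online backup API, which Python exposes as `sqlite3.Connection.backup`.

```python
import sqlite3


def backup_database(src_path: str, dest_path: str) -> None:
    """Copy a live SQLite database to dest_path as a consistent snapshot."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        src.backup(dest)  # copies all pages while respecting SQLite locking
    finally:
        dest.close()
        src.close()
```

The markdown files under `data/notes/` can still be copied with `rsync` or `cp`, since each note is written atomically.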
### Restore Process
1. Stop application: `sudo systemctl stop starpunk`
2. Restore data directory: `rsync -av /backup/starpunk/20241118/ /opt/starpunk/data/`
3. Fix permissions: `chown -R starpunk:starpunk /opt/starpunk/data`
4. Start application: `sudo systemctl start starpunk`
5. Verify: Visit site, check recent notes
## Testing Strategy
### Test Pyramid
```
         ┌─────────────┐
        /               \
       /  Manual Tests   \          Validation, Real Services
      /───────────────────\
     /                     \
    /   Integration Tests   \       API Flows, Database + Files
   /─────────────────────────\
  /                           \
 /         Unit Tests          \    Functions, Logic, Parsing
/───────────────────────────────\
```
### Unit Tests (pytest)
**Coverage**: Business logic, utilities, models
**Examples**:
- Slug generation and uniqueness
- Markdown rendering with various inputs
- Content hash calculation
- File path validation
- Token generation and verification
- Date formatting for RSS
- Micropub payload parsing
### Integration Tests
**Coverage**: Component interactions, full flows
**Examples**:
- Create note: file write + database insert
- Read note: database query + file read
- IndieLogin flow with mocked API
- Micropub creation with token validation
- RSS feed generation with multiple notes
- Session authentication on protected routes
### End-to-End Tests
**Coverage**: Full user workflows
**Examples**:
- Admin login via IndieLogin (mocked)
- Create note via web interface
- Publish note via Micropub client (mocked)
- View note on public site
- Verify RSS feed includes note
### Validation Tests
**Coverage**: Standards compliance
**Tools**:
- W3C HTML Validator (validate templates)
- W3C Feed Validator (validate RSS output)
- IndieWebify.me (verify microformats)
- Micropub.rocks (test Micropub compliance)
### Manual Tests
**Coverage**: Real-world usage
**Examples**:
- Authenticate with real indielogin.com
- Publish from actual Micropub client (Quill, Indigenous)
- Subscribe to feed in actual RSS reader
- Browser compatibility (Chrome, Firefox, Safari, mobile)
- Accessibility with screen reader
## Monitoring and Observability
### Logging Strategy
#### Application Logs
```python
# Structured logging
import logging

logger = logging.getLogger(__name__)

# Info: Normal operations
logger.info("Note created", extra={
    "slug": slug,
    "published": published,
    "user": session.me
})

# Warning: Recoverable issues
logger.warning("State token expired", extra={
    "state": state,
    "age": age_seconds
})

# Error: Failed operations
logger.error("File write failed", extra={
    "path": file_path,
    "error": str(e)
})
```
#### Log Levels
- **DEBUG**: Development only (verbose)
- **INFO**: Normal operations (note creation, auth success)
- **WARNING**: Unusual but handled (expired tokens, invalid input)
- **ERROR**: Failed operations (file I/O errors, database errors)
- **CRITICAL**: System failures (database unreachable)
#### Log Destinations
- **Development**: Console (stdout)
- **Production**: File rotation (logrotate) + optional syslog
### Metrics (Optional for V2)
**Simple Metrics** (if desired):
- Note count (query database)
- Request count (nginx logs)
- Error rate (grep application logs)
- Response times (nginx logs)
**Advanced Metrics** (V2):
- Prometheus exporter
- Grafana dashboard
- Alert on error rate spike
### Health Checks
```python
import os

@app.route('/health')
def health_check():
    """Simple health check for monitoring"""
    try:
        # Check database
        db.execute("SELECT 1").fetchone()
        # Check file system (fail if the data directory is missing)
        if not os.path.isdir(DATA_PATH):
            raise RuntimeError("data directory missing")
        return {"status": "ok"}, 200
    except Exception as e:
        return {"status": "error", "detail": str(e)}, 500
```
## Migration and Evolution
### V1 to V2 Migration
#### Database Schema Changes
```sql
-- Add new column with default
ALTER TABLE notes ADD COLUMN tags TEXT DEFAULT '';
-- Create new table
CREATE TABLE tags (
id INTEGER PRIMARY KEY,
name TEXT UNIQUE NOT NULL
);
-- Migration script updates existing notes
```
#### File Format Evolution
**V1**: Pure markdown
**V2** (if needed): Add optional frontmatter
```markdown
---
tags: indieweb, cms
---
Note content here
```
**Backward Compatibility**: Parser checks for frontmatter, falls back to pure markdown.
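A backward-compatible parser of this kind might look like the sketch below. It handles only simple `key: value` lines rather than full YAML, and `split_frontmatter` is a hypothetical name for a feature that is itself only a V2 possibility.

```python
def split_frontmatter(text: str) -> tuple:
    """Return (metadata, body); pure-markdown notes yield ({}, text) unchanged."""
    lines = text.split("\n")
    if not lines or lines[0].strip() != "---":
        return {}, text  # V1 note: no frontmatter
    try:
        end = next(
            i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"
        )
    except StopIteration:
        return {}, text  # unterminated block: treat as pure markdown
    metadata = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    return metadata, "\n".join(lines[end + 1:])
```

One caveat: a V1 note that begins with a `---` horizontal rule and contains a second one later would be misread as frontmatter, so a real migration would need a stricter marker or a per-note format flag in the database.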
#### API Versioning
```
# V1 (current)
GET /api/notes
# V2 (future)
GET /api/v2/notes # New features
GET /api/notes # Still works, returns V1 response
```
### Data Export/Import
#### Export Formats
1. **Markdown Bundle**: Zip of all notes (already portable)
2. **JSON Export**: Notes + metadata
```json
{
  "version": "1.0",
  "exported_at": "2024-11-18T12:00:00Z",
  "notes": [
    {
      "slug": "my-note",
      "content": "Note content...",
      "created_at": "2024-11-01T12:00:00Z",
      "published": true
    }
  ]
}
```
3. **RSS Archive**: Existing feed.xml
#### Import (V2)
- From JSON export
- From WordPress XML
- From markdown directory
- From other IndieWeb CMSs
## Implementation Status (v0.9.5)
### ✅ Fully Implemented Features
1. **Note Management** (v0.3.0)
- Full CRUD operations (create, read, update, delete)
- Hybrid file+database storage with sync
- Soft and hard delete support
- Markdown rendering
- Slug generation with uniqueness
2. **Authentication** (v0.8.0)
- IndieLogin.com OAuth 2.0 with PKCE
- Session management with token hashing
- CSRF protection with state tokens
- Development mode authentication bypass
3. **Web Interface** (v0.5.2)
- Public site: homepage and note permalinks
- Admin dashboard with note management
- Login/logout flows
- Responsive design
- Microformats2 markup (h-entry, h-card, h-feed)
4. **RSS Feed** (v0.6.0)
- RSS 2.0 compliant feed generation
- Auto-discovery links
- Server-side caching
- ETag support
5. **Container Deployment** (v0.6.0)
- Multi-stage Containerfile
- Gunicorn WSGI server
- Health check endpoint
- Volume persistence
6. **CI/CD Pipeline** (v0.9.5)
- Gitea Actions workflow
- Automated container builds
- Registry push
7. **Database Migrations** (v0.9.0)
- Automatic migration system
- Fresh database detection
- Legacy database migration
- Migration tracking
8. **Development Tools**
- uv package manager for Python
- Comprehensive test suite (87% coverage)
- Black code formatting
- Flake8 linting
### ❌ Not Yet Implemented (Blocking V1)
1. **Micropub Endpoint**
- POST /api/micropub for creating notes
- GET /api/micropub?q=config
- GET /api/micropub?q=source
- Token validation
- **Status**: Critical blocker for V1 release
2. **IndieAuth Token Endpoint**
- Token issuance for Micropub clients
- **Alternative**: May use external IndieAuth server
### ⚠️ Partially Implemented
1. **Standards Validation**
- HTML5: Markup exists, not validated
- Microformats: Markup exists, not validated
- RSS: Validated and compliant
- Micropub: N/A (not implemented)
2. **REST API** (Optional)
- JSON API for notes CRUD
- **Status**: Deferred to V2 (admin interface works without it)
## Success Metrics
The architecture is successful if it enables:
1. **Fast Development**: < 1 week to implement V1 - ✅ **ACHIEVED** (~35 hours, 70% complete)
2. **Easy Deployment**: < 5 minutes to get running - ✅ **ACHIEVED** (containerized)
3. **Low Maintenance**: Runs for months without intervention - ✅ **ACHIEVED** (automated migrations)
4. **High Performance**: All responses < 300ms - ✅ **ACHIEVED**
5. **Data Ownership**: User has direct access to all content - ✅ **ACHIEVED** (file-based storage)
6. **Standards Compliance**: Passes all validators - ⚠️ **PARTIAL** (RSS yes, others pending)
7. **Extensibility**: Can add V2 features without rewrite - ✅ **ACHIEVED** (migration system ready)
## References
### Internal Documentation
- [Technology Stack](technology-stack.md)
- [ADR-001: Python Web Framework](../decisions/ADR-001-python-web-framework.md)
- [ADR-002: Flask Extensions](../decisions/ADR-002-flask-extensions.md)
- [ADR-003: Frontend Technology](../decisions/ADR-003-frontend-technology.md)
- [ADR-004: File-Based Storage](../decisions/ADR-004-file-based-note-storage.md)
- [ADR-005: IndieLogin Authentication](../decisions/ADR-005-indielogin-authentication.md)
### External Standards
- [IndieWeb](https://indieweb.org/)
- [IndieAuth Spec](https://www.w3.org/TR/indieauth/)
- [Micropub Spec](https://micropub.spec.indieweb.org/)
- [Microformats2](http://microformats.org/wiki/h-entry)
- [RSS 2.0](https://www.rssboard.org/rss-specification)
- [Flask Documentation](https://flask.palletsprojects.com/)