# ADR-053: Performance Monitoring Strategy

## Status

Accepted

## Context

StarPunk v1.1.1 introduces performance monitoring to help operators understand system behavior in production. Currently, we have no visibility into:
- Database query performance
- Memory usage patterns
- Request processing times
- Bottlenecks and slow operations
We need a lightweight, zero-dependency monitoring solution that provides actionable insights without impacting performance.
## Decision

Implement a built-in performance monitoring system using Python's standard library, with optional detailed tracking controlled by configuration.

### Architecture Overview

```text
Request → Middleware (timing) → Handler
              ↓                    ↓
      Context Manager          Decorators
              ↓                    ↓
       Metrics Store   ←   Database Hooks
              ↓
      Admin Dashboard
```
### Core Components

#### 1. Metrics Collector

Location: `starpunk/monitoring/collector.py`

Responsibilities:
- Collect timing data
- Track memory usage
- Store recent metrics in memory
- Provide aggregation functions
Data Structure:

```python
from dataclasses import dataclass


@dataclass
class Metric:
    timestamp: float  # epoch seconds at capture
    category: str     # "db", "http", "function"
    operation: str    # specific operation name
    duration: float   # in seconds
    metadata: dict    # additional context
```
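
Aggregates can then be computed lazily over whatever is in the buffer. A minimal sketch of one aggregation helper (`summarize` is illustrative, not a name from this spec):

```python
import statistics


def summarize(metrics: list[Metric], category: str) -> dict:
    """Aggregate durations for one metric category (e.g. "db")."""
    durations = sorted(m.duration for m in metrics if m.category == category)
    if not durations:
        return {"count": 0}
    return {
        "count": len(durations),
        "avg": statistics.fmean(durations),
        # Nearest-rank p99; adequate for a buffer of ~1000 samples
        "p99": durations[min(len(durations) - 1, int(len(durations) * 0.99))],
    }
```
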
#### 2. Database Performance Tracking

Location: `starpunk/monitoring/db_monitor.py`

Features:
- Query execution timing
- Slow query detection
- Query pattern analysis
- Connection pool monitoring
Implementation wraps the SQLite operations in a timing context manager:

```python
# Wrap database operations
with monitor.track_query("SELECT", "notes"):
    cursor.execute(query)
```
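
A minimal sketch of how `track_query` could be implemented, assuming the `Metric` dataclass above and a collector with an `add` method (both assumptions, not a confirmed API):

```python
import time
from contextlib import contextmanager


class QueryMonitor:
    def __init__(self, collector, slow_query_threshold: float = 1.0):
        self.collector = collector
        self.slow_query_threshold = slow_query_threshold  # seconds

    @contextmanager
    def track_query(self, query_type: str, table: str):
        """Time one database operation and record it as a metric."""
        start = time.perf_counter()
        try:
            yield
        finally:
            duration = time.perf_counter() - start
            self.collector.add(Metric(
                timestamp=time.time(),
                category="db",
                operation=f"{query_type} {table}",
                duration=duration,
                # Flag slow queries so the dashboard can surface them
                metadata={"slow": duration > self.slow_query_threshold},
            ))
```
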
#### 3. Memory Tracking

Location: `starpunk/monitoring/memory.py`

Track:
- Process memory (RSS)
- Memory growth over time
- Per-request memory delta
- Memory high water mark
Uses the `resource` module (stdlib), as sketched below.
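
A sketch of the high-water-mark read via `resource`. Note that `ru_maxrss` is reported in kilobytes on Linux but bytes on macOS, and it gives the peak only; current RSS and per-request deltas would need another source (e.g. `/proc/self/statm` on Linux):

```python
import resource
import sys


def peak_rss_bytes() -> int:
    """Memory high-water mark for this process, normalized to bytes."""
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports kilobytes; macOS reports bytes
    return rss if sys.platform == "darwin" else rss * 1024
```
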
#### 4. Request Performance

Location: `starpunk/monitoring/http.py`

Track:
- Request processing time
- Response size
- Status code distribution
- Slowest endpoints
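
A framework-agnostic sketch of request timing as WSGI middleware (how StarPunk actually wires middleware is not specified here; `collector.add` is the assumed API from earlier):

```python
import time


class TimingMiddleware:
    """WSGI wrapper that records per-request processing time and status."""

    def __init__(self, app, collector):
        self.app = app
        self.collector = collector

    def __call__(self, environ, start_response):
        start = time.perf_counter()
        captured = {}

        def timing_start_response(status, headers, exc_info=None):
            captured["status"] = status.split(" ", 1)[0]
            return start_response(status, headers, exc_info)

        try:
            return self.app(environ, timing_start_response)
        finally:
            # Response size would require wrapping the returned iterable;
            # omitted to keep the sketch short.
            self.collector.add(Metric(
                timestamp=time.time(),
                category="http",
                operation=f"{environ['REQUEST_METHOD']} {environ.get('PATH_INFO', '/')}",
                duration=time.perf_counter() - start,
                metadata={"status": captured.get("status")},
            ))
```
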
#### 5. Admin Dashboard

Location: `/admin/performance`

Display:
- Real-time metrics (last 15 minutes)
- Slow query log
- Memory usage graph
- Endpoint performance table
- Database statistics
### Data Retention

In-memory circular buffer approach (sketched after this list):
- Last 1000 metrics retained
- Automatic old data eviction
- No persistent storage (privacy/simplicity)
- Reset on restart
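
The buffer maps directly onto `collections.deque` with `maxlen`, which evicts the oldest entry automatically. A minimal collector sketch (class and method names are illustrative):

```python
import threading
from collections import deque


class MetricsCollector:
    """In-memory, bounded, thread-safe store for recent metrics."""

    def __init__(self, buffer_size: int = 1000):
        self._metrics = deque(maxlen=buffer_size)  # oldest entries drop off
        self._lock = threading.Lock()

    def add(self, metric: Metric) -> None:
        with self._lock:
            self._metrics.append(metric)

    def snapshot(self) -> list[Metric]:
        """Copy for aggregation, so readers never block writers for long."""
        with self._lock:
            return list(self._metrics)
```
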

### Performance Overhead

Target: <1% overhead when enabled
Strategies:
- Sampling for high-frequency operations
- Lazy computation of aggregates
- Minimal memory footprint (1MB max)
- No-op code paths when disabled via config (see the gating sketch below)
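
Sampling can be a single cheap check before any timing work happens. A sketch using the configuration options listed later in this document:

```python
import random


def should_record(config) -> bool:
    """Cheap gate evaluated before any instrumentation runs."""
    if not config.MONITORING_ENABLED:
        return False
    # SAMPLE_RATE = 1.0 records everything; 0.1 records roughly 10%
    return config.SAMPLE_RATE >= 1.0 or random.random() < config.SAMPLE_RATE
```
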
## Rationale

### Why Built-in Monitoring?

- Zero Dependencies: Uses only Python stdlib
- Privacy: No external services
- Simplicity: No complex setup
- Integrated: Direct access to internals
- Lightweight: Minimal overhead

### Why Not External Tools?

Prometheus/Grafana:
- Requires external services
- Complex setup
- Overkill for single-user system
APM Services (New Relic, DataDog):
- Privacy concerns
- Subscription costs
- Network dependency
- Too heavy for our needs
OpenTelemetry:
- Large dependency
- Complex configuration
- Designed for distributed systems

### Design Principles

- Opt-in: Disabled by default
- Lightweight: Minimal resource usage
- Actionable: Focus on useful metrics
- Temporary: No permanent storage
- Private: No external data transmission
## Consequences

### Positive

- Production Visibility: Understand behavior under load
- Performance Debugging: Identify bottlenecks quickly
- No Dependencies: Pure Python solution
- Privacy Preserving: Data stays local
- Simple Deployment: No additional services

### Negative

- Limited History: Only recent data available
- Memory Usage: ~1MB for metrics buffer
- No Alerting: Manual monitoring required
- Single Node: No distributed tracing

### Mitigations

- Export capability for external tools
- Configurable buffer size
- Webhook support for alerts (future)
- Focus on most valuable metrics
## Alternatives Considered

### 1. Logging-based Monitoring

- Approach: Parse performance data from logs
- Pros: Simple, no new code
- Cons: Log parsing complexity, no real-time view
- Decision: Dedicated monitoring is cleaner

### 2. External Monitoring Service

- Approach: Use a service like Sentry
- Pros: Full-featured, alerting included
- Cons: Privacy, cost, complexity
- Decision: Violates the self-hosted principle

### 3. Prometheus Exporter

- Approach: Expose a `/metrics` endpoint
- Pros: Standard, good tooling
- Cons: Requires Prometheus setup
- Decision: Too complex for target users

### 4. No Monitoring

- Approach: Rely on logs and external tools
- Pros: Simplest
- Cons: Poor production visibility
- Decision: v1.1.1 specifically targets production readiness

## Implementation Details

### Instrumentation Points

1. Database Layer
   - All queries automatically timed
   - Connection acquisition/release
   - Transaction duration
   - Migration execution

2. HTTP Layer
   - Middleware wraps all requests
   - Per-endpoint timing
   - Static file serving
   - Error handling

3. Core Functions (instrumented via decorators; see the sketch below)
   - Note creation/update
   - Search operations
   - RSS generation
   - Authentication flow
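
Per the architecture diagram, core functions would be instrumented with decorators. A sketch, assuming a module-level `collector` (hypothetical wiring, reusing the earlier sketches):

```python
import functools
import time

collector = MetricsCollector()  # assumed module-level instance (see sketch above)


def track(category: str, operation: str):
    """Decorator that records the wrapped function's duration as a metric."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                collector.add(Metric(
                    timestamp=time.time(),
                    category=category,
                    operation=operation,
                    duration=time.perf_counter() - start,
                    metadata={},
                ))
        return wrapper
    return decorator


# Hypothetical usage on a core function:
# @track("function", "create_note")
# def create_note(content): ...
```
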

### Performance Dashboard Layout

```text
Performance Dashboard
═════════════════════

Overview
--------
Uptime: 5d 3h 15m
Requests: 10,234
Avg Response: 45ms
Memory: 128MB

Slow Queries (>1s)
------------------
[timestamp] SELECT ... FROM notes (1.2s)
[timestamp] UPDATE ... SET ... (1.1s)

Endpoint Performance
--------------------
GET /          : avg 23ms,  p99 45ms
GET /notes/:id : avg 35ms,  p99 67ms
POST /micropub : avg 125ms, p99 234ms

Memory Usage
------------
[ASCII graph showing last 15 minutes]

Database Stats
--------------
Pool Size: 3/5
Queries/sec: 4.2
Cache Hit Rate: 87%
```

### Configuration Options

```python
# All under STARPUNK_PERF_* prefix
MONITORING_ENABLED = False   # Master switch
SLOW_QUERY_THRESHOLD = 1.0   # seconds
LOG_QUERIES = False          # Log all queries
MEMORY_TRACKING = False      # Track memory usage
SAMPLE_RATE = 1.0            # 1.0 = all, 0.1 = 10%
BUFFER_SIZE = 1000           # Number of metrics
DASHBOARD_ENABLED = True     # Enable web UI
```

## Testing Strategy

- Unit Tests: Mock collectors, verify metrics
- Integration Tests: End-to-end monitoring flow
- Performance Tests: Verify low overhead
- Load Tests: Behavior under stress
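
For the overhead verification, one option is a micro-benchmark asserting a per-operation bound rather than a percentage, since tiny workloads make relative overhead misleading. A hedged pytest-style sketch, reusing the assumed `QueryMonitor` and `MetricsCollector` sketches above:

```python
import time


def test_instrumentation_overhead_per_operation():
    """Instrumentation should add at most tens of microseconds per call."""
    monitor = QueryMonitor(collector=MetricsCollector())  # sketches from above
    iterations = 10_000

    def workload():
        sum(range(100))

    start = time.perf_counter()
    for _ in range(iterations):
        workload()
    bare = time.perf_counter() - start

    start = time.perf_counter()
    for _ in range(iterations):
        with monitor.track_query("SELECT", "notes"):
            workload()
    instrumented = time.perf_counter() - start

    per_op_overhead = (instrumented - bare) / iterations
    assert per_op_overhead < 50e-6  # generous bound to tolerate CI noise
```
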

## Security Considerations

- Dashboard requires admin authentication
- No sensitive data in metrics
- No external data transmission
- Metrics cleared on logout
- Rate limiting on dashboard endpoint

## Migration Path

No migration required - monitoring is opt-in via configuration.

## Future Enhancements

v1.2.0 and beyond:
- Metric export (CSV/JSON)
- Alert thresholds
- Historical trending
- Custom metric points
- Plugin architecture

## Document History

- 2025-11-25: Initial draft for v1.1.1 release planning