ADR-053: Performance Monitoring Strategy

Status

Accepted

Context

StarPunk v1.1.1 introduces performance monitoring to help operators understand system behavior in production. Currently, we have no visibility into:

  • Database query performance
  • Memory usage patterns
  • Request processing times
  • Bottlenecks and slow operations

We need a lightweight, zero-dependency monitoring solution that provides actionable insights with negligible performance impact.

Decision

Implement a built-in performance monitoring system using Python's standard library, with optional detailed tracking controlled by configuration.

Architecture Overview

Request → Middleware (timing) → Handler
              ↓                    ↓
         Context Manager      Decorators
              ↓                    ↓
         Metrics Store ← Database Hooks
              ↓
        Admin Dashboard

Core Components

1. Metrics Collector

Location: starpunk/monitoring/collector.py

Responsibilities:

  • Collect timing data
  • Track memory usage
  • Store recent metrics in memory
  • Provide aggregation functions

Data Structure:

@dataclass
class Metric:
    timestamp: float
    category: str  # "db", "http", "function"
    operation: str  # specific operation name
    duration: float  # in seconds
    metadata: dict  # additional context
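
A minimal sketch of the collector built around this dataclass, assuming a bounded deque for storage (class and method names are illustrative, not a committed API):

import time
from collections import deque
from typing import Optional


class MetricsCollector:
    def __init__(self, max_metrics: int = 1000):
        # deque(maxlen=...) evicts the oldest entry automatically
        self._metrics = deque(maxlen=max_metrics)

    def record(self, category: str, operation: str, duration: float,
               metadata: Optional[dict] = None) -> None:
        self._metrics.append(Metric(time.time(), category, operation,
                                    duration, metadata or {}))

    def by_category(self, category: str) -> list:
        # Aggregates (averages, percentiles) are computed lazily from this view
        return [m for m in self._metrics if m.category == category]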

2. Database Performance Tracking

Location: starpunk/monitoring/db_monitor.py

Features:

  • Query execution timing
  • Slow query detection
  • Query pattern analysis
  • Connection pool monitoring

Implementation via context managers wrapping SQLite operations:

# Wrap database operations
with monitor.track_query("SELECT", "notes"):
    cursor.execute(query)
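
A sketch of what track_query could look like internally, shown here as a standalone context manager that takes the collector and slow-query threshold explicitly (in the module it would likely be a method on the monitor object):

import time
from contextlib import contextmanager


@contextmanager
def track_query(collector, query_type: str, table: str, slow_threshold: float = 1.0):
    # Time the wrapped database call and record it, flagging slow queries
    start = time.perf_counter()
    try:
        yield
    finally:
        duration = time.perf_counter() - start
        metadata = {"table": table, "slow": duration >= slow_threshold}
        collector.record("db", f"{query_type} {table}", duration, metadata)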

3. Memory Tracking

Location: starpunk/monitoring/memory.py

Track:

  • Process memory (RSS)
  • Memory growth over time
  • Per-request memory delta
  • Memory high water mark

Uses resource module (stdlib).
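
A sketch of reading the high water mark via the stdlib resource module; current RSS and per-request deltas would come from a platform-specific source such as /proc/self/status on Linux (the function name is illustrative):

import resource
import sys


def peak_rss_bytes() -> int:
    # ru_maxrss is the process high water mark: kilobytes on Linux, bytes on macOS
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return peak if sys.platform == "darwin" else peak * 1024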

4. Request Performance

Location: starpunk/monitoring/http.py

Track:

  • Request processing time
  • Response size
  • Status code distribution
  • Slowest endpoints
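
A sketch of the request timing middleware, assuming Flask request hooks and a shared collector instance like the one sketched above (the registration function and hook names are illustrative):

import time
from flask import g, request


def register_request_monitoring(app, collector):
    @app.before_request
    def _start_timer():
        g._perf_start = time.perf_counter()

    @app.after_request
    def _record_request(response):
        start = getattr(g, "_perf_start", None)
        if start is not None:
            collector.record(
                "http",
                f"{request.method} {request.url_rule or request.path}",
                time.perf_counter() - start,
                {"status": response.status_code,
                 "size": response.calculate_content_length()},
            )
        return response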

5. Admin Dashboard

Location: /admin/performance

Display:

  • Real-time metrics (last 15 minutes)
  • Slow query log
  • Memory usage graph
  • Endpoint performance table
  • Database statistics
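
A minimal route sketch for the dashboard, assuming a Flask blueprint, a session-based admin check, and the collector's by_category helper (the decorator, session key, template path, and import path are placeholders, not the final implementation):

from functools import wraps
from flask import Blueprint, abort, render_template, session

from starpunk.monitoring.collector import collector  # assumed shared instance

perf_bp = Blueprint("performance", __name__, url_prefix="/admin")


def require_admin(view):
    # Placeholder for StarPunk's existing admin authentication check
    @wraps(view)
    def wrapped(*args, **kwargs):
        if not session.get("authenticated"):
            abort(403)
        return view(*args, **kwargs)
    return wrapped


@perf_bp.route("/performance")
@require_admin
def performance_dashboard():
    # Hand the in-memory metrics to the template; aggregation happens at render time
    return render_template(
        "admin/performance.html",
        queries=collector.by_category("db"),
        requests=collector.by_category("http"),
    )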

Data Retention

In-memory circular buffer approach:

  • Last 1000 metrics retained
  • Automatic old data eviction
  • No persistent storage (privacy/simplicity)
  • Reset on restart

Performance Overhead

Target: <1% overhead when enabled

Strategies:

  • Sampling for high-frequency operations (see the sketch below)
  • Lazy computation of aggregates
  • Minimal memory footprint (1MB max)
  • Instrumentation reduced to a no-op when disabled via config
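
One way to implement the sampling strategy, assuming SAMPLE_RATE is a float between 0.0 and 1.0 (the function name is illustrative):

import random


def should_sample(sample_rate: float) -> bool:
    # 1.0 records every operation; 0.1 records roughly one in ten
    return sample_rate >= 1.0 or random.random() < sample_rate

Instrumentation points check this before recording, so unsampled operations pay only the cost of a single random draw.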

Rationale

Why Built-in Monitoring?

  1. Zero Dependencies: Uses only Python stdlib
  2. Privacy: No external services
  3. Simplicity: No complex setup
  4. Integrated: Direct access to internals
  5. Lightweight: Minimal overhead

Why Not External Tools?

Prometheus/Grafana:

  • Requires external services
  • Complex setup
  • Overkill for single-user system

APM Services (New Relic, DataDog):

  • Privacy concerns
  • Subscription costs
  • Network dependency
  • Too heavy for our needs

OpenTelemetry:

  • Large dependency
  • Complex configuration
  • Designed for distributed systems

Design Principles

  1. Opt-in: Disabled by default
  2. Lightweight: Minimal resource usage
  3. Actionable: Focus on useful metrics
  4. Temporary: No permanent storage
  5. Private: No external data transmission

Consequences

Positive

  1. Production Visibility: Understand behavior under load
  2. Performance Debugging: Identify bottlenecks quickly
  3. No Dependencies: Pure Python solution
  4. Privacy Preserving: Data stays local
  5. Simple Deployment: No additional services

Negative

  1. Limited History: Only recent data available
  2. Memory Usage: ~1MB for metrics buffer
  3. No Alerting: Manual monitoring required
  4. Single Node: No distributed tracing

Mitigations

  1. Export capability for external tools
  2. Configurable buffer size
  3. Webhook support for alerts (future)
  4. Focus on most valuable metrics

Alternatives Considered

1. Logging-based Monitoring

Approach: Parse performance data from logs
Pros: Simple, no new code
Cons: Log parsing complexity, no real-time view
Decision: Dedicated monitoring is cleaner

2. External Monitoring Service

Approach: Use a hosted service such as Sentry
Pros: Full-featured, alerting included
Cons: Privacy, cost, complexity
Decision: Violates self-hosted principle

3. Prometheus Exporter

Approach: Expose /metrics endpoint
Pros: Standard, good tooling
Cons: Requires Prometheus setup
Decision: Too complex for target users

4. No Monitoring

Approach: Rely on logs and external tools
Pros: Simplest
Cons: Poor production visibility
Decision: v1.1.1 specifically targets production readiness

Implementation Details

Instrumentation Points

  1. Database Layer

    • All queries automatically timed
    • Connection acquisition/release
    • Transaction duration
    • Migration execution

  2. HTTP Layer

    • Middleware wraps all requests
    • Per-endpoint timing
    • Static file serving
    • Error handling

  3. Core Functions (see the decorator sketch below)

    • Note creation/update
    • Search operations
    • RSS generation
    • Authentication flow
Performance Dashboard Layout

Performance Dashboard
═══════════════════

Overview
--------
Uptime: 5d 3h 15m
Requests: 10,234
Avg Response: 45ms
Memory: 128MB

Slow Queries (>1s)
------------------
[timestamp] SELECT ... FROM notes (1.2s)
[timestamp] UPDATE ... SET ... (1.1s)

Endpoint Performance
-------------------
GET /          : avg 23ms, p99 45ms
GET /notes/:id : avg 35ms, p99 67ms
POST /micropub : avg 125ms, p99 234ms

Memory Usage
-----------
[ASCII graph showing last 15 minutes]

Database Stats
-------------
Pool Size: 3/5
Queries/sec: 4.2
Cache Hit Rate: 87%

Configuration Options

# All under STARPUNK_PERF_* prefix
MONITORING_ENABLED = False  # Master switch
SLOW_QUERY_THRESHOLD = 1.0  # seconds
LOG_QUERIES = False  # Log all queries
MEMORY_TRACKING = False  # Track memory usage
SAMPLE_RATE = 1.0  # 1.0 = all, 0.1 = 10%
BUFFER_SIZE = 1000  # Number of metrics
DASHBOARD_ENABLED = True  # Enable web UI
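
One way the STARPUNK_PERF_* prefix could be resolved at startup; the helper name, defaults table, and type-coercion rules are assumptions, not the implemented loader:

import os

PREFIX = "STARPUNK_PERF_"
DEFAULTS = {
    "MONITORING_ENABLED": False,
    "SLOW_QUERY_THRESHOLD": 1.0,
    "LOG_QUERIES": False,
    "MEMORY_TRACKING": False,
    "SAMPLE_RATE": 1.0,
    "BUFFER_SIZE": 1000,
    "DASHBOARD_ENABLED": True,
}


def load_perf_config() -> dict:
    config = {}
    for key, default in DEFAULTS.items():
        raw = os.environ.get(PREFIX + key)
        if raw is None:
            config[key] = default
        elif isinstance(default, bool):
            config[key] = raw.lower() in ("1", "true", "yes")
        else:
            # Coerce to the type of the default (int or float)
            config[key] = type(default)(raw)
    return config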

Testing Strategy

  1. Unit Tests: Mock collectors, verify metrics
  2. Integration Tests: End-to-end monitoring flow
  3. Performance Tests: Verify low overhead
  4. Load Tests: Behavior under stress

Security Considerations

  1. Dashboard requires admin authentication
  2. No sensitive data in metrics
  3. No external data transmission
  4. Metrics cleared on logout
  5. Rate limiting on dashboard endpoint

Migration Path

No migration is required; monitoring is opt-in via configuration.

Future Enhancements

v1.2.0 and beyond:

  • Metric export (CSV/JSON)
  • Alert thresholds
  • Historical trending
  • Custom metric points
  • Plugin architecture

References

Document History

  • 2025-11-25: Initial draft for v1.1.1 release planning