Performance Monitoring Foundation Specification

Overview

The performance monitoring foundation provides operators with visibility into StarPunk's runtime behavior, helping identify bottlenecks, track resource usage, and ensure optimal performance in production.

Requirements

Functional Requirements

  1. Timing Instrumentation

    • Measure execution time for key operations
    • Track request processing duration
    • Monitor database query execution time
    • Measure template rendering time
    • Track static file serving time
  2. Database Performance Logging

    • Log all queries when enabled
    • Detect and warn about slow queries
    • Track connection pool usage
    • Monitor transaction duration
    • Count query frequency by type
  3. Memory Usage Tracking

    • Monitor process RSS memory
    • Track memory growth over time
    • Detect memory leaks
    • Per-request memory delta
    • Memory high water mark
  4. Performance Dashboard

    • Real-time metrics display
    • Historical data (last 15 minutes)
    • Slow query log
    • Memory usage visualization
    • Endpoint performance table

Non-Functional Requirements

  1. Performance Impact

    • Monitoring overhead <1% when enabled
    • Zero impact when disabled
    • Efficient memory usage (<1MB for metrics)
    • No blocking operations
  2. Usability

    • Simple enable/disable via configuration
    • Clear, actionable metrics
    • Self-explanatory dashboard
    • No external dependencies

Design

Architecture

┌──────────────────────────────────────┐
│         HTTP Request                  │
│              ↓                        │
│    Performance Middleware             │
│         (start timer)                 │
│              ↓                        │
│    ┌─────────────────┐               │
│    │  Request Handler │               │
│    │        ↓         │               │
│    │  Database Layer  │←── Query Monitor
│    │        ↓         │               │
│    │   Business Logic │←── Function Timer
│    │        ↓         │               │
│    │  Response Build  │               │
│    └─────────────────┘               │
│              ↓                        │
│     Performance Middleware            │
│         (stop timer)                  │
│              ↓                        │
│     Metrics Collector ← Memory Monitor
│              ↓                        │
│      Circular Buffer                  │
│              ↓                        │
│      Admin Dashboard                  │
└──────────────────────────────────────┘
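
The Function Timer component in the diagram can be implemented as a decorator. The sketch below is illustrative only: it assumes the config, metrics_buffer, and PerformanceMetric names defined in the Data Model section that follows, and the decorator name timed is not an existing StarPunk API.

import functools
import time
from datetime import datetime

def timed(operation_name: str):
    """Record a 'function' metric for each call to the decorated function (sketch)"""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if not config.PERF_MONITORING_ENABLED:
                return func(*args, **kwargs)

            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                metrics_buffer.add_metric(PerformanceMetric(
                    timestamp=datetime.now(),
                    category='function',
                    operation=operation_name,
                    duration_ms=(time.perf_counter() - start) * 1000
                ))
        return wrapper
    return decorator

Business-logic functions would opt in with, for example, @timed('render_feed'), where the operation name is the label that should appear in the dashboard.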

Data Model

from dataclasses import dataclass, field
from typing import Optional, Dict, Any, List
from datetime import datetime, timedelta
from collections import deque, defaultdict

@dataclass
class PerformanceMetric:
    """Single performance measurement"""
    timestamp: datetime
    category: str  # 'http', 'db', 'function', 'memory'
    operation: str  # Specific operation name
    duration_ms: Optional[float] = None  # For timed operations
    value: Optional[float] = None  # For measurements
    metadata: Dict[str, Any] = field(default_factory=dict)  # Additional context

class MetricsBuffer:
    """Circular buffer for metrics storage"""

    def __init__(self, max_size: int = 1000):
        self.metrics = deque(maxlen=max_size)
        self.slow_queries = deque(maxlen=100)

    def add_metric(self, metric: PerformanceMetric):
        """Add metric to buffer"""
        self.metrics.append(metric)

        # Special handling for slow queries
        if (metric.category == 'db' and
            metric.duration_ms > config.PERF_SLOW_QUERY_THRESHOLD * 1000):
            self.slow_queries.append(metric)

    def get_recent(self, seconds: int = 900) -> List[PerformanceMetric]:
        """Get metrics from last N seconds"""
        cutoff = datetime.now() - timedelta(seconds=seconds)
        return [m for m in self.metrics if m.timestamp > cutoff]

    def get_summary(self) -> Dict[str, Any]:
        """Get summary statistics"""
        recent = self.get_recent()

        # Group by category and operation
        summary = defaultdict(lambda: {
            'count': 0,
            'total_ms': 0,
            'avg_ms': 0,
            'max_ms': 0,
            'p95_ms': 0,
            'p99_ms': 0
        })

        # Calculate statistics...
        return dict(summary)
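
The elided statistics calculation could look like the following sketch, reusing the imports above. The grouping key and the simple index-based percentiles are assumptions, not an existing implementation:

def summarize(metrics: List[PerformanceMetric]) -> Dict[str, Any]:
    """Group timed metrics by category and operation, then compute basic stats (sketch)"""
    grouped: Dict[str, list] = defaultdict(list)
    for m in metrics:
        if m.duration_ms is not None:
            grouped[f"{m.category}:{m.operation}"].append(m.duration_ms)

    summary = {}
    for key, durations in grouped.items():
        durations.sort()
        count = len(durations)
        summary[key] = {
            'count': count,
            'total_ms': sum(durations),
            'avg_ms': sum(durations) / count,
            'max_ms': durations[-1],
            # Simple index-based percentiles; adequate for a small in-memory buffer
            'p95_ms': durations[min(count - 1, int(count * 0.95))],
            'p99_ms': durations[min(count - 1, int(count * 0.99))],
        }
    return summary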

Instrumentation Implementation

Database Query Monitoring

import sqlite3
import time
from contextlib import contextmanager

class MonitoredConnection(sqlite3.Connection):
    """sqlite3 connection that times every execute() call"""

    def execute(self, sql, parameters=()):
        if not config.PERF_MONITORING_ENABLED:
            return super().execute(sql, parameters)

        start_time = time.perf_counter()
        result = super().execute(sql, parameters)
        duration = time.perf_counter() - start_time

        metric = PerformanceMetric(
            timestamp=datetime.now(),
            category='db',
            operation=sql.split()[0].upper(),  # SELECT, INSERT, etc.
            duration_ms=duration * 1000,
            metadata={
                'query': sql if config.PERF_LOG_QUERIES else None,
                'params_count': len(parameters) if parameters else 0
            }
        )
        metrics_buffer.add_metric(metric)

        if duration > config.PERF_SLOW_QUERY_THRESHOLD:
            logger.warning(
                "Slow query detected",
                extra={
                    'query': sql,
                    'duration_ms': duration * 1000
                }
            )

        return result

@contextmanager
def monitored_connection():
    """Database connection with query monitoring"""
    conn = sqlite3.connect(DATABASE_PATH, factory=MonitoredConnection)
    try:
        yield conn
    finally:
        conn.close()
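
Call sites use the context manager exactly as they would a plain connection; the notes table and query below are purely illustrative:

with monitored_connection() as conn:
    rows = conn.execute(
        "SELECT id, title FROM notes ORDER BY created_at DESC LIMIT ?",
        (10,)
    ).fetchall()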

HTTP Request Monitoring

from flask import g, request
import time

@app.before_request
def start_request_timer():
    """Start timing the request"""
    if config.PERF_MONITORING_ENABLED:
        g.start_time = time.perf_counter()
        g.start_memory = get_memory_usage()

@app.after_request
def end_request_timer(response):
    """End timing and record metrics"""
    if config.PERF_MONITORING_ENABLED and hasattr(g, 'start_time'):
        duration = time.perf_counter() - g.start_time
        memory_delta = get_memory_usage() - g.start_memory

        metric = PerformanceMetric(
            timestamp=datetime.now(),
            category='http',
            operation=f"{request.method} {request.endpoint}",
            duration_ms=duration * 1000,
            metadata={
                'method': request.method,
                'path': request.path,
                'status': response.status_code,
                'size': len(response.get_data()),
                'memory_delta': memory_delta
            }
        )
        metrics_buffer.add_metric(metric)

    return response

Memory Monitoring

import resource
import threading
import time

class MemoryMonitor:
    """Background thread for memory monitoring"""

    def __init__(self):
        self.running = False
        self.thread = None
        self.high_water_mark = 0

    def start(self):
        """Start memory monitoring"""
        if not config.PERF_MEMORY_TRACKING:
            return

        self.running = True
        self.thread = threading.Thread(target=self._monitor)
        self.thread.daemon = True
        self.thread.start()

    def _monitor(self):
        """Monitor memory usage"""
        while self.running:
            memory_mb = get_memory_usage()
            self.high_water_mark = max(self.high_water_mark, memory_mb)

            metric = PerformanceMetric(
                timestamp=datetime.now(),
                category='memory',
                operation='rss',
                value=memory_mb,
                metadata={
                    'high_water_mark': self.high_water_mark
                }
            )
            metrics_buffer.add_metric(metric)

            time.sleep(10)  # Check every 10 seconds

def get_memory_usage() -> float:
    """Get process peak RSS in MB (ru_maxrss is reported in KB on Linux)"""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_maxrss / 1024  # Convert KB to MB
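
The monitor is meant to run for the lifetime of the process. A minimal wiring sketch, assuming a Flask application factory named create_app (the factory name is an assumption; StarPunk's actual entry point may differ):

from flask import Flask

memory_monitor = MemoryMonitor()

def create_app() -> Flask:
    """Start background memory sampling alongside normal app setup (sketch)"""
    app = Flask(__name__)
    # ... existing configuration, blueprints, etc. ...

    # No-op when memory tracking is disabled (see MemoryMonitor.start)
    memory_monitor.start()
    return app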

Performance Dashboard

Dashboard Route

@app.route('/admin/performance')
@require_admin
def performance_dashboard():
    """Display performance metrics"""
    if not config.PERF_MONITORING_ENABLED:
        return render_template('admin/performance_disabled.html')

    summary = metrics_buffer.get_summary()
    slow_queries = list(metrics_buffer.slow_queries)
    memory_data = get_memory_graph_data()

    return render_template(
        'admin/performance.html',
        summary=summary,
        slow_queries=slow_queries,
        memory_data=memory_data,
        current_memory=get_memory_usage(),
        uptime=get_uptime(),
        config={
            'slow_threshold': config.PERF_SLOW_QUERY_THRESHOLD,
            'monitoring_enabled': config.PERF_MONITORING_ENABLED,
            'memory_tracking': config.PERF_MEMORY_TRACKING
        }
    )
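
The route references two helpers that are not defined elsewhere in this specification. A possible sketch of both, assuming memory samples are read back from the shared metrics buffer and that process start time is captured at import:

import time

_process_start = time.time()

def get_uptime() -> str:
    """Human-readable process uptime (sketch)"""
    seconds = int(time.time() - _process_start)
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes}m {secs}s"

def get_memory_graph_data() -> list:
    """(timestamp, MB) pairs for the last 15 minutes of memory samples (sketch)"""
    return [
        (m.timestamp.isoformat(), m.value)
        for m in metrics_buffer.get_recent(900)
        if m.category == 'memory'
    ]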

Dashboard Template Structure

<div class="performance-dashboard">
  <h2>Performance Monitoring</h2>

  <!-- Overview Stats -->
  <div class="stats-grid">
    <div class="stat">
      <h3>Uptime</h3>
      <p>{{ uptime }}</p>
    </div>
    <div class="stat">
      <h3>Total Requests</h3>
      <p>{{ summary.http.count }}</p>
    </div>
    <div class="stat">
      <h3>Avg Response Time</h3>
      <p>{{ summary.http.avg_ms|round(2) }}ms</p>
    </div>
    <div class="stat">
      <h3>Memory Usage</h3>
      <p>{{ current_memory }}MB</p>
    </div>
  </div>

  <!-- Slow Queries -->
  <div class="slow-queries">
    <h3>Slow Queries (&gt;{{ config.slow_threshold }}s)</h3>
    <table>
      <thead>
        <tr>
          <th>Time</th>
          <th>Duration</th>
          <th>Query</th>
        </tr>
      </thead>
      <tbody>
        {% for query in slow_queries %}
        <tr>
          <td>{{ query.timestamp|timeago }}</td>
          <td>{{ query.duration_ms|round(2) }}ms</td>
          <td><code>{{ query.metadata.query|truncate(100) }}</code></td>
        </tr>
        {% endfor %}
      </tbody>
    </table>
  </div>

  <!-- Endpoint Performance -->
  <div class="endpoint-performance">
    <h3>Endpoint Performance</h3>
    <table>
      <thead>
        <tr>
          <th>Endpoint</th>
          <th>Calls</th>
          <th>Avg (ms)</th>
          <th>P95 (ms)</th>
          <th>P99 (ms)</th>
        </tr>
      </thead>
      <tbody>
        {% for endpoint, stats in summary.endpoints.items() %}
        <tr>
          <td>{{ endpoint }}</td>
          <td>{{ stats.count }}</td>
          <td>{{ stats.avg_ms|round(2) }}</td>
          <td>{{ stats.p95_ms|round(2) }}</td>
          <td>{{ stats.p99_ms|round(2) }}</td>
        </tr>
        {% endfor %}
      </tbody>
    </table>
  </div>

  <!-- Memory Graph -->
  <div class="memory-graph">
    <h3>Memory Usage (Last 15 Minutes)</h3>
    <canvas id="memory-chart"></canvas>
  </div>
</div>

Configuration Options

# Performance monitoring configuration
PERF_MONITORING_ENABLED = Config.get_bool("STARPUNK_PERF_MONITORING_ENABLED", False)
PERF_SLOW_QUERY_THRESHOLD = Config.get_float("STARPUNK_PERF_SLOW_QUERY_THRESHOLD", 1.0)
PERF_LOG_QUERIES = Config.get_bool("STARPUNK_PERF_LOG_QUERIES", False)
PERF_MEMORY_TRACKING = Config.get_bool("STARPUNK_PERF_MEMORY_TRACKING", False)
PERF_BUFFER_SIZE = Config.get_int("STARPUNK_PERF_BUFFER_SIZE", 1000)
PERF_SAMPLE_RATE = Config.get_float("STARPUNK_PERF_SAMPLE_RATE", 1.0)
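
Operators would enable and tune monitoring through environment variables. An example set of entries (the values shown are illustrative, not recommendations):

# Example .env entries
STARPUNK_PERF_MONITORING_ENABLED=true
STARPUNK_PERF_SLOW_QUERY_THRESHOLD=0.5
STARPUNK_PERF_LOG_QUERIES=false
STARPUNK_PERF_MEMORY_TRACKING=true
STARPUNK_PERF_BUFFER_SIZE=2000
STARPUNK_PERF_SAMPLE_RATE=0.25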

Testing Strategy

Unit Tests

  1. Metric collection and storage
  2. Circular buffer behavior
  3. Summary statistics calculation
  4. Memory monitoring functions
  5. Query monitoring callbacks

Integration Tests

  1. End-to-end request monitoring
  2. Slow query detection (see the sketch after this list)
  3. Memory leak detection
  4. Dashboard rendering
  5. Performance overhead measurement
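
A possible shape for the slow-query detection test, assuming pytest's caplog fixture and the monitored_connection helper above; the zero threshold is only there to force the slow path in the test:

def test_slow_query_detection(caplog):
    """With a zero threshold, any query should register as slow (sketch)"""
    config.PERF_MONITORING_ENABLED = True
    config.PERF_SLOW_QUERY_THRESHOLD = 0.0

    with monitored_connection() as conn:
        conn.execute("SELECT 1")

    assert len(metrics_buffer.slow_queries) > 0
    assert "Slow query detected" in caplog.text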

Performance Tests

def test_monitoring_overhead():
    """Verify monitoring overhead is <1%"""
    # Baseline without monitoring
    config.PERF_MONITORING_ENABLED = False
    baseline_time = measure_operation_time()

    # With monitoring
    config.PERF_MONITORING_ENABLED = True
    monitored_time = measure_operation_time()

    overhead = (monitored_time - baseline_time) / baseline_time
    assert overhead < 0.01  # Less than 1%
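
The measure_operation_time helper is not defined here; one possible sketch, assuming the Flask test client and that the index route is representative of normal load:

import time

def measure_operation_time(iterations: int = 100) -> float:
    """Average wall-clock seconds per request against a representative route (sketch)"""
    client = app.test_client()
    start = time.perf_counter()
    for _ in range(iterations):
        client.get('/')
    return (time.perf_counter() - start) / iterations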

Security Considerations

  1. Authentication: Dashboard requires admin access
  2. Query Sanitization: Don't log sensitive query parameters (see the sketch after this list)
  3. Rate Limiting: Prevent dashboard DoS
  4. Data Retention: Automatic cleanup of old metrics
  5. Configuration: Validate all config values
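
For query sanitization, one option is to strip literal values from SQL text before it is logged. A minimal sketch; the helper name redact_literals and the regular expressions are assumptions:

import re

def redact_literals(sql: str) -> str:
    """Replace string and numeric literals with placeholders before logging (sketch)"""
    sql = re.sub(r"'(?:[^']|'')*'", "'?'", sql)    # string literals
    sql = re.sub(r"\b\d+(\.\d+)?\b", "?", sql)     # numeric literals
    return sql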

Performance Impact

Expected Overhead

  • Request timing: <0.1ms per request
  • Query monitoring: <0.5ms per query
  • Memory tracking: <1% CPU (background thread)
  • Dashboard rendering: <50ms
  • Total overhead: <1% when fully enabled

Optimization Strategies

  1. Use sampling for high-frequency operations (see the sketch after this list)
  2. Lazy calculation of statistics
  3. Efficient circular buffer implementation
  4. Minimal string operations in hot path
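
Sampling can reuse the existing PERF_SAMPLE_RATE setting. A minimal sketch that drops a fraction of measurements before they reach the buffer (the helper name should_sample is illustrative):

import random

def should_sample() -> bool:
    """Record a metric for only a PERF_SAMPLE_RATE fraction of operations (sketch)"""
    return random.random() < config.PERF_SAMPLE_RATE

In a hot path, the metric construction itself would be guarded, so skipped operations pay only the cost of this check:

if config.PERF_MONITORING_ENABLED and should_sample():
    metrics_buffer.add_metric(metric)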

Documentation Requirements

Administrator Guide

  • How to enable monitoring
  • Understanding metrics
  • Identifying performance issues
  • Tuning configuration

Dashboard User Guide

  • Navigating the dashboard
  • Interpreting metrics
  • Finding slow queries
  • Memory usage patterns

Acceptance Criteria

  1. Timing instrumentation for all key operations
  2. Database query performance logging
  3. Slow query detection with configurable threshold
  4. Memory usage tracking
  5. Performance dashboard at /admin/performance
  6. Monitoring overhead <1%
  7. Zero impact when disabled
  8. Circular buffer limits memory usage
  9. All metrics clearly documented
  10. Security review passed