Performance Monitoring Foundation Specification
Overview
The performance monitoring foundation provides operators with visibility into StarPunk's runtime behavior, helping identify bottlenecks, track resource usage, and ensure optimal performance in production.
Requirements
Functional Requirements
- Timing Instrumentation
  - Measure execution time for key operations
  - Track request processing duration
  - Monitor database query execution time
  - Measure template rendering time
  - Track static file serving time
- Database Performance Logging
  - Log all queries when enabled
  - Detect and warn about slow queries
  - Track connection pool usage
  - Monitor transaction duration
  - Count query frequency by type
- Memory Usage Tracking
  - Monitor process RSS memory
  - Track memory growth over time
  - Detect memory leaks
  - Track per-request memory deltas
  - Record the memory high-water mark
- Performance Dashboard
  - Real-time metrics display
  - Historical data (last 15 minutes)
  - Slow query log
  - Memory usage visualization
  - Endpoint performance table
Non-Functional Requirements
- Performance Impact
  - Monitoring overhead <1% when enabled
  - Zero impact when disabled
  - Efficient memory usage (<1MB for metrics)
  - No blocking operations
- Usability
  - Simple enable/disable via configuration
  - Clear, actionable metrics
  - Self-explanatory dashboard
  - No external dependencies
Design
Architecture
┌────────────────────────────────────────┐
│              HTTP Request              │
│                   ↓                    │
│         Performance Middleware         │
│             (start timer)              │
│                   ↓                    │
│        ┌─────────────────┐             │
│        │ Request Handler │             │
│        │        ↓        │             │
│        │ Database Layer  │←── Query Monitor
│        │        ↓        │             │
│        │ Business Logic  │←── Function Timer
│        │        ↓        │             │
│        │ Response Build  │             │
│        └─────────────────┘             │
│                   ↓                    │
│         Performance Middleware         │
│             (stop timer)               │
│                   ↓                    │
│     Metrics Collector ←── Memory Monitor
│                   ↓                    │
│            Circular Buffer             │
│                   ↓                    │
│            Admin Dashboard             │
└────────────────────────────────────────┘
Data Model
from dataclasses import dataclass, field
from typing import Optional, Dict, Any, List
from datetime import datetime, timedelta
from collections import deque, defaultdict

@dataclass
class PerformanceMetric:
    """Single performance measurement"""
    timestamp: datetime
    category: str                        # 'http', 'db', 'function', 'memory'
    operation: str                       # Specific operation name
    duration_ms: Optional[float] = None  # For timed operations
    value: Optional[float] = None        # For point-in-time measurements
    metadata: Dict[str, Any] = field(default_factory=dict)  # Additional context

class MetricsBuffer:
    """Circular buffer for metrics storage"""

    def __init__(self, max_size: int = 1000):
        self.metrics = deque(maxlen=max_size)
        self.slow_queries = deque(maxlen=100)

    def add_metric(self, metric: PerformanceMetric):
        """Add metric to buffer"""
        self.metrics.append(metric)

        # Special handling for slow queries; duration_ms is None for
        # point-in-time measurements such as memory samples
        if (metric.category == 'db' and metric.duration_ms is not None and
                metric.duration_ms > config.PERF_SLOW_QUERY_THRESHOLD * 1000):
            self.slow_queries.append(metric)

    def get_recent(self, seconds: int = 900) -> List[PerformanceMetric]:
        """Get metrics from the last N seconds"""
        cutoff = datetime.now() - timedelta(seconds=seconds)
        return [m for m in self.metrics if m.timestamp > cutoff]

    def get_summary(self) -> Dict[str, Any]:
        """Summary statistics by category, plus per-endpoint HTTP stats"""
        by_category = defaultdict(list)
        by_endpoint = defaultdict(list)
        for m in self.get_recent():
            if m.duration_ms is None:
                continue  # Skip point-in-time measurements
            by_category[m.category].append(m.duration_ms)
            if m.category == 'http':
                by_endpoint[m.operation].append(m.duration_ms)

        def stats(values):
            values = sorted(values)
            count = len(values)
            # Nearest-rank percentiles; index clamped for small samples
            return {
                'count': count,
                'total_ms': sum(values),
                'avg_ms': sum(values) / count,
                'max_ms': values[-1],
                'p95_ms': values[min(count - 1, int(count * 0.95))],
                'p99_ms': values[min(count - 1, int(count * 0.99))],
            }

        summary = {cat: stats(vals) for cat, vals in by_category.items()}
        summary['endpoints'] = {
            op: stats(vals) for op, vals in by_endpoint.items()
        }
        return summary
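The instrumentation below records into a shared buffer. A minimal wiring sketch, assuming a module-level singleton (the name metrics_buffer matches the references in later snippets; its exact placement is an assumption):

# Module-level singleton used by all instrumentation below (illustrative
# wiring; StarPunk may construct this elsewhere at startup)
metrics_buffer = MetricsBuffer(max_size=config.PERF_BUFFER_SIZE)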
Instrumentation Implementation
Database Query Monitoring
import sqlite3
import time
from contextlib import contextmanager
from datetime import datetime

class MonitoredConnection(sqlite3.Connection):
    """sqlite3.Connection subclass that times every execute() call.

    sqlite3.Connection does not allow attribute assignment, so a factory
    subclass is used here instead of monkey-patching execute().
    """

    def execute(self, sql, params=()):
        start_time = time.perf_counter()
        result = super().execute(sql, params)
        duration = time.perf_counter() - start_time

        metric = PerformanceMetric(
            timestamp=datetime.now(),
            category='db',
            operation=sql.split()[0].upper(),  # SELECT, INSERT, etc.
            duration_ms=duration * 1000,
            metadata={
                'query': sql if config.PERF_LOG_QUERIES else None,
                'params_count': len(params) if params else 0
            }
        )
        metrics_buffer.add_metric(metric)

        if duration > config.PERF_SLOW_QUERY_THRESHOLD:
            logger.warning(
                "Slow query detected",
                extra={
                    'query': sql,
                    'duration_ms': duration * 1000
                }
            )
        return result

@contextmanager
def monitored_connection():
    """Database connection, monitored only when enabled"""
    factory = (MonitoredConnection if config.PERF_MONITORING_ENABLED
               else sqlite3.Connection)
    conn = sqlite3.connect(DATABASE_PATH, factory=factory)
    try:
        yield conn
    finally:
        conn.close()  # Close even if the caller raises
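Call sites use the context manager like any other connection; a brief usage sketch (the posts query is illustrative):

# Queries issued through the monitored connection are timed automatically
with monitored_connection() as conn:
    rows = conn.execute("SELECT id, title FROM posts LIMIT 10").fetchall()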
HTTP Request Monitoring
from flask import g, request
from datetime import datetime
import time

@app.before_request
def start_request_timer():
    """Start timing the request"""
    if config.PERF_MONITORING_ENABLED:
        g.start_time = time.perf_counter()
        g.start_memory = get_memory_usage()

@app.after_request
def end_request_timer(response):
    """End timing and record metrics"""
    if config.PERF_MONITORING_ENABLED and hasattr(g, 'start_time'):
        duration = time.perf_counter() - g.start_time
        memory_delta = get_memory_usage() - g.start_memory

        metric = PerformanceMetric(
            timestamp=datetime.now(),
            category='http',
            # endpoint is None for unmatched routes (e.g. 404s)
            operation=f"{request.method} {request.endpoint or request.path}",
            duration_ms=duration * 1000,
            metadata={
                'method': request.method,
                'path': request.path,
                'status': response.status_code,
                # get_data() materializes the body; guard or skip this
                # for streamed responses
                'size': len(response.get_data()),
                'memory_delta': memory_delta
            }
        )
        metrics_buffer.add_metric(metric)
    return response
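The architecture diagram also shows a Function Timer for business-logic calls, which this section does not otherwise pin down. A minimal decorator sketch (the name timed and the example function are illustrative, not StarPunk's actual API):

import functools
import time
from datetime import datetime

def timed(operation: str):
    """Record a 'function' metric for each call (sketch only)."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if not config.PERF_MONITORING_ENABLED:
                return fn(*args, **kwargs)
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                metrics_buffer.add_metric(PerformanceMetric(
                    timestamp=datetime.now(),
                    category='function',
                    operation=operation,
                    duration_ms=(time.perf_counter() - start) * 1000,
                ))
        return wrapper
    return decorator

# Usage (hypothetical business-logic function):
@timed('render_post')
def render_post(post_id):
    ...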
Memory Monitoring
import resource
import threading
import time
from datetime import datetime

class MemoryMonitor:
    """Background thread for memory monitoring"""

    def __init__(self):
        self.running = False
        self.thread = None
        self.high_water_mark = 0

    def start(self):
        """Start memory monitoring"""
        if not config.PERF_MEMORY_TRACKING:
            return
        self.running = True
        self.thread = threading.Thread(target=self._monitor, daemon=True)
        self.thread.start()

    def stop(self):
        """Signal the monitor thread to exit"""
        self.running = False

    def _monitor(self):
        """Sample memory usage on a fixed interval"""
        while self.running:
            memory_mb = get_memory_usage()
            self.high_water_mark = max(self.high_water_mark, memory_mb)

            metric = PerformanceMetric(
                timestamp=datetime.now(),
                category='memory',
                operation='rss',
                value=memory_mb,
                metadata={
                    'high_water_mark': self.high_water_mark
                }
            )
            metrics_buffer.add_metric(metric)
            time.sleep(10)  # Sample every 10 seconds

def get_memory_usage() -> float:
    """Get current RSS in MB.

    Reads /proc/self/status on Linux. The fallback uses ru_maxrss, which
    is the *peak* RSS (reported in KB on Linux), not the current value.
    """
    try:
        with open('/proc/self/status') as f:
            for line in f:
                if line.startswith('VmRSS:'):
                    return int(line.split()[1]) / 1024  # kB -> MB
    except OSError:
        pass
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_maxrss / 1024  # Convert KB to MB (Linux)
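A minimal startup hook, assuming a module-level instance started alongside the app (illustrative wiring):

# Daemon thread exits automatically when the process does
memory_monitor = MemoryMonitor()
memory_monitor.start()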
Performance Dashboard
Dashboard Route
@app.route('/admin/performance')
@require_admin
def performance_dashboard():
    """Display performance metrics"""
    if not config.PERF_MONITORING_ENABLED:
        return render_template('admin/performance_disabled.html')

    summary = metrics_buffer.get_summary()
    slow_queries = list(metrics_buffer.slow_queries)
    memory_data = get_memory_graph_data()

    return render_template(
        'admin/performance.html',
        summary=summary,
        slow_queries=slow_queries,
        memory_data=memory_data,
        current_memory=get_memory_usage(),  # Referenced by the overview stats
        uptime=get_uptime(),
        config={
            'slow_threshold': config.PERF_SLOW_QUERY_THRESHOLD,
            'monitoring_enabled': config.PERF_MONITORING_ENABLED,
            'memory_tracking': config.PERF_MEMORY_TRACKING
        }
    )
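The helpers get_uptime() and get_memory_graph_data() are referenced but not specified elsewhere; hedged sketches, assuming the process start time is captured at boot:

from datetime import datetime

APP_START_TIME = datetime.now()  # Assumed to be set at process start

def get_uptime() -> str:
    """Human-readable process uptime (sketch)"""
    return str(datetime.now() - APP_START_TIME).split('.')[0]

def get_memory_graph_data() -> list:
    """(ISO timestamp, MB) pairs from the last 15 minutes of memory samples"""
    return [
        (m.timestamp.isoformat(), m.value)
        for m in metrics_buffer.get_recent(900)
        if m.category == 'memory'
    ]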
Dashboard Template Structure
<div class="performance-dashboard">
  <h2>Performance Monitoring</h2>

  <!-- Overview Stats -->
  <div class="stats-grid">
    <div class="stat">
      <h3>Uptime</h3>
      <p>{{ uptime }}</p>
    </div>
    <div class="stat">
      <h3>Total Requests</h3>
      <p>{{ summary.http.count }}</p>
    </div>
    <div class="stat">
      <h3>Avg Response Time</h3>
      <p>{{ summary.http.avg_ms|round(2) }}ms</p>
    </div>
    <div class="stat">
      <h3>Memory Usage</h3>
      <p>{{ current_memory }}MB</p>
    </div>
  </div>

  <!-- Slow Queries -->
  <div class="slow-queries">
    <h3>Slow Queries (>{{ config.slow_threshold }}s)</h3>
    <table>
      <thead>
        <tr>
          <th>Time</th>
          <th>Duration</th>
          <th>Query</th>
        </tr>
      </thead>
      <tbody>
        {% for query in slow_queries %}
        <tr>
          <td>{{ query.timestamp|timeago }}</td>
          <td>{{ query.duration_ms|round(2) }}ms</td>
          <td><code>{{ query.metadata.query|truncate(100) }}</code></td>
        </tr>
        {% endfor %}
      </tbody>
    </table>
  </div>

  <!-- Endpoint Performance -->
  <div class="endpoint-performance">
    <h3>Endpoint Performance</h3>
    <table>
      <thead>
        <tr>
          <th>Endpoint</th>
          <th>Calls</th>
          <th>Avg (ms)</th>
          <th>P95 (ms)</th>
          <th>P99 (ms)</th>
        </tr>
      </thead>
      <tbody>
        {% for endpoint, stats in summary.endpoints.items() %}
        <tr>
          <td>{{ endpoint }}</td>
          <td>{{ stats.count }}</td>
          <td>{{ stats.avg_ms|round(2) }}</td>
          <td>{{ stats.p95_ms|round(2) }}</td>
          <td>{{ stats.p99_ms|round(2) }}</td>
        </tr>
        {% endfor %}
      </tbody>
    </table>
  </div>

  <!-- Memory Graph -->
  <div class="memory-graph">
    <h3>Memory Usage (Last 15 Minutes)</h3>
    <canvas id="memory-chart"></canvas>
  </div>
</div>
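The template uses a timeago filter, which is not a Jinja built-in; a hedged sketch of registering one with Flask:

from datetime import datetime

@app.template_filter('timeago')
def timeago(timestamp: datetime) -> str:
    """Rough relative-time formatting for dashboard rows (sketch only)"""
    seconds = (datetime.now() - timestamp).total_seconds()
    if seconds < 60:
        return f"{int(seconds)}s ago"
    if seconds < 3600:
        return f"{int(seconds // 60)}m ago"
    return f"{int(seconds // 3600)}h ago"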
Configuration Options
# Performance monitoring configuration
PERF_MONITORING_ENABLED = Config.get_bool("STARPUNK_PERF_MONITORING_ENABLED", False)
PERF_SLOW_QUERY_THRESHOLD = Config.get_float("STARPUNK_PERF_SLOW_QUERY_THRESHOLD", 1.0)
PERF_LOG_QUERIES = Config.get_bool("STARPUNK_PERF_LOG_QUERIES", False)
PERF_MEMORY_TRACKING = Config.get_bool("STARPUNK_PERF_MEMORY_TRACKING", False)
PERF_BUFFER_SIZE = Config.get_int("STARPUNK_PERF_BUFFER_SIZE", 1000)
PERF_SAMPLE_RATE = Config.get_float("STARPUNK_PERF_SAMPLE_RATE", 1.0)
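The Config.get_* helpers are assumed to read environment variables with typed defaults; a minimal sketch, not necessarily StarPunk's actual implementation:

import os

class Config:
    """Typed environment-variable accessors (illustrative sketch)"""

    @staticmethod
    def get_bool(name: str, default: bool) -> bool:
        raw = os.environ.get(name)
        return default if raw is None else raw.lower() in ('1', 'true', 'yes')

    @staticmethod
    def get_float(name: str, default: float) -> float:
        raw = os.environ.get(name)
        return default if raw is None else float(raw)

    @staticmethod
    def get_int(name: str, default: int) -> int:
        raw = os.environ.get(name)
        return default if raw is None else int(raw)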
Testing Strategy
Unit Tests
- Metric collection and storage
- Circular buffer behavior
- Summary statistics calculation
- Memory monitoring functions
- Query monitoring callbacks
Integration Tests
- End-to-end request monitoring
- Slow query detection
- Memory leak detection
- Dashboard rendering
- Performance overhead measurement
Performance Tests
def test_monitoring_overhead():
    """Verify monitoring overhead is <1%"""
    # Baseline without monitoring
    config.PERF_MONITORING_ENABLED = False
    baseline_time = measure_operation_time()

    # With monitoring
    config.PERF_MONITORING_ENABLED = True
    monitored_time = measure_operation_time()

    overhead = (monitored_time - baseline_time) / baseline_time
    assert overhead < 0.01  # Less than 1%
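measure_operation_time is left undefined above; one possible shape, averaging requests through Flask's test client (the / route and iteration count are assumptions):

import time

def measure_operation_time(iterations: int = 1000) -> float:
    """Average seconds per request against the test client (sketch)"""
    client = app.test_client()
    start = time.perf_counter()
    for _ in range(iterations):
        client.get('/')
    return (time.perf_counter() - start) / iterations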
Security Considerations
- Authentication: Dashboard requires admin access
- Query Sanitization: Don't log sensitive query parameters (a redaction sketch follows this list)
- Rate Limiting: Prevent dashboard DoS
- Data Retention: Automatic cleanup of old metrics
- Configuration: Validate all config values
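On the sanitization point: the query-monitoring sketch above logs only the SQL text, never the bound parameter values. If parameter logging is ever added, values should be redacted first; a hedged sketch:

def redact_params(params) -> list:
    """Replace bound parameter values with placeholders before logging"""
    return ['<redacted>' for _ in (params or ())]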
Performance Impact
Expected Overhead
- Request timing: <0.1ms per request
- Query monitoring: <0.5ms per query
- Memory tracking: <1% CPU (background thread)
- Dashboard rendering: <50ms
- Total overhead: <1% when fully enabled
Optimization Strategies
- Use sampling for high-frequency operations (see the sketch after this list)
- Lazy calculation of statistics
- Efficient circular buffer implementation
- Minimal string operations in hot path
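The sampling strategy can reuse the PERF_SAMPLE_RATE setting defined above, which is otherwise unused in this specification; a hedged sketch:

import random

def should_sample() -> bool:
    """Record only a PERF_SAMPLE_RATE fraction of high-frequency events"""
    return random.random() < config.PERF_SAMPLE_RATE

# In a hot path:
if config.PERF_MONITORING_ENABLED and should_sample():
    metrics_buffer.add_metric(metric)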
Documentation Requirements
Administrator Guide
- How to enable monitoring
- Understanding metrics
- Identifying performance issues
- Tuning configuration
Dashboard User Guide
- Navigating the dashboard
- Interpreting metrics
- Finding slow queries
- Memory usage patterns
Acceptance Criteria
- ✅ Timing instrumentation for all key operations
- ✅ Database query performance logging
- ✅ Slow query detection with configurable threshold
- ✅ Memory usage tracking
- ✅ Performance dashboard at /admin/performance
- ✅ Monitoring overhead <1%
- ✅ Zero impact when disabled
- ✅ Circular buffer limits memory usage
- ✅ All metrics clearly documented
- ✅ Security review passed