# Performance Monitoring Foundation Specification
## Overview
The performance monitoring foundation provides operators with visibility into StarPunk's runtime behavior, helping identify bottlenecks, track resource usage, and ensure optimal performance in production.
## Requirements
### Functional Requirements
1. **Timing Instrumentation**
   - Measure execution time for key operations
   - Track request processing duration
   - Monitor database query execution time
   - Measure template rendering time
   - Track static file serving time

2. **Database Performance Logging**
   - Log all queries when enabled
   - Detect and warn about slow queries
   - Track connection pool usage
   - Monitor transaction duration
   - Count query frequency by type

3. **Memory Usage Tracking**
   - Monitor process RSS memory
   - Track memory growth over time
   - Detect memory leaks
   - Measure per-request memory delta
   - Record the memory high-water mark

4. **Performance Dashboard**
   - Real-time metrics display
   - Historical data (last 15 minutes)
   - Slow query log
   - Memory usage visualization
   - Endpoint performance table

### Non-Functional Requirements
1. **Performance Impact**
   - Monitoring overhead <1% when enabled
   - Zero impact when disabled
   - Efficient memory usage (<1MB for metrics)
   - No blocking operations

2. **Usability**
   - Simple enable/disable via configuration
   - Clear, actionable metrics
   - Self-explanatory dashboard
   - No external dependencies

## Design
### Architecture
```
┌──────────────────────────────────────┐
│             HTTP Request             │
│                  ↓                   │
│        Performance Middleware        │
│            (start timer)             │
│                  ↓                   │
│        ┌─────────────────┐           │
│        │ Request Handler │           │
│        │        ↓        │           │
│        │ Database Layer  │←── Query Monitor
│        │        ↓        │           │
│        │ Business Logic  │←── Function Timer
│        │        ↓        │           │
│        │ Response Build  │           │
│        └─────────────────┘           │
│                  ↓                   │
│        Performance Middleware        │
│            (stop timer)              │
│                  ↓                   │
│   Metrics Collector ←── Memory Monitor
│                  ↓                   │
│           Circular Buffer            │
│                  ↓                   │
│           Admin Dashboard            │
└──────────────────────────────────────┘
```
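
A minimal wiring sketch for the collector path, assuming module-level singletons named `metrics_buffer` and `memory_monitor` (the names the code samples below rely on; both classes are defined in the sections that follow):

```python
# Module-level singletons shared by all the instrumentation below
metrics_buffer = MetricsBuffer(max_size=config.PERF_BUFFER_SIZE)
memory_monitor = MemoryMonitor()


def init_performance_monitoring(app):
    """Start background monitoring when the app boots (sketch)."""
    if config.PERF_MONITORING_ENABLED:
        memory_monitor.start()
```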
### Data Model
```python
from collections import defaultdict, deque
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Any, Dict, List, Optional


@dataclass
class PerformanceMetric:
    """Single performance measurement"""
    timestamp: datetime
    category: str                 # 'http', 'db', 'function', 'memory'
    operation: str                # Specific operation name
    duration_ms: Optional[float]  # For timed operations
    value: Optional[float]        # For point-in-time measurements
    metadata: Dict[str, Any]      # Additional context


class MetricsBuffer:
    """Circular buffer for metrics storage"""

    def __init__(self, max_size: int = 1000):
        self.metrics = deque(maxlen=max_size)
        self.slow_queries = deque(maxlen=100)

    def add_metric(self, metric: PerformanceMetric):
        """Add metric to buffer"""
        self.metrics.append(metric)

        # Special handling for slow queries
        # (threshold is in seconds; duration_ms is in milliseconds)
        if (metric.category == 'db' and
                metric.duration_ms is not None and
                metric.duration_ms > config.PERF_SLOW_QUERY_THRESHOLD * 1000):
            self.slow_queries.append(metric)

    def get_recent(self, seconds: int = 900) -> List[PerformanceMetric]:
        """Get metrics from the last N seconds"""
        cutoff = datetime.now() - timedelta(seconds=seconds)
        return [m for m in self.metrics if m.timestamp > cutoff]

    def get_summary(self) -> Dict[str, Any]:
        """Get summary statistics"""
        recent = self.get_recent()

        # Group by category and operation
        summary = defaultdict(lambda: {
            'count': 0,
            'total_ms': 0,
            'avg_ms': 0,
            'max_ms': 0,
            'p95_ms': 0,
            'p99_ms': 0
        })

        # Calculate statistics...
        return dict(summary)
```
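
The statistics calculation above is left elided. A minimal sketch of how the per-operation aggregates could be filled in, assuming nearest-rank percentiles and a `'category:operation'` key format (both illustrative choices, not fixed by this spec):

```python
def _calculate_statistics(recent, summary):
    """Fill the summary dict from recent metrics (sketch)."""
    by_op = defaultdict(list)
    for m in recent:
        if m.duration_ms is not None:
            by_op[f"{m.category}:{m.operation}"].append(m.duration_ms)

    for key, durations in by_op.items():
        durations.sort()
        n = len(durations)
        stats = summary[key]
        stats['count'] = n
        stats['total_ms'] = sum(durations)
        stats['avg_ms'] = stats['total_ms'] / n
        stats['max_ms'] = durations[-1]
        # Nearest-rank percentiles over the sorted samples
        stats['p95_ms'] = durations[min(n - 1, int(n * 0.95))]
        stats['p99_ms'] = durations[min(n - 1, int(n * 0.99))]
```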
### Instrumentation Implementation
#### Database Query Monitoring
```python
import sqlite3
import time
from contextlib import contextmanager
from datetime import datetime


class MonitoredConnection(sqlite3.Connection):
    """Connection subclass that times every execute() call.

    sqlite3.Connection is a C type whose methods cannot be
    monkey-patched, so we subclass it and select it via the
    factory argument of sqlite3.connect() instead.
    """

    def execute(self, sql, params=()):
        start_time = time.perf_counter()
        result = super().execute(sql, params)
        duration = time.perf_counter() - start_time

        metric = PerformanceMetric(
            timestamp=datetime.now(),
            category='db',
            operation=sql.split()[0].upper(),  # SELECT, INSERT, etc.
            duration_ms=duration * 1000,
            value=None,
            metadata={
                'query': sql if config.PERF_LOG_QUERIES else None,
                'params_count': len(params) if params else 0
            }
        )
        metrics_buffer.add_metric(metric)

        if duration > config.PERF_SLOW_QUERY_THRESHOLD:
            logger.warning(
                "Slow query detected",
                extra={
                    'query': sql,
                    'duration_ms': duration * 1000
                }
            )

        return result


@contextmanager
def monitored_connection():
    """Database connection with monitoring"""
    factory = (MonitoredConnection if config.PERF_MONITORING_ENABLED
               else sqlite3.Connection)
    conn = sqlite3.connect(DATABASE_PATH, factory=factory)
    try:
        yield conn
    finally:
        conn.close()
```
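
A usage sketch (the `notes` table is illustrative):

```python
with monitored_connection() as conn:
    rows = conn.execute(
        "SELECT id, created_at FROM notes ORDER BY created_at DESC LIMIT 10"
    ).fetchall()
```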
#### HTTP Request Monitoring
```python
from datetime import datetime
from flask import g, request
import time


@app.before_request
def start_request_timer():
    """Start timing the request"""
    if config.PERF_MONITORING_ENABLED:
        g.start_time = time.perf_counter()
        g.start_memory = get_memory_usage()


@app.after_request
def end_request_timer(response):
    """End timing and record metrics"""
    if config.PERF_MONITORING_ENABLED and hasattr(g, 'start_time'):
        duration = time.perf_counter() - g.start_time
        memory_delta = get_memory_usage() - g.start_memory

        metric = PerformanceMetric(
            timestamp=datetime.now(),
            category='http',
            operation=f"{request.method} {request.endpoint}",
            duration_ms=duration * 1000,
            value=None,
            metadata={
                'method': request.method,
                'path': request.path,
                'status': response.status_code,
                'size': len(response.get_data()),
                'memory_delta': memory_delta
            }
        )
        metrics_buffer.add_metric(metric)

    return response
```
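
The architecture diagram also shows a Function Timer for business logic, which the samples above don't cover. A minimal decorator sketch (the `timed` name is an assumption):

```python
import functools


def timed(operation: str):
    """Record a 'function' metric for each call of the wrapped callable."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if not config.PERF_MONITORING_ENABLED:
                return fn(*args, **kwargs)
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                metrics_buffer.add_metric(PerformanceMetric(
                    timestamp=datetime.now(),
                    category='function',
                    operation=operation,
                    duration_ms=(time.perf_counter() - start) * 1000,
                    value=None,
                    metadata={}
                ))
        return wrapper
    return decorator
```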
#### Memory Monitoring
```python
from datetime import datetime
import resource
import threading
import time


class MemoryMonitor:
    """Background thread for memory monitoring"""

    def __init__(self):
        self.running = False
        self.thread = None
        self.high_water_mark = 0

    def start(self):
        """Start memory monitoring"""
        if not config.PERF_MEMORY_TRACKING:
            return

        self.running = True
        self.thread = threading.Thread(target=self._monitor, daemon=True)
        self.thread.start()

    def _monitor(self):
        """Sample memory usage every 10 seconds"""
        while self.running:
            memory_mb = get_memory_usage()
            self.high_water_mark = max(self.high_water_mark, memory_mb)

            metric = PerformanceMetric(
                timestamp=datetime.now(),
                category='memory',
                operation='rss',
                duration_ms=None,
                value=memory_mb,
                metadata={
                    'high_water_mark': self.high_water_mark
                }
            )
            metrics_buffer.add_metric(metric)

            time.sleep(10)


def get_memory_usage() -> float:
    """Get peak RSS in MB.

    Note: ru_maxrss is the process's peak (not current) RSS,
    reported in kilobytes on Linux (bytes on macOS).
    """
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_maxrss / 1024  # Convert KB to MB on Linux
```
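
The requirements call for leak detection, which the monitor above does not implement directly. A crude sketch built on the same samples (the helper name and 50 MB threshold are assumptions):

```python
def detect_memory_growth(window_seconds: int = 900,
                         threshold_mb: float = 50.0) -> bool:
    """Flag sustained RSS growth across the sample window."""
    samples = [m.value for m in metrics_buffer.get_recent(window_seconds)
               if m.category == 'memory' and m.value is not None]
    if len(samples) < 2:
        return False
    return samples[-1] - samples[0] > threshold_mb
```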
### Performance Dashboard
#### Dashboard Route
```python
@app.route('/admin/performance')
@require_admin
def performance_dashboard():
    """Display performance metrics"""
    if not config.PERF_MONITORING_ENABLED:
        return render_template('admin/performance_disabled.html')

    summary = metrics_buffer.get_summary()
    slow_queries = list(metrics_buffer.slow_queries)
    memory_data = get_memory_graph_data()

    return render_template(
        'admin/performance.html',
        summary=summary,
        slow_queries=slow_queries,
        memory_data=memory_data,
        current_memory=get_memory_usage(),
        uptime=get_uptime(),
        config={
            'slow_threshold': config.PERF_SLOW_QUERY_THRESHOLD,
            'monitoring_enabled': config.PERF_MONITORING_ENABLED,
            'memory_tracking': config.PERF_MEMORY_TRACKING
        }
    )
```
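
`get_uptime()` and `get_memory_graph_data()` are assumed helpers; one plausible shape for the latter, feeding the memory chart from the buffer:

```python
def get_memory_graph_data():
    """(timestamp, MB) points for the last 15 minutes (sketch)."""
    return [
        {'t': m.timestamp.isoformat(), 'mb': m.value}
        for m in metrics_buffer.get_recent(900)
        if m.category == 'memory'
    ]
```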
#### Dashboard Template Structure
```html
<div class="performance-dashboard">
  <h2>Performance Monitoring</h2>

  <!-- Overview Stats -->
  <div class="stats-grid">
    <div class="stat">
      <h3>Uptime</h3>
      <p>{{ uptime }}</p>
    </div>
    <div class="stat">
      <h3>Total Requests</h3>
      <p>{{ summary.http.count }}</p>
    </div>
    <div class="stat">
      <h3>Avg Response Time</h3>
      <p>{{ summary.http.avg_ms|round(2) }}ms</p>
    </div>
    <div class="stat">
      <h3>Memory Usage</h3>
      <p>{{ current_memory }}MB</p>
    </div>
  </div>

  <!-- Slow Queries -->
  <div class="slow-queries">
    <h3>Slow Queries (>{{ config.slow_threshold }}s)</h3>
    <table>
      <thead>
        <tr>
          <th>Time</th>
          <th>Duration</th>
          <th>Query</th>
        </tr>
      </thead>
      <tbody>
        {% for query in slow_queries %}
        <tr>
          <td>{{ query.timestamp|timeago }}</td>
          <td>{{ query.duration_ms|round(2) }}ms</td>
          <td><code>{{ query.metadata.query|truncate(100) }}</code></td>
        </tr>
        {% endfor %}
      </tbody>
    </table>
  </div>

  <!-- Endpoint Performance -->
  <div class="endpoint-performance">
    <h3>Endpoint Performance</h3>
    <table>
      <thead>
        <tr>
          <th>Endpoint</th>
          <th>Calls</th>
          <th>Avg (ms)</th>
          <th>P95 (ms)</th>
          <th>P99 (ms)</th>
        </tr>
      </thead>
      <tbody>
        {% for endpoint, stats in summary.endpoints.items() %}
        <tr>
          <td>{{ endpoint }}</td>
          <td>{{ stats.count }}</td>
          <td>{{ stats.avg_ms|round(2) }}</td>
          <td>{{ stats.p95_ms|round(2) }}</td>
          <td>{{ stats.p99_ms|round(2) }}</td>
        </tr>
        {% endfor %}
      </tbody>
    </table>
  </div>

  <!-- Memory Graph -->
  <div class="memory-graph">
    <h3>Memory Usage (Last 15 Minutes)</h3>
    <canvas id="memory-chart"></canvas>
  </div>
</div>
```
### Configuration Options
```python
# Performance monitoring configuration
PERF_MONITORING_ENABLED = Config.get_bool("STARPUNK_PERF_MONITORING_ENABLED", False)
PERF_SLOW_QUERY_THRESHOLD = Config.get_float("STARPUNK_PERF_SLOW_QUERY_THRESHOLD", 1.0)  # seconds
PERF_LOG_QUERIES = Config.get_bool("STARPUNK_PERF_LOG_QUERIES", False)
PERF_MEMORY_TRACKING = Config.get_bool("STARPUNK_PERF_MEMORY_TRACKING", False)
PERF_BUFFER_SIZE = Config.get_int("STARPUNK_PERF_BUFFER_SIZE", 1000)
PERF_SAMPLE_RATE = Config.get_float("STARPUNK_PERF_SAMPLE_RATE", 1.0)  # fraction of operations recorded
```
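
A sketch of the assumed `Config` accessors, reading from environment variables as the STARPUNK_* names suggest (the real implementation may differ):

```python
import os


class Config:
    """Assumed accessors: values come from environment variables."""

    @staticmethod
    def get_bool(name: str, default: bool) -> bool:
        raw = os.environ.get(name)
        return default if raw is None else raw.lower() in ('1', 'true', 'yes')

    @staticmethod
    def get_float(name: str, default: float) -> float:
        raw = os.environ.get(name)
        return default if raw is None else float(raw)

    @staticmethod
    def get_int(name: str, default: int) -> int:
        raw = os.environ.get(name)
        return default if raw is None else int(raw)
```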
## Testing Strategy
### Unit Tests

1. Metric collection and storage
2. Circular buffer behavior (see the sketch below)
3. Summary statistics calculation
4. Memory monitoring functions
5. Query timing wrapper
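
A sketch of the circular-buffer test, using the types from the Data Model section:

```python
def test_buffer_evicts_oldest_metric():
    """Oldest entries fall off once max_size is exceeded."""
    buffer = MetricsBuffer(max_size=2)
    for i in range(3):
        buffer.add_metric(PerformanceMetric(
            timestamp=datetime.now(),
            category='function',
            operation=f'op{i}',
            duration_ms=1.0,
            value=None,
            metadata={}
        ))
    assert len(buffer.metrics) == 2
    assert buffer.metrics[0].operation == 'op1'
```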
### Integration Tests

1. End-to-end request monitoring
2. Slow query detection
3. Memory leak detection
4. Dashboard rendering
5. Performance overhead measurement
### Performance Tests
```python
def test_monitoring_overhead():
    """Verify monitoring overhead is <1%"""
    # Baseline without monitoring
    config.PERF_MONITORING_ENABLED = False
    baseline_time = measure_operation_time()

    # With monitoring
    config.PERF_MONITORING_ENABLED = True
    monitored_time = measure_operation_time()

    overhead = (monitored_time - baseline_time) / baseline_time
    assert overhead < 0.01  # Less than 1%
```
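
`measure_operation_time()` is an assumed fixture; one way it could be defined, timing a representative read through the monitored connection:

```python
def measure_operation_time(iterations: int = 1000) -> float:
    """Average wall-clock seconds for a representative operation."""
    start = time.perf_counter()
    for _ in range(iterations):
        with monitored_connection() as conn:
            conn.execute("SELECT 1").fetchone()
    return (time.perf_counter() - start) / iterations
```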
## Security Considerations

1. **Authentication**: Dashboard requires admin access
2. **Query Sanitization**: Don't log sensitive query parameters
3. **Rate Limiting**: Prevent dashboard DoS
4. **Data Retention**: Automatic cleanup of old metrics
5. **Configuration**: Validate all config values
## Performance Impact

### Expected Overhead

- Request timing: <0.1ms per request
- Query monitoring: <0.5ms per query
- Memory tracking: <1% CPU (background thread)
- Dashboard rendering: <50ms
- Total overhead: <1% when fully enabled

### Optimization Strategies

1. Use sampling for high-frequency operations (see the sketch below)
2. Lazy calculation of statistics
3. Efficient circular buffer implementation
4. Minimal string operations in the hot path
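
`PERF_SAMPLE_RATE` is defined in the configuration but not wired up in the sketches above; a minimal example of how it could gate collection (the `should_sample` helper is hypothetical):

```python
import random


def should_sample() -> bool:
    """Probabilistically decide whether to record a metric.

    At PERF_SAMPLE_RATE = 1.0 every operation is recorded;
    at 0.1 roughly one in ten is.
    """
    return random.random() < config.PERF_SAMPLE_RATE
```
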
## Documentation Requirements

### Administrator Guide

- How to enable monitoring
- Understanding metrics
- Identifying performance issues
- Tuning configuration

### Dashboard User Guide

- Navigating the dashboard
- Interpreting metrics
- Finding slow queries
- Memory usage patterns

## Acceptance Criteria

1. ✅ Timing instrumentation for all key operations
2. ✅ Database query performance logging
3. ✅ Slow query detection with configurable threshold
4. ✅ Memory usage tracking
5. ✅ Performance dashboard at /admin/performance
6. ✅ Monitoring overhead <1%
7. ✅ Zero impact when disabled
8. ✅ Circular buffer limits memory usage
9. ✅ All metrics clearly documented
10. ✅ Security review passed