docs: Fix ADR numbering conflicts and create comprehensive documentation indices
This commit resolves all documentation issues identified in the comprehensive review:

CRITICAL FIXES:
- Renumbered duplicate ADRs to eliminate conflicts:
  * ADR-022-migration-race-condition-fix → ADR-037
  * ADR-022-syndication-formats → ADR-038
  * ADR-023-microformats2-compliance → ADR-040
  * ADR-027-versioning-strategy-for-authorization-removal → ADR-042
  * ADR-030-CORRECTED-indieauth-endpoint-discovery → ADR-043
  * ADR-031-endpoint-discovery-implementation → ADR-044
- Updated all cross-references to renumbered ADRs in:
  * docs/projectplan/ROADMAP.md
  * docs/reports/v1.0.0-rc.5-migration-race-condition-implementation.md
  * docs/reports/2025-11-24-endpoint-discovery-analysis.md
  * docs/decisions/ADR-043-CORRECTED-indieauth-endpoint-discovery.md
  * docs/decisions/ADR-044-endpoint-discovery-implementation.md
- Updated README.md version from 1.0.0 to 1.1.0
- Tracked ADR-021-indieauth-provider-strategy.md in git

DOCUMENTATION IMPROVEMENTS:
- Created comprehensive INDEX.md files for all docs/ subdirectories:
  * docs/architecture/INDEX.md (28 documents indexed)
  * docs/decisions/INDEX.md (55 ADRs indexed with topical grouping)
  * docs/design/INDEX.md (phase plans and feature designs)
  * docs/standards/INDEX.md (9 standards with compliance checklist)
  * docs/reports/INDEX.md (57 implementation reports)
  * docs/deployment/INDEX.md (deployment guides)
  * docs/examples/INDEX.md (code samples and usage patterns)
  * docs/migration/INDEX.md (version migration guides)
  * docs/releases/INDEX.md (release documentation)
  * docs/reviews/INDEX.md (architectural reviews)
  * docs/security/INDEX.md (security documentation)
- Updated CLAUDE.md with complete folder descriptions including:
  * docs/migration/
  * docs/releases/
  * docs/security/

VERIFICATION:
- All ADR numbers now sequential and unique (50 total ADRs)
- No duplicate ADR numbers remain
- All cross-references updated and verified
- Documentation structure consistent and well-organized

These changes improve documentation discoverability and maintainability, and ensure proper version tracking. All index files follow a consistent format with clear navigation guidance.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
docs/design/v1.1.1/performance-monitoring-spec.md (new file, 487 lines)
@@ -0,0 +1,487 @@
# Performance Monitoring Foundation Specification

## Overview

The performance monitoring foundation provides operators with visibility into StarPunk's runtime behavior, helping identify bottlenecks, track resource usage, and ensure optimal performance in production.

## Requirements

### Functional Requirements

1. **Timing Instrumentation**
   - Measure execution time for key operations
   - Track request processing duration
   - Monitor database query execution time
   - Measure template rendering time
   - Track static file serving time

2. **Database Performance Logging**
   - Log all queries when enabled
   - Detect and warn about slow queries
   - Track connection pool usage
   - Monitor transaction duration
   - Count query frequency by type

3. **Memory Usage Tracking**
   - Monitor process RSS memory
   - Track memory growth over time
   - Detect memory leaks
   - Per-request memory delta
   - Memory high water mark

4. **Performance Dashboard**
   - Real-time metrics display
   - Historical data (last 15 minutes)
   - Slow query log
   - Memory usage visualization
   - Endpoint performance table

### Non-Functional Requirements

1. **Performance Impact**
   - Monitoring overhead <1% when enabled
   - Zero impact when disabled
   - Efficient memory usage (<1MB for metrics)
   - No blocking operations

2. **Usability**
   - Simple enable/disable via configuration
   - Clear, actionable metrics
   - Self-explanatory dashboard
   - No external dependencies

## Design

### Architecture

```
┌──────────────────────────────────────┐
│            HTTP Request              │
│                 ↓                    │
│       Performance Middleware         │
│          (start timer)               │
│                 ↓                    │
│       ┌─────────────────┐            │
│       │ Request Handler │            │
│       │        ↓        │            │
│       │ Database Layer  │←── Query Monitor
│       │        ↓        │            │
│       │ Business Logic  │←── Function Timer
│       │        ↓        │            │
│       │ Response Build  │            │
│       └─────────────────┘            │
│                 ↓                    │
│       Performance Middleware         │
│           (stop timer)               │
│                 ↓                    │
│       Metrics Collector ←── Memory Monitor
│                 ↓                    │
│          Circular Buffer             │
│                 ↓                    │
│          Admin Dashboard             │
└──────────────────────────────────────┘
```
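
The diagram's Function Timer is not otherwise specified in this document. The following is a minimal sketch of what such a decorator could look like; it assumes the module-level `config` and `metrics_buffer` objects used throughout this spec and the `PerformanceMetric` dataclass defined in the next section (the decorator name `timed` is illustrative):

```python
import functools
import time
from datetime import datetime


def timed(operation: str):
    """Decorator that records a 'function' metric for each call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if not config.PERF_MONITORING_ENABLED:
                return func(*args, **kwargs)  # no work when disabled
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                metrics_buffer.add_metric(PerformanceMetric(
                    timestamp=datetime.now(),
                    category='function',
                    operation=operation,
                    duration_ms=(time.perf_counter() - start) * 1000,
                ))
        return wrapper
    return decorator
```

Applying `@timed("render_note")` to a function would then record one 'function' metric per call when monitoring is enabled.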

### Data Model

```python
from dataclasses import dataclass, field
from typing import Optional, Dict, Any, List
from datetime import datetime, timedelta
from collections import deque, defaultdict


@dataclass
class PerformanceMetric:
    """Single performance measurement"""
    timestamp: datetime
    category: str  # 'http', 'db', 'function', 'memory'
    operation: str  # Specific operation name
    duration_ms: Optional[float] = None  # For timed operations
    value: Optional[float] = None  # For point-in-time measurements
    metadata: Dict[str, Any] = field(default_factory=dict)  # Additional context


class MetricsBuffer:
    """Circular buffer for metrics storage"""

    def __init__(self, max_size: int = 1000):
        self.metrics = deque(maxlen=max_size)
        self.slow_queries = deque(maxlen=100)

    def add_metric(self, metric: PerformanceMetric):
        """Add metric to buffer"""
        self.metrics.append(metric)

        # Special handling for slow queries
        if (metric.category == 'db' and
                metric.duration_ms is not None and
                metric.duration_ms > config.PERF_SLOW_QUERY_THRESHOLD * 1000):
            self.slow_queries.append(metric)

    def get_recent(self, seconds: int = 900) -> List[PerformanceMetric]:
        """Get metrics from last N seconds"""
        cutoff = datetime.now() - timedelta(seconds=seconds)
        return [m for m in self.metrics if m.timestamp > cutoff]

    def get_summary(self) -> Dict[str, Any]:
        """Get summary statistics"""
        recent = self.get_recent()

        # Group by category and operation
        summary = defaultdict(lambda: {
            'count': 0,
            'total_ms': 0,
            'avg_ms': 0,
            'max_ms': 0,
            'p95_ms': 0,
            'p99_ms': 0
        })

        # Calculate statistics...
        return dict(summary)
```
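
The statistics calculation in `get_summary()` is left elided above. One possible implementation, shown here as a sketch using nearest-rank percentiles (the helper name `summarize` is illustrative, not part of the spec):

```python
from collections import defaultdict
from typing import Any, Dict, List


def summarize(metrics: List[PerformanceMetric]) -> Dict[str, Any]:
    """Group timed metrics by category:operation and compute statistics."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for m in metrics:
        if m.duration_ms is not None:
            groups[f"{m.category}:{m.operation}"].append(m.duration_ms)

    summary = {}
    for key, durations in groups.items():
        durations.sort()
        last = len(durations) - 1
        summary[key] = {
            'count': len(durations),
            'total_ms': sum(durations),
            'avg_ms': sum(durations) / len(durations),
            'max_ms': durations[-1],
            # Nearest-rank percentiles; adequate for a small in-memory buffer
            'p95_ms': durations[round(0.95 * last)],
            'p99_ms': durations[round(0.99 * last)],
        }
    return summary
```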

### Instrumentation Implementation

#### Database Query Monitoring
```python
import sqlite3
import time
from contextlib import contextmanager
from datetime import datetime


class MonitoredConnection(sqlite3.Connection):
    """Connection subclass that times every execute() call.

    sqlite3.Connection does not allow attribute assignment, so execute()
    is overridden via a connection factory rather than monkey-patched.
    """

    def execute(self, sql, params=()):
        start_time = time.perf_counter()
        result = super().execute(sql, params)
        duration = time.perf_counter() - start_time

        metric = PerformanceMetric(
            timestamp=datetime.now(),
            category='db',
            operation=sql.split()[0].upper(),  # SELECT, INSERT, etc.
            duration_ms=duration * 1000,
            metadata={
                'query': sql if config.PERF_LOG_QUERIES else None,
                'params_count': len(params) if params else 0
            }
        )
        metrics_buffer.add_metric(metric)

        if duration > config.PERF_SLOW_QUERY_THRESHOLD:
            logger.warning(
                "Slow query detected",
                extra={
                    'query': sql,
                    'duration_ms': duration * 1000
                }
            )

        return result


@contextmanager
def monitored_connection():
    """Database connection with monitoring when enabled"""
    factory = (MonitoredConnection if config.PERF_MONITORING_ENABLED
               else sqlite3.Connection)
    conn = sqlite3.connect(DATABASE_PATH, factory=factory)
    try:
        yield conn
    finally:
        conn.close()
```
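
Callers use the monitored connection exactly like a plain one; the instrumentation is transparent (the `notes` table below is illustrative):

```python
# Usage sketch: queries are timed and logged without caller changes.
with monitored_connection() as conn:
    rows = conn.execute("SELECT id, title FROM notes LIMIT 10").fetchall()
```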

#### HTTP Request Monitoring
```python
from datetime import datetime
from flask import g, request
import time


@app.before_request
def start_request_timer():
    """Start timing the request"""
    if config.PERF_MONITORING_ENABLED:
        g.start_time = time.perf_counter()
        g.start_memory = get_memory_usage()


@app.after_request
def end_request_timer(response):
    """End timing and record metrics"""
    if config.PERF_MONITORING_ENABLED and hasattr(g, 'start_time'):
        duration = time.perf_counter() - g.start_time
        memory_delta = get_memory_usage() - g.start_memory

        metric = PerformanceMetric(
            timestamp=datetime.now(),
            category='http',
            operation=f"{request.method} {request.endpoint}",
            duration_ms=duration * 1000,
            metadata={
                'method': request.method,
                'path': request.path,
                'status': response.status_code,
                'size': len(response.get_data()),
                'memory_delta': memory_delta
            }
        )
        metrics_buffer.add_metric(metric)

    return response
```

#### Memory Monitoring
```python
import resource
import threading
import time
from datetime import datetime


class MemoryMonitor:
    """Background thread for memory monitoring"""

    def __init__(self):
        self.running = False
        self.thread = None
        self.high_water_mark = 0

    def start(self):
        """Start memory monitoring"""
        if not config.PERF_MEMORY_TRACKING:
            return

        self.running = True
        self.thread = threading.Thread(target=self._monitor)
        self.thread.daemon = True
        self.thread.start()

    def _monitor(self):
        """Monitor memory usage"""
        while self.running:
            memory_mb = get_memory_usage()
            self.high_water_mark = max(self.high_water_mark, memory_mb)

            metric = PerformanceMetric(
                timestamp=datetime.now(),
                category='memory',
                operation='rss',
                value=memory_mb,
                metadata={
                    'high_water_mark': self.high_water_mark
                }
            )
            metrics_buffer.add_metric(metric)

            time.sleep(10)  # Check every 10 seconds


def get_memory_usage() -> float:
    """Get peak RSS memory usage in MB.

    Note: ru_maxrss is the process high-water mark rather than the
    instantaneous RSS, and is reported in KiB on Linux (bytes on macOS).
    """
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_maxrss / 1024  # KiB to MB on Linux
```
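
The spec does not say where the monitor is started. A sketch of one way to wire it into application startup (the `memory_monitor` singleton and `init_performance_monitoring()` helper are assumptions, not existing StarPunk names):

```python
# Module-level singleton so the dashboard can read high_water_mark later.
memory_monitor = MemoryMonitor()


def init_performance_monitoring(app):
    """Start background monitoring when the app boots."""
    if config.PERF_MONITORING_ENABLED:
        memory_monitor.start()  # no-op unless PERF_MEMORY_TRACKING is set
```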

### Performance Dashboard

#### Dashboard Route
```python
@app.route('/admin/performance')
@require_admin
def performance_dashboard():
    """Display performance metrics"""
    if not config.PERF_MONITORING_ENABLED:
        return render_template('admin/performance_disabled.html')

    summary = metrics_buffer.get_summary()
    slow_queries = list(metrics_buffer.slow_queries)
    memory_data = get_memory_graph_data()

    return render_template(
        'admin/performance.html',
        summary=summary,
        slow_queries=slow_queries,
        memory_data=memory_data,
        current_memory=get_memory_usage(),
        uptime=get_uptime(),
        config={
            'slow_threshold': config.PERF_SLOW_QUERY_THRESHOLD,
            'monitoring_enabled': config.PERF_MONITORING_ENABLED,
            'memory_tracking': config.PERF_MEMORY_TRACKING
        }
    )
```
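
`get_memory_graph_data()` and `get_uptime()` are referenced above but not defined in this spec. A possible sketch, assuming the 'memory' metrics recorded by `MemoryMonitor` and a module-level `APP_START_TIME` set at boot (both helper implementations are illustrative):

```python
import time

APP_START_TIME = time.time()  # assumed to be set when the app starts


def get_uptime() -> str:
    """Human-readable process uptime."""
    seconds = int(time.time() - APP_START_TIME)
    hours, remainder = divmod(seconds, 3600)
    minutes, _ = divmod(remainder, 60)
    return f"{hours}h {minutes}m"


def get_memory_graph_data() -> list:
    """(timestamp, MB) points from the last 15 minutes of 'memory' metrics."""
    return [
        (m.timestamp.isoformat(), m.value)
        for m in metrics_buffer.get_recent(900)
        if m.category == 'memory'
    ]
```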

#### Dashboard Template Structure
```html
<div class="performance-dashboard">
  <h2>Performance Monitoring</h2>

  <!-- Overview Stats -->
  <div class="stats-grid">
    <div class="stat">
      <h3>Uptime</h3>
      <p>{{ uptime }}</p>
    </div>
    <div class="stat">
      <h3>Total Requests</h3>
      <p>{{ summary.http.count }}</p>
    </div>
    <div class="stat">
      <h3>Avg Response Time</h3>
      <p>{{ summary.http.avg_ms|round(2) }}ms</p>
    </div>
    <div class="stat">
      <h3>Memory Usage</h3>
      <p>{{ current_memory }}MB</p>
    </div>
  </div>

  <!-- Slow Queries -->
  <div class="slow-queries">
    <h3>Slow Queries (>{{ config.slow_threshold }}s)</h3>
    <table>
      <thead>
        <tr>
          <th>Time</th>
          <th>Duration</th>
          <th>Query</th>
        </tr>
      </thead>
      <tbody>
        {% for query in slow_queries %}
        <tr>
          <td>{{ query.timestamp|timeago }}</td>
          <td>{{ query.duration_ms|round(2) }}ms</td>
          <td><code>{{ query.metadata.query|truncate(100) }}</code></td>
        </tr>
        {% endfor %}
      </tbody>
    </table>
  </div>

  <!-- Endpoint Performance -->
  <div class="endpoint-performance">
    <h3>Endpoint Performance</h3>
    <table>
      <thead>
        <tr>
          <th>Endpoint</th>
          <th>Calls</th>
          <th>Avg (ms)</th>
          <th>P95 (ms)</th>
          <th>P99 (ms)</th>
        </tr>
      </thead>
      <tbody>
        {% for endpoint, stats in summary.endpoints.items() %}
        <tr>
          <td>{{ endpoint }}</td>
          <td>{{ stats.count }}</td>
          <td>{{ stats.avg_ms|round(2) }}</td>
          <td>{{ stats.p95_ms|round(2) }}</td>
          <td>{{ stats.p99_ms|round(2) }}</td>
        </tr>
        {% endfor %}
      </tbody>
    </table>
  </div>

  <!-- Memory Graph -->
  <div class="memory-graph">
    <h3>Memory Usage (Last 15 Minutes)</h3>
    <canvas id="memory-chart"></canvas>
  </div>
</div>
```

### Configuration Options

```python
# Performance monitoring configuration
PERF_MONITORING_ENABLED = Config.get_bool("STARPUNK_PERF_MONITORING_ENABLED", False)
PERF_SLOW_QUERY_THRESHOLD = Config.get_float("STARPUNK_PERF_SLOW_QUERY_THRESHOLD", 1.0)
PERF_LOG_QUERIES = Config.get_bool("STARPUNK_PERF_LOG_QUERIES", False)
PERF_MEMORY_TRACKING = Config.get_bool("STARPUNK_PERF_MEMORY_TRACKING", False)
PERF_BUFFER_SIZE = Config.get_int("STARPUNK_PERF_BUFFER_SIZE", 1000)
PERF_SAMPLE_RATE = Config.get_float("STARPUNK_PERF_SAMPLE_RATE", 1.0)
```

## Testing Strategy

### Unit Tests
1. Metric collection and storage
2. Circular buffer behavior
3. Summary statistics calculation
4. Memory monitoring functions
5. Query monitoring callbacks

### Integration Tests
1. End-to-end request monitoring
2. Slow query detection
3. Memory leak detection
4. Dashboard rendering
5. Performance overhead measurement

### Performance Tests
```python
def test_monitoring_overhead():
    """Verify monitoring overhead is <1%"""
    # measure_operation_time() is assumed to run a fixed representative
    # workload and return its duration in seconds.

    # Baseline without monitoring
    config.PERF_MONITORING_ENABLED = False
    baseline_time = measure_operation_time()

    # With monitoring
    config.PERF_MONITORING_ENABLED = True
    monitored_time = measure_operation_time()

    overhead = (monitored_time - baseline_time) / baseline_time
    assert overhead < 0.01  # Less than 1%
```

## Security Considerations

1. **Authentication**: Dashboard requires admin access
2. **Query Sanitization**: Don't log sensitive query parameters
3. **Rate Limiting**: Prevent dashboard DoS
4. **Data Retention**: Automatic cleanup of old metrics
5. **Configuration**: Validate all config values

## Performance Impact

### Expected Overhead
- Request timing: <0.1ms per request
- Query monitoring: <0.5ms per query
- Memory tracking: <1% CPU (background thread)
- Dashboard rendering: <50ms
- Total overhead: <1% when fully enabled

### Optimization Strategies
1. Use sampling for high-frequency operations (see the sketch after this list)
2. Lazy calculation of statistics
3. Efficient circular buffer implementation
4. Minimal string operations in hot path
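
`PERF_SAMPLE_RATE` is defined in the configuration options but not exercised in the sketches above. A minimal sketch of how sampling could gate the hot path (the `should_sample()` helper is illustrative):

```python
import random


def should_sample() -> bool:
    """Record only a PERF_SAMPLE_RATE fraction of high-frequency events."""
    return random.random() < config.PERF_SAMPLE_RATE


# Example gate at a metric call site:
# if config.PERF_MONITORING_ENABLED and should_sample():
#     metrics_buffer.add_metric(metric)
```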

## Documentation Requirements

### Administrator Guide
- How to enable monitoring
- Understanding metrics
- Identifying performance issues
- Tuning configuration

### Dashboard User Guide
- Navigating the dashboard
- Interpreting metrics
- Finding slow queries
- Memory usage patterns

## Acceptance Criteria

1. ✅ Timing instrumentation for all key operations
2. ✅ Database query performance logging
3. ✅ Slow query detection with configurable threshold
4. ✅ Memory usage tracking
5. ✅ Performance dashboard at /admin/performance
6. ✅ Monitoring overhead <1%
7. ✅ Zero impact when disabled
8. ✅ Circular buffer limits memory usage
9. ✅ All metrics clearly documented
10. ✅ Security review passed