This commit resolves all documentation issues identified in the comprehensive review:

CRITICAL FIXES:
- Renumbered duplicate ADRs to eliminate conflicts:
  * ADR-022-migration-race-condition-fix → ADR-037
  * ADR-022-syndication-formats → ADR-038
  * ADR-023-microformats2-compliance → ADR-040
  * ADR-027-versioning-strategy-for-authorization-removal → ADR-042
  * ADR-030-CORRECTED-indieauth-endpoint-discovery → ADR-043
  * ADR-031-endpoint-discovery-implementation → ADR-044
- Updated all cross-references to renumbered ADRs in:
  * docs/projectplan/ROADMAP.md
  * docs/reports/v1.0.0-rc.5-migration-race-condition-implementation.md
  * docs/reports/2025-11-24-endpoint-discovery-analysis.md
  * docs/decisions/ADR-043-CORRECTED-indieauth-endpoint-discovery.md
  * docs/decisions/ADR-044-endpoint-discovery-implementation.md
- Updated README.md version from 1.0.0 to 1.1.0
- Tracked ADR-021-indieauth-provider-strategy.md in git

DOCUMENTATION IMPROVEMENTS:
- Created comprehensive INDEX.md files for all docs/ subdirectories:
  * docs/architecture/INDEX.md (28 documents indexed)
  * docs/decisions/INDEX.md (55 ADRs indexed with topical grouping)
  * docs/design/INDEX.md (phase plans and feature designs)
  * docs/standards/INDEX.md (9 standards with compliance checklist)
  * docs/reports/INDEX.md (57 implementation reports)
  * docs/deployment/INDEX.md (deployment guides)
  * docs/examples/INDEX.md (code samples and usage patterns)
  * docs/migration/INDEX.md (version migration guides)
  * docs/releases/INDEX.md (release documentation)
  * docs/reviews/INDEX.md (architectural reviews)
  * docs/security/INDEX.md (security documentation)
- Updated CLAUDE.md with complete folder descriptions including:
  * docs/migration/
  * docs/releases/
  * docs/security/

VERIFICATION:
- All ADR numbers now sequential and unique (50 total ADRs)
- No duplicate ADR numbers remain
- All cross-references updated and verified
- Documentation structure consistent and well-organized

These changes improve documentation discoverability and maintainability, and ensure proper version tracking. All index files follow a consistent format with clear navigation guidance.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

304 lines
7.2 KiB
Markdown
# ADR-053: Performance Monitoring Strategy

## Status
Accepted

## Context
StarPunk v1.1.1 introduces performance monitoring to help operators understand system behavior in production. Currently, we have no visibility into:
- Database query performance
- Memory usage patterns
- Request processing times
- Bottlenecks and slow operations

We need a lightweight, zero-dependency monitoring solution that provides actionable insights without impacting performance.

## Decision
Implement a built-in performance monitoring system using Python's standard library, with optional detailed tracking controlled by configuration.

### Architecture Overview

```
Request → Middleware (timing) → Handler
              ↓                    ↓
       Context Manager        Decorators
              ↓                    ↓
        Metrics Store  ←  Database Hooks
              ↓
       Admin Dashboard
```

### Core Components

#### 1. Metrics Collector
Location: `starpunk/monitoring/collector.py`

Responsibilities:
- Collect timing data
- Track memory usage
- Store recent metrics in memory
- Provide aggregation functions

Data Structure:
```python
from dataclasses import dataclass


@dataclass
class Metric:
    timestamp: float
    category: str      # "db", "http", "function"
    operation: str     # specific operation name
    duration: float    # in seconds
    metadata: dict     # additional context
```

#### 2. Database Performance Tracking
Location: `starpunk/monitoring/db_monitor.py`

Features:
- Query execution timing
- Slow query detection
- Query pattern analysis
- Connection pool monitoring

Implementation via SQLite callbacks:
```python
# Wrap database operations
with monitor.track_query("SELECT", "notes"):
    cursor.execute(query)
```

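The `track_query` wrapper above could be built as a context manager around `time.perf_counter()`. The following is a minimal sketch, not the actual `db_monitor.py`: the `QueryMonitor` class, its `records` list, and the `slow_threshold` argument are hypothetical names chosen for illustration.

```python
import sqlite3
import time
from contextlib import contextmanager


class QueryMonitor:
    """Hypothetical monitor; class and attribute names are illustrative."""

    def __init__(self, slow_threshold=1.0):
        self.slow_threshold = slow_threshold  # seconds, cf. SLOW_QUERY_THRESHOLD
        self.records = []  # tuples of (operation, table, duration, is_slow)

    @contextmanager
    def track_query(self, operation, table):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the wrapped query raised.
            duration = time.perf_counter() - start
            is_slow = duration >= self.slow_threshold
            self.records.append((operation, table, duration, is_slow))


# Usage against an in-memory database
monitor = QueryMonitor()
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
cursor = conn.cursor()
with monitor.track_query("SELECT", "notes"):
    cursor.execute("SELECT * FROM notes")
```

Recording in a `finally` block means failed queries are timed too, which may or may not be desirable for slow-query reporting.
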
#### 3. Memory Tracking
Location: `starpunk/monitoring/memory.py`

Track:
- Process memory (RSS)
- Memory growth over time
- Per-request memory delta
- Memory high water mark

Uses the `resource` module (stdlib, Unix-only).

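As a rough sketch of how the `resource` module could supply these numbers: `getrusage()` reports the peak resident set size (`ru_maxrss`), which maps directly onto the high water mark. It does not report current RSS, so a true per-request delta would need another source. The helper names `peak_rss_bytes` and `MemoryTracker` are assumptions made for this example.

```python
import resource
import sys


def peak_rss_bytes():
    """Return the process's peak resident set size in bytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    return peak if sys.platform == "darwin" else peak * 1024


class MemoryTracker:
    """Hypothetical helper: growth of the high water mark since creation."""

    def __init__(self):
        self.baseline = peak_rss_bytes()

    def growth(self):
        # The peak never decreases, so this value is always >= 0.
        return peak_rss_bytes() - self.baseline


tracker = MemoryTracker()
```
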
#### 4. Request Performance
Location: `starpunk/monitoring/http.py`

Track:
- Request processing time
- Response size
- Status code distribution
- Slowest endpoints

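One way to collect request timing and status codes is a small WSGI middleware. The sketch below assumes the application is served through a WSGI interface; `TimingMiddleware`, the `sink` list, and `demo_app` are hypothetical names, and response size is left out for brevity.

```python
import time


class TimingMiddleware:
    """Hypothetical WSGI middleware recording (method, path, status, seconds)."""

    def __init__(self, app, sink):
        self.app = app
        self.sink = sink  # any list-like object with append()

    def __call__(self, environ, start_response):
        start = time.perf_counter()
        captured = {}

        def recording_start_response(status, headers, exc_info=None):
            # status looks like "200 OK"; keep only the numeric code.
            captured["status"] = status.split(" ", 1)[0]
            return start_response(status, headers, exc_info)

        try:
            return self.app(environ, recording_start_response)
        finally:
            # Measures time until the app returns its response iterable,
            # not the time spent streaming the body to the client.
            self.sink.append((
                environ.get("REQUEST_METHOD"),
                environ.get("PATH_INFO"),
                captured.get("status"),
                time.perf_counter() - start,
            ))


def demo_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


metrics = []
wrapped = TimingMiddleware(demo_app, metrics)
body = wrapped(
    {"REQUEST_METHOD": "GET", "PATH_INFO": "/"},
    lambda status, headers, exc_info=None: None,
)
```
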
#### 5. Admin Dashboard
Location: `/admin/performance`

Display:
- Real-time metrics (last 15 minutes)
- Slow query log
- Memory usage graph
- Endpoint performance table
- Database statistics

### Data Retention

In-memory circular buffer approach:
- Last 1000 metrics retained
- Automatic old data eviction
- No persistent storage (privacy/simplicity)
- Reset on restart

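The circular buffer described above maps naturally onto `collections.deque` with a `maxlen`, which evicts the oldest entry automatically on append. This is a minimal sketch under that assumption. `MetricsBuffer`, `record`, and `recent` are hypothetical names, and the `Metric` dataclass is repeated (with a default for `metadata`) so the example is self-contained.

```python
import time
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Metric:
    timestamp: float
    category: str
    operation: str
    duration: float
    metadata: dict = field(default_factory=dict)


class MetricsBuffer:
    """Hypothetical in-memory store that keeps only the newest entries."""

    def __init__(self, max_size=1000):
        # A deque with maxlen drops the oldest item when a new one is appended.
        self._items = deque(maxlen=max_size)

    def record(self, category, operation, duration, **metadata):
        self._items.append(
            Metric(time.time(), category, operation, duration, metadata)
        )

    def recent(self, seconds):
        """Return metrics recorded within the last `seconds` seconds."""
        cutoff = time.time() - seconds
        return [m for m in self._items if m.timestamp >= cutoff]

    def __len__(self):
        return len(self._items)


# With a capacity of 3, recording 5 metrics leaves only the last 3.
buffer = MetricsBuffer(max_size=3)
for i in range(5):
    buffer.record("db", f"query-{i}", 0.01 * i)
```

Because nothing is written to disk, restarting the process discards the buffer, matching the "reset on restart" behavior.
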
### Performance Overhead

Target: <1% overhead when enabled

Strategies:
- Sampling for high-frequency operations
- Lazy computation of aggregates
- Minimal memory footprint (1MB max)
- Conditional instrumentation via config (hooks are skipped when monitoring is disabled)

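Sampling can be as simple as comparing a random draw against the configured rate before recording a metric. The sketch below is illustrative: the `Sampler` class and its `should_record` method are hypothetical, and the injected `random.Random` instance exists only to make the example reproducible.

```python
import random


class Sampler:
    """Hypothetical sampler: records roughly `rate` of all operations."""

    def __init__(self, rate, rng=None):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rate must be between 0.0 and 1.0")
        self.rate = rate
        self._rng = rng or random.Random()

    def should_record(self):
        # random() returns a float in [0.0, 1.0), so a rate of 1.0 always
        # records and a rate of 0.0 never does.
        return self._rng.random() < self.rate


# With SAMPLE_RATE = 0.1, about 10% of operations are kept.
sampler = Sampler(rate=0.1, rng=random.Random(42))
kept = sum(sampler.should_record() for _ in range(10_000))
```
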
## Rationale

### Why Built-in Monitoring?
1. **Zero Dependencies**: Uses only Python stdlib
2. **Privacy**: No external services
3. **Simplicity**: No complex setup
4. **Integrated**: Direct access to internals
5. **Lightweight**: Minimal overhead

### Why Not External Tools?

**Prometheus/Grafana**:
- Requires external services
- Complex setup
- Overkill for single-user system

**APM Services** (New Relic, DataDog):
- Privacy concerns
- Subscription costs
- Network dependency
- Too heavy for our needs

**OpenTelemetry**:
- Large dependency
- Complex configuration
- Designed for distributed systems

### Design Principles

1. **Opt-in**: Disabled by default
2. **Lightweight**: Minimal resource usage
3. **Actionable**: Focus on useful metrics
4. **Temporary**: No permanent storage
5. **Private**: No external data transmission

## Consequences

### Positive
1. **Production Visibility**: Understand behavior under load
2. **Performance Debugging**: Identify bottlenecks quickly
3. **No Dependencies**: Pure Python solution
4. **Privacy Preserving**: Data stays local
5. **Simple Deployment**: No additional services

### Negative
1. **Limited History**: Only recent data available
2. **Memory Usage**: ~1MB for metrics buffer
3. **No Alerting**: Manual monitoring required
4. **Single Node**: No distributed tracing

### Mitigations
1. Export capability for external tools
2. Configurable buffer size
3. Webhook support for alerts (future)
4. Focus on most valuable metrics

## Alternatives Considered

### 1. Logging-based Monitoring
**Approach**: Parse performance data from logs
**Pros**: Simple, no new code
**Cons**: Log parsing complexity, no real-time view
**Decision**: Dedicated monitoring is cleaner

### 2. External Monitoring Service
**Approach**: Use service like Sentry
**Pros**: Full-featured, alerting included
**Cons**: Privacy, cost, complexity
**Decision**: Violates self-hosted principle

### 3. Prometheus Exporter
**Approach**: Expose /metrics endpoint
**Pros**: Standard, good tooling
**Cons**: Requires Prometheus setup
**Decision**: Too complex for target users

### 4. No Monitoring
**Approach**: Rely on logs and external tools
**Pros**: Simplest
**Cons**: Poor production visibility
**Decision**: v1.1.1 specifically targets production readiness

## Implementation Details

### Instrumentation Points

1. **Database Layer**
   - All queries automatically timed
   - Connection acquisition/release
   - Transaction duration
   - Migration execution

2. **HTTP Layer**
   - Middleware wraps all requests
   - Per-endpoint timing
   - Static file serving
   - Error handling

3. **Core Functions**
   - Note creation/update
   - Search operations
   - RSS generation
   - Authentication flow

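For the core functions listed above, a timing decorator is one plausible instrumentation mechanism (the architecture overview mentions decorators). The following is a sketch only: `timed`, the `TIMINGS` list, and the stand-in `create_note` function are hypothetical and not taken from the StarPunk codebase.

```python
import functools
import time

TIMINGS = []  # hypothetical sink holding (operation, seconds) tuples


def timed(operation):
    """Decorator factory that records how long the wrapped function takes."""

    def decorator(func):
        @functools.wraps(func)  # preserve the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs on both success and failure of the wrapped call.
                TIMINGS.append((operation, time.perf_counter() - start))

        return wrapper

    return decorator


@timed("note.create")
def create_note(content):
    # Stand-in for the real note creation logic.
    return {"content": content}


note = create_note("hello")
```
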
### Performance Dashboard Layout

```
Performance Dashboard
═══════════════════

Overview
--------
Uptime: 5d 3h 15m
Requests: 10,234
Avg Response: 45ms
Memory: 128MB

Slow Queries (>1s)
------------------
[timestamp] SELECT ... FROM notes (1.2s)
[timestamp] UPDATE ... SET ... (1.1s)

Endpoint Performance
-------------------
GET / : avg 23ms, p99 45ms
GET /notes/:id : avg 35ms, p99 67ms
POST /micropub : avg 125ms, p99 234ms

Memory Usage
-----------
[ASCII graph showing last 15 minutes]

Database Stats
-------------
Pool Size: 3/5
Queries/sec: 4.2
Cache Hit Rate: 87%
```

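The `avg` and `p99` figures in the endpoint table can be computed lazily from the buffered samples. Below is a minimal sketch using the nearest-rank percentile definition, which is an assumption since the ADR does not say which percentile method is intended. The `percentile` and `summarize` helpers are hypothetical.

```python
import math
from collections import defaultdict


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(samples):
    """Map (endpoint, seconds) samples to {endpoint: (avg, p99)}."""
    grouped = defaultdict(list)
    for endpoint, seconds in samples:
        grouped[endpoint].append(seconds)

    summary = {}
    for endpoint, durations in grouped.items():
        durations.sort()
        avg = sum(durations) / len(durations)
        summary[endpoint] = (avg, percentile(durations, 99))
    return summary


# 100 synthetic samples for one endpoint, from 1 ms to 100 ms.
samples = [("GET /", ms / 1000) for ms in range(1, 101)]
stats = summarize(samples)
avg, p99 = stats["GET /"]
```

For these synthetic samples the average is about 50.5 ms and the nearest-rank p99 is the 99th of the 100 sorted values, 99 ms.
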
### Configuration Options

```python
# All under STARPUNK_PERF_* prefix
MONITORING_ENABLED = False    # Master switch
SLOW_QUERY_THRESHOLD = 1.0    # seconds
LOG_QUERIES = False           # Log all queries
MEMORY_TRACKING = False       # Track memory usage
SAMPLE_RATE = 1.0             # 1.0 = all, 0.1 = 10%
BUFFER_SIZE = 1000            # Number of metrics
DASHBOARD_ENABLED = True      # Enable web UI
```

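If these options are supplied as environment variables (an assumption; the ADR only states the `STARPUNK_PERF_*` prefix), loading them with the defaults above might look like the sketch below. The `load_perf_config` function, the `DEFAULTS` table, and the accepted boolean spellings are hypothetical.

```python
import os


def _as_bool(value):
    # Accepted truthy spellings are an illustrative choice.
    return value.strip().lower() in {"1", "true", "yes", "on"}


# Option name -> (default value, parser for the raw string)
DEFAULTS = {
    "MONITORING_ENABLED": (False, _as_bool),
    "SLOW_QUERY_THRESHOLD": (1.0, float),
    "LOG_QUERIES": (False, _as_bool),
    "MEMORY_TRACKING": (False, _as_bool),
    "SAMPLE_RATE": (1.0, float),
    "BUFFER_SIZE": (1000, int),
    "DASHBOARD_ENABLED": (True, _as_bool),
}


def load_perf_config(environ=None, prefix="STARPUNK_PERF_"):
    """Build the monitoring config from a mapping of environment variables."""
    environ = os.environ if environ is None else environ
    config = {}
    for name, (default, parse) in DEFAULTS.items():
        raw = environ.get(prefix + name)
        config[name] = default if raw is None else parse(raw)
    return config


config = load_perf_config({
    "STARPUNK_PERF_MONITORING_ENABLED": "true",
    "STARPUNK_PERF_SAMPLE_RATE": "0.1",
})
```

Unset options fall back to the documented defaults, so monitoring stays disabled unless the master switch is set explicitly.
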
## Testing Strategy

1. **Unit Tests**: Mock collectors, verify metrics
2. **Integration Tests**: End-to-end monitoring flow
3. **Performance Tests**: Verify low overhead
4. **Load Tests**: Behavior under stress

## Security Considerations

1. Dashboard requires admin authentication
2. No sensitive data in metrics
3. No external data transmission
4. Metrics cleared on logout
5. Rate limiting on dashboard endpoint

## Migration Path

No migration required - monitoring is opt-in via configuration.

## Future Enhancements

v1.2.0 and beyond:
- Metric export (CSV/JSON)
- Alert thresholds
- Historical trending
- Custom metric points
- Plugin architecture

## References

- [Python resource module](https://docs.python.org/3/library/resource.html)
- [SQLite Query Performance](https://www.sqlite.org/queryplanner.html)
- [Web Vitals](https://web.dev/vitals/)

## Document History

- 2025-11-25: Initial draft for v1.1.1 release planning