docs: Fix ADR numbering conflicts and create comprehensive documentation indices

This commit resolves all documentation issues identified in the comprehensive review:

CRITICAL FIXES:
- Renumbered duplicate ADRs to eliminate conflicts:
  * ADR-022-migration-race-condition-fix → ADR-037
  * ADR-022-syndication-formats → ADR-038
  * ADR-023-microformats2-compliance → ADR-040
  * ADR-027-versioning-strategy-for-authorization-removal → ADR-042
  * ADR-030-CORRECTED-indieauth-endpoint-discovery → ADR-043
  * ADR-031-endpoint-discovery-implementation → ADR-044
- Updated all cross-references to renumbered ADRs in:
  * docs/projectplan/ROADMAP.md
  * docs/reports/v1.0.0-rc.5-migration-race-condition-implementation.md
  * docs/reports/2025-11-24-endpoint-discovery-analysis.md
  * docs/decisions/ADR-043-CORRECTED-indieauth-endpoint-discovery.md
  * docs/decisions/ADR-044-endpoint-discovery-implementation.md
- Updated README.md version from 1.0.0 to 1.1.0
- Tracked ADR-021-indieauth-provider-strategy.md in git

DOCUMENTATION IMPROVEMENTS:
- Created comprehensive INDEX.md files for all docs/ subdirectories:
  * docs/architecture/INDEX.md (28 documents indexed)
  * docs/decisions/INDEX.md (55 ADRs indexed with topical grouping)
  * docs/design/INDEX.md (phase plans and feature designs)
  * docs/standards/INDEX.md (9 standards with compliance checklist)
  * docs/reports/INDEX.md (57 implementation reports)
  * docs/deployment/INDEX.md (deployment guides)
  * docs/examples/INDEX.md (code samples and usage patterns)
  * docs/migration/INDEX.md (version migration guides)
  * docs/releases/INDEX.md (release documentation)
  * docs/reviews/INDEX.md (architectural reviews)
  * docs/security/INDEX.md (security documentation)
- Updated CLAUDE.md with complete folder descriptions including:
  * docs/migration/
  * docs/releases/
  * docs/security/

VERIFICATION:
- All ADR numbers now sequential and unique (50 total ADRs)
- No duplicate ADR numbers remain
- All cross-references updated and verified
- Documentation structure consistent and well-organized

These changes improve documentation discoverability, maintainability, and ensure proper version tracking. All index files follow consistent format with clear navigation guidance.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-25 13:28:56 -07:00
parent f28a48f560
commit e589f5bd6c
34 changed files with 5820 additions and 30 deletions
--- a/docs/decisions/ADR-055-error-handling-philosophy.md
+++ b/docs/decisions/ADR-055-error-handling-philosophy.md
@@ -0,0 +1,415 @@
+# ADR-055: Error Handling Philosophy
+
+## Status
+Accepted
+
+## Context
+StarPunk v1.1.1 focuses on production readiness, including graceful error handling. Currently, error handling is inconsistent:
+- Some errors crash the application
+- Error messages vary in helpfulness
+- No distinction between user and system errors
+- Insufficient context for debugging
+
+We need a consistent philosophy for handling errors that balances user experience, security, and debuggability.
+
+## Decision
+Adopt a layered error handling strategy that provides graceful degradation, helpful user messages, and detailed logging for operators.
+
+### Error Handling Principles
+
+1. **Fail Gracefully**: Never crash when recovery is possible
+2. **Be Helpful**: Provide actionable error messages
+3. **Log Everything**: Detailed context for debugging
+4. **Secure by Default**: Don't leak sensitive information
+5. **User vs System**: Different handling for different audiences
+
+### Error Categories
+
+#### 1. User Errors (4xx class)
+Errors caused by user action or client issues.
+
+Examples:
+- Invalid Micropub request
+- Authentication failure
+- Missing required fields
+- Invalid slug format
+
+Handling:
+- Return helpful error message
+- Suggest corrective action
+- Log at INFO level
+- Don't expose internals
+
+#### 2. System Errors (5xx class)
+Errors in system operation.
+
+Examples:
+- Database connection failure
+- File system errors
+- Memory exhaustion
+- Template rendering errors
+
+Handling:
+- Generic user message
+- Detailed logging at ERROR level
+- Attempt recovery if possible
+- Alert operators (future)
+
+#### 3. Configuration Errors
+Errors due to misconfiguration.
+
+Examples:
+- Missing required config
+- Invalid configuration values
+- Incompatible settings
+- Permission issues
+
+Handling:
+- Fail fast at startup
+- Clear error messages
+- Suggest fixes
+- Document requirements
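
The fail-fast handling above could look roughly like the following. This is a minimal sketch, not StarPunk's actual startup code: the setting names in `REQUIRED_SETTINGS` and the helpers `find_config_problems` and `validate_or_exit` are hypothetical, chosen only for illustration.

```python
import sys
from typing import Any, Dict, List

# Hypothetical required settings; the real StarPunk config keys may differ.
REQUIRED_SETTINGS = ("SITE_URL", "DATA_PATH", "SECRET_KEY")


def find_config_problems(config: Dict[str, Any]) -> List[str]:
    """Return one actionable message per missing or empty required setting."""
    problems = []
    for name in REQUIRED_SETTINGS:
        value = config.get(name)
        if value is None or value == "":
            problems.append(f"{name} is not set; define it before starting the app")
    return problems


def validate_or_exit(config: Dict[str, Any]) -> None:
    """Fail fast: report every problem at once, then stop before serving requests."""
    problems = find_config_problems(config)
    if problems:
        for problem in problems:
            print(f"Configuration error: {problem}", file=sys.stderr)
        raise SystemExit(1)
```

Collecting all problems before exiting lets an operator fix the configuration in one pass instead of restarting once per missing setting.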
+
+#### 4. Transient Errors
+Temporary errors that may succeed on retry.
+
+Examples:
+- Database lock
+- Network timeout
+- Resource temporarily unavailable
+
+Handling:
+- Automatic retry with backoff
+- Log at WARNING level
+- Fail gracefully after retries
+- Track frequency
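
One way to apply the four categories above to exceptions that were not raised as StarPunk errors is a small classifier. The mapping below is an assumption made for illustration (for example, treating `PermissionError` as a configuration problem and `ValueError` as a user error); the ADR does not prescribe it, and `classify` is a hypothetical helper.

```python
import sqlite3
from enum import Enum


class ErrorCategory(Enum):
    USER = "user"
    SYSTEM = "system"
    CONFIG = "config"
    TRANSIENT = "transient"


def classify(exc: BaseException) -> ErrorCategory:
    """Map a raw exception to a category; anything unknown counts as SYSTEM."""
    # SQLite reports lock contention as OperationalError("database is locked").
    if isinstance(exc, sqlite3.OperationalError) and "locked" in str(exc).lower():
        return ErrorCategory.TRANSIENT
    if isinstance(exc, TimeoutError):
        return ErrorCategory.TRANSIENT
    # Assumption: permission problems usually point at deployment/config issues.
    if isinstance(exc, PermissionError):
        return ErrorCategory.CONFIG
    # Assumption: ValueError is raised by input validation code.
    if isinstance(exc, ValueError):
        return ErrorCategory.USER
    return ErrorCategory.SYSTEM
```

Defaulting to `SYSTEM` keeps unknown failures on the most conservative path: a generic user message and ERROR-level logging.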
+
+### Error Response Format
+
+#### Development Mode
+```json
+{
+  "error": {
+    "type": "ValidationError",
+    "message": "Invalid slug format",
+    "details": {
+      "field": "slug",
+      "value": "my/bad/slug",
+      "pattern": "^[a-z0-9-]+$"
+    },
+    "suggestion": "Slugs can only contain lowercase letters, numbers, and hyphens",
+    "documentation": "/docs/api/micropub#slugs",
+    "trace_id": "abc123"
+  }
+}
+```
+
+#### Production Mode
+```json
+{
+  "error": {
+    "message": "Invalid request format",
+    "suggestion": "Please check your request and try again",
+    "documentation": "/docs/api/micropub",
+    "trace_id": "abc123"
+  }
+}
+```
+
+### Implementation Pattern
+
+```python
+# starpunk/errors.py
+from enum import Enum
+from typing import Optional, Dict, Any
+import logging
+import uuid
+
+logger = logging.getLogger('starpunk.errors')
+
+class ErrorCategory(Enum):
+    USER = "user"
+    SYSTEM = "system"
+    CONFIG = "config"
+    TRANSIENT = "transient"
+
+class StarPunkError(Exception):
+    """Base exception for all StarPunk errors"""
+
+    def __init__(
+        self,
+        message: str,
+        category: ErrorCategory = ErrorCategory.SYSTEM,
+        suggestion: Optional[str] = None,
+        details: Optional[Dict[str, Any]] = None,
+        status_code: int = 500,
+        recoverable: bool = False
+    ):
+        self.message = message
+        self.category = category
+        self.suggestion = suggestion
+        self.details = details or {}
+        self.status_code = status_code
+        self.recoverable = recoverable
+        # to_user_dict() reads self.trace_id, so it must be set here.
+        # Assumption: a random hex ID; this ADR does not specify how
+        # trace IDs are generated.
+        self.trace_id = uuid.uuid4().hex[:12]
+        super().__init__(message)
+
+    def to_user_dict(self, debug: bool = False) -> dict:
+        """Format error for user response"""
+        result = {
+            'error': {
+                'message': self.message,
+                'trace_id': self.trace_id
+            }
+        }
+
+        if self.suggestion:
+            result['error']['suggestion'] = self.suggestion
+
+        if debug and self.details:
+            result['error']['details'] = self.details
+            result['error']['type'] = self.__class__.__name__
+
+        return result
+
+    def log(self):
+        """Log error with appropriate level"""
+        if self.category == ErrorCategory.USER:
+            logger.info(
+                "User error: %s",
+                self.message,
+                extra={'context': self.details}
+            )
+        elif self.category == ErrorCategory.TRANSIENT:
+            logger.warning(
+                "Transient error: %s",
+                self.message,
+                extra={'context': self.details}
+            )
+        else:
+            logger.error(
+                "System error: %s",
+                self.message,
+                extra={'context': self.details},
+                exc_info=True
+            )
+
+# Specific error classes
+class ValidationError(StarPunkError):
+    """User input validation failed"""
+    def __init__(self, message: str, field: Optional[str] = None, **kwargs):
+        super().__init__(
+            message,
+            category=ErrorCategory.USER,
+            status_code=400,
+            **kwargs
+        )
+        if field:
+            self.details['field'] = field
+
+class AuthenticationError(StarPunkError):
+    """Authentication failed"""
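+    def __init__(self, message: str = "Authentication required", **kwargs):
+        # setdefault lets callers override the suggestion without a
+        # "multiple values for keyword argument" TypeError
+        kwargs.setdefault('suggestion', "Please authenticate and try again")
+        super().__init__(
+            message,
+            category=ErrorCategory.USER,
+            status_code=401,
+            **kwargs
+        )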
+
+class DatabaseError(StarPunkError):
+    """Database operation failed"""
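+    def __init__(self, message: str, **kwargs):
+        # setdefault lets callers override the suggestion without a
+        # "multiple values for keyword argument" TypeError
+        kwargs.setdefault('suggestion', "Please try again later")
+        super().__init__(
+            message,
+            category=ErrorCategory.SYSTEM,
+            status_code=500,
+            **kwargs
+        )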
+
+class ConfigurationError(StarPunkError):
+    """Configuration is invalid"""
+    def __init__(self, message: str, setting: Optional[str] = None, **kwargs):
+        super().__init__(
+            message,
+            category=ErrorCategory.CONFIG,
+            status_code=500,
+            **kwargs
+        )
+        if setting:
+            self.details['setting'] = setting
+```
+
+### Error Handling Middleware
+
+```python
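+# starpunk/middleware/errors.py
+from functools import wraps
+
+from starpunk.errors import ErrorCategory, StarPunkError
+
+# is_debug_mode() is assumed to be defined elsewhere in the application.
+
+def error_handler(func):
+    """Decorator for consistent error handling"""
+    @wraps(func)  # keep the wrapped view's name and docstring
+    def wrapper(*args, **kwargs):
+        try:
+            return func(*args, **kwargs)
+        except StarPunkError as e:
+            e.log()
+            # (body, status) tuple: assumes a Flask-style view return value
+            return e.to_user_dict(debug=is_debug_mode()), e.status_code
+        except Exception as e:
+            # Unexpected error: wrap it so the response stays generic
+            error = StarPunkError(
+                message="An unexpected error occurred",
+                category=ErrorCategory.SYSTEM,
+                details={'original': str(e)}
+            )
+            error.log()
+            return error.to_user_dict(debug=is_debug_mode()), error.status_code
+    return wrapper
+```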
+
+### Graceful Degradation Examples
+
+#### FTS5 Unavailable
+```python
+try:
+    # Attempt FTS5 search
+    results = search_with_fts5(query)
+except FTS5UnavailableError:
+    logger.warning("FTS5 unavailable, falling back to LIKE")
+    results = search_with_like(query)
+    flash("Search is running in compatibility mode")
+```
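
A self-contained variant of the fallback above, using only the standard-library `sqlite3` module, might look like this. It is a sketch under assumptions: the table names `notes` and `notes_fts` and the function `search_notes` are hypothetical, and it catches `sqlite3.OperationalError` directly rather than a dedicated `FTS5UnavailableError`.

```python
import logging
import sqlite3
from typing import List

logger = logging.getLogger("starpunk.search")


def search_notes(conn: sqlite3.Connection, query: str) -> List[int]:
    """Return matching note IDs, preferring FTS5 and degrading to LIKE."""
    try:
        rows = conn.execute(
            "SELECT rowid FROM notes_fts WHERE notes_fts MATCH ? ORDER BY rowid",
            (query,),
        ).fetchall()
    except sqlite3.OperationalError as exc:
        # Raised when the FTS5 module or the index table is missing.
        # Caveat: it is also raised for malformed MATCH syntax.
        logger.warning("FTS5 search unavailable (%s), falling back to LIKE", exc)
        rows = conn.execute(
            # LIKE wildcards (% and _) in the query are not escaped here.
            "SELECT id FROM notes WHERE content LIKE ? ORDER BY id",
            (f"%{query}%",),
        ).fetchall()
    return [row[0] for row in rows]
```

The caller gets a list of IDs either way, so the degraded mode stays invisible to code further up the stack except for the logged warning.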
+
+#### Database Lock
+```python
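+import sqlite3
+
+from tenacity import (
+    retry,
+    retry_if_exception_type,
+    stop_after_attempt,
+    wait_exponential,
+)
+
+# Note: this retries every sqlite3.OperationalError, not only lock errors
+@retry(
+    stop=stop_after_attempt(3),
+    wait=wait_exponential(multiplier=0.5, max=2),
+    retry=retry_if_exception_type(sqlite3.OperationalError)
+)
+def execute_query(query):
+    """Execute with retry for transient errors"""
+    return db.execute(query)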
+```
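
For readers who want to see the mechanics without a third-party library, the same idea can be sketched with the standard library alone. `retry_on_lock` is a hypothetical decorator; unlike the example above it retries only errors whose message contains "locked", and its delay schedule is not claimed to match the library's exactly.

```python
import functools
import sqlite3
import time


def retry_on_lock(attempts: int = 3, base_delay: float = 0.5,
                  max_delay: float = 2.0, sleep=time.sleep):
    """Retry a function on SQLite lock errors with capped exponential backoff."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except sqlite3.OperationalError as exc:
                    is_lock = "locked" in str(exc).lower()
                    # Give up on non-transient errors and on the last attempt.
                    if not is_lock or attempt == attempts:
                        raise
                    # Delays grow 0.5s, 1s, 2s, ... capped at max_delay.
                    sleep(min(base_delay * 2 ** (attempt - 1), max_delay))
        return wrapper
    return decorator
```

The `sleep` parameter exists so tests can inject a fake and avoid real waiting.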
+
+#### Missing Optional Feature
+```python
+if not config.SEARCH_ENABLED:
+    # Return empty results instead of error
+    return {
+        'results': [],
+        'message': 'Search is disabled on this instance'
+    }
+```
+
+## Rationale
+
+### Why Graceful Degradation?
+1. **User Experience**: Don't break the whole app
+2. **Reliability**: Partial functionality better than none
+3. **Operations**: Easier to diagnose in production
+4. **Recovery**: System can self-heal from transients
+
+### Why Different Error Categories?
+1. **Appropriate Response**: Different errors need different handling
+2. **Security**: Don't expose internals for system errors
+3. **Debugging**: Operators need full context
+4. **User Experience**: Users need actionable messages
+
+### Why Structured Errors?
+1. **Consistency**: Predictable error format
+2. **Parsing**: Tools can process errors
+3. **Correlation**: Trace IDs link logs to responses
+4. **Documentation**: Self-documenting error details
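
The correlation point can be illustrated with a short sketch: the same trace ID is written to the log and returned to the client, so an operator can find the detailed log entry from a user's report. `new_trace_id` and `report_error` are hypothetical helpers, and the 12-character hex format is an assumption.

```python
import logging
import uuid

logger = logging.getLogger("starpunk.errors")


def new_trace_id() -> str:
    """Short random ID; the format is an assumption, not specified by the ADR."""
    return uuid.uuid4().hex[:12]


def report_error(internal_message: str, public_message: str) -> dict:
    """Log full detail for operators and return a safe body for the client."""
    trace_id = new_trace_id()
    # The detailed message stays in the log, tagged with the trace ID.
    logger.error("%s [trace_id=%s]", internal_message, trace_id)
    # The client sees only the generic message plus the same trace ID.
    return {"error": {"message": public_message, "trace_id": trace_id}}
```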
+
+## Consequences
+
+### Positive
+1. **Better UX**: Helpful error messages
+2. **Easier Debugging**: Rich context in logs
+3. **More Reliable**: Graceful degradation
+4. **Secure**: No information leakage
+5. **Consistent**: Predictable error handling
+
+### Negative
+1. **More Code**: Error handling adds complexity
+2. **Testing Burden**: Many error paths to test
+3. **Performance**: Error handling overhead
+4. **Maintenance**: Error messages need updates
+
+### Mitigations
+1. Use error hierarchy to reduce duplication
+2. Generate tests for error paths
+3. Cache error messages
+4. Document error codes clearly
+
+## Alternatives Considered
+
+### 1. Let Exceptions Bubble
+**Pros**: Simple, Python default
+**Cons**: Poor UX, crashes, no context
+**Decision**: Not production-ready
+
+### 2. Generic Error Pages
+**Pros**: Simple to implement
+**Cons**: Not helpful, poor API experience
+**Decision**: Insufficient for Micropub API
+
+### 3. Error Codes System
+**Pros**: Precise, machine-readable
+**Cons**: Complex, needs documentation
+**Decision**: Over-engineered for our scale
+
+### 4. Sentry/Error Tracking Service
+**Pros**: Rich features, alerting
+**Cons**: External dependency, privacy
+**Decision**: Conflicts with self-hosted philosophy
+
+## Implementation Notes
+
+### Critical Path Protection
+Always protect critical paths:
+```python
+# A search indexing failure must never fail note creation
+try:
+    create_search_index(note)
+except Exception:
+    # logger.exception records the traceback along with the message
+    logger.exception("Search indexing failed")
+    # Continue without search - note still created
+```
+
+### Error Budget
+Track error rates for SLO monitoring:
+- User errors: No budget (caused by clients, not by the service)
+- System errors: <0.1% of requests
+- Configuration errors: 0 after startup
+- Transient errors: <1% of requests
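
The budgets above reduce to simple arithmetic on request counters. The sketch below assumes counts are already collected somewhere; `ERROR_BUDGETS` and `over_budget` are hypothetical names, and the budgets are expressed as fractions of requests (0.1% = 0.001).

```python
from typing import Dict, Optional

# Maximum error rate per category as a fraction of requests.
# None means no budget is enforced; 0.0 means any error is a violation.
ERROR_BUDGETS: Dict[str, Optional[float]] = {
    "user": None,
    "system": 0.001,
    "config": 0.0,
    "transient": 0.01,
}


def over_budget(category: str, errors: int, requests: int) -> bool:
    """Return True when the observed error rate violates the category's budget."""
    budget = ERROR_BUDGETS[category]
    if budget is None:
        return False
    if budget == 0.0 or requests == 0:
        return errors > 0
    # Budgets are stated as "less than", so reaching the limit is a violation.
    return errors / requests >= budget
```

For example, 20 system errors in 10,000 requests is a rate of 0.2%, which exceeds the 0.1% budget, while 5 in 10,000 (0.05%) does not.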
+
+### Testing Strategy
+1. Unit tests for each error class
+2. Integration tests for error paths
+3. Chaos testing for transient errors
+4. User journey tests with errors
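
As an illustration of the first item, unit tests for an error class can check that details appear only in debug mode. To stay self-contained, the sketch uses a reduced stand-in class (`NoteError`, hypothetical) instead of importing `StarPunkError`.

```python
import unittest
from typing import Any, Dict, Optional


class NoteError(Exception):
    """Stand-in for StarPunkError, reduced to what the tests need."""

    def __init__(self, message: str, status_code: int = 500,
                 details: Optional[Dict[str, Any]] = None):
        super().__init__(message)
        self.message = message
        self.status_code = status_code
        self.details = details or {}

    def to_user_dict(self, debug: bool = False) -> dict:
        body: Dict[str, Any] = {"message": self.message}
        if debug and self.details:
            body["details"] = self.details
        return {"error": body}


class NoteErrorTests(unittest.TestCase):
    def test_details_hidden_in_production(self):
        err = NoteError("Invalid slug format", 400, {"field": "slug"})
        self.assertNotIn("details", err.to_user_dict(debug=False)["error"])

    def test_details_shown_in_debug(self):
        err = NoteError("Invalid slug format", 400, {"field": "slug"})
        self.assertEqual(
            err.to_user_dict(debug=True)["error"]["details"], {"field": "slug"}
        )

    def test_default_status_is_500(self):
        self.assertEqual(NoteError("boom").status_code, 500)
```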
+
+## Security Considerations
+
+1. Never expose stack traces to users
+2. Sanitize error messages
+3. Rate limit error endpoints
+4. Don't reveal whether a resource or account exists through differing error responses
+5. Log security errors specially
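
Message sanitization could be approached with a small redaction pass before any text reaches a user. This is a rough sketch only: the patterns are illustrative assumptions, will miss many cases, and `sanitize_message` is a hypothetical helper rather than part of StarPunk.

```python
import re

# Two or more slash-separated segments, e.g. /var/lib/starpunk/notes.db
_PATH = re.compile(r"(?:/[\w.\-]+){2,}")
# key=value pairs whose key suggests a credential
_SECRET = re.compile(r"(?i)\b(token|password|secret|key)=\S+")


def sanitize_message(message: str) -> str:
    """Redact credential-like values and filesystem paths from a message."""
    # Redact secrets first so a path inside a secret value is not half-masked.
    message = _SECRET.sub(lambda m: f"{m.group(1)}=[redacted]", message)
    message = _PATH.sub("[path]", message)
    return message
```

A pass like this complements, rather than replaces, the production response format, which already avoids echoing internal details.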
+
+## Migration Path
+
+1. Phase 1: Add error classes
+2. Phase 2: Wrap existing code
+3. Phase 3: Add graceful degradation
+4. Phase 4: Improve error messages
+
+## References
+
+- [PEP 8: Programming Recommendations](https://www.python.org/dev/peps/pep-0008/#programming-recommendations)
+- [HTTP Status Codes](https://httpstatuses.com/)
+- [OWASP Error Handling](https://owasp.org/www-community/Improper_Error_Handling)
+- [Google SRE Book - Handling Overload](https://sre.google/sre-book/handling-overload/)
+
+## Document History
+
+- 2025-11-25: Initial draft for v1.1.1 release planning