Phil Skentelbery e589f5bd6c docs: Fix ADR numbering conflicts and create comprehensive documentation indices
2025-11-25 13:28:56 -07:00


ADR-055: Error Handling Philosophy

Status

Accepted

Context

StarPunk v1.1.1 focuses on production readiness, including graceful error handling. Currently, error handling is inconsistent:

Some errors crash the application
Error messages vary in helpfulness
No distinction between user and system errors
Insufficient context for debugging

We need a consistent philosophy for handling errors that balances user experience, security, and debuggability.

Decision

Adopt a layered error handling strategy that provides graceful degradation, helpful user messages, and detailed logging for operators.

Error Handling Principles

Fail Gracefully: Never crash when recovery is possible
Be Helpful: Provide actionable error messages
Log Everything: Detailed context for debugging
Secure by Default: Don't leak sensitive information
User vs System: Different handling for different audiences

Error Categories

1. User Errors (4xx class)

Errors caused by user action or client issues.

Examples:

Invalid Micropub request
Authentication failure
Missing required fields
Invalid slug format

Handling:

Return helpful error message
Suggest corrective action
Log at INFO level
Don't expose internals
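
As a minimal sketch of the handling rules above, the following validates a slug against the pattern shown in the development-mode response example of this ADR and builds a helpful message that exposes no internals. The helper name `validate_slug` and the shape of the returned dict are illustrative assumptions, not part of StarPunk's actual API.

```python
import logging
import re
from typing import Optional

logger = logging.getLogger("starpunk.example")

# Pattern taken from the development-mode response example in this ADR.
SLUG_PATTERN = re.compile(r"^[a-z0-9-]+$")


def validate_slug(slug: str) -> Optional[dict]:
    """Return None for a valid slug, else a user-facing error payload.

    Hypothetical helper, for illustration only.
    """
    if SLUG_PATTERN.fullmatch(slug):
        return None

    # User errors are expected, so they are logged at INFO rather than ERROR.
    logger.info("User error: invalid slug format (field=slug)")

    # The payload names the problem and the fix, but nothing internal.
    return {
        "status": 400,
        "message": "Invalid slug format",
        "suggestion": (
            "Slugs can only contain lowercase letters, numbers, and hyphens"
        ),
    }
```

Calling `validate_slug("my/bad/slug")` yields the 400 payload, while `validate_slug("my-good-slug")` returns `None`.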

2. System Errors (5xx class)

Errors in system operation.

Examples:

Database connection failure
File system errors
Memory exhaustion
Template rendering errors

Handling:

Generic user message
Detailed logging at ERROR level
Attempt recovery if possible
Alert operators (future)

3. Configuration Errors

Errors due to misconfiguration.

Examples:

Missing required config
Invalid configuration values
Incompatible settings
Permission issues

Handling:

Fail fast at startup
Clear error messages
Suggest fixes
Document requirements
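
One way to fail fast at startup is to check every required setting before the application begins serving requests and to report all problems together. The sketch below assumes a plain mapping of settings; the setting names `SITE_URL` and `DATA_PATH` and both function names are hypothetical placeholders.

```python
import sys
from typing import List, Mapping

# Hypothetical required settings; the real list belongs to the application.
REQUIRED_SETTINGS = ("SITE_URL", "DATA_PATH")


def find_config_problems(config: Mapping[str, str]) -> List[str]:
    """Collect every problem so the operator can fix them in one pass."""
    problems = []
    for name in REQUIRED_SETTINGS:
        if not config.get(name):
            problems.append(
                f"{name} is required but not set. "
                f"Add {name}=<value> to your configuration."
            )
    return problems


def validate_or_exit(config: Mapping[str, str]) -> None:
    """Print clear messages and exit non-zero if the configuration is invalid."""
    problems = find_config_problems(config)
    if problems:
        for problem in problems:
            print(f"Configuration error: {problem}", file=sys.stderr)
        sys.exit(1)
```

Separating detection (`find_config_problems`) from the exit keeps the check testable without terminating the test process.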

4. Transient Errors

Temporary errors that may succeed on retry.

Examples:

Database lock
Network timeout
Resource temporarily unavailable

Handling:

Automatic retry with backoff
Log at WARNING level
Fail gracefully after retries
Track frequency
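
The retry behaviour described above can be sketched with the standard library alone. This is an illustrative assumption of how the mechanism might look, not the project's implementation; the Database Lock example later in this ADR uses a decorator-based approach instead. The function name and its parameters are hypothetical.

```python
import logging
import sqlite3
import time
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger("starpunk.example")


def retry_with_backoff(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying sqlite3.OperationalError with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except sqlite3.OperationalError as exc:
            if attempt == attempts:
                # Retries exhausted: re-raise so the caller can fail gracefully.
                raise
            # Delay doubles each attempt (0.5s, 1.0s, ...) up to max_delay.
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            logger.warning(
                "Transient error (attempt %d/%d): %s; retrying in %.1fs",
                attempt, attempts, exc, delay,
            )
            sleep(delay)

    raise RuntimeError("unreachable")
```

The `sleep` parameter is injectable so tests can record delays instead of waiting.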

Error Response Format

Development Mode

{
  "error": {
    "type": "ValidationError",
    "message": "Invalid slug format",
    "details": {
      "field": "slug",
      "value": "my/bad/slug",
      "pattern": "^[a-z0-9-]+$"
    },
    "suggestion": "Slugs can only contain lowercase letters, numbers, and hyphens",
    "documentation": "/docs/api/micropub#slugs",
    "trace_id": "abc123"
  }
}

Production Mode

{
  "error": {
    "message": "Invalid request format",
    "suggestion": "Please check your request and try again",
    "documentation": "/docs/api/micropub",
    "trace_id": "abc123"
  }
}

Implementation Pattern

# starpunk/errors.py
from enum import Enum
from typing import Optional, Dict, Any
import logging
import uuid

logger = logging.getLogger('starpunk.errors')

class ErrorCategory(Enum):
    USER = "user"
    SYSTEM = "system"
    CONFIG = "config"
    TRANSIENT = "transient"

class StarPunkError(Exception):
    """Base exception for all StarPunk errors"""

    def __init__(
        self,
        message: str,
        category: ErrorCategory = ErrorCategory.SYSTEM,
        suggestion: Optional[str] = None,
        details: Optional[Dict[str, Any]] = None,
        status_code: int = 500,
        recoverable: bool = False
    ):
        self.message = message
        self.category = category
        self.suggestion = suggestion
        self.details = details or {}
        self.status_code = status_code
        self.recoverable = recoverable
        # to_user_dict() reads trace_id, so it must be set here. A random
        # hex ID is an assumption; this ADR does not specify how trace IDs
        # are generated.
        self.trace_id = uuid.uuid4().hex[:12]
        super().__init__(message)

    def to_user_dict(self, debug: bool = False) -> dict:
        """Format error for user response"""
        result = {
            'error': {
                'message': self.message,
                'trace_id': self.trace_id
            }
        }

        if self.suggestion:
            result['error']['suggestion'] = self.suggestion

        if debug and self.details:
            result['error']['details'] = self.details
            result['error']['type'] = self.__class__.__name__

        return result

    def log(self):
        """Log error with appropriate level"""
        if self.category == ErrorCategory.USER:
            logger.info(
                "User error: %s",
                self.message,
                extra={'context': self.details}
            )
        elif self.category == ErrorCategory.TRANSIENT:
            logger.warning(
                "Transient error: %s",
                self.message,
                extra={'context': self.details}
            )
        else:
            logger.error(
                "System error: %s",
                self.message,
                extra={'context': self.details},
                exc_info=True
            )

# Specific error classes
class ValidationError(StarPunkError):
    """User input validation failed"""
    def __init__(self, message: str, field: Optional[str] = None, **kwargs):
        super().__init__(
            message,
            category=ErrorCategory.USER,
            status_code=400,
            **kwargs
        )
        if field:
            self.details['field'] = field

class AuthenticationError(StarPunkError):
    """Authentication failed"""
    def __init__(self, message: str = "Authentication required", **kwargs):
        # setdefault lets callers override the suggestion without raising
        # a duplicate-keyword TypeError
        kwargs.setdefault('suggestion', "Please authenticate and try again")
        super().__init__(
            message,
            category=ErrorCategory.USER,
            status_code=401,
            **kwargs
        )

class DatabaseError(StarPunkError):
    """Database operation failed"""
    def __init__(self, message: str, **kwargs):
        # Same pattern: a default suggestion that callers may override
        kwargs.setdefault('suggestion', "Please try again later")
        super().__init__(
            message,
            category=ErrorCategory.SYSTEM,
            status_code=500,
            **kwargs
        )

class ConfigurationError(StarPunkError):
    """Configuration is invalid"""
    def __init__(self, message: str, setting: Optional[str] = None, **kwargs):
        super().__init__(
            message,
            category=ErrorCategory.CONFIG,
            status_code=500,
            **kwargs
        )
        if setting:
            self.details['setting'] = setting

Error Handling Middleware

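# starpunk/middleware/errors.py
from functools import wraps

from starpunk.errors import ErrorCategory, StarPunkError

# is_debug_mode() is assumed to be provided elsewhere in the application.

def error_handler(func):
    """Decorator for consistent error handling"""
    @wraps(func)  # preserve the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except StarPunkError as e:
            e.log()
            # Return the status code with the body so errors are not sent
            # as 200 (assumes the framework accepts a (body, status) tuple)
            return e.to_user_dict(debug=is_debug_mode()), e.status_code
        except Exception as e:
            # Unexpected error: wrap it so the user response stays generic
            error = StarPunkError(
                message="An unexpected error occurred",
                category=ErrorCategory.SYSTEM,
                details={'original': str(e)}
            )
            error.log()
            return error.to_user_dict(debug=is_debug_mode()), error.status_code
    return wrapper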

Graceful Degradation Examples

FTS5 Unavailable

try:
    # Attempt FTS5 search
    results = search_with_fts5(query)
except FTS5UnavailableError:
    logger.warning("FTS5 unavailable, falling back to LIKE")
    results = search_with_like(query)
    flash("Search is running in compatibility mode")

Database Lock

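import sqlite3

# These decorator names match the tenacity library's API.
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

# Note: sqlite3.OperationalError also covers non-transient failures
# (for example, a missing table), so those would be retried as well.
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=0.5, max=2),
    retry=retry_if_exception_type(sqlite3.OperationalError)
)
def execute_query(query):
    """Execute with retry for transient errors"""
    return db.execute(query)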

Missing Optional Feature

if not config.SEARCH_ENABLED:
    # Return empty results instead of error
    return {
        'results': [],
        'message': 'Search is disabled on this instance'
    }

Rationale

Why Graceful Degradation?

User Experience: Don't break the whole app
Reliability: Partial functionality better than none
Operations: Easier to diagnose in production
Recovery: System can self-heal from transients

Why Different Error Categories?

Appropriate Response: Different errors need different handling
Security: Don't expose internals for system errors
Debugging: Operators need full context
User Experience: Users need actionable messages

Why Structured Errors?

Consistency: Predictable error format
Parsing: Tools can process errors
Correlation: Trace IDs link logs to responses
Documentation: Self-documenting error details
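
To illustrate the correlation point, the sketch below generates one trace ID, writes it to the log alongside the internal detail, and returns it in a generic user response. How StarPunk actually generates trace IDs is not stated in this ADR, so the random hex ID and both function names are assumptions.

```python
import logging
import uuid

logger = logging.getLogger("starpunk.example")


def new_trace_id() -> str:
    """Return a short random identifier (12 hex characters)."""
    return uuid.uuid4().hex[:12]


def report_failure(internal_detail: str) -> dict:
    """Log the full detail and return a generic payload with the same ID."""
    trace_id = new_trace_id()

    # Operators see the full detail in the log, tagged with the trace ID.
    logger.error("System error [trace_id=%s]: %s", trace_id, internal_detail)

    # Users see only a generic message plus the ID they can quote in a report.
    return {
        "error": {
            "message": "An unexpected error occurred",
            "trace_id": trace_id,
        }
    }
```

An operator who receives a trace ID from a user can search the logs for that value to find the matching entry.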

Consequences

Positive

Better UX: Helpful error messages
Easier Debugging: Rich context in logs
More Reliable: Graceful degradation
Secure: No information leakage
Consistent: Predictable error handling

Negative

More Code: Error handling adds complexity
Testing Burden: Many error paths to test
Performance: Error handling overhead
Maintenance: Error messages need updates

Mitigations

Use error hierarchy to reduce duplication
Generate tests for error paths
Cache error messages
Document error codes clearly

Alternatives Considered

1. Let Exceptions Bubble

Pros: Simple, Python default
Cons: Poor UX, crashes, no context
Decision: Not production-ready

2. Generic Error Pages

Pros: Simple to implement
Cons: Not helpful, poor API experience
Decision: Insufficient for Micropub API

3. Error Codes System

Pros: Precise, machine-readable
Cons: Complex, needs documentation
Decision: Over-engineered for our scale

4. Sentry/Error Tracking Service

Pros: Rich features, alerting
Cons: External dependency, privacy
Decision: Conflicts with self-hosted philosophy

Implementation Notes

Critical Path Protection

Always protect critical paths:

# Never let note creation completely fail
try:
    create_search_index(note)
except Exception as e:
    # exc_info=True keeps the traceback in the log for debugging
    logger.error("Search indexing failed: %s", e, exc_info=True)
    # Continue without search - note still created

Error Budget

Track error rates for SLO monitoring:

User errors: Unlimited (not our fault)
System errors: <0.1% of requests
Configuration errors: 0 after startup
Transient errors: <1% of requests
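
The budgets above can be checked with simple arithmetic over request counters. The following sketch assumes counts are already collected per category; the function name and the counter format are hypothetical. Configuration errors are left out because their budget is an absolute count of zero after startup rather than a rate.

```python
from typing import Dict, Optional

# Budgets from the list above, as a fraction of total requests.
# None means the category has no limit.
BUDGETS: Dict[str, Optional[float]] = {
    "user": None,        # unlimited
    "system": 0.001,     # < 0.1% of requests
    "transient": 0.01,   # < 1% of requests
}


def check_error_budget(
    total_requests: int, error_counts: Dict[str, int]
) -> Dict[str, bool]:
    """Return, per category, whether the observed error rate is within budget."""
    within = {}
    for category, budget in BUDGETS.items():
        count = error_counts.get(category, 0)
        if budget is None or total_requests == 0:
            # No limit, or no traffic to compute a rate from.
            within[category] = True
        else:
            within[category] = (count / total_requests) < budget
    return within
```

For example, with 10,000 requests, 5 system errors is a rate of 0.05% and stays within budget, while 150 transient errors is 1.5% and exceeds it.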

Testing Strategy

Unit tests for each error class
Integration tests for error paths
Chaos testing for transient errors
User journey tests with errors

Security Considerations

Never expose stack traces to users
Sanitize error messages
Rate limit error endpoints
Don't leak existence via errors
Log security errors specially
