StarPunk/docs/decisions/ADR-054-structured-logging-architecture.md

# ADR-054: Structured Logging Architecture

## Status
Accepted

## Context
StarPunk currently uses print statements and basic logging without structure. For production deployments, we need:
- Consistent log formatting
- Appropriate log levels
- Structured data for parsing
- Correlation IDs for request tracking
- Performance-conscious logging

We need a logging architecture that is simple, follows Python best practices, and provides production-grade observability.

## Decision
Implement structured logging using Python's built-in `logging` module with JSON formatting and contextual information.

### Logging Architecture

```
Application Code
      ↓
Logger Interface → Filters → Formatters → Handlers → Output
      ↑                                              ↓
Context Injection                            (stdout/file)
```

### Log Levels

Following standard Python/syslog levels:

| Level | Value | Usage |
|-------|-------|-------|
| CRITICAL | 50 | System failures requiring immediate attention |
| ERROR | 40 | Errors that need investigation |
| WARNING | 30 | Unexpected conditions that might cause issues |
| INFO | 20 | Normal operation events |
| DEBUG | 10 | Detailed diagnostic information |

### Log Structure

JSON format for production, human-readable for development:

```json
{
  "timestamp": "2025-11-25T10:30:45.123Z",
  "level": "INFO",
  "logger": "starpunk.micropub",
  "message": "Note created",
  "request_id": "a1b2c3d4",
  "user": "alice@example.com",
  "context": {
    "note_id": 123,
    "slug": "my-note",
    "word_count": 42
  },
  "performance": {
    "duration_ms": 45
  }
}
```

### Logger Hierarchy

```
starpunk (root logger)
├── starpunk.auth        # Authentication/authorization
├── starpunk.micropub    # Micropub endpoint
├── starpunk.database    # Database operations
├── starpunk.search      # Search functionality
├── starpunk.web        # Web interface
├── starpunk.rss        # RSS generation
├── starpunk.monitoring # Performance monitoring
└── starpunk.migration  # Database migrations
```

### Implementation Pattern

```python
# starpunk/logging.py
import logging
import json
import sys
from datetime import datetime
from contextvars import ContextVar

# Request context for correlation
request_id: ContextVar[str] = ContextVar('request_id', default='')

class StructuredFormatter(logging.Formatter):
    """JSON formatter for structured logging"""

    def format(self, record):
        log_obj = {
            'timestamp': datetime.utcnow().isoformat() + 'Z',
            'level': record.levelname,
            'logger': record.name,
            'message': record.getMessage(),
            'request_id': request_id.get()
        }

        # Add extra fields
        if hasattr(record, 'context'):
            log_obj['context'] = record.context

        if hasattr(record, 'performance'):
            log_obj['performance'] = record.performance

        # Add exception info if present
        if record.exc_info:
            log_obj['exception'] = self.formatException(record.exc_info)

        return json.dumps(log_obj)

def setup_logging(level='INFO', format_type='json'):
    """Configure logging for the application"""
    root_logger = logging.getLogger('starpunk')
    root_logger.setLevel(level)

    handler = logging.StreamHandler(sys.stdout)

    if format_type == 'json':
        formatter = StructuredFormatter()
    else:
        # Human-readable for development
        formatter = logging.Formatter(
            '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
        )

    handler.setFormatter(formatter)
    root_logger.addHandler(handler)

    return root_logger

# Usage pattern
logger = logging.getLogger('starpunk.micropub')

def create_note(content, user):
    logger.info(
        "Creating note",
        extra={
            'context': {
                'user': user,
                'content_length': len(content)
            }
        }
    )
    # ... implementation
```

### What to Log

#### Always Log (INFO+)
- Authentication attempts (success/failure)
- Note CRUD operations
- Configuration changes
- Startup/shutdown
- External API calls
- Migration execution
- Search queries

#### Error Conditions (ERROR)
- Database connection failures
- Invalid Micropub requests
- Authentication failures
- File system errors
- Configuration errors

#### Warnings (WARNING)
- Slow queries
- High memory usage
- Deprecated feature usage
- Missing optional configuration
- FTS5 unavailability

#### Debug Information (DEBUG)
- SQL queries executed
- Request/response bodies
- Template rendering details
- Cache operations
- Detailed timing data

### What NOT to Log
- Passwords or tokens
- Full note content (unless debug)
- Personal information (PII)
- Request headers with auth
- Database connection strings

### Performance Considerations

1. **Lazy Evaluation**: Use lazy % formatting
   ```python
   logger.debug("Processing note %s", note_id)  # Good
   logger.debug(f"Processing note {note_id}")   # Bad
   ```

2. **Level Checking**: Check before expensive operations
   ```python
   if logger.isEnabledFor(logging.DEBUG):
       logger.debug("Data: %s", expensive_serialization())
   ```

3. **Async Logging**: For high-volume scenarios (future)

4. **Sampling**: For very frequent operations
   ```python
   if random.random() < 0.1:  # Log 10%
       logger.debug("High frequency operation")
   ```

## Rationale

### Why Standard Logging Module?
1. **No Dependencies**: Built into Python
2. **Industry Standard**: Well understood
3. **Flexible**: Handlers, formatters, filters
4. **Battle-tested**: Proven in production
5. **Integration**: Works with existing tools

### Why JSON Format?
1. **Parseable**: Easy for log aggregators
2. **Structured**: Consistent field access
3. **Flexible**: Can add fields without breaking
4. **Standard**: Widely supported

### Why Not Alternatives?

**structlog**:
- Additional dependency
- More complex API
- Overkill for our needs

**loguru**:
- Third-party dependency
- Non-standard API
- Not necessary for our scale

**Print statements**:
- No levels
- No structure
- No filtering
- Not production-ready

## Consequences

### Positive
1. **Production Ready**: Professional logging
2. **Debuggable**: Rich context in logs
3. **Parseable**: Integration with log tools
4. **Performant**: Minimal overhead
5. **Configurable**: Adjust without code changes
6. **Correlatable**: Request tracking via IDs

### Negative
1. **Verbosity**: More code for logging
2. **Learning**: Developers must understand levels
3. **Size**: JSON logs are larger than plain text
4. **Complexity**: More setup than prints

### Mitigations
1. Provide logging utilities/helpers
2. Document logging guidelines
3. Use log rotation for size management
4. Create developer-friendly formatter option

## Alternatives Considered

### 1. Continue with Print Statements
**Pros**: Simplest possible
**Cons**: Not production-ready
**Decision**: Inadequate for production

### 2. Custom Logging Solution
**Pros**: Exactly what we need
**Cons**: Reinventing the wheel
**Decision**: Standard library is sufficient

### 3. External Logging Service
**Pros**: No local storage needed
**Cons**: Privacy, dependency, cost
**Decision**: Conflicts with self-hosted philosophy

### 4. Syslog Integration
**Pros**: Standard Unix logging
**Cons**: Platform-specific, complexity
**Decision**: Can add as handler if needed

## Implementation Notes

### Bootstrap Logging
```python
# Application startup
import logging
from starpunk.logging import setup_logging

# Configure based on environment
if os.environ.get('STARPUNK_ENV') == 'production':
    setup_logging(level='INFO', format_type='json')
else:
    setup_logging(level='DEBUG', format_type='human')
```

### Request Correlation
```python
# Middleware sets request ID
from uuid import uuid4
from contextvars import copy_context

def middleware(request):
    request_id.set(str(uuid4())[:8])
    # Process request in context
    return copy_context().run(handler, request)
```

### Migration Strategy
1. Phase 1: Add logging module, keep prints
2. Phase 2: Convert prints to logger calls
3. Phase 3: Remove print statements
4. Phase 4: Add structured context

## Testing Strategy

1. **Unit Tests**: Mock logger, verify calls
2. **Integration Tests**: Verify log output format
3. **Performance Tests**: Measure logging overhead
4. **Configuration Tests**: Test different levels/formats

## Configuration

Environment variables:
- `STARPUNK_LOG_LEVEL`: DEBUG|INFO|WARNING|ERROR|CRITICAL
- `STARPUNK_LOG_FORMAT`: json|human
- `STARPUNK_LOG_FILE`: Path to log file (optional)
- `STARPUNK_LOG_ROTATION`: Enable rotation (optional)

## Security Considerations

1. Never log sensitive data
2. Sanitize user input in logs
3. Rate limit log output
4. Monitor for log injection attacks
5. Secure log file permissions

## References

- [Python Logging HOWTO](https://docs.python.org/3/howto/logging.html)
- [The Twelve-Factor App - Logs](https://12factor.net/logs)
- [OWASP Logging Guide](https://cheatsheetseries.owasp.org/cheatsheets/Logging_Cheat_Sheet.html)
- [JSON Logging Best Practices](https://www.loggly.com/use-cases/json-logging-best-practices/)

## Document History

- 2025-11-25: Initial draft for v1.1.1 release planning