Gondulf/docs/designs/phase-4b-clarifications.md

# Phase 4b Security Hardening - Implementation Clarifications

Date: 2025-11-20

## Overview

This document provides clarifications for implementation questions raised during the Phase 4b Security Hardening design review. Each clarification includes the rationale and specific implementation guidance.

## Clarifications

### 1. Content Security Policy (CSP) img-src Directive

**Question**: Should `img-src 'self' https:` allow loading images from any HTTPS source, or should it be more restrictive?

**Answer**: Use `img-src 'self' https:` to allow any HTTPS source.

**Rationale**:
- IndieAuth clients may display various client logos and user profile images from external HTTPS sources
- Client applications registered via self-service could have logos hosted anywhere
- User profile images from IndieWeb sites could be hosted on various services
- Requiring explicit whitelisting would break the self-service registration model

**Implementation**:
```python
CSP_DIRECTIVES = {
    "default-src": "'self'",
    "script-src": "'self'",
    "style-src": "'self' 'unsafe-inline'",  # unsafe-inline for minimal CSS
    "img-src": "'self' https:",  # Allow any HTTPS image source
    "font-src": "'self'",
    "connect-src": "'self'",
    "frame-ancestors": "'none'"
}
```

### 2. HTTPS Enforcement with Reverse Proxy Support

**Question**: Should the HTTPS enforcement middleware check the `X-Forwarded-Proto` header for reverse proxy deployments?

**Answer**: Yes, check `X-Forwarded-Proto` header when configured for reverse proxy deployments.

**Rationale**:
- Many production deployments run behind reverse proxies (nginx, Apache, Cloudflare)
- The application sees HTTP from the proxy even when the client connection is HTTPS
- This is a standard pattern for Python web applications

**Implementation**:
```python
def is_https_request(request: Request) -> bool:
    """Check if request is HTTPS, considering reverse proxy headers."""
    # Direct HTTPS
    if request.url.scheme == "https":
        return True

    # Behind proxy - check forwarded header
    # Only trust this header in production with TRUST_PROXY=true
    if config.TRUST_PROXY:
        forwarded_proto = request.headers.get("X-Forwarded-Proto", "").lower()
        return forwarded_proto == "https"

    return False
```

**Configuration Addition**:
Add to config.py:
```python
# Security settings
HTTPS_REDIRECT: bool = True  # Redirect HTTP to HTTPS in production
TRUST_PROXY: bool = False    # Trust X-Forwarded-* headers from reverse proxy
```

### 3. Token Prefix Format for Logging

**Question**: Should partial token logging consistently use exactly 8 characters with ellipsis suffix?

**Answer**: Yes, use exactly 8 characters plus ellipsis for all token logging.

**Rationale**:
- Consistency aids in log parsing and monitoring
- 8 characters provides enough uniqueness for debugging (16^8 = 4.3 billion combinations)
- Ellipsis clearly indicates truncation to log readers
- Matches common security logging practices

**Implementation**:
```python
def mask_sensitive_value(value: str, prefix_len: int = 8) -> str:
    """Mask sensitive values for logging, showing only prefix."""
    if not value or len(value) <= prefix_len:
        return "***"
    return f"{value[:prefix_len]}..."

# Usage in logging
logger.info(f"Token validated", extra={
    "token_prefix": mask_sensitive_value(token, 8),
    "client_id": client_id
})
```

### 4. Timing Attack Test Reliability

**Question**: How should we handle potential flakiness in statistical timing attack tests, especially in CI environments?

**Answer**: Use a combination of increased sample size, relaxed thresholds for CI, and optional skip markers.

**Rationale**:
- CI environments have variable performance characteristics
- Statistical tests inherently have some variance
- We need to balance test reliability with meaningful security validation
- Some timing variation is acceptable as long as there's no clear correlation

**Implementation**:
```python
@pytest.mark.security
@pytest.mark.slow  # Mark as slow test
@pytest.mark.skipif(
    os.getenv("CI") == "true" and os.getenv("SKIP_TIMING_TESTS") == "true",
    reason="Timing tests disabled in CI"
)
def test_authorization_code_timing_attack_resistance():
    """Test that authorization code validation has consistent timing."""
    # Increase samples in CI for better statistics
    samples = 200 if os.getenv("CI") == "true" else 100

    # Use relaxed threshold in CI (30% vs 20% coefficient of variation)
    max_cv = 0.30 if os.getenv("CI") == "true" else 0.20

    # ... rest of test implementation

    # Check coefficient of variation (stddev/mean)
    cv = np.std(timings) / np.mean(timings)
    assert cv < max_cv, f"Timing variation too high: {cv:.2%} (max: {max_cv:.2%})"
```

**CI Configuration**:
Document in testing standards that `SKIP_TIMING_TESTS=true` can be set in CI if timing tests prove unreliable in a particular environment.

### 5. SQL Injection Test Implementation

**Question**: Should SQL injection tests actually read and inspect source files for patterns? Are there concerns about false positives?

**Answer**: No, do not inspect source files. Use actual injection attempts and verify behavior.

**Rationale**:
- Source code inspection is fragile and prone to false positives
- Testing actual behavior is more reliable than pattern matching
- SQLAlchemy's parameterized queries should handle this at runtime
- Behavioral testing confirms the security measure works end-to-end

**Implementation**:
```python
@pytest.mark.security
def test_sql_injection_prevention():
    """Test that SQL injection attempts are properly prevented."""
    # Test actual injection attempts, not source code patterns
    injection_attempts = [
        "'; DROP TABLE users; --",
        "' OR '1'='1",
        "admin'--",
        "' UNION SELECT * FROM tokens--",
        "'; INSERT INTO clients VALUES ('evil', 'client'); --"
    ]

    for attempt in injection_attempts:
        # Attempt injection via client_id parameter
        response = client.get(
            "/authorize",
            params={"client_id": attempt, "response_type": "code"}
        )

        # Should get client not found, not SQL error
        assert response.status_code == 400
        assert "invalid_client" in response.json()["error"]

        # Verify no SQL error in logs (would indicate query wasn't escaped)
        # This would be checked via log capture in test fixtures
```

### 6. HTTPS Redirect Configuration

**Question**: Should `HTTPS_REDIRECT` configuration option be added to the Config class in Phase 4b?

**Answer**: Yes, add both `HTTPS_REDIRECT` and `TRUST_PROXY` to the Config class.

**Rationale**:
- Security features need runtime configuration
- Different deployment environments have different requirements
- Development needs HTTP for local testing
- Production typically needs HTTPS enforcement

**Implementation**:
Add to `src/config.py`:
```python
class Config:
    """Application configuration."""

    # Existing configuration...

    # Security configuration
    HTTPS_REDIRECT: bool = Field(
        default=True,
        description="Redirect HTTP requests to HTTPS in production"
    )

    TRUST_PROXY: bool = Field(
        default=False,
        description="Trust X-Forwarded-* headers from reverse proxy"
    )

    SECURE_COOKIES: bool = Field(
        default=True,
        description="Set secure flag on cookies (requires HTTPS)"
    )

    @validator("HTTPS_REDIRECT")
    def validate_https_redirect(cls, v, values):
        """Disable HTTPS redirect in development."""
        if values.get("ENV") == "development":
            return False
        return v
```

### 7. Pytest Security Marker Registration

**Question**: Should `@pytest.mark.security` be registered in pytest configuration?

**Answer**: Yes, register the marker in `pytest.ini` or `pyproject.toml`.

**Rationale**:
- Prevents pytest warnings about unregistered markers
- Enables running security tests separately: `pytest -m security`
- Documents available test categories
- Follows pytest best practices

**Implementation**:
Create or update `pytest.ini`:
```ini
[tool:pytest]
markers =
    security: Security-related tests (timing attacks, injection, headers)
    slow: Tests that take longer to run (timing attack statistics)
    integration: Integration tests requiring full application context
```

Or in `pyproject.toml`:
```toml
[tool.pytest.ini_options]
markers = [
    "security: Security-related tests (timing attacks, injection, headers)",
    "slow: Tests that take longer to run (timing attack statistics)",
    "integration: Integration tests requiring full application context",
]
```

**Usage**:
```bash
# Run only security tests
pytest -m security

# Run all except slow tests
pytest -m "not slow"

# Run security tests but not slow ones
pytest -m "security and not slow"
```

### 8. Secure Logging Guidelines Documentation

**Question**: How should secure logging guidelines be structured in the coding standards?

**Answer**: Add a dedicated "Security Practices" section to `/docs/standards/coding.md` with specific logging subsection.

**Rationale**:
- Security practices deserve prominent placement in coding standards
- Developers need clear, findable guidelines
- Examples make guidelines actionable
- Should cover both what to log and what not to log

**Implementation**:
Add to `/docs/standards/coding.md`:

```markdown
## Security Practices

### Secure Logging Guidelines

#### Never Log Sensitive Data

The following must NEVER appear in logs:
- Full tokens (authorization codes, access tokens, refresh tokens)
- Passwords or secrets
- Full authorization codes
- Private keys or certificates
- Personally identifiable information (PII) beyond user identifiers

#### Safe Logging Practices

When logging security-relevant events, follow these practices:

1. **Token Prefixes**: When token identification is necessary, log only the first 8 characters:
   ```python
   logger.info("Token validated", extra={
       "token_prefix": token[:8] + "..." if len(token) > 8 else "***",
       "client_id": client_id
   })
   ```

2. **Request Context**: Log security events with context:
   ```python
   logger.warning("Authorization failed", extra={
       "client_id": client_id,
       "ip_address": request.client.host,
       "user_agent": request.headers.get("User-Agent", "unknown"),
       "error": error_code  # Use error codes, not full messages
   })
   ```

3. **Security Events to Log**:
   - Failed authentication attempts
   - Token validation failures
   - Rate limit violations
   - Input validation failures
   - HTTPS redirect actions
   - Client registration events

4. **Use Structured Logging**: Include metadata as structured fields:
   ```python
   logger.info("Client registered", extra={
       "event": "client.registered",
       "client_id": client_id,
       "registration_method": "self_service",
       "timestamp": datetime.utcnow().isoformat()
   })
   ```

5. **Sanitize User Input**: Always sanitize user-provided data before logging:
   ```python
   def sanitize_for_logging(value: str, max_length: int = 100) -> str:
       """Sanitize user input for safe logging."""
       # Remove control characters
       value = "".join(ch for ch in value if ch.isprintable())
       # Truncate if too long
       if len(value) > max_length:
           value = value[:max_length] + "..."
       return value
   ```

#### Security Audit Logging

For security-critical operations, use a dedicated audit logger:

```python
audit_logger = logging.getLogger("security.audit")

# Log security-critical events
audit_logger.info("Token issued", extra={
    "event": "token.issued",
    "client_id": client_id,
    "scope": scope,
    "expires_in": expires_in,
    "ip_address": request.client.host
})
```

#### Testing Logging Security

Include tests that verify sensitive data doesn't leak into logs:

```python
def test_no_token_in_logs(caplog):
    """Verify tokens are not logged in full."""
    token = "sensitive_token_abc123xyz789"

    # Perform operation that logs token
    validate_token(token)

    # Check logs don't contain full token
    for record in caplog.records:
        assert token not in record.getMessage()
        # But prefix might be present
        assert token[:8] in record.getMessage() or "***" in record.getMessage()
```
```

## Summary

All clarifications maintain the principle of simplicity while ensuring security. Key decisions:

1. **CSP allows any HTTPS image source** - supports self-service model
2. **HTTPS middleware checks proxy headers when configured** - supports real deployments
3. **Token prefixes use consistent 8-char + ellipsis format** - aids monitoring
4. **Timing tests use relaxed thresholds in CI** - balances reliability with security validation
5. **SQL injection tests use behavioral testing** - more reliable than source inspection
6. **Security config added to Config class** - runtime configuration for different environments
7. **Pytest markers registered properly** - enables targeted test runs
8. **Comprehensive security logging guidelines** - clear, actionable developer guidance

These clarifications ensure the Developer can proceed with implementation without ambiguity while maintaining security best practices.