Files
StarPunk/docs/design/v1.5.0/2025-12-16-phase2-architect-review.md
Phil Skentelbery b689e02e64 perf(feed): Batch load media and tags to fix N+1 query
Per v1.5.0 Phase 3: Fix N+1 query pattern in feed generation.

Implementation:
- Add get_media_for_notes() to starpunk/media.py for batch media loading
- Add get_tags_for_notes() to starpunk/tags.py for batch tag loading
- Update _get_cached_notes() in starpunk/routes/public.py to use batch loading
- Add comprehensive tests in tests/test_batch_loading.py

Performance improvement:
- Before: O(n) queries (1 query per note for media + 1 query per note for tags)
- After: O(1) queries (2 queries total: 1 for all media, 1 for all tags)
- Maintains same API behavior and output format

All tests passing: 920 passed in 360.79s

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-17 10:42:44 -07:00

8.0 KiB

Phase 2 Architect Review: Debug File Management

Date: 2025-12-17 Reviewer: Architect Agent Phase: 2 - Debug File Management Status: APPROVED


Executive Summary

Phase 2 implementation meets all acceptance criteria and demonstrates sound architectural decisions. The implementation follows the principle of secure defaults, properly sanitizes user-controlled input, and provides appropriate cleanup mechanisms to prevent resource exhaustion.


Acceptance Criteria Verification

Criterion Status Notes
Debug files disabled by default PASS DEBUG_SAVE_FAILED_UPLOADS defaults to False
Files older than 7 days auto-deleted PASS Configurable via DEBUG_FILE_MAX_AGE_DAYS
Folder size limited to 100MB PASS Configurable via DEBUG_FILE_MAX_SIZE_MB
Filenames sanitized (no path traversal) PASS Pattern: alphanumeric, ., _, - only, truncated to 50 chars
Cleanup runs on startup PASS Called in create_app() before database init
Tests cover all scenarios PASS 15 tests covering config, saving, sanitization, cleanup, startup

Code Review

1. Configuration (starpunk/config.py)

Lines 100-103:

app.config["DEBUG_SAVE_FAILED_UPLOADS"] = os.getenv("DEBUG_SAVE_FAILED_UPLOADS", "false").lower() == "true"
app.config["DEBUG_FILE_MAX_AGE_DAYS"] = int(os.getenv("DEBUG_FILE_MAX_AGE_DAYS", "7"))
app.config["DEBUG_FILE_MAX_SIZE_MB"] = int(os.getenv("DEBUG_FILE_MAX_SIZE_MB", "100"))

Assessment: Correct implementation.

  • Boolean parsing uses explicit comparison to "true" (secure)
  • Integer parsing uses sensible defaults
  • Configuration follows existing patterns in the file

2. Filename Sanitization (starpunk/media.py)

Lines 141-143:

# Sanitize filename to prevent path traversal (v1.5.0 security fix)
safe_filename = "".join(c for c in filename if c.isalnum() or c in "._-")[:50]
debug_file = debug_dir / f"failed_{datetime.now().strftime('%Y%m%d_%H%M%S')}_{safe_filename}"

Assessment: Correct implementation of specified pattern.

  • Allowlist approach (only alphanumeric, ., _, -)
  • Truncation to 50 characters prevents filesystem issues
  • Path traversal attack vectors (../, ..\\, etc.) are neutralized

Security Note: The pattern "._-" correctly allows:

  • . for file extensions
  • _ for separators
  • - for common filename characters

The pattern removes:

  • / and \ (path separators)
  • : (Windows drive letters)
  • <>"|?* (shell metacharacters)
  • Null bytes and other control characters

3. Cleanup Function (starpunk/media.py)

Lines 785-857:

Assessment: Well-designed two-phase cleanup algorithm.

Phase 1 - Age-based cleanup:

  • Correctly calculates cutoff date
  • Uses timedelta for date arithmetic
  • Removes from list while iterating (uses slice copy correctly)

Phase 2 - Size-based cleanup:

  • Correctly sums remaining file sizes
  • Deletes oldest first (proper FIFO approach)
  • Loop terminates when under limit or no files remain

Edge cases handled:

  • Non-existent directory (early return)
  • Empty directory (early return after glob)
  • OSError during deletion (logged, continues)

4. Startup Integration (starpunk/__init__.py)

Lines 130-132:

# Clean up old debug files (v1.5.0 Phase 2)
from starpunk.media import cleanup_old_debug_files
cleanup_old_debug_files(app)

Assessment: Correct placement.

  • After logging configuration (can log cleanup actions)
  • Before database initialization (no DB dependency)
  • Uses app instance directly (not current_app, which wouldn't exist yet)

5. Test Coverage

15 tests in 5 test classes:

Class Tests Purpose
TestDebugFileConfiguration 4 Verify config defaults and override
TestDebugFileSaving 2 Verify conditional saving behavior
TestDebugFilenameSanitization 3 Security: path traversal, special chars, truncation
TestDebugFileCleanup 5 Age/size limits, edge cases
TestDebugFileStartupCleanup 1 Integration: cleanup during app init

Assessment: Comprehensive coverage of all requirements.


Security Analysis

Path Traversal Prevention

The sanitization pattern effectively prevents all known path traversal attacks:

Attack Vector Input Sanitized Output
Basic traversal ../../../etc/passwd ...etcpasswd
Windows traversal ..\\..\\..\system32 ...system32
Encoded traversal %2e%2e%2f 2e2e2f (%-encoded already decoded before reaching code)
Null byte file\x00.jpg file.jpg

Default-Secure Configuration

The feature is disabled by default (DEBUG_SAVE_FAILED_UPLOADS=false), meaning:

  • Production deployments are protected without configuration
  • Debug files only appear when explicitly enabled
  • No accidental data retention in production

Resource Exhaustion Prevention

Dual limits prevent disk exhaustion:

  • Age limit (7 days default): Prevents indefinite accumulation
  • Size limit (100MB default): Hard cap on disk usage
  • Both limits configurable for different environments

Performance Considerations

Startup Overhead

The cleanup_old_debug_files() function runs on every app startup:

# Worst case: 1000 files in debug directory
# - glob: O(n) directory listing
# - sort: O(n log n)
# - deletion: O(n) unlink calls

# Estimated overhead: ~100ms for 100 files, ~500ms for 1000 files

Assessment: Acceptable. Startup is not a hot path, and the cleanup prevents greater issues.

Recommendation: Consider adding early exit if DEBUG_SAVE_FAILED_UPLOADS=False to skip cleanup entirely in production. However, this is a minor optimization and not required for approval.


Architectural Alignment

Project Philosophy

"Every line of code must justify its existence."

This implementation follows the philosophy:

  • Minimal code (single sanitization pattern)
  • Single responsibility (cleanup function does one thing)
  • Secure defaults (disabled in production)
  • No unnecessary features

Standards Compliance

The implementation follows established patterns:

  • Configuration uses existing os.getenv() pattern
  • Logging uses app.logger
  • File operations use pathlib.Path
  • Tests use pytest fixtures

Minor Observations (Not Blocking)

  1. Config validation: The new config options are not validated in validate_config(). Since they have sensible defaults and are non-critical, this is acceptable. However, negative values for MAX_AGE_DAYS or MAX_SIZE_MB could cause unexpected behavior.

  2. Logging verbosity: When cleanup deletes many files, it logs a single summary message. This is appropriate for production but could mask issues during debugging.

  3. Race condition: If multiple workers start simultaneously, they could attempt to delete the same file. The OSError handling covers this gracefully.


Verdict

APPROVED

Phase 2 is complete and meets all acceptance criteria. The implementation demonstrates:

  • Secure defaults
  • Proper input sanitization
  • Comprehensive test coverage
  • Good error handling
  • Alignment with project architecture

The developer may proceed to Phase 3: N+1 Query Fix (Feed Generation).


Checklist for Project Plan Update

The following items from RELEASE.md Phase 2 are complete:

  • Configuration options added (DEBUG_SAVE_FAILED_UPLOADS, DEBUG_FILE_MAX_AGE_DAYS, DEBUG_FILE_MAX_SIZE_MB)
  • Cleanup logic implemented in starpunk/media.py
  • Config check before saving debug files
  • cleanup_old_debug_files() function implemented
  • Startup cleanup integrated in __init__.py
  • Filename sanitization pattern implemented
  • Tests cover all scenarios (15 tests)

Appendix: Test Verification

$ uv run pytest tests/test_debug_file_management.py -v
================================ 15 passed in 0.72s ============================

$ uv run pytest tests/test_media*.py -v
================================ 33 passed in 6.16s ============================

All tests pass consistently.