Files

Phil Skentelbery b689e02e64 perf(feed): Batch load media and tags to fix N+1 query

Per v1.5.0 Phase 3: Fix N+1 query pattern in feed generation.

Implementation:
- Add get_media_for_notes() to starpunk/media.py for batch media loading
- Add get_tags_for_notes() to starpunk/tags.py for batch tag loading
- Update _get_cached_notes() in starpunk/routes/public.py to use batch loading
- Add comprehensive tests in tests/test_batch_loading.py

Performance improvement:
- Before: O(n) queries (1 query per note for media + 1 query per note for tags)
- After: O(1) queries (2 queries total: 1 for all media, 1 for all tags)
- Maintains same API behavior and output format

All tests passing: 920 passed in 360.79s

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-12-17 10:42:44 -07:00

8.0 KiB

Raw Blame History

Phase 2 Architect Review: Debug File Management

Date: 2025-12-17 Reviewer: Architect Agent Phase: 2 - Debug File Management Status: APPROVED

Executive Summary

Phase 2 implementation meets all acceptance criteria and demonstrates sound architectural decisions. The implementation follows the principle of secure defaults, properly sanitizes user-controlled input, and provides appropriate cleanup mechanisms to prevent resource exhaustion.

Acceptance Criteria Verification

Criterion	Status	Notes
Debug files disabled by default	PASS	`DEBUG_SAVE_FAILED_UPLOADS` defaults to `False`
Files older than 7 days auto-deleted	PASS	Configurable via `DEBUG_FILE_MAX_AGE_DAYS`
Folder size limited to 100MB	PASS	Configurable via `DEBUG_FILE_MAX_SIZE_MB`
Filenames sanitized (no path traversal)	PASS	Pattern: alphanumeric, `.`, `_`, `-` only, truncated to 50 chars
Cleanup runs on startup	PASS	Called in `create_app()` before database init
Tests cover all scenarios	PASS	15 tests covering config, saving, sanitization, cleanup, startup

Code Review

1. Configuration (`starpunk/config.py`)

Lines 100-103:

app.config["DEBUG_SAVE_FAILED_UPLOADS"] = os.getenv("DEBUG_SAVE_FAILED_UPLOADS", "false").lower() == "true"
app.config["DEBUG_FILE_MAX_AGE_DAYS"] = int(os.getenv("DEBUG_FILE_MAX_AGE_DAYS", "7"))
app.config["DEBUG_FILE_MAX_SIZE_MB"] = int(os.getenv("DEBUG_FILE_MAX_SIZE_MB", "100"))

Assessment: Correct implementation.

Boolean parsing uses explicit comparison to "true" (secure)
Integer parsing uses sensible defaults
Configuration follows existing patterns in the file

2. Filename Sanitization (`starpunk/media.py`)

Lines 141-143:

# Sanitize filename to prevent path traversal (v1.5.0 security fix)
safe_filename = "".join(c for c in filename if c.isalnum() or c in "._-")[:50]
debug_file = debug_dir / f"failed_{datetime.now().strftime('%Y%m%d_%H%M%S')}_{safe_filename}"

Assessment: Correct implementation of specified pattern.

Allowlist approach (only alphanumeric, ., _, -)
Truncation to 50 characters prevents filesystem issues
Path traversal attack vectors (../, ..\\, etc.) are neutralized

Security Note: The pattern "._-" correctly allows:

. for file extensions
_ for separators
- for common filename characters

The pattern removes:

/ and \ (path separators)
: (Windows drive letters)
<>"|?* (shell metacharacters)
Null bytes and other control characters

3. Cleanup Function (`starpunk/media.py`)

Lines 785-857:

Assessment: Well-designed two-phase cleanup algorithm.

Phase 1 - Age-based cleanup:

Correctly calculates cutoff date
Uses timedelta for date arithmetic
Removes from list while iterating (uses slice copy correctly)

Phase 2 - Size-based cleanup:

Correctly sums remaining file sizes
Deletes oldest first (proper FIFO approach)
Loop terminates when under limit or no files remain

Edge cases handled:

Non-existent directory (early return)
Empty directory (early return after glob)
OSError during deletion (logged, continues)

4. Startup Integration (`starpunk/init.py`)

Lines 130-132:

# Clean up old debug files (v1.5.0 Phase 2)
from starpunk.media import cleanup_old_debug_files
cleanup_old_debug_files(app)

Assessment: Correct placement.

After logging configuration (can log cleanup actions)
Before database initialization (no DB dependency)
Uses app instance directly (not current_app, which wouldn't exist yet)

5. Test Coverage

15 tests in 5 test classes:

Class	Tests	Purpose
`TestDebugFileConfiguration`	4	Verify config defaults and override
`TestDebugFileSaving`	2	Verify conditional saving behavior
`TestDebugFilenameSanitization`	3	Security: path traversal, special chars, truncation
`TestDebugFileCleanup`	5	Age/size limits, edge cases
`TestDebugFileStartupCleanup`	1	Integration: cleanup during app init

Assessment: Comprehensive coverage of all requirements.

Security Analysis

Path Traversal Prevention

The sanitization pattern effectively prevents all known path traversal attacks:

Attack Vector	Input	Sanitized Output
Basic traversal	`../../../etc/passwd`	`...etcpasswd`
Windows traversal	`..\\..\\..\system32`	`...system32`
Encoded traversal	`%2e%2e%2f`	`2e2e2f` (%-encoded already decoded before reaching code)
Null byte	`file\x00.jpg`	`file.jpg`

Default-Secure Configuration

The feature is disabled by default (DEBUG_SAVE_FAILED_UPLOADS=false), meaning:

Production deployments are protected without configuration
Debug files only appear when explicitly enabled
No accidental data retention in production

Resource Exhaustion Prevention

Dual limits prevent disk exhaustion:

Age limit (7 days default): Prevents indefinite accumulation
Size limit (100MB default): Hard cap on disk usage
Both limits configurable for different environments

Performance Considerations

Startup Overhead

The cleanup_old_debug_files() function runs on every app startup:

# Worst case: 1000 files in debug directory
# - glob: O(n) directory listing
# - sort: O(n log n)
# - deletion: O(n) unlink calls

# Estimated overhead: ~100ms for 100 files, ~500ms for 1000 files

Assessment: Acceptable. Startup is not a hot path, and the cleanup prevents greater issues.

Recommendation: Consider adding early exit if DEBUG_SAVE_FAILED_UPLOADS=False to skip cleanup entirely in production. However, this is a minor optimization and not required for approval.

Architectural Alignment

Project Philosophy

"Every line of code must justify its existence."

This implementation follows the philosophy:

Minimal code (single sanitization pattern)
Single responsibility (cleanup function does one thing)
Secure defaults (disabled in production)
No unnecessary features

Standards Compliance

The implementation follows established patterns:

Configuration uses existing os.getenv() pattern
Logging uses app.logger
File operations use pathlib.Path
Tests use pytest fixtures

Minor Observations (Not Blocking)

Config validation: The new config options are not validated in validate_config(). Since they have sensible defaults and are non-critical, this is acceptable. However, negative values for MAX_AGE_DAYS or MAX_SIZE_MB could cause unexpected behavior.
Logging verbosity: When cleanup deletes many files, it logs a single summary message. This is appropriate for production but could mask issues during debugging.
Race condition: If multiple workers start simultaneously, they could attempt to delete the same file. The OSError handling covers this gracefully.

Verdict

APPROVED

Phase 2 is complete and meets all acceptance criteria. The implementation demonstrates:

Secure defaults
Proper input sanitization
Comprehensive test coverage
Good error handling
Alignment with project architecture

The developer may proceed to Phase 3: N+1 Query Fix (Feed Generation).

Checklist for Project Plan Update

The following items from RELEASE.md Phase 2 are complete:

Configuration options added (DEBUG_SAVE_FAILED_UPLOADS, DEBUG_FILE_MAX_AGE_DAYS, DEBUG_FILE_MAX_SIZE_MB)
Cleanup logic implemented in starpunk/media.py
Config check before saving debug files
cleanup_old_debug_files() function implemented
Startup cleanup integrated in __init__.py
Filename sanitization pattern implemented
Tests cover all scenarios (15 tests)

Appendix: Test Verification

$ uv run pytest tests/test_debug_file_management.py -v
================================ 15 passed in 0.72s ============================

$ uv run pytest tests/test_media*.py -v
================================ 33 passed in 6.16s ============================

All tests pass consistently.

8.0 KiB Raw Blame History