Per v1.5.0 Phase 3: Fix N+1 query pattern in feed generation. Implementation: - Add get_media_for_notes() to starpunk/media.py for batch media loading - Add get_tags_for_notes() to starpunk/tags.py for batch tag loading - Update _get_cached_notes() in starpunk/routes/public.py to use batch loading - Add comprehensive tests in tests/test_batch_loading.py Performance improvement: - Before: O(n) queries (1 query per note for media + 1 query per note for tags) - After: O(1) queries (2 queries total: 1 for all media, 1 for all tags) - Maintains same API behavior and output format All tests passing: 920 passed in 360.79s 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
243 lines
8.0 KiB
Markdown
243 lines
8.0 KiB
Markdown
# Phase 2 Architect Review: Debug File Management
|
|
|
|
**Date**: 2025-12-17
|
|
**Reviewer**: Architect Agent
|
|
**Phase**: 2 - Debug File Management
|
|
**Status**: APPROVED
|
|
|
|
---
|
|
|
|
## Executive Summary
|
|
|
|
Phase 2 implementation meets all acceptance criteria and demonstrates sound architectural decisions. The implementation follows the principle of secure defaults, properly sanitizes user-controlled input, and provides appropriate cleanup mechanisms to prevent resource exhaustion.
|
|
|
|
---
|
|
|
|
## Acceptance Criteria Verification
|
|
|
|
| Criterion | Status | Notes |
|
|
|-----------|--------|-------|
|
|
| Debug files disabled by default | PASS | `DEBUG_SAVE_FAILED_UPLOADS` defaults to `False` |
|
|
| Files older than 7 days auto-deleted | PASS | Configurable via `DEBUG_FILE_MAX_AGE_DAYS` |
|
|
| Folder size limited to 100MB | PASS | Configurable via `DEBUG_FILE_MAX_SIZE_MB` |
|
|
| Filenames sanitized (no path traversal) | PASS | Pattern: alphanumeric, `.`, `_`, `-` only, truncated to 50 chars |
|
|
| Cleanup runs on startup | PASS | Called in `create_app()` before database init |
|
|
| Tests cover all scenarios | PASS | 15 tests covering config, saving, sanitization, cleanup, startup |
|
|
|
|
---
|
|
|
|
## Code Review
|
|
|
|
### 1. Configuration (`starpunk/config.py`)
|
|
|
|
**Lines 100-103:**
|
|
```python
|
|
app.config["DEBUG_SAVE_FAILED_UPLOADS"] = os.getenv("DEBUG_SAVE_FAILED_UPLOADS", "false").lower() == "true"
|
|
app.config["DEBUG_FILE_MAX_AGE_DAYS"] = int(os.getenv("DEBUG_FILE_MAX_AGE_DAYS", "7"))
|
|
app.config["DEBUG_FILE_MAX_SIZE_MB"] = int(os.getenv("DEBUG_FILE_MAX_SIZE_MB", "100"))
|
|
```
|
|
|
|
**Assessment**: Correct implementation.
|
|
- Boolean parsing uses explicit comparison to "true" (secure)
|
|
- Integer parsing uses sensible defaults
|
|
- Configuration follows existing patterns in the file
|
|
|
|
### 2. Filename Sanitization (`starpunk/media.py`)
|
|
|
|
**Lines 141-143:**
|
|
```python
|
|
# Sanitize filename to prevent path traversal (v1.5.0 security fix)
|
|
safe_filename = "".join(c for c in filename if c.isalnum() or c in "._-")[:50]
|
|
debug_file = debug_dir / f"failed_{datetime.now().strftime('%Y%m%d_%H%M%S')}_{safe_filename}"
|
|
```
|
|
|
|
**Assessment**: Correct implementation of specified pattern.
|
|
- Allowlist approach (only alphanumeric, `.`, `_`, `-`)
|
|
- Truncation to 50 characters prevents filesystem issues
|
|
- Path traversal attack vectors (`../`, `..\\`, etc.) are neutralized
|
|
|
|
**Security Note**: The pattern `"._-"` correctly allows:
|
|
- `.` for file extensions
|
|
- `_` for separators
|
|
- `-` for common filename characters
|
|
|
|
The pattern removes:
|
|
- `/` and `\` (path separators)
|
|
- `:` (Windows drive letters)
|
|
- `<>"|?*` (shell metacharacters)
|
|
- Null bytes and other control characters
|
|
|
|
### 3. Cleanup Function (`starpunk/media.py`)
|
|
|
|
**Lines 785-857:**
|
|
|
|
**Assessment**: Well-designed two-phase cleanup algorithm.
|
|
|
|
**Phase 1 - Age-based cleanup:**
|
|
- Correctly calculates cutoff date
|
|
- Uses `timedelta` for date arithmetic
|
|
- Removes from list while iterating (uses slice copy correctly)
|
|
|
|
**Phase 2 - Size-based cleanup:**
|
|
- Correctly sums remaining file sizes
|
|
- Deletes oldest first (proper FIFO approach)
|
|
- Loop terminates when under limit or no files remain
|
|
|
|
**Edge cases handled:**
|
|
- Non-existent directory (early return)
|
|
- Empty directory (early return after glob)
|
|
- OSError during deletion (logged, continues)
|
|
|
|
### 4. Startup Integration (`starpunk/__init__.py`)
|
|
|
|
**Lines 130-132:**
|
|
```python
|
|
# Clean up old debug files (v1.5.0 Phase 2)
|
|
from starpunk.media import cleanup_old_debug_files
|
|
cleanup_old_debug_files(app)
|
|
```
|
|
|
|
**Assessment**: Correct placement.
|
|
- After logging configuration (can log cleanup actions)
|
|
- Before database initialization (no DB dependency)
|
|
- Uses app instance directly (not current_app, which wouldn't exist yet)
|
|
|
|
### 5. Test Coverage
|
|
|
|
**15 tests in 5 test classes:**
|
|
|
|
| Class | Tests | Purpose |
|
|
|-------|-------|---------|
|
|
| `TestDebugFileConfiguration` | 4 | Verify config defaults and override |
|
|
| `TestDebugFileSaving` | 2 | Verify conditional saving behavior |
|
|
| `TestDebugFilenameSanitization` | 3 | Security: path traversal, special chars, truncation |
|
|
| `TestDebugFileCleanup` | 5 | Age/size limits, edge cases |
|
|
| `TestDebugFileStartupCleanup` | 1 | Integration: cleanup during app init |
|
|
|
|
**Assessment**: Comprehensive coverage of all requirements.
|
|
|
|
---
|
|
|
|
## Security Analysis
|
|
|
|
### Path Traversal Prevention
|
|
|
|
The sanitization pattern effectively prevents all known path traversal attacks:
|
|
|
|
| Attack Vector | Input | Sanitized Output |
|
|
|---------------|-------|------------------|
|
|
| Basic traversal | `../../../etc/passwd` | `...etcpasswd` |
|
|
| Windows traversal | `..\\..\\..\system32` | `...system32` |
|
|
| Encoded traversal | `%2e%2e%2f` | `2e2e2f` (%-encoded already decoded before reaching code) |
|
|
| Null byte | `file\x00.jpg` | `file.jpg` |
|
|
|
|
### Default-Secure Configuration
|
|
|
|
The feature is disabled by default (`DEBUG_SAVE_FAILED_UPLOADS=false`), meaning:
|
|
- Production deployments are protected without configuration
|
|
- Debug files only appear when explicitly enabled
|
|
- No accidental data retention in production
|
|
|
|
### Resource Exhaustion Prevention
|
|
|
|
Dual limits prevent disk exhaustion:
|
|
- Age limit (7 days default): Prevents indefinite accumulation
|
|
- Size limit (100MB default): Hard cap on disk usage
|
|
- Both limits configurable for different environments
|
|
|
|
---
|
|
|
|
## Performance Considerations
|
|
|
|
### Startup Overhead
|
|
|
|
The `cleanup_old_debug_files()` function runs on every app startup:
|
|
|
|
```python
|
|
# Worst case: 1000 files in debug directory
|
|
# - glob: O(n) directory listing
|
|
# - sort: O(n log n)
|
|
# - deletion: O(n) unlink calls
|
|
|
|
# Estimated overhead: ~100ms for 100 files, ~500ms for 1000 files
|
|
```
|
|
|
|
**Assessment**: Acceptable. Startup is not a hot path, and the cleanup prevents greater issues.
|
|
|
|
**Recommendation**: Consider adding early exit if `DEBUG_SAVE_FAILED_UPLOADS=False` to skip cleanup entirely in production. However, this is a minor optimization and not required for approval.
|
|
|
|
---
|
|
|
|
## Architectural Alignment
|
|
|
|
### Project Philosophy
|
|
|
|
> "Every line of code must justify its existence."
|
|
|
|
This implementation follows the philosophy:
|
|
- Minimal code (single sanitization pattern)
|
|
- Single responsibility (cleanup function does one thing)
|
|
- Secure defaults (disabled in production)
|
|
- No unnecessary features
|
|
|
|
### Standards Compliance
|
|
|
|
The implementation follows established patterns:
|
|
- Configuration uses existing `os.getenv()` pattern
|
|
- Logging uses `app.logger`
|
|
- File operations use `pathlib.Path`
|
|
- Tests use pytest fixtures
|
|
|
|
---
|
|
|
|
## Minor Observations (Not Blocking)
|
|
|
|
1. **Config validation**: The new config options are not validated in `validate_config()`. Since they have sensible defaults and are non-critical, this is acceptable. However, negative values for MAX_AGE_DAYS or MAX_SIZE_MB could cause unexpected behavior.
|
|
|
|
2. **Logging verbosity**: When cleanup deletes many files, it logs a single summary message. This is appropriate for production but could mask issues during debugging.
|
|
|
|
3. **Race condition**: If multiple workers start simultaneously, they could attempt to delete the same file. The OSError handling covers this gracefully.
|
|
|
|
---
|
|
|
|
## Verdict
|
|
|
|
### APPROVED
|
|
|
|
Phase 2 is complete and meets all acceptance criteria. The implementation demonstrates:
|
|
- Secure defaults
|
|
- Proper input sanitization
|
|
- Comprehensive test coverage
|
|
- Good error handling
|
|
- Alignment with project architecture
|
|
|
|
**The developer may proceed to Phase 3: N+1 Query Fix (Feed Generation).**
|
|
|
|
---
|
|
|
|
## Checklist for Project Plan Update
|
|
|
|
The following items from RELEASE.md Phase 2 are complete:
|
|
|
|
- [x] Configuration options added (DEBUG_SAVE_FAILED_UPLOADS, DEBUG_FILE_MAX_AGE_DAYS, DEBUG_FILE_MAX_SIZE_MB)
|
|
- [x] Cleanup logic implemented in starpunk/media.py
|
|
- [x] Config check before saving debug files
|
|
- [x] `cleanup_old_debug_files()` function implemented
|
|
- [x] Startup cleanup integrated in `__init__.py`
|
|
- [x] Filename sanitization pattern implemented
|
|
- [x] Tests cover all scenarios (15 tests)
|
|
|
|
---
|
|
|
|
## Appendix: Test Verification
|
|
|
|
```
|
|
$ uv run pytest tests/test_debug_file_management.py -v
|
|
================================ 15 passed in 0.72s ============================
|
|
|
|
$ uv run pytest tests/test_media*.py -v
|
|
================================ 33 passed in 6.16s ============================
|
|
```
|
|
|
|
All tests pass consistently.
|