feat(tests): Phase 0 - Fix flaky and broken tests

Implements Phase 0 of v1.5.0 per ADR-012 and RELEASE.md.

Changes:
- Remove 5 broken multiprocessing tests (TestConcurrentExecution, TestPerformance)
- Fix brittle XML assertion tests (check semantics not quote style)
- Fix test_debug_level_for_early_retries logger configuration
- Rename test_feed_route_streaming to test_feed_route_caching (correct name)

Results:
- Test count: 879 → 874 (5 removed as planned)
- All tests pass consistently (verified across 3 runs)
- No flakiness detected

References:
- ADR-012: Flaky Test Removal and Test Quality Standards
- docs/projectplan/v1.5.0/RELEASE.md Phase 0

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-17 09:24:12 -07:00
parent 0acefa4670
commit 92e7bdd342
4 changed files with 140 additions and 154 deletions
--- a/docs/design/v1.5.0/2025-12-16-phase-0-implementation-report.md
+++ b/docs/design/v1.5.0/2025-12-16-phase-0-implementation-report.md
@@ -0,0 +1,120 @@
+# Phase 0 Implementation Report - Test Fixes
+
+**Date**: 2025-12-16
+**Developer**: Developer Agent
+**Phase**: v1.5.0 Phase 0 - Test Fixes
+**Status**: Complete
+
+## Overview
+
+Phase 0 of v1.5.0 is implemented as specified in ADR-012 (Flaky Test Removal) and the v1.5.0 RELEASE.md. All test fixes were completed and verified across 3 test runs, with no flakiness detected.
+
+## Changes Made
+
+### 1. Removed 5 Broken Multiprocessing Tests
+
+**File**: `tests/test_migration_race_condition.py`
+
+Removed the following tests that fundamentally cannot work due to Python multiprocessing limitations:
+
+- `test_concurrent_workers_barrier_sync` - Cannot pickle Barrier objects for Pool.map()
+- `test_sequential_worker_startup` - Missing Flask app context across processes
+- `test_worker_late_arrival` - Missing Flask app context across processes
+- `test_single_worker_performance` - Cannot pickle local functions
+- `test_concurrent_workers_performance` - Cannot pickle local functions
+
+**Action Taken**:
+- Removed entire `TestConcurrentExecution` class (3 tests)
+- Removed entire `TestPerformance` class (2 tests)
+- Removed unused module-level worker functions (`_barrier_worker`, `_simple_worker`)
+- Removed unused imports (`time`, `multiprocessing`, `Barrier`)
+- Added explanatory comments documenting why tests were removed
+
+**Justification**: Per ADR-012, these tests have architectural issues that make them unreliable. The migration retry logic they attempt to test is proven to work in production with multi-worker Gunicorn deployments. The tests are the problem, not the code.
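The "cannot pickle local functions" limitation cited for the removed tests can be reproduced in isolation. The following is a minimal sketch, not code from the repository: `make_local_worker` and `can_pickle` are hypothetical names, and the sketch only relies on the fact that `Pool.map()` must pickle the callable it sends to worker processes.

```python
import pickle


def make_local_worker():
    """Return a function defined inside another function (hypothetical example)."""
    def worker(n):
        return n * 2
    return worker


def can_pickle(obj):
    """Report whether pickle can serialize obj, which Pool.map() requires of its callable."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# A local function has no importable name, so pickling it fails.
local_ok = can_pickle(make_local_worker())

# A function reachable by an importable name (here the builtin abs) pickles fine.
builtin_ok = can_pickle(abs)
```

Moving a worker to module level avoids this particular failure, but it would not address the missing Flask app context that the report lists for the other removed tests.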
+
+### 2. Fixed Brittle Feed XML Assertions
+
+**File**: `tests/test_routes_feeds.py`
+
+Fixed assertions that were checking implementation details (quote style) rather than semantics (valid XML):
+
+**Changes**:
+- `test_feed_atom_endpoint`: Changed from checking `<?xml version="1.0"` to checking `<?xml version=` and `encoding=` separately
+- `test_feed_json_endpoint`: Changed Content-Type assertion from exact match to `.startswith('application/feed+json')` to not require charset specification
+- `test_accept_json_feed`: Same Content-Type fix as above
+- `test_accept_json_generic`: Same Content-Type fix as above
+- `test_quality_factor_json_wins`: Same Content-Type fix as above
+
+**Rationale**: XML generators may emit either single or double quotes in the declaration. Tests should verify semantics (valid XML with the correct encoding), not formatting details. Similarly, the Content-Type header may or may not include a charset parameter, depending on the framework version.
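The difference between asserting on quote style and asserting on semantics can be illustrated as follows. This is a sketch under assumptions: the helper names `assert_atom_feed` and `is_json_feed` and the sample documents are hypothetical and do not come from `tests/test_routes_feeds.py`.

```python
import xml.etree.ElementTree as ET

# The same declaration written with each quote style; both are valid XML.
DOUBLE_QUOTED = '<?xml version="1.0" encoding="utf-8"?>\n<feed xmlns="http://www.w3.org/2005/Atom"></feed>'
SINGLE_QUOTED = "<?xml version='1.0' encoding='utf-8'?>\n<feed xmlns='http://www.w3.org/2005/Atom'></feed>"


def assert_atom_feed(body: str) -> None:
    """Check the declaration without depending on which quote character is used."""
    declaration = body.split("?>", 1)[0]
    assert body.startswith("<?xml version=")
    assert "encoding=" in declaration
    # Parsing confirms the document is well-formed, not merely string-similar.
    root = ET.fromstring(body.encode("utf-8"))
    assert root.tag.endswith("feed")


def is_json_feed(content_type: str) -> bool:
    """Accept the media type with or without a trailing charset parameter."""
    return content_type.startswith("application/feed+json")


assert_atom_feed(DOUBLE_QUOTED)
assert_atom_feed(SINGLE_QUOTED)
```

An exact-match assertion on `<?xml version="1.0"` would reject `SINGLE_QUOTED` even though both documents parse to the same element tree.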
+
+### 3. Fixed test_debug_level_for_early_retries
+
+**File**: `tests/test_migration_race_condition.py`
+
+**Issue**: The logger was not configured to capture DEBUG-level messages.
+
+**Fix**: Simplified logger configuration by:
+- Removing unnecessary `caplog.clear()` calls
+- Ensuring `caplog.at_level(logging.DEBUG, logger='starpunk.migrations')` wraps the actual test execution
+- Removing redundant clearing inside the context manager
+
+**Result**: Test now reliably captures DEBUG level log messages from the migrations module.
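The shape of the fix, with `caplog.at_level(...)` wrapping the code under test, might look like the sketch below. Only the logger name `starpunk.migrations` and the test name come from the report; `attempt_with_retries` is a hypothetical stand-in for the migration retry logic, and the real test body is not shown in this commit view.

```python
import logging

# Logger name as given in the report.
logger = logging.getLogger("starpunk.migrations")


def attempt_with_retries(max_retries=3):
    """Hypothetical stand-in for the retry loop: logs each early retry at DEBUG."""
    for retry_count in range(max_retries):
        logger.debug("Migration retry %d of %d", retry_count + 1, max_retries)


def test_debug_level_for_early_retries(caplog):
    # caplog is pytest's built-in fixture. at_level() lowers the named
    # logger's threshold only for the duration of the with-block, so the
    # code under test must run inside it.
    with caplog.at_level(logging.DEBUG, logger="starpunk.migrations"):
        attempt_with_retries()

    debug_records = [r for r in caplog.records if r.levelno == logging.DEBUG]
    assert len(debug_records) == 3
```

If the call were made outside the `with` block, the logger would fall back to its configured level and the DEBUG records might never reach `caplog`.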
+
+### 4. Verified test_new_connection_per_retry
+
+**File**: `tests/test_migration_race_condition.py`
+
+**Finding**: This test is actually working correctly. It expects 10 connection attempts (retry_count 0-9) which matches the implementation (`while retry_count < max_retries` where `max_retries = 10`).
+
+**Action**: No changes needed. Test runs successfully and correctly verifies that a new connection is created for each retry attempt.
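The count of 10 follows directly from the loop condition quoted above. As a rough sketch, assuming a retry loop that opens a fresh connection on every pass (`run_with_retries`, the SQL string, and the `RuntimeError` used to simulate failure are all hypothetical, not the project's implementation):

```python
from unittest.mock import Mock


def run_with_retries(connect, max_retries=10):
    """Hypothetical retry loop with the same condition the report quotes."""
    retry_count = 0
    while retry_count < max_retries:
        conn = connect()  # a new connection for every attempt
        try:
            conn.execute("BEGIN IMMEDIATE")  # placeholder statement
            return True
        except RuntimeError:
            retry_count += 1
        finally:
            conn.close()
    return False


# Every attempt fails, so the loop runs for retry_count 0 through 9.
connect = Mock()
connect.return_value.execute.side_effect = RuntimeError("simulated lock")
succeeded = run_with_retries(connect)
attempts = connect.call_count
```

With every attempt failing, `connect` is called once per value of `retry_count` from 0 through 9, which is the 10 attempts the test expects.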
+
+### 5. Renamed Misleading Test
+
+**File**: `tests/test_routes_feed.py`
+
+**Change**: Renamed `test_feed_route_streaming` to `test_feed_route_caching`
+
+**Rationale**: The test name said "streaming" but the implementation actually uses caching (Phase 3 feed optimization). The test correctly verifies ETag presence, which is a caching feature. The name was misleading but the test logic was correct.
+
+## Test Results
+
+Ran full test suite 3 times to verify no flakiness:
+
+**Run 1**: 874 passed, 1 warning in 375.92s
+**Run 2**: 874 passed, 1 warning in 386.40s
+**Run 3**: 874 passed, 1 warning in 375.68s
+
+**Test Count**: Reduced from 879 to 874 (5 tests removed as planned)
+**Flakiness**: None detected across 3 runs
+**Warnings**: 1 expected `DecompressionBombWarning` (raised by an intentional test of large image handling)
+
+## Acceptance Criteria
+
+| Criterion | Status | Evidence |
+|-----------|--------|----------|
+| All remaining tests pass consistently | ✓ | 3 successful test runs |
+| 5 broken tests removed | ✓ | Test count: 879 → 874 |
+| No new test skips added | ✓ | No `@pytest.mark.skip` added |
+| Test count reduced to 874 | ✓ | Verified in all 3 runs |
+
+## Files Modified
+
+- `/home/phil/Projects/starpunk/tests/test_migration_race_condition.py`
+- `/home/phil/Projects/starpunk/tests/test_routes_feeds.py`
+- `/home/phil/Projects/starpunk/tests/test_routes_feed.py`
+
+## Next Steps
+
+Phase 0 is complete and ready for architect review. Once approved:
+- Commit changes with reference to ADR-012
+- Proceed to Phase 1 (Timestamp-Based Slugs)
+
+## Notes
+
+The test suite is now more reliable and maintainable:
+- Removed tests that cannot work reliably due to Python limitations
+- Fixed tests that checked implementation details instead of behavior
+- Improved test isolation and logger configuration
+- Clearer test names that reflect actual behavior being tested
+
+All changes align with the project philosophy: "Every line of code must justify its existence." Tests that fail unreliably do not justify their existence.