
ADR-012: Flaky Test Removal and Test Quality Standards

Status

Proposed

Context

The test suite contains several categories of flaky tests that pass inconsistently. These tests consume developer time without providing proportional value. Per the project philosophy ("Every line of code must justify its existence"), we must evaluate whether these tests should be kept, fixed, or removed.

Analysis by Test Category

1. Migration Race Condition Tests (test_migration_race_condition.py)

Failing Tests:

  • test_debug_level_for_early_retries - Log message matching
  • test_new_connection_per_retry - Connection count assertions
  • test_concurrent_workers_barrier_sync - Multiprocessing pickle errors
  • test_sequential_worker_startup - Missing table errors
  • test_worker_late_arrival - Missing table errors
  • test_single_worker_performance - Missing table errors
  • test_concurrent_workers_performance - Pickle errors

Value Analysis:

  • The migration retry logic with exponential backoff is critical for production deployments with multiple Gunicorn workers
  • However, the flaky tests are testing implementation details (log levels, exact connection counts) rather than behavior
  • The multiprocessing tests fundamentally cannot work reliably because:
    1. multiprocessing.Manager().Barrier() objects cannot be pickled for Pool.map()
    2. The worker functions require Flask app context that doesn't transfer across processes
    3. SQLite database files in temp directories may not be accessible across process boundaries
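
To make the "behavior, not implementation" distinction concrete, a retry test can simulate a transiently locked database and assert only the observable outcome. The following is a minimal sketch, not existing code: the module path starpunk.database, the run_migrations entry point, and the app fixture are assumptions about the codebase.

import sqlite3
from unittest import mock

from starpunk.database import run_migrations  # hypothetical import


def test_migration_retries_until_lock_clears(app):
    """Behavior-level check: a transiently locked database must not abort migrations."""
    real_connect = sqlite3.connect
    attempts = {"count": 0}

    def flaky_connect(*args, **kwargs):
        # Fail the first two attempts the way a busy SQLite file would,
        # then hand back a real connection so the migration can finish.
        attempts["count"] += 1
        if attempts["count"] <= 2:
            raise sqlite3.OperationalError("database is locked")
        return real_connect(*args, **kwargs)

    with mock.patch("starpunk.database.sqlite3.connect", side_effect=flaky_connect):
        with app.app_context():
            run_migrations()  # should succeed after retries, without raising

    # It retried past the failures; no assertions on log levels or exact attempt counts.
    assert attempts["count"] >= 3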

Root Cause: The test design is flawed. These are integration/stress tests being attempted with unit-test infrastructure.

Recommendation: REMOVE the multiprocessing tests entirely. KEEP and FIX the unit tests.

Specifically:

  • REMOVE: TestConcurrentExecution class (all 3 tests) - fundamentally broken by design
  • REMOVE: TestPerformance class (both tests) - same multiprocessing issues
  • KEEP: TestRetryLogic - valuable, just needs mock fixes
  • KEEP: TestGraduatedLogging - valuable, needs logger configuration fixes
  • KEEP: TestConnectionManagement - valuable, needs assertion fixes
  • KEEP: TestErrorHandling - valuable, tests critical rollback behavior
  • KEEP: TestBeginImmediateTransaction - valuable, tests locking mechanism

Rationale for removal: Testing concurrent migration behavior properly would require:

  1. A proper integration test framework (not pytest unit tests)
  2. External process spawning (not multiprocessing.Pool)
  3. Real filesystem isolation

That work is out of scope for V1 - the code works; the tests are the problem.

2. Feed Route Tests (test_routes_feeds.py)

Failing Assertions:

  • Tests assert the exact string <?xml version="1.0", but the code produces <?xml version='1.0' (single quotes)
  • Tests assert an exact Content-Type value including charset, but the response header may vary
  • Tests assert the exact string <rss version="2.0"

Value Analysis:

  • These tests ARE valuable - they verify feed output format
  • The tests are NOT flaky per se; they are brittle due to over-specific assertions

Root Cause: Tests are asserting implementation details (quote style) rather than semantics (valid XML).

Recommendation: FIX by loosening assertions

Current (brittle):

assert b'<?xml version="1.0"' in response.data

Better (semantic):

assert b'<?xml version=' in response.data
assert b"encoding=" in response.data

The test file already has SOME tests using the correct pattern (lines 72, 103, 265). The ATOM test on line 84 is the outlier - it should match the RSS tests.
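
If string checks keep proving brittle, an even more robust option is to parse the response and assert on structure rather than bytes. A sketch using only the standard library; the client fixture and the /feed.xml route path are assumptions:

import xml.etree.ElementTree as ET


def test_rss_feed_is_well_formed(client):
    response = client.get("/feed.xml")  # route path is an assumption
    # Parsing fails loudly on malformed XML, regardless of quote style in the declaration
    root = ET.fromstring(response.data)
    assert root.tag == "rss"
    assert root.get("version") == "2.0"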


3. Feed Streaming Test (test_routes_feed.py)

Failing Test: test_feed_route_streaming

Current assertion (line 124):

assert "ETag" in response.headers

But the test comment says:

# Cached responses include ETags for conditional requests
# (Phase 3 caching was added, replacing streaming for better performance)

Value Analysis:

  • The test name says "streaming" but the implementation now uses caching
  • The test is actually correct (ETag SHOULD be present)
  • If ETag is NOT present, that's a bug in the feed caching implementation

Root Cause: This is not a flaky test - if it fails, there's an actual bug. The test name is misleading but the assertion is correct.

Recommendation: KEEP and RENAME to test_feed_route_caching
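
After the rename, the same test can also exercise the conditional-request side of caching. A sketch, assuming a standard Flask test client fixture, a /feed.xml route, and that the caching layer honors If-None-Match:

def test_feed_route_caching(client):
    # First request: the cached response should carry a validator
    first = client.get("/feed.xml")
    assert first.status_code == 200
    assert "ETag" in first.headers

    # Second request with the validator: a conditional-aware cache answers 304 Not Modified
    second = client.get("/feed.xml", headers={"If-None-Match": first.headers["ETag"]})
    assert second.status_code == 304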


4. Search Security Tests (test_search_security.py)

Analysis of the file: After review, no obviously flaky tests were found. The tests are:

  • Testing XSS prevention (correct)
  • Testing SQL injection prevention (correct)
  • Testing input validation (correct)
  • Testing pagination limits (correct)

Possible flakiness sources:

  • FTS5 special character handling varies by SQLite version
  • Tests that accept multiple status codes (200, 400, 500) are defensive, not flaky

Recommendation: KEEP all tests

If specific flakiness is identified, it is most likely due to SQLite FTS5 behavior differing between versions; that should be documented (for example, with a capability probe like the sketch below) rather than addressed by removing the tests.
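
A small probe makes the environment dependency visible when a failure does occur. This is a diagnostic sketch, not existing project code; where it runs (e.g. a conftest.py hook) is left open:

import sqlite3


def sqlite_fts5_available() -> bool:
    """Report whether the interpreter's SQLite build ships the FTS5 extension."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE VIRTUAL TABLE fts_probe USING fts5(content)")
        return True
    except sqlite3.OperationalError:
        return False
    finally:
        conn.close()


# Example: record the build details once so failures can be correlated with them
print(f"SQLite {sqlite3.sqlite_version}, FTS5 available: {sqlite_fts5_available()}")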


Decision

Remove Entirely

  1. TestConcurrentExecution class from test_migration_race_condition.py
  2. TestPerformance class from test_migration_race_condition.py

Fix Tests (Developer Action Items)

  1. test_routes_feeds.py line 84: Change assert b'<?xml version="1.0"' to assert b'<?xml version='
  2. test_routes_feed.py line 117: Rename test from test_feed_route_streaming to test_feed_route_caching
  3. test_migration_race_condition.py: Fix logger configuration in TestGraduatedLogging tests to ensure DEBUG level is captured
  4. test_migration_race_condition.py: Fix mock setup in test_new_connection_per_retry to accurately count connection attempts
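
For item 3, the usual cause of missing DEBUG messages is that the logger's effective level filters them out before pytest's capture handler ever sees them; caplog.set_level raises the level for the duration of the test. A sketch, with the logger name, run_migrations entry point, and app fixture as assumptions:

import logging

from starpunk.database import run_migrations  # hypothetical import


def test_debug_level_for_early_retries(caplog, app):
    # Capture DEBUG records for the migration logger regardless of global config
    caplog.set_level(logging.DEBUG, logger="starpunk.database")

    with app.app_context():
        # ... setup that forces a couple of retries is elided here ...
        run_migrations()

    assert any(r.levelno == logging.DEBUG for r in caplog.records)

For item 4, asserting on the patched connection factory's call_count, rather than inferring attempts from log output, keeps the count accurate without tying the test to logging details.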

Keep As-Is

  1. All tests in test_search_security.py
  2. All non-multiprocessing tests in test_migration_race_condition.py (after fixes)
  3. All other tests in test_routes_feeds.py and test_routes_feed.py

Rationale

  1. Project philosophy alignment: Tests that cannot reliably pass do not justify their existence. They waste developer time and erode confidence in the test suite.

  2. Pragmatic approach: The migration concurrency code is exercised in production every time the application runs with multiple Gunicorn workers. Manual testing during deployment is more reliable than broken multiprocessing tests.

  3. Test semantics over implementation: Tests should verify behavior, not implementation details like quote styles in XML.

  4. Maintainability: A smaller, reliable test suite is better than a larger, flaky one.

Consequences

Positive

  • Faster, more reliable CI/CD pipeline
  • Increased developer confidence in test results
  • Reduced time spent debugging test infrastructure
  • Tests that fail actually indicate bugs

Negative

  • Reduced test coverage for concurrent migration scenarios
  • Manual testing required for multi-worker deployments

Mitigations

  • Document the multi-worker testing procedure in deployment docs
  • Consider adding integration tests in a separate test category (not run in CI) for concurrent scenarios
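
One lightweight way to carve out that separate category is a pytest marker that the CI invocation excludes. A sketch; the marker name is a suggestion and would also need to be registered in the pytest configuration:

import pytest


# Concurrency tests live in their own module and carry a marker so the default
# CI run can skip them:  pytest -m "not integration"
@pytest.mark.integration
def test_concurrent_migrations_with_real_worker_processes(tmp_path):
    ...  # spawn real worker processes against tmp_path / "starpunk.db"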

Alternatives Considered

1. Fix the multiprocessing tests

Rejected: Would require significant refactoring to use subprocess spawning instead of multiprocessing.Pool. The complexity is not justified for V1 given the code works correctly in production.
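
For the record, the shape of that refactor would be roughly the following, which shows the extra moving parts it introduces. The helper script path is hypothetical:

import subprocess
import sys


def test_concurrent_migrations_via_subprocess(tmp_path):
    db_path = tmp_path / "starpunk.db"
    # Each worker is a separate interpreter running a helper script that builds
    # its own Flask app and runs migrations: no pickling, no shared app context,
    # but the test now depends on a script kept in sync with the application.
    workers = [
        subprocess.Popen(
            [sys.executable, "tests/helpers/migration_worker.py", str(db_path)]
        )
        for _ in range(4)
    ]
    assert all(proc.wait(timeout=30) == 0 for proc in workers)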

2. Mark tests as @pytest.mark.skip

Rejected: Skipped tests are just noise. They either work and should run, or they don't work and should be removed. "Skip" is procrastination.

3. Use pytest-xdist for parallel testing

Rejected: Does not solve the fundamental issue of needing to spawn external processes with proper app context.

4. Move to integration test framework (e.g., testcontainers)

Considered for future: This is the correct long-term solution but is out of scope for V1. Should be considered for V2 if concurrent migration testing is deemed critical.