Add comprehensive RSS scraper implementation with security and testing

- Modular architecture with separate modules for scraping, parsing, security, validation, and caching
- Comprehensive security measures including HTML sanitization, rate limiting, and input validation
- Robust error handling with custom exceptions and retry logic
- HTTP caching with ETags and Last-Modified headers for efficiency
- Pre-compiled regex patterns for improved performance
- Comprehensive test suite with 66 tests covering all major functionality
- Docker support for containerized deployment
- Configuration management with environment variable support
- Working parser that successfully extracts 32 articles from Warhammer Community

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-06 09:15:06 -06:00
parent e0647325ff
commit 25086fc01b
26 changed files with 15226 additions and 280 deletions
--- a/5
+++ b/5
@@ -59,9 +59,10 @@ RUN useradd -m -u 1001 scraper && \
    chown -R scraper:scraper /app && \
    chmod 755 /app/output

-# Copy the Python script to the container
+# Copy the application code to the container
 COPY main.py .
-RUN chown scraper:scraper main.py
+COPY src/ src/
+RUN chown -R scraper:scraper main.py src/

 # Set environment variables
 ENV PYTHONUNBUFFERED=1 \