Add comprehensive RSS scraper implementation with security and testing

- Modular architecture with separate modules for scraping, parsing, security, validation, and caching
- Comprehensive security measures including HTML sanitization, rate limiting, and input validation
- Robust error handling with custom exceptions and retry logic
- HTTP caching with ETags and Last-Modified headers for efficiency
- Pre-compiled regex patterns for improved performance
- Comprehensive test suite with 66 tests covering all major functionality
- Docker support for containerized deployment
- Configuration management with environment variable support
- Working parser that successfully extracts 32 articles from Warhammer Community
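
As an illustration of the HTTP caching bullet above, here is a minimal sketch of how ETag and Last-Modified validators can drive a conditional GET. The helper names `conditional_headers` and `is_cache_fresh` are hypothetical and not taken from this commit; the actual caching module may be structured differently.

```python
from typing import Dict, Mapping, Optional


def conditional_headers(cached: Optional[Mapping[str, str]]) -> Dict[str, str]:
    """Build conditional-GET request headers from a stored response's headers.

    `cached` is assumed to hold the ETag / Last-Modified values saved from
    the previous successful fetch (hypothetical cache layout).
    """
    headers: Dict[str, str] = {}
    if not cached:
        return headers  # nothing cached yet: do a plain, unconditional GET
    etag = cached.get("ETag")
    last_modified = cached.get("Last-Modified")
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def is_cache_fresh(status_code: int) -> bool:
    """A 304 Not Modified response means the cached body can be reused."""
    return status_code == 304
```

A caller would pass these headers with its next request and, when `is_cache_fresh` returns true for the response status, reuse the stored body instead of downloading and parsing the page again.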

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

Phil

2025-06-06 09:15:06 -06:00

parent e0647325ff

commit 25086fc01b

26 changed files with 15226 additions and 280 deletions

.gitignore (13 lines changed)

@@ -1,4 +1,15 @@
 *.xml
 .python-version
 output/
-output/*
+output/*
+cache/
+*.log
+__pycache__/
+*.pyc
+*.pyo
+.pytest_cache/
+.coverage
+htmlcov/
+.env
+.venv/
+venv/
