2 Commits

Author SHA1 Message Date
25086fc01b Add comprehensive RSS scraper implementation with security and testing
- Modular architecture with separate modules for scraping, parsing, security, validation, and caching
- Comprehensive security measures including HTML sanitization, rate limiting, and input validation
- Robust error handling with custom exceptions and retry logic
- HTTP caching with ETags and Last-Modified headers for efficiency
- Pre-compiled regex patterns for improved performance
- Comprehensive test suite with 66 tests covering all major functionality
- Docker support for containerized deployment
- Configuration management with environment variable support
- Working parser that successfully extracts 32 articles from Warhammer Community

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-06 09:15:06 -06:00
b9b3ece3cb Add comprehensive security improvements
- URL validation with domain whitelist
- Path validation to prevent directory traversal
- Resource limits (content size, scroll iterations)
- Content filtering and sanitization
- Non-root Docker execution with gosu
- Configurable output directory via CLI/env vars
- Fixed Docker volume permission issues

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-05 18:19:23 -06:00