4 Commits

Author SHA1 Message Date
25086fc01b Add comprehensive RSS scraper implementation with security and testing
- Modular architecture with separate modules for scraping, parsing, security, validation, and caching
- Comprehensive security measures including HTML sanitization, rate limiting, and input validation
- Robust error handling with custom exceptions and retry logic
- HTTP caching with ETags and Last-Modified headers for efficiency
- Pre-compiled regex patterns for improved performance
- Comprehensive test suite with 66 tests covering all major functionality
- Docker support for containerized deployment
- Configuration management with environment variable support
- Working parser that successfully extracts 32 articles from Warhammer Community

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-06 09:15:06 -06:00
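
The "HTTP caching with ETags and Last-Modified headers" item in the commit above refers to conditional GET requests. The repository's own code is not shown in this log, so the following is only a minimal sketch of the general technique. It assumes Python, and `CacheEntry`, `conditional_headers`, and `update_cache` are hypothetical names, not names taken from the commit.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class CacheEntry:
    """Body and validators saved from an earlier 200 response (hypothetical type)."""
    body: bytes
    etag: Optional[str] = None
    last_modified: Optional[str] = None


def conditional_headers(entry: Optional[CacheEntry]) -> Dict[str, str]:
    """Build request headers that allow the server to reply 304 Not Modified."""
    headers: Dict[str, str] = {}
    if entry is None:
        return headers
    if entry.etag:
        headers["If-None-Match"] = entry.etag
    if entry.last_modified:
        headers["If-Modified-Since"] = entry.last_modified
    return headers


def update_cache(
    entry: Optional[CacheEntry],
    status: int,
    response_headers: Dict[str, str],
    body: bytes,
) -> Tuple[CacheEntry, bool]:
    """Return (cache entry to keep, whether the content changed)."""
    if status == 304:
        # The server confirmed the cached copy is still current.
        if entry is None:
            raise ValueError("304 received without a cached entry")
        return entry, False
    if status == 200:
        # Header names are case-insensitive, so normalise before lookup.
        lowered = {k.lower(): v for k, v in response_headers.items()}
        fresh = CacheEntry(
            body=body,
            etag=lowered.get("etag"),
            last_modified=lowered.get("last-modified"),
        )
        return fresh, True
    raise ValueError(f"unexpected status {status}")
```

A caller would send `conditional_headers(entry)` with each request and pass the response status, headers, and body to `update_cache`; on a 304 the stored body is reused and nothing needs to be re-parsed.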
b9b3ece3cb Add comprehensive security improvements
- URL validation with domain whitelist
- Path validation to prevent directory traversal
- Resource limits (content size, scroll iterations)
- Content filtering and sanitization
- Non-root Docker execution with gosu
- Configurable output directory via CLI/env vars
- Fixed Docker volume permission issues

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-05 18:19:23 -06:00
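
The first two items of the commit above, URL validation against a domain whitelist and path validation against directory traversal, can be illustrated with the standard library alone. This is a sketch under assumptions, not the code from the commit: it assumes Python 3.9+, the allowlist contents are placeholders, and `is_allowed_url` and `resolve_output_path` are hypothetical names.

```python
from pathlib import Path
from typing import FrozenSet
from urllib.parse import urlparse

# Placeholder allowlist; the domains the project actually permits are not shown in this log.
ALLOWED_DOMAINS: FrozenSet[str] = frozenset({"example.com", "www.example.com"})


def is_allowed_url(url: str, allowed: FrozenSet[str] = ALLOWED_DOMAINS) -> bool:
    """Accept only http(s) URLs whose host exactly matches an allowlisted domain."""
    try:
        parsed = urlparse(url)
    except ValueError:
        # urlparse can raise on malformed input such as a broken IPv6 literal.
        return False
    if parsed.scheme not in ("http", "https"):
        return False
    host = (parsed.hostname or "").lower()
    # Exact match, so "example.com.evil.test" does not pass as "example.com".
    return host in allowed


def resolve_output_path(base_dir: str, filename: str) -> Path:
    """Resolve filename under base_dir and reject anything that escapes it."""
    base = Path(base_dir).resolve()
    candidate = (base / filename).resolve()
    # Path.is_relative_to requires Python 3.9 or newer.
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes output directory: {filename!r}")
    return candidate
```

Comparing the resolved candidate against the resolved base catches both `../` sequences and absolute filenames, since either one resolves to a location outside the base directory.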
eecee074e2 added Dockerfile for container build
2024-10-08 13:55:13 -06:00
3d44106f02 initial commit
2024-10-03 12:34:27 -06:00