Add comprehensive RSS scraper implementation with security and testing
- Modular architecture with separate modules for scraping, parsing, security, validation, and caching
- Comprehensive security measures including HTML sanitization, rate limiting, and input validation
- Robust error handling with custom exceptions and retry logic
- HTTP caching with ETags and Last-Modified headers for efficiency
- Pre-compiled regex patterns for improved performance
- Comprehensive test suite with 66 tests covering all major functionality
- Docker support for containerized deployment
- Configuration management with environment variable support
- Working parser that successfully extracts 32 articles from Warhammer Community

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
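The caching bullet refers to HTTP conditional requests: the client stores the `ETag` and `Last-Modified` validators from a previous response and sends them back as `If-None-Match` and `If-Modified-Since`. If the feed has not changed, the server replies `304 Not Modified` with no body. The sketch below illustrates that pattern under stated assumptions. It is not the commit's code: `CacheEntry`, `conditional_headers`, and `resolve_response` are hypothetical names, the cache is a plain in-memory record, and no network call is made.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CacheEntry:
    """Hypothetical cached copy of one feed plus its HTTP validators."""
    body: bytes
    etag: Optional[str] = None
    last_modified: Optional[str] = None


def conditional_headers(entry: Optional[CacheEntry]) -> Dict[str, str]:
    """Build the validator headers for a conditional GET."""
    headers: Dict[str, str] = {}
    if entry is None:
        return headers  # nothing cached yet: plain unconditional request
    if entry.etag:
        headers["If-None-Match"] = entry.etag
    if entry.last_modified:
        headers["If-Modified-Since"] = entry.last_modified
    return headers


def resolve_response(
    entry: Optional[CacheEntry],
    status: int,
    body: bytes,
    response_headers: Dict[str, str],
) -> CacheEntry:
    """Decide which cache entry to use after a response.

    Assumes exact header-name casing; real clients (e.g. requests)
    expose case-insensitive header mappings instead of a plain dict.
    """
    if status == 304:
        if entry is None:
            raise ValueError("304 Not Modified received without a cached copy")
        # Keep the cached body, but refresh validators if the server sent new ones.
        return CacheEntry(
            body=entry.body,
            etag=response_headers.get("ETag", entry.etag),
            last_modified=response_headers.get("Last-Modified", entry.last_modified),
        )
    if status != 200:
        raise ValueError(f"unexpected HTTP status {status}")
    # Fresh content: store the new body and whatever validators came with it.
    return CacheEntry(
        body=body,
        etag=response_headers.get("ETag"),
        last_modified=response_headers.get("Last-Modified"),
    )
```

On the first fetch `conditional_headers(None)` returns an empty dict. After a `200` that carries an `ETag`, the next request sends `If-None-Match`, and a `304` reply reuses the stored body without re-downloading or re-parsing the feed.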
@@ -59,9 +59,10 @@ RUN useradd -m -u 1001 scraper && \
 chown -R scraper:scraper /app && \
 chmod 755 /app/output
 
-# Copy the Python script to the container
+# Copy the application code to the container
 COPY main.py .
-RUN chown scraper:scraper main.py
+COPY src/ src/
+RUN chown -R scraper:scraper main.py src/
 
 # Set environment variables
 ENV PYTHONUNBUFFERED=1 \