Add RSS scraper implementation with security hardening and tests

- Modular architecture: separate modules for scraping, parsing, security, validation, and caching
- Security measures: HTML sanitization, rate limiting, and input validation
- Error handling with custom exceptions and retry logic
- HTTP caching using ETag and Last-Modified headers to avoid re-downloading unchanged pages
- Pre-compiled regex patterns for improved performance
- Test suite of 66 tests covering all major functionality
- Docker support for containerized deployment
- Configuration management with environment variable support
- Working parser that extracts 32 articles from Warhammer Community
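
As a rough illustration of the HTTP caching item above, the following sketch shows how conditional requests with ETag and Last-Modified validators generally work. It is not the code from this commit: the choice of Python is an assumption, and `CacheEntry`, `conditional_headers`, and `resolve_response` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class CacheEntry:
    """Body and validators saved from an earlier 200 response (hypothetical structure)."""
    body: bytes
    etag: Optional[str] = None
    last_modified: Optional[str] = None


def conditional_headers(entry: Optional[CacheEntry]) -> dict:
    """Build the request headers that ask the server to skip an unchanged body."""
    headers = {}
    if entry is None:
        return headers
    if entry.etag:
        headers["If-None-Match"] = entry.etag
    if entry.last_modified:
        headers["If-Modified-Since"] = entry.last_modified
    return headers


def resolve_response(
    status: int,
    headers: Mapping[str, str],
    body: bytes,
    entry: Optional[CacheEntry],
) -> CacheEntry:
    """Pick the entry to use after a response: reuse on 304, replace on 200.

    Assumes `headers` uses the exact names "ETag" and "Last-Modified";
    a real HTTP client usually exposes a case-insensitive mapping.
    """
    if status == 304 and entry is not None:
        return entry  # server confirmed the cached copy is still current
    if status == 200:
        return CacheEntry(
            body=body,
            etag=headers.get("ETag"),
            last_modified=headers.get("Last-Modified"),
        )
    raise ValueError(f"unexpected HTTP status {status}")
```

A caller would pass `conditional_headers(entry)` to whatever HTTP client it uses, then feed the status, response headers, and body into `resolve_response` to decide whether the cached page can be reused.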

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-06 09:15:06 -06:00
parent e0647325ff
commit 25086fc01b
26 changed files with 15226 additions and 280 deletions

@@ -9,7 +9,7 @@
<link data-precedence="next" href="/_next/static/css/99b9ab492a345c78.css?dpl=dpl_A9R9uLRvxZ9oTaZvjxhFqWghD9Jj" rel="stylesheet"/>
<link data-precedence="next" href="/_next/static/css/15a460f12b20e8b4.css?dpl=dpl_A9R9uLRvxZ9oTaZvjxhFqWghD9Jj" rel="stylesheet"/>
<link as="script" fetchpriority="low" href="/_next/static/chunks/webpack-d5efc487f7f67811.js?dpl=dpl_A9R9uLRvxZ9oTaZvjxhFqWghD9Jj" rel="preload"/>
<script async="" src="https://www.googletagmanager.com/gtag/destination?id=G-HM9PKBFCFY&amp;cx=c&amp;gtm=45He5640v72768202za200&amp;tag_exp=101509157~103116026~103200004~103233427~103351869~103351871~104611962~104611964~104661466~104661468" type="text/javascript">
<script async="" src="https://www.googletagmanager.com/gtag/destination?id=G-HM9PKBFCFY&amp;cx=c&amp;gtm=45He5641v72768202za200&amp;tag_exp=101509157~103116026~103200004~103233427~103351869~103351871~104653070~104653072~104661466~104661468~104684204~104684207~104698127~104698129" type="text/javascript">
</script>
<script async="" src="/_next/static/chunks/fd9d1056-64913cbc7addc806.js?dpl=dpl_A9R9uLRvxZ9oTaZvjxhFqWghD9Jj">
</script>
@@ -19,18 +19,10 @@
</script>
<script async="" src="/_next/static/chunks/c15bf2b0-c5f2ab0c4ce668d5.js?dpl=dpl_A9R9uLRvxZ9oTaZvjxhFqWghD9Jj">
</script>
<script async="" src="/_next/static/chunks/649-9158635ee991b097.js?dpl=dpl_A9R9uLRvxZ9oTaZvjxhFqWghD9Jj">
</script>
<script async="" src="/_next/static/chunks/625-9352517478916afd.js?dpl=dpl_A9R9uLRvxZ9oTaZvjxhFqWghD9Jj">
</script>
<script async="" src="/_next/static/chunks/278-0091e2641e681026.js?dpl=dpl_A9R9uLRvxZ9oTaZvjxhFqWghD9Jj">
</script>
<script async="" src="/_next/static/chunks/359-66334f7447a76641.js?dpl=dpl_A9R9uLRvxZ9oTaZvjxhFqWghD9Jj">
</script>
<script async="" src="/_next/static/chunks/196-b74dfc870faa7d5b.js?dpl=dpl_A9R9uLRvxZ9oTaZvjxhFqWghD9Jj">
</script>
<script async="" src="/_next/static/chunks/727-0e04a61311082930.js?dpl=dpl_A9R9uLRvxZ9oTaZvjxhFqWghD9Jj">
</script>
<link as="script" href="https://cookie-cdn.cookiepro.com/consent/49151afa-644f-4c33-be99-a15807d544d0/OtAutoBlock.js" rel="preload"/>
<link as="script" href="https://cookie-cdn.cookiepro.com/scripttemplates/otSDKStub.js" rel="preload"/>
<link as="script" href="https://www.googletagmanager.com/gtm.js?id=GTM-TZ8HGH" rel="preload"/>
@@ -1716,7 +1708,7 @@ background-color: inherit;
<div class="row mb-10 xl:mb-15">
<div class="column relative flex items-center">
<div class="swiper swiper-initialized swiper-horizontal md-max:!overflow-visible">
<div aria-live="polite" class="swiper-wrapper" id="swiper-wrapper-b0b6610f82f4be76c" style="transform: translate3d(0px, 0px, 0px);">
<div aria-live="polite" class="swiper-wrapper" id="swiper-wrapper-498655a58810be47c" style="transform: translate3d(0px, 0px, 0px);">
<div aria-label="1 / 14" class="swiper-slide swiper-slide-active !w-[107px] md:!w-115 xl:!w-120 !h-auto !flex items-center" role="group">
<a class="group p-[6px] focus-visible:border outline-none rounded-[5px]" href="/en-gb/setting/warhammer-40000/" title="Warhammer 40,000">
<img alt="" class="object-contain aspect-[202/81] default-transition group-hover:scale-[0.9]" data-nimg="1" decoding="async" height="324" loading="lazy" src="https://assets.warhammer-community.com/warhammer40000.png" style="color:transparent;object-position:50% 50%" width="808"/>
@@ -3577,7 +3569,7 @@ background-color: inherit;
</div>
<div class="row mt-15">
<div class="swiper swiper-initialized swiper-horizontal !overflow-visible">
<div aria-live="polite" class="swiper-wrapper" id="swiper-wrapper-7958f24f101fa5d5e" style="transform: translate3d(0px, 0px, 0px);">
<div aria-live="polite" class="swiper-wrapper" id="swiper-wrapper-0f8d52c31f9266b6" style="transform: translate3d(0px, 0px, 0px);">
<div aria-label="1 / 15" class="swiper-slide swiper-slide-active column !w-5/12 md:!w-4/12 xl:!w-3/12 copy-bitter font-[600] xl:heading-bitter-sm" role="group">
<div class="shared-gameSystemCard w-full group relative undefined">
<div class="relative aspect-[262/353] rounded-[8px] overflow-hidden border border-transparent group-hover:border-stormcastYellow">
@@ -5410,7 +5402,7 @@ background-color: inherit;
<div class="row mb-10 xl:mb-15">
<div class="column relative flex items-center">
<div class="swiper swiper-initialized swiper-horizontal md-max:!overflow-visible">
<div aria-live="polite" class="swiper-wrapper" id="swiper-wrapper-ddd96962b22d0ed1" style="transform: translate3d(0px, 0px, 0px);">
<div aria-live="polite" class="swiper-wrapper" id="swiper-wrapper-e26dea4415052697" style="transform: translate3d(0px, 0px, 0px);">
<div aria-label="1 / 13" class="swiper-slide swiper-slide-active !w-[107px] md:!w-115 xl:!w-120 !h-auto !flex items-center" role="group">
<a class="group p-[6px] focus-visible:border outline-none rounded-[5px]" href="/en-gb/setting/warhammer-40000/" title="Warhammer 40,000">
<img alt="" class="object-contain aspect-[202/81] default-transition group-hover:scale-[0.9]" data-nimg="1" decoding="async" height="324" loading="lazy" src="https://assets.warhammer-community.com/warhammer40000.png" style="color:transparent;object-position:50% 50%" width="808"/>