# Infrastructure Improvements TODO

## High Priority (Quick Wins)

### 1. Split the massive docker role ✅ COMPLETED

- **Current Issue**: `roles/docker/tasks/main.yml` has 20+ services in one file (176 lines)
- **Solution**: Break into logical service groups:

  ```
  roles/docker/tasks/
  ├── main.yml (orchestrator)
  ├── infrastructure/ (caddy, authentik, dockge)
  ├── development/ (gitea, codeserver, conduit)
  ├── media/ (audiobookshelf, calibre, ghost, pinchflat, pinry, hoarder, manyfold)
  ├── productivity/ (paperless, baikal, syncthing, mmdl, heyform, dawarich, pingvin)
  ├── communication/ (gotosocial, postiz)
  └── monitoring/ (glance, changedetection, appriseapi)
  ```

- **COMPLETED**: All services organized into logical categories with category-level tags

### 2. Standardize variable management

- **Current Issue**: Secrets live in a single encrypted file, and there is no clear variable hierarchy
- **Solution**: Create a proper variable structure:

  ```
  group_vars/
  ├── all/
  │   ├── common.yml (shared config)
  │   └── secrets.yml (vault encrypted)
  └── docker/
      ├── services.yml (service configs)
      └── networking.yml (network settings)
  ```

### 3. Template consolidation ✅ PARTIALLY COMPLETED

- **Current Issue**: Many compose templates repeat the same patterns, and some services used static files
- **Solution**: Create reusable template includes (e.g. Jinja2 macros) for a standard service template structure
- **COMPLETED**: Converted all static compose files (caddy, dockge, hoarder) to Jinja2 templates
- **REMAINING**: Create reusable template patterns for common configurations

## Security & Reliability

### 4. Add health checks

- **Issue**: Most compose templates lack a proper `healthcheck` configuration
- **Solution**: Implement health monitoring with a standardized `healthcheck` pattern across all compose templates

### 5. Implement backup strategy

- **Issue**: No automated backups for the 25+ services and their data
- **Solution**: Add a backup role with:
  - Database dumps for PostgreSQL-backed services
  - Volume backups for file-based services
  - Rotation (retention) policies
  - Regular restore testing

### 6. Network segmentation

- **Issue**: All services share one Docker network
- **Solution**: Separate into:
  - `frontend` (public-facing services)
  - `backend` (internal services only)
  - `database` (database access only)

### 7. Security hardening

- Remove unnecessary `user: root` from services
- Add security settings to all containers (`security_opt`, `cap_drop`, `read_only` where the image allows it)
- Implement least-privilege access patterns
- Add fail2ban for authentication services

## Automation Opportunities

### 8. CI/CD with Gitea Actions

- Leverage self-hosted Gitea for:
  - Ansible syntax validation
  - Service configuration testing
  - Automated deployment triggers
  - Rollback capabilities

### 9. Configuration drift detection

- Add validation tasks (e.g. scheduled `--check --diff` runs) to catch manual changes
- Implement configuration validation with explicit assertions (`ansible.builtin.assert`)

### 10. Service dependency management

- **Issue**: Some services depend on Authentik SSO, but nothing enforces startup ordering
- **Solution**: Implement dependency checks and startup ordering

### 11. Ansible best practices

- Replace the deprecated `apt_key` module with keyrings in `/etc/apt/keyrings` referenced via `signed-by`
- Use `ansible.builtin` FQCNs consistently
- Add `check_mode` support
- Implement proper idempotency checks

### 12. Documentation automation

- Auto-generate a service inventory
- Create service documentation templates
- Automate documentation updates

## Implementation Roadmap

### Week 1: Foundation

- [x] Document improvements in `todo.md`
- [x] Reorganize docker role structure
- [x] Convert static compose files to templates
- [x] Remove unused services (beaver, grist, stirlingpdf, tasksmd, redlib)
- [x] Clean up `templates/` and `files/` directories
- [ ] Implement variable hierarchy
- [ ] Create reusable template patterns

### Week 2: Security & Monitoring

- [ ] Add health checks
- [ ] Implement backup strategy
- [ ] Security hardening

### Week 3: Automation

- [ ] CI/CD pipeline setup
- [ ] Configuration validation
- [ ] Documentation automation

### Week 4: Advanced Features

- [ ] Network segmentation
- [ ] Dependency management
- [ ] Monitoring dashboard

## Completed Work Summary

### ✅ Major Accomplishments

- **Docker Role Reorganization**: Split the monolithic 176-line `main.yml` into 6 logical service categories
- **Template Standardization**: Converted all static compose files to Jinja2 templates
- **Service Cleanup**: Removed 5 unused/broken services (beaver, grist, stirlingpdf, tasksmd, redlib)
- **Category-Based Deployment**: Services can now be deployed by category using tags (infrastructure, media, etc.)
- **Documentation Updates**: Updated CLAUDE.md to reflect the new architecture

### 📊 Current Stats

- **22+ active services** organized into 6 categories
- **100% templated** compose files (no static compose files)
- **6 service directories** for logical organization
- **Clean file structure** with only essential static files

## Notes

- The current architecture is solid and much better organized for long-term maintainability
- Focus on high-impact, low-effort improvements first
- Leverage existing infrastructure (Gitea, Authentik) for automation
- The template-driven approach enables future dynamic configuration
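
For item 2 (variable management), one common Ansible convention is to keep only `vault_`-prefixed values in the encrypted `secrets.yml` and reference them from plain variables, so every variable name stays searchable without decrypting anything. A minimal sketch; `authentik_secret_key` is a hypothetical variable name:

```yaml
# group_vars/all/secrets.yml (encrypted with ansible-vault; hypothetical variable)
vault_authentik_secret_key: "change-me"

# group_vars/all/common.yml (plain text; points at the vaulted value)
authentik_secret_key: "{{ vault_authentik_secret_key }}"
```

Templates then only ever use `authentik_secret_key`, and rotating the secret touches just the vaulted file.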
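
For item 4 (health checks), a standardized `healthcheck` block in a compose template could look like the sketch below. The service name, image, port, and `/health` path are placeholders, and the check assumes the image ships `curl`:

```yaml
# Hypothetical service; replace image, port, and path with the real ones.
services:
  example-app:
    image: example/app:latest
    healthcheck:
      test: ["CMD", "curl", "-fsS", "http://localhost:8080/health"]
      interval: 30s      # time between checks
      timeout: 5s        # a single check fails if it takes longer than this
      retries: 3         # consecutive failures before the container is "unhealthy"
      start_period: 60s  # failures during warm-up do not count toward retries
```

Images without `curl` can use `["CMD-SHELL", "wget -qO- http://localhost:8080/health || exit 1"]` instead; `docker ps` then shows `(healthy)` or `(unhealthy)` in the STATUS column.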
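
For the "Rotation policies" part of item 5, a simple age-based retention step for the backup role might look like this. The `/srv/backups` path, the `*.tar.gz` naming, and the 14-day window are all assumptions:

```yaml
# Hypothetical backup location, file pattern, and retention window.
- name: Find backup archives older than the retention window
  ansible.builtin.find:
    paths: /srv/backups
    patterns: "*.tar.gz"
    age: 14d          # files whose modification time is 14 days old or more
    recurse: true
  register: expired_backups

- name: Delete expired backup archives
  ansible.builtin.file:
    path: "{{ item.path }}"
    state: absent
  loop: "{{ expired_backups.files }}"
```

A grandfather-father-son scheme (keep dailies, weeklies, monthlies) would need separate directories or patterns per tier; this sketch only covers a single flat window.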
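
For item 6 (network segmentation), the three networks could be created once with `community.docker.docker_network`, using its `internal` option to cut off outside access for the non-public tiers. A sketch, assuming the `community.docker` collection is installed:

```yaml
- name: Create segmented Docker networks
  community.docker.docker_network:
    name: "{{ item.name }}"
    internal: "{{ item.internal }}"
    state: present
  loop:
    - { name: frontend, internal: false }  # reverse proxy and public-facing services
    - { name: backend, internal: true }    # service-to-service traffic only
    - { name: database, internal: true }   # database access only
```

Each compose template would then declare the networks it uses under a top-level `networks:` key with `external: true` and attach every service only to the tiers it needs. Note that a container attached solely to `internal` networks has no outbound internet access, so services that fetch external content (e.g. pinchflat) also need `frontend` or a separate egress network.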
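
For item 7 (security hardening), the container-level settings map onto standard Compose keys. A sketch for a hypothetical service; the UID:GID is an assumption, and `read_only` only works for images that write exclusively to mounted volumes or tmpfs:

```yaml
# Hypothetical service; keep only the settings the image actually tolerates.
services:
  example-app:
    image: example/app:latest
    user: "1000:1000"           # unprivileged UID:GID instead of root
    read_only: true             # read-only root filesystem...
    tmpfs:
      - /tmp                    # ...plus a writable scratch directory
    security_opt:
      - no-new-privileges:true  # block privilege escalation through setuid binaries
    cap_drop:
      - ALL                     # start from an empty capability set
```

If a service breaks, add back individual capabilities with `cap_add` rather than dropping the whole block.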
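
For item 9 (drift detection), one approach is to render a template in check mode and assert that nothing would change; a non-empty diff means someone edited the file on the host. The template and stack paths below are hypothetical:

```yaml
# Hypothetical paths: adjust src/dest to the real template and stack directory.
- name: Render example-app compose file in check mode to detect drift
  ansible.builtin.template:
    src: compose/example-app.yml.j2
    dest: /opt/stacks/example-app/compose.yml
  check_mode: true   # never writes; only reports whether the file would change
  diff: true         # print the difference between the file on disk and the template
  register: example_app_drift

- name: Fail if the deployed file was edited by hand
  ansible.builtin.assert:
    that:
      - not example_app_drift.changed
    fail_msg: "compose.yml for example-app differs from its template (manual change?)"
```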
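
For item 10 (dependency management), services in separate compose projects cannot use `depends_on` across stacks, so the ordering could instead live in the playbook: poll Authentik before running the tasks that deploy SSO-dependent services. This sketch assumes Authentik's `/-/health/ready/` readiness endpoint and a hypothetical `authentik_url` variable; verify both against the deployed version:

```yaml
# Hypothetical: authentik_url holds the base URL, e.g. https://auth.example.com
- name: Wait for Authentik to report ready
  ansible.builtin.uri:
    url: "{{ authentik_url }}/-/health/ready/"
    status_code: [200, 204]  # accept either success code
  register: authentik_health
  until: authentik_health is succeeded
  retries: 30                # up to about 5 minutes with a 10 s delay
  delay: 10
```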
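
For the `apt_key` item under 11, the replacement pattern is to download the key into `/etc/apt/keyrings` and reference it with `signed-by`. The sketch assumes the key in question is Docker's repository key on a Debian host with facts gathered; swap the URL and repo line for whatever repository the role actually adds:

```yaml
# Assumption: Docker's apt repository on Debian; requires gathered facts.
- name: Ensure the apt keyrings directory exists
  ansible.builtin.file:
    path: /etc/apt/keyrings
    state: directory
    mode: "0755"

- name: Download the Docker repository signing key
  ansible.builtin.get_url:
    url: https://download.docker.com/linux/debian/gpg
    dest: /etc/apt/keyrings/docker.asc
    mode: "0644"

- name: Add the Docker repository, trusted only for this key
  ansible.builtin.apt_repository:
    repo: "deb [signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/debian {{ ansible_distribution_release }} stable"
    filename: docker
    state: present
```

Unlike `apt_key`, this scopes trust to one repository instead of adding the key to the global keyring.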