# Infrastructure Improvements TODO

## High Priority (Quick Wins)

### 1. Split the massive docker role ⚠️ IN PROGRESS
- **Current Issue**: `roles/docker/tasks/main.yml` has 20+ services in one file (176 lines)
- **Solution**: Break into logical service groups:
  ```
  roles/docker/tasks/
  ├── main.yml (orchestrator)
  ├── infrastructure/ (caddy, authentik, dockge)
  ├── development/ (gitea, codeserver, conduit)
  ├── media/ (audiobookshelf, calibre, ghost, pinchflat)
  ├── productivity/ (paperless, baikal, syncthing, tasksmd)
  └── monitoring/ (glance, changedetection, appriseapi)
  ```

### 2. Standardize variable management
- **Current Issue**: Secrets in single encrypted file, no clear variable hierarchy
- **Solution**: Create proper variable structure:
  ```
  group_vars/
  ├── all/
  │   ├── common.yml (shared config)
  │   └── secrets.yml (vault encrypted)
  ├── docker/
  │   ├── services.yml (service configs)
  │   └── networking.yml (network settings)
  ```

### 3. Template consolidation
- **Current Issue**: Many compose templates repeat patterns
- **Solution**: Create reusable template includes with standard service template structure

## Security & Reliability

### 4. Add health checks
- **Issue**: Most services lack proper healthcheck configurations in compose templates
- **Solution**: Implement comprehensive health monitoring with standardized healthcheck patterns

### 5. Implement backup strategy
- **Issue**: No automated backups for 25+ services and their data
- **Solution**: Add backup role with:
  - Database dumps for PostgreSQL services
  - Volume backups for file-based services
  - Rotation policies
  - Restoration testing

### 6. Network segmentation
- **Issue**: All services share one Docker network
- **Solution**: Separate into:
  - `frontend` (Public-facing services)
  - `backend` (Internal services only)
  - `database` (Database access only)

### 7. Security hardening
- Remove unnecessary `user: root` from services
- Add security contexts to all containers
- Implement least-privilege access patterns
- Add fail2ban for authentication services

## Automation Opportunities

### 8. CI/CD with Gitea Actions
- Leverage self-hosted Gitea for:
  - Ansible syntax validation
  - Service configuration testing
  - Automated deployment triggers
  - Rollback capabilities

### 9. Configuration drift detection
- Add validation tasks to catch manual changes
- Implement configuration validation with proper assertions

### 10. Service dependency management
- **Issue**: Some services depend on Authentik SSO but no startup ordering
- **Solution**: Implement dependency checking and startup ordering

### 11. Ansible best practices
- Replace deprecated `apt_key` with proper patterns
- Use `ansible.builtin` FQCN consistently
- Add `check_mode` support
- Implement proper idempotency checks

### 12. Documentation automation
- Auto-generate service inventory
- Create service documentation templates
- Implement automated documentation updates

## Implementation Roadmap

### Week 1: Foundation
- [x] Document improvements in todo.md
- [ ] Reorganize docker role structure
- [ ] Implement variable hierarchy
- [ ] Standardize templates

### Week 2: Security & Monitoring
- [ ] Add health checks
- [ ] Implement backup strategy
- [ ] Security hardening

### Week 3: Automation
- [ ] CI/CD pipeline setup
- [ ] Configuration validation
- [ ] Documentation automation

### Week 4: Advanced Features
- [ ] Network segmentation
- [ ] Dependency management
- [ ] Monitoring dashboard

## Notes
- Current architecture is solid but needs better organization for long-term maintainability
- Focus on high-impact, low-effort improvements first
- Leverage existing infrastructure (Gitea, Authentik) for automation
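
As a rough sketch of the orchestrator `main.yml` proposed in item 1, the split role might look something like the following. It assumes each group directory contains its own `main.yml` task file and uses tag names that mirror the directory names; both are illustrative assumptions, not the current layout of the repository.

```yaml
# roles/docker/tasks/main.yml (orchestrator sketch)
# Assumption: every group directory ships a main.yml with that group's service tasks.
# Assumption: tag names are hypothetical and simply mirror the directory names.

- name: Deploy infrastructure services (caddy, authentik, dockge)
  ansible.builtin.import_tasks: infrastructure/main.yml
  tags: [infrastructure]

- name: Deploy development services (gitea, codeserver, conduit)
  ansible.builtin.import_tasks: development/main.yml
  tags: [development]

- name: Deploy media services (audiobookshelf, calibre, ghost, pinchflat)
  ansible.builtin.import_tasks: media/main.yml
  tags: [media]

- name: Deploy productivity services (paperless, baikal, syncthing, tasksmd)
  ansible.builtin.import_tasks: productivity/main.yml
  tags: [productivity]

- name: Deploy monitoring services (glance, changedetection, appriseapi)
  ansible.builtin.import_tasks: monitoring/main.yml
  tags: [monitoring]
```

Because tags on a static `import_tasks` are inherited by the imported tasks, a single group could then be deployed with `ansible-playbook <playbook> --tags media`. Listing the infrastructure group first is one simple way to bring Authentik up before the services that depend on it, though it is not a substitute for the dependency checking described in item 10.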