ansible/todo.md
Phil 8ca2122cb3 add: comprehensive infrastructure improvement roadmap
Document prioritized improvements for Ansible infrastructure including:
- Docker role reorganization into logical service groups
- Variable management standardization
- Security hardening and backup strategies
- CI/CD automation opportunities
- Network segmentation and monitoring enhancements

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-06-06 11:46:07 -06:00

Infrastructure Improvements TODO

High Priority (Quick Wins)

1. Split the massive docker role ⚠️ IN PROGRESS

  • Current Issue: roles/docker/tasks/main.yml defines 20+ services in a single 176-line file
  • Solution: Break into logical service groups:
    roles/docker/tasks/
    ├── main.yml (orchestrator)
    ├── infrastructure/ (caddy, authentik, dockge)
    ├── development/ (gitea, codeserver, conduit)
    ├── media/ (audiobookshelf, calibre, ghost, pinchflat)
    ├── productivity/ (paperless, baikal, syncthing, tasksmd)
    └── monitoring/ (glance, changedetection, appriseapi)
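
The orchestrator in the layout above could be as small as the following sketch. It assumes each group directory holds its own main.yml (the roadmap does not say how the group files are named), and the loop variable name service_group is illustrative.

```yaml
# roles/docker/tasks/main.yml (orchestrator sketch)
# Assumption: every group directory contains a main.yml task file.
- name: Include the task file for each service group
  ansible.builtin.include_tasks:
    file: "{{ service_group }}/main.yml"
  loop:
    - infrastructure
    - development
    - media
    - productivity
    - monitoring
  loop_control:
    loop_var: service_group  # avoids clashing with `item` inside included files
```

Because include_tasks is dynamic, tags set on the include do not reach the included tasks; ansible.builtin.import_tasks may be a better fit if per-group tags are wanted.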
    

2. Standardize variable management

  • Current Issue: Secrets live in a single encrypted file, and there is no clear variable hierarchy
  • Solution: Create proper variable structure:
    group_vars/
    ├── all/
    │   ├── common.yml (shared config)
    │   └── secrets.yml (vault encrypted)
    └── docker/
        ├── services.yml (service configs)
        └── networking.yml (network settings)
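
A common way to keep vaulted values discoverable in a structure like this is to prefix them with vault_ and reference them from a plain-text file. The sketch below illustrates that pattern; the variable names and the placeholder value are hypothetical, not taken from this repository.

```yaml
---
# group_vars/all/secrets.yml (encrypted with ansible-vault; shown decrypted)
# Hypothetical variable name and placeholder value.
vault_authentik_secret_key: "changeme"
---
# group_vars/all/common.yml (plain text)
# Readable name that points at the vaulted value, so a search for the
# variable finds it without decrypting anything.
authentik_secret_key: "{{ vault_authentik_secret_key }}"
```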
    

3. Template consolidation

  • Current Issue: Many compose templates repeat patterns
  • Solution: Extract the repeated blocks into reusable Jinja2 includes so every compose template follows one standard service structure
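
Within a single compose file, repetition can also be reduced without any templating by using a YAML anchor on a Compose extension field. This is a different technique from Jinja2 includes and only works inside one file, so treat it as a complementary sketch. The network name and the image variables are hypothetical.

```yaml
# Shared defaults declared once under an `x-` extension field.
x-service-defaults: &service-defaults
  restart: unless-stopped
  networks:
    - proxy  # hypothetical network name

services:
  glance:
    <<: *service-defaults
    image: "{{ glance_image }}"  # hypothetical variable
  changedetection:
    <<: *service-defaults
    image: "{{ changedetection_image }}"  # hypothetical variable

networks:
  proxy:
    external: true
```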

Security & Reliability

4. Add health checks

  • Issue: Most services lack proper healthcheck configurations in compose templates
  • Solution: Add a standardized healthcheck block (test command, interval, timeout, retries) to every compose template
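
A standardized healthcheck could look like the sketch below. The service name, port, and /health path are assumptions, and the check only works if the image ships curl; services without it would need a different test command.

```yaml
services:
  example-service:  # hypothetical service name
    healthcheck:
      # Assumes curl exists in the image and the app answers on /health.
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s      # time between checks
      timeout: 5s        # a check taking longer than this counts as failed
      retries: 3         # consecutive failures before "unhealthy"
      start_period: 20s  # grace period while the service boots
```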

5. Implement backup strategy

  • Issue: No automated backups for 25+ services and their data
  • Solution: Add backup role with:
    • Database dumps for PostgreSQL services
    • Volume backups for file-based services
    • Rotation policies
    • Restoration testing
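
The first three bullets could be sketched as tasks like the ones below. All backup_* variables are hypothetical, facts must be gathered for ansible_date_time to exist, and the archive task assumes the community.general collection is installed. Restoration testing is not covered here.

```yaml
# Hypothetical variables: backup_dir, backup_pg_container, backup_pg_user,
# backup_pg_database, backup_volume_path, backup_retention_days.
- name: Dump a PostgreSQL database to a dated, compressed file
  ansible.builtin.shell: |
    set -o pipefail
    docker exec {{ backup_pg_container }} \
      pg_dump -U {{ backup_pg_user }} {{ backup_pg_database }} \
      | gzip > {{ backup_dir }}/{{ backup_pg_database }}-{{ ansible_date_time.date }}.sql.gz
  args:
    executable: /bin/bash  # pipefail is a bash option
  changed_when: true

- name: Archive the data directory of a file-based service
  community.general.archive:
    path: "{{ backup_volume_path }}"
    dest: "{{ backup_dir }}/volume-{{ ansible_date_time.date }}.tar.gz"
    format: gz

- name: Find backups older than the retention window
  ansible.builtin.find:
    paths: "{{ backup_dir }}"
    patterns: "*.gz"
    age: "{{ backup_retention_days }}d"
  register: old_backups

- name: Remove expired backups
  ansible.builtin.file:
    path: "{{ item.path }}"
    state: absent
  loop: "{{ old_backups.files }}"
  loop_control:
    label: "{{ item.path }}"
```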

6. Network segmentation

  • Issue: All services share one Docker network
  • Solution: Separate into:
    • frontend (Public-facing services)
    • backend (Internal services only)
    • database (Database access only)
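
In Compose terms the three tiers could be declared roughly as follows. This sketch assumes the services share one compose file; if each service keeps its own file, the networks would have to be created once and referenced as external. The app and postgres service names are placeholders.

```yaml
networks:
  frontend: {}
  backend:
    internal: true   # no route to the outside world
  database:
    internal: true

services:
  caddy:
    networks: [frontend, backend]   # only the proxy is public-facing
  app:                              # placeholder application service
    networks: [backend, database]
  postgres:                         # placeholder database service
    networks: [database]
```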

7. Security hardening

  • Remove unnecessary user: root from services
  • Add security options to all containers (for example no-new-privileges and dropped capabilities)
  • Implement least-privilege access patterns
  • Add fail2ban for authentication services
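
For a single container, the first three items might translate into compose settings like this sketch. The service name and the UID/GID are assumptions, and some images will not start with a read-only root filesystem or without specific capabilities, so each service needs to be tested individually.

```yaml
services:
  example-service:        # hypothetical service name
    user: "1000:1000"     # assumed unprivileged UID:GID instead of root
    read_only: true       # root filesystem mounted read-only
    tmpfs:
      - /tmp              # writable scratch space
    cap_drop:
      - ALL               # add back only what the service needs via cap_add
    security_opt:
      - no-new-privileges:true
```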

Automation Opportunities

8. CI/CD with Gitea Actions

  • Leverage self-hosted Gitea for:
    • Ansible syntax validation
    • Service configuration testing
    • Automated deployment triggers
    • Rollback capabilities
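
The syntax validation step could start from a workflow like the sketch below, which uses the GitHub Actions compatible syntax that Gitea Actions accepts. The playbook name site.yml, the runner label, and the availability of pip on the runner image are all assumptions.

```yaml
# .gitea/workflows/validate.yml (hypothetical file name)
name: Ansible validation
on: [push, pull_request]

jobs:
  validate:
    runs-on: ubuntu-latest   # must match a label registered on the runner
    steps:
      - uses: actions/checkout@v4
      - name: Install Ansible and ansible-lint
        run: pip install ansible ansible-lint
      - name: Check playbook syntax
        # Vault-encrypted variable files may require a vault password
        # to be made available to the runner as a secret.
        run: ansible-playbook --syntax-check site.yml
      - name: Lint roles and playbooks
        run: ansible-lint
```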

9. Configuration drift detection

  • Add validation tasks to catch manual changes
  • Implement configuration validation with proper assertions
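
A validation task built on assertions might look like this sketch; service_domain and service_port are hypothetical variable names. For the drift side, running the playbook with --check --diff reports changes a real run would make without applying them.

```yaml
- name: Validate required service variables before deploying
  ansible.builtin.assert:
    that:
      - service_domain is defined      # hypothetical variable
      - service_domain | length > 0
      - service_port | int > 0         # hypothetical variable
    fail_msg: "service_domain and service_port must be set"
    quiet: true
```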

10. Service dependency management

  • Issue: Some services depend on Authentik SSO, but there is no startup ordering
  • Solution: Implement dependency checking and startup ordering
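
One simple form of dependency checking is to block until Authentik answers before deploying anything that relies on it. In this sketch authentik_health_url is a hypothetical variable pointing at whatever readiness endpoint the deployment exposes, and the retry counts are arbitrary.

```yaml
- name: Wait for Authentik to respond before starting dependent services
  ansible.builtin.uri:
    url: "{{ authentik_health_url }}"  # hypothetical variable
    status_code: 200
  register: authentik_health
  until: authentik_health.status == 200
  retries: 30   # up to 30 attempts
  delay: 10     # seconds between attempts
```

Placing this task between the infrastructure group and the groups that use SSO gives a basic startup order without changing the compose files.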

11. Ansible best practices

  • Replace deprecated apt_key with a keyring file referenced through signed-by in the repository definition
  • Use ansible.builtin FQCN consistently
  • Add check_mode support
  • Implement proper idempotency checks
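
The apt_key replacement usually means storing the key under /etc/apt/keyrings and pointing the repository at it with signed-by. The sketch below assumes the key in question is Docker's and the hosts run Ubuntu; the roadmap does not say which repository currently uses apt_key.

```yaml
# Assumption: Docker's apt repository on Ubuntu hosts.
- name: Ensure the keyring directory exists
  ansible.builtin.file:
    path: /etc/apt/keyrings
    state: directory
    mode: "0755"

- name: Download the repository signing key
  ansible.builtin.get_url:
    url: https://download.docker.com/linux/ubuntu/gpg
    dest: /etc/apt/keyrings/docker.asc
    mode: "0644"

- name: Add the repository, trusting only that key
  ansible.builtin.apt_repository:
    repo: >-
      deb [signed-by=/etc/apt/keyrings/docker.asc]
      https://download.docker.com/linux/ubuntu
      {{ ansible_distribution_release }} stable
    state: present
```

All three tasks use FQCN module names and are idempotent, so they also behave sensibly under --check.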

12. Documentation automation

  • Auto-generate service inventory
  • Create service documentation templates
  • Implement automated documentation updates

Implementation Roadmap

Week 1: Foundation

  • Document improvements in todo.md
  • Reorganize docker role structure
  • Implement variable hierarchy
  • Standardize templates

Week 2: Security & Monitoring

  • Add health checks
  • Implement backup strategy
  • Security hardening

Week 3: Automation

  • CI/CD pipeline setup
  • Configuration validation
  • Documentation automation

Week 4: Advanced Features

  • Network segmentation
  • Dependency management
  • Monitoring dashboard

Notes

  • Current architecture is solid but needs better organization for long-term maintainability
  • Focus on high-impact, low-effort improvements first
  • Leverage existing infrastructure (Gitea, Authentik) for automation