add: comprehensive infrastructure improvement roadmap

Document prioritized improvements for Ansible infrastructure including:
- Docker role reorganization into logical service groups
- Variable management standardization
- Security hardening and backup strategies
- CI/CD automation opportunities
- Network segmentation and monitoring enhancements

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Phil 2025-06-06 11:46:07 -06:00
parent ccab665d26
commit 8ca2122cb3

todo.md

@@ -0,0 +1,116 @@
# Infrastructure Improvements TODO
## High Priority (Quick Wins)
### 1. Split the massive docker role ⚠️ IN PROGRESS
- **Current Issue**: `roles/docker/tasks/main.yml` has 20+ services in one file (176 lines)
- **Solution**: Break into logical service groups:
```
roles/docker/tasks/
├── main.yml (orchestrator)
├── infrastructure/ (caddy, authentik, dockge)
├── development/ (gitea, codeserver, conduit)
├── media/ (audiobookshelf, calibre, ghost, pinchflat)
├── productivity/ (paperless, baikal, syncthing, tasksmd)
└── monitoring/ (glance, changedetection, appriseapi)
```
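A minimal sketch of the orchestrating `main.yml`. It assumes each group directory contains its own `main.yml` (a hypothetical layout) listing that group's services. `import_tasks` is static, so the tags apply to every imported task, and a single group can be deployed with `--tags media`.

```yaml
# roles/docker/tasks/main.yml (hypothetical orchestrator layout)
# Each group directory is assumed to contain a main.yml that deploys its services.
- name: Deploy infrastructure services (caddy, authentik, dockge)
  ansible.builtin.import_tasks: infrastructure/main.yml
  tags: [infrastructure]

- name: Deploy development services
  ansible.builtin.import_tasks: development/main.yml
  tags: [development]

- name: Deploy media services
  ansible.builtin.import_tasks: media/main.yml
  tags: [media]

- name: Deploy productivity services
  ansible.builtin.import_tasks: productivity/main.yml
  tags: [productivity]

- name: Deploy monitoring services
  ansible.builtin.import_tasks: monitoring/main.yml
  tags: [monitoring]
```

Importing `infrastructure` first means Caddy and Authentik are deployed before the services that sit behind them.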
### 2. Standardize variable management
- **Current Issue**: All secrets live in a single encrypted file, and there is no clear variable hierarchy
- **Solution**: Create proper variable structure:
```
group_vars/
├── all/
│   ├── common.yml      (shared config)
│   └── secrets.yml     (vault encrypted)
└── docker/
    ├── services.yml    (service configs)
    └── networking.yml  (network settings)
```
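One common convention for this layout, sketched here with hypothetical variable names, is to keep only `vault_`-prefixed values in the encrypted file and reference them from plain files, so a plain-text search still shows where every variable is used. Note that `group_vars/docker/` is only loaded for hosts in an inventory group named `docker`, which is an assumption about the inventory.

```yaml
# group_vars/all/secrets.yml
# (encrypt with: ansible-vault encrypt group_vars/all/secrets.yml)
vault_authentik_secret_key: "change-me"
vault_paperless_db_password: "change-me"
---
# group_vars/all/common.yml (plain text; all names are hypothetical)
base_domain: "example.com"
authentik_secret_key: "{{ vault_authentik_secret_key }}"
paperless_db_password: "{{ vault_paperless_db_password }}"
```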
### 3. Template consolidation
- **Current Issue**: Many compose templates repeat patterns
- **Solution**: Move the repeated patterns into reusable Jinja2 includes or macros so every service template follows the same standard structure
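Jinja2 includes share structure across templates. For repetition inside a single compose template, a lighter option is a different technique: a Compose extension field combined with YAML anchors and merge keys. The services and values below are illustrative assumptions.

```yaml
# Shared defaults defined once; top-level keys starting with x- are ignored by Compose
x-service-defaults: &service-defaults
  restart: unless-stopped
  logging:
    driver: json-file
    options:
      max-size: "10m"
      max-file: "3"

services:
  paperless:
    <<: *service-defaults          # merge the shared keys into this service
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
  paperless-db:
    <<: *service-defaults
    image: postgres:16
```

The merge is shallow: a service that defines its own `logging:` replaces the whole shared mapping rather than extending it.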
## Security & Reliability
### 4. Add health checks
- **Issue**: Most compose templates define no `healthcheck`, so Docker cannot tell whether a running container is actually serving requests
- **Solution**: Add a standardized `healthcheck` block to every compose template and feed the results into monitoring
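A sketch of a standardized healthcheck block for a compose template. It assumes the image ships `wget` and that the application answers HTTP on port 8080 at `/health`; the service name, image, and endpoint are hypothetical and need adjusting per service.

```yaml
services:
  example-app:
    image: example/app:latest        # hypothetical image
    healthcheck:
      # Exit 1 (unhealthy) if the HTTP probe fails
      test: ["CMD-SHELL", "wget -q --spider http://localhost:8080/health || exit 1"]
      interval: 30s      # time between probes
      timeout: 5s        # a probe taking longer than this counts as failed
      retries: 3         # consecutive failures before the container is "unhealthy"
      start_period: 60s  # grace period while the app boots
```

Once in place, `docker ps` shows `(healthy)` or `(unhealthy)` in the STATUS column.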
### 5. Implement backup strategy
- **Issue**: No automated backups for 25+ services and their data
- **Solution**: Add a backup role with:
  - Database dumps for PostgreSQL services
  - Volume backups for file-based services
  - Rotation policies
  - Restoration testing
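A minimal sketch of the database-dump and rotation parts. It assumes a PostgreSQL container named `paperless-db` with a `paperless` user and database, and a `/srv/backups` target (all hypothetical). Cron treats `%` specially, so it is escaped as `\%`.

```yaml
- name: Ensure the PostgreSQL backup directory exists
  ansible.builtin.file:
    path: /srv/backups/postgres
    state: directory
    mode: "0750"

- name: Schedule a nightly pg_dump of the paperless database
  ansible.builtin.cron:
    name: "paperless pg_dump"
    minute: "30"
    hour: "2"
    # Single quotes keep the backslash literal (double quotes would treat \% as an escape)
    job: 'docker exec paperless-db pg_dump -U paperless paperless | gzip > /srv/backups/postgres/paperless-$(date +\%F).sql.gz'

- name: Delete dumps older than 14 days (rotation policy)
  ansible.builtin.cron:
    name: "prune postgres dumps"
    minute: "0"
    hour: "4"
    job: 'find /srv/backups/postgres -name "*.sql.gz" -mtime +14 -delete'
```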
### 6. Network segmentation
- **Issue**: All services share one Docker network
- **Solution**: Separate into:
  - `frontend` (public-facing services)
  - `backend` (internal services only)
  - `database` (database access only)
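Because each service is deployed from its own compose template, the networks probably need to exist outside any single compose project. A sketch, assuming the `community.docker` collection is installed: create the three networks once with Ansible, then attach services to them as external networks. `internal: true` cuts a network off from outside access, so a container attached only to internal networks cannot reach the internet. Caddy would join both `frontend` and `backend`.

```yaml
# Ansible task: create the shared networks (requires the community.docker collection)
- name: Create segmented Docker networks
  community.docker.docker_network:
    name: "{{ item.name }}"
    internal: "{{ item.internal }}"
  loop:
    - { name: frontend, internal: false }
    - { name: backend, internal: true }
    - { name: database, internal: true }
---
# Compose template fragment for an app with its own database (illustrative)
services:
  paperless:
    networks: [backend, database]
  paperless-db:
    networks: [database]
networks:
  backend:
    external: true
  database:
    external: true
```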
### 7. Security hardening
- Remove unnecessary `user: root` from services
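- Add container security options (e.g. `cap_drop`, `security_opt: no-new-privileges`, `read_only`) to all containers
- Apply least privilege: give each container only the capabilities, volumes, and networks it needs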
- Add fail2ban for authentication services
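A sketch of per-container hardening options in a compose template. The UID:GID, image, and writable path are hypothetical, and not every image tolerates `read_only` or dropping all capabilities, so each service needs testing.

```yaml
services:
  example-app:
    image: example/app:latest          # hypothetical image
    user: "1000:1000"                  # run as an unprivileged UID:GID instead of root
    read_only: true                    # mount the container's root filesystem read-only
    tmpfs:
      - /tmp                           # writable scratch space despite read_only
    cap_drop:
      - ALL                            # drop every Linux capability
    security_opt:
      - no-new-privileges:true         # block privilege escalation via setuid binaries
```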
## Automation Opportunities
### 8. CI/CD with Gitea Actions
- Leverage self-hosted Gitea for:
  - Ansible syntax validation
  - Service configuration testing
  - Automated deployment triggers
  - Rollback capabilities
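A minimal sketch of a validation workflow. Gitea Actions reads workflows from `.gitea/workflows/` and uses GitHub Actions syntax; the `ubuntu-latest` runner label, the `site.yml` playbook, and the `inventory` path are assumptions about this repository and runner setup.

```yaml
# .gitea/workflows/validate.yml
name: validate
on: [push, pull_request]

jobs:
  ansible:
    runs-on: ubuntu-latest            # must match a label your runner provides
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install Ansible tooling
        # The "ansible" package bundles community collections such as community.docker
        run: python -m pip install ansible ansible-lint
      - name: Syntax check
        run: ansible-playbook -i inventory --syntax-check site.yml
      - name: Lint
        run: ansible-lint
```

If the playbook loads vault-encrypted files directly, the job may also need a vault password, for example supplied from a repository secret.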
### 9. Configuration drift detection
- Add validation tasks to catch manual changes
- Implement configuration validation with `ansible.builtin.assert` tasks
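One way to detect drift, sketched with hypothetical paths: re-render a managed file with the task forced into check mode, then fail if Ansible reports that it would change. The same pattern works with any module that supports check mode.

```yaml
- name: Compare the deployed Caddy compose file with its template (no changes made)
  ansible.builtin.template:
    src: infrastructure/caddy-compose.yml.j2     # hypothetical template path
    dest: /opt/stacks/caddy/compose.yml          # hypothetical deploy path
  check_mode: true      # only report what would change
  diff: true            # show the difference in the task output
  register: caddy_drift

- name: Fail if the file was edited by hand
  ansible.builtin.assert:
    that:
      - not caddy_drift.changed
    fail_msg: "compose.yml on {{ inventory_hostname }} differs from the template"
```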
### 10. Service dependency management
- **Issue**: Some services depend on Authentik SSO, but nothing enforces startup ordering
- **Solution**: Add dependency checks so dependent services start only after Authentik is ready
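A sketch of readiness gating with `ansible.builtin.uri`. It assumes a hypothetical `authentik_url` variable and Authentik's `/-/health/ready/` endpoint (check the monitoring documentation for your Authentik version). Running it after the infrastructure group and before the groups that use SSO enforces the ordering.

```yaml
- name: Wait until Authentik reports ready
  ansible.builtin.uri:
    url: "{{ authentik_url }}/-/health/ready/"
    status_code: [200, 204]
  register: authentik_ready
  until: authentik_ready.status in [200, 204]
  retries: 30          # up to 30 attempts...
  delay: 10            # ...10 seconds apart (about 5 minutes in total)
```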
### 11. Ansible best practices
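- Replace the deprecated `apt_key` module with keyrings referenced via `signed-by`
- Use fully qualified collection names (FQCN, e.g. `ansible.builtin.copy`) consistently
- Ensure every role runs correctly in check mode (`--check`)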
- Implement proper idempotency checks
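A sketch of the keyring pattern, using Docker's Debian repository as an example (the repository and target distribution are assumptions; Ubuntu hosts would use the `ubuntu` path instead of `debian`). All tasks use FQCNs.

```yaml
- name: Ensure the apt keyring directory exists
  ansible.builtin.file:
    path: /etc/apt/keyrings
    state: directory
    mode: "0755"

- name: Download Docker's repository signing key (replaces apt_key)
  ansible.builtin.get_url:
    url: https://download.docker.com/linux/debian/gpg
    dest: /etc/apt/keyrings/docker.asc
    mode: "0644"

- name: Add the Docker repository, trusting only that key
  ansible.builtin.apt_repository:
    repo: "deb [signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/debian {{ ansible_facts['distribution_release'] }} stable"
    filename: docker
    state: present
```

Because the key is scoped with `signed-by`, apt trusts it only for this repository instead of adding it to the global keyring that `apt-key` managed.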
### 12. Documentation automation
- Auto-generate service inventory
- Create service documentation templates
- Implement automated documentation updates
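A minimal sketch of an auto-generated service inventory. It assumes a hypothetical `docker_services` list variable whose entries have `name` and `url` keys, and an existing `docs/` directory next to the playbook. The file is written on the control node.

```yaml
- name: Generate docs/services.md from the service list
  ansible.builtin.copy:
    dest: "{{ playbook_dir }}/docs/services.md"
    mode: "0644"
    content: |
      # Service inventory
      {% for svc in docker_services | sort(attribute='name') %}
      - [{{ svc.name }}]({{ svc.url }})
      {% endfor %}
  delegate_to: localhost
  run_once: true
  become: false
```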
## Implementation Roadmap
### Week 1: Foundation
- [x] Document improvements in todo.md
- [ ] Reorganize docker role structure
- [ ] Implement variable hierarchy
- [ ] Standardize templates
### Week 2: Security & Monitoring
- [ ] Add health checks
- [ ] Implement backup strategy
- [ ] Security hardening
### Week 3: Automation
- [ ] CI/CD pipeline setup
- [ ] Configuration validation
- [ ] Documentation automation
### Week 4: Advanced Features
- [ ] Network segmentation
- [ ] Dependency management
- [ ] Monitoring dashboard
## Notes
- Current architecture is solid but needs better organization for long-term maintainability
- Focus on high-impact, low-effort improvements first
- Leverage existing infrastructure (Gitea, Authentik) for automation