# Docker Swarm Stack Files - Review & Recommendations

## Overview

Reviewed 9 Docker Swarm stack files totaling ~24KB of configuration. Found **critical security issues**, configuration inconsistencies, and optimization opportunities.

---

## 🔴 Critical Issues

### 1. **Hardcoded Secrets in Plain Text**

**Files Affected**: [`full-stack-complete.yml`](file:///workspace/homelab/services/swarm/stacks/full-stack-complete.yml), [`monitoring-stack.yml`](file:///workspace/homelab/services/swarm/stacks/monitoring-stack.yml)

**Problems**:

```yaml
# full-stack-complete.yml, line 96: Paperless DB password in plain text
- PAPERLESS_DBPASS=paperless

# full-stack-complete.yml, line 98: Hardcoded secret key
- PAPERLESS_SECRET_KEY=change-me-please-to-something-secure

# monitoring-stack.yml, line 52: Grafana admin password exposed
- GF_SECURITY_ADMIN_PASSWORD=change-me-please
```

**Risk**: Anyone with access to the repo can read these credentials. Environment values are also visible to anyone who can run `docker service inspect`, and they can leak into logs.

**Fix**: Use Docker secrets:

```yaml
secrets:
  paperless_db_password:
    external: true
  paperless_secret_key:
    external: true
  grafana_admin_password:
    external: true

services:
  paperless:
    secrets:
      - paperless_db_password
      - paperless_secret_key
    environment:
      - PAPERLESS_DBPASS_FILE=/run/secrets/paperless_db_password
      - PAPERLESS_SECRET_KEY_FILE=/run/secrets/paperless_secret_key
  grafana:
    secrets:
      - grafana_admin_password
    environment:
      # Grafana reads file-based values via the double-underscore __FILE suffix
      - GF_SECURITY_ADMIN_PASSWORD__FILE=/run/secrets/grafana_admin_password
```

### 2. **Missing Health Checks**

**Files Affected**: All stack files

**Problem**: No services have health checks configured, meaning:

- Swarm can't detect containers that are running but unhealthy
- Tasks are only replaced when the process exits, not when the application hangs
- The routing mesh may send traffic to failing instances

**Fix**: Add health checks to critical services:

```yaml
services:
  paperless:
    healthcheck:
      # Adjust the URL to an endpoint the service exposes; curl must exist in the image
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s
```

### 3. **Incorrect node-exporter Command**

**File**: [`monitoring-stack.yml:111-114`](file:///workspace/homelab/services/swarm/stacks/monitoring-stack.yml#L111-L114)

**Problem**:

```yaml
command:
  - '--config.file=/etc/prometheus/prometheus.yml' # Wrong! This is for Prometheus
  - '--storage.tsdb.path=/prometheus' # Wrong! Also a Prometheus flag
```

**Fix**:

```yaml
# Assumes /proc, /sys and / are bind-mounted at /host/proc, /host/sys and /rootfs
command:
  - '--path.procfs=/host/proc'
  - '--path.rootfs=/rootfs'
  - '--path.sysfs=/host/sys'
  - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
```

---

## ⚠️ High-Priority Warnings

### 4. **Missing Networks on Database Services**

**File**: [`full-stack-complete.yml`](file:///workspace/homelab/services/swarm/stacks/full-stack-complete.yml)

**Problem**: `paperless-db` (line 70) has no `networks` entry, so it is attached only to the stack's default network, but Paperless tries to reach it over `homelab-backend`.

**Fix**:

```yaml
paperless-db:
  networks:
    - homelab-backend # Add this
```

### 5. **Resource Limits Too High for Pi 4**

**File**: [`full-stack-complete.yml`](file:///workspace/homelab/services/swarm/stacks/full-stack-complete.yml)

**Problem**: Services with `node.labels.leader == true` (Pi 4) have resource limits that may be too high:

- Paperless: 2GB memory (Pi 4 has 8GB total)
- Stirling-PDF: 2GB memory
- SearXNG: 2GB memory
- Combined: 6GB+ on one node

**Fix**: Reduce limits or spread services across nodes. Placement constraints only support `==` and `!=` on node attributes and labels, so free memory cannot be checked there. A memory reservation lets the scheduler account for it instead:

```yaml
deploy:
  placement:
    constraints:
      - node.labels.leader == true
  resources:
    limits:
      memory: 1G # Example value; tune per service
    reservations:
      memory: 512M # Task is only scheduled where this much is unreserved
```

### 6. **Duplicate Portainer Definitions**

**Files**: [`portainer-stack.yml`](file:///workspace/homelab/services/swarm/stacks/portainer-stack.yml) vs [`tools-stack.yml`](file:///workspace/homelab/services/swarm/stacks/tools-stack.yml)

**Problem**: Portainer is defined in both files with different configurations:

- `portainer-stack.yml`: Uses agent mode with global agents
- `tools-stack.yml`: Uses socket mode (simpler but less scalable)

**Fix**: Pick one approach and remove the duplicate.

### 7. **Missing Traefik Network Declaration**

**File**: [`monitoring-stack.yml:38-44`](file:///workspace/homelab/services/swarm/stacks/monitoring-stack.yml#L38-L44)

**Problem**: Prometheus has Traefik labels but isn't on the `traefik-public` network, so Traefik cannot route to it.

**Fix**:

```yaml
prometheus:
  networks:
    - monitoring
    - traefik-public # Add this
```

---

## 🟡 Medium-Priority Improvements

### 8. **Missing Restart Policies**

**Files Affected**: Most services

**Problem**: Only Portainer sets an explicit restart policy. Other services fall back to Swarm's default (`condition: any` with unlimited attempts), which can hide crash loops behind endless restarts.

**Fix**: Add to all services:

```yaml
deploy:
  restart_policy:
    condition: on-failure
    delay: 5s
    max_attempts: 3
```

### 9. **Watchtower Interval Too Frequent**

**File**: [`full-stack-complete.yml:191`](file:///workspace/homelab/services/swarm/stacks/full-stack-complete.yml#L191)

**Problem**: `--interval 300` checks for image updates every 5 minutes, which is more often than a homelab needs.

**Fix**: Change to hourly or daily:

```yaml
command: --cleanup --interval 86400 # Daily
```

### 10. **Missing Logging Configuration**

**Files Affected**: All

**Problem**: No log driver or limits configured. Logs can fill the disk.

**Fix**:

```yaml
logging: # Service-level key (a sibling of deploy, not nested under it)
  driver: "json-file"
  options:
    max-size: "10m"
    max-file: "3"
```

### 11. **`version` Field Is Obsolete**

**Files Affected**: All

**Problem**: All files declare `version: '3.9'`. The top-level `version` field is obsolete under the Compose Specification; 3.9 is no more deprecated than 3.8, so downgrading to 3.8 gains nothing.

**Fix**: Remove the version line once you have confirmed that your Docker CLI's `docker stack deploy` accepts files without it (older releases require a 3.x version). Otherwise leave it as is.

---

## 🟢 Best Practice Recommendations

### 12. **Add Update Configs**

**Benefit**: Zero-downtime deployments

```yaml
deploy:
  update_config:
    parallelism: 1
    delay: 10s
    failure_action: rollback
    order: start-first
```

### 13. **Use Specific Image Tags**

**Files Affected**: Services using `:latest`

**Current**:

```yaml
image: portainer/portainer-ce:latest
image: searxng/searxng:latest
```

**Better**:

```yaml
# Example tags; confirm they exist in the registry before pinning
image: portainer/portainer-ce:2.33.4
image: searxng/searxng:2024.11.20
```

**Good tags already used**: `full-stack-complete.yml` has several pinned versions ✓

### 14. **Add Labels for Documentation**

**Benefit**: Self-documenting infrastructure

```yaml
deploy:
  labels:
    - "com.homelab.description=Paperless document management"
    - "com.homelab.maintainer=@sj98"
    - "com.homelab.version=2.19.3"
```

### 15. **Separate Configs from Stacks**

**Problem**: Mixing config and stack definitions

**Current**: Prometheus config is external (good!)

**Recommendation**: Do the same for the Traefik and Alertmanager configs

### 16. **Add Dependency Ordering**

**Current**: Some services use `depends_on` (good!)

**Problem**: Not all services that need it have it. Note that `docker stack deploy` ignores `depends_on`, so in Swarm it only documents intent; actual startup ordering has to come from health checks, restart policies, and retry logic in the application.

```yaml
paperless:
  depends_on: # Ignored by docker stack deploy; documents intent only
    - paperless-redis
    - paperless-db
```

---

## 📋 Detailed File-by-File Analysis

### [`full-stack-complete.yml`](file:///workspace/homelab/services/swarm/stacks/full-stack-complete.yml)

**Good**:

- ✅ Proper network segmentation (traefik-public vs homelab-backend)
- ✅ Resource limits defined
- ✅ Node placement constraints
- ✅ Specific image tags for most services

**Issues**:

- 🔴 Hardcoded passwords (lines 96, 98)
- 🔴 No health checks
- ⚠️ paperless-db missing network
- ⚠️ Resource limits may be too high for Pi 4

**Score**: 6/10

---

### [`monitoring-stack.yml`](file:///workspace/homelab/services/swarm/stacks/monitoring-stack.yml)

**Good**:

- ✅ Proper monitoring network
- ✅ External configs for Prometheus
- ✅ Resource limits

**Issues**:

- 🔴 Hardcoded Grafana password (line 52)
- 🔴 node-exporter has wrong command (lines 111-114)
- ⚠️ Prometheus missing traefik-public network
- ⚠️ No health checks

**Score**: 5/10

---

### [`networking-stack.yml`](file:///workspace/homelab/services/swarm/stacks/networking-stack.yml)

**Good**:

- ✅ Uses secrets for DuckDNS token
- ✅ External volume for Let's Encrypt
- ✅ Proper network attachment

**Issues**:

- ⚠️ Traefik single replica (should be 2+ for HA)
- ⚠️ No health check
- ⚠️ whoami resource limits too strict

**Score**: 7/10

---

### [`portainer-stack.yml`](file:///workspace/homelab/services/swarm/stacks/portainer-stack.yml)

**Good**:

- ✅ Has restart policies!
- ✅ Supports both Windows and Linux agents
- ✅ Proper network setup

**Issues**:

- ⚠️ Duplicate of tools-stack.yml Portainer
- ⚠️ No health check

**Score**: 7/10

---

### [`tools-stack.yml`](file:///workspace/homelab/services/swarm/stacks/tools-stack.yml)

**Good**:

- ✅ All tools on manager node (correct)
- ✅ Resource limits defined

**Issues**:

- ⚠️ Duplicate Portainer definition
- ⚠️ lazydocker needs TTY, won't work in Swarm
- ⚠️ No restart policies

**Score**: 6/10

---

### [`node-exporter-stack.yml`](file:///workspace/homelab/services/swarm/stacks/node-exporter-stack.yml)

**Content** (created by us):

```yaml
version: '3.8'
services:
  node-exporter:
    image: prom/node-exporter:latest
    command:
      - '--path.rootfs=/host'
    volumes:
      - '/:/host:ro,rslave'
    deploy:
      mode: global
```

**Good**:

- ✅ Global mode (runs on all nodes)
- ✅ Read-only host mount

**Issues**:

- ⚠️ Uses `:latest` tag
- ⚠️ No resource limits
- ⚠️ No health check

**Score**: 6/10

---

## 🛠️ Recommended Action Plan

### Phase 1: Critical Security (Do Immediately)

1. ✅ Create Docker secrets for all passwords
2. ✅ Update stack files to use secrets
3. ✅ Fix node-exporter command
4. ✅ Add missing network to paperless-db

### Phase 2: Stability (Do This Week)

1. ⏭️ Add health checks to all services
2. ⏭️ Add restart policies
3. ⏭️ Fix Prometheus network
4. ⏭️ Remove duplicate Portainer

### Phase 3: Optimization (Do This Month)

1. ⏭️ Update all `:latest` tags to specific versions
2. ⏭️ Add update configs
3. ⏭️ Configure logging limits
4. ⏭️ Review resource limits

### Phase 4: Best Practices (Ongoing)

1. ⏭️ Add documentation labels
2. ⏭️ Separate configs from stacks
3. ⏭️ Set up monitoring alerts for service health

---

## 🎯 Summary Scores

| Stack File | Security | Stability | Best Practices | Overall |
|-----------|----------|-----------|----------------|---------|
| full-stack-complete.yml | 3/10 | 6/10 | 7/10 | **6/10** |
| monitoring-stack.yml | 4/10 | 5/10 | 6/10 | **5/10** |
| networking-stack.yml | 8/10 | 6/10 | 7/10 | **7/10** |
| portainer-stack.yml | 7/10 | 7/10 | 7/10 | **7/10** |
| tools-stack.yml | 7/10 | 5/10 | 6/10 | **6/10** |
| node-exporter-stack.yml | 7/10 | 5/10 | 6/10 | **6/10** |
| **Average** | **6.0/10** | **5.7/10** | **6.5/10** | **6.2/10** |

---

## 📝 Next Steps

Would you like me to:

1. **Create fixed versions** of the stack files with all critical issues resolved?
2. **Generate a Docker secrets creation script** for all passwords?
3. **Add health checks** to all services?
4. **Consolidate duplicate configs** (e.g., remove duplicate Portainer)?
5. **Create a migration guide** for applying these changes safely?

Let me know which improvements you'd like me to implement!
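
In the meantime, the hardcoded-credential problem from issue 1 can be caught before it reaches the repo again. The following is a minimal sketch of such a check, not part of the reviewed stacks: the function name `find_plaintext_secrets` and both regular expressions are illustrative assumptions, and the heuristic only understands unquoted list-style entries (`- NAME=value`), so treat its findings as hints, not as a complete audit.

```python
import re

# Assumed heuristic: variable names that usually carry credentials.
SENSITIVE_NAME = re.compile(r"(PASS|PASSWORD|SECRET|TOKEN|API_KEY)", re.IGNORECASE)
# Matches unquoted list-style environment entries such as "- NAME=value".
ENV_ENTRY = re.compile(r"^\s*-\s*([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def find_plaintext_secrets(stack_text):
    """Return (line_number, variable_name) pairs for suspicious entries."""
    findings = []
    for lineno, line in enumerate(stack_text.splitlines(), start=1):
        match = ENV_ENTRY.match(line)
        if not match:
            continue
        name, value = match.group(1), match.group(2).strip()
        if not SENSITIVE_NAME.search(name):
            continue
        # *_FILE variables point at a mounted secret, not at the secret itself.
        if name.upper().endswith("_FILE") or value.startswith("/run/secrets/"):
            continue
        # Values interpolated at deploy time are not stored in the file.
        if value.startswith("${"):
            continue
        findings.append((lineno, name))
    return findings


if __name__ == "__main__":
    sample = """\
services:
  paperless:
    environment:
      - PAPERLESS_DBPASS=paperless
      - PAPERLESS_SECRET_KEY_FILE=/run/secrets/paperless_secret_key
  grafana:
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=change-me-please
"""
    for lineno, name in find_plaintext_secrets(sample):
        print(f"line {lineno}: {name} looks like a plaintext credential")
```

Run against each stack file's text (for example from a pre-commit hook), this flags `PAPERLESS_DBPASS` and `GF_SECURITY_ADMIN_PASSWORD` in the sample while skipping the `_FILE` entry that already points at a Docker secret.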