Docker Swarm Stack Files - Review & Recommendations
Overview
Reviewed 9 Docker Swarm stack files totaling ~24KB of configuration. Found critical security issues, configuration inconsistencies, and optimization opportunities.
🔴 Critical Issues
1. Hardcoded Secrets in Plain Text
Files Affected: full-stack-complete.yml, monitoring-stack.yml
Problems:
# Line 96: Paperless DB password in plain text
- PAPERLESS_DBPASS=paperless
# Line 98: Hardcoded secret key
- PAPERLESS_SECRET_KEY=change-me-please-to-something-secure
# Line 52: Grafana admin password exposed
- GF_SECURITY_ADMIN_PASSWORD=change-me-please
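Risk: Anyone with read access to the repository can see these credentials. Values passed as environment variables are also shown in plain text to anyone who can run docker service inspect or docker inspect on the cluster.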
Fix: Use Docker secrets:
secrets:
  paperless_db_password:
    external: true
  paperless_secret_key:
    external: true
  grafana_admin_password:
    external: true
services:
  paperless:
    secrets:
      - paperless_db_password
      - paperless_secret_key
    environment:
      - PAPERLESS_DBPASS_FILE=/run/secrets/paperless_db_password
      - PAPERLESS_SECRET_KEY_FILE=/run/secrets/paperless_secret_key
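The Grafana password can be handled the same way. The fragment here is a sketch, assuming the service in monitoring-stack.yml is named grafana and that the secret has already been created on a manager node. Grafana reads a setting from a file when the environment variable name ends in __FILE (two underscores).

```yaml
# Sketch only. Assumes the secret was created beforehand on a manager node, e.g.:
#   printf '%s' "$GRAFANA_PASSWORD" | docker secret create grafana_admin_password -
secrets:
  grafana_admin_password:
    external: true

services:
  grafana:                      # service name is an assumption
    secrets:
      - grafana_admin_password
    environment:
      # Replaces the plain-text GF_SECURITY_ADMIN_PASSWORD entry
      - GF_SECURITY_ADMIN_PASSWORD__FILE=/run/secrets/grafana_admin_password
```

Grafana applies the admin password only when it first initializes its database, so an existing installation may still need the password changed through the UI or CLI.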
2. Missing Health Checks
Files Affected: All stack files
Problem: No services have health checks configured, meaning:
- Swarm can't detect unhealthy containers
- Hung or deadlocked containers are never replaced, because Swarm only reacts when the process exits
- Load balancers may route to failing instances
Fix: Add health checks to critical services:
services:
  paperless:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s
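The same pattern extends to the backing services. This is a minimal sketch, assuming paperless-db runs the official PostgreSQL image with a database user named paperless and paperless-redis runs the official Redis image. Neither assumption is confirmed by the stack files quoted in this review.

```yaml
services:
  paperless-db:
    healthcheck:
      # pg_isready ships with the official postgres image; the user name is assumed
      test: ["CMD-SHELL", "pg_isready -U paperless"]
      interval: 30s
      timeout: 5s
      retries: 5
  paperless-redis:
    healthcheck:
      # redis-cli ships with the official redis image
      test: ["CMD", "redis-cli", "ping"]
      interval: 30s
      timeout: 5s
      retries: 5
```

A health check only works if the probe binary exists inside the image, so check that curl, pg_isready, or redis-cli is present before relying on it.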
3. Incorrect node-exporter Command
File: monitoring-stack.yml:111-114
Problem:
command:
  - '--config.file=/etc/prometheus/prometheus.yml'  # Wrong! This is for Prometheus
  - '--storage.tsdb.path=/prometheus'  # Wrong!
Fix:
command:
  - '--path.procfs=/host/proc'
  - '--path.rootfs=/rootfs'
  - '--path.sysfs=/host/sys'
  - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
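The path flags in the fix only work if the host directories are mounted at those locations inside the container. A sketch of the matching volumes section, assuming the service is named node-exporter and that monitoring-stack.yml does not already mount these paths elsewhere:

```yaml
services:
  node-exporter:
    volumes:
      - /proc:/host/proc:ro    # matches --path.procfs=/host/proc
      - /sys:/host/sys:ro      # matches --path.sysfs=/host/sys
      - /:/rootfs:ro           # matches --path.rootfs=/rootfs
```

If the existing file mounts the host root at a different path, adjust either the mount target or the flag so the two agree.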
⚠️ High-Priority Warnings
4. Missing Networks on Database Services
File: full-stack-complete.yml
Problem: paperless-db (line 70) has no networks entry, so Swarm attaches it only to the stack's default network, while Paperless expects to reach it on homelab-backend.
Fix:
paperless-db:
  networks:
    - homelab-backend  # Add this
5. Resource Limits Too High for Pi 4
File: full-stack-complete.yml
Problem: Services with node.labels.leader == true (Pi 4) have resource limits that may be too high:
- Paperless: 2GB memory (Pi 4 has 8GB total)
- Stirling-PDF: 2GB memory
- SearXNG: 2GB memory
- Combined: 6GB+ on one node
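Fix: Reduce limits or spread services across nodes. Swarm placement constraints only support == and != matches on node attributes and labels, so a constraint cannot test free memory. Use a memory reservation instead, which the scheduler counts against each node's capacity:
deploy:
  resources:
    limits:
      memory: 1G  # example value, tune per service
    reservations:
      memory: 512M  # example value
  placement:
    constraints:
      - node.labels.leader == true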
6. Duplicate Portainer Definitions
Files: portainer-stack.yml vs tools-stack.yml
Problem: Portainer is defined in both files with different configurations:
- portainer-stack.yml: Uses agent mode with global agents
- tools-stack.yml: Uses socket mode (simpler but less scalable)
Fix: Pick one approach and remove the duplicate.
7. Missing Traefik Network Declaration
File: monitoring-stack.yml:38-44
Problem: Prometheus has Traefik labels but isn't on the traefik-public network.
Fix:
prometheus:
  networks:
    - monitoring
    - traefik-public  # Add this
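A service can only join a network that the stack file also declares at the top level. The fragment here sketches both halves, assuming traefik-public is an overlay network created outside this stack (for example by the networking stack) and monitoring is defined by this stack:

```yaml
networks:
  monitoring: {}          # assumed to be created by this stack
  traefik-public:
    external: true        # assumed to exist already; not created by this stack

services:
  prometheus:
    networks:
      - monitoring
      - traefik-public
```

When a service sits on more than one network, Traefik may also need to be told which one to route through. The label for that differs between Traefik versions, so check the documentation for the version in use.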
🟡 Medium-Priority Improvements
8. Missing Restart Policies
Files Affected: Most services
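Problem: Only Portainer sets an explicit restart policy. The other services fall back to Swarm's default (condition: any, with no cap on attempts), which restarts tasks regardless of exit code and can loop indefinitely on a crashing service.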
Fix: Add to all services:
deploy:
  restart_policy:
    condition: on-failure
    delay: 5s
    max_attempts: 3
9. Watchtower Interval Too Frequent
File: full-stack-complete.yml:191
Problem: --interval 300 = check every 5 minutes (too frequent)
Fix: Change to hourly or daily:
command: --cleanup --interval 86400 # Daily
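An alternative to a fixed interval is a cron-style schedule, which pins the check to a quiet hour. This is a sketch, assuming the service is named watchtower. Watchtower's --schedule flag takes a six-field cron expression with seconds first and cannot be combined with --interval.

```yaml
services:
  watchtower:
    # Run once a day at 04:00 (fields: sec min hour day-of-month month day-of-week)
    command: --cleanup --schedule "0 0 4 * * *"
```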
10. Missing Logging Configuration
Files Affected: All
Problem: No log driver or limits configured. Logs can fill disk.
Fix:
logging:  # service-level key, a sibling of deploy (not nested under it)
  driver: "json-file"
  options:
    max-size: "10m"
    max-file: "3"
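Because logging is set per service, repeating it in every definition gets noisy. One way to avoid that is a YAML anchor on an extension field. The sketch here assumes a Compose file format of 3.4 or later; the name x-logging and the two services shown are illustrative.

```yaml
# Extension fields (top-level keys starting with x-) are ignored by Docker
# and exist only to hold reusable YAML.
x-logging: &default-logging
  driver: "json-file"
  options:
    max-size: "10m"
    max-file: "3"

services:
  paperless:
    logging: *default-logging
  paperless-db:
    logging: *default-logging
```

With these values each container keeps at most three 10 MB log files, so roughly 30 MB per container.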
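11. Obsolete version Field
Files Affected: All
Problem: The top-level version field is obsolete under the Compose Specification, and Docker Compose v2 ignores it. This applies to every 3.x value, so 3.9 is no worse than 3.8.
Fix: Remove the version line if your docker stack deploy accepts files without it. Otherwise leave version: '3.9' in place, since downgrading to 3.8 gains nothing.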
🟢 Best Practice Recommendations
12. Add Update Configs
Benefit: Zero-downtime deployments
deploy:
  update_config:
    parallelism: 1
    delay: 10s
    failure_action: rollback
    order: start-first
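Since failure_action: rollback is requested, it can help to state how the rollback itself should run. A sketch with illustrative values follows; rollback_config needs Compose file format 3.7 or later.

```yaml
deploy:
  update_config:
    parallelism: 1
    delay: 10s
    failure_action: rollback
    monitor: 30s            # how long to watch each updated task for failure
    order: start-first
  rollback_config:
    parallelism: 1
    order: stop-first       # stop the failed task before restoring the old one
```

Note that start-first briefly runs the old and new task side by side. That suits stateless web services, but a database that must be the only writer to its volume is usually safer with order: stop-first.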
13. Use Specific Image Tags
Files Affected: Services using :latest
Current:
image: portainer/portainer-ce:latest
image: searxng/searxng:latest
Better:
image: portainer/portainer-ce:2.33.4
image: searxng/searxng:2024.11.20
Good tags already used: full-stack-complete.yml has several pinned versions ✓
14. Add Labels for Documentation
Benefit: Self-documenting infrastructure
deploy:
  labels:
    - "com.homelab.description=Paperless document management"
    - "com.homelab.maintainer=@sj98"
    - "com.homelab.version=2.19.3"
15. Separate Configs from Stacks
Problem: Mixing config and stack definitions
Current: Prometheus config is external (good!)
Recommendation: Do the same for the Traefik and Alertmanager configs
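One way to externalize a config in Swarm is a Docker config object. The sketch here uses Alertmanager as the example; the service name, the config name, and the repository path are hypothetical, while the target is the default config location of the prom/alertmanager image.

```yaml
configs:
  alertmanager_config:
    file: ./alertmanager/alertmanager.yml   # hypothetical path in the repo

services:
  alertmanager:
    configs:
      - source: alertmanager_config
        target: /etc/alertmanager/alertmanager.yml
```

Docker configs are immutable once created. To roll out a changed file, give the config a new name (for example a version suffix) and update the service to reference it.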
16. Add Dependency Ordering
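Current: Some services declare depends_on
Problem: docker stack deploy ignores depends_on, so in Swarm it does not control start order. Keeping it documents the dependency, but the real safeguard is a health check on each dependency plus a restart policy on the dependent service, so Paperless retries until paperless-redis and paperless-db are up.
paperless:
  depends_on:  # documentation only under Swarm
    - paperless-redis
    - paperless-db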
📋 Detailed File-by-File Analysis
full-stack-complete.yml
Good:
- ✅ Proper network segmentation (traefik-public vs homelab-backend)
- ✅ Resource limits defined
- ✅ Node placement constraints
- ✅ Specific image tags for most services
Issues:
- 🔴 Hardcoded passwords (lines 96, 98)
- 🔴 No health checks
- ⚠️ paperless-db missing network
- ⚠️ Resource limits may be too high for Pi 4
Score: 6/10
monitoring-stack.yml
Good:
- ✅ Proper monitoring network
- ✅ External configs for Prometheus
- ✅ Resource limits
Issues:
- 🔴 Hardcoded Grafana password (line 52)
- 🔴 node-exporter has wrong command (lines 111-114)
- ⚠️ Prometheus missing traefik-public network
- ⚠️ No health checks
Score: 5/10
networking-stack.yml
Good:
- ✅ Uses secrets for DuckDNS token
- ✅ External volume for Let's Encrypt
- ✅ Proper network attachment
Issues:
- ⚠️ Traefik single replica (should be 2+ for HA)
- ⚠️ No health check
- ⚠️ whoami resource limits too strict
Score: 7/10
portainer-stack.yml
Good:
- ✅ Has restart policies!
- ✅ Supports both Windows and Linux agents
- ✅ Proper network setup
Issues:
- ⚠️ Duplicate of tools-stack.yml Portainer
- ⚠️ No health check
Score: 7/10
tools-stack.yml
Good:
- ✅ All tools on manager node (correct)
- ✅ Resource limits defined
Issues:
- ⚠️ Duplicate Portainer definition
- ⚠️ lazydocker needs TTY, won't work in Swarm
- ⚠️ No restart policies
Score: 6/10
node-exporter-stack.yml
Content (created by us):
version: '3.8'
services:
  node-exporter:
    image: prom/node-exporter:latest
    command:
      - '--path.rootfs=/host'
    volumes:
      - '/:/host:ro,rslave'
    deploy:
      mode: global
Good:
- ✅ Global mode (runs on all nodes)
- ✅ Read-only host mount
Issues:
- ⚠️ Uses :latest tag
- ⚠️ No resource limits
- ⚠️ No health check
Score: 6/10
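A sketch of how the first two issues could be addressed in this file. The image tag and the limit values are illustrative assumptions, not measured figures; check the current node-exporter release and observe real usage before settling on them.

```yaml
services:
  node-exporter:
    image: prom/node-exporter:v1.8.2   # example pin; verify against current releases
    command:
      - '--path.rootfs=/host'
    volumes:
      - '/:/host:ro,rslave'
    deploy:
      mode: global
      resources:
        limits:
          memory: 64M                  # illustrative value
          cpus: '0.25'                 # illustrative value
```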
🛠️ Recommended Action Plan
Phase 1: Critical Security (Do Immediately)
- ✅ Create Docker secrets for all passwords
- ✅ Update stack files to use secrets
- ✅ Fix node-exporter command
- ✅ Add missing network to paperless-db
Phase 2: Stability (Do This Week)
- ⏭️ Add health checks to all services
- ⏭️ Add restart policies
- ⏭️ Fix Prometheus network
- ⏭️ Remove duplicate Portainer
Phase 3: Optimization (Do This Month)
- ⏭️ Update all :latest tags to specific versions
- ⏭️ Add update configs
- ⏭️ Configure logging limits
- ⏭️ Review resource limits
Phase 4: Best Practices (Ongoing)
- ⏭️ Add documentation labels
- ⏭️ Separate configs from stacks
- ⏭️ Set up monitoring alerts for service health
🎯 Summary Scores
| Stack File | Security | Stability | Best Practices | Overall |
|---|---|---|---|---|
| full-stack-complete.yml | 3/10 | 6/10 | 7/10 | 6/10 |
| monitoring-stack.yml | 4/10 | 5/10 | 6/10 | 5/10 |
| networking-stack.yml | 8/10 | 6/10 | 7/10 | 7/10 |
| portainer-stack.yml | 7/10 | 7/10 | 7/10 | 7/10 |
| tools-stack.yml | 7/10 | 5/10 | 6/10 | 6/10 |
| node-exporter-stack.yml | 7/10 | 5/10 | 6/10 | 6/10 |
| Average | 6.0/10 | 5.7/10 | 6.5/10 | 6.2/10 |
📝 Next Steps
Would you like me to:
- Create fixed versions of the stack files with all critical issues resolved?
- Generate Docker secrets creation script for all passwords?
- Add health checks to all services?
- Consolidate duplicate configs (e.g., remove duplicate Portainer)?
- Create a migration guide for applying these changes safely?
Let me know which improvements you'd like me to implement!