Docker Swarm Stack Files - Review & Recommendations

Overview

Reviewed 9 Docker Swarm stack files totaling ~24KB of configuration. Found critical security issues, configuration inconsistencies, and optimization opportunities.


🔴 Critical Issues

1. Hardcoded Secrets in Plain Text

Files Affected: full-stack-complete.yml, monitoring-stack.yml

Problems:

# Line 96: Paperless DB password in plain text
- PAPERLESS_DBPASS=paperless

# Line 98: Hardcoded secret key
- PAPERLESS_SECRET_KEY=change-me-please-to-something-secure

# Line 52: Grafana admin password exposed
- GF_SECURITY_ADMIN_PASSWORD=change-me-please

Risk: Anyone with read access to the repo can see these credentials. Values passed as environment variables also appear in plain text in docker service inspect and docker inspect output.

Fix: Use Docker secrets:

secrets:
  paperless_db_password:
    external: true
  paperless_secret_key:
    external: true
  grafana_admin_password:
    external: true

services:
  paperless:
    secrets:
      - paperless_db_password
      - paperless_secret_key
    environment:
      - PAPERLESS_DBPASS_FILE=/run/secrets/paperless_db_password
      - PAPERLESS_SECRET_KEY_FILE=/run/secrets/paperless_secret_key
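
The secrets block above also declares grafana_admin_password but does not show it in use. A minimal sketch for Grafana follows, assuming the service is named grafana in monitoring-stack.yml. Grafana reads a setting from a file when the variable name ends in __FILE (two underscores). The shell variable GRAFANA_PASSWORD in the comment is a placeholder, not something taken from the reviewed files.

```yaml
# Sketch only: the service name "grafana" is an assumption about monitoring-stack.yml.
# The secret must exist before the stack is deployed, for example:
#   printf '%s' "$GRAFANA_PASSWORD" | docker secret create grafana_admin_password -
secrets:
  grafana_admin_password:
    external: true

services:
  grafana:
    secrets:
      - grafana_admin_password
    environment:
      # Grafana's file-based variant uses a double underscore before FILE
      - GF_SECURITY_ADMIN_PASSWORD__FILE=/run/secrets/grafana_admin_password
```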

2. Missing Health Checks

Files Affected: All stack files

Problem: No services have health checks configured, meaning:

  • Swarm can't detect unhealthy containers
  • Auto-restart won't work properly
  • Load balancers may route to failing instances

Fix: Add health checks to critical services:

services:
  paperless:
    healthcheck:
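      # Probes the web root; switch to a dedicated health endpoint only if the image documents one
      test: ["CMD", "curl", "-fs", "-S", "--max-time", "2", "http://localhost:8000"]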
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s
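
The same pattern extends to other services. A hedged sketch for two of them follows, assuming the stock prom/prometheus image (which ships BusyBox wget) and a Traefik instance started with its ping endpoint enabled. The service names and ports are assumptions about these stacks.

```yaml
services:
  prometheus:
    healthcheck:
      # Prometheus serves a liveness endpoint at /-/healthy on its web port
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:9090/-/healthy"]
      interval: 30s
      timeout: 5s
      retries: 3

  traefik:
    command:
      - --ping=true   # in a real file, append this to the existing command list
    healthcheck:
      # "traefik healthcheck --ping" queries the ping endpoint and exits non-zero on failure
      test: ["CMD", "traefik", "healthcheck", "--ping"]
      interval: 30s
      timeout: 5s
      retries: 3
```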

3. Incorrect node-exporter Command

File: monitoring-stack.yml:111-114

Problem:

command:
  - '--config.file=/etc/prometheus/prometheus.yml'  # Wrong! This is for Prometheus
  - '--storage.tsdb.path=/prometheus'              # Wrong!

Fix:

command:
  - '--path.procfs=/host/proc'
  - '--path.rootfs=/rootfs'
  - '--path.sysfs=/host/sys'
  - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
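
These flags only take effect if the host paths are mounted where the flags point. A sketch of the matching mounts, assuming the same container paths as in the fix above:

```yaml
services:
  node-exporter:
    volumes:
      - /proc:/host/proc:ro   # matches --path.procfs=/host/proc
      - /sys:/host/sys:ro     # matches --path.sysfs=/host/sys
      - /:/rootfs:ro          # matches --path.rootfs=/rootfs
```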

⚠️ High-Priority Warnings

4. Missing Networks on Database Services

File: full-stack-complete.yml

Problem: paperless-db (line 70) has no networks entry, so it attaches only to the stack's default network. If Paperless is attached only to homelab-backend, it cannot resolve or reach the database.

Fix:

paperless-db:
  networks:
    - homelab-backend  # Add this

5. Resource Limits Too High for Pi 4

File: full-stack-complete.yml

Problem: Services with node.labels.leader == true (Pi 4) have resource limits that may be too high:

  • Paperless: 2GB memory (Pi 4 has 8GB total)
  • Stirling-PDF: 2GB memory
  • SearXNG: 2GB memory
  • Combined: 6GB+ on one node

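Fix: Reduce limits or spread services across nodes. Swarm placement constraints only compare node attributes and labels with == or !=, so they cannot test available memory. Memory reservations are what the scheduler checks before placing a task:

deploy:
  placement:
    constraints:
      - node.labels.leader == true
  resources:
    limits:
      memory: 1G      # example value; size per service
    reservations:
      memory: 512M    # task is scheduled only where this much is still unreserved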

6. Duplicate Portainer Definitions

Files: portainer-stack.yml vs tools-stack.yml

Problem: Portainer is defined in both files with different configurations:

  • portainer-stack.yml: Uses agent mode with global agents
  • tools-stack.yml: Uses socket mode (simpler but less scalable)

Fix: Pick one approach and remove the duplicate.

7. Missing Traefik Network Declaration

File: monitoring-stack.yml:38-44

Problem: Prometheus has Traefik labels but isn't on the traefik-public network.

Fix:

prometheus:
  networks:
    - monitoring
    - traefik-public  # Add this
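
Attaching the service is not enough on its own, because the stack file also has to declare the network at the top level. A sketch, assuming traefik-public is an overlay network created outside this stack (for example by networking-stack.yml):

```yaml
networks:
  monitoring:
    driver: overlay
  traefik-public:
    external: true   # created elsewhere; this stack only attaches to it
```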

🟡 Medium-Priority Improvements

8. Missing Restart Policies

Files Affected: Most services

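Problem: Only Portainer sets an explicit restart policy. The other services fall back to Swarm's default (condition: any, no attempt limit), which restarts tasks indefinitely, even after a clean exit, so a crash-looping service is never capped.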

Fix: Add to all services:

deploy:
  restart_policy:
    condition: on-failure
    delay: 5s
    max_attempts: 3

9. Watchtower Interval Too Frequent

File: full-stack-complete.yml:191

Problem: --interval 300 = check every 5 minutes (too frequent)

Fix: Change to hourly or daily:

command: --cleanup --interval 86400  # Daily

10. Missing Logging Configuration

Files Affected: All

Problem: No log driver or limits configured. Logs can fill disk.

Fix: Set logging at the service level, as a sibling of deploy (it is not a valid key under deploy):

logging:
  driver: "json-file"
  options:
    max-size: "10m"
    max-file: "3"
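
Because every service needs the same settings, a YAML anchor on an extension field avoids repeating them. A sketch, assuming file format 3.4 or later (where x- extension fields are allowed). The anchor name default-logging and the service names are illustrative.

```yaml
# Shared logging settings, defined once at the top of the stack file
x-logging: &default-logging
  driver: "json-file"
  options:
    max-size: "10m"
    max-file: "3"

services:
  paperless:
    logging: *default-logging   # service-level key, not nested under deploy
  prometheus:
    logging: *default-logging
```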

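11. Version Field Is Obsolete

Files Affected: All

Problem: The top-level version field is obsolete under the Compose Specification, and Docker Compose v2 ignores it. Version 3.9 is no more deprecated than 3.8, so downgrading gains nothing.

Fix: Remove the version line if the installed docker stack deploy accepts files without it. Older Docker CLI releases require a 3.x value, in which case keeping version: '3.9' is fine.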


🟢 Best Practice Recommendations

12. Add Update Configs

Benefit: Zero-downtime deployments

deploy:
  update_config:
    parallelism: 1
    delay: 10s
    failure_action: rollback
    order: start-first
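
start-first briefly runs the old and new task side by side, which suits stateless web services but not a database that owns a data volume. A sketch of the alternative for such a service, assuming paperless-db is a single-replica database:

```yaml
services:
  paperless-db:
    deploy:
      update_config:
        parallelism: 1
        failure_action: rollback
        order: stop-first   # stop the old task before starting the new one
```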

13. Use Specific Image Tags

Files Affected: Services using :latest

Current:

image: portainer/portainer-ce:latest
image: searxng/searxng:latest

Better (example tags; confirm each one exists in the registry before pinning):

image: portainer/portainer-ce:2.33.4
image: searxng/searxng:2024.11.20

Good tags already used: full-stack-complete.yml has several pinned versions ✓

14. Add Labels for Documentation

Benefit: Self-documenting infrastructure

deploy:
  labels:
    - "com.homelab.description=Paperless document management"
    - "com.homelab.maintainer=@sj98"
    - "com.homelab.version=2.19.3"

15. Separate Configs from Stacks

Problem: Mixing config and stack definitions

Current: Prometheus config is external (good!)

Recommendation: Do the same for the Traefik and Alertmanager configs.

16. Add Dependency Ordering

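Current: Some services declare depends_on.

Problem: docker stack deploy ignores depends_on, so in Swarm it only documents intent. Startup ordering has to come from health checks, restart policies, and connection retries in the application.

paperless:
  depends_on:   # ignored by docker stack deploy; kept as documentation only
    - paperless-redis
    - paperless-db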

📋 Detailed File-by-File Analysis

full-stack-complete.yml

Good:

  • Proper network segmentation (traefik-public vs homelab-backend)
  • Resource limits defined
  • Node placement constraints
  • Specific image tags for most services

Issues:

  • 🔴 Hardcoded passwords (lines 96, 98)
  • 🔴 No health checks
  • ⚠️ paperless-db missing network
  • ⚠️ Resource limits may be too high for Pi 4

Score: 6/10


monitoring-stack.yml

Good:

  • Proper monitoring network
  • External configs for Prometheus
  • Resource limits

Issues:

  • 🔴 Hardcoded Grafana password (line 52)
  • 🔴 node-exporter has wrong command (lines 111-114)
  • ⚠️ Prometheus missing traefik-public network
  • ⚠️ No health checks

Score: 5/10


networking-stack.yml

Good:

  • Uses secrets for DuckDNS token
  • External volume for Let's Encrypt
  • Proper network attachment

Issues:

  • ⚠️ Traefik single replica (should be 2+ for HA)
  • ⚠️ No health check
  • ⚠️ whoami resource limits too strict

Score: 7/10


portainer-stack.yml

Good:

  • Has restart policies!
  • Supports both Windows and Linux agents
  • Proper network setup

Issues:

  • ⚠️ Duplicate of tools-stack.yml Portainer
  • ⚠️ No health check

Score: 7/10


tools-stack.yml

Good:

  • All tools on manager node (correct)
  • Resource limits defined

Issues:

  • ⚠️ Duplicate Portainer definition
  • ⚠️ lazydocker needs TTY, won't work in Swarm
  • ⚠️ No restart policies

Score: 6/10


node-exporter-stack.yml

Content (created by us):

version: '3.8'
services:
  node-exporter:
    image: prom/node-exporter:latest
    command:
      - '--path.rootfs=/host'
    volumes:
      - '/:/host:ro,rslave'
    deploy:
      mode: global

Good:

  • Global mode (runs on all nodes)
  • Read-only host mount

Issues:

  • ⚠️ Uses :latest tag
  • ⚠️ No resource limits
  • ⚠️ No health check

Score: 6/10


Phase 1: Critical Security (Do Immediately)

  1. Create Docker secrets for all passwords
  2. Update stack files to use secrets
  3. Fix node-exporter command
  4. Add missing network to paperless-db

Phase 2: Stability (Do This Week)

  1. ⏭️ Add health checks to all services
  2. ⏭️ Add restart policies
  3. ⏭️ Fix Prometheus network
  4. ⏭️ Remove duplicate Portainer

Phase 3: Optimization (Do This Month)

  1. ⏭️ Update all :latest tags to specific versions
  2. ⏭️ Add update configs
  3. ⏭️ Configure logging limits
  4. ⏭️ Review resource limits

Phase 4: Best Practices (Ongoing)

  1. ⏭️ Add documentation labels
  2. ⏭️ Separate configs from stacks
  3. ⏭️ Set up monitoring alerts for service health
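
For the alerting item, a minimal Prometheus rule file sketch follows, assuming the Prometheus instance from monitoring-stack.yml scrapes the services and loads this file through rule_files. The group name, alert name, and 5m threshold are illustrative choices.

```yaml
groups:
  - name: service-health
    rules:
      - alert: TargetDown
        expr: up == 0        # "up" is 0 when a scrape target fails
        for: 5m              # must stay down this long before firing
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.job }} target {{ $labels.instance }} is down"
```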

🎯 Summary Scores

| Stack File | Security | Stability | Best Practices | Overall |
|---|---|---|---|---|
| full-stack-complete.yml | 3/10 | 6/10 | 7/10 | 6/10 |
| monitoring-stack.yml | 4/10 | 5/10 | 6/10 | 5/10 |
| networking-stack.yml | 8/10 | 6/10 | 7/10 | 7/10 |
| portainer-stack.yml | 7/10 | 7/10 | 7/10 | 7/10 |
| tools-stack.yml | 7/10 | 5/10 | 6/10 | 6/10 |
| node-exporter-stack.yml | 7/10 | 5/10 | 6/10 | 6/10 |
| Average | 6.0/10 | 5.7/10 | 6.5/10 | 6.2/10 |

📝 Next Steps

Would you like me to:

  1. Create fixed versions of the stack files with all critical issues resolved?
  2. Generate Docker secrets creation script for all passwords?
  3. Add health checks to all services?
  4. Consolidate duplicate configs (e.g., remove duplicate Portainer)?
  5. Create a migration guide for applying these changes safely?

Let me know which improvements you'd like me to implement!