# Docker Swarm Stack Files - Review & Recommendations

## Overview

Reviewed 9 Docker Swarm stack files totaling ~24KB of configuration. Found **critical security issues**, configuration inconsistencies, and optimization opportunities.

---

## 🔴 Critical Issues

### 1. **Hardcoded Secrets in Plain Text**
**Files Affected**: [`full-stack-complete.yml`](file:///workspace/homelab/services/swarm/stacks/full-stack-complete.yml), [`monitoring-stack.yml`](file:///workspace/homelab/services/swarm/stacks/monitoring-stack.yml)

**Problems**:
```yaml
# Line 96: Paperless DB password in plain text
- PAPERLESS_DBPASS=paperless

# Line 98: Hardcoded secret key
- PAPERLESS_SECRET_KEY=change-me-please-to-something-secure

# Line 52: Grafana admin password exposed
- GF_SECURITY_ADMIN_PASSWORD=change-me-please
```

**Risk**: Anyone with read access to the repo can see these credentials. Because they are passed as environment variables, they also show up in `docker service inspect` output and in anything that dumps the container environment. Rotate the current values when migrating, since they are already in the repo history.

**Fix**: Use Docker secrets and point each application at the mounted secret file:
```yaml
secrets:
  paperless_db_password:
    external: true
  paperless_secret_key:
    external: true
  grafana_admin_password:
    external: true

services:
  paperless:
    secrets:
      - paperless_db_password
      - paperless_secret_key
    environment:
      - PAPERLESS_DBPASS_FILE=/run/secrets/paperless_db_password
      - PAPERLESS_SECRET_KEY_FILE=/run/secrets/paperless_secret_key

  grafana:  # Service name assumed; lives in monitoring-stack.yml
    secrets:
      - grafana_admin_password
    environment:
      # Grafana reads file-based values through the double-underscore __FILE suffix
      - GF_SECURITY_ADMIN_PASSWORD__FILE=/run/secrets/grafana_admin_password
```

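
The `external: true` form above expects the secrets to exist in the swarm before the stack is deployed. As an alternative, a stack file can create them from local files at deploy time. A minimal sketch, assuming a hypothetical `./secrets/` directory next to the stack file that is excluded from version control:

```yaml
# Sketch only: the ./secrets/*.txt paths are hypothetical and must never be committed.
secrets:
  paperless_db_password:
    file: ./secrets/paperless_db_password.txt
  paperless_secret_key:
    file: ./secrets/paperless_secret_key.txt
  grafana_admin_password:
    file: ./secrets/grafana_admin_password.txt
```

Either way, Swarm secrets are immutable once created, so rotating a value means creating a secret under a new name and updating the service to reference it.
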
### 2. **Missing Health Checks**
**Files Affected**: All stack files

**Problem**: No services have health checks configured, meaning:
- Swarm can't tell a running-but-broken container from a healthy one
- Tasks are only replaced when the container process exits, not when the application hangs
- Load balancers (the Swarm routing mesh, Traefik) may route to failing instances

**Fix**: Add health checks to critical services:

```yaml
services:
  paperless:
    healthcheck:
      # Assumes curl exists in the image; verify the port and path for each service
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s
```

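
Not every service speaks HTTP or ships `curl`, so the probe command has to fit the image. The sketch below shows two other common shapes; it assumes the services are named `prometheus` and `paperless-redis`, that Prometheus listens on its default port 9090, and that `wget` and `redis-cli` are available inside the respective images:

```yaml
services:
  prometheus:
    healthcheck:
      # Prometheus exposes a liveness endpoint at /-/healthy
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:9090/-/healthy"]
      interval: 30s
      timeout: 5s
      retries: 3

  paperless-redis:
    healthcheck:
      # redis-cli exits non-zero when the server cannot be reached
      test: ["CMD", "redis-cli", "ping"]
      interval: 30s
      timeout: 5s
      retries: 3
```
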
### 3. **Incorrect node-exporter Command**
**File**: [`monitoring-stack.yml:111-114`](file:///workspace/homelab/services/swarm/stacks/monitoring-stack.yml#L111-L114)

**Problem**: The node-exporter service is started with Prometheus server flags, which node-exporter does not accept:
```yaml
command:
  - '--config.file=/etc/prometheus/prometheus.yml'  # Wrong! This is for Prometheus
  - '--storage.tsdb.path=/prometheus'  # Wrong!
```

**Fix**:
```yaml
command:
  - '--path.procfs=/host/proc'
  - '--path.rootfs=/rootfs'
  - '--path.sysfs=/host/sys'
  - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
```

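
The `--path.*` flags only help if the host directories are mounted at those paths inside the container. The review does not show the existing `volumes` section, so the following is a sketch of mounts that would match the flags, not a copy of the current file:

```yaml
services:
  node-exporter:
    command:
      - '--path.procfs=/host/proc'
      - '--path.rootfs=/rootfs'
      - '--path.sysfs=/host/sys'
    volumes:
      # Each mount target corresponds to one --path.* flag above
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    deploy:
      mode: global  # One exporter per node
```
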

---

## ⚠️ High-Priority Warnings

### 4. **Missing Networks on Database Services**
**File**: [`full-stack-complete.yml`](file:///workspace/homelab/services/swarm/stacks/full-stack-complete.yml)

**Problem**: `paperless-db` (line 70) has no `networks` entry, so it is attached only to the stack's default network. Paperless tries to connect to it but does not share a network with it.

**Fix**:
```yaml
paperless-db:
  networks:
    - homelab-backend  # Add this
```

### 5. **Resource Limits Too High for Pi 4**
**File**: [`full-stack-complete.yml`](file:///workspace/homelab/services/swarm/stacks/full-stack-complete.yml)

**Problem**: Services pinned with `node.labels.leader == true` (the Pi 4) have memory limits that may be too high for a single node:
- Paperless: 2GB memory (Pi 4 has 8GB total)
- Stirling-PDF: 2GB memory
- SearXNG: 2GB memory
- Combined: 6GB+ on one node

**Fix**: Reduce limits or spread services across nodes. Swarm placement constraints match only node attributes and labels and cannot test free memory, so use memory reservations to let the scheduler account for it:
```yaml
deploy:
  placement:
    constraints:
      - node.labels.leader == true
  resources:
    limits:
      memory: 1G  # Illustrative value; size per service
    reservations:
      memory: 512M  # Task is only scheduled where this much memory is unreserved
```

### 6. **Duplicate Portainer Definitions**
**Files**: [`portainer-stack.yml`](file:///workspace/homelab/services/swarm/stacks/portainer-stack.yml) vs [`tools-stack.yml`](file:///workspace/homelab/services/swarm/stacks/tools-stack.yml)

**Problem**: Portainer is defined in both files with different configurations:
- `portainer-stack.yml`: Uses agent mode with global agents
- `tools-stack.yml`: Uses socket mode (simpler but less scalable)

**Fix**: Pick one approach and remove the duplicate.

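
If agent mode is the one that stays, the remaining stack could look roughly like the sketch below. It follows the general shape of Portainer's agent-based Swarm deployment; the network and volume names are assumptions, and the `portainer/agent` tag is assumed to track the server version rather than taken from the reviewed files:

```yaml
services:
  agent:
    image: portainer/agent:2.33.4  # Assumed tag; keep it in step with the server image
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /var/lib/docker/volumes:/var/lib/docker/volumes
    networks:
      - agent_network
    deploy:
      mode: global  # One agent on every node
      placement:
        constraints:
          - node.platform.os == linux

  portainer:
    image: portainer/portainer-ce:2.33.4
    command: -H tcp://tasks.agent:9001 --tlsskipverify
    volumes:
      - portainer_data:/data
    networks:
      - agent_network
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.role == manager

networks:
  agent_network:
    driver: overlay

volumes:
  portainer_data:
```

With this in place, the Portainer service in `tools-stack.yml` can be deleted outright.
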
### 7. **Missing Traefik Network Declaration**
**File**: [`monitoring-stack.yml:38-44`](file:///workspace/homelab/services/swarm/stacks/monitoring-stack.yml#L38-L44)

**Problem**: Prometheus has Traefik labels but isn't on the `traefik-public` network, so Traefik has no network path to the container.

**Fix**:
```yaml
prometheus:
  networks:
    - monitoring
    - traefik-public  # Add this
```

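
A service can only join a network that the stack file also declares at the top level. A minimal sketch of that declaration, assuming `traefik-public` is an overlay network created outside this stack (for example by the networking stack or by hand):

```yaml
networks:
  monitoring:
    driver: overlay
  traefik-public:
    external: true  # Assumption: the network already exists in the swarm
```
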

---

## 🟡 Medium-Priority Improvements

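### 8. **Missing Restart Policies**
**Files Affected**: Most services

**Problem**: Only Portainer has explicit restart policies. Other services fall back to Swarm's defaults (`condition: any`, no cap on attempts), so a crash-looping service restarts indefinitely with no tuned delay or limit.

**Fix**: Add to all services. Note that with `max_attempts: 3` Swarm stops retrying after three failed restarts, so pick the cap deliberately:
```yaml
deploy:
  restart_policy:
    condition: on-failure
    delay: 5s
    max_attempts: 3
    window: 120s  # How long to wait before deciding a restart succeeded
```
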
### 9. **Watchtower Interval Too Frequent**
**File**: [`full-stack-complete.yml:191`](file:///workspace/homelab/services/swarm/stacks/full-stack-complete.yml#L191)

**Problem**: `--interval 300` checks for new images every 5 minutes, which adds needless registry traffic for a homelab.

**Fix**: Change to hourly (`3600`) or daily (`86400`):
```yaml
command: --cleanup --interval 86400  # Daily
```

### 10. **Missing Logging Configuration**
**Files Affected**: All

**Problem**: No log driver or limits configured. Without rotation settings, `json-file` logs grow without bound and can fill the disk.

**Fix**: Set `logging` at the service level (it is a sibling of `deploy`, not a key under it):
```yaml
logging:
  driver: "json-file"
  options:
    max-size: "10m"
    max-file: "3"
```

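
Repeating the same logging block in every service is error-prone. One way to avoid that is a YAML anchor on an extension field, which stack files support from format 3.4 onward. A sketch, where `x-logging` is an arbitrary name and `stirling-pdf` is an assumed service name:

```yaml
# Define the logging settings once under an x- extension field
x-logging: &default-logging
  driver: "json-file"
  options:
    max-size: "10m"
    max-file: "3"

services:
  paperless:
    logging: *default-logging
  stirling-pdf:
    logging: *default-logging
```
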
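### 11. **`version` Field Is Obsolete**
**Files Affected**: All

**Problem**: The stack files declare `version: '3.9'`. Under the Compose Specification the top-level `version` field is obsolete, and switching to `'3.8'` would not change that.

**Fix**: Remove the `version` line once you have confirmed that the `docker stack deploy` in use accepts files without it. Older Docker CLI releases require a 3.x value; on those, keep the existing line.
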

---

## 🟢 Best Practice Recommendations

### 12. **Add Update Configs**
**Benefit**: Zero-downtime deployments for services that can briefly run two instances, with automatic rollback on failure

```yaml
deploy:
  update_config:
    parallelism: 1
    delay: 10s
    failure_action: rollback
    order: start-first  # Use stop-first for databases and other single-writer services
```

### 13. **Use Specific Image Tags**
**Files Affected**: Services using `:latest`

**Current**:
```yaml
image: portainer/portainer-ce:latest
image: searxng/searxng:latest
```

**Better**:
```yaml
image: portainer/portainer-ce:2.33.4
image: searxng/searxng:2024.11.20
```

**Good tags already used**: `full-stack-complete.yml` has several pinned versions ✓

### 14. **Add Labels for Documentation**
**Benefit**: Self-documenting infrastructure

```yaml
deploy:
  labels:
    - "com.homelab.description=Paperless document management"
    - "com.homelab.maintainer=@sj98"
    - "com.homelab.version=2.19.3"
```

### 15. **Separate Configs from Stacks**
**Problem**: Mixing config and stack definitions

**Current**: Prometheus config is external (good!)
**Recommendation**: Do the same for Traefik, Alertmanager configs

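
Swarm configs are one way to keep such files out of the stack definition while still distributing them to every node. A sketch for Alertmanager, assuming a hypothetical `./configs/alertmanager.yml` next to the stack file and the image's default config location:

```yaml
configs:
  alertmanager_config:
    file: ./configs/alertmanager.yml  # Hypothetical path in the repo

services:
  alertmanager:
    configs:
      - source: alertmanager_config
        target: /etc/alertmanager/alertmanager.yml
```

Like secrets, configs are immutable once created, so a changed file needs a new config name (for example a version suffix) before redeploying.
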
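### 16. **Add Dependency Ordering**
**Current**: Some services use `depends_on` (good!)
**Problem**: Not all services that need it have it. Keep in mind that `docker stack deploy` ignores `depends_on`, so in Swarm it documents intent but does not control start order; health checks and restart policies do the actual work.

```yaml
paperless:
  depends_on:
    - paperless-redis
    - paperless-db
```
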
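
Because start order is not enforced in Swarm, a practical pattern is to let the dependent service fail fast and be restarted until its dependencies are ready. The sketch below assumes `paperless-db` runs a PostgreSQL image (which ships `pg_isready`) and uses a hypothetical `paperless` database user:

```yaml
services:
  paperless-db:
    healthcheck:
      # Assumption: PostgreSQL image; the user name is illustrative
      test: ["CMD-SHELL", "pg_isready -U paperless"]
      interval: 10s
      timeout: 5s
      retries: 5

  paperless:
    deploy:
      restart_policy:
        condition: on-failure  # Retry until the database accepts connections
        delay: 10s
```
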

---

## 📋 Detailed File-by-File Analysis

### [`full-stack-complete.yml`](file:///workspace/homelab/services/swarm/stacks/full-stack-complete.yml)
**Good**:
- ✅ Proper network segmentation (traefik-public vs homelab-backend)
- ✅ Resource limits defined
- ✅ Node placement constraints
- ✅ Specific image tags for most services

**Issues**:
- 🔴 Hardcoded passwords (lines 96, 98)
- 🔴 No health checks
- ⚠️ paperless-db missing network
- ⚠️ Resource limits may be too high for Pi 4

**Score**: 6/10

---

### [`monitoring-stack.yml`](file:///workspace/homelab/services/swarm/stacks/monitoring-stack.yml)
**Good**:
- ✅ Proper monitoring network
- ✅ External configs for Prometheus
- ✅ Resource limits

**Issues**:
- 🔴 Hardcoded Grafana password (line 52)
- 🔴 node-exporter has wrong command (lines 111-114)
- ⚠️ Prometheus missing traefik-public network
- ⚠️ No health checks

**Score**: 5/10

---

### [`networking-stack.yml`](file:///workspace/homelab/services/swarm/stacks/networking-stack.yml)
**Good**:
- ✅ Uses secrets for DuckDNS token
- ✅ External volume for Let's Encrypt
- ✅ Proper network attachment

**Issues**:
- ⚠️ Traefik single replica (should be 2+ for HA; check first that the Let's Encrypt certificate storage can be shared between replicas)
- ⚠️ No health check
- ⚠️ whoami resource limits too strict

**Score**: 7/10

---

### [`portainer-stack.yml`](file:///workspace/homelab/services/swarm/stacks/portainer-stack.yml)
**Good**:
- ✅ Has restart policies!
- ✅ Supports both Windows and Linux agents
- ✅ Proper network setup

**Issues**:
- ⚠️ Duplicate of tools-stack.yml Portainer
- ⚠️ No health check

**Score**: 7/10

---

### [`tools-stack.yml`](file:///workspace/homelab/services/swarm/stacks/tools-stack.yml)
**Good**:
- ✅ All tools on manager node (correct)
- ✅ Resource limits defined

**Issues**:
- ⚠️ Duplicate Portainer definition
- ⚠️ lazydocker needs TTY, won't work in Swarm
- ⚠️ No restart policies

**Score**: 6/10

---

### [`node-exporter-stack.yml`](file:///workspace/homelab/services/swarm/stacks/node-exporter-stack.yml)
**Content** (created by us):
```yaml
version: '3.8'
services:
  node-exporter:
    image: prom/node-exporter:latest
    command:
      - '--path.rootfs=/host'
    volumes:
      - '/:/host:ro,rslave'
    deploy:
      mode: global
```

**Good**:
- ✅ Global mode (runs on all nodes)
- ✅ Read-only host mount

**Issues**:
- ⚠️ Uses `:latest` tag
- ⚠️ No resource limits
- ⚠️ No health check

**Score**: 6/10

---

## 🛠️ Recommended Action Plan

### Phase 1: Critical Security (Do Immediately)
1. ✅ Create Docker secrets for all passwords
2. ✅ Update stack files to use secrets
3. ✅ Fix node-exporter command
4. ✅ Add missing network to paperless-db

### Phase 2: Stability (Do This Week)
1. ⏭️ Add health checks to all services
2. ⏭️ Add restart policies
3. ⏭️ Fix Prometheus network
4. ⏭️ Remove duplicate Portainer

### Phase 3: Optimization (Do This Month)
1. ⏭️ Update all `:latest` tags to specific versions
2. ⏭️ Add update configs
3. ⏭️ Configure logging limits
4. ⏭️ Review resource limits

### Phase 4: Best Practices (Ongoing)
1. ⏭️ Add documentation labels
2. ⏭️ Separate configs from stacks
3. ⏭️ Set up monitoring alerts for service health

---

## 🎯 Summary Scores

| Stack File | Security | Stability | Best Practices | Overall |
|-----------|----------|-----------|----------------|---------|
| full-stack-complete.yml | 3/10 | 6/10 | 7/10 | **6/10** |
| monitoring-stack.yml | 4/10 | 5/10 | 6/10 | **5/10** |
| networking-stack.yml | 8/10 | 6/10 | 7/10 | **7/10** |
| portainer-stack.yml | 7/10 | 7/10 | 7/10 | **7/10** |
| tools-stack.yml | 7/10 | 5/10 | 6/10 | **6/10** |
| node-exporter-stack.yml | 7/10 | 5/10 | 6/10 | **6/10** |
| **Average** | **6.0/10** | **5.7/10** | **6.5/10** | **6.2/10** |

---

## 📝 Next Steps

Would you like me to:
1. **Create fixed versions** of the stack files with all critical issues resolved?
2. **Generate Docker secrets creation script** for all passwords?
3. **Add health checks** to all services?
4. **Consolidate duplicate configs** (e.g., remove duplicate Portainer)?
5. **Create a migration guide** for applying these changes safely?

Let me know which improvements you'd like me to implement!