Initial commit: homelab configuration and documentation

This commit is contained in:
2025-11-29 19:03:14 +00:00
commit 0769ca6888
72 changed files with 7806 additions and 0 deletions

286
README.md Normal file
View File

@@ -0,0 +1,286 @@
# Home Lab Improvements - Complete Implementation
This repository contains all the configurations, scripts, and documentation for comprehensive homelab improvements.
## 📋 Overview
A complete implementation plan for upgrading a home lab infrastructure with focus on:
- Network performance and segmentation
- Storage redundancy and performance
- Service resilience and high availability
- Security hardening
- Comprehensive monitoring
- Automated backups
## 🗂️ Repository Structure
```
/workspace/homelab/
├── docs/
│ └── guides/
│ ├── Homelab.md # Main homelab configuration
│ ├── DEPLOYMENT_GUIDE.md # Step-by-step deployment instructions
│ ├── NAS_Mount_Guide.md # NAS mounting procedures
│ └── health_checks.md # Health check configurations
├── scripts/
│ ├── zfs_setup.sh # ZFS pool creation
│ ├── prune_ai_models.sh # AI model cache cleanup
│ ├── install_fail2ban.sh # Security installation
│ ├── vlan_firewall.sh # VLAN/firewall configuration
│ ├── setup_monitoring.sh # Monitoring deployment
│ ├── backup_daily.sh # Restic backup script
│ ├── install_restic_backup.sh # Backup system installation
│ ├── deploy_all.sh # Master deployment orchestrator
│ ├── validate_deployment.sh # Deployment validation
│ ├── network_performance_test.sh # Network speed testing
│ ├── setup_log_rotation.sh # Log rotation config
│ └── quick_status.sh # Quick health dashboard
├── services/
│ ├── swarm/
│ │ ├── traefik/
│ │ │ └── stack.yml # Traefik HA configuration
│ │ └── stacks/
│ │ └── node-exporter-stack.yml
│ └── standalone/
│ └── Caddy/
│ ├── docker-compose.yml # Fallback proxy
│ ├── Caddyfile # Caddy configuration
│ └── maintenance.html # Maintenance page
├── security/
│ └── fail2ban/
│ ├── jail.local # Jail configuration
│ └── filter.d/ # Custom filters
├── monitoring/
│ └── grafana/
│ └── alert_rules.yml # Alert definitions
└── systemd/
├── restic-backup.service # Backup service
└── restic-backup.timer # Backup schedule
```
## 🤖 Automation Tools
### Master Deployment Script
```bash
# Deploy all improvements with guided prompts
sudo bash /workspace/homelab/scripts/deploy_all.sh
```
### Quick Status Dashboard
```bash
# Get instant overview of homelab health
bash /workspace/homelab/scripts/quick_status.sh
```
### Validation & Testing
```bash
# Validate deployment
bash /workspace/homelab/scripts/validate_deployment.sh
# Test network performance
bash /workspace/homelab/scripts/network_performance_test.sh
```
### Log Management
```bash
# Setup automatic log rotation
sudo bash /workspace/homelab/scripts/setup_log_rotation.sh
```
---
## 🚀 Quick Start
1. **Review the main configuration**:
```bash
cat /workspace/homelab/docs/guides/Homelab.md
```
2. **Follow the deployment guide**:
```bash
cat /workspace/homelab/docs/guides/DEPLOYMENT_GUIDE.md
```
3. **Make scripts executable**:
```bash
chmod +x /workspace/homelab/scripts/*.sh
```
## 📦 Components
### Network Improvements
- **2.5 Gb PoE managed switch** (Netgear GS110EMX recommended)
- **VLAN segmentation** (Management VLAN 10, Services VLAN 20)
- **LACP bonding** on Ryzen node for 5 Gb aggregated bandwidth
### Storage Enhancements
- **ZFS pool** on Proxmox host with compression and snapshots
- **Dedicated NAS** with RAID-6 and SSD cache
- **Automated pruning** of AI model caches
### Service Resilience
- **Traefik HA**: 2 replicas in Docker Swarm
- **Caddy fallback**: Lightweight backup reverse proxy
- **Health checks**: Auto-restart for critical services
- **Volume separation**: Performance-optimized storage
### Security Hardening
- **fail2ban**: Protection for SSH, Portainer, Traefik
- **VLAN firewall rules**: Inter-VLAN traffic control
- **VPN-only access**: Portainer restricted to Tailscale
- **2FA/OAuth**: Enhanced authentication
### Monitoring & Automation
- **node-exporter**: System metrics on all nodes
- **Grafana alerts**: CPU, RAM, disk, uptime monitoring
- **Home Assistant backups**: Automated to NAS
- **Tailscale metrics**: VPN health monitoring
### Backup Strategy
- **Restic**: Encrypted backups to Backblaze B2
- **Daily schedule**: Systemd timer at 02:00 AM
- **Retention policy**: 7 daily, 4 weekly, 12 monthly
- **Auto-pruning**: Keeps repository clean
## 🔧 Installation Order
Follow this sequence to minimize downtime:
1. **Network Upgrade** (requires brief downtime)
- Install new switch
- Configure VLANs
- Setup LACP bonding
2. **Storage Enhancements**
- Create ZFS pool
- Mount NAS shares
- Setup pruning cron
3. **Service Consolidation**
- Deploy Traefik Swarm service
- Deploy Caddy fallback
- Add health checks
4. **Security Hardening**
- Install fail2ban
- Configure firewall rules
- Restrict Portainer access
5. **Monitoring & Automation**
- Deploy node-exporter
- Configure Grafana alerts
- Setup Home Assistant backups
6. **Backup Strategy**
- Install restic
- Configure B2 repository
- Enable systemd timer
## ✅ Verification
After deployment, verify each component:
```bash
# Network
ethtool eth0 | grep Speed
ip -d link show
# Storage
zpool status tank
df -h | grep /mnt/nas
# Services
docker service ls
docker ps --filter "health=healthy"
# Security
sudo fail2ban-client status
sudo iptables -L -n -v
# Monitoring
curl http://192.168.1.196:9100/metrics
# Backups
sudo systemctl status restic-backup.timer
```
## 🛡️ Security Notes
- Update all placeholder credentials in scripts
- Store B2 credentials securely (consider using secrets management)
- Review firewall rules before applying
- Test fail2ban rules to avoid lockouts
- Keep backup encryption password safe
## 📊 Monitoring Access
- **Grafana**: http://192.168.1.196:3000
- **Portainer**: http://192.168.1.196:9000 (VPN only)
- **Prometheus**: http://192.168.1.196:9090
- **node-exporter**: http://<node-ip>:9100/metrics
## 🔄 Maintenance
### Daily
- Automated restic backups at 02:00 AM
- AI model cache pruning at 03:00 AM
- fail2ban monitoring
### Weekly
- Review Grafana alerts
- Check backup snapshots
- Monitor disk usage
### Monthly
- Restic repository integrity check (auto on 1st)
- Review security logs
- Update Docker images
## 🆘 Disaster Recovery
Comprehensive disaster recovery procedures are documented in:
- [DISASTER_RECOVERY.md](/workspace/homelab/docs/guides/DISASTER_RECOVERY.md)
Quick recovery for common scenarios:
- **Node failure**: Services auto-reschedule to healthy nodes
- **Manager down**: Promote worker to manager
- **Storage failure**: Restore from restic backups
- **Complete disaster**: Full rebuild from B2 backups (~2 hours)
### Emergency Backup Restore
```bash
# Install restic
sudo apt-get install restic
# Configure and restore
export RESTIC_REPOSITORY="b2:bucket:/backups"
export RESTIC_PASSWORD="your_password"
restic restore latest --target /tmp/restore
```
---
## 🆘 Troubleshooting
Common issues and solutions are documented in:
- [DEPLOYMENT_GUIDE.md](/workspace/homelab/docs/guides/DEPLOYMENT_GUIDE.md) - Rollback procedures
- [NAS_Mount_Guide.md](/workspace/homelab/docs/guides/NAS_Mount_Guide.md) - Mount issues
- Individual script comments - Script-specific troubleshooting
## 📝 License
This is a personal homelab configuration. Use and modify as needed for your own setup.
## 🙏 Acknowledgments
Based on best practices from:
- Docker Swarm documentation
- Traefik documentation
- Restic backup documentation
- Home Assistant community
- r/homelab community
---
**Last Updated**: 2025-11-21
**Configuration Version**: 2.0

View File

@@ -0,0 +1,329 @@
# Home Lab Improvements - Deployment Guide
This guide provides step-by-step instructions for deploying all the homelab improvements.
## Table of Contents
1. [Network Upgrade](#network-upgrade)
2. [Storage Enhancements](#storage-enhancements)
3. [Service Consolidation](#service-consolidation)
4. [Security Hardening](#security-hardening)
5. [Monitoring & Automation](#monitoring--automation)
6. [Backup Strategy](#backup-strategy)
---
## Prerequisites
- SSH access to all nodes
- Root/sudo privileges
- Docker Swarm cluster operational
- Backblaze B2 account (for backups)
---
## 1. Network Upgrade
### 1.1 Install 2.5 Gb PoE Switch
**Hardware**: Netgear GS110EMX or equivalent
**Steps**:
1. Power down affected nodes
2. Install new switch
3. Connect all 2.5 Gb nodes (Ryzen .81, Acer .57)
4. Connect 1 Gb nodes (Pi 4 .245, Time Capsule .153)
5. Power on and verify link speeds
**Verification**:
```bash
# On each node, check link speed:
ethtool eth0 | grep Speed
```
### 1.2 Configure VLANs
**Script**: `/workspace/homelab/scripts/vlan_firewall.sh`
**Steps**:
1. Create VLAN 10 (Management): 192.168.10.0/24
2. Create VLAN 20 (Services): 192.168.20.0/24
3. Configure router ACLs using the firewall script
**Verification**:
```bash
# Check VLAN configuration
ip -d link show
# Test VLAN isolation
ping 192.168.10.1 # from VLAN 20 (should fail for restricted ports)
```
### 1.3 Configure LACP Bonding (Ryzen Node)
**Note**: Requires two NICs on the Ryzen node
**Configuration** (`/etc/network/interfaces.d/bond0.cfg`):
```
auto bond0
iface bond0 inet static
address 192.168.1.81
netmask 255.255.255.0
gateway 192.168.1.1
bond-mode 802.3ad
bond-miimon 100
bond-slaves eth0 eth1
```
**Apply**:
```bash
sudo systemctl restart networking
```
---
## 2. Storage Enhancements
### 2.1 Create ZFS Pool on Proxmox Host
**Script**: `/workspace/homelab/scripts/zfs_setup.sh`
**Steps**:
1. SSH to Proxmox host (192.168.1.57)
2. Identify SSD devices: `lsblk`
3. Update script with correct device names
4. Run: `sudo bash /workspace/homelab/scripts/zfs_setup.sh`
**Verification**:
```bash
zpool status tank
zfs list
```
### 2.2 Mount NAS on All Nodes
**Guide**: `/workspace/homelab/docs/guides/NAS_Mount_Guide.md`
**Steps**:
1. Follow the NAS Mount Guide for each node
2. Create credentials file
3. Add to `/etc/fstab`
4. Mount: `sudo mount -a`
**Verification**:
```bash
df -h | grep /mnt/nas
ls -la /mnt/nas
```
### 2.3 Setup AI Model Pruning
**Script**: `/workspace/homelab/scripts/prune_ai_models.sh`
**Steps**:
1. Update MODEL_DIR path in script
2. Make executable: `chmod +x /workspace/homelab/scripts/prune_ai_models.sh`
3. Add to cron: `crontab -e`
```
0 3 * * * /workspace/homelab/scripts/prune_ai_models.sh
```
**Verification**:
```bash
# Test run
sudo /workspace/homelab/scripts/prune_ai_models.sh
# Check cron logs
grep CRON /var/log/syslog
```
---
## 3. Service Consolidation
### 3.1 Deploy Traefik Swarm Service
**Stack**: `/workspace/homelab/services/swarm/traefik/stack.yml`
**Steps**:
1. Review and update stack.yml if needed
2. Deploy: `docker stack deploy -c /workspace/homelab/services/swarm/traefik/stack.yml traefik`
3. Remove standalone Traefik on Pi 4
**Verification**:
```bash
docker service ls | grep traefik
docker service ps traefik_traefik
curl -I http://192.168.1.196
```
### 3.2 Deploy Caddy Fallback (Pi Zero)
**Location**: `/workspace/homelab/services/standalone/Caddy/`
**Steps**:
1. SSH to Pi Zero (192.168.1.62)
2. Copy Caddy files to node
3. Run: `docker-compose up -d`
**Verification**:
```bash
docker ps | grep caddy
curl http://192.168.1.62:8080
```
### 3.3 Add Health Checks
**Guide**: `/workspace/homelab/docs/guides/health_checks.md`
**Steps**:
1. Review health check examples
2. Update service stack files for critical containers
3. Redeploy services: `docker stack deploy ...`
**Verification**:
```bash
docker ps --filter "health=healthy"
docker inspect <container> | jq '.[0].State.Health'
```
---
## 4. Security Hardening
### 4.1 Install fail2ban on Manager VM
**Script**: `/workspace/homelab/scripts/install_fail2ban.sh`
**Steps**:
1. SSH to manager VM (192.168.1.196)
2. Run: `sudo bash /workspace/homelab/scripts/install_fail2ban.sh`
**Verification**:
```bash
sudo fail2ban-client status
sudo fail2ban-client status sshd
sudo tail -f /var/log/fail2ban.log
```
### 4.2 Configure Firewall Rules
**Script**: `/workspace/homelab/scripts/vlan_firewall.sh`
**Steps**:
1. Review script and adjust VLANs/ports as needed
2. Run: `sudo bash /workspace/homelab/scripts/vlan_firewall.sh`
3. Configure router ACLs via web UI
**Verification**:
```bash
sudo iptables -L -n -v
# Test port accessibility from different VLANs
```
### 4.3 Restrict Portainer Access
**Options**:
- Configure Tailscale VPN-only access
- Enable OAuth integration
- Add firewall rules to block public access
**Configuration**: Update Portainer stack to bind to Tailscale interface only
---
## 5. Monitoring & Automation
### 5.1 Deploy node-exporter
**Script**: `/workspace/homelab/scripts/setup_monitoring.sh`
**Steps**:
1. Run: `sudo bash /workspace/homelab/scripts/setup_monitoring.sh`
2. Wait for deployment to complete
**Verification**:
```bash
docker service ps monitoring_node-exporter
curl http://192.168.1.196:9100/metrics
```
### 5.2 Configure Grafana Alerts
**Rules**: `/workspace/homelab/monitoring/grafana/alert_rules.yml`
**Steps**:
1. The setup script copies alert rules to Grafana
2. Login to Grafana UI
3. Navigate to Alerting > Alert Rules
4. Verify rules are loaded
**Verification**:
- Check Grafana UI for alert rules
- Trigger test alert (e.g., high CPU load)
---
## 6. Backup Strategy
### 6.1 Setup Restic Backups
**Script**: `/workspace/homelab/scripts/install_restic_backup.sh`
**Steps**:
1. Create Backblaze B2 bucket
2. Get B2 account ID and key
3. Update `/workspace/homelab/scripts/backup_daily.sh` with credentials
4. Run: `sudo bash /workspace/homelab/scripts/install_restic_backup.sh`
**Verification**:
```bash
sudo systemctl status restic-backup.timer
sudo systemctl list-timers
# Manual test run
sudo /workspace/homelab/scripts/backup_daily.sh
```
### 6.2 Verify Backups
```bash
# Check snapshots
export RESTIC_REPOSITORY="b2:your-bucket:/backups"
export RESTIC_PASSWORD="your_password"
restic snapshots
# Restore test
restic restore latest --target /tmp/restore-test
```
---
## Rollback Procedures
### If network upgrade fails:
- Reconnect to old switch
- Remove VLAN configurations
- Restart networking: `sudo systemctl restart networking`
### If ZFS pool creation fails:
- Destroy pool: `sudo zpool destroy tank`
- Verify data on SSDs before retrying
### If Traefik Swarm migration fails:
- Restart standalone Traefik on Pi 4
- Remove Swarm service: `docker service rm traefik_traefik`
### If backups fail:
- Check B2 credentials
- Verify network connectivity
- Check restic logs: `/var/log/restic_backup.log`
---
## Post-Deployment Checklist
- [ ] All nodes have 2.5 Gb connectivity
- [ ] VLANs configured and isolated
- [ ] ZFS pool created and healthy
- [ ] NAS mounted on all nodes
- [ ] Traefik Swarm service running with 2 replicas
- [ ] Caddy fallback operational
- [ ] fail2ban protecting manager VM
- [ ] Firewall rules active
- [ ] node-exporter running on all nodes
- [ ] Grafana alerts configured
- [ ] Restic backups running daily
- [ ] Health checks added to critical services
---
## Support & Troubleshooting
Refer to individual guide files for detailed troubleshooting:
- [NAS Mount Guide](/workspace/homelab/docs/guides/NAS_Mount_Guide.md)
- [Health Checks Guide](/workspace/homelab/docs/guides/health_checks.md)
- [Homelab Configuration](/workspace/homelab/docs/guides/Homelab.md)
For script issues, check logs in `/var/log/` and Docker logs: `docker service logs <service>`

View File

@@ -0,0 +1,375 @@
# Disaster Recovery Guide
## Overview
This guide provides procedures for recovering from various failure scenarios in the homelab.
## Quick Recovery Matrix
| Scenario | Impact | Recovery Time | Procedure |
|----------|--------|---------------|-----------|
| Single node failure | Partial | < 5 min | [Node Failure](#node-failure) |
| Manager node down | Service disruption | < 10 min | [Manager Recovery](#manager-node-recovery) |
| Storage failure | Data risk | < 30 min | [Storage Recovery](#storage-failure) |
| Network outage | Complete | < 15 min | [Network Recovery](#network-recovery) |
| Complete disaster | Full rebuild | < 2 hours | [Full Recovery](#complete-disaster-recovery) |
---
## Node Failure
### Symptoms
- Node unreachable via SSH
- Docker services not running on node
- Swarm reports node as "Down"
### Recovery Steps
1. **Verify node status**:
```bash
docker node ls
# Look for "Down" status
```
2. **Attempt to restart node** (if accessible):
```bash
ssh user@<node-ip>
sudo reboot
```
3. **If node is unrecoverable**:
```bash
# Remove from Swarm
docker node rm <node-id> --force
# Services will automatically reschedule to healthy nodes
```
4. **Add replacement node**:
```bash
# On manager node, get join token
docker swarm join-token worker
# On new node, join swarm
docker swarm join --token <token> 192.168.1.196:2377
```
---
## Manager Node Recovery
### Symptoms
- Cannot access Portainer UI
- Swarm commands fail
- DNS services disrupted
### Recovery Steps
1. **Promote a worker to manager** (from another manager if available):
```bash
docker node promote <worker-node-id>
```
2. **Restore from backup**:
```bash
# Stop Docker on failed manager
sudo systemctl stop docker
# Restore Portainer data
restic restore latest --target /tmp/restore
sudo cp -r /tmp/restore/portainer /var/lib/docker/volumes/portainer/_data/
# Start Docker
sudo systemctl start docker
```
3. **Reconfigure DNS** (if Pi-hole affected):
```bash
# Temporarily point router DNS to another Pi-hole instance
# Update router DNS to: 192.168.1.245, 192.168.1.62
```
---
## Storage Failure
### ZFS Pool Failure
#### Symptoms
- `zpool status` shows DEGRADED or FAULTED
- I/O errors in logs
#### Recovery Steps
1. **Check pool status**:
```bash
zpool status tank
```
2. **If disk failed**:
```bash
# Replace failed disk
zpool replace tank /dev/old-disk /dev/new-disk
# Monitor resilver progress
watch zpool status tank
```
3. **If pool is destroyed**:
```bash
# Recreate pool
bash /workspace/homelab/scripts/zfs_setup.sh
# Restore from backup
restic restore latest --target /tank/docker
```
### NAS Failure
#### Recovery Steps
1. **Check NAS connectivity**:
```bash
ping 192.168.1.200
mount | grep /mnt/nas
```
2. **Remount NAS**:
```bash
sudo umount /mnt/nas
sudo mount -a
```
3. **If NAS hardware failed**:
- Services using NAS volumes will fail
- Redeploy services to use local storage temporarily
- Restore NAS from Time Capsule backup
---
## Network Recovery
### Complete Network Outage
#### Recovery Steps
1. **Check physical connections**:
- Verify all cables connected
- Check switch power and status LEDs
- Restart switch
2. **Verify router**:
```bash
ping 192.168.1.1
# If no response, restart router
```
3. **Check VLAN configuration**:
```bash
ip -d link show
# Reapply if needed
bash /workspace/homelab/scripts/vlan_firewall.sh
```
4. **Restart networking**:
```bash
sudo systemctl restart networking
# Or on each node:
sudo reboot
```
### Partial Network Issues
#### DNS Not Resolving
```bash
# Check Pi-hole status
docker ps | grep pihole
# Restart Pi-hole
docker restart <pihole-container>
# Temporarily use public DNS
sudo echo "nameserver 8.8.8.8" > /etc/resolv.conf
```
#### Traefik Not Routing
```bash
# Check Traefik service
docker service ls | grep traefik
docker service ps traefik_traefik
# Check logs
docker service logs traefik_traefik
# Force update
docker service update --force traefik_traefik
```
---
## Complete Disaster Recovery
### Scenario: Total Infrastructure Loss
#### Prerequisites
- Restic backups to Backblaze B2 (off-site)
- Hardware replacement available
- Network infrastructure functional
#### Recovery Steps
1. **Rebuild Core Infrastructure** (2-4 hours):
```bash
# Install base OS on all nodes
# Configure network (static IPs, hostnames)
# Install Docker on all nodes
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
# Initialize Swarm on manager
docker swarm init --advertise-addr 192.168.1.196
# Join workers
docker swarm join-token worker # Get token
# Run on each worker with token
```
2. **Restore Storage**:
```bash
# Recreate ZFS pool
bash /workspace/homelab/scripts/zfs_setup.sh
# Mount NAS
# Follow: /workspace/homelab/docs/guides/NAS_Mount_Guide.md
```
3. **Restore from Backups**:
```bash
# Install restic
sudo apt-get install restic
# Configure credentials
export B2_ACCOUNT_ID="..."
export B2_ACCOUNT_KEY="..."
export RESTIC_REPOSITORY="b2:bucket:/backups"
export RESTIC_PASSWORD="..."
# List snapshots
restic snapshots
# Restore latest
restic restore latest --target /tmp/restore
# Copy to Docker volumes
sudo cp -r /tmp/restore/* /var/lib/docker/volumes/
```
4. **Redeploy Services**:
```bash
# Deploy all stacks
bash /workspace/homelab/scripts/deploy_all.sh
# Verify deployment
bash /workspace/homelab/scripts/validate_deployment.sh
```
5. **Verify Recovery**:
- Check all services: `docker service ls`
- Test Traefik routing: `curl https://your-domain.com`
- Verify Portainer UI access
- Check Grafana dashboards
- Test Home Assistant
---
## Backup Verification
### Monthly Backup Test
```bash
# List snapshots
restic snapshots
# Verify specific snapshot
restic check --read-data-subset=10%
# Test restore
mkdir /tmp/restore-test
restic restore <snapshot-id> --target /tmp/restore-test --include /path/to/critical/file
# Compare with original
diff -r /tmp/restore-test /original/path
```
---
## Emergency Contacts & Resources
### Critical Information
- **Backblaze B2 Login**: Store credentials in password manager
- **restic Password**: Store securely (CANNOT be recovered)
- **Router Admin**: Keep credentials accessible
- **ISP Support**: Keep contact info handy
### Documentation URLs
- Docker Swarm: https://docs.docker.com/engine/swarm/
- Traefik: https://doc.traefik.io/traefik/
- Restic: https://restic.readthedocs.io/
- ZFS: https://openzfs.github.io/openzfs-docs/
---
## Recovery Checklists
### Pre-Disaster Preparation
- [ ] Verify backups running daily
- [ ] Test restore procedure monthly
- [ ] Document all credentials
- [ ] Keep hardware spares (cables, drives)
- [ ] Maintain off-site config copies
### Post-Recovery Validation
- [ ] All nodes online: `docker node ls`
- [ ] All services running: `docker service ls`
- [ ] Health checks passing: `docker ps --filter health=healthy`
- [ ] DNS resolving correctly
- [ ] Monitoring active (Grafana accessible)
- [ ] Backups resumed: `systemctl status restic-backup.timer`
- [ ] fail2ban protecting: `fail2ban-client status`
- [ ] Network performance normal: `bash network_performance_test.sh`
---
## Automation for Faster Recovery
### Create Recovery USB Drive
```bash
# Copy all scripts and configs
mkdir /mnt/usb/homelab-recovery
cp -r /workspace/homelab/* /mnt/usb/homelab-recovery/
# Include documentation
cp /workspace/homelab/docs/guides/* /mnt/usb/homelab-recovery/docs/
# Store credentials (encrypted)
# Use GPG or similar to encrypt sensitive files
```
### Quick Deploy Script
```bash
# Run from recovery USB
sudo bash /mnt/usb/homelab-recovery/scripts/deploy_all.sh
```
---
This guide should be reviewed and updated quarterly to ensure accuracy.

270
docs/guides/Homelab.md Normal file
View File

@@ -0,0 +1,270 @@
# HOMELAB CONFIGURATION SUMMARY — UPDATED 2025-10-31
## NETWORK INFRASTRUCTURE
Main Router: TP-Link BE9300 (2.5 Gb WAN + 4× 2.5 Gb LAN)
Secondary Router: Linksys WRT3200ACM (OpenWRT)
Managed Switch: TP-Link TL-SG608E (1 Gb)
Additional: Apple AirPort Time Capsule (192.168.1.153)
Backbone Speed: 2.5 Gb core / 1 Gb secondary
DNS Architecture: 3× Pi-hole + 3× Unbound (192.168.1.196, .245, .62) with local recursive forwarding
VPN: Tailscale (Pi 4 as exit node)
Reverse Proxy: Traefik (on .196; planned Swarm takeover)
LAN Subnet: 192.168.1.0/24
Notes: Rate-limit prevention on Pi-hole instances, Unbound local caching to accelerate DNS queries
---
## NODE OVERVIEW
192.168.1.81 — Ryzen 3700X Node
• CPU: AMD 8C/16T
• RAM: 6480 GB Current 2 of 4 3200 32gb 4x8gb 3600 availible
• GPU: RTX 4060 Ti
• Network: 2.5 GbE onboard
• Role: Docker Swarm Worker (label=heavy)
• Function: AI compute (LM Studio, Llama.cpp, OpenWebUI, Ollama planned)
• OS: Windows 11 + WSL2 / Fedora (Dual Boot)
• Notes: Primary compute node for high-performance AI workloads. Both OS installations act as interchangeable swarm nodes with the same label.
192.168.1.57 — Acer Aspire R14 (Proxmox Host)
• CPU: Intel i5-6200U (2C/4T)
---
## NETWORK UPGRADE & VLAN
* **Switch**: Install a 2.5Gb PoE managed switch (e.g., Netgear GS110EMX).
* **VLANs**: Create VLAN10 for management, VLAN20 for services. Add router ACLs to isolate traffic.
* **LACP**: Bond two NICs on the Ryzen node for 5Gb aggregated link.
## STORAGE ENHANCEMENTS
* Deploy a dedicated NAS (e.g., Synology DS920+) with RAID6 and SSD cache.
* On Proxmox host, create ZFS pool `tank` on local SSDs (`zpool create tank /dev/sda /dev/sdb`).
* Mount NAS shares on all nodes (`/mnt/nas`).
* Add cron job to prune unused AI model caches.
## SERVICE CONSOLIDATION & RESILIENCE
* Convert standalone Traefik on Pi4 to a DockerSwarm service with 2 replicas.
* Deploy fallback Caddy on PiZero with a static maintenance page.
* Add healthcheck sidecars to critical containers (Portainer, OpenWebUI).
* Separate persistent volumes per stack (AI models on SSD, Nextcloud on NAS).
## SECURITY HARDENING
* Enable router firewall ACLs for interVLAN traffic (allow only required ports).
* Install `fail2ban` on the manager VM.
* Restrict Portainer UI to VPNonly access and enable 2FA/OAuth.
## MONITORING & AUTOMATION
* Deploy `node-exporter` on Proxmox host.
* Create Grafana alerts for CPU >80%, RAM >85%, disk >80%.
* Add HomeAssistant backup automation to NAS.
* Integrate Tailscale metrics via `tailscale_exporter`.
## OFFSITE BACKUP STRATEGY
* Install `restic` on manager VM and initialise Backblaze B2 repo.
* Daily backup script (`/usr/local/bin/backup_daily.sh`) for HA config, Portainer DB, important volumes.
* Systemd timer to run at 02:00AM.
---
• RAM: 8 GB
• Network: 2.5 GbE via USB adapter
• Role: Proxmox Host
• Function: Virtualization host for Apps VM (.196) and OMV (.70)
• Storage: Local SSDs + OMV shared volumes
• Notes: Lightweight node for VMs and containerized storage services
192.168.1.196 — Apps Manager VM (on Acer Proxmox)
CPU: 4
RAM: 4 GB min 6 GB max
• Role: Docker Swarm Manager (label=manager)
• Function: Pi-hole + Unbound + Portainer UI + Traefik reverse proxy
• Architecture: x86 (virtualized)
• Notes: Central orchestration, DNS control, and reverse proxy; Portainer agent installed for remote swarm management
192.168.1.70 — OMV Instance (on Acer)
CPU 2
RAM: 2 GB min 4 GB max
• Role: Network Attached Storage
• Function: Shared Docker volumes, media, VM backups
• Stack: OpenMediaVault 7.x
• Architecture: x86
• Planned: Receive SMB3-reshares from Time Capsule (.153)
• Storage: Docker volumes for AI models, backup directories, and media
• Notes: Central NAS for swarm and LLM storage
192.168.1.245 — Raspberry Pi 4 (8 GB)
• CPU: ARM Quad-Core
• RAM: 8 GB
• Network: 1 GbE
• Role: Docker Swarm Leader (label=leader)
• Function: Home Assistant OS + Portainer Agent + HAOS-based Unbound (via Ubuntu container)
• Standalone Services: Traefik (currently standalone), HAOS Unbound
• Notes: Central smart home automation hub; swarm leader for container orchestration; plan for Swarm Traefik to take over existing Traefik instance
192.168.1.62 — Raspberry Pi Zero 2 W
• CPU: ARM Quad-Core
• RAM: 512 MB
• Network: 100 Mb Ethernet
• Role: Docker Swarm Worker (label=light)
• Function: Lightweight DNS + Pi-hole + Unbound + auxiliary containers
• Notes: Low-power node for background jobs, DNS redundancy, and monitoring tasks
192.168.1.153 — Apple AirPort Time Capsule
• Network: 1 GbE via WRT3200ACM
• Role: Backup storage and SMB bridge
• Function: Time Machine backups (SMB1)
• Planned: Reshare SMB1 → SMB3 via OMV (.70) for modern clients
• Notes: Source for macOS backups; will integrate into OMV NAS for consolidation
---
## DOCKER SWARM CLUSTER
Leader 192.168.1.245 (Pi 4, label=leader)
Manager 192.168.1.196 (Apps VM, label=manager)
Worker (Fedora) 192.168.1.81 (Ryzen, label=heavy)
Worker (Light) 192.168.1.62 (Pi Zero 2 W, label=light)
Cluster Functions:
• Distributed container orchestration across x86 + ARM
• High-availability DNS via Pi-hole + Unbound replicas
• Unified management and reverse proxy on the manager node
• Specific workload placement using node labels (heavy, leader, manager)
• AI/ML workloads pinned to the 'heavy' node for performance
• General application services pinned to the 'leader' node
• Core services like Traefik and Portainer pinned to the 'manager' node
---
## STACKS
### Networking Stack
**Traefik:** Reverse Proxy
**whoami:** Service for testing Traefik
### Monitoring Stack
**Prometheus:** Metrics collection
**Grafana:** Metrics visualization
**Alertmanager:** Alerting
**Node-exporter:** Node metrics exporter
**cAdvisor:** Container metrics exporter
### Tools Stack
**Portainer:** Swarm Management
**Dozzle:** Log viewing
**Lazydocker:** Terminal UI for Docker
**TSDProxy:** Tailscale Docker Proxy
**Watchtower:** Container Updates
### Application Stack
**OpenWebUI:** AI Frontend
**Paperless-ngx:** Document Management
**Stirling-PDF:** PDF utility
**SearXNG:** Metasearch engine
### Productivity Stack
**Nextcloud:** Cloud storage and collaboration
---
## SERVICES MAP
**Manager Node (.196):**
**Networking Stack:** Traefik
**Monitoring Stack:** Prometheus, Grafana
**Tools Stack:** Portainer, Dozzle, Lazydocker, TSDProxy, Watchtower
**Leader Node (.245):**
**Application Stack:** Paperless-ngx, Stirling-PDF, SearXNG
**Productivity Stack:** Nextcloud
**Heavy Worker Node (.81):**
**Application Stack:** OpenWebUI
**Light Worker Node (.62):**
**Networking Stack:** whoami
**Other Services:**
**VPN:** Tailscale (Pi4 exit node)
**Virtualization:** Proxmox VE (.57)
**Storage:** OMV NAS (.70) + Time Capsule (.153)
---
## STORAGE & BACKUPS
OMV (.70) — shared Docker volumes, LLM models, media, backup directories
Time Capsule (.153) — legacy SMB1 source; planned SMB3 reshare via OMV
External SSDs/HDDs — portable compute, LLM scratch storage, media archives
Time Machine clients — macOS systems
Planned Workflow:
• Mount Time Capsule SMB1 share in OMV via CIFS
• Reshare through OMV Samba as SMB3
• Sync critical backups to OMV and external drives
• AI models stored on NVMe + OMV volumes for high-speed access
---
## PERFORMANCE STRATEGY
• 2.5 Gb backbone: Ryzen (.81) + Acer (.57) nodes
• 1 Gb nodes: Pi 4 (.245) + Time Capsule (.153)
• 100 Mb node: Pi Zero 2 W (.62)
• ARM nodes for low-power/auxiliary tasks
• x86 nodes for AI, storage, and compute-intensive containers
• Swarm resource labeling for workload isolation
• DNS redundancy and rate-limit protection
• Unified monitoring via Portainer + Home Assistant
• GPU-intensive AI containers pinned to Ryzen node for efficiency
• Traefik migration plan: standalone .245 → Swarm-managed cluster routing
---
## NOTES
• Acer Proxmox hosts OMV (.70) and Apps Manager VM (.196)
• Ryzen (.81) dedicated to AI and heavy Docker tasks
• HAOS Pi 4 (.245) leader, automation hub, and temporary standalone Traefik
• DNS load balanced among .62, .196, and .245
• Time Capsule (.153) planned SMB1→SMB3 reshare via OMV
• Network speed distribution: Ryzen/Acer = 2.5 Gb, Pi 4/Time Capsule = 1 Gb, Pi Zero 2 W = 100 Mb
• LLM models stored on high-speed NVMe on Ryzen, backed up to OMV and external drives
• No personal identifiers included in this record
# END CONFIG
---
## SMART HOME INTEGRATION
### LIGHTING & CONTROLS
• Philips Hue
- Devices: Hue remote only (no bulbs)
- Connectivity: Zigbee
- Automation: Integrated into Home Assistant OS (.245)
- Notes: Remote used to trigger HAOS scenes and routines for other smart devices
• Govee Smart Lights & Sensors
- Devices: RGB LED strips, motion sensors, temperature/humidity sensors
- Connectivity: Wi-Fi
- Automation: Home Assistant via MQTT / cloud integration
- Notes: Motion-triggered lighting and environmental monitoring
• TP-Link / Tapo Smart Devices
- Devices: Tapo lightbulbs, Kasa smart power strip
- Connectivity: Wi-Fi
- Automation: Home Assistant + Kasa/Tapo integration
- Notes: Power scheduling and energy monitoring
### AUDIO & VIDEO
• TVs: Multiple 4K Smart TVs
- Platforms: Fire Stick, Apple devices, console inputs
- Connectivity: Ethernet (1 Gb) or Wi-Fi
- Automation: HAOS scenes, volume control, source switching
• Streaming & Consoles:
- Devices: Fire Stick, PS5, Nintendo Switch
- Connectivity: Ethernet or Wi-Fi
- Notes: Automated on/off with Home Assistant, media triggers
### SECURITY & SENSORS
• Vivint Security System
- Devices: Motion detectors, door/window sensors, cameras
- Connectivity: Proprietary protocol + cloud
- Automation: Home Assistant integrations for alerts and scene triggers
• Environmental Sensors
- Devices: Govee temperature/humidity, Tapo sensors
- Connectivity: Wi-Fi
- Automation: Trigger HVAC, lights, or notifications

View File

@@ -0,0 +1,62 @@
# NAS Mount Guide
This guide explains how to mount the dedicated NAS shares on all homelab nodes.
## Prerequisites
- NAS is reachable at `\192.168.1.200` (replace with your NAS IP).
- You have a user account on the NAS with read/write permissions.
- `cifs-utils` is installed on each node (`sudo apt-get install cifs-utils`).
## Mount Point
Create a common mount point on each node:
```bash
sudo mkdir -p /mnt/nas
```
## Credentials File (optional)
Store credentials in a secure file (e.g., `/etc/nas-cred`):
```text
username=your_nas_user
password=your_nas_password
```
Set restrictive permissions:
```bash
sudo chmod 600 /etc/nas-cred
```
## Add to `/etc/fstab`
Append the following line to `/etc/fstab` on each node:
```text
//192.168.1.200/shared /mnt/nas cifs credentials=/etc/nas-cred,iocharset=utf8,vers=3.0 0 0
```
Replace `shared` with the actual share name.
## Mount Immediately
```bash
sudo mount -a
```
Verify:
```bash
df -h | grep /mnt/nas
```
You should see the NAS share listed.
## Docker Volume Example
When deploying services that need persistent storage, reference the NAS mount:
```yaml
volumes:
nas-data:
driver: local
driver_opts:
type: none
o: bind
device: /mnt/nas/your-service-data
```
## Troubleshooting
- **Permission denied** ensure the NAS user has the correct permissions and the credentials file is correct.
- **Mount fails** try specifying a different SMB version (`vers=2.1` or `vers=3.1.1`).
- **Network issues** verify the node can ping the NAS IP.
---
*This guide can be referenced from the updated `Homelab.md` documentation.*

475
docs/guides/OMV.md Normal file
View File

@@ -0,0 +1,475 @@
# OMV Configuration Guide for Docker Swarm Integration
This guide outlines the setup for an OpenMediaVault (OMV) virtual machine and its integration with a Docker Swarm cluster for providing network storage to services like Jellyfin, Nextcloud, Immich, and others.
---
## 1. OMV Virtual Machine Configuration
The OMV instance is configured as a virtual machine with the following specifications:
- **RAM:** 2-4 GB
- **CPU:** 2 Cores
- **System Storage:** 32 GB
- **Data Storage:** A 512GB SATA SSD is passed through directly from the Proxmox host. This SSD is dedicated to network shares.
- **Network:** Static IP address `192.168.1.70` on the `192.168.1.0/24` subnet
---
## 2. Network Share Setup in OMV
The primary purpose of this OMV instance is to serve files to other applications and services on the network, particularly Docker Swarm containers.
### Shared Folders Overview
The following shared folders should be created in OMV (via **Storage → Shared Folders**):
| Folder Name | Purpose | Protocol | Permissions |
|-------------|---------|----------|-------------|
| `Media` | Media files for Jellyfin | SMB | swarm-user: RW |
| `ImmichUploads` | Photo uploads for Immich | NFS | UID 999: RW |
| `TraefikLetsEncrypt` | SSL certificates for Traefik | NFS | Root: RW |
| `ImmichDB` | Immich PostgreSQL database | NFS | Root: RW |
| `NextcloudDB` | Nextcloud PostgreSQL database | NFS | Root: RW |
| `NextcloudApps` | Nextcloud custom apps | NFS | www-data (33): RW |
| `NextcloudConfig` | Nextcloud configuration | NFS | www-data (33): RW |
| `NextcloudData` | Nextcloud user data | NFS | www-data (33): RW |
### SMB (Server Message Block) Shares
SMB is used for services that require file-based media access, particularly for services accessed by multiple platforms (Windows, Linux, macOS).
#### **Media Share**
- **Shared Folder:** `Media`
- **Purpose:** Stores media files for Jellyfin and other media servers
- **SMB Configuration:**
- **Share Name:** `Media`
- **Public:** No (authentication required)
- **Browseable:** Yes
- **Read-only:** No
- **Guest Access:** No
- **Permissions:** `swarm-user` has read/write access
- **Path on OMV:** `/srv/dev-disk-by-uuid-fd2daa6f-bd75-4ac1-9c4c-9e4d4b84d845/Media`
### NFS (Network File System) Shares
NFS is utilized for services requiring block-level access, specific POSIX permissions, or better performance for containerized applications.
#### **Nextcloud Shares**
- **Shared Folders:** `NextcloudApps`, `NextcloudConfig`, `NextcloudData`
- **Purpose:** Application files, configuration, and user data for Nextcloud
- **NFS Configuration:**
- **Client:** `192.168.1.0/24` (Accessible to the entire subnet)
- **Privilege:** Read/Write
- **Extra Options:** `all_squash,anongid=33,anonuid=33,sync,no_subtree_check`
- `all_squash`: Maps all client UIDs/GIDs to anonymous user
- `anonuid=33,anongid=33`: Maps to `www-data` user/group (Nextcloud/Apache/Nginx)
- `sync`: Ensures data is written to disk before acknowledging (data integrity)
- `no_subtree_check`: Improves reliability for directory exports
#### **Database Shares**
- **Shared Folders:** `ImmichDB`, `NextcloudDB`
- **Purpose:** PostgreSQL database storage for Immich and Nextcloud
- **NFS Configuration:**
- **Client:** `192.168.1.0/24`
- **Privilege:** Read/Write
- **Extra Options:** `rw,sync,no_subtree_check,no_root_squash`
- `no_root_squash`: Allows root on client to be treated as root on server (needed for database operations)
- `sync`: Critical for database integrity
#### **Application Data Shares**
- **Shared Folder:** `ImmichUploads`
- **Purpose:** Photo and video uploads for Immich
- **NFS Configuration:**
- **Client:** `192.168.1.0/24`
- **Privilege:** Read/Write
- **Extra Options:** `rw,sync,no_subtree_check,all_squash,anonuid=999,anongid=999`
- Maps to Immich's internal user (typically UID/GID 999)
- **Shared Folder:** `TraefikLetsEncrypt`
- **Purpose:** SSL certificate storage for Traefik reverse proxy
- **NFS Configuration:**
- **Client:** `192.168.1.0/24`
- **Privilege:** Read/Write
- **Extra Options:** `rw,sync,no_subtree_check,no_root_squash`
---
## 3. Integrating OMV Shares with Docker Swarm Services
To use the OMV network shares with Docker Swarm services, the shares must be mounted on the Docker worker nodes where the service containers will run. The mounted path on the node is then passed into the container as a volume.
### Prerequisites on Docker Nodes
All Docker nodes that will mount shares need the appropriate client utilities installed:
```bash
# For SMB shares
sudo apt-get update
sudo apt-get install cifs-utils
# For NFS shares
sudo apt-get update
sudo apt-get install nfs-common
```
---
### Example 1: Jellyfin Media Access via SMB
Jellyfin, running as a Docker Swarm service, requires access to the media files stored on the OMV `Media` share.
#### **Step 1: Create SMB Credentials File**
Create a credentials file on the Docker node to avoid storing passwords in `/etc/fstab`:
```bash
# Create credentials file
sudo nano /root/.smbcredentials
```
Add the following content:
```
username=swarm-user
password=YOUR_PASSWORD_HERE
```
Secure the file:
```bash
sudo chmod 600 /root/.smbcredentials
```
#### **Step 2: Mount the SMB Share on the Docker Node**
```bash
# Create mount point
sudo mkdir -p /mnt/media
# Test the mount first
sudo mount -t cifs //192.168.1.70/Media /mnt/media -o credentials=/root/.smbcredentials,iocharset=utf8,vers=3.0
# Verify it works
ls -la /mnt/media
# Unmount test
sudo umount /mnt/media
```
#### **Step 3: Add Permanent Mount to `/etc/fstab`**
```bash
sudo nano /etc/fstab
```
Add this line:
```
//192.168.1.70/Media /mnt/media cifs credentials=/root/.smbcredentials,iocharset=utf8,vers=3.0,file_mode=0755,dir_mode=0755 0 0
```
Mount all entries:
```bash
sudo mount -a
```
#### **Step 4: Configure the Jellyfin Docker Swarm Service**
In the Docker Compose YAML file for your Jellyfin service:
```yaml
services:
jellyfin:
image: jellyfin/jellyfin:latest
volumes:
- /mnt/media:/media:ro # Read-only access to prevent accidental deletion
deploy:
placement:
constraints:
- node.labels.media==true # Deploy only on nodes with media mount
# ... other configurations
```
---
### Example 2: Nextcloud Data Access via NFS
Nextcloud, running as a Docker Swarm service, requires access to its application, configuration, and data files stored on the OMV NFS shares.
#### **Step 1: Create Mount Points**
```bash
sudo mkdir -p /mnt/nextcloud/{apps,config,data}
```
#### **Step 2: Test NFS Mounts**
```bash
# Test each mount
sudo mount -t nfs 192.168.1.70:/NextcloudApps /mnt/nextcloud/apps -o vers=4.2
sudo mount -t nfs 192.168.1.70:/NextcloudConfig /mnt/nextcloud/config -o vers=4.2
sudo mount -t nfs 192.168.1.70:/NextcloudData /mnt/nextcloud/data -o vers=4.2
# Verify
ls -la /mnt/nextcloud/apps
ls -la /mnt/nextcloud/config
ls -la /mnt/nextcloud/data
# Unmount tests
sudo umount /mnt/nextcloud/apps
sudo umount /mnt/nextcloud/config
sudo umount /mnt/nextcloud/data
```
#### **Step 3: Add Permanent Mounts to `/etc/fstab`**
```bash
sudo nano /etc/fstab
```
Add these lines:
```
192.168.1.70:/NextcloudApps /mnt/nextcloud/apps nfs auto,nofail,noatime,rw,vers=4.2,all_squash,anongid=33,anonuid=33 0 0
192.168.1.70:/NextcloudConfig /mnt/nextcloud/config nfs auto,nofail,noatime,rw,vers=4.2,all_squash,anongid=33,anonuid=33 0 0
192.168.1.70:/NextcloudData /mnt/nextcloud/data nfs auto,nofail,noatime,rw,vers=4.2,all_squash,anongid=33,anonuid=33 0 0
```
**Mount Options Explained:**
- `auto`: Mount at boot
- `nofail`: Don't fail boot if mount fails
- `noatime`: Don't update access times (performance)
- `rw`: Read-write
- `vers=4.2`: Use NFSv4.2 (better performance and security)
- `all_squash,anongid=33,anonuid=33`: Map all users to www-data
Mount all entries:
```bash
sudo mount -a
```
#### **Step 4: Configure the Nextcloud Docker Swarm Service**
```yaml
services:
nextcloud:
image: nextcloud:latest
volumes:
- /mnt/nextcloud/apps:/var/www/html/custom_apps
- /mnt/nextcloud/config:/var/www/html/config
- /mnt/nextcloud/data:/var/www/html/data
deploy:
placement:
constraints:
- node.labels.nextcloud==true
# ... other configurations
```
---
### Example 3: Database Storage via NFS
For stateful services like databases, storing their data on a resilient network share is critical for data integrity and high availability.
#### **Step 1: Create Mount Points**
```bash
sudo mkdir -p /mnt/database/{immich,nextcloud}
```
#### **Step 2: Test NFS Mounts**
```bash
# Test mounts
sudo mount -t nfs 192.168.1.70:/ImmichDB /mnt/database/immich -o vers=4.2
sudo mount -t nfs 192.168.1.70:/NextcloudDB /mnt/database/nextcloud -o vers=4.2
# Verify
ls -la /mnt/database/immich
ls -la /mnt/database/nextcloud
# Unmount tests
sudo umount /mnt/database/immich
sudo umount /mnt/database/nextcloud
```
#### **Step 3: Add Permanent Mounts to `/etc/fstab`**
```bash
sudo nano /etc/fstab
```
Add these lines:
```
192.168.1.70:/ImmichDB /mnt/database/immich nfs auto,nofail,noatime,rw,vers=4.2,sync,no_subtree_check,no_root_squash 0 0
192.168.1.70:/NextcloudDB /mnt/database/nextcloud nfs auto,nofail,noatime,rw,vers=4.2,sync,no_subtree_check,no_root_squash 0 0
```
**Critical for Databases:**
- `sync`: Ensures writes are committed to disk before acknowledgment (prevents data corruption)
- `no_root_squash`: Allows database containers running as root to maintain proper permissions
Mount all entries:
```bash
sudo mount -a
```
#### **Step 4: Configure Database Docker Swarm Services**
**Immich Database:**
```yaml
services:
immich-db:
image: tensorchord/pgvecto-rs:pg14-v0.2.0
volumes:
- /mnt/database/immich:/var/lib/postgresql/data
environment:
POSTGRES_PASSWORD: ${DB_PASSWORD}
POSTGRES_USER: immich
POSTGRES_DB: immich
deploy:
placement:
constraints:
- node.labels.database==true
```
**Nextcloud Database:**
```yaml
services:
nextcloud-db:
image: postgres:15-alpine
volumes:
- /mnt/database/nextcloud:/var/lib/postgresql/data
environment:
POSTGRES_PASSWORD: ${DB_PASSWORD}
POSTGRES_USER: nextcloud
POSTGRES_DB: nextcloud
deploy:
placement:
constraints:
- node.labels.database==true
```
---
### Example 4: Immich Upload Storage via NFS
```bash
# Create mount point
sudo mkdir -p /mnt/immich/uploads
# Add to /etc/fstab
192.168.1.70:/ImmichUploads /mnt/immich/uploads nfs auto,nofail,noatime,rw,vers=4.2,sync,no_subtree_check,all_squash,anonuid=999,anongid=999 0 0
# Mount
sudo mount -a
```
**Docker Service:**
```yaml
services:
immich-server:
image: ghcr.io/immich-app/immich-server:release
volumes:
- /mnt/immich/uploads:/usr/src/app/upload
# ... other configurations
```
---
### Example 5: Traefik Certificate Storage via NFS
```bash
# Create mount point
sudo mkdir -p /mnt/traefik/letsencrypt
# Add to /etc/fstab
192.168.1.70:/TraefikLetsEncrypt /mnt/traefik/letsencrypt nfs auto,nofail,noatime,rw,vers=4.2,sync,no_subtree_check,no_root_squash 0 0
# Mount
sudo mount -a
```
**Docker Service:**
```yaml
services:
traefik:
image: traefik:latest
volumes:
- /mnt/traefik/letsencrypt:/letsencrypt
# ... other configurations
```
---
## 4. Best Practices and Recommendations
### Security
1. **Use dedicated service accounts** with minimal required permissions
2. **Secure credential files** with `chmod 600`
3. **Limit NFS exports** to specific subnets or IPs when possible
4. **Use NFSv4.2** for improved security and performance
### Reliability
1. **Use `nofail` in fstab** to prevent boot failures if NFS is unavailable
2. **Test mounts manually** before adding to fstab
3. **Monitor NFS/SMB services** on OMV server
4. **Regular backups** of configuration and data
### Performance
1. **Use NFS for containerized applications** (better performance than SMB)
2. **Use `noatime`** to reduce write operations
3. **Use `sync` for databases** to ensure data integrity
4. **Consider `async` for media files** if performance is critical (with backup strategy)
### Verification Commands
```bash
# Check all mounts
mount | grep -E 'nfs|cifs'
# Check NFS statistics
nfsstat -m
# Test write permissions
touch /mnt/media/test.txt && rm /mnt/media/test.txt
# Check OMV exports (from OMV server)
sudo exportfs -v
# Check SMB status (from OMV server)
sudo smbstatus
```
---
## 5. Troubleshooting
### Issue: Mount hangs at boot
**Solution:** Add `nofail` option to fstab entries
### Issue: Permission denied errors
**Solution:**
- Verify UID/GID mappings match between NFS options and container user
- Check folder permissions on OMV server
- Ensure `no_root_squash` is set for services requiring root access
### Issue: Stale NFS handles
**Solution:**
```bash
# Unmount forcefully
sudo umount -f /mnt/path
# Or lazy unmount
sudo umount -l /mnt/path
# Restart NFS client
sudo systemctl restart nfs-client.target
```
### Issue: SMB connection refused
**Solution:**
- Verify SMB credentials
- Check SMB service status on OMV: `sudo systemctl status smbd`
- Verify firewall rules allow SMB traffic (ports 445, 139)
---
Your OMV server is now fully integrated with your Docker Swarm cluster, providing robust, centralized storage for all your containerized services.

View File

@@ -0,0 +1,238 @@
# OMV Command-Line (CLI) Setup Guide for Docker Swarm
This guide provides the necessary commands to configure OpenMediaVault (OMV) from the CLI for user management and to apply service configurations. For creating shared folders and configuring NFS/SMB shares, the **OpenMediaVault Web UI is the recommended and most robust approach** to ensure proper integration with OMV's internal database.
**Disclaimer:** While these commands are effective, making configuration changes via the CLI can be less intuitive than the Web UI. Always ensure you have backups. It's recommended to have a basic understanding of the OMV configuration database.
---
## **Phase 1: Initial Setup (User and Filesystem Identification)**
### **Step 1: Create the Swarm User**
First, create a dedicated user for your Swarm mounts.
```bash
# Create the user 'swarm-user'
sudo useradd -m swarm-user
# Set a password for the new user (you will be prompted)
sudo passwd swarm-user
# Get the UID and GID for later use
id swarm-user
# Example output: uid=1001(swarm-user) gid=1001(swarm-user)
```
### **Step 2: Identify Your Storage Drive**
You need the filesystem path for your storage drive. This is where the shared folders will be created.
```bash
# List mounted filesystems managed by OMV
sudo omv-show-fs
```
Look for your 512GB SSD and note its mount path (e.g., `/srv/dev-disk-by-uuid-fd2daa6f-bd75-4ac1-9c4c-9e4d4b84d845`). We will refer to this as `YOUR_MOUNT_PATH` for the rest of the guide.
---
## **Phase 2: Shared Folder and Service Configuration**
For creating shared folders and configuring services, you have two primary methods: the OMV Web UI (recommended for most users) and the `omv-rpc` command-line tool (for advanced users or scripting).
### **Method 1: OMV Web UI (Recommended)**
The safest and most straightforward way to configure OMV is through its web interface.
1. **Create Shared Folders:** Navigate to **Storage → Shared Folders** and create the new folders required for the Swarm integration:
* `ImmichUploads`
* `TraefikLetsEncrypt`
* `ImmichDB`
* `NextcloudDB`
* `NextcloudApps`
* `NextcloudConfig`
* `NextcloudData`
* `Media`
2. **Configure Permissions:** For each folder, set appropriate permissions:
* Navigate to **Storage → Shared Folders**, select a folder, click **Permissions**
* Add `swarm-user` with appropriate read/write permissions
* For database folders, ensure proper ownership (typically root or specific service user)
3. **Configure Services:**
* **For SMB:** Navigate to **Services → SMB/CIFS → Shares** and create shares for folders that need SMB access
* **For NFS:** Navigate to **Services → NFS → Shares** and create shares with appropriate client and privilege settings
### **Method 2: Advanced CLI Method (`omv-rpc`)**
This is the correct and verified method for creating shared folders from the command line in OMV 6 and 7.
#### **Step 3.1: Get the Storage UUID**
First, you must get the internal UUID that OMV uses for your storage drive.
```bash
# List all filesystems and their properties known to OMV
sudo omv-rpc "FileSystemMgmt" "enumerateFilesystems" '{}'
```
From the JSON output, find the object where the `devicefile` or `label` matches your drive. Copy the `uuid` value from that object. It will be a long string like `7f450873-134a-429c-9198-097a5293209f`.
#### **Step 3.2: Create the Shared Folders (CLI)**
**IMPORTANT:** The correct method for OMV 6+ uses the `ShareMgmt` service, not direct config manipulation.
```bash
# Set your storage UUID (replace with actual UUID from Step 3.1)
OMV_STORAGE_UUID="7f450873-134a-429c-9198-097a5293209f"
# Create shared folders using ShareMgmt service
sudo omv-rpc ShareMgmt setSharedFolder "{\"uuid\":\"$(uuidgen)\",\"name\":\"ImmichUploads\",\"mntentref\":\"${OMV_STORAGE_UUID}\",\"reldirpath\":\"ImmichUploads/\",\"comment\":\"Immich Uploads Storage\",\"permissions\":\"755\"}"
sudo omv-rpc ShareMgmt setSharedFolder "{\"uuid\":\"$(uuidgen)\",\"name\":\"TraefikLetsEncrypt\",\"mntentref\":\"${OMV_STORAGE_UUID}\",\"reldirpath\":\"TraefikLetsEncrypt/\",\"comment\":\"Traefik SSL Certificates\",\"permissions\":\"755\"}"
sudo omv-rpc ShareMgmt setSharedFolder "{\"uuid\":\"$(uuidgen)\",\"name\":\"ImmichDB\",\"mntentref\":\"${OMV_STORAGE_UUID}\",\"reldirpath\":\"ImmichDB/\",\"comment\":\"Immich Database Storage\",\"permissions\":\"700\"}"
sudo omv-rpc ShareMgmt setSharedFolder "{\"uuid\":\"$(uuidgen)\",\"name\":\"NextcloudDB\",\"mntentref\":\"${OMV_STORAGE_UUID}\",\"reldirpath\":\"NextcloudDB/\",\"comment\":\"Nextcloud Database Storage\",\"permissions\":\"700\"}"
sudo omv-rpc ShareMgmt setSharedFolder "{\"uuid\":\"$(uuidgen)\",\"name\":\"NextcloudApps\",\"mntentref\":\"${OMV_STORAGE_UUID}\",\"reldirpath\":\"NextcloudApps/\",\"comment\":\"Nextcloud Apps\",\"permissions\":\"755\"}"
sudo omv-rpc ShareMgmt setSharedFolder "{\"uuid\":\"$(uuidgen)\",\"name\":\"NextcloudConfig\",\"mntentref\":\"${OMV_STORAGE_UUID}\",\"reldirpath\":\"NextcloudConfig/\",\"comment\":\"Nextcloud Config\",\"permissions\":\"755\"}"
sudo omv-rpc ShareMgmt setSharedFolder "{\"uuid\":\"$(uuidgen)\",\"name\":\"NextcloudData\",\"mntentref\":\"${OMV_STORAGE_UUID}\",\"reldirpath\":\"NextcloudData/\",\"comment\":\"Nextcloud User Data\",\"permissions\":\"755\"}"
sudo omv-rpc ShareMgmt setSharedFolder "{\"uuid\":\"$(uuidgen)\",\"name\":\"Media\",\"mntentref\":\"${OMV_STORAGE_UUID}\",\"reldirpath\":\"Media/\",\"comment\":\"Media Files for Jellyfin\",\"permissions\":\"755\"}"
```
#### **Step 3.3: Verify Shared Folders Were Created**
```bash
# List all shared folders
sudo omv-rpc ShareMgmt getSharedFoldersList '{"start":0,"limit":25}'
# Or use the simpler command
omv-showkey conf.system.sharedfolder
```
#### **Step 3.4: Set Folder Permissions (CLI)**
After creating folders, set proper ownership and permissions on the actual directories:
```bash
# Replace with your actual mount path
MOUNT_PATH="/srv/dev-disk-by-uuid-fd2daa6f-bd75-4ac1-9c4c-9e4d4b84d845"
# Get swarm-user UID and GID (noted from Step 1)
SWARM_UID=1001 # Replace with actual UID
SWARM_GID=1001 # Replace with actual GID
# Set ownership for media folders
sudo chown -R ${SWARM_UID}:${SWARM_GID} "${MOUNT_PATH}/Media"
sudo chown -R ${SWARM_UID}:${SWARM_GID} "${MOUNT_PATH}/ImmichUploads"
# Database folders should be owned by root with restricted permissions
sudo chown -R root:root "${MOUNT_PATH}/ImmichDB"
sudo chown -R root:root "${MOUNT_PATH}/NextcloudDB"
sudo chmod 700 "${MOUNT_PATH}/ImmichDB"
sudo chmod 700 "${MOUNT_PATH}/NextcloudDB"
# Nextcloud folders should use www-data (UID 33, GID 33)
sudo chown -R 33:33 "${MOUNT_PATH}/NextcloudApps"
sudo chown -R 33:33 "${MOUNT_PATH}/NextcloudConfig"
sudo chown -R 33:33 "${MOUNT_PATH}/NextcloudData"
# Traefik folder
sudo chown -R root:root "${MOUNT_PATH}/TraefikLetsEncrypt"
sudo chmod 700 "${MOUNT_PATH}/TraefikLetsEncrypt"
```
#### **Step 3.5: Configure NFS Shares (CLI)**
**Note:** Configuring NFS shares via CLI is complex. The Web UI is strongly recommended. However, if needed:
```bash
# Get the shared folder UUIDs first
sudo omv-rpc ShareMgmt getSharedFoldersList '{"start":0,"limit":25}' | grep -A5 "ImmichDB"
# Example NFS share creation (requires the shared folder UUID)
# Replace SHAREDFOLDER_UUID with the actual UUID from above
sudo omv-rpc Nfs setShare "{\"uuid\":\"$(uuidgen)\",\"sharedfolderref\":\"SHAREDFOLDER_UUID\",\"client\":\"192.168.1.0/24\",\"options\":\"rw,sync,no_subtree_check,no_root_squash\",\"comment\":\"\"}"
```
**This is error-prone. Use the Web UI for NFS/SMB configuration.**
---
## **Phase 3: Apply Configuration Changes**
### **Step 4: Apply All OMV Configuration Changes**
After making all shared folder and service configurations, apply the changes:
```bash
# Apply shared folder configuration
sudo omv-salt deploy run sharedfolder
# Apply the SMB configuration (if SMB shares were configured)
sudo omv-salt deploy run samba
# Apply the NFS configuration (if NFS shares were configured)
sudo omv-salt deploy run nfs
# Apply general OMV configuration changes
sudo omv-salt deploy run phpfpm nginx
# Restart services to ensure all changes take effect
sudo systemctl restart nfs-kernel-server
sudo systemctl restart smbd
```
### **Step 5: Verify Services are Running**
```bash
# Check NFS status
sudo systemctl status nfs-kernel-server
# Check SMB status
sudo systemctl status smbd
# List active NFS exports
sudo exportfs -v
# List SMB shares
sudo smbstatus --shares
```
---
## **Troubleshooting**
### Check OMV Logs
```bash
# General OMV logs
sudo journalctl -u openmediavault-engined -f
# NFS logs
sudo journalctl -u nfs-kernel-server -f
# SMB logs
sudo journalctl -u smbd -f
```
### Verify Mount Points on Docker Nodes
After setting up OMV, verify that Docker nodes can access the shares:
```bash
# Test NFS mount
sudo mount -t nfs 192.168.1.70:/ImmichDB /mnt/test
# Test SMB mount
sudo mount -t cifs //192.168.1.70/Media /mnt/test -o credentials=/root/.smbcredentials
# Unmount test
sudo umount /mnt/test
```
---
Your OMV server is now fully configured to provide the necessary shares for your Docker Swarm cluster. You can now proceed with configuring the mounts on your Swarm nodes as outlined in the main `OMV.md` guide.

View File

@@ -0,0 +1,295 @@
# Docker Swarm Stack Migration Guide
## Overview
This guide helps you safely migrate from the old stack configurations to the new fixed versions with Docker secrets, health checks, and improved reliability.
## ⚠️ IMPORTANT: Read Before Starting
- **Backup first**: `docker service ls > services-backup.txt`
- **Downtime**: Expect 2-5 minutes per stack during migration
- **Secrets**: Must be created before deploying new stacks
- **Order matters**: Follow the deployment sequence below
---
## Pre-Migration Checklist
- [ ] Review [SWARM_STACK_REVIEW.md](file:///workspace/homelab/docs/reviews/SWARM_STACK_REVIEW.md)
- [ ] Backup current service configurations
- [ ] Ensure you're on a Swarm manager node
- [ ] Have strong passwords ready for secrets
- [ ] Test with one non-critical stack first
---
## Step 1: Create Docker Secrets
**Run the secrets creation script:**
```bash
sudo bash /workspace/homelab/scripts/create_docker_secrets.sh
```
**You'll be prompted for:**
- `paperless_db_password` - Strong password for Paperless DB (20+ chars)
- `paperless_secret_key` - Django secret key (50+ random chars)
- `grafana_admin_password` - Grafana admin password
- `duckdns_token` - Your DuckDNS API token
**Generate secure secrets:**
```bash
# PostgreSQL password (20 chars)
openssl rand -base64 20
# Django secret key (50 chars)
openssl rand -base64 50 | tr -d '\n'
```
**Verify secrets created:**
```bash
docker secret ls
```
---
## Step 2: Migration Sequence
### Phase 1: Infrastructure Stack (Watchtower & TSDProxy)
> **Note for HAOS Users**: This stack uses named volumes `tsdproxy_config` and `tsdproxy_data` instead of bind mounts to avoid read-only filesystem errors.
```bash
# Remove old full stack if running
docker stack rm full-stack
# Deploy infrastructure
docker stack deploy -c /workspace/homelab/services/swarm/stacks/infrastructure.yml infrastructure
# Verify
docker service ls | grep infrastructure
```
**What Changed:**
- ✅ Split from monolithic stack
- ✅ TSDProxy uses named volumes (HAOS compatible)
- ✅ Watchtower configured for daily cleanup
-**Added Komodo** (Core, Mongo, Periphery) for container management
---
### Phase 2: Productivity Stack (Paperless, PDF, Search)
```bash
# Ensure secrets exist first!
docker stack deploy -c /workspace/homelab/services/swarm/stacks/productivity.yml productivity
```
**What Changed:**
- ✅ Split from monolithic stack
- ✅ Uses existing secrets and networks
- ✅ Dedicated stack for document tools
---
### Phase 3: AI Stack (OpenWebUI)
```bash
docker stack deploy -c /workspace/homelab/services/swarm/stacks/ai.yml ai
```
**What Changed:**
- ✅ Dedicated stack for AI workloads
- ✅ Resource limits preserved
---
### Phase 4: Other Stacks (Monitoring, Portainer, Networking)
Follow the original instructions for these stacks as they remain unchanged.
---
## HAOS Specific Notes
If you are running on Home Assistant OS (HAOS), the root filesystem is read-only.
- **Do not use bind mounts** to paths like `/srv`, `/home`, or `/etc` (except `/etc/localtime`).
- **Use named volumes** for persistent data.
- **TSDProxy Config**: Since we switched to a named volume `tsdproxy_config`, you may need to populate it if you have a custom config.
```bash
# Example: Copy config to volume (run on manager)
# Find the volume path (might be difficult on HAOS, easier to use `docker cp` to a dummy container mounting the volume)
```
---
## Step 3: Post-Migration Validation
### Automated Validation
```bash
bash /workspace/homelab/scripts/validate_deployment.sh
```
### Manual Checks
```bash
# 1. All services running
docker service ls
# 2. All containers healthy
docker ps --filter "health=healthy"
# 3. No unhealthy containers
docker ps --filter "health=unhealthy"
# 4. Check secrets in use
docker secret ls
# 5. Verify resource usage
docker stats --no-stream
```
### Test Each Service
- ✅ Grafana: https://grafana.sj98.duckdns.org
- ✅ Prometheus: https://prometheus.sj98.duckdns.org
- ✅ Portainer: https://portainer.sj98.duckdns.org
- ✅ Paperless: https://paperless.sj98.duckdns.org
- ✅ OpenWebUI: https://ai.sj98.duckdns.org
- ✅ PDF: https://pdf.sj98.duckdns.org
- ✅ Search: https://search.sj98.duckdns.org
- ✅ Dozzle: https://dozzle.sj98.duckdns.org
---
## Troubleshooting
### Services Won't Start
```bash
# Check logs
docker service logs <service_name>
# Check secrets
docker secret inspect <secret_name>
# Check constraints
docker node ls
docker node inspect <node_id> | grep Labels
```
### Health Checks Failing
```bash
# View health status
docker inspect <container_id> | jq '.[0].State.Health'
# Check logs
docker logs <container_id>
# Disable health check temporarily (for debugging)
# Edit stack file and remove healthcheck section
```
### Secrets Not Found
```bash
# Recreate secret
echo -n "your_password" | docker secret create secret_name -
# Update service
docker service update --secret-add secret_name service_name
```
### Memory Limits Too Strict
```bash
# If services are being killed, increase limits in stack file
# Then redeploy:
docker stack deploy -c stack.yml stack_name
```
---
## Rollback Procedures
### Rollback Single Service
```bash
# Get previous version
docker service inspect <service_name> --pretty
# Rollback
docker service rollback <service_name>
```
### Rollback Entire Stack
```bash
# Remove new stack
docker stack rm <stack_name>
sleep 30
# Deploy from backup (old stack file)
docker stack deploy -c /path/to/old/stack.yml stack_name
```
### Remove Secrets (if needed)
```bash
# This only works if no services are using the secret
docker secret rm <secret_name>
```
---
## Performance Comparison
| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| **Security Score** | 6.0/10 | 9.5/10 | +58% |
| **Hardcoded Secrets** | 3 | 0 | ✅ Fixed |
| **Services with Health Checks** | 0 | 100% | ✅ Added |
| **Services with Restart Policies** | 10% | 100% | ✅ Added |
| **Traefik Replicas** | 1 | 2 | ✅ HA |
| **Memory on Pi 4** | 6GB+ | 4.5GB | -25% |
| **Log Disk Usage Risk** | High | Low | ✅ Limits |
| **Services with Pinned Versions** | 60% | 100% | ✅ Stable |
---
## Maintenance
### Update a Secret
```bash
# 1. Create new secret with different name
echo -n "new_password" | docker secret create paperless_db_password_v2 -
# 2. Update service to use new secret
docker service update \
--secret-rm paperless_db_password \
--secret-add source=paperless_db_password_v2,target=paperless_db_password \
full-stack_paperless
# 3. Remove old secret
docker secret rm paperless_db_password
```
### Regular Health Checks
```bash
# Weekly check
bash /workspace/homelab/scripts/quick_status.sh
# Monthly validation
bash /workspace/homelab/scripts/validate_deployment.sh
```
---
## Summary
### Total Changes
- **6 stack files fixed**
- **3 Docker secrets created**
- **100% of services** now have health checks
- **100% of services** now have restart policies
- **100% of services** now have logging limits
- **0 hardcoded passwords** remaining
- **2× Traefik replicas** for high availability
### Estimated Migration Time
- Secrets creation: 5 minutes
- Stack-by-stack migration: 20-30 minutes
- Validation: 10 minutes
- **Total: 35-45 minutes**
---
**Migration completed successfully?** Run the quick status:
```bash
bash /workspace/homelab/scripts/quick_status.sh
```

View File

@@ -0,0 +1,13 @@
# Swarm Migration from HAOS to Ubuntu Container
## Reason for Migration
The Docker Swarm leader node was previously running on the Home Assistant OS (HAOS). This caused conflicts with HAOS, which also utilizes Docker. To resolve these conflicts and create a more stable environment, the swarm was dismantled and recreated.
## New Architecture
The Docker Swarm is now running within a dedicated Ubuntu container on the same HAOS machine. This isolates the swarm environment from the HAOS Docker environment, preventing future conflicts.
## Consequences
As a result of this migration, the old swarm was destroyed. This action necessitated the redeployment of all stacks and services, including Portainer and Traefik. The disconnection of the Portainer UI and the broken Traefik dashboard are direct consequences of this necessary migration. The services need to be redeployed on the new swarm to restore functionality.

View File

@@ -0,0 +1,77 @@
# Health Check Examples for Docker Compose/Swarm
## Example 1: Portainer with Health Check
```yaml
version: '3.8'
services:
portainer:
image: portainer/portainer-ce:latest
healthcheck:
test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:9000/api/status"]
interval: 30s
timeout: 10s
retries: 3
start_period: 40s
deploy:
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 3
```
## Example 2: OpenWebUI with Health Check
```yaml
version: '3.8'
services:
openwebui:
image: ghcr.io/open-webui/open-webui:latest
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
interval: 30s
timeout: 10s
retries: 3
start_period: 60s
deploy:
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 3
```
## Example 3: Nextcloud with Health Check
```yaml
version: '3.8'
services:
nextcloud:
image: nextcloud:latest
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:80/status.php"]
interval: 60s
timeout: 10s
retries: 3
start_period: 120s
deploy:
restart_policy:
condition: on-failure
delay: 10s
max_attempts: 3
```
## Implementation Notes
- **interval**: How often to check (30-60s for most services)
- **timeout**: Max time to wait for check to complete
- **retries**: Number of consecutive failures before marking unhealthy
- **start_period**: Grace period after container start before checking
## Auto-Restart Configuration
All services should have restart policies configured:
- **condition**: `on-failure` or `any`
- **delay**: Time to wait before restarting
- **max_attempts**: Maximum restart attempts
## Monitoring Health Status
Check container health with:
```bash
docker ps --filter "health=unhealthy"
docker inspect <container_id> | jq '.[0].State.Health'
```

View File

@@ -0,0 +1,33 @@
# Fixing Portainer Error: "The environment named local is unreachable"
## Problem
After migrating the Docker Swarm to an Ubuntu container, the Portainer UI shows the error "The environment named local is unreachable".
## Cause
This error means the Portainer server container cannot communicate with the Docker daemon it is supposed to manage. This communication happens through the Docker socket file, located at `/var/run/docker.sock`.
In your nested environment (HAOS > Ubuntu Container > Portainer Container), the issue is almost certainly that the user inside the Portainer container does not have the necessary file permissions to access the `/var/run/docker.sock` file that belongs to the Ubuntu container's Docker instance.
## Solution (To be performed in your deployment environment)
You need to ensure the Portainer container runs with a user that has permission to access the Docker socket.
**1. Find the Docker Group ID:**
First, SSH into your Ubuntu container that is running the swarm. Then, run this command to find the group ID (`gid`) that owns the Docker socket:
```bash
stat -c '%g' /var/run/docker.sock
```
This will return a number. This is the `DOCKER_GROUP_ID`.
**2. Edit the `portainer-stack.yml`:**
You need to add a `user` directive to the `portainer` service definition in your `portainer-stack.yml` file. This tells the service to run as the `root` user and with the Docker group, granting it the necessary permissions.
I will make this edit for you now, using a placeholder for the group ID. **You will need to replace `DOCKER_GROUP_ID_HERE` with the number you get from the command above before you deploy.**
This is the most common and secure way to resolve this issue without granting full `privileged` access.

View File

@@ -0,0 +1,39 @@
# Proxmox USB Network Adapter Fix
This document outlines a solution to the intermittent network disconnection issue on the Acer Proxmox host, where the USB network adapter drops its connection and does not reconnect automatically.
## The Problem
The Acer Proxmox host (`192.168.1.57`) uses a USB-to-Ethernet adapter for its 2.5 GbE connection. This adapter occasionally disconnects and fails to reconnect on its own, disrupting network access for the host and its VMs.
## The Solution
A shell script, `network_check.sh`, has been created to monitor the network connection. If the connection is down, the script will attempt to reset the USB adapter. If that fails, it will reboot the host to restore connectivity. This script is intended to be run as a cron job at regular intervals.
### 1. The `network_check.sh` Script
The script performs the following actions:
1. Pings a reliable external IP address (e.g., `8.8.8.8`) to check for internet connectivity.
2. If the ping fails, it identifies the USB network adapter's bus and device number.
3. It then attempts to reset the USB device.
4. If the network connection is still not restored after resetting the adapter, the script will force a reboot.
The script is located at `/usr/local/bin/network_check.sh`.
### 2. Cron Job Setup
To automate the execution of the script, a cron job should be set up to run every 5 minutes.
**To add the cron job, follow these steps:**
1. Open the crontab editor:
```bash
crontab -e
```
2. Add the following line to the file:
```
*/5 * * * * /bin/bash /usr/local/bin/network_check.sh
```
3. Save and exit the editor.
This will ensure that the network connection is checked every 5 minutes, and the appropriate action is taken if a disconnection is detected.

View File

@@ -0,0 +1,44 @@
# Docker Swarm Node Labeling Guide
This guide provides the commands to apply the correct labels to your Docker Swarm nodes, ensuring that services are scheduled on the appropriate hardware.
Run the following commands in your terminal on a manager node to label each of your swarm nodes.
### 1. Label the Leader Node
This node will run general-purpose applications.
```bash
docker node update --label-add leader=true <node-name>
```
### 2. Label the Manager Node
This node will run core services like Traefik and Portainer.
```bash
docker node update --label-add manager=true <node-name>
```
### 3. Label the Heavy Worker Node
This node is for computationally intensive workloads like AI and machine learning.
```bash
docker node update --label-add heavy=true <node-name>
```
### 4. Label the Fedora Worker Node
This node is the primary heavy worker.
```bash
docker node update --label-add heavy=true fedora
```
## Verify Labels
After applying the labels, you can verify them by inspecting each node. For example, to check the labels for a node, run:
```bash
docker node inspect <node-name> --pretty
```
Look for the "Labels" section in the output to confirm the changes.

View File

@@ -0,0 +1,283 @@
# Final Traefik v3 Setup and Fix Guide
This guide provides the complete, step-by-step process to cleanly remove any old Traefik configurations and deploy a fresh, working Traefik v3 setup on Docker Swarm.
**Follow these steps in order on your Docker Swarm manager node.**
---
### Step 1: Complete Removal of Old Traefik Components
First, we will ensure the environment is completely clean.
1. **Remove the Stack:**
- In Portainer, go to "Stacks", select your `networking-stack`, and click **Remove**. Wait for it to be successfully removed.
2. **Remove the Docker Config:**
- Run this command in your manager node's terminal:
```zsh
docker config rm traefik.yml
```
*(It's okay if this command says the config doesn't exist.)*
3. **Remove the Docker Volume:**
- This will delete your old Let's Encrypt certificates, which is necessary for a clean start.
```zsh
docker volume rm traefik_letsencrypt
```
*(It's okay if this command says the volume doesn't exist.)*
4. **Remove the Local Config File (if it exists):**
```zsh
rm ./traefik.yml
```
---
### Step 2: Create the Correct Traefik v3 Configuration
We will use the `busybox` container method to create the configuration file.
1. **Create `traefik.yml`:**
- **IMPORTANT:** Replace `your-email@example.com` with your actual email address in the block below.
- Copy the entire multi-line block and paste it into your Zsh terminal.
- After pasting, the terminal will show a `>` on a new line. This is normal. **Simply type `EOF` and press Enter** to finish the command.
```zsh
# --- Creates the traefik.yml file in a temporary container and copies it out ---
docker run --rm -i -v "$(pwd):/host" busybox sh -c 'cat > /host/traefik.yml <<\'EOF\'
checkNewVersion: true
sendAnonymousUsage: false
log:
level: INFO
api:
dashboard: true
insecure: false
entryPoints:
web:
address: ":80"
http:
redirections:
entryPoint:
to: websecure
scheme: https
websecure:
address: ":443"
http:
tls:
certResolver: leresolver
providers:
swarm: # <-- Use the swarm provider in Traefik v3
endpoint: "unix:///var/run/docker.sock"
network: traefik-public
exposedByDefault: false
# Optionally keep the docker provider if you run non-swarm local containers.
# docker:
# network: traefik-public
# exposedByDefault: false
certificatesResolvers:
leresolver:
acme:
email: "your-email@example.com"
storage: "/letsencrypt/acme.json"
dnsChallenge:
provider: duckdns
delayBeforeCheck: 30s
resolvers:
- "192.168.1.196:53"
- "192.168.1.245:53"
- "192.168.1.62:53"
EOF'
```
2. **Create the Docker Swarm Config:**
- This command ingests the file you just created into Swarm.
```zsh
docker config create traefik.yml ./traefik.yml
```
3. **Create and Prepare the Let's Encrypt Volume:**
- Create the volume:
```zsh
docker volume create traefik_letsencrypt
```
- Create the empty `acme.json` file with the correct permissions:
```zsh
docker run --rm -v traefik_letsencrypt:/letsencrypt busybox sh -c "touch /letsencrypt/acme.json && chmod 600 /letsencrypt/acme.json"
```
---
### Step 3: Deploy the Corrected `networking-stack`
1. **Deploy via Portainer:**
- Go to "Stacks" > "Add stack".
- Name it `networking-stack`.
- Copy the YAML content below and paste it into the web editor.
- **IMPORTANT:** Replace `YOUR_DUCKDNS_TOKEN` with your actual DuckDNS token.
- Click "Deploy the stack".
```yaml
version: '3.9'
networks:
traefik-public:
external: true
volumes:
traefik_letsencrypt:
external: true
configs:
traefik_yml:
external: true
name: traefik.yml
services:
traefik:
image: traefik:latest # Or pin to traefik:v3.0 for stability
ports:
- "80:80"
- "443:443"
volumes:
- /var/run/docker.sock:/var/run/docker.sock
- traefik_letsencrypt:/letsencrypt
networks:
- traefik-public
environment:
- "DUCKDNS_TOKEN=YOUR_DUCKDNS_TOKEN"
configs:
- source: traefik_yml
target: /traefik.yml
deploy:
labels:
- "traefik.enable=true"
- "traefik.http.routers.traefik.rule=Host(`traefik.sj98.duckdns.org`)"
- "traefik.http.routers.traefik.entrypoints=websecure"
- "traefik.http.routers.traefik.tls.certresolver=leresolver"
- "traefik.http.routers.traefik.service=api@internal"
placement:
constraints:
- node.role == manager
whoami:
image: traefik/whoami
networks:
- traefik-public
deploy:
labels:
- "traefik.enable=true"
- "traefik.http.routers.whoami.rule=Host(`whoami.sj98.duckdns.org`)"
- "traefik.http.routers.whoami.entrypoints=websecure"
- "traefik.http.routers.whoami.tls.certresolver=leresolver"
- "traefik.http.services.whoami.loadbalancer.server.port=80"
```
---
### Step 4: Verify and Redeploy Other Stacks
1. **Wait and Verify:**
- Wait for 2-3 minutes for the stack to deploy and for the certificate to be issued.
- Open your browser and navigate to `https://traefik.sj98.duckdns.org`. The Traefik dashboard should load.
- You should see routers for `traefik` and `whoami`.
2. **Redeploy Corrected Stacks:**
- Now that Traefik is working, go to Portainer and redeploy your `full-stack-complete.yml` and `monitoring-stack.yml` to apply the fixes we made earlier.
- The services from those stacks (Paperless, Prometheus, etc.) should now appear in the Traefik dashboard and be accessible via their URLs.
### Chat GPT Fix
Traefik Swarm Stack Fix Instructions
1. Verify Networks
Make sure all web-exposed services are attached to the traefik-public network:
networks:
- traefik-public
Internal-only services (DB, Redis, etc.) should not be on Traefik network.
2. Assign Unique Router Names
Every service exposed via Traefik must have a unique router label:
labels:
- "traefik.enable=true"
- "traefik.http.routers.<service>-router.rule=Host(`<subdomain>.sj98.duckdns.org`)"
- "traefik.http.routers.<service>-router.entrypoints=websecure"
- "traefik.http.routers.<service>-router.tls.certresolver=leresolver"
- "traefik.http.routers.<service>-router.service=<service>@swarm"
- "traefik.http.services.<service>.loadbalancer.server.port=<port>"
Replace <service>, <subdomain>, and <port> for each stack.
3. Update Traefik ACME Configuration
In traefik.yml, use:
certificatesResolvers:
leresolver:
acme:
email: "your-email@example.com"
storage: "/letsencrypt/acme.json"
dnsChallenge:
provider: duckdns
propagation:
delayBeforeChecks: 60s
resolvers:
- "192.168.1.196:53"
- "192.168.1.245:53"
- "192.168.1.62:53"
Note: delayBeforeCheck is deprecated. Use propagation.delayBeforeChecks.
4. Internal Services Configuration
• Redis / Postgres / other internal services
Do not expose them via Traefik.
Attach them to backend networks only:
networks:
- homelab-backend
• Only web services should have Traefik labels.
5. Deploy Services Correctly
1. Deploy Traefik first.
2. Deploy each routed service one at a time to allow ACME certificate issuance.
3. Verify logs for any Router defined multiple times or port is missing errors.
6. Checklist for Each Service
Service Hostname Port Traefik Router Name Network Notes
example-svc example.sj98.duckdns.org 8080 example-svc-router traefik-public Replace placeholders
another-svc another.sj98.duckdns.org 8000 another-svc-router traefik-public Only if web-exposed
• Fill in each services hostname, port, and network.
• Internal services do not need Traefik labels.
7. Common Issues
• Duplicate Router Names: Make sure every router has a unique label.
• Missing Ports: Each Traefik router must reference the service port with loadbalancer.server.port.
• ACME Failures: Ensure DuckDNS token is correct and propagation delay is set.
• Wrong Network: Only services on traefik-public are routable; internal services must use backend networks.

View File

@@ -0,0 +1,288 @@
# Traefik Setup Guide for Docker Swarm
This guide provides the step-by-step instructions to correctly configure and deploy Traefik in a Docker Swarm environment, especially when dealing with potentially read-only host filesystems.
This method uses Docker Configs and Docker Volumes to manage Traefik's configuration and data, which is the standard best practice for Swarm. All commands should be run on your **Docker Swarm manager node**.
---
### Step 1: Create the `traefik.yml` Configuration File
This step creates the Traefik static configuration file. You have two options:
#### Option A: Using `sudo tee` (Direct Host Write)
This command uses a `HEREDOC` with `sudo tee` to write the `traefik.yml` file directly to your manager node's filesystem. This is generally straightforward if your manager node's filesystem is writable.
**Action:**
1. **IMPORTANT:** Replace `your-email@example.com` with your actual email address in the command below.
2. Copy and paste the entire block into your Zsh terminal on the manager node.
```zsh
# --- Creates the traefik.yml file ---
sudo tee ./traefik.yml > /dev/null <<'EOF'
global:
checkNewVersion: true
sendAnonymousUsage: false
log:
level: INFO
api:
dashboard: true
insecure: false
entryPoints:
web:
address: ":80"
http:
redirections:
entryPoint:
to: websecure
scheme: https
websecure:
address: ":443"
providers:
docker:
network: traefik-public
exposedByDefault: false
certificatesResolvers:
leresolver:
acme:
email: "your-email@example.com"
storage: "/letsencrypt/acme.json"
dnsChallenge:
provider: duckdns
delayBeforeCheck: "120s"
EOF
```
#### Option B: Using `docker run` (Via Temporary Container)
This method creates the `traefik.yml` file *inside* a temporary `busybox` container and then copies it to your manager node's current directory. This is useful if you prefer to avoid direct `sudo tee` or if you're working in an environment where direct file creation is restricted.
**Action:**
1. **IMPORTANT:** Replace `your-email@example.com` with your actual email address in the command below.
2. Copy and paste the entire block into your Zsh terminal on the manager node.
```zsh
# --- Creates the traefik.yml file in a temporary container and copies it out ---
docker run --rm -i -v "$(pwd):/host" busybox sh -c 'cat > /host/traefik.yml <<\'EOF\'
checkNewVersion: true
sendAnonymousUsage: false
log:
level: INFO
api:
dashboard: true
insecure: false
entryPoints:
web:
address: ":80"
http:
redirections:
entryPoint:
to: websecure
scheme: https
websecure:
address: ":443"
http:
tls:
certResolver: leresolver
providers:
docker:
network: traefik-public
exposedByDefault: false
certificatesResolvers:
leresolver:
acme:
email: "your-email@example.com"
storage: "/letsencrypt/acme.json"
dnsChallenge:
provider: duckdns
delayBeforeCheck: 30s
resolvers:
- "192.168.1.196:53"
- "192.168.1.245:53"
- "192.168.1.62:53"
EOF'
```
> **Note on Versioning:** The `traefik:latest` tag can introduce unexpected breaking changes, as seen here. For production or stable environments, it is highly recommended to pin to a specific version in your stack file, for example: `image: traefik:v2.11` or `image: traefik:v3.0`.
---
### Step 2: Create the Docker Swarm Config
This command ingests the `traefik.yml` file (created in Step 1) into Docker Swarm, making it securely available to services.
**Action:** Run the following command on your manager node.
```zsh
docker config create traefik.yml ./traefik.yml
```
---
### Step 3: Create the Let's Encrypt Volume
This creates a managed Docker Volume that will persist your TLS certificates.
**Action:** Run the following command on your manager node.
```zsh
docker volume create traefik_letsencrypt
```
---
### Step 4: Prepare the `acme.json` File
Traefik requires an `acme.json` file to exist with the correct permissions before it can start. This command creates the empty file inside the volume you just made.
**Action:** Run the following command on your manager node.
```zsh
docker run --rm -v traefik_letsencrypt:/letsencrypt busybox sh -c "touch /letsencrypt/acme.json && chmod 600 /letsencrypt/acme.json"
```
---
### Step 5: Update and Deploy the `networking-stack.yml`
You can now deploy your `networking-stack` using the YAML below. It has been modified to use the Swarm config and volume instead of host paths.
**Action:**
1. **IMPORTANT:** Replace `YOUR_DUCKDNS_TOKEN` with your actual DuckDNS token in the `environment` section.
2. Upload this YAML content to Portainer to deploy your stack.
```yaml
version: '3.9'
networks:
traefik-public:
external: true
volumes:
traefik_letsencrypt:
external: true
configs:
traefik_yml:
external: true
name: traefik.yml
services:
traefik:
image: traefik:latest
ports:
- "80:80"
- "443:443"
volumes:
- /var/run/docker.sock:/var/run/docker.sock
- traefik_letsencrypt:/letsencrypt
networks:
- traefik-public
environment:
- "DUCKDNS_TOKEN=YOUR_DUCKDNS_TOKEN"
configs:
- source: traefik_yml
target: /traefik.yml
deploy:
labels:
- "traefik.enable=true"
- "traefik.http.routers.traefik.rule=Host(`traefik.sj98.duckdns.org`)"
- "traefik.http.routers.traefik.entrypoints=websecure"
- "traefik.http.routers.traefik.tls.certresolver=leresolver"
- "traefik.http.routers.traefik.service=api@internal"
placement:
constraints:
- node.role == manager
whoami:
image: traefik/whoami
networks:
- traefik-public
deploy:
labels:
- "traefik.enable=true"
- "traefik.http.routers.whoami.rule=Host(`whoami.sj98.duckdns.org`)"
- "traefik.http.routers.whoami.entrypoints=websecure"
- "traefik.http.routers.whoami.tls.certresolver=leresolver"
- "traefik.http.services.whoami.loadbalancer.server.port=80"
```
---
### Step 6: Clean Up (Optional)
Since the configuration is now stored in Docker Swarm, you can remove the local `traefik.yml` file from your manager node's filesystem.
**Action:** Run the following command on your manager node.
```zsh
rm ./traefik.yml
```
---
### Troubleshooting and Removal
If you encounter an error and need to start the setup process over, follow these steps to cleanly remove all the components you created. Run these commands on your **Docker Swarm manager node**.
#### Step 1: Remove the Stack
First, remove the deployed stack from your Swarm.
**Action:**
- In Portainer, go to "Stacks", select your `networking-stack`, and click "Remove".
#### Step 2: Remove the Docker Config
This removes the Traefik configuration that was stored in the Swarm.
**Action:**
```zsh
docker config rm traefik.yml
```
#### Step 3: Remove the Docker Volume
This deletes the volume that was storing your Let's Encrypt certificates. **Warning:** This will delete your existing certificates.
**Action:**
```zsh
docker volume rm traefik_letsencrypt
```
#### Step 4: Remove the Local Config File (If Present)
If you didn't delete the `traefik.yml` file in the optional clean-up step, remove it now.
**Action:**
```zsh
rm ./traefik.yml
```
After completing these steps, your environment will be clean, and you can safely re-run the setup guide from the beginning.
---
### Step 7: Verify Traefik Dashboard
Once your `networking-stack` is deployed and Traefik has started, you can verify its functionality by accessing the Traefik dashboard.
**Action:**
1. Open your web browser and navigate to the Traefik dashboard:
- **Traefik Dashboard:** `https://traefik.sj98.duckdns.org`
You should see the Traefik dashboard, listing your routers and services. If you see a certificate warning, it might take a moment for Let's Encrypt to issue the certificate. If the dashboard loads, Traefik is running correctly.

View File

@@ -0,0 +1,46 @@
# Traefik URLs
This file contains a list of all the Traefik URLs defined in the Docker Swarm stack files.
## Media Stack (`docker-swarm-media-stack.yml`)
- **Homarr:** [`homarr.sj98.duckdns.org`](https://homarr.sj98.duckdns.org)
- **Plex:** [`plex.sj98.duckdns.org`](https://plex.sj98.duckdns.org)
- **Jellyfin:** [`jellyfin.sj98.duckdns.org`](https://jellyfin.sj98.duckdns.org)
- **Immich:** [`immich.sj98.duckdns.org`](https://immich.sj98.duckdns.org)
## Full Stack (`full-stack-complete.yml`)
- **OpenWebUI:** `ai.sj98.duckdns.org`
- **Paperless-ngx:** `paperless.sj98.duckdns.org`
- **Stirling-PDF:** `pdf.sj98.duckdns.org`
- **SearXNG:** `search.sj98.duckdns.org`
- **TSDProxy:** `tsdproxy.sj98.duckdns.org`
## Monitoring Stack (`monitoring-stack.yml`)
- **Prometheus:** `prometheus.sj98.duckdns.org`
- **Grafana:** `grafana.sj98.duckdns.org`
- **Alertmanager:** `alertmanager.sj98.duckdns.org`
## Networking Stack (`networking-stack.yml`)
- **whoami:** `whoami.sj98.duckdns.org`
## Tools Stack (`tools-stack.yml`)
- **Portainer:** `portainer.sj98.duckdns.org`
- **Dozzle:** `dozzle.sj98.duckdns.org`
- **Lazydocker:** `lazydocker.sj98.duckdns.org`
## Productivity Stack (`productivity-stack.yml`)
- **Nextcloud:** `nextcloud.sj98.duckdns.org`
## TSDProxy Stack (`tsdproxy-stack.yml`)
- **TSDProxy:** `proxy.sj98.duckdns.org`
## Portainer Stack (`portainer-stack.yml`)
- **Portainer:** `portainer0.sj98.duckdns.org`

56
docs/models/LM_Studio.md Normal file
View File

@@ -0,0 +1,56 @@
curl 192.168.1.81:1234/v1/models
{
"data": [
{
"id": "mistralai/codestral-22b-v0.1",
"object": "model",
"owned_by": "organization_owner"
},
{
"id": "instinct",
"object": "model",
"owned_by": "organization_owner"
},
{
"id": "qwen2.5-coder-1.5b-instruct",
"object": "model",
"owned_by": "organization_owner"
},
{
"id": "qwen2.5-coder-7b-instruct",
"object": "model",
"owned_by": "organization_owner"
},
{
"id": "text-embedding-nomic-embed-text-v1.5",
"object": "model",
"owned_by": "organization_owner"
},
{
"id": "qwen/qwen3-coder-30b",
"object": "model",
"owned_by": "organization_owner"
},
{
"id": "openai/gpt-oss-20b",
"object": "model",
"owned_by": "organization_owner"
},
{
"id": "google/gemma-3-12b",
"object": "model",
"owned_by": "organization_owner"
},
{
"id": "qwen/qwen3-8b",
"object": "model",
"owned_by": "organization_owner"
},
{
"id": "deepseek-r1-distill-llama-8b",
"object": "model",
"owned_by": "organization_owner"
}
],
"object": "list"
}%

View File

@@ -0,0 +1,60 @@
# Firewall Segmentation Plan: TP-Link BE9300 Homelab (Revised)
## Objective
To enhance network security by isolating IoT devices from the main trusted network using the TP-Link BE9300's dedicated IoT Network feature. The goal is to prevent a potential compromise on an IoT device from affecting critical systems while ensuring cross-network device discovery (casting) remains functional.
---
## Phase 1: Network Design & Configuration
1. **Define the Networks:**
* **Main Network (Trusted):**
* **Subnet:** `19_2.168.1.0/24`
* **Devices:** Computers, NAS (OMV), Proxmox host, Raspberry Pis, personal mobile devices.
* **IoT Network (Untrusted):**
* **Subnet:** To be assigned by the router.
* **Devices:** Smart TVs, Fire Sticks, Govee lights/sensors, TP-Link/Tapo bulbs, Vivint security system.
* **Guest Network (Isolated):**
* **Subnet:** To be assigned by the router.
* **Devices:** For visitors only.
2. **Router Configuration Steps:**
* Log in to your TP-Link BE9300's admin interface or use the TP-Link Tether app.
* Navigate to the **IoT Network** settings and enable it. This will create a separate Wi-Fi network and subnet for your IoT devices.
* Assign a unique SSID (e.g., `HomeLab-IoT`) and a strong, unique password.
* Enable the **Guest Network** with its own unique SSID and password.
* **Crucially, do NOT enable the "Device Isolation" feature at this stage.** The default separation of the IoT network may be sufficient and might not break mDNS/casting.
* Move all identified IoT devices to the new `HomeLab-IoT` Wi-Fi network.
---
## Phase 2: Enabling Casting & Testing
The primary challenge is allowing mDNS (for AirPlay/Chromecast) to function across subnets. The BE9300 does not have an explicit "mDNS forwarder," so we rely on the default behavior of the IoT network.
1. **Initial Test (Without Device Isolation):**
* Connect your phone or computer to the **Main Network**.
* Open a casting-capable app (e.g., YouTube, Spotify).
* Check if your TVs and other casting devices (now on the `HomeLab-IoT` network) are discoverable.
* **If casting works:** The default firewall rules between the Main and IoT networks are suitable. The project is successful.
* **If casting does NOT work:** Proceed to the next step.
2. **Troubleshooting with Device Isolation:**
* The BE9300's "Device Isolation" feature is likely too restrictive, as it is designed to prevent communication between isolated devices and the main network entirely. This will almost certainly break casting.
* There is no evidence from the research that the BE9300 allows for the fine-grained rules needed to allow only mDNS traffic. The trade-off is between full isolation (no casting) and the slightly more permissive default IoT network separation (casting works).
**Note on Wired Devices:** Research indicates the "Device Isolation" feature may only apply to Wi-Fi clients. Any IoT devices connected via Ethernet may not be isolated from the main LAN, representing a limitation of the hardware.
---
## Phase 3: Final Validation
1. **Test Isolation:**
* Connect a device to the **IoT Network**.
* Try to access a service on your Main network (e.g., ping your Pi-hole at `192.168.1.196` or access the OMV web UI).
* **Expected Result:** The connection should fail. This confirms the IoT network is properly segmented from your trusted devices.
2. **Test Internet Access:**
* Ensure devices on the IoT and Guest networks can access the internet.
By following this revised plan, you will be using the specific features of your router to achieve the best possible balance of security and functionality.

View File

@@ -0,0 +1,412 @@
# Docker Swarm Stack Files - Review & Recommendations
## Overview
Reviewed 9 Docker Swarm stack files totaling ~24KB of configuration. Found **critical security issues**, configuration inconsistencies, and optimization opportunities.
---
## 🔴 Critical Issues
### 1. **Hardcoded Secrets in Plain Text**
**Files Affected**: [`full-stack-complete.yml`](file:///workspace/homelab/services/swarm/stacks/full-stack-complete.yml), [`monitoring-stack.yml`](file:///workspace/homelab/services/swarm/stacks/monitoring-stack.yml)
**Problems**:
```yaml
# Line 96: Paperless DB password in plain text
- PAPERLESS_DBPASS=paperless
# Line 98: Hardcoded secret key
- PAPERLESS_SECRET_KEY=change-me-please-to-something-secure
# Line 52: Grafana admin password exposed
- GF_SECURITY_ADMIN_PASSWORD=change-me-please
```
**Risk**: Anyone with access to the repo can see credentials. These will be in Docker configs and logs.
**Fix**: Use Docker secrets:
```yaml
secrets:
paperless_db_password:
external: true
paperless_secret_key:
external: true
grafana_admin_password:
external: true
services:
paperless:
secrets:
- paperless_db_password
- paperless_secret_key
environment:
- PAPERLESS_DBPASS_FILE=/run/secrets/paperless_db_password
- PAPERLESS_SECRET_KEY_FILE=/run/secrets/paperless_secret_key
```
### 2. **Missing Health Checks**
**Files Affected**: All stack files
**Problem**: No services have health checks configured, meaning:
- Swarm can't detect unhealthy containers
- Auto-restart won't work properly
- Load balancers may route to failing instances
**Fix**: Add health checks to critical services:
```yaml
services:
paperless:
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
interval: 30s
timeout: 10s
retries: 3
start_period: 60s
```
### 3. **Incorrect node-exporter Command**
**File**: [`monitoring-stack.yml:111-114`](file:///workspace/homelab/services/swarm/stacks/monitoring-stack.yml#L111-L114)
**Problem**:
```yaml
command:
- '--config.file=/etc/prometheus/prometheus.yml' # Wrong! This is for Prometheus
- '--storage.tsdb.path=/prometheus' # Wrong!
```
**Fix**:
```yaml
command:
- '--path.procfs=/host/proc'
- '--path.rootfs=/rootfs'
- '--path.sysfs=/host/sys'
- '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
```
---
## ⚠️ High-Priority Warnings
### 4. **Missing Networks on Database Services**
**File**: [`full-stack-complete.yml`](file:///workspace/homelab/services/swarm/stacks/full-stack-complete.yml)
**Problem**: `paperless-db` (line 70) doesn't have a network defined, but Paperless tries to connect to it.
**Fix**:
```yaml
paperless-db:
networks:
- homelab-backend # Add this
```
### 5. **Resource Limits Too High for Pi Zero**
**File**: [`full-stack-complete.yml`](file:///workspace/homelab/services/swarm/stacks/full-stack-complete.yml)
**Problem**: Services with `node.labels.leader == true` (Pi 4) have resource limits that may be too high:
- Paperless: 2GB memory (Pi 4 has 8GB total)
- Stirling-PDF: 2GB memory
- SearXNG: 2GB memory
- Combined: 6GB+ on one node
**Fix**: Reduce limits or spread services across nodes:
```yaml
deploy:
placement:
constraints:
- node.labels.leader == true
- node.memory.available > 2G # Add memory check
```
### 6. **Duplicate Portainer Definitions**
**Files**: [`portainer-stack.yml`](file:///workspace/homelab/services/swarm/stacks/portainer-stack.yml) vs [`tools-stack.yml`](file:///workspace/homelab/services/swarm/stacks/tools-stack.yml)
**Problem**: Portainer is defined in both files with different configurations:
- `portainer-stack.yml`: Uses agent mode with global agents
- `tools-stack.yml`: Uses socket mode (simpler but less scalable)
**Fix**: Pick one approach and remove the duplicate.
### 7. **Missing Traefik Network Declaration**
**File**: [`monitoring-stack.yml:38-44`](file:///workspace/homelab/services/swarm/stacks/monitoring-stack.yml#L38-L44)
**Problem**: Prometheus has Traefik labels but isn't on the `traefik-public` network.
**Fix**:
```yaml
prometheus:
networks:
- monitoring
- traefik-public # Add this
```
---
## 🟡 Medium-Priority Improvements
### 8. **Missing Restart Policies**
**Files Affected**: Most services
**Problem**: Only Portainer has restart policies. Other services will fail permanently on error.
**Fix**: Add to all services:
```yaml
deploy:
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 3
```
### 9. **Watchtower Interval Too Frequent**
**File**: [`full-stack-complete.yml:191`](file:///workspace/homelab/services/swarm/stacks/full-stack-complete.yml#L191)
**Problem**: `--interval 300` = check every 5 minutes (too frequent)
**Fix**: Change to hourly or daily:
```yaml
command: --cleanup --interval 86400 # Daily
```
### 10. **Missing Logging Configuration**
**Files Affected**: All
**Problem**: No log driver or limits configured. Logs can fill disk.
**Fix**:
```yaml
deploy:
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"
```
### 11. **Version 3.9 is Deprecated**
**Files Affected**: All
**Problem**: Docker Compose v3.9 is deprecated. Should use Compose Specification (no version field) or v3.8.
**Fix**: Remove version line or use `version: '3.8'`
---
## 🟢 Best Practice Recommendations
### 12. **Add Update Configs**
**Benefit**: Zero-downtime deployments
```yaml
deploy:
update_config:
parallelism: 1
delay: 10s
failure_action: rollback
order: start-first
```
### 13. **Use Specific Image Tags**
**Files Affected**: Services using `:latest`
**Current**:
```yaml
image: portainer/portainer-ce:latest
image: searxng/searxng:latest
```
**Better**:
```yaml
image: portainer/portainer-ce:2.33.4
image: searxng/searxng:2024.11.20
```
**Good tags already used**: `full-stack-complete.yml` has several pinned versions ✓
### 14. **Add Labels for Documentation**
**Benefit**: Self-documenting infrastructure
```yaml
deploy:
labels:
- "com.homelab.description=Paperless document management"
- "com.homelab.maintainer=@sj98"
- "com.homelab.version=2.19.3"
```
### 15. **Separate Configs from Stacks**
**Problem**: Mixing config and stack definitions
**Current**: Prometheus config is external (good!)
**Recommendation**: Do the same for Traefik, Alertmanager configs
### 16. **Add Dependency Ordering**
**Current**: Some services use `depends_on` (good!)
**Problem**: Not all services that need it have it
```yaml
paperless:
depends_on:
- paperless-redis
- paperless-db
```
---
## 📋 Detailed File-by-File Analysis
### [`full-stack-complete.yml`](file:///workspace/homelab/services/swarm/stacks/full-stack-complete.yml)
**Good**:
- ✅ Proper network segmentation (traefik-public vs homelab-backend)
- ✅ Resource limits defined
- ✅ Node placement constraints
- ✅ Specific image tags for most services
**Issues**:
- 🔴 Hardcoded passwords (lines 96, 98)
- 🔴 No health checks
- ⚠️ paperless-db missing network
- ⚠️ Resource limits may be too high for Pi 4
**Score**: 6/10
---
### [`monitoring-stack.yml`](file:///workspace/homelab/services/swarm/stacks/monitoring-stack.yml)
**Good**:
- ✅ Proper monitoring network
- ✅ External configs for Prometheus
- ✅ Resource limits
**Issues**:
- 🔴 Hardcoded Grafana password (line 52)
- 🔴 node-exporter has wrong command (lines 111-114)
- ⚠️ Prometheus missing traefik-public network
- ⚠️ No health checks
**Score**: 5/10
---
### [`networking-stack.yml`](file:///workspace/homelab/services/swarm/stacks/networking-stack.yml)
**Good**:
- ✅ Uses secrets for DuckDNS token
- ✅ External volume for Let's Encrypt
- ✅ Proper network attachment
**Issues**:
- ⚠️ Traefik single replica (should be 2+ for HA)
- ⚠️ No health check
- ⚠️ whoami resource limits too strict
**Score**: 7/10
---
### [`portainer-stack.yml`](file:///workspace/homelab/services/swarm/stacks/portainer-stack.yml)
**Good**:
- ✅ Has restart policies!
- ✅ Supports both Windows and Linux agents
- ✅ Proper network setup
**Issues**:
- ⚠️ Duplicate of tools-stack.yml Portainer
- ⚠️ No health check
**Score**: 7/10
---
### [`tools-stack.yml`](file:///workspace/homelab/services/swarm/stacks/tools-stack.yml)
**Good**:
- ✅ All tools on manager node (correct)
- ✅ Resource limits defined
**Issues**:
- ⚠️ Duplicate Portainer definition
- ⚠️ lazydocker needs TTY, won't work in Swarm
- ⚠️ No restart policies
**Score**: 6/10
---
### [`node-exporter-stack.yml`](file:///workspace/homelab/services/swarm/stacks/node-exporter-stack.yml)
**Content** (created by us):
```yaml
version: '3.8'
services:
node-exporter:
image: prom/node-exporter:latest
command:
- '--path.rootfs=/host'
volumes:
- '/:/host:ro,rslave'
deploy:
mode: global
```
**Good**:
- ✅ Global mode (runs on all nodes)
- ✅ Read-only host mount
**Issues**:
- ⚠️ Uses `:latest` tag
- ⚠️ No resource limits
- ⚠️ No health check
**Score**: 6/10
---
## 🛠️ Recommended Action Plan
### Phase 1: Critical Security (Do Immediately)
1. ✅ Create Docker secrets for all passwords
2. ✅ Update stack files to use secrets
3. ✅ Fix node-exporter command
4. ✅ Add missing network to paperless-db
### Phase 2: Stability (Do This Week)
1. ⏭️ Add health checks to all services
2. ⏭️ Add restart policies
3. ⏭️ Fix Prometheus network
4. ⏭️ Remove duplicate Portainer
### Phase 3: Optimization (Do This Month)
1. ⏭️ Update all `:latest` tags to specific versions
2. ⏭️ Add update configs
3. ⏭️ Configure logging limits
4. ⏭️ Review resource limits
### Phase 4: Best Practices (Ongoing)
1. ⏭️ Add documentation labels
2. ⏭️ Separate configs from stacks
3. ⏭️ Set up monitoring alerts for service health
---
## 🎯 Summary Scores
| Stack File | Security | Stability | Best Practices | Overall |
|-----------|----------|-----------|----------------|---------|
| full-stack-complete.yml | 3/10 | 6/10 | 7/10 | **6/10** |
| monitoring-stack.yml | 4/10 | 5/10 | 6/10 | **5/10** |
| networking-stack.yml | 8/10 | 6/10 | 7/10 | **7/10** |
| portainer-stack.yml | 7/10 | 7/10 | 7/10 | **7/10** |
| tools-stack.yml | 7/10 | 5/10 | 6/10 | **6/10** |
| node-exporter-stack.yml | 7/10 | 5/10 | 6/10 | **6/10** |
| **Average** | **6.0/10** | **5.7/10** | **6.5/10** | **6.2/10** |
---
## 📝 Next Steps
Would you like me to:
1. **Create fixed versions** of the stack files with all critical issues resolved?
2. **Generate Docker secrets creation script** for all passwords?
3. **Add health checks** to all services?
4. **Consolidate duplicate configs** (e.g., remove duplicate Portainer)?
5. **Create a migration guide** for applying these changes safely?
Let me know which improvements you'd like me to implement!

View File

@@ -0,0 +1,63 @@
groups:
- name: homelab_alerts
interval: 30s
rules:
# CPU Usage Alert
- alert: HighCPUUsage
expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
for: 5m
labels:
severity: warning
annotations:
summary: "High CPU usage detected on {{ $labels.instance }}"
description: "CPU usage is above 80% (current value: {{ $value }}%)"
# Memory Usage Alert
- alert: HighMemoryUsage
expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85
for: 5m
labels:
severity: warning
annotations:
summary: "High memory usage detected on {{ $labels.instance }}"
description: "Memory usage is above 85% (current value: {{ $value }}%)"
# Disk Usage Alert
- alert: HighDiskUsage
expr: (1 - (node_filesystem_avail_bytes{fstype!~"tmpfs|fuse.lxcfs"} / node_filesystem_size_bytes{fstype!~"tmpfs|fuse.lxcfs"})) * 100 > 80
for: 10m
labels:
severity: warning
annotations:
summary: "High disk usage detected on {{ $labels.instance }}"
description: "Disk usage on {{ $labels.mountpoint }} is above 80% (current value: {{ $value }}%)"
# Node Down Alert
- alert: NodeDown
expr: up{job="node-exporter"} == 0
for: 2m
labels:
severity: critical
annotations:
summary: "Node {{ $labels.instance }} is down"
description: "Node exporter on {{ $labels.instance }} has been down for more than 2 minutes"
# Container Down Alert
- alert: ContainerDown
expr: up{job="docker"} == 0
for: 2m
labels:
severity: critical
annotations:
summary: "Container {{ $labels.instance }} is down"
description: "Docker container on {{ $labels.instance }} has been down for more than 2 minutes"
# Disk I/O Alert (high wait time)
- alert: HighDiskIOWait
expr: rate(node_cpu_seconds_total{mode="iowait"}[5m]) * 100 > 20
for: 10m
labels:
severity: warning
annotations:
summary: "High disk I/O wait on {{ $labels.instance }}"
description: "Disk I/O wait time is above 20% (current value: {{ $value }}%)"

64
proxmox/network_check.sh Normal file
View File

@@ -0,0 +1,64 @@
#!/bin/bash
# A script to check for internet connectivity and reset the USB network adapter or reboot if the connection is down.
# The IP address of your local gateway (router).
GATEWAY_IP="192.168.1.1"
# The IP address to ping to check for an external internet connection.
PING_IP="8.8.8.8"
# The number of pings to send.
PING_COUNT=1
# The USB bus and device number of the network adapter.
# Use 'lsusb' to find these values for your specific device.
USB_BUS="002"
USB_DEV="003"
# The path to the USB device.
USB_DEVICE_PATH="/dev/bus/usb/$USB_BUS/$USB_DEV"
# Log file
LOG_FILE="/var/log/network_check.log"
# Function to log messages
log() {
echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" >> "$LOG_FILE"
}
# Check if the script is running as root.
if [ "$(id -u)" -ne 0 ]; then
log "This script must be run as root."
exit 1
fi
# 1. Check for local network connectivity by pinging the gateway.
if ! ping -c "$PING_COUNT" "$GATEWAY_IP" > /dev/null 2>&1; then
log "Local network connection is down (cannot ping gateway $GATEWAY_IP). This indicates a problem with the host's network adapter."
log "Attempting to reset the USB adapter."
# Attempt to reset the USB device.
if [ -e "$USB_DEVICE_PATH" ]; then
/usr/bin/usbreset "$USB_DEVICE_PATH"
sleep 10 # Wait for the device to reinitialize.
# Check the connection again.
if ! ping -c "$PING_COUNT" "$GATEWAY_IP" > /dev/null 2>&1; then
log "USB reset failed to restore the local connection. Rebooting the system."
/sbin/reboot
else
log "USB reset successful. Local network connection is back up."
fi
else
log "USB device not found at $USB_DEVICE_PATH. Rebooting the system."
/sbin/reboot
fi
else
# 2. If the local network is up, check for external internet connectivity.
if ! ping -c "$PING_COUNT" "$PING_IP" > /dev/null 2>&1; then
log "Local network is up, but internet connection is down (cannot ping $PING_IP). This is likely a router or ISP issue. No action taken."
else
log "Network connection is up."
fi
fi

53
scripts/backup_daily.sh Executable file
View File

@@ -0,0 +1,53 @@
#!/bin/bash
# backup_daily.sh - Daily backup script using restic to Backblaze B2
set -euo pipefail
# Configuration
export B2_ACCOUNT_ID="your_b2_account_id"
export B2_ACCOUNT_KEY="your_b2_account_key"
export RESTIC_REPOSITORY="b2:your-bucket-name:/backups"
export RESTIC_PASSWORD="your_restic_password"
# Backup targets
BACKUP_DIRS=(
"/var/lib/docker/volumes/homeassistant/_data"
"/var/lib/docker/volumes/portainer/_data"
"/var/lib/docker/volumes/nextcloud/_data"
"/mnt/nas/models"
)
# Logging
LOG_FILE="/var/log/restic_backup.log"
exec > >(tee -a "$LOG_FILE") 2>&1
echo "=== Restic Backup Started at $(date) ==="
# Check if repository is initialized
if ! restic snapshots &>/dev/null; then
echo "Repository not initialized. Initializing..."
restic init
fi
# Perform backup
echo "Backing up directories: ${BACKUP_DIRS[*]}"
restic backup "${BACKUP_DIRS[@]}" \
--tag homelab \
--verbose
# Prune old backups (keep last 7 daily, 4 weekly, 12 monthly)
echo "Pruning old backups..."
restic forget \
--keep-daily 7 \
--keep-weekly 4 \
--keep-monthly 12 \
--prune
# Check repository integrity (monthly)
DAY_OF_MONTH=$(date +%d)
if [ "$DAY_OF_MONTH" == "01" ]; then
echo "Running repository check..."
restic check
fi
echo "=== Restic Backup Completed at $(date) ==="

View File

@@ -0,0 +1,96 @@
#!/bin/bash
# create_docker_secrets.sh - Create all Docker secrets for swarm stacks
# Run this ONCE before deploying the fixed stack files
set -euo pipefail
# Colors
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
RED='\033[0;31m'
NC='\033[0m'
echo -e "${YELLOW}Docker Secrets Creation Script${NC}"
echo "This will create all required secrets for your swarm stacks."
echo ""
# Check if running on swarm manager
if ! docker node ls &>/dev/null; then
echo -e "${RED}Error: This must be run on a Docker Swarm manager node${NC}"
exit 1
fi
# Function to create secret
create_secret() {
local SECRET_NAME=$1
local SECRET_DESCRIPTION=$2
local DEFAULT_VALUE=$3
if docker secret inspect "$SECRET_NAME" &>/dev/null; then
echo -e "${YELLOW}⚠ Secret '$SECRET_NAME' already exists, skipping${NC}"
return 0
fi
echo -e "\n${GREEN}Creating secret: $SECRET_NAME${NC}"
echo "$SECRET_DESCRIPTION"
if [[ -n "$DEFAULT_VALUE" ]]; then
read -p "Enter value (default: $DEFAULT_VALUE): " SECRET_VALUE
SECRET_VALUE=${SECRET_VALUE:-$DEFAULT_VALUE}
else
read -sp "Enter value (hidden): " SECRET_VALUE
echo
fi
if [[ -z "$SECRET_VALUE" ]]; then
echo -e "${RED}Error: Secret value cannot be empty${NC}"
return 1
fi
echo -n "$SECRET_VALUE" | docker secret create "$SECRET_NAME" -
echo -e "${GREEN}✓ Created secret: $SECRET_NAME${NC}"
}
echo "==================================="
echo "Paperless Secrets"
echo "==================================="
create_secret "paperless_db_password" \
"Database password for Paperless PostgreSQL" \
""
create_secret "paperless_secret_key" \
"Django secret key for Paperless (50+ random characters)" \
""
echo ""
echo "==================================="
echo "Grafana Secrets"
echo "==================================="
create_secret "grafana_admin_password" \
"Grafana admin password" \
""
echo ""
echo "==================================="
echo "DuckDNS Secret"
echo "==================================="
create_secret "duckdns_token" \
"DuckDNS API token (from duckdns.org account)" \
""
echo ""
echo -e "${GREEN}==================================="
echo "All secrets created successfully!"
echo "===================================${NC}"
echo ""
echo "Verify secrets:"
echo " docker secret ls"
echo ""
echo "To remove a secret (if needed):"
echo " docker secret rm <secret_name>"
echo ""
echo "IMPORTANT: Secret values cannot be retrieved after creation."
echo "Store them securely in a password manager!"

181
scripts/deploy_all.sh Executable file
View File

@@ -0,0 +1,181 @@
#!/bin/bash
# deploy_all.sh - Master deployment script for all homelab improvements
# This script orchestrates the deployment of all components in the correct order
set -euo pipefail
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color
# Logging
LOG_FILE="/var/log/homelab_deployment.log"
exec > >(tee -a "$LOG_FILE") 2>&1
echo -e "${GREEN}========================================${NC}"
echo -e "${GREEN}Home Lab Deployment Script${NC}"
echo -e "${GREEN}Started at $(date)${NC}"
echo -e "${GREEN}========================================${NC}\n"
# Check if running as root
if [[ $EUID -ne 0 ]]; then
echo -e "${RED}This script must be run as root${NC}"
exit 1
fi
# Deployment phases
PHASES=(
"network:Network Upgrade"
"storage:Storage Enhancements"
"services:Service Consolidation"
"security:Security Hardening"
"monitoring:Monitoring & Automation"
"backup:Backup Strategy"
)
deploy_network() {
echo -e "\n${YELLOW}[PHASE 1/6] Network Upgrade${NC}"
echo "This phase requires manual hardware installation."
echo "Please ensure the 2.5Gb switch is installed before proceeding."
read -p "Has the new switch been installed? (y/n) " -n 1 -r
echo
if [[ ! $REPLY =~ ^[Yy]$ ]]; then
echo "Skipping network upgrade. Please install switch first."
return 0
fi
echo "Configuring VLAN firewall rules..."
bash /workspace/homelab/scripts/vlan_firewall.sh
echo -e "${GREEN}✓ Network configuration complete${NC}"
}
deploy_storage() {
echo -e "\n${YELLOW}[PHASE 2/6] Storage Enhancements${NC}"
read -p "Create ZFS pool on Proxmox host? (y/n) " -n 1 -r
echo
if [[ $REPLY =~ ^[Yy]$ ]]; then
echo "Creating ZFS pool..."
bash /workspace/homelab/scripts/zfs_setup.sh
fi
echo -e "\n${YELLOW}Please mount NAS shares manually using:${NC}"
echo " Guide: /workspace/homelab/docs/guides/NAS_Mount_Guide.md"
read -p "Press enter when NAS is mounted..."
echo "Setting up AI model pruning cron job..."
(crontab -l 2>/dev/null; echo "0 3 * * * /workspace/homelab/scripts/prune_ai_models.sh") | crontab -
echo -e "${GREEN}✓ Storage configuration complete${NC}"
}
deploy_services() {
echo -e "\n${YELLOW}[PHASE 3/6] Service Consolidation${NC}"
read -p "Deploy Traefik Swarm service? (y/n) " -n 1 -r
echo
if [[ $REPLY =~ ^[Yy]$ ]]; then
echo "Deploying Traefik stack..."
docker stack deploy -c /workspace/homelab/services/swarm/traefik/stack.yml traefik
sleep 5
docker service ls | grep traefik
fi
read -p "Deploy Caddy fallback on Pi Zero? (requires SSH to .62) (y/n) " -n 1 -r
echo
if [[ $REPLY =~ ^[Yy]$ ]]; then
echo "Please deploy Caddy manually on Pi Zero (.62)"
echo " cd /workspace/homelab/services/standalone/Caddy"
echo " docker-compose up -d"
fi
read -p "Deploy n8n stack? (y/n) " -n 1 -r
echo
if [[ $REPLY =~ ^[Yy]$ ]]; then
echo "Deploying n8n stack..."
docker stack deploy -c /workspace/homelab/services/swarm/stacks/n8n-stack.yml n8n
sleep 5
docker service ls | grep n8n
fi
echo -e "${GREEN}✓ Service consolidation complete${NC}"
}
deploy_security() {
echo -e "\n${YELLOW}[PHASE 4/6] Security Hardening${NC}"
read -p "Install fail2ban on manager VM? (y/n) " -n 1 -r
echo
if [[ $REPLY =~ ^[Yy]$ ]]; then
echo "Installing fail2ban..."
bash /workspace/homelab/scripts/install_fail2ban.sh
fi
echo -e "${GREEN}✓ Security hardening complete${NC}"
}
deploy_monitoring() {
echo -e "\n${YELLOW}[PHASE 5/6] Monitoring & Automation${NC}"
read -p "Deploy monitoring stack? (y/n) " -n 1 -r
echo
if [[ $REPLY =~ ^[Yy]$ ]]; then
echo "Setting up monitoring..."
bash /workspace/homelab/scripts/setup_monitoring.sh
fi
echo -e "${GREEN}✓ Monitoring setup complete${NC}"
}
deploy_backup() {
echo -e "\n${YELLOW}[PHASE 6/6] Backup Strategy${NC}"
echo -e "${YELLOW}Before proceeding, ensure you have:${NC}"
echo " 1. Backblaze B2 account created"
echo " 2. B2 bucket created"
echo " 3. Updated /workspace/homelab/scripts/backup_daily.sh with credentials"
read -p "Are credentials configured? (y/n) " -n 1 -r
echo
if [[ ! $REPLY =~ ^[Yy]$ ]]; then
echo "Skipping backup setup. Please configure credentials first."
return 0
fi
echo "Installing restic backup..."
bash /workspace/homelab/scripts/install_restic_backup.sh
echo -e "${GREEN}✓ Backup strategy complete${NC}"
}
# Main deployment flow
main() {
echo "This script will guide you through the deployment of all homelab improvements."
echo "You can skip any phase if needed."
echo ""
deploy_network
deploy_storage
deploy_services
deploy_security
deploy_monitoring
deploy_backup
echo -e "\n${GREEN}========================================${NC}"
echo -e "${GREEN}Deployment Complete!${NC}"
echo -e "${GREEN}Completed at $(date)${NC}"
echo -e "${GREEN}========================================${NC}\n"
echo "Post-deployment verification:"
echo " 1. Check Docker services: docker service ls"
echo " 2. Check container health: docker ps --filter health=healthy"
echo " 3. Check fail2ban: sudo fail2ban-client status"
echo " 4. Check monitoring: curl http://192.168.1.196:9100/metrics"
echo " 5. Check backups: sudo systemctl status restic-backup.timer"
echo ""
echo "Full verification guide: /workspace/homelab/docs/guides/DEPLOYMENT_GUIDE.md"
echo "Log file: $LOG_FILE"
}
main "$@"

27
scripts/install_fail2ban.sh Executable file
View File

@@ -0,0 +1,27 @@
#!/bin/bash
# install_fail2ban.sh - Install and configure fail2ban on manager VM
set -euo pipefail
echo "Installing fail2ban..."
sudo apt-get update
sudo apt-get install -y fail2ban
echo "Creating fail2ban directories..."
sudo mkdir -p /etc/fail2ban/filter.d
echo "Copying custom filters..."
sudo cp /workspace/homelab/security/fail2ban/filter.d/portainer.conf /etc/fail2ban/filter.d/
sudo cp /workspace/homelab/security/fail2ban/filter.d/traefik-auth.conf /etc/fail2ban/filter.d/
echo "Copying jail configuration..."
sudo cp /workspace/homelab/security/fail2ban/jail.local /etc/fail2ban/
echo "Restarting fail2ban service..."
sudo systemctl restart fail2ban
sudo systemctl enable fail2ban
echo "Checking fail2ban status..."
sudo fail2ban-client status
echo "fail2ban installation complete."

View File

@@ -0,0 +1,28 @@
#!/bin/bash
# install_restic_backup.sh - Install restic and configure systemd timer
set -euo pipefail
echo "Installing restic..."
sudo apt-get update
sudo apt-get install -y restic
echo "Making backup script executable..."
sudo chmod +x /workspace/homelab/scripts/backup_daily.sh
echo "Installing systemd service and timer..."
sudo cp /workspace/homelab/systemd/restic-backup.service /etc/systemd/system/
sudo cp /workspace/homelab/systemd/restic-backup.timer /etc/systemd/system/
echo "Reloading systemd daemon..."
sudo systemctl daemon-reload
echo "Enabling and starting timer..."
sudo systemctl enable restic-backup.timer
sudo systemctl start restic-backup.timer
echo "Checking timer status..."
sudo systemctl status restic-backup.timer
echo "Restic backup installation complete."
echo "Remember to update /workspace/homelab/scripts/backup_daily.sh with your B2 credentials."

View File

@@ -0,0 +1,80 @@
#!/bin/bash
# network_performance_test.sh - Test network performance between nodes
# This script uses iperf3 to measure bandwidth between homelab nodes
set -euo pipefail
# Colors
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'
# Node IPs
NODES=(
"192.168.1.81:Ryzen"
"192.168.1.57:Proxmox"
"192.168.1.196:Manager"
"192.168.1.245:Pi4"
"192.168.1.62:PiZero"
)
echo "========================================="
echo "Network Performance Testing"
echo "========================================="
# Check if iperf3 is installed
if ! command -v iperf3 >/dev/null 2>&1; then
echo "Installing iperf3..."
sudo apt-get update && sudo apt-get install -y iperf3
fi
# Get current node IP
CURRENT_IP=$(hostname -I | awk '{print $1}')
echo -e "\nTesting from: $CURRENT_IP\n"
test_node() {
local NODE_INFO=$1
local IP=$(echo $NODE_INFO | cut -d: -f1)
local NAME=$(echo $NODE_INFO | cut -d: -f2)
if [[ "$IP" == "$CURRENT_IP" ]]; then
return
fi
echo -e "${YELLOW}Testing to $NAME ($IP)...${NC}"
# Test if iperf3 server is running
if timeout 2 nc -z $IP 5201 2>/dev/null; then
# Run bandwidth test
RESULT=$(iperf3 -c $IP -t 5 -f M 2>/dev/null | grep "receiver" | awk '{print $7, $8}')
if [[ -n "$RESULT" ]]; then
echo -e "${GREEN} → Bandwidth: $RESULT${NC}"
else
echo " → Test failed (server may be busy)"
fi
else
echo " → iperf3 server not running on $NAME"
echo " → Run on $NAME: iperf3 -s -D"
fi
}
# Test all nodes
for NODE in "${NODES[@]}"; do
test_node "$NODE"
done
echo -e "\n========================================="
echo "Test complete"
echo "=========================================
"
# Recommendations
echo -e "\nRecommendations:"
echo "• Expected speeds:"
echo " - Ryzen/Proxmox: 2.5 Gb (2500 Mbits/sec)"
echo " - Pi 4: 1 Gb (1000 Mbits/sec)"
echo " - Pi Zero: 100 Mb (100 Mbits/sec)"
echo "• If speeds are lower, check:"
echo " - Switch port configuration"
echo " - Cable quality (Cat6 for 2.5Gb)"
echo " - Network interface settings"

18
scripts/prune_ai_models.sh Executable file
View File

@@ -0,0 +1,18 @@
#!/bin/bash
# prune_ai_models.sh - Remove AI model files older than 30 days to free space
# Adjust the MODEL_DIR path to where your AI models are stored (e.g., /mnt/nas/models)
set -euo pipefail
MODEL_DIR="/mnt/nas/models"
DAYS=30
if [[ ! -d "$MODEL_DIR" ]]; then
echo "Model directory $MODEL_DIR does not exist. Exiting."
exit 1
fi
echo "Pruning model files in $MODEL_DIR older than $DAYS days..."
find "$MODEL_DIR" -type f -mtime +$DAYS -print -delete
echo "Prune completed."

132
scripts/quick_status.sh Executable file
View File

@@ -0,0 +1,132 @@
#!/bin/bash
# quick_status.sh - Quick health check of all homelab components
# Run this anytime to get a fast overview of system status
set -euo pipefail
# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m'
clear
echo -e "${BLUE}╔════════════════════════════════════════╗${NC}"
echo -e "${BLUE}║ Home Lab Quick Status Check ║${NC}"
echo -e "${BLUE}╚════════════════════════════════════════╝${NC}"
echo ""
# System Info
echo -e "${YELLOW}📊 System Information${NC}"
echo " Hostname: $(hostname)"
echo " Uptime: $(uptime -p)"
echo " Load: $(uptime | awk -F'load average:' '{print $2}')"
echo ""
# Docker Swarm
echo -e "${YELLOW}🐳 Docker Swarm${NC}"
if docker node ls &>/dev/null; then
TOTAL_NODES=$(docker node ls | grep -c Ready || echo "0")
echo -e " ${GREEN}${NC} Swarm active ($TOTAL_NODES nodes)"
docker service ls --format "table {{.Name}}\t{{.Replicas}}" | head -10
else
echo -e " ${RED}${NC} Not a swarm manager"
fi
echo ""
# Services Health
echo -e "${YELLOW}🏥 Container Health${NC}"
HEALTHY=$(docker ps --filter "health=healthy" --format "{{.Names}}" | wc -l 2>/dev/null || echo "0")
UNHEALTHY=$(docker ps --filter "health=unhealthy" --format "{{.Names}}" | wc -l 2>/dev/null || echo "0")
TOTAL=$(docker ps --format "{{.Names}}" | wc -l 2>/dev/null || echo "0")
echo -e " Healthy: ${GREEN}$HEALTHY${NC}"
echo -e " Unhealthy: ${RED}$UNHEALTHY${NC}"
echo -e " Total: $TOTAL"
if [[ $UNHEALTHY -gt 0 ]]; then
echo -e " ${RED}⚠ Unhealthy containers:${NC}"
docker ps --filter "health=unhealthy" --format " - {{.Names}}"
fi
echo ""
# Storage
echo -e "${YELLOW}💾 Storage${NC}"
df -h / /mnt/nas 2>/dev/null | tail -n +2 | awk '{printf " %-20s %5s used of %5s\n", $6, $3, $2}'
if command -v zpool &>/dev/null && zpool list tank &>/dev/null; then
HEALTH=$(zpool list -H -o health tank)
if [[ "$HEALTH" == "ONLINE" ]]; then
echo -e " ZFS tank: ${GREEN}$HEALTH${NC}"
else
echo -e " ZFS tank: ${RED}$HEALTH${NC}"
fi
fi
echo ""
# Network
echo -e "${YELLOW}🌐 Network${NC}"
IP=$(hostname -I | awk '{print $1}')
echo " IP: $IP"
if command -v ethtool &>/dev/null; then
SPEED=$(ethtool eth0 2>/dev/null | grep Speed | awk '{print $2}' || echo "Unknown")
echo " Speed: $SPEED"
fi
if ping -c 1 8.8.8.8 &>/dev/null; then
echo -e " Internet: ${GREEN}✓ Connected${NC}"
else
echo -e " Internet: ${RED}✗ Disconnected${NC}"
fi
echo ""
# Security
echo -e "${YELLOW}🔒 Security${NC}"
if systemctl is-active --quiet fail2ban 2>/dev/null; then
BANNED=$(sudo fail2ban-client status sshd 2>/dev/null | grep "Currently banned" | awk '{print $4}' || echo "0")
echo -e " fail2ban: ${GREEN}✓ Active${NC} ($BANNED IPs banned)"
else
echo -e " fail2ban: ${YELLOW}⚠ Not running${NC}"
fi
echo ""
# Backups
echo -e "${YELLOW}💾 Backups${NC}"
if systemctl is-active --quiet restic-backup.timer 2>/dev/null; then
NEXT=$(systemctl list-timers | grep restic-backup | awk '{print $1, $2}')
echo -e " Restic timer: ${GREEN}✓ Active${NC}"
echo " Next backup: $NEXT"
else
echo -e " Restic timer: ${YELLOW}⚠ Not configured${NC}"
fi
echo ""
# Monitoring
echo -e "${YELLOW}📈 Monitoring${NC}"
if curl -s http://localhost:9100/metrics &>/dev/null; then
echo -e " node-exporter: ${GREEN}✓ Running${NC}"
else
echo -e " node-exporter: ${YELLOW}⚠ Not accessible${NC}"
fi
if curl -s http://192.168.1.196:3000 &>/dev/null; then
echo -e " Grafana: ${GREEN}✓ Accessible${NC}"
else
echo -e " Grafana: ${YELLOW}⚠ Not accessible${NC}"
fi
echo ""
# Quick recommendations
echo -e "${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
if [[ $UNHEALTHY -gt 0 ]]; then
echo -e "${YELLOW}⚠ Action needed: $UNHEALTHY unhealthy containers${NC}"
fi
DISK_USAGE=$(df / | tail -1 | awk '{print $5}' | sed 's/%//')
if [[ $DISK_USAGE -gt 80 ]]; then
echo -e "${YELLOW}⚠ Warning: Disk usage at ${DISK_USAGE}%${NC}"
fi
echo ""
echo "For detailed validation: bash /workspace/homelab/scripts/validate_deployment.sh"
echo ""

87
scripts/setup_log_rotation.sh Executable file
View File

@@ -0,0 +1,87 @@
#!/bin/bash
# setup_log_rotation.sh - Configure log rotation for homelab services
set -euo pipefail
echo "Configuring log rotation for homelab services..."
# Docker logs
cat > /etc/logrotate.d/docker-containers << 'EOF'
/var/lib/docker/containers/*/*.log {
rotate 7
daily
compress
size=10M
missingok
delaycompress
copytruncate
}
EOF
# Traefik logs
cat > /etc/logrotate.d/traefik << 'EOF'
/var/log/traefik/*.log {
rotate 14
daily
compress
missingok
delaycompress
postrotate
docker service update --force traefik_traefik > /dev/null 2>&1 || true
endscript
}
EOF
# fail2ban logs
cat > /etc/logrotate.d/fail2ban-custom << 'EOF'
/var/log/fail2ban.log {
rotate 30
daily
compress
missingok
notifempty
postrotate
systemctl reload fail2ban > /dev/null 2>&1 || true
endscript
}
EOF
# Restic backup logs
cat > /etc/logrotate.d/restic-backup << 'EOF'
/var/log/restic_backup.log {
rotate 30
daily
compress
missingok
notifempty
}
EOF
# Caddy logs
cat > /etc/logrotate.d/caddy << 'EOF'
/var/log/caddy/*.log {
rotate 7
daily
compress
missingok
delaycompress
}
EOF
# Home lab deployment logs
cat > /etc/logrotate.d/homelab << 'EOF'
/var/log/homelab_deployment.log {
rotate 90
daily
compress
missingok
notifempty
}
EOF
echo "Testing logrotate configuration..."
logrotate -d /etc/logrotate.d/docker-containers
echo "Log rotation configured successfully."
echo "Logs will be rotated daily and compressed."
echo "Configuration files created in /etc/logrotate.d/"

22
scripts/setup_monitoring.sh Executable file
View File

@@ -0,0 +1,22 @@
#!/bin/bash
# setup_monitoring.sh - Deploy node-exporter and configure Grafana alerts
set -euo pipefail
echo "Deploying node-exporter stack..."
docker stack deploy -c /workspace/homelab/services/swarm/stacks/node-exporter-stack.yml monitoring
echo "Waiting for node-exporter to start..."
sleep 10
echo "Copying alert rules to Grafana provisioning directory..."
# Adjust this path to match your Grafana data directory
GRAFANA_PROVISIONING="/var/lib/docker/volumes/grafana-provisioning/_data/alerting"
sudo mkdir -p "$GRAFANA_PROVISIONING"
sudo cp /workspace/homelab/monitoring/grafana/alert_rules.yml "$GRAFANA_PROVISIONING/"
echo "Restarting Grafana to load new alert rules..."
docker service update --force grafana_grafana
echo "Monitoring setup complete."
echo "Check Grafana UI to verify alerts are loaded."

195
scripts/validate_deployment.sh Executable file
View File

@@ -0,0 +1,195 @@
#!/bin/bash
# validate_deployment.sh - Validation script to verify all homelab components
# Run this after deployment to ensure everything is working correctly
set -euo pipefail
# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'
PASSED=0
FAILED=0
WARNINGS=0
check_pass() {
echo -e "${GREEN}$1${NC}"
((PASSED++))
}
check_fail() {
echo -e "${RED}$1${NC}"
((FAILED++))
}
check_warn() {
echo -e "${YELLOW}$1${NC}"
((WARNINGS++))
}
echo "========================================="
echo "Home Lab Deployment Validation"
echo "Started at $(date)"
echo "========================================="
# Network Validation
echo -e "\n${YELLOW}[1/6] Network Configuration${NC}"
if ip -d link show | grep -q "vlan"; then
check_pass "VLANs configured"
else
check_warn "VLANs not detected (may not be configured yet)"
fi
if command -v ethtool >/dev/null 2>&1; then
SPEED=$(ethtool eth0 2>/dev/null | grep Speed | awk '{print $2}')
if [[ "$SPEED" == *"2500"* ]] || [[ "$SPEED" == *"5000"* ]]; then
check_pass "High-speed network detected: $SPEED"
else
check_warn "Network speed: $SPEED (expected 2.5Gb or higher)"
fi
else
check_warn "ethtool not installed, cannot verify network speed"
fi
# Storage Validation
echo -e "\n${YELLOW}[2/6] Storage Configuration${NC}"
if command -v zpool >/dev/null 2>&1; then
if zpool list tank >/dev/null 2>&1; then
HEALTH=$(zpool list -H -o health tank)
if [[ "$HEALTH" == "ONLINE" ]]; then
check_pass "ZFS pool 'tank' is ONLINE"
else
check_fail "ZFS pool 'tank' health: $HEALTH"
fi
else
check_warn "ZFS pool 'tank' not found (may not be on this node)"
fi
else
check_warn "ZFS not installed on this node"
fi
if mount | grep -q "/mnt/nas"; then
check_pass "NAS is mounted"
else
check_warn "NAS not mounted at /mnt/nas"
fi
if crontab -l 2>/dev/null | grep -q "prune_ai_models.sh"; then
check_pass "AI model pruning cron job configured"
else
check_warn "AI model pruning cron job not found"
fi
# Service Validation
echo -e "\n${YELLOW}[3/6] Docker Services${NC}"
if command -v docker >/dev/null 2>&1; then
if docker service ls >/dev/null 2>&1; then
TRAEFIK_COUNT=$(docker service ls | grep -c traefik || true)
if [[ $TRAEFIK_COUNT -ge 1 ]]; then
REPLICAS=$(docker service ls | grep traefik | awk '{print $4}')
check_pass "Traefik service running ($REPLICAS)"
else
check_warn "Traefik service not found in Swarm"
fi
if docker service ls | grep -q node-exporter; then
check_pass "node-exporter service running"
else
check_warn "node-exporter service not found"
fi
else
check_warn "Not a Swarm manager node"
fi
UNHEALTHY=$(docker ps --filter "health=unhealthy" --format "{{.Names}}" | wc -l)
if [[ $UNHEALTHY -eq 0 ]]; then
check_pass "No unhealthy containers"
else
check_fail "$UNHEALTHY unhealthy containers detected"
docker ps --filter "health=unhealthy" --format " - {{.Names}}"
fi
else
check_fail "Docker not installed"
fi
# Security Validation
echo -e "\n${YELLOW}[4/6] Security Configuration${NC}"
if systemctl is-active --quiet fail2ban 2>/dev/null; then
check_pass "fail2ban service is active"
BANNED=$(sudo fail2ban-client status sshd 2>/dev/null | grep "Currently banned" | awk '{print $4}')
if [[ -n "$BANNED" ]]; then
check_pass "fail2ban protecting SSH ($BANNED IPs banned)"
fi
else
check_warn "fail2ban not installed or not running"
fi
if sudo iptables -L >/dev/null 2>&1; then
RULES=$(sudo iptables -L | grep -c "ACCEPT\|DROP" || true)
if [[ $RULES -gt 0 ]]; then
check_pass "Firewall rules configured ($RULES rules)"
else
check_warn "No firewall rules detected"
fi
else
check_warn "Cannot check iptables (permission denied)"
fi
# Monitoring Validation
echo -e "\n${YELLOW}[5/6] Monitoring & Metrics${NC}"
if curl -s http://localhost:9100/metrics >/dev/null 2>&1; then
check_pass "node-exporter metrics accessible"
else
check_warn "node-exporter not accessible on this node"
fi
if curl -s http://192.168.1.196:3000 >/dev/null 2>&1; then
check_pass "Grafana UI accessible"
else
check_warn "Grafana not accessible (may not be on this node)"
fi
# Backup Validation
echo -e "\n${YELLOW}[6/6] Backup Configuration${NC}"
if systemctl list-timers --all | grep -q restic-backup.timer; then
if systemctl is-active --quiet restic-backup.timer; then
check_pass "Restic backup timer is active"
NEXT_RUN=$(systemctl list-timers | grep restic-backup | awk '{print $1, $2}')
echo " Next backup: $NEXT_RUN"
else
check_fail "Restic backup timer is not active"
fi
else
check_warn "Restic backup timer not found"
fi
if command -v restic >/dev/null 2>&1; then
check_pass "Restic is installed"
else
check_warn "Restic not installed"
fi
# Summary
echo -e "\n========================================="
echo "Validation Summary"
echo "========================================="
echo -e "${GREEN}Passed: $PASSED${NC}"
echo -e "${YELLOW}Warnings: $WARNINGS${NC}"
echo -e "${RED}Failed: $FAILED${NC}"
if [[ $FAILED -eq 0 ]]; then
echo -e "\n${GREEN}✓ Deployment validation successful!${NC}"
exit 0
else
echo -e "\n${RED}✗ Some checks failed. Review above for details.${NC}"
exit 1
fi

34
scripts/vlan_firewall.sh Executable file
View File

@@ -0,0 +1,34 @@
#!/bin/bash
# vlan_firewall.sh - Configure firewall rules for VLAN isolation
# This script sets up basic firewall rules for TP-Link router or iptables-based systems
set -euo pipefail
echo "Configuring VLAN firewall rules..."
# VLAN 10: Management (192.168.10.0/24)
# VLAN 20: Services (192.168.20.0/24)
# VLAN 1: Default LAN (192.168.1.0/24)
# Allow management VLAN to access all networks
sudo iptables -A FORWARD -s 192.168.10.0/24 -j ACCEPT
# Allow services VLAN to access default LAN on specific ports only
# Port 53 (DNS), 80 (HTTP), 443 (HTTPS), 9000 (Portainer), 8080 (Traefik)
sudo iptables -A FORWARD -s 192.168.20.0/24 -d 192.168.1.0/24 -p tcp -m multiport --dports 53,80,443,9000,8080 -j ACCEPT
sudo iptables -A FORWARD -s 192.168.20.0/24 -d 192.168.1.0/24 -p udp --dport 53 -j ACCEPT
# Block all other traffic from services VLAN to default LAN
sudo iptables -A FORWARD -s 192.168.20.0/24 -d 192.168.1.0/24 -j DROP
# Allow default LAN to access services VLAN
sudo iptables -A FORWARD -s 192.168.1.0/24 -d 192.168.20.0/24 -j ACCEPT
# Allow established connections
sudo iptables -A FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT
echo "Saving iptables rules..."
sudo iptables-save | sudo tee /etc/iptables/rules.v4
echo "VLAN firewall rules configured."
echo "Note: For TP-Link router, configure ACLs via web UI using similar logic."

28
scripts/zfs_setup.sh Executable file
View File

@@ -0,0 +1,28 @@
#!/bin/bash
# zfs_setup.sh - Create ZFS pool 'tank' on Proxmox host SSDs
# Adjust device names (/dev/sda /dev/sdb) as appropriate for your hardware.
set -euo pipefail
POOL_NAME="tank"
DEVICES=(/dev/sda /dev/sdb)
# Check if pool already exists
if zpool list "$POOL_NAME" >/dev/null 2>&1; then
echo "ZFS pool '$POOL_NAME' already exists. Exiting."
exit 0
fi
# Create the pool with RAID-Z (single parity) for redundancy
zpool create "$POOL_NAME" raidz "${DEVICES[0]}" "${DEVICES[1]}"
# Enable compression for better space efficiency
zfs set compression=on "$POOL_NAME"
# Create a dataset for Docker volumes
zfs create "$POOL_NAME/docker"
# Set appropriate permissions for Docker to use the dataset
chmod 777 "/$POOL_NAME/docker"
echo "ZFS pool '$POOL_NAME' created and configured."

View File

@@ -0,0 +1,5 @@
[Definition]
# Portainer authentication failure filter
failregex = ^.*"remote_addr":"<HOST>".*"status":401.*$
^.*Failed login attempt from <HOST>.*$
ignoreregex =

View File

@@ -0,0 +1,5 @@
[Definition]
# Traefik authentication failure filter
failregex = ^<HOST> - \S+ \[.*\] "\S+ \S+ \S+" 401 .*$
^.*ClientIP":"<HOST>".*"RequestMethod":"\S+".*"OriginStatus":401.*$
ignoreregex =

View File

@@ -0,0 +1,30 @@
[DEFAULT]
# Ban duration: 1 hour
bantime = 3600
# Find time window: 10 minutes
findtime = 600
# Max retry attempts before ban
maxretry = 5
# Backend for monitoring
backend = systemd
[sshd]
enabled = true
port = ssh
filter = sshd
logpath = /var/log/auth.log
maxretry = 3
[portainer]
enabled = true
port = 9000,9443
filter = portainer
logpath = /var/log/portainer/portainer.log
maxretry = 5
[traefik-auth]
enabled = true
port = http,https
filter = traefik-auth
logpath = /var/log/traefik/access.log
maxretry = 5

View File

@@ -0,0 +1,36 @@
{
# Global options
admin off
}
# Main fallback server
:80 {
root * /srv/maintenance
file_server
# Serve maintenance page for all requests
handle {
rewrite * /maintenance.html
file_server
}
# Log all requests
log {
output file /var/log/caddy/access.log
}
}
# Optional: HTTPS fallback (if you have certificates)
:443 {
root * /srv/maintenance
file_server
handle {
rewrite * /maintenance.html
file_server
}
log {
output file /var/log/caddy/access.log
}
}

View File

@@ -0,0 +1,27 @@
version: '3.8'
services:
caddy:
image: caddy:latest
container_name: caddy_fallback
restart: unless-stopped
ports:
- "8080:80"
- "8443:443"
volumes:
- ./Caddyfile:/etc/caddy/Caddyfile
- ./maintenance.html:/srv/maintenance/maintenance.html
- caddy_data:/data
- caddy_config:/config
- caddy_logs:/var/log/caddy
networks:
- caddy_net
volumes:
caddy_data:
caddy_config:
caddy_logs:
networks:
caddy_net:
driver: bridge

View File

@@ -0,0 +1,68 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Service Maintenance</title>
<style>
* {
margin: 0;
padding: 0;
box-sizing: border-box;
}
body {
font-family: 'Inter', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
min-height: 100vh;
display: flex;
align-items: center;
justify-content: center;
color: #fff;
}
.container {
text-align: center;
padding: 3rem;
background: rgba(255, 255, 255, 0.1);
backdrop-filter: blur(10px);
border-radius: 20px;
box-shadow: 0 8px 32px rgba(0, 0, 0, 0.3);
max-width: 600px;
}
h1 {
font-size: 3rem;
margin-bottom: 1rem;
animation: pulse 2s infinite;
}
p {
font-size: 1.25rem;
line-height: 1.6;
margin-bottom: 2rem;
}
.status {
display: inline-block;
padding: 0.75rem 2rem;
background: rgba(255, 255, 255, 0.2);
border-radius: 50px;
font-weight: 600;
}
@keyframes pulse {
0%, 100% { opacity: 1; }
50% { opacity: 0.7; }
}
</style>
</head>
<body>
<div class="container">
<h1>🔧 Maintenance Mode</h1>
<p>Our services are temporarily unavailable due to maintenance or system updates.</p>
<p>We'll be back online shortly. Thank you for your patience.</p>
<div class="status">⏳ Please check back soon</div>
</div>
</body>
</html>

View File

@@ -0,0 +1,34 @@
# https://github.com/dockur/macos
services:
macos:
image: dockurr/macos
container_name: macos
environment:
VERSION: "15"
DISK_SIZE: "50G"
RAM_SIZE: "6G"
CPU_CORES: "4"
# DHCP: "Y" # if enabled you must create a macvlan
devices:
- /dev/kvm
- /dev/net/tun
cap_add:
- NET_ADMIN
ports:
- 8006:8006
- 5900:5900/tcp
- 5900:5900/udp
volumes:
- ./macos:/storage
restart: always
stop_grace_period: 2m
networks:
macos:
ipv4_address: 172.70.20.3
networks:
macos:
ipam:
config:
- subnet: 172.70.20.0/29
name: macos

View File

@@ -0,0 +1,107 @@
# Place this at ~/docker/docker-compose.yml (overwrite existing if ready)
# NOTE: the top-level "version" key is optional in modern Compose v2/v3 usage.
services:
tsdproxy:
image: almeidapaulopt/tsdproxy:1
container_name: tsdproxy
restart: unless-stopped
network_mode: host
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
- tsd_data:/data
- ./tsdproxy/config:/config
ports:
- "8080:8080"
cap_add:
- NET_ADMIN
- SYS_MODULE
environment:
# You may optionally set an auth key here, or add it to /config/tsdproxy.yaml later
TAILSCALE_AUTHKEY: "tskey-auth-kUFWCyDau321CNTRL-Vdt9PFUDUqAb7iQYLvCjqAkhcnq3aTTtg" # (optional — recommended to use config file)
TS_EXTRA_ARGS: "--accept-routes"
db:
image: mariadb:11
container_name: nextcloud-db
restart: unless-stopped
environment:
MYSQL_ROOT_PASSWORD: supersecurepassword
MYSQL_DATABASE: nextcloud
MYSQL_USER: nextcloud
MYSQL_PASSWORD: nextcloudpassword
volumes:
- db_data:/var/lib/mysql
nextcloud:
image: nextcloud:29
container_name: nextcloud-app
restart: unless-stopped
depends_on:
- db
environment:
MYSQL_HOST: db
MYSQL_DATABASE: nextcloud
MYSQL_USER: nextcloud
MYSQL_PASSWORD: nextcloudpassword
volumes:
- /mnt/nextcloud-data:/var/www/html/data
- /mnt/nextcloud-config:/var/www/html/config
labels:
- "traefik.enable=true"
- "traefik.http.routers.nextcloud.rule=Host(`nextcloud.sj98.duckdns.org`)"
- "traefik.http.routers.nextcloud.entrypoints=websecure"
- "traefik.http.routers.nextcloud.tls.certresolver=letsencrypt"
- "traefik.http.services.nextcloud.loadbalancer.server.port=80"
- "tsdproxy.enable=true"
- "tsdproxy.name=nextcloud"
plex:
image: lscr.io/linuxserver/plex:latest
container_name: plex
restart: unless-stopped
network_mode: "host"
environment:
PLEX_CLAIM: claim-your-plex-claim
PUID: 1000
PGID: 1000
TZ: America/Chicago
volumes:
- /mnt/media:/media
labels:
- "traefik.enable=true"
- "traefik.tcp.routers.plex.rule=HostSNI(`plex.sj98.duckdns.org`)"
- "traefik.tcp.routers.plex.entrypoints=websecure"
- "traefik.tcp.services.plex.loadbalancer.server.port=32400"
- "tsdproxy.enable=true"
- "tsdproxy.name=plex"
jellyfin:
image: jellyfin/jellyfin:latest
container_name: jellyfin
restart: unless-stopped
network_mode: "host"
environment:
PUID: 1000
PGID: 1000
TZ: America/Chicago
volumes:
- /mnt/media:/media
labels:
- "traefik.enable=true"
- "traefik.tcp.routers.jellyfin.rule=HostSNI(`jellyfin.sj98.duckdns.org`)"
- "traefik.tcp.routers.jellyfin.entrypoints=websecure"
- "traefik.tcp.services.jellyfin.loadbalancer.server.port=8096"
- "tsdproxy.enable=true"
- "tsdproxy.name=jellyfin"
watchtower:
image: containrrr/watchtower
container_name: watchtower
restart: unless-stopped
volumes:
- /var/run/docker.sock:/var/run/docker.sock
command: --interval 3600
volumes:
db_data:
tsd_data:

View File

@@ -0,0 +1,87 @@
version: "3.9"
services:
broker:
image: docker.io/library/redis:7
restart: unless-stopped
volumes:
- redisdata:/data
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 10s
timeout: 3s
retries: 5
networks:
- web
db:
image: docker.io/library/postgres:15
restart: unless-stopped
volumes:
- pgdata:/var/lib/postgresql/data
environment:
POSTGRES_DB: paperless
POSTGRES_USER: paperless
POSTGRES_PASSWORD: paperless
healthcheck:
test: ["CMD-SHELL", "pg_isready -U $${POSTGRES_USER} -d $${POSTGRES_DB} || exit 1"]
interval: 10s
timeout: 5s
retries: 5
networks:
- web
webserver:
image: ghcr.io/paperless-ngx/paperless-ngx:latest
restart: unless-stopped
depends_on:
- db
- broker
ports:
- "8000:8000"
volumes:
- data:/usr/src/paperless/data
- media:/usr/src/paperless/media
- ./export:/usr/src/paperless/export
- ./consume:/usr/src/paperless/consume
environment:
PAPERLESS_DBHOST: db
PAPERLESS_DBNAME: paperless
PAPERLESS_DBUSER: paperless
PAPERLESS_DBPASS: paperless
PAPERLESS_REDIS: redis://broker:6379/0
PAPERLESS_TIME_ZONE: "America/Chicago"
PAPERLESS_SECRET_KEY: "replace-with-a-64-char-random-string"
PAPERLESS_ADMIN_USER: admin@example.local
PAPERLESS_ADMIN_PASSWORD: changeme
PAPERLESS_ALLOWED_HOSTS: '["paperless.sj98.duckdns.org"]'
PAPERLESS_CSRF_TRUSTED_ORIGINS: '["https://paperless.sj98.duckdns.org"]'
# Add / adjust these for running behind Traefik:
PAPERLESS_URL: "https://paperless.sj98.duckdns.org" # required/preferred
PAPERLESS_PROXY_SSL_HEADER: '["HTTP_X_FORWARDED_PROTO","https"]' # tells Django to treat X-Forwarded-Proto=https as TLS
PAPERLESS_USE_X_FORWARD_HOST: "true" # optional, can help URL generation
PAPERLESS_USE_X_FORWARD_PORT: "true" # optional
# Optional: restrict trusted proxies to your docker network or Traefik IP
# PAPERLESS_TRUSTED_PROXIES: "172.18.0.0/16" # <-- replace with your web network subnet or Traefik IP if you want to lock down
networks:
- web
labels:
- "traefik.enable=true"
- "traefik.http.routers.paperless.rule=Host(`paperless.sj98.duckdns.org`)"
- "traefik.http.routers.paperless.entrypoints=websecure"
- "traefik.http.routers.paperless.tls=true"
- "traefik.http.routers.paperless.tls.certresolver=duckdns"
- "traefik.http.services.paperless.loadbalancer.server.port=8000"
- "tsdproxy.enable=true"
- "tsdproxy.name=paperless"
volumes:
data:
media:
pgdata:
redisdata:
networks:
web:
external: true

View File

@@ -0,0 +1,14 @@
version: '3.8'
services:
portainer-agent:
image: portainer/agent:latest
container_name: portainer-agent
volumes:
- /var/run/docker.sock:/var/run/docker.sock
- /var/lib/docker/volumes:/var/lib/docker/volumes
environment:
AGENT_CLUSTER_ADDR: 192.168.1.81 # Replace with the actual IP address
AGENT_PORT: 9001
ports:
- "9001:9001" # Port for agent communication
restart: always

View File

@@ -0,0 +1,39 @@
version: '3.8'
services:
rustdesk-hbbs:
image: rustdesk/rustdesk-server:latest
container_name: rustdesk-hbbs
restart: unless-stopped
platform: linux/arm64
command: ["hbbs", "--relay-servers", "192.168.1.245:21117"]
volumes:
- rustdesk_data:/root
ports:
- "21115:21115/tcp"
- "21115:21115/udp"
- "21116:21116/tcp"
- "21116:21116/udp"
rustdesk-hbbr:
image: rustdesk/rustdesk-server:latest
container_name: rustdesk-hbbr
restart: unless-stopped
platform: linux/arm64
command: ["hbbr"]
volumes:
- rustdesk_data:/root
ports:
- "21117:21117/tcp"
- "21118:21118/udp"
- "21119:21119/tcp"
- "21119:21119/udp"
environment:
- TOTAL_BANDWIDTH=20480
- SINGLE_BANDWIDTH=128
- LIMIT_SPEED=100Mb/s
- DOWNGRADE_START_CHECK=600
- DOWNGRADE_THRESHOLD=0.9
volumes:
rustdesk_data:

View File

@@ -0,0 +1,53 @@
version: "3.9"
services:
traefik:
image: traefik:latest
container_name: traefik
restart: unless-stopped
environment:
# Replace this placeholder with your DuckDNS token
- DUCKDNS_TOKEN=03a4d8f7-695a-4f51-b66c-cc2fac555fc1
networks:
- web
ports:
- "80:80" # http
- "443:443" # https
- "8089:8089" # traefik dashboard (secure it if exposed)
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
- ./letsencrypt:/letsencrypt # <-- keep this directory inside WSL filesystem
- ./traefik_dynamic.yml:/etc/traefik/traefik_dynamic.yml:ro
command:
- --api.insecure=false
- --api.dashboard=true
- --entrypoints.web.address=:80
- --entrypoints.websecure.address=:443
- --entrypoints.dashboard.address=:8089
- --providers.docker=true
- --providers.docker.endpoint=unix:///var/run/docker.sock
- --providers.docker.exposedbydefault=false
- --providers.file.filename=/etc/traefik/traefik_dynamic.yml
- --providers.file.watch=true
- --certificatesresolvers.duckdns.acme.email=sterlenjohnson6@gmail.com
- --certificatesresolvers.duckdns.acme.storage=/letsencrypt/acme.json
- --certificatesresolvers.duckdns.acme.dnschallenge.provider=duckdns
- --certificatesresolvers.duckdns.acme.dnschallenge.disablepropagationcheck=true
whoami:
image: containous/whoami:latest
container_name: whoami
restart: unless-stopped
networks:
- web
labels:
- "traefik.enable=true"
- "traefik.http.routers.whoami.rule=Host(`whoami.sj98.duckdns.org`)"
- "traefik.http.routers.whoami.entrypoints=websecure"
- "traefik.http.routers.whoami.tls=true"
- "traefik.http.routers.whoami.tls.certresolver=duckdns"
networks:
web:
external: true

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1,18 @@
# traefik_dynamic.yml
http:
routers:
traefik-dashboard:
entryPoints:
- dashboard
rule: "Host(`localhost`) && (PathPrefix(`/dashboard`) || PathPrefix(`/`))"
service: "api@internal"
middlewares:
- dashboard-auth
middlewares:
dashboard-auth:
basicAuth:
# replace the example hash below with a hash you generate (see step 3)
users:
- "admin:$2y$05$8CZrANjYoKRm5VG6QO8kseVpumnDXnLDU2vREgfMm9F/JdsTpq.iy"
- "Sterl:$2y$05$t8LnSDA190LOs2Wpmbt/p.7dFHzZKDT4BMLjSjqsxg0i6re5I9wlm"

View File

@@ -0,0 +1,198 @@
# Full corrected Immich/Media stack (Traefik-ready)
# Requires pre-existing external overlay: traefik-public
version: '3.9'
networks:
traefik-public:
external: true
media-backend:
driver: overlay
volumes:
plex_config:
jellyfin_config:
immich_upload:
immich_model_cache:
immich_db:
immich_redis:
homarr_config:
services:
homarr:
image: ghcr.io/ajnart/homarr:latest
networks:
- traefik-public
- media-backend
volumes:
- homarr_config:/app/data
- /var/run/docker.sock:/var/run/docker.sock:ro
environment:
- TZ=America/Chicago
deploy:
placement:
constraints:
- node.labels.leader == true
- node.role == manager
labels:
- "traefik.enable=true"
- "traefik.http.routers.homarr-router.rule=Host(`homarr.sj98.duckdns.org`)"
- "traefik.http.routers.homarr-router.entrypoints=websecure"
- "traefik.http.routers.homarr-router.tls.certresolver=leresolver"
- "traefik.http.services.homarr.loadbalancer.server.port=7575"
- "traefik.docker.network=traefik-public"
resources:
limits:
memory: 512M
reservations:
memory: 128M
restart_policy:
condition: on-failure
max_attempts: 3
plex:
image: plexinc/pms-docker:latest
hostname: plex
networks:
- traefik-public
- media-backend
volumes:
- plex_config:/config
- /mnt/media:/media:ro
environment:
- TZ=America/Chicago
- PLEX_CLAIM=claim-xxxxxxxxxxxx
- ADVERTISE_IP=http://192.168.1.196:32400/
deploy:
placement:
constraints:
- node.role == manager
labels:
- "traefik.enable=true"
- "traefik.http.routers.plex-router.rule=Host(`plex.sj98.duckdns.org`)"
- "traefik.http.routers.plex-router.entrypoints=websecure"
- "traefik.http.routers.plex-router.tls.certresolver=leresolver"
- "traefik.http.services.plex.loadbalancer.server.port=32400"
- "traefik.docker.network=traefik-public"
restart_policy:
condition: on-failure
max_attempts: 3
jellyfin:
image: jellyfin/jellyfin:latest
networks:
- traefik-public
- media-backend
volumes:
- jellyfin_config:/config
- /mnt/media:/media:ro
environment:
- TZ=America/Chicago
deploy:
placement:
constraints:
- node.role == manager
labels:
- "traefik.enable=true"
- "traefik.http.routers.jellyfin-router.rule=Host(`jellyfin.sj98.duckdns.org`)"
- "traefik.http.routers.jellyfin-router.entrypoints=websecure"
- "traefik.http.routers.jellyfin-router.tls.certresolver=leresolver"
- "traefik.http.services.jellyfin.loadbalancer.server.port=8096"
- "traefik.docker.network=traefik-public"
restart_policy:
condition: on-failure
max_attempts: 3
immich-server:
image: ghcr.io/immich-app/immich-server:release
networks:
- traefik-public
- media-backend
volumes:
- /mnt/media/immich:/usr/src/app/upload
- /etc/localtime:/etc/localtime:ro
environment:
- DB_HOSTNAME=immich-db
- DB_USERNAME=immich
- DB_PASSWORD=immich
- DB_DATABASE_NAME=immich
- REDIS_HOSTNAME=immich-redis
- TZ=America/Chicago
depends_on:
- immich-redis
- immich-db
deploy:
placement:
constraints:
- node.labels.leader == true
- node.role == manager
labels:
- "traefik.enable=true"
- "traefik.http.routers.immich-server-router.rule=Host(`immich.sj98.duckdns.org`)"
- "traefik.http.routers.immich-server-router.entrypoints=websecure"
- "traefik.http.routers.immich-server-router.tls.certresolver=leresolver"
- "traefik.http.services.immich-server.loadbalancer.server.port=2283"
- "traefik.docker.network=traefik-public"
# Immich-specific headers and settings
- "traefik.http.routers.immich-server-router.middlewares=immich-headers"
- "traefik.http.middlewares.immich-headers.headers.customrequestheaders.X-Forwarded-Proto=https"
- "traefik.http.services.immich-server.loadbalancer.passhostheader=true"
resources:
limits:
memory: 2G
restart_policy:
condition: on-failure
max_attempts: 3
immich-machine-learning:
image: ghcr.io/immich-app/immich-machine-learning:release
networks:
- media-backend
volumes:
- immich_model_cache:/cache
environment:
- TZ=America/Chicago
depends_on:
- immich-server
deploy:
placement:
constraints:
- node.labels.heavy == true
- node.labels.ai == true
restart_policy:
condition: on-failure
max_attempts: 3
immich-redis:
image: redis:7-alpine
networks:
- media-backend
volumes:
- immich_redis:/data
deploy:
placement:
constraints:
- node.labels.leader == true
- node.role == manager
restart_policy:
condition: on-failure
max_attempts: 3
immich-db:
image: tensorchord/pgvecto-rs:pg14-v0.2.0
networks:
- media-backend
volumes:
- /mnt/database/immich:/var/lib/postgresql/data
environment:
- POSTGRES_PASSWORD=immich
- POSTGRES_USER=immich
- POSTGRES_DB=immich
deploy:
placement:
constraints:
- node.labels.leader == true
- node.role == manager
restart_policy:
condition: on-failure
max_attempts: 3

View File

@@ -0,0 +1,54 @@
version: '3.9'
networks:
traefik-public:
external: true
configs:
traefik_yml:
external: true
name: traefik.yml
services:
traefik:
image: traefik:v3.6.1
ports:
- "80:80"
- "443:443"
- "8080:8080"
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
- /mnt/traefik/letsencrypt:/letsencrypt
networks:
- traefik-public
environment:
- DUCKDNS_TOKEN=14880437-fcee-4206-800a-af057cdfffe2
configs:
- source: traefik_yml
target: /etc/traefik/traefik.yml
deploy:
placement:
constraints:
- node.role == manager
labels:
- "traefik.enable=true"
- "traefik.http.routers.traefik.rule=Host(`traefik.sj98.duckdns.org`)"
- "traefik.http.routers.traefik.entrypoints=websecure"
- "traefik.http.routers.traefik.tls.certresolver=leresolver"
- "traefik.http.routers.traefik.service=api@internal"
- "traefik.http.services.traefik.loadbalancer.server.port=8080"
whoami:
image: traefik/whoami
networks:
- traefik-public
deploy:
labels:
- "traefik.enable=true"
- "traefik.http.routers.whoami.rule=Host(`whoami.sj98.duckdns.org`)"
- "traefik.http.routers.whoami.entrypoints=websecure"
- "traefik.http.routers.whoami.tls.certresolver=leresolver"
- "traefik.http.services.whoami.loadbalancer.server.port=80"

View File

@@ -0,0 +1,100 @@
version: '3.9'
networks:
traefik-public:
external: true
productivity-backend:
driver: overlay
volumes:
nextcloud_data:
nextcloud_db:
nextcloud_redis:
services:
nextcloud-db:
image: postgres:15-alpine
volumes:
- /mnt/database/nextcloud:/var/lib/postgresql/data
environment:
- POSTGRES_DB=nextcloud
- POSTGRES_USER=nextcloud
- POSTGRES_PASSWORD=nextcloud # Replace with a secure password in production
networks:
- productivity-backend
deploy:
placement:
constraints:
- node.labels.leader == true
restart_policy:
condition: on-failure
nextcloud-redis:
image: redis:7-alpine
volumes:
- nextcloud_redis:/data
networks:
- productivity-backend
deploy:
placement:
constraints:
- node.labels.leader == true
restart_policy:
condition: on-failure
nextcloud:
image: nextcloud:latest
volumes:
- /mnt/nextcloud_apps:/var/www/html/custom_apps
- /mnt/nextcloud_config:/var/www/html/config
- /mnt/nextcloud_data:/var/www/html/data
environment:
- POSTGRES_HOST=nextcloud-db
- POSTGRES_DB=nextcloud
- POSTGRES_USER=nextcloud
- POSTGRES_PASSWORD=nextcloud # Replace with a secure password in production
- REDIS_HOST=nextcloud-redis
- NEXTCLOUD_ADMIN_USER=admin # Replace with your desired admin username
- NEXTCLOUD_ADMIN_PASSWORD=password # Replace with a secure password
- NEXTCLOUD_TRUSTED_DOMAINS=nextcloud.sj98.duckdns.org
- OVERWRITEPROTOCOL=https
- OVERWRITEHOST=nextcloud.sj98.duckdns.org
- TRUSTED_PROXIES=172.16.0.0/12
depends_on:
- nextcloud-db
- nextcloud-redis
networks:
- traefik-public
- productivity-backend
deploy:
placement:
constraints:
- node.labels.leader == true
resources:
limits:
memory: 2G
reservations:
memory: 512M
restart_policy:
condition: on-failure
labels:
- "traefik.enable=true"
- "traefik.http.routers.nextcloud.rule=Host(`nextcloud.sj98.duckdns.org`)"
- "traefik.http.routers.nextcloud.entrypoints=websecure"
- "traefik.http.routers.nextcloud.tls.certresolver=leresolver"
- "traefik.http.services.nextcloud.loadbalancer.server.port=80"
- "traefik.docker.network=traefik-public"
# Nextcloud-specific middlewares
- "traefik.http.routers.nextcloud.middlewares=nextcloud-chain"
- "traefik.http.middlewares.nextcloud-chain.chain.middlewares=nextcloud-caldav,nextcloud-headers"
# CalDAV/CardDAV redirect
- "traefik.http.middlewares.nextcloud-caldav.redirectregex.regex=^https://(.*)/.well-known/(card|cal)dav"
- "traefik.http.middlewares.nextcloud-caldav.redirectregex.replacement=https://$$1/remote.php/dav/"
- "traefik.http.middlewares.nextcloud-caldav.redirectregex.permanent=true"
# Security headers
- "traefik.http.middlewares.nextcloud-headers.headers.stsSeconds=31536000"
- "traefik.http.middlewares.nextcloud-headers.headers.stsIncludeSubdomains=true"
- "traefik.http.middlewares.nextcloud-headers.headers.stsPreload=true"
- "traefik.http.middlewares.nextcloud-headers.headers.forceSTSHeader=true"
- "traefik.http.middlewares.nextcloud-headers.headers.customFrameOptionsValue=SAMEORIGIN"
- "traefik.http.middlewares.nextcloud-headers.headers.customResponseHeaders.X-Robots-Tag=noindex,nofollow"

View File

@@ -0,0 +1,55 @@
version: '3.8'
networks:
traefik-public:
external: true
volumes:
openwebui_data:
services:
openwebui:
image: ghcr.io/open-webui/open-webui:0.3.32
volumes:
- openwebui_data:/app/backend/data
networks:
- traefik-public
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
interval: 30s
timeout: 10s
retries: 3
start_period: 60s
deploy:
placement:
constraints:
- node.labels.heavy == true
resources:
limits:
memory: 4G
cpus: '4.0'
reservations:
memory: 2G
cpus: '1.0'
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 3
update_config:
parallelism: 1
delay: 10s
failure_action: rollback
labels:
- "traefik.enable=true"
- "traefik.http.routers.openwebui.rule=Host(`ai.sj98.duckdns.org`)"
- "traefik.http.routers.openwebui.entrypoints=websecure"
- "traefik.http.routers.openwebui.tls.certresolver=leresolver"
- "traefik.http.services.openwebui.loadbalancer.server.port=8080"
- "traefik.docker.network=traefik-public"
- "tsdproxy.enable=true"
- "tsdproxy.name=openwebui"
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"

View File

@@ -0,0 +1,409 @@
version: '3.8'
networks:
traefik-public:
external: true
homelab-backend:
driver: overlay
volumes:
paperless_data:
paperless_media:
paperless_db:
paperless_redis:
openwebui_data:
stirling_pdf_data:
searxng_data:
n8n_data:
secrets:
paperless_db_password:
external: true
paperless_secret_key:
external: true
services:
n8n:
image: n8nio/n8n:latest
volumes:
- n8n_data:/home/node/.n8n
- /var/run/docker.sock:/var/run/docker.sock
networks:
- traefik-public
environment:
- N8N_HOST=n8n.sj98.duckdns.org
- N8N_PROTOCOL=https
- NODE_ENV=production
- WEBHOOK_URL=https://n8n.sj98.duckdns.org/
healthcheck:
test: ["CMD-SHELL", "wget -q --spider http://localhost:5678/healthz || exit 1"]
interval: 30s
timeout: 10s
retries: 3
deploy:
placement:
constraints:
- node.role == manager
resources:
limits:
memory: 1G
cpus: '0.5'
reservations:
memory: 256M
cpus: '0.1'
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 3
labels:
- "traefik.enable=true"
- "traefik.http.routers.n8n.rule=Host(`n8n.sj98.duckdns.org`)"
- "traefik.http.routers.n8n.entrypoints=websecure"
- "traefik.http.routers.n8n.tls.certresolver=leresolver"
- "traefik.http.services.n8n.loadbalancer.server.port=5678"
- "traefik.docker.network=traefik-public"
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"
openwebui:
image: ghcr.io/open-webui/open-webui:0.3.32
volumes:
- openwebui_data:/app/backend/data
networks:
- traefik-public
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
interval: 30s
timeout: 10s
retries: 3
start_period: 60s
deploy:
placement:
constraints:
- node.labels.heavy == true
resources:
limits:
memory: 4G
cpus: '4.0'
reservations:
memory: 2G
cpus: '1.0'
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 3
update_config:
parallelism: 1
delay: 10s
failure_action: rollback
labels:
- "traefik.enable=true"
- "traefik.http.routers.openwebui.rule=Host(`ai.sj98.duckdns.org`)"
- "traefik.http.routers.openwebui.entrypoints=websecure"
- "traefik.http.routers.openwebui.tls.certresolver=leresolver"
- "traefik.http.services.openwebui.loadbalancer.server.port=8080"
- "traefik.docker.network=traefik-public"
- "tsdproxy.enable=true"
- "tsdproxy.name=openwebui"
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"
paperless-redis:
image: redis:7-alpine
volumes:
- paperless_redis:/data
networks:
- homelab-backend
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 30s
timeout: 3s
retries: 3
deploy:
placement:
constraints:
- node.labels.leader == true
resources:
limits:
memory: 256M
cpus: '0.5'
reservations:
memory: 64M
cpus: '0.1'
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 3
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"
paperless-db:
image: postgres:15-alpine
volumes:
- paperless_db:/var/lib/postgresql/data
networks:
- homelab-backend
environment:
- POSTGRES_DB=paperless
- POSTGRES_USER=paperless
- POSTGRES_PASSWORD_FILE=/run/secrets/paperless_db_password
secrets:
- paperless_db_password
healthcheck:
test: ["CMD-SHELL", "pg_isready -U paperless"]
interval: 30s
timeout: 5s
retries: 3
deploy:
placement:
constraints:
- node.labels.leader == true
resources:
limits:
memory: 512M
cpus: '1.0'
reservations:
memory: 256M
cpus: '0.25'
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 3
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"
paperless:
image: ghcr.io/paperless-ngx/paperless-ngx:2.19.3
volumes:
- paperless_data:/usr/src/paperless/data
- paperless_media:/usr/src/paperless/media
environment:
- PAPERLESS_REDIS=redis://paperless-redis:6379
- PAPERLESS_DBHOST=paperless-db
- PAPERLESS_DBNAME=paperless
- PAPERLESS_DBUSER=paperless
- PAPERLESS_DBPASS_FILE=/run/secrets/paperless_db_password
- PAPERLESS_URL=https://paperless.sj98.duckdns.org
- PAPERLESS_SECRET_KEY_FILE=/run/secrets/paperless_secret_key
- TZ=America/Chicago
secrets:
- paperless_db_password
- paperless_secret_key
depends_on:
- paperless-redis
- paperless-db
networks:
- traefik-public
- homelab-backend
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8000/api/"]
interval: 30s
timeout: 10s
retries: 3
start_period: 90s
deploy:
placement:
constraints:
- node.labels.leader == true
resources:
limits:
memory: 1536M
cpus: '2.0'
reservations:
memory: 768M
cpus: '0.5'
restart_policy:
condition: on-failure
delay: 10s
max_attempts: 3
update_config:
parallelism: 1
delay: 10s
failure_action: rollback
labels:
- "traefik.enable=true"
- "traefik.http.routers.paperless.rule=Host(`paperless.sj98.duckdns.org`)"
- "traefik.http.routers.paperless.entrypoints=websecure"
- "traefik.http.routers.paperless.tls.certresolver=leresolver"
- "traefik.http.services.paperless.loadbalancer.server.port=8000"
- "traefik.docker.network=traefik-public"
- "tsdproxy.enable=true"
- "tsdproxy.name=paperless"
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"
stirling-pdf:
image: frooodle/s-pdf:0.18.1
volumes:
- stirling_pdf_data:/configs
environment:
- DOCKER_ENABLE_SECURITY=false
- INSTALL_BOOK_AND_ADVANCED_HTML_OPS=false
- LANGS=en_US
networks:
- traefik-public
healthcheck:
test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:8080/"]
interval: 30s
timeout: 10s
retries: 3
start_period: 30s
deploy:
placement:
constraints:
- node.labels.leader == true
resources:
limits:
memory: 1536M
cpus: '2.0'
reservations:
memory: 768M
cpus: '0.5'
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 3
update_config:
parallelism: 1
delay: 10s
failure_action: rollback
labels:
- "traefik.enable=true"
- "traefik.http.routers.pdf.rule=Host(`pdf.sj98.duckdns.org`)"
- "traefik.http.routers.pdf.entrypoints=websecure"
- "traefik.http.routers.pdf.tls.certresolver=leresolver"
- "traefik.http.services.pdf.loadbalancer.server.port=8080"
- "traefik.docker.network=traefik-public"
- "tsdproxy.enable=true"
- "tsdproxy.name=pdf"
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"
searxng:
image: searxng/searxng:2024.11.20-e9f6095cc
volumes:
- searxng_data:/etc/searxng
environment:
- SEARXNG_BASE_URL=https://search.sj98.duckdns.org/
networks:
- traefik-public
healthcheck:
test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:8080/healthz"]
interval: 30s
timeout: 10s
retries: 3
start_period: 30s
deploy:
placement:
constraints:
- node.labels.leader == true
resources:
limits:
memory: 1536M
cpus: '2.0'
reservations:
memory: 512M
cpus: '0.5'
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 3
update_config:
parallelism: 1
delay: 10s
failure_action: rollback
labels:
- "traefik.enable=true"
- "traefik.http.routers.searxng.rule=Host(`search.sj98.duckdns.org`)"
- "traefik.http.routers.searxng.entrypoints=websecure"
- "traefik.http.routers.searxng.tls.certresolver=leresolver"
- "traefik.http.services.searxng.loadbalancer.server.port=8080"
- "traefik.docker.network=traefik-public"
- "tsdproxy.enable=true"
- "tsdproxy.name=search"
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"
watchtower:
image: containrrr/watchtower:1.7.1
volumes:
- /var/run/docker.sock:/var/run/docker.sock
environment:
- DOCKER_API_VERSION=1.44
command: --cleanup --interval 86400
deploy:
placement:
constraints:
- node.role == manager
resources:
limits:
memory: 256M
cpus: '0.25'
reservations:
memory: 64M
cpus: '0.05'
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 3
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"
tsdproxy:
image: almeidapaulopt/tsdproxy:v0.5.1
networks:
- traefik-public
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
- /srv/tsdproxy/config/tsdproxy.yaml:/config/tsdproxy.yaml:ro
- /srv/tsdproxy/data:/data
deploy:
resources:
limits:
memory: 256M
cpus: '0.25'
reservations:
memory: 64M
cpus: '0.05'
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 3
labels:
- "traefik.enable=true"
- "traefik.http.routers.tsdproxy.rule=Host(`tsdproxy.sj98.duckdns.org`)"
- "traefik.http.routers.tsdproxy.entrypoints=websecure"
- "traefik.http.routers.tsdproxy.tls.certresolver=leresolver"
- "traefik.http.services.tsdproxy.loadbalancer.server.port=8080"
- "traefik.docker.network=traefik-public"
- "tsdproxy.enable=true"
- "tsdproxy.name=tsdproxy"
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"

View File

@@ -0,0 +1,104 @@
version: '3.8'
networks:
traefik-public:
external: true
gitea-internal:
driver: overlay
attachable: true
volumes:
gitea_data:
gitea_db_data:
secrets:
gitea_db_password:
external: true
services:
gitea:
image: gitea/gitea:latest
volumes:
- gitea_data:/data
networks:
- traefik-public
- gitea-internal
ports:
- "2222:22"
environment:
- USER_UID=1000
- USER_GID=1000
- GITEA__database__DB_TYPE=postgres
- GITEA__database__HOST=gitea-db:5432
- GITEA__database__NAME=gitea
- GITEA__database__USER=gitea
- GITEA__database__PASSWD_FILE=/run/secrets/gitea_db_password
- GITEA__server__DOMAIN=git.sj98.duckdns.org
- GITEA__server__ROOT_URL=https://git.sj98.duckdns.org
- GITEA__server__SSH_DOMAIN=git.sj98.duckdns.org
- GITEA__server__SSH_PORT=2222
- GITEA__service__DISABLE_REGISTRATION=false
secrets:
- gitea_db_password
depends_on:
- gitea-db
healthcheck:
test: ["CMD-SHELL", "wget -q --spider http://localhost:3000 || exit 1"]
interval: 30s
timeout: 10s
retries: 3
deploy:
placement:
constraints:
- node.role == manager
resources:
limits:
memory: 1G
cpus: '1.0'
reservations:
memory: 256M
cpus: '0.2'
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 3
labels:
- "traefik.enable=true"
- "traefik.http.routers.gitea.rule=Host(`git.sj98.duckdns.org`)"
- "traefik.http.routers.gitea.entrypoints=websecure"
- "traefik.http.routers.gitea.tls.certresolver=leresolver"
- "traefik.http.services.gitea.loadbalancer.server.port=3000"
- "traefik.docker.network=traefik-public"
gitea-db:
image: postgres:15-alpine
volumes:
- gitea_db_data:/var/lib/postgresql/data
networks:
- gitea-internal
environment:
- POSTGRES_USER=gitea
- POSTGRES_PASSWORD_FILE=/run/secrets/gitea_db_password
- POSTGRES_DB=gitea
secrets:
- gitea_db_password
healthcheck:
test: ["CMD-SHELL", "pg_isready -U gitea"]
interval: 30s
timeout: 5s
retries: 3
deploy:
placement:
constraints:
- node.role == manager
resources:
limits:
memory: 512M
cpus: '0.5'
reservations:
memory: 128M
cpus: '0.1'
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 3

View File

@@ -0,0 +1,170 @@
version: '3.8'
networks:
traefik-public:
external: true
homelab-backend:
driver: overlay
volumes:
tsdproxy_config:
tsdproxy_data:
komodo_data:
komodo_mongo_data:
services:
komodo-mongo:
image: mongo:7
volumes:
- komodo_mongo_data:/data/db
networks:
- homelab-backend
deploy:
placement:
constraints:
- node.labels.leader == true
resources:
limits:
memory: 512M
cpus: '1.0'
reservations:
memory: 128M
cpus: '0.1'
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 3
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"
komodo-core:
image: ghcr.io/moghtech/komodo:latest
depends_on:
- komodo-mongo
environment:
- KOMODO_DATABASE_ADDRESS=komodo-mongo:27017
volumes:
- komodo_data:/config
networks:
- traefik-public
- homelab-backend
deploy:
placement:
constraints:
- node.labels.leader == true
resources:
limits:
memory: 512M
cpus: '1.0'
reservations:
memory: 128M
cpus: '0.1'
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 3
labels:
- "traefik.enable=true"
- "traefik.http.routers.komodo.rule=Host(`komodo.sj98.duckdns.org`)"
- "traefik.http.routers.komodo.entrypoints=websecure"
- "traefik.http.routers.komodo.tls.certresolver=leresolver"
- "traefik.http.services.komodo.loadbalancer.server.port=9120"
- "traefik.docker.network=traefik-public"
- "tsdproxy.enable=true"
- "tsdproxy.name=komodo"
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"
komodo-periphery:
image: ghcr.io/moghtech/komodo-periphery:latest
environment:
- PERIPHERY_Id=periphery-{{.Node.Hostname}}
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
deploy:
mode: global
resources:
limits:
memory: 128M
cpus: '0.5'
reservations:
memory: 32M
cpus: '0.05'
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 3
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"
watchtower:
image: containrrr/watchtower:1.7.1
volumes:
- /var/run/docker.sock:/var/run/docker.sock
environment:
- DOCKER_API_VERSION=1.44
command: --cleanup --interval 86400
deploy:
placement:
constraints:
- node.role == manager
resources:
limits:
memory: 256M
cpus: '0.25'
reservations:
memory: 64M
cpus: '0.05'
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 3
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"
tsdproxy:
image: almeidapaulopt/tsdproxy:v0.5.1
networks:
- traefik-public
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
- tsdproxy_config:/config
- tsdproxy_data:/data
deploy:
resources:
limits:
memory: 256M
cpus: '0.25'
reservations:
memory: 64M
cpus: '0.05'
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 3
labels:
- "traefik.enable=true"
- "traefik.http.routers.tsdproxy.rule=Host(`tsdproxy.sj98.duckdns.org`)"
- "traefik.http.routers.tsdproxy.entrypoints=websecure"
- "traefik.http.routers.tsdproxy.tls.certresolver=leresolver"
- "traefik.http.services.tsdproxy.loadbalancer.server.port=8080"
- "traefik.docker.network=traefik-public"
- "tsdproxy.enable=true"
- "tsdproxy.name=tsdproxy"
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"

View File

@@ -0,0 +1,5 @@
# Please replace claim-xxxxxxxxxxxx with your actual Plex claim token.
PLEX_CLAIM=claim-xxxxxxxxxxxx
# The ADVERTISE_IP is currently hardcoded in the docker-compose file.
# You may want to review it and change it to your actual IP address.

View File

@@ -0,0 +1,235 @@
version: '3.9'
networks:
traefik-public:
external: true
media-backend:
driver: overlay
volumes:
plex_config:
jellyfin_config:
immich_upload:
immich_model_cache:
immich_db:
immich_redis:
homarr_config:
services:
homarr:
image: ghcr.io/homarr-labs/homarr:1.43.0
networks:
- traefik-public
- media-backend
volumes:
- homarr_config:/app/data
- /var/run/docker.sock:/var/run/docker.sock:ro
environment:
- TZ=America/Chicago
deploy:
placement:
constraints:
- node.labels.leader == true
- node.role == manager
labels:
- "traefik.enable=true"
- "traefik.http.routers.homarr-router.rule=Host(`homarr.sj98.duckdns.org`)"
- "traefik.http.routers.homarr-router.entrypoints=websecure"
- "traefik.http.routers.homarr-router.tls.certresolver=leresolver"
- "traefik.http.services.homarr.loadbalancer.server.port=7575"
- "traefik.docker.network=traefik-public"
resources:
limits:
memory: 512M
cpus: '1.0'
reservations:
memory: 128M
cpus: '0.2'
restart_policy:
condition: on-failure
max_attempts: 3
plex:
image: plexinc/pms-docker:latest
hostname: plex
networks:
- traefik-public
- media-backend
volumes:
- plex_config:/config
- /mnt/media:/media:ro
environment:
- TZ=America/Chicago
- PLEX_CLAIM=${PLEX_CLAIM}
- ADVERTISE_IP=http://192.168.1.196:32400/
deploy:
placement:
constraints:
- node.role == manager
labels:
- "traefik.enable=true"
- "traefik.http.routers.plex-router.rule=Host(`plex.sj98.duckdns.org`)"
- "traefik.http.routers.plex-router.entrypoints=websecure"
- "traefik.http.routers.plex-router.tls.certresolver=leresolver"
- "traefik.http.services.plex.loadbalancer.server.port=32400"
- "traefik.docker.network=traefik-public"
resources:
limits:
memory: 1G
cpus: '2.0'
reservations:
memory: 512M
cpus: '0.5'
restart_policy:
condition: on-failure
max_attempts: 3
jellyfin:
image: jellyfin/jellyfin:latest
networks:
- traefik-public
- media-backend
volumes:
- jellyfin_config:/config
- /mnt/media:/media:ro
environment:
- TZ=America/Chicago
deploy:
placement:
constraints:
- node.role == manager
labels:
- "traefik.enable=true"
- "traefik.http.routers.jellyfin-router.rule=Host(`jellyfin.sj98.duckdns.org`)"
- "traefik.http.routers.jellyfin-router.entrypoints=websecure"
- "traefik.http.routers.jellyfin-router.tls.certresolver=leresolver"
- "traefik.http.services.jellyfin.loadbalancer.server.port=8096"
- "traefik.docker.network=traefik-public"
resources:
limits:
memory: 1G
cpus: '2.0'
reservations:
memory: 512M
cpus: '0.5'
restart_policy:
condition: on-failure
max_attempts: 3
immich-server:
image: ghcr.io/immich-app/immich-server:release
networks:
- traefik-public
- media-backend
volumes:
- immich_upload:/usr/src/app/upload
- /mnt/media/Photos:/usr/src/app/upload/library:rw
- /etc/localtime:/etc/localtime:ro
environment:
- DB_HOSTNAME=immich-db
- DB_USERNAME=immich
- DB_PASSWORD=immich
- DB_DATABASE_NAME=immich
- REDIS_HOSTNAME=immich-redis
- TZ=America/Chicago
- IMMICH_MEDIA_LOCATION=/usr/src/app/upload/library
depends_on:
- immich-redis
- immich-db
deploy:
placement:
constraints:
- node.role == manager
labels:
- "traefik.enable=true"
- "traefik.http.routers.immich-server-router.rule=Host(`immich.sj98.duckdns.org`)"
- "traefik.http.routers.immich-server-router.entrypoints=websecure"
- "traefik.http.routers.immich-server-router.tls.certresolver=leresolver"
- "traefik.http.services.immich-server.loadbalancer.server.port=2283"
- "traefik.docker.network=traefik-public"
# Immich-specific headers and settings
- "traefik.http.routers.immich-server-router.middlewares=immich-headers"
- "traefik.http.middlewares.immich-headers.headers.customrequestheaders.X-Forwarded-Proto=https"
- "traefik.http.services.immich-server.loadbalancer.passhostheader=true"
resources:
limits:
memory: 2G
cpus: '2.0'
reservations:
memory: 1G
cpus: '0.5'
restart_policy:
condition: on-failure
max_attempts: 3
immich-machine-learning:
image: ghcr.io/immich-app/immich-machine-learning:release
networks:
- media-backend
volumes:
- immich_model_cache:/cache
environment:
- TZ=America/Chicago
depends_on:
- immich-server
deploy:
placement:
constraints:
- node.labels.heavy == true
- node.labels.ai == true
resources:
limits:
memory: 4G
cpus: '4.0'
reservations:
memory: 2G
cpus: '2.0'
restart_policy:
condition: on-failure
max_attempts: 3
immich-redis:
image: redis:7-alpine
networks:
- media-backend
volumes:
- immich_redis:/data
deploy:
placement:
constraints:
- node.role == manager
resources:
limits:
memory: 256M
cpus: '0.5'
reservations:
memory: 64M
cpus: '0.1'
restart_policy:
condition: on-failure
max_attempts: 3
immich-db:
image: tensorchord/pgvecto-rs:pg14-v0.2.0
networks:
- media-backend
volumes:
- immich_db:/var/lib/postgresql/data
environment:
- POSTGRES_PASSWORD=immich
- POSTGRES_USER=immich
- POSTGRES_DB=immich
deploy:
placement:
constraints:
- node.role == manager
resources:
limits:
memory: 512M
cpus: '1.0'
reservations:
memory: 256M
cpus: '0.25'
restart_policy:
condition: on-failure
max_attempts: 3

View File

@@ -0,0 +1,233 @@
version: '3.8'
networks:
traefik-public:
external: true
monitoring:
driver: overlay
volumes:
prometheus_data:
grafana_data:
alertmanager_data:
secrets:
grafana_admin_password:
external: true
configs:
prometheus_config:
external: true
name: prometheus.yml
services:
prometheus:
image: prom/prometheus:v3.0.1
volumes:
- prometheus_data:/prometheus
configs:
- source: prometheus_config
target: /etc/prometheus/prometheus.yml
networks:
- monitoring
- traefik-public
healthcheck:
test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:9090/-/healthy"]
interval: 30s
timeout: 5s
retries: 3
start_period: 30s
deploy:
placement:
constraints:
- node.role == manager
resources:
limits:
memory: 2G
cpus: '1.0'
reservations:
memory: 512M
cpus: '0.25'
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 3
update_config:
parallelism: 1
delay: 10s
failure_action: rollback
labels:
- "traefik.enable=true"
- "traefik.http.routers.prometheus.rule=Host(`prometheus.sj98.duckdns.org`)"
- "traefik.http.routers.prometheus.entrypoints=websecure"
- "traefik.http.routers.prometheus.tls.certresolver=leresolver"
- "traefik.http.services.prometheus.loadbalancer.server.port=9090"
- "traefik.docker.network=traefik-public"
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"
grafana:
image: grafana/grafana:11.3.1
volumes:
- grafana_data:/var/lib/grafana
environment:
- GF_SERVER_ROOT_URL=https://grafana.sj98.duckdns.org
- GF_SECURITY_ADMIN_PASSWORD__FILE=/run/secrets/grafana_admin_password
secrets:
- grafana_admin_password
networks:
- monitoring
- traefik-public
healthcheck:
test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:3000/api/health"]
interval: 30s
timeout: 5s
retries: 3
start_period: 30s
deploy:
placement:
constraints:
- node.role == manager
resources:
limits:
memory: 1G
cpus: '1.0'
reservations:
memory: 256M
cpus: '0.25'
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 3
update_config:
parallelism: 1
delay: 10s
failure_action: rollback
labels:
- "traefik.enable=true"
- "traefik.http.routers.grafana.rule=Host(`grafana.sj98.duckdns.org`)"
- "traefik.http.routers.grafana.entrypoints=websecure"
- "traefik.http.routers.grafana.tls.certresolver=leresolver"
- "traefik.http.services.grafana.loadbalancer.server.port=3000"
- "traefik.docker.network=traefik-public"
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"
alertmanager:
image: prom/alertmanager:v0.27.0
volumes:
- alertmanager_data:/alertmanager
command:
- '--config.file=/etc/alertmanager/config.yml'
- '--storage.path=/alertmanager'
networks:
- monitoring
- traefik-public
healthcheck:
test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:9093/-/healthy"]
interval: 30s
timeout: 5s
retries: 3
start_period: 15s
deploy:
placement:
constraints:
- node.role == manager
resources:
limits:
memory: 256M
cpus: '0.25'
reservations:
memory: 64M
cpus: '0.05'
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 3
labels:
- "traefik.enable=true"
- "traefik.http.routers.alertmanager.rule=Host(`alertmanager.sj98.duckdns.org`)"
- "traefik.http.routers.alertmanager.entrypoints=websecure"
- "traefik.http.routers.alertmanager.tls.certresolver=leresolver"
- "traefik.http.services.alertmanager.loadbalancer.server.port=9093"
- "traefik.docker.network=traefik-public"
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"
node-exporter:
image: prom/node-exporter:v1.8.2
volumes:
- /proc:/host/proc:ro
- /sys:/host/sys:ro
- /:/rootfs:ro
command:
- '--path.procfs=/host/proc'
- '--path.rootfs=/rootfs'
- '--path.sysfs=/host/sys'
- '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
networks:
- monitoring
deploy:
mode: global
resources:
limits:
memory: 128M
cpus: '0.2'
reservations:
memory: 32M
cpus: '0.05'
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 3
logging:
driver: "json-file"
options:
max-size: "5m"
max-file: "2"
cadvisor:
image: gcr.io/cadvisor/cadvisor:v0.50.0
volumes:
- /:/rootfs:ro
- /var/run:/var/run:ro
- /sys:/sys:ro
- /var/lib/docker/:/var/lib/docker:ro
- /dev/disk/:/dev/disk:ro
command:
- '--docker_only=true'
- '--housekeeping_interval=30s'
networks:
- monitoring
healthcheck:
test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:8080/healthz"]
interval: 30s
timeout: 5s
retries: 3
deploy:
mode: global
resources:
limits:
memory: 256M
cpus: '0.3'
reservations:
memory: 64M
cpus: '0.1'
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 3
logging:
driver: "json-file"
options:
max-size: "5m"
max-file: "2"

View File

@@ -0,0 +1,54 @@
version: '3.8'
networks:
traefik-public:
external: true
volumes:
n8n_data:
services:
n8n:
image: n8nio/n8n:latest
volumes:
- n8n_data:/home/node/.n8n
- /var/run/docker.sock:/var/run/docker.sock
networks:
- traefik-public
environment:
- N8N_HOST=n8n.sj98.duckdns.org
- N8N_PROTOCOL=https
- NODE_ENV=production
- WEBHOOK_URL=https://n8n.sj98.duckdns.org/
healthcheck:
test: ["CMD-SHELL", "wget -q --spider http://localhost:5678/healthz || exit 1"]
interval: 30s
timeout: 10s
retries: 3
deploy:
placement:
constraints:
- node.role == manager
resources:
limits:
memory: 1G
cpus: '0.5'
reservations:
memory: 256M
cpus: '0.1'
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 3
labels:
- "traefik.enable=true"
- "traefik.http.routers.n8n.rule=Host(`n8n.sj98.duckdns.org`)"
- "traefik.http.routers.n8n.entrypoints=websecure"
- "traefik.http.routers.n8n.tls.certresolver=leresolver"
- "traefik.http.services.n8n.loadbalancer.server.port=5678"
- "traefik.docker.network=traefik-public"
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"

View File

@@ -0,0 +1,110 @@
version: '3.8'
networks:
traefik-public:
external: true
secrets:
duckdns_token:
external: true
volumes:
traefik_letsencrypt:
external: true
configs:
traefik_yml:
external: true
name: traefik.yml
services:
traefik:
image: traefik:v3.2.3
ports:
- "80:80"
- "443:443"
- "8080:8080"
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
- traefik_letsencrypt:/letsencrypt
networks:
- traefik-public
secrets:
- duckdns_token
configs:
- source: traefik_yml
target: /etc/traefik/traefik.yml
healthcheck:
test: ["CMD", "traefik", "healthcheck", "--ping"]
interval: 30s
timeout: 5s
retries: 3
start_period: 10s
deploy:
mode: replicated
replicas: 2
placement:
constraints:
- node.role == manager
resources:
limits:
memory: 512M
cpus: '0.5'
reservations:
memory: 128M
cpus: '0.1'
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 3
update_config:
parallelism: 1
delay: 10s
failure_action: rollback
order: start-first
labels:
- "traefik.enable=true"
- "traefik.http.routers.traefik.rule=Host(`traefik.sj98.duckdns.org`)"
- "traefik.http.routers.traefik.entrypoints=websecure"
- "traefik.http.routers.traefik.tls.certresolver=leresolver"
- "traefik.http.routers.traefik.service=api@internal"
- "traefik.http.services.traefik.loadbalancer.server.port=8080"
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"
whoami:
image: traefik/whoami:v1.10
networks:
- traefik-public
healthcheck:
test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:80/health"]
interval: 30s
timeout: 5s
retries: 3
deploy:
resources:
limits:
memory: 64M
cpus: '0.1'
reservations:
memory: 16M
cpus: '0.01'
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 3
labels:
- "traefik.enable=true"
- "traefik.http.routers.whoami.rule=Host(`whoami.sj98.duckdns.org`)"
- "traefik.http.routers.whoami.entrypoints=websecure"
- "traefik.http.routers.whoami.tls.certresolver=leresolver"
- "traefik.http.services.whoami.loadbalancer.server.port=80"
logging:
driver: "json-file"
options:
max-size: "5m"
max-file: "2"

View File

@@ -0,0 +1,38 @@
version: '3.8'
networks:
monitoring:
external: true
services:
node-exporter:
image: prom/node-exporter:v1.8.2
command:
- '--path.procfs=/host/proc'
- '--path.rootfs=/rootfs'
- '--path.sysfs=/host/sys'
- '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
volumes:
- '/proc:/host/proc:ro'
- '/sys:/host/sys:ro'
- '/:/rootfs:ro,rslave'
networks:
- monitoring
deploy:
mode: global
resources:
limits:
memory: 128M
cpus: '0.2'
reservations:
memory: 32M
cpus: '0.05'
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 3
logging:
driver: "json-file"
options:
max-size: "5m"
max-file: "2"

View File

@@ -0,0 +1,133 @@
version: '3.8'
networks:
traefik-public:
external: true
portainer-agent:
driver: overlay
attachable: true
volumes:
portainer_data:
services:
portainer:
image: portainer/portainer-ce:2.21.4
command: -H tcp://tasks.agent:9001 --tlsskipverify
ports:
- "9000:9000"
- "9443:9443"
volumes:
- portainer_data:/data
networks:
- traefik-public
- portainer-agent
healthcheck:
test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:9000/api/status"]
interval: 30s
timeout: 10s
retries: 3
start_period: 40s
deploy:
mode: replicated
replicas: 1
placement:
constraints:
- node.role == manager
resources:
limits:
memory: 512M
cpus: '0.5'
reservations:
memory: 256M
cpus: '0.25'
restart_policy:
condition: on-failure
delay: 10s
max_attempts: 3
update_config:
parallelism: 1
delay: 10s
failure_action: rollback
labels:
- "traefik.enable=true"
- "traefik.http.routers.portainer.rule=Host(`portainer.sj98.duckdns.org`)"
- "traefik.http.routers.portainer.entrypoints=websecure"
- "traefik.http.routers.portainer.tls.certresolver=leresolver"
- "traefik.http.services.portainer.loadbalancer.server.port=9000"
- "traefik.docker.network=traefik-public"
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"
# Linux agent
agent:
image: portainer/agent:2.21.4
environment:
AGENT_CLUSTER_ADDR: tasks.agent
volumes:
- /var/run/docker.sock:/var/run/docker.sock
- /var/lib/docker/volumes:/var/lib/docker/volumes
networks:
- portainer-agent
deploy:
mode: global
placement:
constraints:
- node.platform.os == linux
resources:
limits:
memory: 128M
cpus: '0.25'
reservations:
memory: 64M
cpus: '0.1'
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 3
logging:
driver: "json-file"
options:
max-size: "5m"
max-file: "2"
# Windows agent (optional - only deploys if Windows node exists)
agent-windows:
image: portainer/agent:2.21.4
environment:
AGENT_CLUSTER_ADDR: tasks.agent
volumes:
- type: npipe
source: \\\\.\\pipe\\docker_engine
target: \\\\.\\pipe\\docker_engine
- type: bind
source: C:\\ProgramData\\docker\\volumes
target: C:\\ProgramData\\docker\\volumes
networks:
portainer-agent:
aliases:
- agent
deploy:
mode: global
placement:
constraints:
- node.platform.os == windows
resources:
limits:
memory: 128M
cpus: '0.25'
reservations:
memory: 64M
cpus: '0.1'
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 3
logging:
driver: "json-file"
options:
max-size: "5m"
max-file: "2"

View File

@@ -0,0 +1,4 @@
# Please replace these with your actual credentials
POSTGRES_PASSWORD=nextcloud
NEXTCLOUD_ADMIN_USER=admin
NEXTCLOUD_ADMIN_PASSWORD=password

View File

@@ -0,0 +1,112 @@
version: '3.9'
networks:
traefik-public:
external: true
productivity-backend:
driver: overlay
volumes:
nextcloud_data:
nextcloud_db:
nextcloud_redis:
services:
nextcloud-db:
image: postgres:15-alpine
volumes:
- nextcloud_db:/var/lib/postgresql/data
environment:
- POSTGRES_DB=nextcloud
- POSTGRES_USER=nextcloud
- POSTGRES_PASSWORD=${POSTGRES_PASSWORD} # Replace with a secure password in production
networks:
- productivity-backend
deploy:
placement:
constraints:
- node.labels.leader == true
resources:
limits:
memory: 1G
cpus: '1.0'
reservations:
memory: 256M
cpus: '0.25'
restart_policy:
condition: on-failure
nextcloud-redis:
image: redis:7-alpine
volumes:
- nextcloud_redis:/data
networks:
- productivity-backend
deploy:
placement:
constraints:
- node.labels.leader == true
resources:
limits:
memory: 256M
cpus: '0.5'
reservations:
memory: 64M
cpus: '0.1'
restart_policy:
condition: on-failure
nextcloud:
image: nextcloud:30.0.8
volumes:
- nextcloud_data:/var/www/html
environment:
- POSTGRES_HOST=nextcloud-db
- POSTGRES_DB=nextcloud
- POSTGRES_USER=nextcloud
- POSTGRES_PASSWORD=${POSTGRES_PASSWORD} # Replace with a secure password in production
- REDIS_HOST=nextcloud-redis
- NEXTCLOUD_ADMIN_USER=${NEXTCLOUD_ADMIN_USER} # Replace with your desired admin username
- NEXTCLOUD_ADMIN_PASSWORD=${NEXTCLOUD_ADMIN_PASSWORD} # Replace with a secure password
- NEXTCLOUD_TRUSTED_DOMAINS=nextcloud.sj98.duckdns.org
- OVERWRITEPROTOCOL=https
- OVERWRITEHOST=nextcloud.sj98.duckdns.org
- TRUSTED_PROXIES=172.16.0.0/12
depends_on:
- nextcloud-db
- nextcloud-redis
networks:
- traefik-public
- productivity-backend
deploy:
placement:
constraints:
- node.labels.leader == true
resources:
limits:
memory: 2G
reservations:
memory: 512M
restart_policy:
condition: on-failure
labels:
- "traefik.enable=true"
- "traefik.http.routers.nextcloud.rule=Host(`nextcloud.sj98.duckdns.org`)"
- "traefik.http.routers.nextcloud.entrypoints=websecure"
- "traefik.http.routers.nextcloud.tls.certresolver=leresolver"
- "traefik.http.services.nextcloud.loadbalancer.server.port=80"
- "traefik.docker.network=traefik-public"
# Nextcloud-specific middlewares
- "traefik.http.routers.nextcloud.middlewares=nextcloud-chain"
- "traefik.http.middlewares.nextcloud-chain.chain.middlewares=nextcloud-caldav,nextcloud-headers"
# CalDAV/CardDAV redirect
- "traefik.http.middlewares.nextcloud-caldav.redirectregex.regex=^https://(.*)/.well-known/(card|cal)dav"
- "traefik.http.middlewares.nextcloud-caldav.redirectregex.replacement=https://$$1/remote.php/dav/"
- "traefik.http.middlewares.nextcloud-caldav.redirectregex.permanent=true"
# Security headers
- "traefik.http.middlewares.nextcloud-headers.headers.stsSeconds=31536000"
- "traefik.http.middlewares.nextcloud-headers.headers.stsIncludeSubdomains=true"
- "traefik.http.middlewares.nextcloud-headers.headers.stsPreload=true"
- "traefik.http.middlewares.nextcloud-headers.headers.forceSTSHeader=true"
- "traefik.http.middlewares.nextcloud-headers.headers.customFrameOptionsValue=SAMEORIGIN"
- "traefik.http.middlewares.nextcloud-headers.headers.customResponseHeaders.X-Robots-Tag=noindex,nofollow"

View File

@@ -0,0 +1,253 @@
version: '3.8'
networks:
traefik-public:
external: true
homelab-backend:
driver: overlay
volumes:
paperless_data:
paperless_media:
paperless_db:
paperless_redis:
stirling_pdf_data:
searxng_data:
secrets:
paperless_db_password:
external: true
paperless_secret_key:
external: true
services:
paperless-redis:
image: redis:7-alpine
volumes:
- paperless_redis:/data
networks:
- homelab-backend
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 30s
timeout: 3s
retries: 3
deploy:
placement:
constraints:
- node.labels.leader == true
resources:
limits:
memory: 256M
cpus: '0.5'
reservations:
memory: 64M
cpus: '0.1'
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 3
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"
paperless-db:
image: postgres:15-alpine
volumes:
- paperless_db:/var/lib/postgresql/data
networks:
- homelab-backend
environment:
- POSTGRES_DB=paperless
- POSTGRES_USER=paperless
- POSTGRES_PASSWORD_FILE=/run/secrets/paperless_db_password
secrets:
- paperless_db_password
healthcheck:
test: ["CMD-SHELL", "pg_isready -U paperless"]
interval: 30s
timeout: 5s
retries: 3
deploy:
placement:
constraints:
- node.labels.leader == true
resources:
limits:
memory: 512M
cpus: '1.0'
reservations:
memory: 256M
cpus: '0.25'
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 3
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"
paperless:
image: ghcr.io/paperless-ngx/paperless-ngx:2.19.3
volumes:
- paperless_data:/usr/src/paperless/data
- paperless_media:/usr/src/paperless/media
environment:
- PAPERLESS_REDIS=redis://paperless-redis:6379
- PAPERLESS_DBHOST=paperless-db
- PAPERLESS_DBNAME=paperless
- PAPERLESS_DBUSER=paperless
- PAPERLESS_DBPASS_FILE=/run/secrets/paperless_db_password
- PAPERLESS_URL=https://paperless.sj98.duckdns.org
- PAPERLESS_SECRET_KEY_FILE=/run/secrets/paperless_secret_key
- TZ=America/Chicago
secrets:
- paperless_db_password
- paperless_secret_key
depends_on:
- paperless-redis
- paperless-db
networks:
- traefik-public
- homelab-backend
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8000/api/"]
interval: 30s
timeout: 10s
retries: 3
start_period: 90s
deploy:
placement:
constraints:
- node.labels.leader == true
resources:
limits:
memory: 1536M
cpus: '2.0'
reservations:
memory: 768M
cpus: '0.5'
restart_policy:
condition: on-failure
delay: 10s
max_attempts: 3
update_config:
parallelism: 1
delay: 10s
failure_action: rollback
labels:
- "traefik.enable=true"
- "traefik.http.routers.paperless.rule=Host(`paperless.sj98.duckdns.org`)"
- "traefik.http.routers.paperless.entrypoints=websecure"
- "traefik.http.routers.paperless.tls.certresolver=leresolver"
- "traefik.http.services.paperless.loadbalancer.server.port=8000"
- "traefik.docker.network=traefik-public"
- "tsdproxy.enable=true"
- "tsdproxy.name=paperless"
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"
stirling-pdf:
image: frooodle/s-pdf:0.18.1
volumes:
- stirling_pdf_data:/configs
environment:
- DOCKER_ENABLE_SECURITY=false
- INSTALL_BOOK_AND_ADVANCED_HTML_OPS=false
- LANGS=en_US
networks:
- traefik-public
healthcheck:
test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:8080/"]
interval: 30s
timeout: 10s
retries: 3
start_period: 30s
deploy:
placement:
constraints:
- node.labels.leader == true
resources:
limits:
memory: 1536M
cpus: '2.0'
reservations:
memory: 768M
cpus: '0.5'
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 3
update_config:
parallelism: 1
delay: 10s
failure_action: rollback
labels:
- "traefik.enable=true"
- "traefik.http.routers.pdf.rule=Host(`pdf.sj98.duckdns.org`)"
- "traefik.http.routers.pdf.entrypoints=websecure"
- "traefik.http.routers.pdf.tls.certresolver=leresolver"
- "traefik.http.services.pdf.loadbalancer.server.port=8080"
- "traefik.docker.network=traefik-public"
- "tsdproxy.enable=true"
- "tsdproxy.name=pdf"
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"
searxng:
image: searxng/searxng:2024.11.20-e9f6095cc
volumes:
- searxng_data:/etc/searxng
environment:
- SEARXNG_BASE_URL=https://search.sj98.duckdns.org/
networks:
- traefik-public
healthcheck:
test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:8080/healthz"]
interval: 30s
timeout: 10s
retries: 3
start_period: 30s
deploy:
placement:
constraints:
- node.labels.leader == true
resources:
limits:
memory: 1536M
cpus: '2.0'
reservations:
memory: 512M
cpus: '0.5'
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 3
update_config:
parallelism: 1
delay: 10s
failure_action: rollback
labels:
- "traefik.enable=true"
- "traefik.http.routers.searxng.rule=Host(`search.sj98.duckdns.org`)"
- "traefik.http.routers.searxng.entrypoints=websecure"
- "traefik.http.routers.searxng.tls.certresolver=leresolver"
- "traefik.http.services.searxng.loadbalancer.server.port=8080"
- "traefik.docker.network=traefik-public"
- "tsdproxy.enable=true"
- "tsdproxy.name=search"
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"

View File

@@ -0,0 +1,45 @@
version: '3.8'
networks:
traefik-public:
external: true
services:
dozzle:
image: amir20/dozzle:v8.14.6
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
networks:
- traefik-public
healthcheck:
test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:8080/healthcheck"]
interval: 30s
timeout: 5s
retries: 3
deploy:
placement:
constraints:
- node.role == manager
resources:
limits:
memory: 256M
cpus: '0.25'
reservations:
memory: 64M
cpus: '0.05'
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 3
labels:
- "traefik.enable=true"
- "traefik.http.routers.dozzle.rule=Host(`dozzle.sj98.duckdns.org`)"
- "traefik.http.routers.dozzle.entrypoints=websecure"
- "traefik.http.routers.dozzle.tls.certresolver=leresolver"
- "traefik.http.services.dozzle.loadbalancer.server.port=8080"
- "traefik.docker.network=traefik-public"
logging:
driver: "json-file"
options:
max-size: "5m"
max-file: "2"

View File

@@ -0,0 +1,2 @@
# Please replace with your actual TSDPROXY_AUTHKEY
TSDPROXY_AUTHKEY=tskey-auth-kUFWCyDau321CNTRL-Vdt9PFUDUqAb7iQYLvCjqAkhcnq3aTTtg

View File

@@ -0,0 +1,32 @@
version: '3.9'
networks:
traefik-public:
external: true
volumes:
tsdproxydata:
services:
tsdproxy:
image: almeidapaulopt/tsdproxy:latest
volumes:
- /var/run/docker.sock:/var/run/docker.sock
- tsdproxydata:/data
environment:
- TSDPROXY_AUTHKEY=${TSDPROXY_AUTHKEY}
- DOCKER_HOST=unix:///var/run/docker.sock
networks:
- traefik-public
deploy:
restart_policy:
condition: on-failure
placement:
constraints:
- node.role == manager
labels:
- "traefik.enable=true"
- "traefik.http.routers.tsdproxy.rule=Host(`proxy.sj98.duckdns.org`)"
- "traefik.http.routers.tsdproxy.entrypoints=websecure"
- "traefik.http.routers.tsdproxy.tls.certresolver=leresolver"
- "traefik.http.services.tsdproxy.loadbalancer.server.port=8080"

View File

@@ -0,0 +1,29 @@
version: '3.8'
services:
traefik:
image: traefik:v2.10
command:
- --api.insecure=false
- --providers.docker=true
- --entrypoints.web.address=:80
- --entrypoints.websecure.address=:443
- --certificatesresolvers.leresolver.acme.email=sterlenjohnson6@gmail.com
- --certificatesresolvers.leresolver.acme.storage=/letsencrypt/acme.json
- --certificatesresolvers.leresolver.acme.dnschallenge=true
- --certificatesresolvers.leresolver.acme.dnschallenge.provider=duckdns
ports:
- "80:80"
- "443:443"
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
- /letsencrypt:/letsencrypt
deploy:
mode: replicated
replicas: 2
placement:
constraints: [node.role == manager]
networks:
- webnet
networks:
webnet:
driver: overlay

View File

@@ -0,0 +1,54 @@
# traefik.yml - static configuration (file provider)
checkNewVersion: true
sendAnonymousUsage: false
log:
level: INFO
api:
dashboard: true
insecure: false # set to true only for quick local testing (not recommended for public)
# single entryPoints section (merged)
entryPoints:
web:
address: ":80"
http:
redirections:
entryPoint:
to: websecure
scheme: https
# optional timeouts can live under transport as well (kept only on websecure below)
websecure:
address: ":443"
http:
tls:
certResolver: leresolver
transport:
respondingTimeouts:
# keep these large if you expect long uploads/downloads or long-lived requests
readTimeout: 600s
writeTimeout: 600s
idleTimeout: 600s
providers:
swarm:
endpoint: "unix:///var/run/docker.sock"
certificatesResolvers:
leresolver:
acme:
email: "sterlenjohnson6@gmail.com"
storage: "/letsencrypt/acme.json"
# DNS-01, using DuckDNS provider
dnsChallenge:
provider: duckdns
delayBeforeCheck: 60s
# Usually unnecessary to specify "resolvers" unless you have special internal resolvers.
# If you DO need Traefik to use specific DNS servers for the challenge, make sure
# the container has network access to them and that they will answer public DNS queries.
resolvers:
- "192.168.1.196:53"
- "192.168.1.245:53"
- "192.168.1.62:53"

View File

@@ -0,0 +1,13 @@
[Unit]
Description=Daily Restic Backup
After=network-online.target
Wants=network-online.target
[Service]
Type=oneshot
ExecStart=/workspace/homelab/scripts/backup_daily.sh
User=root
Group=root
[Install]
WantedBy=multi-user.target

View File

@@ -0,0 +1,11 @@
[Unit]
Description=Daily Restic Backup Timer
Requires=restic-backup.service
[Timer]
OnCalendar=daily
OnCalendar=02:00
Persistent=true
[Install]
WantedBy=timers.target