Home Lab Improvements - Deployment Guide
This guide provides step-by-step instructions for deploying all the homelab improvements.
Table of Contents
- Network Upgrade
- Storage Enhancements
- Service Consolidation
- Security Hardening
- Monitoring & Automation
- Backup Strategy
Prerequisites
- SSH access to all nodes
- Root/sudo privileges
- Docker Swarm cluster operational
- Backblaze B2 account (for backups)
1. Network Upgrade
1.1 Install 2.5 Gb PoE Switch
Hardware: Netgear GS110EMX or equivalent
Steps:
- Power down affected nodes
- Install new switch
- Connect all 2.5 Gb nodes (Ryzen .81, Acer .57)
- Connect 1 Gb nodes (Pi 4 .245, Time Capsule .153)
- Power on and verify link speeds
Verification:
# On each node, check link speed:
ethtool eth0 | grep Speed
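To compare the negotiated speed against what each node should get, the ethtool output can be parsed instead of read by eye. A minimal sketch, assuming the interface is eth0 as in the command above; the function names are hypothetical.

```shell
# Extract the numeric link speed (Mb/s) from `ethtool <iface>` output on stdin.
# Prints nothing if there is no "Speed: <n>Mb/s" line (e.g. link down).
parse_link_speed() {
    sed -n 's/^[[:space:]]*Speed:[[:space:]]*\([0-9][0-9]*\)Mb\/s.*/\1/p'
}

# Succeed if the interface ($1) negotiated at least the expected speed ($2, Mb/s).
check_link_speed() {
    local iface=$1 expected=$2 speed
    speed=$(ethtool "$iface" | parse_link_speed)
    [ -n "$speed" ] && [ "$speed" -ge "$expected" ]
}
```

Example: check_link_speed eth0 2500 || echo "eth0 is below 2.5 Gb" (use 1000 on the 1 Gb nodes).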
1.2 Configure VLANs
Script: /workspace/homelab/scripts/vlan_firewall.sh
Steps:
- Create VLAN 10 (Management): 192.168.10.0/24
- Create VLAN 20 (Services): 192.168.20.0/24
- Configure router ACLs using the firewall script
Verification:
# Check VLAN configuration
ip -d link show
# Test VLAN isolation
ping -c 3 192.168.10.1 # run from a VLAN 20 host; fails only if the ACLs block ICMP (ping does not test individual ports)
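The contents of vlan_firewall.sh are not shown here. On a Linux host, tagged sub-interfaces for the two VLANs could be created with iproute2 roughly as follows. This sketch only prints the commands for review; the parent interface eth0 and the .2 host addresses are assumptions, and the function names are hypothetical.

```shell
# Print (not run) the iproute2 commands that would create a tagged
# sub-interface for one VLAN. Arguments: parent interface, VLAN ID, CIDR address.
vlan_commands() {
    local parent=$1 id=$2 addr=$3
    echo "ip link add link $parent name $parent.$id type vlan id $id"
    echo "ip addr add $addr dev $parent.$id"
    echo "ip link set dev $parent.$id up"
}

# VLAN 10 (Management) and VLAN 20 (Services) from the steps above.
# The .2 host addresses are placeholders, not values from this guide.
print_vlan_plan() {
    vlan_commands eth0 10 192.168.10.2/24
    vlan_commands eth0 20 192.168.20.2/24
}
```

Printing first keeps a typo from cutting off SSH access; once the plan looks right, it can be piped to sudo sh.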
1.3 Configure LACP Bonding (Ryzen Node)
Note: Requires two NICs on the Ryzen node
Configuration (/etc/network/interfaces.d/bond0.cfg):
auto bond0
iface bond0 inet static
    address 192.168.1.81
    netmask 255.255.255.0
    gateway 192.168.1.1
    bond-mode 802.3ad
    bond-miimon 100
    bond-slaves eth0 eth1
Apply:
sudo systemctl restart networking
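After the restart, the kernel's bonding status file shows whether the bond actually came up in 802.3ad mode with both NICs attached. A minimal sketch; check_bond is a hypothetical helper name.

```shell
# Succeed if the bonding status file ($1, default /proc/net/bonding/bond0)
# reports 802.3ad mode and lists at least two slave interfaces.
check_bond() {
    local status_file=${1:-/proc/net/bonding/bond0}
    grep -q '^Bonding Mode: IEEE 802.3ad' "$status_file" || return 1
    [ "$(grep -c '^Slave Interface:' "$status_file")" -ge 2 ]
}
```

Example: check_bond && echo "bond0 OK". LACP also has to be enabled on the matching switch ports, which this check does not cover.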
2. Storage Enhancements
2.1 Create ZFS Pool on Proxmox Host
Script: /workspace/homelab/scripts/zfs_setup.sh
Steps:
- SSH to Proxmox host (192.168.1.57)
- Identify SSD devices:
lsblk
- Update script with correct device names
- Run:
sudo bash /workspace/homelab/scripts/zfs_setup.sh
Verification:
zpool status tank
zfs list
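For scripted checks (cron or monitoring), the pool health can be read in machine-friendly form rather than from the full status output. A sketch, assuming the pool is named tank as above; pool_is_online and the ZPOOL override are hypothetical.

```shell
# Succeed only if the pool ($1, default "tank") reports health ONLINE.
# ZPOOL may name a different command, which is handy for a dry run.
pool_is_online() {
    local pool=${1:-tank} health
    health=$("${ZPOOL:-zpool}" list -H -o health "$pool" 2>/dev/null) || return 1
    [ "$health" = "ONLINE" ]
}
```

Example: pool_is_online tank || echo "tank is degraded or missing".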
2.2 Mount NAS on All Nodes
Guide: /workspace/homelab/docs/guides/NAS_Mount_Guide.md
Steps:
- Follow the NAS Mount Guide for each node
- Create credentials file
- Add to /etc/fstab
- Mount:
sudo mount -a
Verification:
df -h | grep /mnt/nas
ls -la /mnt/nas
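The NAS Mount Guide holds the exact entry. As an illustration only, and assuming the share is SMB/CIFS (suggested by the credentials file), an fstab line could be generated like this. The server, share, and credentials path in the example are placeholders, and fstab_cifs_line is a hypothetical helper.

```shell
# Print an /etc/fstab line for a CIFS share.
# Arguments: server, share name, mount point, credentials file.
# _netdev delays the mount until the network is up; nofail lets boot
# continue if the NAS is unreachable.
fstab_cifs_line() {
    local server=$1 share=$2 mountpoint=$3 creds=$4
    printf '//%s/%s %s cifs credentials=%s,_netdev,nofail 0 0\n' \
        "$server" "$share" "$mountpoint" "$creds"
}
```

Example: fstab_cifs_line nas.example share /mnt/nas /etc/nas-credentials. The credentials file should be readable by root only (chmod 600).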
2.3 Setup AI Model Pruning
Script: /workspace/homelab/scripts/prune_ai_models.sh
Steps:
- Update MODEL_DIR path in script
- Make executable:
chmod +x /workspace/homelab/scripts/prune_ai_models.sh
- Add to cron:
crontab -e
0 3 * * * /workspace/homelab/scripts/prune_ai_models.sh
Verification:
# Test run
sudo /workspace/homelab/scripts/prune_ai_models.sh
# Check cron logs
grep CRON /var/log/syslog
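The pruning script itself is not reproduced in this guide. One plausible shape for it is an age-based cleanup with find, sketched below. The 30-day default and the prune_models name are assumptions, not the script's actual behavior.

```shell
# Delete regular files under the model directory ($1) whose modification
# time is older than the given number of days ($2, default 30), printing
# each path as it is removed.
prune_models() {
    local model_dir=$1 max_age_days=${2:-30}
    [ -d "$model_dir" ] || { echo "no such directory: $model_dir" >&2; return 1; }
    find "$model_dir" -type f -mtime +"$max_age_days" -print -delete
}
```

Dropping -delete from the find call turns this into a dry run that only lists what would be removed.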
3. Service Consolidation
3.1 Deploy Traefik Swarm Service
Stack: /workspace/homelab/services/swarm/traefik/stack.yml
Steps:
- Review and update stack.yml if needed
- Deploy:
docker stack deploy -c /workspace/homelab/services/swarm/traefik/stack.yml traefik
- Remove standalone Traefik on Pi 4
Verification:
docker service ls | grep traefik
docker service ps traefik_traefik
curl -I http://192.168.1.196
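Before removing the standalone Traefik, it helps to wait until the Swarm service reports all replicas running. A sketch that polls docker service ls; the function names and the 60-second default are assumptions.

```shell
# Succeed if a "running/desired" replica string such as "2/2" shows every
# desired replica running (and at least one desired).
replicas_converged() {
    local spec=${1%% *} running desired
    running=${spec%%/*}
    desired=${spec##*/}
    [ "$desired" -gt 0 ] 2>/dev/null && [ "$running" -eq "$desired" ] 2>/dev/null
}

# Poll a service ($1) until converged or the timeout ($2 seconds, default 60) expires.
wait_for_service() {
    local service=$1 timeout=${2:-60} replicas
    while [ "$timeout" -gt 0 ]; do
        replicas=$(docker service ls --filter "name=$service" --format '{{.Replicas}}' | head -n 1)
        replicas_converged "$replicas" && return 0
        sleep 2
        timeout=$((timeout - 2))
    done
    return 1
}
```

Example: wait_for_service traefik_traefik 120 && echo "safe to stop Traefik on the Pi 4".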
3.2 Deploy Caddy Fallback (Pi Zero)
Location: /workspace/homelab/services/standalone/Caddy/
Steps:
- SSH to Pi Zero (192.168.1.62)
- Copy Caddy files to node
- Run:
docker-compose up -d
Verification:
docker ps | grep caddy
curl http://192.168.1.62:8080
3.3 Add Health Checks
Guide: /workspace/homelab/docs/guides/health_checks.md
Steps:
- Review health check examples
- Update service stack files for critical containers
- Redeploy services:
docker stack deploy ...
Verification:
docker ps --filter "health=healthy"
docker inspect <container> | jq '.[0].State.Health'
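To check several critical containers at once without jq, the health status can be read with an inspect format template. A sketch; the function names and the DOCKER override are hypothetical, and the container names in the example are placeholders.

```shell
# Print the health status of a container ($1): healthy, unhealthy, starting,
# or "none" if the container defines no health check.
container_health() {
    "${DOCKER:-docker}" inspect \
        --format '{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' "$1"
}

# Print each container name from the arguments whose status is not "healthy".
unhealthy_containers() {
    local name
    for name in "$@"; do
        [ "$(container_health "$name")" = "healthy" ] || echo "$name"
    done
}
```

Example: unhealthy_containers web db prints nothing when both are healthy.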
4. Security Hardening
4.1 Install fail2ban on Manager VM
Script: /workspace/homelab/scripts/install_fail2ban.sh
Steps:
- SSH to manager VM (192.168.1.196)
- Run:
sudo bash /workspace/homelab/scripts/install_fail2ban.sh
Verification:
sudo fail2ban-client status
sudo fail2ban-client status sshd
sudo tail -f /var/log/fail2ban.log
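For a quick non-interactive check (the tail -f above never returns), the number of currently banned addresses can be pulled out of the jail status. A minimal sketch; banned_count is a hypothetical helper.

```shell
# Print the "Currently banned" count from `fail2ban-client status <jail>`
# output read on stdin.
banned_count() {
    awk -F: '/Currently banned/ { gsub(/[^0-9]/, "", $2); print $2 }'
}
```

Example: sudo fail2ban-client status sshd | banned_count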
4.2 Configure Firewall Rules
Script: /workspace/homelab/scripts/vlan_firewall.sh
Steps:
- Review script and adjust VLANs/ports as needed
- Run:
sudo bash /workspace/homelab/scripts/vlan_firewall.sh
- Configure router ACLs via web UI
Verification:
sudo iptables -L -n -v
# Test port accessibility from different VLANs
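The port test mentioned above has no command attached. One way to run it from a host in each VLAN is bash's built-in /dev/tcp redirection, which needs no extra tools. A sketch; the function names are hypothetical, and which ports should be blocked depends on the rules in vlan_firewall.sh.

```shell
# Succeed if a TCP connection to host ($1) port ($2) opens within 2 seconds.
# Uses bash's /dev/tcp redirection, so it needs bash rather than plain sh.
port_open() {
    timeout 2 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Report each port in the list as open or blocked for the given host.
probe_ports() {
    local host=$1 port
    shift
    for port in "$@"; do
        if port_open "$host" "$port"; then
            echo "$host:$port open"
        else
            echo "$host:$port blocked"
        fi
    done
}
```

Example, run from a VLAN 20 host: probe_ports 192.168.10.1 22 443. A port reported as blocked may be filtered by the firewall or simply have no listener, so compare against the same probe run from inside VLAN 10.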
4.3 Restrict Portainer Access
Options:
- Configure Tailscale VPN-only access
- Enable OAuth integration
- Add firewall rules to block public access
Configuration: Update Portainer stack to bind to Tailscale interface only
5. Monitoring & Automation
5.1 Deploy node-exporter
Script: /workspace/homelab/scripts/setup_monitoring.sh
Steps:
- Run:
sudo bash /workspace/homelab/scripts/setup_monitoring.sh
- Wait for deployment to complete
Verification:
docker service ps monitoring_node-exporter
curl http://192.168.1.196:9100/metrics
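The raw /metrics output is long. To confirm the exporter returns real data, a single metric can be picked out of it. A minimal sketch for unlabelled metrics such as node_load1; metric_value is a hypothetical helper.

```shell
# Print the value of an unlabelled metric ($1) from Prometheus text-format
# output on stdin; prints nothing if the metric is absent.
metric_value() {
    awk -v name="$1" '$1 == name { print $2; exit }'
}
```

Example: curl -s http://192.168.1.196:9100/metrics | metric_value node_load1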
5.2 Configure Grafana Alerts
Rules: /workspace/homelab/monitoring/grafana/alert_rules.yml
Steps:
- The setup script copies alert rules to Grafana
- Log in to the Grafana UI
- Navigate to Alerting > Alert Rules
- Verify rules are loaded
Verification:
- Check Grafana UI for alert rules
- Trigger test alert (e.g., high CPU load)
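A high-CPU test alert can be provoked with plain busy loops, without installing a stress tool. A sketch; burn_cpu is a hypothetical helper, and whether the alert actually fires depends on the thresholds and evaluation intervals in alert_rules.yml, which are not shown here.

```shell
# Run busy loops on a number of workers ($1) for a number of seconds ($2)
# to raise CPU load, then return once all workers have stopped.
burn_cpu() {
    local workers=$1 seconds=$2 i
    for i in $(seq "$workers"); do
        timeout "$seconds" sh -c 'while :; do :; done' &
    done
    wait
}
```

Example: burn_cpu "$(nproc)" 300 loads every core for five minutes.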
6. Backup Strategy
6.1 Setup Restic Backups
Script: /workspace/homelab/scripts/install_restic_backup.sh
Steps:
- Create Backblaze B2 bucket
- Get B2 account ID and key
- Update /workspace/homelab/scripts/backup_daily.sh with credentials
- Run:
sudo bash /workspace/homelab/scripts/install_restic_backup.sh
Verification:
sudo systemctl status restic-backup.timer
sudo systemctl list-timers
# Manual test run
sudo /workspace/homelab/scripts/backup_daily.sh
6.2 Verify Backups
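# Check snapshots (restic's B2 backend reads the key from these two variables)
export B2_ACCOUNT_ID="your_key_id"
export B2_ACCOUNT_KEY="your_application_key"
export RESTIC_REPOSITORY="b2:your-bucket:/backups"
export RESTIC_PASSWORD="your_password"
restic snapshots
# Restore test
restic restore latest --target /tmp/restore-test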
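A restore test is only useful if the restored files are compared with the originals. A sketch, assuming the backup was taken from absolute paths; verify_restore is a hypothetical helper, and /srv/data in the comment is a placeholder since the paths backed up by backup_daily.sh are not listed here.

```shell
# Compare a source directory ($1) with its restored copy under the restore
# target ($2). restic recreates the original absolute path below the target,
# so /srv/data restored to /tmp/restore-test lands in /tmp/restore-test/srv/data.
verify_restore() {
    local src=$1 target=$2
    diff -rq "$src" "$target$src"
}
```

Files that changed after the snapshot was taken will show up as differences, so this works best on data that is mostly static.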
Rollback Procedures
If network upgrade fails:
- Reconnect to old switch
- Remove VLAN configurations
- Restart networking:
sudo systemctl restart networking
If ZFS pool creation fails:
- Destroy pool:
sudo zpool destroy tank
- Verify data on SSDs before retrying
If Traefik Swarm migration fails:
- Restart standalone Traefik on Pi 4
- Remove Swarm service:
docker service rm traefik_traefik
If backups fail:
- Check B2 credentials
- Verify network connectivity
- Check restic logs:
/var/log/restic_backup.log
Post-Deployment Checklist
- All 2.5 Gb-capable nodes (Ryzen, Acer) link at 2.5 Gb; Pi 4 and Time Capsule link at 1 Gb
- VLANs configured and isolated
- ZFS pool created and healthy
- NAS mounted on all nodes
- Traefik Swarm service running with 2 replicas
- Caddy fallback operational
- fail2ban protecting manager VM
- Firewall rules active
- node-exporter running on all nodes
- Grafana alerts configured
- Restic backups running daily
- Health checks added to critical services
Support & Troubleshooting
Refer to the individual guide files under /workspace/homelab/docs/guides/ for detailed troubleshooting.
For script issues, check logs in /var/log/ and Docker logs: docker service logs <service>