# Home Lab Improvements - Deployment Guide

This guide provides step-by-step instructions for deploying all the homelab improvements.

## Table of Contents

1. [Network Upgrade](#network-upgrade)
2. [Storage Enhancements](#storage-enhancements)
3. [Service Consolidation](#service-consolidation)
4. [Security Hardening](#security-hardening)
5. [Monitoring & Automation](#monitoring--automation)
6. [Backup Strategy](#backup-strategy)

---

## Prerequisites

- SSH access to all nodes
- Root/sudo privileges
- Docker Swarm cluster operational
- Backblaze B2 account (for backups)

---

## 1. Network Upgrade

### 1.1 Install 2.5 Gb PoE Switch

**Hardware**: Netgear GS110EMX or equivalent

**Steps**:

1. Power down affected nodes
2. Install new switch
3. Connect all 2.5 Gb nodes (Ryzen .81, Acer .57)
4. Connect 1 Gb nodes (Pi 4 .245, Time Capsule .153)
5. Power on and verify link speeds

**Verification**:

```bash
# On each node, check link speed:
ethtool eth0 | grep Speed
```

### 1.2 Configure VLANs

**Script**: `/workspace/homelab/scripts/vlan_firewall.sh`

**Steps**:

1. Create VLAN 10 (Management): 192.168.10.0/24
2. Create VLAN 20 (Services): 192.168.20.0/24
3. Configure router ACLs using the firewall script

**Verification**:

```bash
# Check VLAN configuration
ip -d link show

# Test VLAN isolation
ping 192.168.10.1  # from a VLAN 20 host; should fail if the ACLs block this traffic
```

### 1.3 Configure LACP Bonding (Ryzen Node)

**Note**: Requires two NICs on the Ryzen node

**Configuration** (`/etc/network/interfaces.d/bond0.cfg`):

```
auto bond0
iface bond0 inet static
    address 192.168.1.81
    netmask 255.255.255.0
    gateway 192.168.1.1
    bond-mode 802.3ad
    bond-miimon 100
    bond-slaves eth0 eth1
```

**Apply**:

```bash
sudo systemctl restart networking
```

---

## 2. Storage Enhancements

### 2.1 Create ZFS Pool on Proxmox Host

**Script**: `/workspace/homelab/scripts/zfs_setup.sh`

**Steps**:

1. SSH to Proxmox host (192.168.1.57)
2. Identify SSD devices: `lsblk`
3. Update script with correct device names
4. Run: `sudo bash /workspace/homelab/scripts/zfs_setup.sh`

**Verification**:

```bash
zpool status tank
zfs list
```

### 2.2 Mount NAS on All Nodes

**Guide**: `/workspace/homelab/docs/guides/NAS_Mount_Guide.md`

**Steps**:

1. Follow the NAS Mount Guide for each node
2. Create credentials file
3. Add to `/etc/fstab`
4. Mount: `sudo mount -a`

**Verification**:

```bash
df -h | grep /mnt/nas
ls -la /mnt/nas
```

### 2.3 Set Up AI Model Pruning

**Script**: `/workspace/homelab/scripts/prune_ai_models.sh`

**Steps**:

1. Update MODEL_DIR path in script
2. Make executable: `chmod +x /workspace/homelab/scripts/prune_ai_models.sh`
3. Add to cron: `crontab -e`
   ```
   0 3 * * * /workspace/homelab/scripts/prune_ai_models.sh
   ```

**Verification**:

```bash
# Test run
sudo /workspace/homelab/scripts/prune_ai_models.sh

# Check cron logs
grep CRON /var/log/syslog
```

---

## 3. Service Consolidation

### 3.1 Deploy Traefik Swarm Service

**Stack**: `/workspace/homelab/services/swarm/traefik/stack.yml`

**Steps**:

1. Review and update stack.yml if needed
2. Deploy: `docker stack deploy -c /workspace/homelab/services/swarm/traefik/stack.yml traefik`
3. Remove standalone Traefik on Pi 4

**Verification**:

```bash
docker service ls | grep traefik
docker service ps traefik_traefik
curl -I http://192.168.1.196
```

### 3.2 Deploy Caddy Fallback (Pi Zero)

**Location**: `/workspace/homelab/services/standalone/Caddy/`

**Steps**:

1. SSH to Pi Zero (192.168.1.62)
2. Copy Caddy files to node
3. Run: `docker-compose up -d`

**Verification**:

```bash
docker ps | grep caddy
curl http://192.168.1.62:8080
```

### 3.3 Add Health Checks

**Guide**: `/workspace/homelab/docs/guides/health_checks.md`

**Steps**:

1. Review health check examples
2. Update service stack files for critical containers
3. Redeploy services: `docker stack deploy ...`

**Verification**:

```bash
docker ps --filter "health=healthy"
docker inspect <container_name> | jq '.[0].State.Health'
```

---

## 4. Security Hardening

### 4.1 Install fail2ban on Manager VM

**Script**: `/workspace/homelab/scripts/install_fail2ban.sh`

**Steps**:

1. SSH to manager VM (192.168.1.196)
2. Run: `sudo bash /workspace/homelab/scripts/install_fail2ban.sh`

**Verification**:

```bash
sudo fail2ban-client status
sudo fail2ban-client status sshd
sudo tail -f /var/log/fail2ban.log
```

### 4.2 Configure Firewall Rules

**Script**: `/workspace/homelab/scripts/vlan_firewall.sh`

**Steps**:

1. Review script and adjust VLANs/ports as needed
2. Run: `sudo bash /workspace/homelab/scripts/vlan_firewall.sh`
3. Configure router ACLs via web UI

**Verification**:

```bash
sudo iptables -L -n -v
# Test port accessibility from different VLANs
```

### 4.3 Restrict Portainer Access

**Options**:

- Configure Tailscale VPN-only access
- Enable OAuth integration
- Add firewall rules to block public access

**Configuration**: Update the Portainer stack to bind to the Tailscale interface only

---

## 5. Monitoring & Automation

### 5.1 Deploy node-exporter

**Script**: `/workspace/homelab/scripts/setup_monitoring.sh`

**Steps**:

1. Run: `sudo bash /workspace/homelab/scripts/setup_monitoring.sh`
2. Wait for deployment to complete

**Verification**:

```bash
docker service ps monitoring_node-exporter
curl http://192.168.1.196:9100/metrics
```

### 5.2 Configure Grafana Alerts

**Rules**: `/workspace/homelab/monitoring/grafana/alert_rules.yml`

**Steps**:

1. The setup script copies alert rules to Grafana
2. Log in to the Grafana UI
3. Navigate to Alerting > Alert Rules
4. Verify rules are loaded

**Verification**:

- Check Grafana UI for alert rules
- Trigger test alert (e.g., high CPU load)

---

## 6. Backup Strategy

### 6.1 Set Up Restic Backups

**Script**: `/workspace/homelab/scripts/install_restic_backup.sh`

**Steps**:

1. Create Backblaze B2 bucket
2. Get B2 account ID and key
3. Update `/workspace/homelab/scripts/backup_daily.sh` with credentials
4. Run: `sudo bash /workspace/homelab/scripts/install_restic_backup.sh`

**Verification**:

```bash
sudo systemctl status restic-backup.timer
sudo systemctl list-timers

# Manual test run
sudo /workspace/homelab/scripts/backup_daily.sh
```

### 6.2 Verify Backups

```bash
# Check snapshots
export RESTIC_REPOSITORY="b2:your-bucket:/backups"
export RESTIC_PASSWORD="your_password"
restic snapshots

# Restore test
restic restore latest --target /tmp/restore-test
```

---

## Rollback Procedures

### If network upgrade fails:

- Reconnect to old switch
- Remove VLAN configurations
- Restart networking: `sudo systemctl restart networking`

### If ZFS pool creation fails:

- Destroy pool: `sudo zpool destroy tank`
- Verify data on SSDs before retrying

### If Traefik Swarm migration fails:

- Restart standalone Traefik on Pi 4
- Remove Swarm service: `docker service rm traefik_traefik`

### If backups fail:

- Check B2 credentials
- Verify network connectivity
- Check restic logs: `/var/log/restic_backup.log`

---

## Post-Deployment Checklist

- [ ] All nodes linked at their rated speed (2.5 Gb for Ryzen and Acer, 1 Gb for Pi 4 and Time Capsule)
- [ ] VLANs configured and isolated
- [ ] ZFS pool created and healthy
- [ ] NAS mounted on all nodes
- [ ] Traefik Swarm service running with 2 replicas
- [ ] Caddy fallback operational
- [ ] fail2ban protecting manager VM
- [ ] Firewall rules active
- [ ] node-exporter running on all nodes
- [ ] Grafana alerts configured
- [ ] Restic backups running daily
- [ ] Health checks added to critical services

---

## Support & Troubleshooting

Refer to individual guide files for detailed troubleshooting:

- [NAS Mount Guide](/workspace/homelab/docs/guides/NAS_Mount_Guide.md)
- [Health Checks Guide](/workspace/homelab/docs/guides/health_checks.md)
- [Homelab Configuration](/workspace/homelab/docs/guides/Homelab.md)

For script issues, check logs in `/var/log/` and Docker logs: `docker service logs <service_name>`
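
Several items in the post-deployment checklist map directly onto verification commands already used in this guide, so they can be gathered into one script. The following is a minimal sketch, not part of the repository: the helper names `check`, `link_is_2500`, `zfs_pool_online`, and `run_checks` are hypothetical, and the interface name, pool name, mount point, IP address, and service names are taken from the examples in this guide and may need adjusting per node. It assumes root privileges where the underlying commands need them.

```bash
#!/usr/bin/env bash
# Post-deployment check sketch (hypothetical helper, not shipped in the repo).
# eth0, tank, /mnt/nas, and 192.168.1.196 come from this guide's examples.

pass=0
fail=0

# check LABEL COMMAND [ARGS...]
# Runs COMMAND quietly and prints PASS or FAIL next to LABEL.
check() {
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        printf 'PASS  %s\n' "$label"
        pass=$((pass + 1))
    else
        printf 'FAIL  %s\n' "$label"
        fail=$((fail + 1))
    fi
}

# Succeeds if ethtool reports a 2.5 Gb link on the given interface.
link_is_2500() {
    ethtool "$1" | grep -q 'Speed: 2500Mb/s'
}

# Succeeds if the given ZFS pool reports ONLINE health.
zfs_pool_online() {
    [ "$(zpool list -H -o health "$1")" = "ONLINE" ]
}

# Runs one check per checklist item that can be tested from a shell,
# prints a summary, and returns non-zero if any check failed.
run_checks() {
    check "2.5 Gb link on eth0"      link_is_2500 eth0
    check "ZFS pool tank online"     zfs_pool_online tank
    check "NAS mounted at /mnt/nas"  mountpoint -q /mnt/nas
    check "Traefik service present"  docker service ps traefik_traefik
    check "fail2ban sshd jail"       fail2ban-client status sshd
    check "node-exporter metrics"    curl -fsS --max-time 5 http://192.168.1.196:9100/metrics
    check "restic timer active"      systemctl is-active --quiet restic-backup.timer
    printf '%s passed, %s failed\n' "$pass" "$fail"
    [ "$fail" -eq 0 ]
}
```

To use it, save the block under a name of your choice, source it, and call `run_checks` on the node you want to verify. A check whose tool is not installed on that node reports FAIL, so not every item is expected to pass on every node (for example, the ZFS check only applies to the Proxmox host).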