Home Lab Improvements - Deployment Guide

This guide provides step-by-step instructions for deploying all the homelab improvements.

Table of Contents

  1. Network Upgrade
  2. Storage Enhancements
  3. Service Consolidation
  4. Security Hardening
  5. Monitoring & Automation
  6. Backup Strategy

Prerequisites

  • SSH access to all nodes
  • Root/sudo privileges
  • Docker Swarm cluster operational
  • Backblaze B2 account (for backups)

1. Network Upgrade

1.1 Install 2.5 Gb PoE Switch

Hardware: Netgear GS110EMX or equivalent

Steps:

  1. Power down affected nodes
  2. Install new switch
  3. Connect all 2.5 Gb nodes (Ryzen .81, Acer .57)
  4. Connect 1 Gb nodes (Pi 4 .245, Time Capsule .153)
  5. Power on and verify link speeds

Verification:

# On each node, check link speed:
ethtool eth0 | grep Speed
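
To check an interface against an expected speed without reading ethtool output by eye, the same value can be read from sysfs. This is a minimal sketch: the helper names link_speed_ok and check_iface are hypothetical, and the interface name eth0 is an assumption that may differ per node.

```shell
# Hypothetical helpers, not part of the homelab scripts.

# Succeeds when the reported speed in Mb/s ($1) is numeric and at least $2.
# sysfs reports -1 when the link is down, which is treated as a failure.
link_speed_ok() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -ge "$2" ]
}

# Reads /sys/class/net/<iface>/speed and compares it with the expected speed.
# $1 = interface name, $2 = minimum expected speed in Mb/s
check_iface() {
    speed="$(cat "/sys/class/net/$1/speed" 2>/dev/null || echo -1)"
    if link_speed_ok "$speed" "$2"; then
        echo "OK   $1: ${speed} Mb/s"
    else
        echo "FAIL $1: ${speed} Mb/s (expected >= $2)"
        return 1
    fi
}

# Example (assumed interface name): check_iface eth0 2500
```

On the 1 Gb nodes the second argument would be 1000 instead of 2500.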

1.2 Configure VLANs

Script: /workspace/homelab/scripts/vlan_firewall.sh

Steps:

  1. Create VLAN 10 (Management): 192.168.10.0/24
  2. Create VLAN 20 (Services): 192.168.20.0/24
  3. Configure router ACLs using the firewall script

Verification:

# Check VLAN configuration
ip -d link show

# Test VLAN isolation
ping -c 3 192.168.10.1  # run from a host on VLAN 20; ping only exercises ICMP, so test restricted TCP ports separately
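
On a Linux node, tagged sub-interfaces for the two VLANs could be created with iproute2. The sketch below only prints the commands so they can be reviewed first. vlan_commands is a hypothetical helper, the parent interface (eth0) and the host addresses are assumptions, and this is not the content of vlan_firewall.sh.

```shell
# Hypothetical helper: print the iproute2 commands that create a tagged
# VLAN sub-interface. Nothing is executed here.
# $1 = parent interface, $2 = VLAN ID, $3 = address in CIDR form
vlan_commands() {
    echo "ip link add link $1 name $1.$2 type vlan id $2"
    echo "ip addr add $3 dev $1.$2"
    echo "ip link set dev $1.$2 up"
}

# Example (assumed parent interface and host addresses):
#   vlan_commands eth0 10 192.168.10.81/24
#   vlan_commands eth0 20 192.168.20.81/24
```

Once the printed commands look right, they can be piped to sudo sh. Interfaces created this way do not survive a reboot unless they are also added to the node's network configuration.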

1.3 Configure LACP Bonding (Ryzen Node)

Note: Requires two NICs on the Ryzen node and a matching LACP (802.3ad) link aggregation group on the switch ports they connect to

Configuration (/etc/network/interfaces.d/bond0.cfg):

auto bond0
iface bond0 inet static
    address 192.168.1.81
    netmask 255.255.255.0
    gateway 192.168.1.1
    bond-mode 802.3ad
    bond-miimon 100
    bond-slaves eth0 eth1

Apply:

sudo systemctl restart networking
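
After the restart, the kernel's bonding status file shows whether the bond came up in 802.3ad mode with both members attached. A minimal sketch, assuming the standard /proc/net/bonding/bond0 path; the helper names are hypothetical.

```shell
# Hypothetical helpers that inspect a bonding status file.
# $1 = status file (defaults to /proc/net/bonding/bond0)

# Succeeds when the bond is running in LACP (802.3ad) mode.
bond_is_lacp() {
    grep -q '^Bonding Mode: IEEE 802.3ad' "${1:-/proc/net/bonding/bond0}"
}

# Prints the number of member interfaces attached to the bond.
bond_member_count() {
    grep -c '^Slave Interface:' "${1:-/proc/net/bonding/bond0}"
}

# Example:
#   bond_is_lacp && [ "$(bond_member_count)" -eq 2 ] && echo "bond0 OK"
```

If the mode is correct but the member count is below 2, the switch-side aggregation group is a likely place to look.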

2. Storage Enhancements

2.1 Create ZFS Pool on Proxmox Host

Script: /workspace/homelab/scripts/zfs_setup.sh

Steps:

  1. SSH to Proxmox host (192.168.1.57)
  2. Identify SSD devices: lsblk
  3. Update script with correct device names
  4. Run: sudo bash /workspace/homelab/scripts/zfs_setup.sh

Verification:

zpool status tank
zfs list
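
For a scripted check, the pool's health property can be compared against ONLINE. A minimal sketch with a hypothetical helper name; it assumes the pool is called tank, as in the commands above.

```shell
# Hypothetical helper: succeed only when the pool reports ONLINE.
# Returns 2 when the pool cannot be queried at all.
# $1 = pool name (defaults to tank)
zfs_pool_healthy() {
    health="$(zpool list -H -o health "${1:-tank}" 2>/dev/null)" || return 2
    [ "$health" = "ONLINE" ]
}

# Example:
#   zfs_pool_healthy tank || echo "tank needs attention: run zpool status tank"
```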

2.2 Mount NAS on All Nodes

Guide: /workspace/homelab/docs/guides/NAS_Mount_Guide.md

Steps:

  1. Follow the NAS Mount Guide for each node
  2. Create credentials file
  3. Add to /etc/fstab
  4. Mount: sudo mount -a

Verification:

df -h | grep /mnt/nas
ls -la /mnt/nas
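
Grepping df output also matches paths that merely start with /mnt/nas, so a scripted check may prefer an exact match against the mounts table. A minimal sketch; is_mounted is a hypothetical helper.

```shell
# Hypothetical helper: succeed when $1 is an exact mount point listed in
# the mounts table $2 (defaults to /proc/mounts).
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Example:
#   is_mounted /mnt/nas || echo "NAS is not mounted; try: sudo mount -a"
```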

2.3 Setup AI Model Pruning

Script: /workspace/homelab/scripts/prune_ai_models.sh

Steps:

  1. Update MODEL_DIR path in script
  2. Make executable: chmod +x /workspace/homelab/scripts/prune_ai_models.sh
  3. Add a daily 03:00 job with crontab -e:
    0 3 * * * /workspace/homelab/scripts/prune_ai_models.sh

Verification:

# Test run
sudo /workspace/homelab/scripts/prune_ai_models.sh

# Check cron logs
grep CRON /var/log/syslog
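
The contents of prune_ai_models.sh are not shown in this guide. As an illustration only, an age-based prune could look like the sketch below. The function names, the 30-day threshold, and the DRY_RUN switch are all assumptions, and file names containing newlines are not handled.

```shell
# Hypothetical sketch of age-based pruning; not the actual prune_ai_models.sh.

# Lists regular files under $1 whose modification time is more than $2 days ago.
list_stale_models() {
    find "$1" -type f -mtime +"$2" -print
}

# Removes stale files, or only reports them while DRY_RUN is 1 (the default).
# $1 = model directory, $2 = age threshold in days
prune_models() {
    list_stale_models "$1" "$2" | while IFS= read -r f; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "would remove: $f"
        else
            rm -f -- "$f"
        fi
    done
}

# Example (assumed threshold): prune_models "$MODEL_DIR" 30
```

Running it with the dry-run default first, then setting DRY_RUN=0, gives a chance to review the list before anything is deleted.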

3. Service Consolidation

3.1 Deploy Traefik Swarm Service

Stack: /workspace/homelab/services/swarm/traefik/stack.yml

Steps:

  1. Review and update stack.yml if needed
  2. Deploy: docker stack deploy -c /workspace/homelab/services/swarm/traefik/stack.yml traefik
  3. Remove standalone Traefik on Pi 4

Verification:

docker service ls | grep traefik
docker service ps traefik_traefik
curl -I http://192.168.1.196
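
Before removing the standalone Traefik, it can help to confirm that the Swarm service has all of its replicas running. A minimal sketch using the Replicas column of docker service ls; service_converged is a hypothetical helper. The name filter matches by prefix, so only the first matching line is used.

```shell
# Hypothetical helper: succeed when a Swarm service has all desired
# replicas running. $1 = service name, e.g. traefik_traefik
service_converged() {
    replicas="$(docker service ls --filter "name=$1" --format '{{.Replicas}}' | head -n 1)"
    replicas="${replicas%% *}"   # drop any trailing note such as "(max 1 per node)"
    running="${replicas%%/*}"
    desired="${replicas##*/}"
    [ -n "$replicas" ] && [ "$running" = "$desired" ] && [ "$running" != "0" ]
}

# Example:
#   service_converged traefik_traefik && echo "Traefik replicas are up"
```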

3.2 Deploy Caddy Fallback (Pi Zero)

Location: /workspace/homelab/services/standalone/Caddy/

Steps:

  1. SSH to Pi Zero (192.168.1.62)
  2. Copy Caddy files to node
  3. Run: docker-compose up -d

Verification:

docker ps | grep caddy
curl http://192.168.1.62:8080

3.3 Add Health Checks

Guide: /workspace/homelab/docs/guides/health_checks.md

Steps:

  1. Review health check examples
  2. Update service stack files for critical containers
  3. Redeploy services: docker stack deploy ...

Verification:

docker ps --filter "health=healthy"
docker inspect <container> | jq '.[0].State.Health'
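
A container usually reports "starting" for a while after a redeploy, so a scripted check may want to poll rather than inspect once. A minimal sketch; wait_healthy is a hypothetical helper, and the attempt count and delay defaults are assumptions.

```shell
# Hypothetical helper: poll a container's health status until it is
# "healthy" or the attempts run out.
# $1 = container name or ID, $2 = max attempts (default 30),
# $3 = delay between attempts in seconds (default 2)
wait_healthy() {
    attempt=0
    while [ "$attempt" -lt "${2:-30}" ]; do
        status="$(docker inspect --format '{{.State.Health.Status}}' "$1" 2>/dev/null)"
        [ "$status" = "healthy" ] && return 0
        attempt=$((attempt + 1))
        sleep "${3:-2}"
    done
    return 1
}

# Example:
#   wait_healthy <container> 30 2 || echo "container did not become healthy"
```

A container with no health check defined never reports "healthy", so the function returns 1 for it once the attempts are used up.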

4. Security Hardening

4.1 Install fail2ban on Manager VM

Script: /workspace/homelab/scripts/install_fail2ban.sh

Steps:

  1. SSH to manager VM (192.168.1.196)
  2. Run: sudo bash /workspace/homelab/scripts/install_fail2ban.sh

Verification:

sudo fail2ban-client status
sudo fail2ban-client status sshd
sudo tail -f /var/log/fail2ban.log
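
To feed the current ban count into a script or alert, the status output can be parsed. This sketch assumes the usual "Currently banned:" line in the output of fail2ban-client status; banned_count is a hypothetical helper that reads that output on stdin.

```shell
# Hypothetical helper: read "fail2ban-client status <jail>" output on stdin
# and print the number after "Currently banned:".
banned_count() {
    sed -n 's/.*Currently banned:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Example:
#   sudo fail2ban-client status sshd | banned_count
```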

4.2 Configure Firewall Rules

Script: /workspace/homelab/scripts/vlan_firewall.sh

Steps:

  1. Review script and adjust VLANs/ports as needed
  2. Run: sudo bash /workspace/homelab/scripts/vlan_firewall.sh
  3. Configure router ACLs via web UI

Verification:

sudo iptables -L -n -v
# Test port accessibility from different VLANs
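
The port test can be scripted with nc, run once from a host in each VLAN. A minimal sketch; check_port is a hypothetical helper, and the host and port in the example are assumptions to be replaced with whatever the firewall script is meant to allow or block.

```shell
# Hypothetical helper: compare a TCP port's reachability with the
# expected state. $1 = host, $2 = port, $3 = expected state (open | blocked)
check_port() {
    if nc -z -w 2 "$1" "$2" >/dev/null 2>&1; then
        state=open
    else
        state=blocked
    fi
    if [ "$state" = "$3" ]; then
        echo "PASS $1:$2 is $state"
    else
        echo "FAIL $1:$2 is $state (expected $3)"
        return 1
    fi
}

# Example (assumed host and port), run from a host on VLAN 20:
#   check_port 192.168.10.1 22 blocked
```

A refused connection and a silently dropped one both count as blocked here. The difference shows up only in how long the check takes.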

4.3 Restrict Portainer Access

Options:

  • Configure Tailscale VPN-only access
  • Enable OAuth integration
  • Add firewall rules to block public access

Configuration: Update Portainer stack to bind to Tailscale interface only


5. Monitoring & Automation

5.1 Deploy node-exporter

Script: /workspace/homelab/scripts/setup_monitoring.sh

Steps:

  1. Run: sudo bash /workspace/homelab/scripts/setup_monitoring.sh
  2. Wait for deployment to complete

Verification:

docker service ps monitoring_node-exporter
curl http://192.168.1.196:9100/metrics
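
The curl check above covers the manager only. To probe other nodes the same way, a small helper can look for node_ metrics on port 9100. A minimal sketch; exporter_up is a hypothetical helper, and which nodes actually run the exporter depends on setup_monitoring.sh.

```shell
# Hypothetical helper: succeed when a host serves node-exporter metrics
# on port 9100. $1 = host name or IP address
exporter_up() {
    curl -fsS --max-time 5 "http://$1:9100/metrics" 2>/dev/null | grep -q '^node_'
}

# Example over node addresses mentioned in this guide:
#   for ip in 192.168.1.81 192.168.1.57 192.168.1.245 192.168.1.196 192.168.1.62; do
#       exporter_up "$ip" && echo "OK   $ip" || echo "FAIL $ip"
#   done
```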

5.2 Configure Grafana Alerts

Rules: /workspace/homelab/monitoring/grafana/alert_rules.yml

Steps:

  1. The setup script copies alert rules to Grafana
  2. Login to Grafana UI
  3. Navigate to Alerting > Alert Rules
  4. Verify rules are loaded

Verification:

  • Check Grafana UI for alert rules
  • Trigger test alert (e.g., high CPU load)

6. Backup Strategy

6.1 Setup Restic Backups

Script: /workspace/homelab/scripts/install_restic_backup.sh

Steps:

  1. Create Backblaze B2 bucket
  2. Get B2 account ID and key
  3. Update /workspace/homelab/scripts/backup_daily.sh with credentials
  4. Run: sudo bash /workspace/homelab/scripts/install_restic_backup.sh

Verification:

sudo systemctl status restic-backup.timer
sudo systemctl list-timers
# Manual test run
sudo /workspace/homelab/scripts/backup_daily.sh

6.2 Verify Backups

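# Check snapshots (restic reads the B2 credentials from the environment)
export B2_ACCOUNT_ID="your_account_id"
export B2_ACCOUNT_KEY="your_account_key"
export RESTIC_REPOSITORY="b2:your-bucket:/backups"
export RESTIC_PASSWORD="your_password"
restic snapshots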

# Restore test
restic restore latest --target /tmp/restore-test
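
A restore test is easier to repeat when it cleans up after itself. The sketch below restores into a scratch directory, confirms that at least one file came back, and removes the directory again. verify_restore is a hypothetical helper and expects the restic environment variables to be exported already.

```shell
# Hypothetical helper: restore the latest snapshot into a scratch
# directory, confirm at least one file was restored, then clean up.
verify_restore() {
    scratch="$(mktemp -d)" || return 1
    rc=1
    if restic restore latest --target "$scratch" >/dev/null 2>&1 &&
       [ -n "$(find "$scratch" -type f -print | head -n 1)" ]; then
        echo "restore OK"
        rc=0
    else
        echo "restore FAILED"
    fi
    rm -rf "$scratch"
    return "$rc"
}

# Example:
#   verify_restore || echo "check B2 credentials and repository settings"
```

This restores the whole snapshot, which may be slow and large for big backups. Limiting the restore with restic's --include option is one way to keep the test small.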

Rollback Procedures

If network upgrade fails:

  • Reconnect to old switch
  • Remove VLAN configurations
  • Restart networking: sudo systemctl restart networking

If ZFS pool creation fails:

  • Check whether the SSDs hold data you still need; destroying the pool discards its contents
  • Destroy pool: sudo zpool destroy tank, then retry

If Traefik Swarm migration fails:

  • Restart standalone Traefik on Pi 4
  • Remove the Swarm stack: docker stack rm traefik (this removes traefik_traefik and the stack's networks)

If backups fail:

  • Check B2 credentials
  • Verify network connectivity
  • Check restic logs: /var/log/restic_backup.log

Post-Deployment Checklist

  • All nodes linked at their expected speed (2.5 Gb for Ryzen and Acer, 1 Gb for Pi 4 and Time Capsule)
  • VLANs configured and isolated
  • ZFS pool created and healthy
  • NAS mounted on all nodes
  • Traefik Swarm service running with 2 replicas
  • Caddy fallback operational
  • fail2ban protecting manager VM
  • Firewall rules active
  • node-exporter running on all nodes
  • Grafana alerts configured
  • Restic backups running daily
  • Health checks added to critical services

Support & Troubleshooting

Refer to the individual guide files in /workspace/homelab/docs/guides/ (for example NAS_Mount_Guide.md and health_checks.md) for detailed troubleshooting.

For script issues, check logs in /var/log/ and Docker logs: docker service logs <service>