# Home Lab Improvements - Deployment Guide
This guide provides step-by-step instructions for deploying all the homelab improvements.
## Table of Contents
1. [Network Upgrade](#1-network-upgrade)
2. [Storage Enhancements](#2-storage-enhancements)
3. [Service Consolidation](#3-service-consolidation)
4. [Security Hardening](#4-security-hardening)
5. [Monitoring & Automation](#5-monitoring--automation)
6. [Backup Strategy](#6-backup-strategy)
---
## Prerequisites
- SSH access to all nodes
- Root/sudo privileges
- Docker Swarm cluster operational
- Backblaze B2 account (for backups)
---
## 1. Network Upgrade
### 1.1 Install 2.5 Gb PoE Switch
**Hardware**: Netgear GS110EMX or equivalent
**Steps**:
1. Power down affected nodes
2. Install new switch
3. Connect all 2.5 Gb nodes (Ryzen .81, Acer .57)
4. Connect 1 Gb nodes (Pi 4 .245, Time Capsule .153)
5. Power on and verify link speeds
**Verification**:
```bash
# On each node, check link speed (replace eth0 with that node's interface name):
ethtool eth0 | grep Speed
```
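The check above can be wrapped so that each node is compared against the speed it is expected to negotiate. This is a minimal sketch: `link_speed_mbps` and `check_link` are hypothetical helper names that are not part of the repository, and the interface name `eth0` is an assumption carried over from the example above.

```bash
# Read `ethtool <iface>` output on stdin and print the negotiated speed in Mb/s.
# Prints nothing when the link is down or the speed is reported as unknown.
link_speed_mbps() {
    sed -n 's/^[[:space:]]*Speed:[[:space:]]*\([0-9][0-9]*\)Mb\/s.*/\1/p'
}

# check_link <iface> <expected_mbps>: hypothetical helper, not part of the repo.
check_link() {
    local iface="$1" expected="$2" actual
    actual="$(ethtool "$iface" 2>/dev/null | link_speed_mbps)"
    if [ "$actual" = "$expected" ]; then
        echo "OK   $iface: ${actual} Mb/s"
    else
        echo "FAIL $iface: got '${actual:-unknown}' Mb/s, expected ${expected}"
        return 1
    fi
}
```

For example, `check_link eth0 2500` would be run on the Ryzen and Acer nodes, and `check_link eth0 1000` on the Pi 4.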
### 1.2 Configure VLANs
**Script**: `/workspace/homelab/scripts/vlan_firewall.sh`
**Steps**:
1. Create VLAN 10 (Management): 192.168.10.0/24
2. Create VLAN 20 (Services): 192.168.20.0/24
3. Configure router ACLs using the firewall script
**Verification**:
```bash
# Check VLAN configuration
ip -d link show
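# Test VLAN isolation: run this from a host on VLAN 20.
# ping only exercises ICMP, so it fails only if the ACLs block ICMP between VLANs;
# test restricted TCP ports separately.
ping -c 3 192.168.10.1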
```
### 1.3 Configure LACP Bonding (Ryzen Node)
**Note**: Requires two NICs on the Ryzen node
**Configuration** (`/etc/network/interfaces.d/bond0.cfg`):
```
auto bond0
iface bond0 inet static
address 192.168.1.81
netmask 255.255.255.0
gateway 192.168.1.1
bond-mode 802.3ad
bond-miimon 100
bond-slaves eth0 eth1
```
**Apply**:
```bash
sudo systemctl restart networking
```
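After the restart, the kernel exposes the bond state in `/proc/net/bonding/bond0`. The sketch below reads that file and reports whether the bond is in 802.3ad mode with both slaves up. `bond_status` is a hypothetical helper name, and the check assumes the switch ports are also configured as an LACP group, which 802.3ad requires.

```bash
# bond_status [file]: summarize an ifenslave/bonding status file.
# Succeeds only if the mode is 802.3ad and at least two slaves report "up".
bond_status() {
    local file="${1:-/proc/net/bonding/bond0}"
    awk '
        /^Bonding Mode:/           { lacp = ($0 ~ /802\.3ad/) }
        /^Slave Interface:/        { in_slave = 1; next }
        # The first "MII Status" line after a slave header belongs to that slave.
        in_slave && /^MII Status:/ { if ($3 == "up") up++; in_slave = 0 }
        END {
            printf "lacp=%s slaves_up=%d\n", (lacp ? "yes" : "no"), up
            exit !(lacp && up >= 2)
        }
    ' "$file"
}
```

Running `bond_status` with no arguments on the Ryzen node checks `bond0`; a non-zero exit status suggests either the mode or one of the links needs attention.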
---
## 2. Storage Enhancements
### 2.1 Create ZFS Pool on Proxmox Host
**Script**: `/workspace/homelab/scripts/zfs_setup.sh`
**Steps**:
1. SSH to Proxmox host (192.168.1.57)
2. Identify SSD devices: `lsblk`
3. Update script with correct device names
4. Run: `sudo bash /workspace/homelab/scripts/zfs_setup.sh`
**Verification**:
```bash
zpool status tank
zfs list
```
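For scripted checks, the pool health can be read as a single word instead of parsing the full `zpool status` report. A minimal sketch, assuming the pool is named `tank` as above; `pool_health` is a hypothetical helper name.

```bash
# pool_health [pool]: print "<pool>: <health>" and succeed only when ONLINE.
pool_health() {
    local pool="${1:-tank}" health
    # -H drops the header line, -o health selects just the health column.
    health="$(zpool list -H -o health "$pool" 2>/dev/null)" || health="MISSING"
    echo "$pool: $health"
    [ "$health" = "ONLINE" ]
}
```

A call such as `pool_health tank || echo "check zpool status tank"` could be added to a cron job or a monitoring hook.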
### 2.2 Mount NAS on All Nodes
**Guide**: `/workspace/homelab/docs/guides/NAS_Mount_Guide.md`
**Steps**:
1. Follow the NAS Mount Guide for each node
2. Create credentials file
3. Add to `/etc/fstab`
4. Mount: `sudo mount -a`
**Verification**:
```bash
df -h | grep /mnt/nas
ls -la /mnt/nas
```
### 2.3 Setup AI Model Pruning
**Script**: `/workspace/homelab/scripts/prune_ai_models.sh`
**Steps**:
1. Update MODEL_DIR path in script
2. Make executable: `chmod +x /workspace/homelab/scripts/prune_ai_models.sh`
3. Add to cron: `crontab -e`
```
0 3 * * * /workspace/homelab/scripts/prune_ai_models.sh
```
**Verification**:
```bash
# Test run
sudo /workspace/homelab/scripts/prune_ai_models.sh
# Check cron logs
grep CRON /var/log/syslog
```
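The contents of `prune_ai_models.sh` are not shown in this guide. As an illustration only, the sketch below assumes pruning means "delete model files that have not been modified for more than N days" and defaults to a dry run. The function names and the 30-day default are assumptions, not the script's actual behavior.

```bash
# list_stale_models <dir> [days]: print regular files older than <days> days.
list_stale_models() {
    local dir="$1" days="${2:-30}"
    find "$dir" -type f -mtime +"$days" -print
}

# prune_stale_models <dir> [days] [--delete]
# Dry run by default; files are removed only when --delete is passed.
prune_stale_models() {
    local dir="$1" days="${2:-30}" mode="${3:-}"
    if [ "$mode" = "--delete" ]; then
        find "$dir" -type f -mtime +"$days" -print -delete
    else
        list_stale_models "$dir" "$days"
    fi
}
```

Running the dry run first, for example `prune_stale_models "$MODEL_DIR" 30`, shows what a later `--delete` run would remove.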
---
## 3. Service Consolidation
### 3.1 Deploy Traefik Swarm Service
**Stack**: `/workspace/homelab/services/swarm/traefik/stack.yml`
**Steps**:
1. Review and update stack.yml if needed
2. Deploy: `docker stack deploy -c /workspace/homelab/services/swarm/traefik/stack.yml traefik`
3. Remove standalone Traefik on Pi 4
**Verification**:
```bash
docker service ls | grep traefik
docker service ps traefik_traefik
curl -I http://192.168.1.196
```
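To confirm that every desired replica is actually running, the replica column of `docker service ls` can be parsed. A minimal sketch; `replicas_ok` is a hypothetical helper name, and the service name `traefik_traefik` follows the stack name used above.

```bash
# replicas_ok <service>: read "<name> <running>/<desired>" lines on stdin.
# Succeeds only if the service is listed and running == desired > 0.
replicas_ok() {
    local service="$1"
    awk -v svc="$service" '
        $1 == svc {
            found = 1
            split($2, r, "/")
            printf "%s: %s of %s replicas running\n", svc, r[1], r[2]
            ok = (r[1] == r[2] && r[1] > 0)
        }
        END { exit !(found && ok) }
    '
}
```

Usage on the manager node:

```bash
docker service ls --format '{{.Name}} {{.Replicas}}' | replicas_ok traefik_traefik
```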
### 3.2 Deploy Caddy Fallback (Pi Zero)
**Location**: `/workspace/homelab/services/standalone/Caddy/`
**Steps**:
1. SSH to Pi Zero (192.168.1.62)
2. Copy Caddy files to node
3. Run: `docker-compose up -d`
**Verification**:
```bash
docker ps | grep caddy
curl http://192.168.1.62:8080
```
### 3.3 Add Health Checks
**Guide**: `/workspace/homelab/docs/guides/health_checks.md`
**Steps**:
1. Review health check examples
2. Update service stack files for critical containers
3. Redeploy services: `docker stack deploy ...`
**Verification**:
```bash
docker ps --filter "health=healthy"
docker inspect <container> | jq '.[0].State.Health'
```
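After a redeploy, containers spend some time in the `starting` state before their health check settles. The sketch below polls a container until it reports `healthy` or `unhealthy`, or until a timeout expires. `wait_healthy` is a hypothetical helper name, and the 60-second default is an assumption. A container without a health check never reports a status, so it runs into the timeout.

```bash
# wait_healthy <container> [timeout_s] [interval_s]
wait_healthy() {
    local container="$1" timeout="${2:-60}" interval="${3:-2}"
    local waited=0 status="unknown"
    while [ "$waited" -lt "$timeout" ]; do
        # The template fails if the container defines no health check.
        status="$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null)" \
            || status="unknown"
        case "$status" in
            healthy)   echo "$container is healthy"; return 0 ;;
            unhealthy) echo "$container is unhealthy"; return 1 ;;
        esac
        sleep "$interval"
        waited=$((waited + interval))
    done
    echo "$container not healthy after ${timeout}s (last status: $status)"
    return 1
}
```

For example, `wait_healthy <container> 120` could follow each `docker stack deploy` for a critical service.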
---
## 4. Security Hardening
### 4.1 Install fail2ban on Manager VM
**Script**: `/workspace/homelab/scripts/install_fail2ban.sh`
**Steps**:
1. SSH to manager VM (192.168.1.196)
2. Run: `sudo bash /workspace/homelab/scripts/install_fail2ban.sh`
**Verification**:
```bash
sudo fail2ban-client status
sudo fail2ban-client status sshd
sudo tail -f /var/log/fail2ban.log
```
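The number of currently banned addresses can be extracted from the jail status for use in scripts or dashboards. A minimal sketch that parses the text output of `fail2ban-client status sshd`; `banned_count` is a hypothetical helper name.

```bash
# banned_count: read `fail2ban-client status <jail>` output on stdin
# and print the value of the "Currently banned" field.
banned_count() {
    awk -F: '/Currently banned/ { gsub(/[ \t]/, "", $2); print $2 }'
}
```

Usage:

```bash
sudo fail2ban-client status sshd | banned_count
```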
### 4.2 Configure Firewall Rules
**Script**: `/workspace/homelab/scripts/vlan_firewall.sh`
**Steps**:
1. Review script and adjust VLANs/ports as needed
2. Run: `sudo bash /workspace/homelab/scripts/vlan_firewall.sh`
3. Configure router ACLs via web UI
**Verification**:
```bash
sudo iptables -L -n -v
# Test port accessibility from different VLANs
```
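The port accessibility test mentioned above can be scripted with bash's built-in `/dev/tcp` redirection, which avoids installing extra tools on each node. This is a sketch under assumptions: `port_open` and `expect_port` are hypothetical helpers, and a "closed" result covers both refused and silently dropped connections.

```bash
# port_open <host> <port> [wait_s]: succeed if a TCP connection can be opened.
port_open() {
    local host="$1" port="$2" wait_s="${3:-3}"
    # $0 and $1 inside the inner shell are the host and port passed after the script.
    timeout "$wait_s" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# expect_port <host> <port> <open|closed>: compare the result with the expectation.
expect_port() {
    local host="$1" port="$2" want="$3" got="closed"
    port_open "$host" "$port" && got="open"
    if [ "$got" = "$want" ]; then
        echo "OK   $host:$port is $got"
    else
        echo "FAIL $host:$port is $got, expected $want"
        return 1
    fi
}
```

Run from a host on VLAN 20, a call such as `expect_port 192.168.10.1 22 closed` would confirm that a restricted port on the management VLAN is unreachable. The address and port here are illustrative, not taken from the firewall script.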
### 4.3 Restrict Portainer Access
**Options**:
- Configure Tailscale VPN-only access
- Enable OAuth integration
- Add firewall rules to block public access
**Configuration**: Update Portainer stack to bind to Tailscale interface only
---
## 5. Monitoring & Automation
### 5.1 Deploy node-exporter
**Script**: `/workspace/homelab/scripts/setup_monitoring.sh`
**Steps**:
1. Run: `sudo bash /workspace/homelab/scripts/setup_monitoring.sh`
2. Wait for deployment to complete
**Verification**:
```bash
docker service ps monitoring_node-exporter
curl http://192.168.1.196:9100/metrics
```
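The same metrics check can be looped over every node rather than only the manager. In this sketch the default node list is assembled from the addresses mentioned in this guide. Whether each of them runs node-exporter is an assumption, so adjust `NODES` to match the cluster. `check_exporters` is a hypothetical helper name.

```bash
# Space-separated node IPs; override by exporting NODES before calling.
NODES="${NODES:-192.168.1.81 192.168.1.57 192.168.1.245 192.168.1.196 192.168.1.62}"

# check_exporters: print OK/FAIL per node, fail if any node is unreachable
# or returns no node_* metrics on port 9100.
check_exporters() {
    local ip failed=0
    for ip in $NODES; do
        if curl -fsS --max-time 5 "http://${ip}:9100/metrics" 2>/dev/null \
            | grep -q '^node_'; then
            echo "OK   $ip"
        else
            echo "FAIL $ip"
            failed=1
        fi
    done
    return "$failed"
}
```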
### 5.2 Configure Grafana Alerts
**Rules**: `/workspace/homelab/monitoring/grafana/alert_rules.yml`
**Steps**:
1. The setup script copies alert rules to Grafana
2. Login to Grafana UI
3. Navigate to Alerting > Alert Rules
4. Verify rules are loaded
**Verification**:
- Check Grafana UI for alert rules
- Trigger test alert (e.g., high CPU load)
---
## 6. Backup Strategy
### 6.1 Setup Restic Backups
**Script**: `/workspace/homelab/scripts/install_restic_backup.sh`
**Steps**:
1. Create Backblaze B2 bucket
2. Get the B2 key ID and application key (restic reads them from `B2_ACCOUNT_ID` and `B2_ACCOUNT_KEY`)
3. Update `/workspace/homelab/scripts/backup_daily.sh` with credentials
4. Run: `sudo bash /workspace/homelab/scripts/install_restic_backup.sh`
**Verification**:
```bash
sudo systemctl status restic-backup.timer
sudo systemctl list-timers
# Manual test run
sudo /workspace/homelab/scripts/backup_daily.sh
```
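The contents of `backup_daily.sh` are not shown here. As an alternative to embedding credentials in the script, the sketch below loads them from a root-only environment file and then runs a backup followed by a retention pass. The file path `/etc/restic/b2.env`, the function names, and the retention values are all assumptions for illustration.

```bash
# Hypothetical env file (chmod 600, owned by root) containing lines such as:
#   B2_ACCOUNT_ID=...
#   B2_ACCOUNT_KEY=...
#   RESTIC_REPOSITORY=b2:your-bucket:/backups
#   RESTIC_PASSWORD=...
ENV_FILE="${ENV_FILE:-/etc/restic/b2.env}"

# Export every variable defined in the env file into the current shell.
load_backup_env() {
    [ -r "$ENV_FILE" ] || { echo "missing $ENV_FILE" >&2; return 1; }
    set -a
    . "$ENV_FILE"
    set +a
}

# backup_once <path>...: back up the given paths, then apply retention.
backup_once() {
    load_backup_env || return 1
    restic backup "$@" || return 1
    # Retention values are illustrative assumptions.
    restic forget --keep-daily 7 --keep-weekly 4 --prune
}
```

A manual run might look like `sudo bash -c '. ./backup_sketch.sh && backup_once /etc /home'`, where `backup_sketch.sh` is a hypothetical file holding the functions above.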
### 6.2 Verify Backups
```bash
# Check snapshots
export RESTIC_REPOSITORY="b2:your-bucket:/backups"
export RESTIC_PASSWORD="your_password"
restic snapshots
# Restore test
restic restore latest --target /tmp/restore-test
```
---
## Rollback Procedures
### If network upgrade fails:
- Reconnect to old switch
- Remove VLAN configurations
- Restart networking: `sudo systemctl restart networking`
### If ZFS pool creation fails:
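- Confirm nothing on the SSDs is still needed (destroying the pool discards its data)
- Destroy pool: `sudo zpool destroy tank`
- Recheck device names with `lsblk` before retrying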
### If Traefik Swarm migration fails:
- Restart standalone Traefik on Pi 4
- Remove the Swarm stack: `docker stack rm traefik` (also removes the `traefik_traefik` service created by `docker stack deploy`)
### If backups fail:
- Check B2 credentials
- Verify network connectivity
- Check restic logs: `/var/log/restic_backup.log`
---
## Post-Deployment Checklist
- [ ] 2.5 Gb nodes (Ryzen, Acer) link at 2.5 Gb; 1 Gb nodes (Pi 4, Time Capsule) link at 1 Gb
- [ ] VLANs configured and isolated
- [ ] ZFS pool created and healthy
- [ ] NAS mounted on all nodes
- [ ] Traefik Swarm service running with 2 replicas
- [ ] Caddy fallback operational
- [ ] fail2ban protecting manager VM
- [ ] Firewall rules active
- [ ] node-exporter running on all nodes
- [ ] Grafana alerts configured
- [ ] Restic backups running daily
- [ ] Health checks added to critical services
---
## Support & Troubleshooting
Refer to individual guide files for detailed troubleshooting:
- [NAS Mount Guide](/workspace/homelab/docs/guides/NAS_Mount_Guide.md)
- [Health Checks Guide](/workspace/homelab/docs/guides/health_checks.md)
- [Homelab Configuration](/workspace/homelab/docs/guides/Homelab.md)
For script issues, check logs in `/var/log/` and Docker logs: `docker service logs <service>`