# Home Lab Improvements - Deployment Guide

This guide provides step-by-step instructions for deploying all the homelab improvements.

## Table of Contents

1. [Network Upgrade](#1-network-upgrade)
2. [Storage Enhancements](#2-storage-enhancements)
3. [Service Consolidation](#3-service-consolidation)
4. [Security Hardening](#4-security-hardening)
5. [Monitoring & Automation](#5-monitoring--automation)
6. [Backup Strategy](#6-backup-strategy)

---

## Prerequisites

- SSH access to all nodes
- Root/sudo privileges
- Docker Swarm cluster operational (see the quick check below)
- Backblaze B2 account (for backups)
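A quick pre-flight check of these items can save a failed deployment later. A minimal sketch, run from the Swarm manager (the node IPs are the ones used throughout this guide; substitute your own inventory):

```bash
# Swarm is operational: every node should report STATUS "Ready"
docker node ls

# Root/sudo privileges on this host
sudo -v

# SSH reachability of the other nodes referenced in this guide
for host in 192.168.1.57 192.168.1.81 192.168.1.245; do
    ssh -o ConnectTimeout=5 "$host" hostname
done
```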
---

## 1. Network Upgrade

### 1.1 Install 2.5 Gb PoE Switch

**Hardware**: Netgear GS110EMX or equivalent

**Steps**:

1. Power down affected nodes
2. Install new switch
3. Connect all 2.5 Gb nodes (Ryzen .81, Acer .57)
4. Connect 1 Gb nodes (Pi 4 .245, Time Capsule .153)
5. Power on and verify link speeds

**Verification**:

```bash
# On each node, check link speed:
ethtool eth0 | grep Speed
```
### 1.2 Configure VLANs

**Script**: `/workspace/homelab/scripts/vlan_firewall.sh`

**Steps**:

1. Create VLAN 10 (Management): 192.168.10.0/24
2. Create VLAN 20 (Services): 192.168.20.0/24
3. Configure router ACLs using the firewall script

**Verification**:

```bash
# Check VLAN configuration
ip -d link show

# Test VLAN isolation
ping 192.168.10.1   # from VLAN 20; should fail where inter-VLAN traffic is restricted
```
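The host-side VLAN interfaces themselves are not shown above; a minimal sketch, assuming Linux nodes with an `eth0` uplink, 802.1Q tagging handled by the switch, and example host addresses (the `.2` addresses are placeholders):

```bash
# Load the 802.1Q VLAN module
sudo modprobe 8021q

# VLAN 10 (Management) sub-interface
sudo ip link add link eth0 name eth0.10 type vlan id 10
sudo ip addr add 192.168.10.2/24 dev eth0.10
sudo ip link set eth0.10 up

# VLAN 20 (Services) sub-interface
sudo ip link add link eth0 name eth0.20 type vlan id 20
sudo ip addr add 192.168.20.2/24 dev eth0.20
sudo ip link set eth0.20 up
```

These commands do not persist across reboots; the permanent equivalents belong in `/etc/network/interfaces.d/` (or netplan/NetworkManager, depending on the distribution).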
### 1.3 Configure LACP Bonding (Ryzen Node)

**Note**: Requires two NICs on the Ryzen node and LACP (802.3ad) enabled on the corresponding switch ports.

**Configuration** (`/etc/network/interfaces.d/bond0.cfg`):

```
auto bond0
iface bond0 inet static
    address 192.168.1.81
    netmask 255.255.255.0
    gateway 192.168.1.1
    bond-mode 802.3ad
    bond-miimon 100
    bond-slaves eth0 eth1
```

**Apply**:

```bash
sudo systemctl restart networking
```
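No verification step is listed for the bond; assuming the stock Linux bonding driver, its negotiation state can be inspected directly after the restart:

```bash
# LACP mode, partner details, and per-slave link status
cat /proc/net/bonding/bond0

# Kernel view of the bond (mode should show 802.3ad)
ip -d link show bond0
```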
---

## 2. Storage Enhancements

### 2.1 Create ZFS Pool on Proxmox Host

**Script**: `/workspace/homelab/scripts/zfs_setup.sh`

**Steps**:

1. SSH to Proxmox host (192.168.1.57)
2. Identify SSD devices: `lsblk`
3. Update script with correct device names
4. Run: `sudo bash /workspace/homelab/scripts/zfs_setup.sh`

**Verification**:

```bash
zpool status tank
zfs list
```
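The pool layout is defined inside `zfs_setup.sh`, so the following is only an illustration of what step 3's device names feed into: a two-disk mirror named `tank` (matching the verification commands above), with placeholder device paths and an example dataset:

```bash
# Prefer stable by-id paths when editing the script (step 3)
ls -l /dev/disk/by-id/

# Illustration only: mirrored pool with 4K sector alignment
sudo zpool create -o ashift=12 tank mirror \
    /dev/disk/by-id/ata-SSD_ONE /dev/disk/by-id/ata-SSD_TWO

# Example dataset with lz4 compression
sudo zfs create -o compression=lz4 tank/data
```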
### 2.2 Mount NAS on All Nodes

**Guide**: `/workspace/homelab/docs/guides/NAS_Mount_Guide.md`

**Steps**:

1. Follow the NAS Mount Guide for each node
2. Create credentials file
3. Add to `/etc/fstab`
4. Mount: `sudo mount -a`

**Verification**:

```bash
df -h | grep /mnt/nas
ls -la /mnt/nas
```
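For orientation, the per-node work from the NAS Mount Guide reduces to something like the following; the CIFS protocol, share path, and NAS address are placeholders here, and the guide itself is authoritative:

```bash
# Credentials file (step 2), readable by root only
sudo tee /etc/nas-credentials >/dev/null <<'EOF'
username=nasuser
password=changeme
EOF
sudo chmod 600 /etc/nas-credentials

# Example /etc/fstab entry (step 3):
#   //<nas-ip>/share  /mnt/nas  cifs  credentials=/etc/nas-credentials,uid=1000,gid=1000,_netdev  0  0

# Mount everything (step 4)
sudo mkdir -p /mnt/nas
sudo mount -a
```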
### 2.3 Setup AI Model Pruning

**Script**: `/workspace/homelab/scripts/prune_ai_models.sh`

**Steps**:

1. Update MODEL_DIR path in script
2. Make executable: `chmod +x /workspace/homelab/scripts/prune_ai_models.sh`
3. Add to cron: `crontab -e`

   ```
   0 3 * * * /workspace/homelab/scripts/prune_ai_models.sh
   ```

**Verification**:

```bash
# Test run
sudo /workspace/homelab/scripts/prune_ai_models.sh

# Check cron logs
grep CRON /var/log/syslog
```
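What actually gets deleted is determined by `prune_ai_models.sh`; conceptually it is an age-based cleanup along these lines (the 30-day threshold, the reliance on access times, and the directory path are assumptions for illustration, not the script's real contents):

```bash
#!/usr/bin/env bash
# Illustration only: remove model files not accessed in the last 30 days
MODEL_DIR="/path/to/models"   # step 1: point this at the real model directory
find "$MODEL_DIR" -type f -atime +30 -print -delete
```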
---

## 3. Service Consolidation

### 3.1 Deploy Traefik Swarm Service

**Stack**: `/workspace/homelab/services/swarm/traefik/stack.yml`

**Steps**:

1. Review and update stack.yml if needed
2. Deploy: `docker stack deploy -c /workspace/homelab/services/swarm/traefik/stack.yml traefik`
3. Remove standalone Traefik on Pi 4

**Verification**:

```bash
docker service ls | grep traefik
docker service ps traefik_traefik
curl -I http://192.168.1.196
```
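Step 3 depends on how the standalone Traefik instance was originally started on the Pi 4; assuming it runs via docker-compose from its own directory (an assumption — adapt to the actual setup), removal looks like this:

```bash
# On the Pi 4 (.245), from the directory containing the old Traefik compose file
docker-compose down

# Or, if it was started as a plain container
docker stop traefik && docker rm traefik
```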
### 3.2 Deploy Caddy Fallback (Pi Zero)

**Location**: `/workspace/homelab/services/standalone/Caddy/`

**Steps**:

1. SSH to Pi Zero (192.168.1.62)
2. Copy Caddy files to node
3. Run: `docker-compose up -d`

**Verification**:

```bash
docker ps | grep caddy
curl http://192.168.1.62:8080
```
### 3.3 Add Health Checks

**Guide**: `/workspace/homelab/docs/guides/health_checks.md`

**Steps**:

1. Review health check examples
2. Update service stack files for critical containers
3. Redeploy services: `docker stack deploy ...`

**Verification**:

```bash
docker ps --filter "health=healthy"
docker inspect <container> | jq '.[0].State.Health'
```
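As one concrete pattern (the health checks guide is authoritative; the `/ping` endpoint, port, and service name below are illustrative and assume Traefik's ping endpoint is enabled), a check can also be attached to a running Swarm service from the CLI, mirroring what a `healthcheck:` block in the stack file declares:

```bash
# Add a health check to an existing service; Swarm replaces tasks that turn unhealthy
docker service update \
    --health-cmd 'wget -qO- http://localhost:8080/ping || exit 1' \
    --health-interval 30s \
    --health-timeout 5s \
    --health-retries 3 \
    traefik_traefik
```

Putting the equivalent `healthcheck:` stanza in the stack file (step 2) is preferable, since it survives redeploys.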
---

## 4. Security Hardening

### 4.1 Install fail2ban on Manager VM

**Script**: `/workspace/homelab/scripts/install_fail2ban.sh`

**Steps**:

1. SSH to manager VM (192.168.1.196)
2. Run: `sudo bash /workspace/homelab/scripts/install_fail2ban.sh`

**Verification**:

```bash
sudo fail2ban-client status
sudo fail2ban-client status sshd
sudo tail -f /var/log/fail2ban.log
```
### 4.2 Configure Firewall Rules

**Script**: `/workspace/homelab/scripts/vlan_firewall.sh`

**Steps**:

1. Review script and adjust VLANs/ports as needed
2. Run: `sudo bash /workspace/homelab/scripts/vlan_firewall.sh`
3. Configure router ACLs via web UI

**Verification**:

```bash
sudo iptables -L -n -v
# Test port accessibility from different VLANs
```
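For the port test above, `nc` gives a quick pass/fail from a host on the other VLAN; the target address and ports are examples — substitute whichever services the ACLs are meant to block or allow:

```bash
# From a host on VLAN 20, probe services on the Management VLAN
nc -zv -w 3 192.168.10.1 22      # expect failure if SSH is restricted to VLAN 10
nc -zv -w 3 192.168.10.1 443     # expect failure for management-only web UIs
```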
### 4.3 Restrict Portainer Access

**Options**:

- Configure Tailscale VPN-only access
- Enable OAuth integration
- Add firewall rules to block public access

**Configuration**: Update the Portainer stack (or the host firewall) so the UI is reachable only over the Tailscale interface.
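A minimal sketch of the firewall variant, assuming Portainer's default HTTPS port 9443 and Tailscale's default interface name `tailscale0` (both assumptions to verify); note that Docker-published ports bypass the `INPUT` chain, so the rule goes into `DOCKER-USER`:

```bash
# Drop traffic to the Portainer UI port unless it arrived over Tailscale
sudo iptables -I DOCKER-USER -p tcp -m conntrack --ctorigdstport 9443 \
    ! -i tailscale0 -j DROP
```

Rules added this way are not persistent on their own; fold the final version into `vlan_firewall.sh` or an iptables-persistent setup.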
---

## 5. Monitoring & Automation

### 5.1 Deploy node-exporter

**Script**: `/workspace/homelab/scripts/setup_monitoring.sh`

**Steps**:

1. Run: `sudo bash /workspace/homelab/scripts/setup_monitoring.sh`
2. Wait for deployment to complete

**Verification**:

```bash
docker service ps monitoring_node-exporter
curl http://192.168.1.196:9100/metrics
```
### 5.2 Configure Grafana Alerts

**Rules**: `/workspace/homelab/monitoring/grafana/alert_rules.yml`

**Steps**:

1. The setup script copies the alert rules to Grafana
2. Log in to the Grafana UI
3. Navigate to Alerting > Alert Rules
4. Verify the rules are loaded

**Verification**:

- Check the Grafana UI for the alert rules
- Trigger a test alert (e.g., high CPU load; see the sketch below)
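One low-tech way to exercise a high-CPU rule (assuming such a rule exists in `alert_rules.yml` and that its `for` duration is a few minutes) is to pin all cores on a test node, watch the alert fire, then clear it:

```bash
# Generate sustained CPU load on one node (run interactively)
for i in $(seq "$(nproc)"); do yes > /dev/null & done

# Keep the load up long enough for the alert's "for" duration to elapse
sleep 300

# Stop the load and confirm the alert returns to normal
pkill -x yes
```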
---

## 6. Backup Strategy

### 6.1 Setup Restic Backups

**Script**: `/workspace/homelab/scripts/install_restic_backup.sh`

**Steps**:

1. Create Backblaze B2 bucket
2. Get B2 account ID and key
3. Update `/workspace/homelab/scripts/backup_daily.sh` with credentials
4. Run: `sudo bash /workspace/homelab/scripts/install_restic_backup.sh`

**Verification**:

```bash
sudo systemctl status restic-backup.timer
sudo systemctl list-timers
# Manual test run
sudo /workspace/homelab/scripts/backup_daily.sh
```
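Before the first timer run, the restic repository must exist. The install script may already handle this; if not, initialization is a one-time step using the same variables section 6.2 relies on (bucket name and credential values are placeholders):

```bash
# One-time repository initialization against the B2 bucket
export B2_ACCOUNT_ID="your_account_id"
export B2_ACCOUNT_KEY="your_account_key"
export RESTIC_REPOSITORY="b2:your-bucket:/backups"
export RESTIC_PASSWORD="your_password"
restic init
```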
### 6.2 Verify Backups

```bash
# Check snapshots
export RESTIC_REPOSITORY="b2:your-bucket:/backups"
export RESTIC_PASSWORD="your_password"
restic snapshots

# Verify repository integrity
restic check

# Restore test
restic restore latest --target /tmp/restore-test
```
---

## Rollback Procedures

### If network upgrade fails:

- Reconnect to old switch
- Remove VLAN configurations
- Restart networking: `sudo systemctl restart networking`

### If ZFS pool creation fails:

- Destroy pool: `sudo zpool destroy tank`
- Verify data on SSDs before retrying

### If Traefik Swarm migration fails:

- Restart standalone Traefik on Pi 4
- Remove Swarm service: `docker service rm traefik_traefik`

### If backups fail:

- Check B2 credentials
- Verify network connectivity
- Check restic logs: `/var/log/restic_backup.log`

---
## Post-Deployment Checklist

- [ ] All nodes have 2.5 Gb connectivity
- [ ] VLANs configured and isolated
- [ ] ZFS pool created and healthy
- [ ] NAS mounted on all nodes
- [ ] Traefik Swarm service running with 2 replicas
- [ ] Caddy fallback operational
- [ ] fail2ban protecting manager VM
- [ ] Firewall rules active
- [ ] node-exporter running on all nodes
- [ ] Grafana alerts configured
- [ ] Restic backups running daily
- [ ] Health checks added to critical services

---
## Support & Troubleshooting

Refer to individual guide files for detailed troubleshooting:

- [NAS Mount Guide](/workspace/homelab/docs/guides/NAS_Mount_Guide.md)
- [Health Checks Guide](/workspace/homelab/docs/guides/health_checks.md)
- [Homelab Configuration](/workspace/homelab/docs/guides/Homelab.md)

For script issues, check logs in `/var/log/` and Docker logs: `docker service logs <service>`