# Home Lab Improvements - Complete Implementation

This repository contains all the configurations, scripts, and documentation for comprehensive homelab improvements.

## 📋 Overview

A complete implementation plan for upgrading home lab infrastructure, with a focus on:

- Network performance and segmentation
- Storage redundancy and performance
- Service resilience and high availability
- Security hardening
- Comprehensive monitoring
- Automated backups

## 💡 Homelab Improvement Guide

For recommendations on improving the efficiency, reliability, and security of your homelab, see the [Homelab Improvement Guide](./docs/guides/IMPROVEMENT_GUIDE.md).

## 🗂️ Repository Structure

```
/workspace/homelab/
├── docs/
│   └── guides/
│       ├── Homelab.md                    # Main homelab configuration
│       ├── DEPLOYMENT_GUIDE.md           # Step-by-step deployment instructions
│       ├── NAS_Mount_Guide.md            # NAS mounting procedures
│       ├── health_checks.md              # Health check configurations
│       └── DNS_SETUP.md                  # Cloudflare & Pi-hole DNS configuration
├── scripts/
│   ├── zfs_setup.sh                      # ZFS pool creation
│   ├── prune_ai_models.sh                # AI model cache cleanup
│   ├── install_fail2ban.sh               # Security installation
│   ├── vlan_firewall.sh                  # VLAN/firewall configuration
│   ├── setup_monitoring.sh               # Monitoring deployment
│   ├── backup_daily.sh                   # Restic backup script
│   ├── install_restic_backup.sh          # Backup system installation
│   ├── deploy_all.sh                     # Master deployment orchestrator
│   ├── validate_deployment.sh            # Deployment validation
│   ├── network_performance_test.sh       # Network speed testing
│   ├── setup_log_rotation.sh             # Log rotation config
│   └── quick_status.sh                   # Quick health dashboard
├── services/
│   ├── swarm/
│   │   ├── traefik/
│   │   │   └── stack.yml                 # Traefik HA configuration
│   │   └── stacks/
│   │       └── node-exporter-stack.yml
│   └── standalone/
│       └── Caddy/
│           ├── docker-compose.yml        # Fallback proxy
│           ├── Caddyfile                 # Caddy configuration
│           └── maintenance.html          # Maintenance page
├── security/
│   └── fail2ban/
│       ├── jail.local                    # Jail configuration
│       └── filter.d/                     # Custom filters
├── monitoring/
│   └── grafana/
│       └── alert_rules.yml               # Alert definitions
└── systemd/
    ├── restic-backup.service             # Backup service
    └── restic-backup.timer               # Backup schedule
```

## 🤖 Automation Tools

### Master Deployment Script
```bash
# Deploy all improvements with guided prompts
sudo bash /workspace/homelab/scripts/deploy_all.sh
```

### Quick Status Dashboard
```bash
# Get instant overview of homelab health
bash /workspace/homelab/scripts/quick_status.sh
```

### Validation & Testing
```bash
# Validate deployment
bash /workspace/homelab/scripts/validate_deployment.sh

# Test network performance
bash /workspace/homelab/scripts/network_performance_test.sh
```

### Log Management
```bash
# Set up automatic log rotation
sudo bash /workspace/homelab/scripts/setup_log_rotation.sh
```

---

## 🚀 Quick Start

1. **Review the main configuration**:
   ```bash
   cat /workspace/homelab/docs/guides/Homelab.md
   ```

2. **Follow the deployment guide**:
   ```bash
   cat /workspace/homelab/docs/guides/DEPLOYMENT_GUIDE.md
   ```

3. **Make scripts executable**:
   ```bash
   chmod +x /workspace/homelab/scripts/*.sh
   ```

## 📦 Components

### Network Improvements
- **2.5 GbE PoE managed switch** (Netgear GS110EMX recommended)
- **VLAN segmentation** (Management VLAN 10, Services VLAN 20)
- **LACP bonding** on the Ryzen node for up to 5 Gb/s of aggregate bandwidth (2 × 2.5 GbE; any single connection is still limited to one link's 2.5 Gb/s)

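Once the bond is up, the Linux bonding driver reports its negotiated mode under `/proc/net/bonding/`. The sketch below is one way to confirm that LACP (802.3ad) is active and to count the member links. The interface name `bond0` and both helper names are assumptions for illustration; they are not taken from the repository scripts.

```bash
# Hypothetical helpers; "bond0" is an assumed interface name.
BOND_STATUS_FILE="${BOND_STATUS_FILE:-/proc/net/bonding/bond0}"

# Succeeds if the bonding driver reports 802.3ad (LACP) mode.
bond_is_lacp() {
    grep -q '^Bonding Mode: IEEE 802.3ad' "${1:-$BOND_STATUS_FILE}"
}

# Prints the number of member (slave) interfaces in the bond.
bond_member_count() {
    grep -c '^Slave Interface:' "${1:-$BOND_STATUS_FILE}"
}

# Report the state only if the status file exists on this host.
if [ -r "$BOND_STATUS_FILE" ]; then
    if bond_is_lacp; then
        echo "LACP active with $(bond_member_count) member link(s)"
    else
        echo "Bond is not in 802.3ad mode"
    fi
fi
```

The switch ports must be configured as a matching LACP group; otherwise the bond may come up with only one active member.
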
### Storage Enhancements
- **ZFS pool** on Proxmox host with compression and snapshots
- **Dedicated NAS** with RAID-6 and SSD cache
- **Automated pruning** of AI model caches

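As a rough illustration of what age-based cache pruning can look like, the sketch below deletes files that have not been modified for a given number of days. The cache path, the 30-day threshold, and the function name are all assumptions; the repository's `prune_ai_models.sh` may use different criteria.

```bash
# Hypothetical sketch; the cache path and age threshold are assumptions.
MODEL_CACHE_DIR="${MODEL_CACHE_DIR:-/var/cache/ai-models}"
MAX_AGE_DAYS="${MAX_AGE_DAYS:-30}"

# Deletes regular files under DIR that were last modified more than DAYS days ago.
prune_model_cache() {
    dir="$1"
    days="$2"
    # Nothing to do if the cache directory does not exist.
    [ -d "$dir" ] || return 0
    find "$dir" -type f -mtime +"$days" -exec rm -f {} +
}

prune_model_cache "$MODEL_CACHE_DIR" "$MAX_AGE_DAYS"
```

Replacing `-exec rm -f {} +` with `-print` gives a dry run that only lists what would be removed.
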
### Service Resilience
- **Traefik HA**: 2 replicas in Docker Swarm
- **Caddy fallback**: Lightweight backup reverse proxy
- **Health checks**: Auto-restart for critical services
- **Volume separation**: Performance-optimized storage

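To see at a glance whether every Swarm service is running its desired number of replicas (for example, 2/2 for Traefik), the `REPLICAS` column of `docker service ls` can be compared programmatically. The helper name `replicas_ok` is hypothetical, and the sketch assumes it runs on a Swarm manager node.

```bash
# Hypothetical helper: succeeds when a Swarm REPLICAS value such as "2/2"
# shows that the running count equals the desired count.
replicas_ok() {
    # Keep only the "running/desired" token; Docker may append notes
    # such as "(max 1 per node)".
    counts="${1%% *}"
    running="${counts%%/*}"
    desired="${counts##*/}"
    [ -n "$running" ] && [ "$running" = "$desired" ]
}

# Lists services whose running count differs from the desired count.
# Skipped when the docker CLI is not installed.
if command -v docker >/dev/null 2>&1; then
    docker service ls --format '{{.Name}} {{.Replicas}}' 2>/dev/null |
    while read -r name replicas; do
        replicas_ok "$replicas" || echo "DEGRADED: $name ($replicas)"
    done
fi
```
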
### Security Hardening
- **fail2ban**: Protection for SSH, Portainer, Traefik
- **VLAN firewall rules**: Inter-VLAN traffic control
- **VPN-only access**: Portainer restricted to Tailscale
- **2FA/OAuth**: Enhanced authentication

### Monitoring & Automation
- **node-exporter**: System metrics on all nodes
- **Grafana alerts**: CPU, RAM, disk, uptime monitoring
- **Home Assistant backups**: Automated to NAS
- **Tailscale metrics**: VPN health monitoring

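node-exporter serves metrics in the Prometheus text format, so a single value can be spot-checked from the shell without Grafana. The sketch below extracts one unlabelled metric such as `node_load1`; the helper name `metric_value` and the `NODE_URL` variable are assumptions for illustration.

```bash
# Hypothetical helper: prints the value of an unlabelled metric read from
# Prometheus text-format input on stdin.
metric_value() {
    awk -v name="$1" '$1 == name { print $2; exit }'
}

# NODE_URL is an assumed variable, e.g. http://<node-ip>:9100
# Nothing is fetched unless it is set.
if [ -n "${NODE_URL:-}" ]; then
    curl -fsS "$NODE_URL/metrics" | metric_value node_load1
fi
```

Metrics that carry labels (for example per-CPU or per-filesystem series) need a different match, since the label set is part of the first field.
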
### Backup Strategy
- **Restic**: Encrypted backups to Backblaze B2
- **Daily schedule**: systemd timer at 02:00
- **Retention policy**: 7 daily, 4 weekly, 12 monthly
- **Auto-pruning**: Removes snapshots outside the retention policy to keep the repository clean

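The retention policy above maps directly onto restic's `forget` flags. The following is a minimal sketch, not the contents of `backup_daily.sh`; the function names are hypothetical, and it assumes `RESTIC_REPOSITORY` and credentials are already set in the environment.

```bash
# Retention values from the policy above; helper names are hypothetical.
KEEP_DAILY=7
KEEP_WEEKLY=4
KEEP_MONTHLY=12

# Prints the restic arguments that express the retention policy.
retention_args() {
    echo "--keep-daily $KEEP_DAILY --keep-weekly $KEEP_WEEKLY --keep-monthly $KEEP_MONTHLY"
}

# Shows which snapshots would be removed without deleting anything.
preview_retention() {
    # Word splitting of the retention_args output is intentional here.
    restic forget --dry-run $(retention_args)
}
```

Running `restic forget` with the same arguments plus `--prune`, and without `--dry-run`, removes the snapshots and reclaims the space they used.
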
## 🔧 Installation Order

Follow this sequence to minimize downtime:

1. **Network Upgrade** (requires brief downtime)
   - Install new switch
   - Configure VLANs
   - Set up LACP bonding

2. **Storage Enhancements**
   - Create ZFS pool
   - Mount NAS shares
   - Set up pruning cron

3. **Service Consolidation**
   - Deploy Traefik Swarm service
   - Deploy Caddy fallback
   - Add health checks

4. **Security Hardening**
   - Install fail2ban
   - Configure firewall rules
   - Restrict Portainer access

5. **Monitoring & Automation**
   - Deploy node-exporter
   - Configure Grafana alerts
   - Set up Home Assistant backups

6. **Backup Strategy**
   - Install restic
   - Configure B2 repository
   - Enable systemd timer

## ✅ Verification

After deployment, verify each component:

```bash
# Network (replace eth0 with your interface name)
ethtool eth0 | grep Speed
ip -d link show

# Storage
zpool status tank
df -h | grep /mnt/nas

# Services
docker service ls
docker ps --filter "health=healthy"

# Security
sudo fail2ban-client status
sudo iptables -L -n -v

# Monitoring (show only the first lines of the metrics output)
curl -fsS http://192.168.1.196:9100/metrics | head

# Backups
sudo systemctl status restic-backup.timer
```

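When these checks are run often, a small wrapper that reduces each command to a PASS or FAIL line can make the output easier to scan. This is a hypothetical sketch and not part of `validate_deployment.sh`; the `check` function and the `FAILED` counter are illustrative names.

```bash
# Hypothetical wrapper: runs a command quietly and prints PASS or FAIL.
FAILED=0
check() {
    label="$1"
    shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS  $label"
    else
        echo "FAIL  $label"
        FAILED=$((FAILED + 1))
    fi
}

# Usage, with commands from the list above:
#   check "ZFS pool"     zpool status tank
#   check "Backup timer" systemctl is-active restic-backup.timer
#   [ "$FAILED" -eq 0 ] || echo "$FAILED check(s) failed"
```
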
## 🛡️ Security Notes

- Update all placeholder credentials in scripts
- Store B2 credentials securely (consider using secrets management)
- Review firewall rules before applying
- Test fail2ban rules to avoid lockouts
- Keep the backup encryption password safe; restic backups cannot be decrypted without it

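One simple way to keep B2 credentials out of the scripts themselves is an environment file that only its owner can read. The sketch below assumes a hypothetical helper name and placeholder values; `B2_ACCOUNT_ID`, `B2_ACCOUNT_KEY`, `RESTIC_REPOSITORY`, and `RESTIC_PASSWORD_FILE` are environment variables that restic reads, while the `/etc/restic/` paths are assumptions.

```bash
# Hypothetical helper: writes restic/B2 settings to a file readable only by
# its owner. All values below are placeholders.
write_restic_env() {
    target="$1"
    (
        umask 077   # a newly created file gets mode 0600
        cat > "$target" <<'EOF'
B2_ACCOUNT_ID=your_key_id
B2_ACCOUNT_KEY=your_application_key
RESTIC_REPOSITORY=b2:bucket:/backups
RESTIC_PASSWORD_FILE=/etc/restic/password
EOF
    )
    chmod 600 "$target"   # also tighten a file that already existed
}

# Example: write_restic_env /etc/restic/restic.env
```

A systemd service can load such a file with an `EnvironmentFile=` line, so the credentials never appear in the unit file or on the command line.
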
## 📊 Monitoring Access

- **Grafana**: http://192.168.1.196:3000
- **Portainer**: http://192.168.1.196:9000 (VPN only)
- **Prometheus**: http://192.168.1.196:9090
- **node-exporter**: `http://<node-ip>:9100/metrics`

## 🔄 Maintenance

### Daily
- Automated restic backups at 02:00
- AI model cache pruning at 03:00
- fail2ban monitoring

### Weekly
- Review Grafana alerts
- Check backup snapshots
- Monitor disk usage

### Monthly
- Restic repository integrity check (runs automatically on the 1st)
- Review security logs
- Update Docker images

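The monthly integrity check can be gated on the day of the month from within a daily job. The following is a minimal sketch under that assumption; the function names are hypothetical, and the repository's own scripts may schedule the check differently (for example with a separate timer).

```bash
# Hypothetical helper: succeeds when the given day-of-month (default: today)
# is the 1st.
is_first_of_month() {
    [ "${1:-$(date +%d)}" = "01" ]
}

# Intended to run after the daily backup; expects RESTIC_REPOSITORY and
# credentials to be set in the environment.
monthly_integrity_check() {
    if is_first_of_month; then
        restic check
    fi
}
```
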
## 🆘 Disaster Recovery

Comprehensive disaster recovery procedures are documented in:
- [DISASTER_RECOVERY.md](./docs/guides/DISASTER_RECOVERY.md)

Quick recovery for common scenarios:
- **Node failure**: Services auto-reschedule to healthy nodes
- **Manager down**: Promote worker to manager
- **Storage failure**: Restore from restic backups
- **Complete disaster**: Full rebuild from B2 backups (~2 hours)

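Whether a lost manager is an emergency depends on how many managers remain, because Swarm needs a majority of managers to keep its Raft quorum. The sketch below computes the tolerated failures for a given manager count and lists the relevant Docker commands as comments; the function name is hypothetical and `<node>` is a placeholder.

```bash
# Prints how many manager failures a Swarm with N managers can tolerate
# while keeping Raft quorum: (N - 1) / 2, rounded down.
manager_fault_tolerance() {
    echo $(( ($1 - 1) / 2 ))
}

# Recovery commands (run on a healthy manager; <node> is a placeholder):
#   docker node ls                 # find the failed manager
#   docker node promote <node>     # promote a worker to manager
#   docker node demote <node>      # demote the failed manager once replaced
```

For example, `manager_fault_tolerance 3` prints `1`. If quorum has already been lost, promotion is not possible; the cluster then has to be recovered from a surviving manager with `docker swarm init --force-new-cluster`.
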
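### Emergency Backup Restore
```bash
# Install restic
sudo apt-get install -y restic

# Configure repository access (replace the placeholder values)
export B2_ACCOUNT_ID="your_key_id"
export B2_ACCOUNT_KEY="your_application_key"
export RESTIC_REPOSITORY="b2:bucket:/backups"
export RESTIC_PASSWORD="your_password"

# List available snapshots, then restore the most recent one
restic snapshots
restic restore latest --target /tmp/restore
```

---
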
## 🆘 Troubleshooting

Common issues and solutions are documented in:
- [DEPLOYMENT_GUIDE.md](./docs/guides/DEPLOYMENT_GUIDE.md) - Rollback procedures
- [NAS_Mount_Guide.md](./docs/guides/NAS_Mount_Guide.md) - Mount issues
- Individual script comments - Script-specific troubleshooting

## 📝 License

This is a personal homelab configuration. Use and modify as needed for your own setup.

## 🙏 Acknowledgments

Based on best practices from:
- Docker Swarm documentation
- Traefik documentation
- Restic backup documentation
- Home Assistant community
- r/homelab community

---

**Last Updated**: 2025-11-21
**Configuration Version**: 2.0