Initial commit: homelab configuration and documentation

This commit is contained in:
2025-11-29 19:03:14 +00:00
commit 0769ca6888
72 changed files with 7806 additions and 0 deletions

286
README.md Normal file
View File

@@ -0,0 +1,286 @@
# Home Lab Improvements - Complete Implementation
This repository contains all the configurations, scripts, and documentation for comprehensive homelab improvements.
## 📋 Overview
A complete implementation plan for upgrading a home lab infrastructure with focus on:
- Network performance and segmentation
- Storage redundancy and performance
- Service resilience and high availability
- Security hardening
- Comprehensive monitoring
- Automated backups
## 🗂️ Repository Structure
```
/workspace/homelab/
├── docs/
│ └── guides/
│ ├── Homelab.md # Main homelab configuration
│ ├── DEPLOYMENT_GUIDE.md # Step-by-step deployment instructions
│ ├── NAS_Mount_Guide.md # NAS mounting procedures
│ └── health_checks.md # Health check configurations
├── scripts/
│ ├── zfs_setup.sh # ZFS pool creation
│ ├── prune_ai_models.sh # AI model cache cleanup
│ ├── install_fail2ban.sh # Security installation
│ ├── vlan_firewall.sh # VLAN/firewall configuration
│ ├── setup_monitoring.sh # Monitoring deployment
│ ├── backup_daily.sh # Restic backup script
│ ├── install_restic_backup.sh # Backup system installation
│ ├── deploy_all.sh # Master deployment orchestrator
│ ├── validate_deployment.sh # Deployment validation
│ ├── network_performance_test.sh # Network speed testing
│ ├── setup_log_rotation.sh # Log rotation config
│ └── quick_status.sh # Quick health dashboard
├── services/
│ ├── swarm/
│ │ ├── traefik/
│ │ │ └── stack.yml # Traefik HA configuration
│ │ └── stacks/
│ │ └── node-exporter-stack.yml
│ └── standalone/
│ └── Caddy/
│ ├── docker-compose.yml # Fallback proxy
│ ├── Caddyfile # Caddy configuration
│ └── maintenance.html # Maintenance page
├── security/
│ └── fail2ban/
│ ├── jail.local # Jail configuration
│ └── filter.d/ # Custom filters
├── monitoring/
│ └── grafana/
│ └── alert_rules.yml # Alert definitions
└── systemd/
├── restic-backup.service # Backup service
└── restic-backup.timer # Backup schedule
```
## 🤖 Automation Tools
### Master Deployment Script
```bash
# Deploy all improvements with guided prompts
sudo bash /workspace/homelab/scripts/deploy_all.sh
```
### Quick Status Dashboard
```bash
# Get instant overview of homelab health
bash /workspace/homelab/scripts/quick_status.sh
```
### Validation & Testing
```bash
# Validate deployment
bash /workspace/homelab/scripts/validate_deployment.sh
# Test network performance
bash /workspace/homelab/scripts/network_performance_test.sh
```
### Log Management
```bash
# Setup automatic log rotation
sudo bash /workspace/homelab/scripts/setup_log_rotation.sh
```
---
## 🚀 Quick Start
1. **Review the main configuration**:
```bash
cat /workspace/homelab/docs/guides/Homelab.md
```
2. **Follow the deployment guide**:
```bash
cat /workspace/homelab/docs/guides/DEPLOYMENT_GUIDE.md
```
3. **Make scripts executable**:
```bash
chmod +x /workspace/homelab/scripts/*.sh
```
## 📦 Components
### Network Improvements
- **2.5 Gb PoE managed switch** (Netgear GS110EMX recommended)
- **VLAN segmentation** (Management VLAN 10, Services VLAN 20)
- **LACP bonding** on Ryzen node for 5 Gb aggregated bandwidth
### Storage Enhancements
- **ZFS pool** on Proxmox host with compression and snapshots
- **Dedicated NAS** with RAID-6 and SSD cache
- **Automated pruning** of AI model caches
### Service Resilience
- **Traefik HA**: 2 replicas in Docker Swarm
- **Caddy fallback**: Lightweight backup reverse proxy
- **Health checks**: Auto-restart for critical services
- **Volume separation**: Performance-optimized storage
### Security Hardening
- **fail2ban**: Protection for SSH, Portainer, Traefik
- **VLAN firewall rules**: Inter-VLAN traffic control
- **VPN-only access**: Portainer restricted to Tailscale
- **2FA/OAuth**: Enhanced authentication
### Monitoring & Automation
- **node-exporter**: System metrics on all nodes
- **Grafana alerts**: CPU, RAM, disk, uptime monitoring
- **Home Assistant backups**: Automated to NAS
- **Tailscale metrics**: VPN health monitoring
### Backup Strategy
- **Restic**: Encrypted backups to Backblaze B2
- **Daily schedule**: Systemd timer at 02:00 AM
- **Retention policy**: 7 daily, 4 weekly, 12 monthly
- **Auto-pruning**: Keeps repository clean
## 🔧 Installation Order
Follow this sequence to minimize downtime:
1. **Network Upgrade** (requires brief downtime)
- Install new switch
- Configure VLANs
- Setup LACP bonding
2. **Storage Enhancements**
- Create ZFS pool
- Mount NAS shares
- Setup pruning cron
3. **Service Consolidation**
- Deploy Traefik Swarm service
- Deploy Caddy fallback
- Add health checks
4. **Security Hardening**
- Install fail2ban
- Configure firewall rules
- Restrict Portainer access
5. **Monitoring & Automation**
- Deploy node-exporter
- Configure Grafana alerts
- Setup Home Assistant backups
6. **Backup Strategy**
- Install restic
- Configure B2 repository
- Enable systemd timer
## ✅ Verification
After deployment, verify each component:
```bash
# Network
ethtool eth0 | grep Speed
ip -d link show
# Storage
zpool status tank
df -h | grep /mnt/nas
# Services
docker service ls
docker ps --filter "health=healthy"
# Security
sudo fail2ban-client status
sudo iptables -L -n -v
# Monitoring
curl http://192.168.1.196:9100/metrics
# Backups
sudo systemctl status restic-backup.timer
```
## 🛡️ Security Notes
- Update all placeholder credentials in scripts
- Store B2 credentials securely (consider using secrets management)
- Review firewall rules before applying
- Test fail2ban rules to avoid lockouts
- Keep backup encryption password safe
## 📊 Monitoring Access
- **Grafana**: http://192.168.1.196:3000
- **Portainer**: http://192.168.1.196:9000 (VPN only)
- **Prometheus**: http://192.168.1.196:9090
- **node-exporter**: http://<node-ip>:9100/metrics
## 🔄 Maintenance
### Daily
- Automated restic backups at 02:00 AM
- AI model cache pruning at 03:00 AM
- fail2ban monitoring
### Weekly
- Review Grafana alerts
- Check backup snapshots
- Monitor disk usage
### Monthly
- Restic repository integrity check (auto on 1st)
- Review security logs
- Update Docker images
## 🆘 Disaster Recovery
Comprehensive disaster recovery procedures are documented in:
- [DISASTER_RECOVERY.md](/workspace/homelab/docs/guides/DISASTER_RECOVERY.md)
Quick recovery for common scenarios:
- **Node failure**: Services auto-reschedule to healthy nodes
- **Manager down**: Promote worker to manager
- **Storage failure**: Restore from restic backups
- **Complete disaster**: Full rebuild from B2 backups (~2 hours)
### Emergency Backup Restore
```bash
# Install restic
sudo apt-get install restic
# Configure and restore
export RESTIC_REPOSITORY="b2:bucket:/backups"
export RESTIC_PASSWORD="your_password"
restic restore latest --target /tmp/restore
```
---
## 🆘 Troubleshooting
Common issues and solutions are documented in:
- [DEPLOYMENT_GUIDE.md](/workspace/homelab/docs/guides/DEPLOYMENT_GUIDE.md) - Rollback procedures
- [NAS_Mount_Guide.md](/workspace/homelab/docs/guides/NAS_Mount_Guide.md) - Mount issues
- Individual script comments - Script-specific troubleshooting
## 📝 License
This is a personal homelab configuration. Use and modify as needed for your own setup.
## 🙏 Acknowledgments
Based on best practices from:
- Docker Swarm documentation
- Traefik documentation
- Restic backup documentation
- Home Assistant community
- r/homelab community
---
**Last Updated**: 2025-11-21
**Configuration Version**: 2.0