# Home Lab Improvements - Complete Implementation This repository contains all the configurations, scripts, and documentation for comprehensive homelab improvements. ## 📋 Overview A complete implementation plan for upgrading a home lab infrastructure with focus on: - Network performance and segmentation - Storage redundancy and performance - Service resilience and high availability - Security hardening - Comprehensive monitoring - Automated backups ## 🗂️ Repository Structure ``` /workspace/homelab/ ├── docs/ │ └── guides/ │ ├── Homelab.md # Main homelab configuration │ ├── DEPLOYMENT_GUIDE.md # Step-by-step deployment instructions │ ├── NAS_Mount_Guide.md # NAS mounting procedures │ ├── health_checks.md # Health check configurations │ └── DNS_SETUP.md # Cloudflare & Pi-hole DNS Configuration ├── scripts/ │ ├── zfs_setup.sh # ZFS pool creation │ ├── prune_ai_models.sh # AI model cache cleanup │ ├── install_fail2ban.sh # Security installation │ ├── vlan_firewall.sh # VLAN/firewall configuration │ ├── setup_monitoring.sh # Monitoring deployment │ ├── backup_daily.sh # Restic backup script │ ├── install_restic_backup.sh # Backup system installation │ ├── deploy_all.sh # Master deployment orchestrator │ ├── validate_deployment.sh # Deployment validation │ ├── network_performance_test.sh # Network speed testing │ ├── setup_log_rotation.sh # Log rotation config │ └── quick_status.sh # Quick health dashboard ├── services/ │ ├── swarm/ │ │ ├── traefik/ │ │ │ └── stack.yml # Traefik HA configuration │ │ └── stacks/ │ │ └── node-exporter-stack.yml │ └── standalone/ │ └── Caddy/ │ ├── docker-compose.yml # Fallback proxy │ ├── Caddyfile # Caddy configuration │ └── maintenance.html # Maintenance page ├── security/ │ └── fail2ban/ │ ├── jail.local # Jail configuration │ └── filter.d/ # Custom filters ├── monitoring/ │ └── grafana/ │ └── alert_rules.yml # Alert definitions └── systemd/ ├── restic-backup.service # Backup service └── restic-backup.timer # Backup schedule ``` ## 🤖 Automation Tools ### Master Deployment Script ```bash # Deploy all improvements with guided prompts sudo bash /workspace/homelab/scripts/deploy_all.sh ``` ### Quick Status Dashboard ```bash # Get instant overview of homelab health bash /workspace/homelab/scripts/quick_status.sh ``` ### Validation & Testing ```bash # Validate deployment bash /workspace/homelab/scripts/validate_deployment.sh # Test network performance bash /workspace/homelab/scripts/network_performance_test.sh ``` ### Log Management ```bash # Setup automatic log rotation sudo bash /workspace/homelab/scripts/setup_log_rotation.sh ``` --- ## 🚀 Quick Start 1. **Review the main configuration**: ```bash cat /workspace/homelab/docs/guides/Homelab.md ``` 2. **Follow the deployment guide**: ```bash cat /workspace/homelab/docs/guides/DEPLOYMENT_GUIDE.md ``` 3. **Make scripts executable**: ```bash chmod +x /workspace/homelab/scripts/*.sh ``` ## 📦 Components ### Network Improvements - **2.5 Gb PoE managed switch** (Netgear GS110EMX recommended) - **VLAN segmentation** (Management VLAN 10, Services VLAN 20) - **LACP bonding** on Ryzen node for 5 Gb aggregated bandwidth ### Storage Enhancements - **ZFS pool** on Proxmox host with compression and snapshots - **Dedicated NAS** with RAID-6 and SSD cache - **Automated pruning** of AI model caches ### Service Resilience - **Traefik HA**: 2 replicas in Docker Swarm - **Caddy fallback**: Lightweight backup reverse proxy - **Health checks**: Auto-restart for critical services - **Volume separation**: Performance-optimized storage ### Security Hardening - **fail2ban**: Protection for SSH, Portainer, Traefik - **VLAN firewall rules**: Inter-VLAN traffic control - **VPN-only access**: Portainer restricted to Tailscale - **2FA/OAuth**: Enhanced authentication ### Monitoring & Automation - **node-exporter**: System metrics on all nodes - **Grafana alerts**: CPU, RAM, disk, uptime monitoring - **Home Assistant backups**: Automated to NAS - **Tailscale metrics**: VPN health monitoring ### Backup Strategy - **Restic**: Encrypted backups to Backblaze B2 - **Daily schedule**: Systemd timer at 02:00 AM - **Retention policy**: 7 daily, 4 weekly, 12 monthly - **Auto-pruning**: Keeps repository clean ## 🔧 Installation Order Follow this sequence to minimize downtime: 1. **Network Upgrade** (requires brief downtime) - Install new switch - Configure VLANs - Setup LACP bonding 2. **Storage Enhancements** - Create ZFS pool - Mount NAS shares - Setup pruning cron 3. **Service Consolidation** - Deploy Traefik Swarm service - Deploy Caddy fallback - Add health checks 4. **Security Hardening** - Install fail2ban - Configure firewall rules - Restrict Portainer access 5. **Monitoring & Automation** - Deploy node-exporter - Configure Grafana alerts - Setup Home Assistant backups 6. **Backup Strategy** - Install restic - Configure B2 repository - Enable systemd timer ## ✅ Verification After deployment, verify each component: ```bash # Network ethtool eth0 | grep Speed ip -d link show # Storage zpool status tank df -h | grep /mnt/nas # Services docker service ls docker ps --filter "health=healthy" # Security sudo fail2ban-client status sudo iptables -L -n -v # Monitoring curl http://192.168.1.196:9100/metrics # Backups sudo systemctl status restic-backup.timer ``` ## 🛡️ Security Notes - Update all placeholder credentials in scripts - Store B2 credentials securely (consider using secrets management) - Review firewall rules before applying - Test fail2ban rules to avoid lockouts - Keep backup encryption password safe ## 📊 Monitoring Access - **Grafana**: http://192.168.1.196:3000 - **Portainer**: http://192.168.1.196:9000 (VPN only) - **Prometheus**: http://192.168.1.196:9090 - **node-exporter**: http://:9100/metrics ## 🔄 Maintenance ### Daily - Automated restic backups at 02:00 AM - AI model cache pruning at 03:00 AM - fail2ban monitoring ### Weekly - Review Grafana alerts - Check backup snapshots - Monitor disk usage ### Monthly - Restic repository integrity check (auto on 1st) - Review security logs - Update Docker images ## 🆘 Disaster Recovery Comprehensive disaster recovery procedures are documented in: - [DISASTER_RECOVERY.md](/workspace/homelab/docs/guides/DISASTER_RECOVERY.md) Quick recovery for common scenarios: - **Node failure**: Services auto-reschedule to healthy nodes - **Manager down**: Promote worker to manager - **Storage failure**: Restore from restic backups - **Complete disaster**: Full rebuild from B2 backups (~2 hours) ### Emergency Backup Restore ```bash # Install restic sudo apt-get install restic # Configure and restore export RESTIC_REPOSITORY="b2:bucket:/backups" export RESTIC_PASSWORD="your_password" restic restore latest --target /tmp/restore ``` --- ## 🆘 Troubleshooting Common issues and solutions are documented in: - [DEPLOYMENT_GUIDE.md](/workspace/homelab/docs/guides/DEPLOYMENT_GUIDE.md) - Rollback procedures - [NAS_Mount_Guide.md](/workspace/homelab/docs/guides/NAS_Mount_Guide.md) - Mount issues - Individual script comments - Script-specific troubleshooting ## 📝 License This is a personal homelab configuration. Use and modify as needed for your own setup. ## 🙏 Acknowledgments Based on best practices from: - Docker Swarm documentation - Traefik documentation - Restic backup documentation - Home Assistant community - r/homelab community --- **Last Updated**: 2025-11-21 **Configuration Version**: 2.0