# Home Lab Improvements - Complete Implementation

This repository contains the configurations, scripts, and documentation for a comprehensive set of homelab improvements.

## 📋 Overview

A complete implementation plan for upgrading a home lab infrastructure, with a focus on:

- Network performance and segmentation
- Storage redundancy and performance
- Service resilience and high availability
- Security hardening
- Comprehensive monitoring
- Automated backups

## 🗂️ Repository Structure

```
/workspace/homelab/
├── docs/
│   └── guides/
│       ├── Homelab.md                  # Main homelab configuration
│       ├── DEPLOYMENT_GUIDE.md         # Step-by-step deployment instructions
│       ├── NAS_Mount_Guide.md          # NAS mounting procedures
│       └── health_checks.md            # Health check configurations
├── scripts/
│   ├── zfs_setup.sh                    # ZFS pool creation
│   ├── prune_ai_models.sh              # AI model cache cleanup
│   ├── install_fail2ban.sh             # Security installation
│   ├── vlan_firewall.sh                # VLAN/firewall configuration
│   ├── setup_monitoring.sh             # Monitoring deployment
│   ├── backup_daily.sh                 # Restic backup script
│   ├── install_restic_backup.sh        # Backup system installation
│   ├── deploy_all.sh                   # Master deployment orchestrator
│   ├── validate_deployment.sh          # Deployment validation
│   ├── network_performance_test.sh     # Network speed testing
│   ├── setup_log_rotation.sh           # Log rotation config
│   └── quick_status.sh                 # Quick health dashboard
├── services/
│   ├── swarm/
│   │   ├── traefik/
│   │   │   └── stack.yml               # Traefik HA configuration
│   │   └── stacks/
│   │       └── node-exporter-stack.yml
│   └── standalone/
│       └── Caddy/
│           ├── docker-compose.yml      # Fallback proxy
│           ├── Caddyfile               # Caddy configuration
│           └── maintenance.html        # Maintenance page
├── security/
│   └── fail2ban/
│       ├── jail.local                  # Jail configuration
│       └── filter.d/                   # Custom filters
├── monitoring/
│   └── grafana/
│       └── alert_rules.yml             # Alert definitions
└── systemd/
    ├── restic-backup.service           # Backup service
    └── restic-backup.timer             # Backup schedule
```

## 🤖 Automation Tools

### Master Deployment Script

```bash
# Deploy all improvements with guided prompts
sudo bash /workspace/homelab/scripts/deploy_all.sh
```

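
`deploy_all.sh` is not reproduced in this README. As a hypothetical sketch of the "guided prompts" idea, each deployment step can be gated behind a yes/no question that defaults to "no". The helpers `confirm` and `run_step` are illustrative names, not functions taken from the real script.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a guided, step-by-step orchestrator.
# confirm and run_step are illustrative names, not the deploy_all.sh API.

# Ask a yes/no question on stdin. Anything other than y/yes (any case),
# including end of input, counts as "no".
confirm() {
  local prompt="$1" reply
  local re='^[Yy]([Ee][Ss])?$'
  read -r -p "$prompt [y/N] " reply || return 1
  [[ "$reply" =~ $re ]]
}

# Run the given command only if the operator confirms the step.
run_step() {
  local name="$1"; shift
  if confirm "Run step: $name?"; then
    "$@"
  else
    echo "Skipped: $name"
  fi
}
```

For example, `run_step "ZFS setup" sudo bash /workspace/homelab/scripts/zfs_setup.sh` asks before touching any disks, which keeps a long deployment interruptible between steps.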
### Quick Status Dashboard

```bash
# Get an instant overview of homelab health
bash /workspace/homelab/scripts/quick_status.sh
```

### Validation & Testing

```bash
# Validate deployment
bash /workspace/homelab/scripts/validate_deployment.sh

# Test network performance
bash /workspace/homelab/scripts/network_performance_test.sh
```

### Log Management

```bash
# Set up automatic log rotation
sudo bash /workspace/homelab/scripts/setup_log_rotation.sh
```

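
A hypothetical sketch of the kind of logrotate policy a log rotation setup installs. The log path `/var/log/homelab/*.log` and the 14-rotation default are assumptions, not values taken from `setup_log_rotation.sh`; the directives themselves are standard logrotate options.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: render a logrotate policy for homelab script logs.
# The log glob and retention count are assumptions, not values taken
# from setup_log_rotation.sh.

render_logrotate() {
  local log_glob="${1:-/var/log/homelab/*.log}" keep="${2:-14}"
  # Heredocs expand variables but do not glob, so the pattern stays literal.
  cat <<EOF
$log_glob {
    daily
    rotate $keep
    compress
    delaycompress
    missingok
    notifempty
}
EOF
}
```

Write it with `render_logrotate | sudo tee /etc/logrotate.d/homelab`, then dry-run it with `sudo logrotate -d /etc/logrotate.d/homelab`, which reports what would be rotated without changing any files.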

---

## 🚀 Quick Start

1. **Review the main configuration**:

   ```bash
   cat /workspace/homelab/docs/guides/Homelab.md
   ```

2. **Follow the deployment guide**:

   ```bash
   cat /workspace/homelab/docs/guides/DEPLOYMENT_GUIDE.md
   ```

3. **Make scripts executable**:

   ```bash
   chmod +x /workspace/homelab/scripts/*.sh
   ```

## 📦 Components

### Network Improvements

- **2.5 Gb PoE managed switch** (Netgear GS110EMX recommended)
- **VLAN segmentation** (Management VLAN 10, Services VLAN 20)
- **LACP bonding** on the Ryzen node for 5 Gb aggregate bandwidth (2 × 2.5 Gb; LACP hashes each flow onto one link, so any single connection is still limited to 2.5 Gb)

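
A minimal sketch of the bond plus VLAN layout, assuming the Ryzen node uses netplan (as Ubuntu does). The interface names, the `bond0` name, and the addresses are hypothetical placeholders. The matching switch ports must also be configured as an LACP (802.3ad) link aggregation group, or the bond will not negotiate.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: render a netplan config with an 802.3ad (LACP) bond
# and VLANs 10 (Management) and 20 (Services) on top of it.
# NIC names and addresses are placeholders supplied by the caller.

render_netplan_bond() {
  local nic1="$1" nic2="$2" mgmt_addr="$3" svc_addr="$4"
  cat <<EOF
network:
  version: 2
  ethernets:
    $nic1: {}
    $nic2: {}
  bonds:
    bond0:
      interfaces: [$nic1, $nic2]
      parameters:
        mode: 802.3ad
        lacp-rate: fast
        mii-monitor-interval: 100
        transmit-hash-policy: layer3+4
  vlans:
    bond0.10:
      id: 10
      link: bond0
      addresses: [$mgmt_addr]
    bond0.20:
      id: 20
      link: bond0
      addresses: [$svc_addr]
EOF
}
```

For example, `render_netplan_bond enp1s0 enp2s0 192.168.10.5/24 192.168.20.5/24 | sudo tee /etc/netplan/60-bond.yaml`, then apply with `sudo netplan try`, which rolls back automatically unless confirmed. `cat /proc/net/bonding/bond0` shows the negotiated LACP state. Because a single stream stays on one link, measure aggregate throughput with several parallel streams (for example `iperf3 -c <server> -P 4`).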
### Storage Enhancements

- **ZFS pool** on Proxmox host with compression and snapshots
- **Dedicated NAS** with RAID-6 and SSD cache
- **Automated pruning** of AI model caches

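
A hypothetical sketch of AI model cache pruning by age. The cache path and the 30-day threshold are assumptions, not values from `prune_ai_models.sh`. It uses modification time rather than access time because many filesystems are mounted with `relatime` or `noatime`; the trade-off is that a model that is read often but never rewritten still ages out.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of AI model cache pruning. The cache path and the
# 30-day default are assumptions, not values from prune_ai_models.sh.

# Delete regular files under $1 not modified in more than $2 days,
# printing each deleted path, then remove directories left empty.
prune_model_cache() {
  local cache_dir="$1" max_age_days="${2:-30}"
  [[ -d "$cache_dir" ]] || { echo "No such directory: $cache_dir" >&2; return 1; }
  find "$cache_dir" -type f -mtime +"$max_age_days" -print -delete
  # -mindepth 1 keeps the cache directory itself.
  find "$cache_dir" -mindepth 1 -type d -empty -delete
}
```

A crontab entry such as `0 3 * * * /workspace/homelab/scripts/prune_ai_models.sh` matches the 03:00 pruning slot listed under Maintenance.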
### Service Resilience

- **Traefik HA**: 2 replicas in Docker Swarm
- **Caddy fallback**: Lightweight backup reverse proxy
- **Health checks**: Auto-restart for critical services
- **Volume separation**: Performance-optimized storage

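
One caveat on auto-restart: in standalone Docker Engine, a failing `HEALTHCHECK` marks a container `unhealthy` but does not restart it, because restart policies only act when the container exits. Swarm services, by contrast, replace unhealthy tasks. For standalone containers such as the Caddy fallback, a small watchdog can close the gap. This is a hypothetical sketch, not the repository's implementation.

```bash
#!/usr/bin/env bash
# Hypothetical watchdog sketch: restart standalone containers that Docker
# reports as unhealthy. Only containers that define a HEALTHCHECK can
# ever be reported as unhealthy.

restart_unhealthy() {
  local name restarted=0
  while IFS= read -r name; do
    [[ -n "$name" ]] || continue
    echo "Restarting unhealthy container: $name"
    docker restart "$name" >/dev/null && restarted=$((restarted + 1))
  done < <(docker ps --filter "health=unhealthy" --format '{{.Names}}')
  echo "Restarted $restarted container(s)"
}
```

Running it every minute or so (from cron or a systemd timer) is usually enough. Watch the logs, though: a container that keeps failing its health check will be restarted in a loop rather than fixed.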
### Security Hardening

- **fail2ban**: Protection for SSH, Portainer, Traefik
- **VLAN firewall rules**: Inter-VLAN traffic control
- **VPN-only access**: Portainer restricted to Tailscale
- **2FA/OAuth**: Enhanced authentication

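
A minimal sketch of inter-VLAN traffic control, assuming inter-VLAN routing happens on a Linux host that has `bond0.10` (Management) and `bond0.20` (Services) interfaces; if the router or switch does the routing, equivalent rules belong there instead. The interface names and the policy (Management may reach Services, not the reverse) are assumptions. It defaults to a dry run that only prints the commands.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of inter-VLAN FORWARD rules. Interface names are
# assumptions. Prints commands unless DRY_RUN=0 is set explicitly.

ipt() {
  if [[ "${DRY_RUN:-1}" == 1 ]]; then
    echo "iptables $*"
  else
    iptables "$@"
  fi
}

apply_vlan_rules() {
  local mgmt="${1:-bond0.10}" svc="${2:-bond0.20}"
  # Allow replies to connections that were already permitted.
  ipt -A FORWARD -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
  # Management may open connections into Services.
  ipt -A FORWARD -i "$mgmt" -o "$svc" -j ACCEPT
  # Services may not open new connections into Management.
  ipt -A FORWARD -i "$svc" -o "$mgmt" -j DROP
}
```

Review the printed rules first, then run `sudo DRY_RUN=0 bash -c '...'` or equivalent to apply them. Plain iptables rules do not survive a reboot; on Debian-based systems the `iptables-persistent` package saves and restores them.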
### Monitoring & Automation

- **node-exporter**: System metrics on all nodes
- **Grafana alerts**: CPU, RAM, disk, uptime monitoring
- **Home Assistant backups**: Automated to NAS
- **Tailscale metrics**: VPN health monitoring

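
For Grafana to alert on node-exporter data, Prometheus has to scrape every node on port 9100. A hypothetical sketch that renders a static scrape job from a list of node addresses; the job name and 15 s interval are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: render a Prometheus scrape job for node-exporter
# (port 9100) on every node passed as an argument.

render_node_scrape_config() {
  local targets="" host
  for host in "$@"; do
    # Prefix ", " on every target except the first.
    targets+="${targets:+, }'${host}:9100'"
  done
  cat <<EOF
scrape_configs:
  - job_name: node
    scrape_interval: 15s
    static_configs:
      - targets: [${targets}]
EOF
}
```

Merge the output into `prometheus.yml` and validate the result with `promtool check config prometheus.yml` before reloading Prometheus.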
### Backup Strategy

- **Restic**: Encrypted backups to Backblaze B2
- **Daily schedule**: systemd timer at 02:00
- **Retention policy**: 7 daily, 4 weekly, 12 monthly
- **Auto-pruning**: Snapshots outside the retention policy are removed automatically

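
A hypothetical sketch of a daily restic run that applies the retention policy above. It assumes `RESTIC_REPOSITORY`, `RESTIC_PASSWORD` (or `RESTIC_PASSWORD_FILE`), `B2_ACCOUNT_ID`, and `B2_ACCOUNT_KEY` are already set in the environment, and it is not the contents of `backup_daily.sh`. The `daily` tag is an illustrative choice.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a daily restic run with the retention policy
# 7 daily / 4 weekly / 12 monthly. Repository and credentials come from
# environment variables; backup paths are passed as arguments.

daily_backup() {
  (( $# > 0 )) || { echo "usage: daily_backup PATH..." >&2; return 2; }
  # Stop here if the backup itself fails.
  restic backup --tag daily "$@" || return 1
  # Drop snapshots outside the policy and delete data no longer referenced.
  restic forget --keep-daily 7 --keep-weekly 4 --keep-monthly 12 --prune
}
```

For example, `daily_backup /etc /srv` backs up both paths, then prunes. Running `forget` only after a successful backup keeps a failed night from showing up as a pruning run in the logs.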
## 🔧 Installation Order

Follow this sequence to minimize downtime:

1. **Network Upgrade** (requires brief downtime)
   - Install the new switch
   - Configure VLANs
   - Set up LACP bonding

2. **Storage Enhancements**
   - Create the ZFS pool
   - Mount NAS shares
   - Set up the pruning cron job

3. **Service Consolidation**
   - Deploy the Traefik Swarm service
   - Deploy the Caddy fallback
   - Add health checks

4. **Security Hardening**
   - Install fail2ban
   - Configure firewall rules
   - Restrict Portainer access

5. **Monitoring & Automation**
   - Deploy node-exporter
   - Configure Grafana alerts
   - Set up Home Assistant backups

6. **Backup Strategy**
   - Install restic
   - Configure the B2 repository
   - Enable the systemd timer

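
A minimal sketch of a `restic-backup.timer` matching the 02:00 daily schedule. The repository ships its own unit file, which this does not reproduce; `Persistent=true` is an assumption here, and makes systemd run a missed backup as soon as the timer is next activated (for example after the host was powered off at 02:00). By default a timer starts the service with the same name, `restic-backup.service`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: render a systemd timer that fires daily at the given
# HH:MM (default 02:00) and catches up on runs missed while powered off.

render_backup_timer() {
  local at="${1:-02:00}"
  cat <<EOF
[Unit]
Description=Daily restic backup

[Timer]
OnCalendar=*-*-* ${at}:00
Persistent=true

[Install]
WantedBy=timers.target
EOF
}
```

`systemd-analyze calendar '*-*-* 02:00:00'` shows when the expression next elapses. After installing the unit, run `sudo systemctl daemon-reload && sudo systemctl enable --now restic-backup.timer`, then confirm with `systemctl list-timers restic-backup.timer`.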
## ✅ Verification

After deployment, verify each component:

```bash
# Network
ethtool eth0 | grep Speed
ip -d link show

# Storage
zpool status tank
df -h | grep /mnt/nas

# Services
docker service ls
docker ps --filter "health=healthy"

# Security
sudo fail2ban-client status
sudo iptables -L -n -v

# Monitoring
curl http://192.168.1.196:9100/metrics

# Backups
sudo systemctl status restic-backup.timer
```

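
The checks above are meant to be read by eye. A hypothetical sketch of wrapping some of them into PASS/FAIL lines with a non-zero exit status on failure, which suits cron or CI; `check` and `run_homelab_checks` are illustrative names, not part of `validate_deployment.sh`. Run it as root so `fail2ban-client` works.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a pass/fail wrapper around verification commands.
# check() and run_homelab_checks() are illustrative helpers.

FAILURES=0

# Run a command quietly; print PASS/FAIL with a description.
check() {
  local desc="$1"; shift
  if "$@" >/dev/null 2>&1; then
    printf 'PASS  %s\n' "$desc"
  else
    printf 'FAIL  %s\n' "$desc"
    FAILURES=$((FAILURES + 1))
  fi
}

run_homelab_checks() {
  check "node-exporter metrics" curl -fsS http://192.168.1.196:9100/metrics
  check "Swarm services listed" docker service ls
  check "fail2ban running"      fail2ban-client status
  check "backup timer active"   systemctl is-active --quiet restic-backup.timer
  check "NAS mounted"           mountpoint -q /mnt/nas
  echo "$FAILURES check(s) failed"
  (( FAILURES == 0 ))
}
```

`curl -f` turns HTTP errors into a failing exit code, and `systemctl is-active --quiet` and `mountpoint -q` report only through their exit status, which is what `check` relies on.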
## 🛡️ Security Notes

- Update all placeholder credentials in scripts before running them
- Store B2 credentials securely (consider using a secrets manager)
- Review firewall rules before applying them
- Test fail2ban rules to avoid lockouts
- Keep the restic repository password safe: without it, backups cannot be decrypted or restored

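
A hypothetical sketch of a `jail.local` that reduces lockout risk by never banning loopback, the LAN, or the Tailscale address range (100.64.0.0/10). The LAN subnet default and the ban timings are assumptions, and exempting the whole LAN is a trade-off: a compromised LAN host will not be banned either.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: render a jail.local with lockout-safe ignoreip.
# Subnet and timing values are assumptions, not the repository's config.

render_jail_local() {
  local lan_cidr="${1:-192.168.1.0/24}"
  cat <<EOF
[DEFAULT]
bantime  = 1h
findtime = 10m
maxretry = 5
ignoreip = 127.0.0.1/8 ::1 ${lan_cidr} 100.64.0.0/10

[sshd]
enabled = true
EOF
}
```

Before restarting fail2ban, `sudo fail2ban-client -t` validates the configuration, and `fail2ban-regex /var/log/auth.log /etc/fail2ban/filter.d/sshd.conf` shows which log lines a filter would match. Afterwards, `sudo fail2ban-client status sshd` lists current bans.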
## 📊 Monitoring Access

- **Grafana**: http://192.168.1.196:3000
- **Portainer**: http://192.168.1.196:9000 (VPN only)
- **Prometheus**: http://192.168.1.196:9090
- **node-exporter**: http://<node-ip>:9100/metrics

## 🔄 Maintenance

### Daily

- Automated restic backups at 02:00
- AI model cache pruning at 03:00
- fail2ban monitoring

### Weekly

- Review Grafana alerts
- Check backup snapshots
- Monitor disk usage

### Monthly

- Restic repository integrity check (runs automatically on the 1st)
- Review security logs
- Update Docker images

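
A hypothetical sketch of the "integrity check on the 1st" rule: run `restic check` only when the day of the month is `01`. The day can be passed in for testing and otherwise comes from `date +%d`. Plain `restic check` verifies the repository structure; reading pack data as well (`--read-data` or `--read-data-subset`) is more thorough but downloads data from B2.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: run a restic integrity check on the 1st of the month.
# Repository credentials are assumed to be set in the environment.

monthly_restic_check() {
  local today="${1:-$(date +%d)}"
  if [[ "$today" == "01" ]]; then
    restic check
  else
    echo "Skipping restic check (day $today)"
  fi
}
```

Calling it at the end of the daily backup job avoids a separate schedule; alternatively, a dedicated systemd timer with `OnCalendar=monthly` achieves the same result.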
## 🆘 Disaster Recovery

Comprehensive disaster recovery procedures are documented in:

- [DISASTER_RECOVERY.md](/workspace/homelab/docs/guides/DISASTER_RECOVERY.md)

Quick recovery for common scenarios:

- **Node failure**: Swarm services are automatically rescheduled onto healthy nodes
- **Manager down**: Promote a worker to manager (`docker node promote <node>`)
- **Storage failure**: Restore from restic backups
- **Complete disaster**: Full rebuild from B2 backups (~2 hours)

### Emergency Backup Restore

```bash
# Install restic
sudo apt-get install restic

# Configure the B2 repository (restic reads these environment variables)
export B2_ACCOUNT_ID="your_key_id"
export B2_ACCOUNT_KEY="your_application_key"
export RESTIC_REPOSITORY="b2:bucket:/backups"
export RESTIC_PASSWORD="your_password"

# List available snapshots, then restore the latest one
restic snapshots
restic restore latest --target /tmp/restore
```


---

## 🆘 Troubleshooting

Common issues and solutions are documented in:

- [DEPLOYMENT_GUIDE.md](/workspace/homelab/docs/guides/DEPLOYMENT_GUIDE.md) - Rollback procedures
- [NAS_Mount_Guide.md](/workspace/homelab/docs/guides/NAS_Mount_Guide.md) - Mount issues
- Individual script comments - Script-specific troubleshooting

## 📝 License

This is a personal homelab configuration. Use and modify it as needed for your own setup.

## 🙏 Acknowledgments

Based on best practices from:

- Docker Swarm documentation
- Traefik documentation
- Restic backup documentation
- Home Assistant community
- r/homelab community


---

**Last Updated**: 2025-11-21

**Configuration Version**: 2.0