Initial commit: homelab configuration and documentation

commit 0769ca6888 (2025-11-29 19:03:14 +00:00)
72 changed files with 7806 additions and 0 deletions

docs/guides/Homelab.md (new file, 270 lines)

# HOMELAB CONFIGURATION SUMMARY — UPDATED 2025-10-31
## NETWORK INFRASTRUCTURE
Main Router: TP-Link BE9300 (2.5 Gb WAN + 4× 2.5 Gb LAN)
Secondary Router: Linksys WRT3200ACM (OpenWRT)
Managed Switch: TP-Link TL-SG608E (1 Gb)
Additional: Apple AirPort Time Capsule (192.168.1.153)
Backbone Speed: 2.5 Gb core / 1 Gb secondary
DNS Architecture: 3× Pi-hole + 3× Unbound (192.168.1.196, .245, .62) with local recursive forwarding
VPN: Tailscale (Pi 4 as exit node)
Reverse Proxy: Traefik (on .196; planned Swarm takeover)
LAN Subnet: 192.168.1.0/24
Notes: Pi-hole rate limiting tuned to avoid client lockouts; local Unbound caching accelerates DNS queries
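
A minimal sketch of the per-node Pi-hole + Unbound pairing described above, assuming the conventional local-resolver layout (Unbound listening on 127.0.0.1:5335, Pi-hole forwarding to it). The config path, port, and cache values are illustrative, not the exact files running on .196/.245/.62.

```bash
# Local recursive resolver behind each Pi-hole instance.
cat <<'EOF' | sudo tee /etc/unbound/unbound.conf.d/pi-hole.conf
server:
    interface: 127.0.0.1
    port: 5335
    do-ip6: no
    prefetch: yes          # refresh popular records before they expire
    cache-min-ttl: 300     # hold answers a little longer to cut upstream lookups
EOF
sudo systemctl restart unbound

# Then point the local Pi-hole at its Unbound instance instead of a public
# upstream (equivalent to Custom DNS 127.0.0.1#5335 in the Pi-hole admin UI).
```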
---
## NODE OVERVIEW
192.168.1.81 — Ryzen 3700X Node
• CPU: AMD 8C/16T
• RAM: 2 of 4 DIMM slots currently populated (3200 MHz); 32 GB kit (4×8 GB, 3600 MHz) available
• GPU: RTX 4060 Ti
• Network: 2.5 GbE onboard
• Role: Docker Swarm Worker (label=heavy)
• Function: AI compute (LM Studio, Llama.cpp, OpenWebUI, Ollama planned)
• OS: Windows 11 + WSL2 / Fedora (Dual Boot)
• Notes: Primary compute node for high-performance AI workloads. Both OS installations act as interchangeable swarm nodes with the same label.
192.168.1.57 — Acer Aspire R14 (Proxmox Host)
• CPU: Intel i5-6200U (2C/4T)
• RAM: 8 GB
• Network: 2.5 GbE via USB adapter
• Role: Proxmox Host
• Function: Virtualization host for Apps VM (.196) and OMV (.70)
• Storage: Local SSDs + OMV shared volumes
• Notes: Lightweight node for VMs and containerized storage services
---
## NETWORK UPGRADE & VLAN
* **Switch**: Install a 2.5Gb PoE managed switch (e.g., Netgear GS110EMX).
* **VLANs**: Create VLAN10 for management, VLAN20 for services. Add router ACLs to isolate traffic.
* **LACP**: Bond two 2.5 Gb NICs on the Ryzen node for a 5 Gb aggregate link (per-flow throughput stays at 2.5 Gb; the gain is across multiple streams).
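
A sketch of the bond and VLAN pieces above, assuming NetworkManager on the Fedora side of the Ryzen node. Interface names (enp5s0/enp6s0) and the VLAN subnets are placeholders, and the switch ports would need matching LACP and VLAN configuration.

```bash
# 802.3ad (LACP) bond of two NICs on the Ryzen node.
nmcli con add type bond con-name bond0 ifname bond0 \
    bond.options "mode=802.3ad,miimon=100"
nmcli con add type ethernet con-name bond0-p1 ifname enp5s0 master bond0
nmcli con add type ethernet con-name bond0-p2 ifname enp6s0 master bond0

# Tagged sub-interfaces for the planned management (10) and services (20) VLANs.
nmcli con add type vlan con-name vlan10 ifname bond0.10 dev bond0 id 10 \
    ipv4.method manual ipv4.addresses 192.168.10.81/24
nmcli con add type vlan con-name vlan20 ifname bond0.20 dev bond0 id 20 \
    ipv4.method manual ipv4.addresses 192.168.20.81/24
nmcli con up bond0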
## STORAGE ENHANCEMENTS
* Deploy a dedicated NAS (e.g., Synology DS920+) with RAID6 and SSD cache.
* On Proxmox host, create ZFS pool `tank` on local SSDs (`zpool create tank /dev/sda /dev/sdb`).
* Mount NAS shares on all nodes (`/mnt/nas`).
* Add cron job to prune unused AI model caches.
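
A sketch of the storage items above. Device names, share names, and cache paths are placeholders. Note that the striped `zpool create tank /dev/sda /dev/sdb` form gives no redundancy; the mirrored variant below trades capacity for safety.

```bash
# ZFS pool on the Proxmox host's local SSDs (mirror instead of stripe).
zpool create tank mirror /dev/sda /dev/sdb

# Mount the NAS share on each node (SMB shown; NFS works the same way via fstab).
mkdir -p /mnt/nas
echo '//192.168.1.70/docker /mnt/nas cifs credentials=/root/.smbcred,vers=3.0,_netdev 0 0' >> /etc/fstab
mount /mnt/nas

# Weekly prune of AI model caches older than 30 days (path is illustrative).
echo '0 3 * * 0 root find /mnt/nas/ai-models/cache -type f -mtime +30 -delete' > /etc/cron.d/prune-model-cache
```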
## SERVICE CONSOLIDATION & RESILIENCE
* Convert the standalone Traefik on the Pi 4 to a Docker Swarm service with 2 replicas.
* Deploy a fallback Caddy instance on the Pi Zero with a static maintenance page.
* Add healthcheck sidecars to critical containers (Portainer, OpenWebUI).
* Separate persistent volumes per stack (AI models on SSD, Nextcloud on NAS).
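
A sketch of Traefik as a Swarm service, roughly matching the 2-replica plan above. The network name, image tag, entrypoint flags, and placement constraint are illustrative (the document's label-based pinning could be used instead of the built-in `node.role` constraint), and ACME/TLS settings are omitted.

```bash
# Overlay network shared by Traefik and the services it fronts.
docker network create --driver overlay proxy

docker service create --name traefik \
  --replicas 2 \
  --constraint 'node.role == manager' \
  --network proxy \
  --publish 80:80 --publish 443:443 \
  --mount type=bind,source=/var/run/docker.sock,target=/var/run/docker.sock,readonly \
  traefik:v3 \
  --providers.swarm=true \
  --entrypoints.web.address=:80 \
  --entrypoints.websecure.address=:443

# Healthcheck stanza pattern for critical services (endpoint and port are
# placeholders and depend on the service, e.g. Portainer or OpenWebUI):
#   healthcheck:
#     test: ["CMD", "wget", "-qO-", "http://localhost:PORT/health"]
#     interval: 30s
#     timeout: 5s
#     retries: 3
```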
## SECURITY HARDENING
* Enable router firewall ACLs for inter-VLAN traffic (allow only required ports).
* Install `fail2ban` on the manager VM.
* Restrict the Portainer UI to VPN-only access and enable 2FA/OAuth.
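
A minimal fail2ban sketch for the manager VM, assuming a Debian/Ubuntu base (use dnf on Fedora-family). The jail values are illustrative starting points, not tuned settings.

```bash
apt install -y fail2ban
cat <<'EOF' > /etc/fail2ban/jail.local
[DEFAULT]
bantime  = 1h
findtime = 10m
maxretry = 5

[sshd]
enabled = true
EOF
systemctl enable --now fail2ban
```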
## MONITORING & AUTOMATION
* Deploy `node-exporter` on Proxmox host.
* Create Grafana alerts for CPU >80%, RAM >85%, disk >80%.
* Add HomeAssistant backup automation to NAS.
* Integrate Tailscale metrics via `tailscale_exporter`.
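
A sketch of the exporter and threshold items above. Proxmox is Debian-based, so the packaged exporter is the simplest route; the alert rules show the same thresholds as Prometheus recording/alerting rules (they could equally be defined as Grafana alerts), and the rules file path is illustrative for the containerized monitoring stack.

```bash
# node-exporter on the Proxmox host (listens on :9100 by default).
apt install -y prometheus-node-exporter
systemctl enable --now prometheus-node-exporter

# Example alert rules matching CPU >80%, RAM >85%, disk >80% used.
cat <<'EOF' > /etc/prometheus/rules/homelab.yml
groups:
  - name: homelab
    rules:
      - alert: HighCPU
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 10m
      - alert: HighMemory
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85
        for: 10m
      - alert: LowDiskSpace
        expr: (node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}) * 100 < 20
        for: 15m
EOF
```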
## OFFSITE BACKUP STRATEGY
* Install `restic` on manager VM and initialise Backblaze B2 repo.
* Daily backup script (`/usr/local/bin/backup_daily.sh`) for HA config, Portainer DB, important volumes.
* Systemd timer to run daily at 02:00.
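
A sketch of the offsite flow: restic against Backblaze B2 plus the 02:00 systemd timer. The bucket name, credential handling, retention policy, and backup paths are placeholders; the actual contents of `/usr/local/bin/backup_daily.sh` are not recorded here.

```bash
# One-time repository setup and a manual run (credentials via environment).
export B2_ACCOUNT_ID=xxxxxxxx B2_ACCOUNT_KEY=xxxxxxxx RESTIC_PASSWORD=xxxxxxxx
restic -r b2:homelab-backups:daily init
restic -r b2:homelab-backups:daily backup /srv/ha-config /srv/portainer /mnt/nas/critical
restic -r b2:homelab-backups:daily forget --keep-daily 7 --keep-weekly 4 --prune

# Timer that drives a backup-daily.service wrapping /usr/local/bin/backup_daily.sh.
cat <<'EOF' > /etc/systemd/system/backup-daily.timer
[Unit]
Description=Daily offsite backup at 02:00

[Timer]
OnCalendar=*-*-* 02:00:00
Persistent=true

[Install]
WantedBy=timers.target
EOF
systemctl enable --now backup-daily.timer
```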
---
## NODE OVERVIEW (CONTINUED)
192.168.1.196 — Apps Manager VM (on Acer Proxmox)
• CPU: 4 vCPU
• RAM: 4 GB min / 6 GB max
• Role: Docker Swarm Manager (label=manager)
• Function: Pi-hole + Unbound + Portainer UI + Traefik reverse proxy
• Architecture: x86 (virtualized)
• Notes: Central orchestration, DNS control, and reverse proxy; Portainer agent installed for remote swarm management
192.168.1.70 — OMV Instance (on Acer)
• CPU: 2 vCPU
• RAM: 2 GB min / 4 GB max
• Role: Network Attached Storage
• Function: Shared Docker volumes, media, VM backups
• Stack: OpenMediaVault 7.x
• Architecture: x86
• Planned: Mount the SMB1 share from the Time Capsule (.153) and reshare it over SMB3 for modern clients
• Storage: Docker volumes for AI models, backup directories, and media
• Notes: Central NAS for swarm and LLM storage
192.168.1.245 — Raspberry Pi 4 (8 GB)
• CPU: ARM Quad-Core
• RAM: 8 GB
• Network: 1 GbE
• Role: Docker Swarm Leader (label=leader)
• Function: Home Assistant OS + Portainer Agent + HAOS-based Unbound (via Ubuntu container)
• Standalone Services: Traefik (currently standalone), HAOS Unbound
• Notes: Central smart home automation hub; swarm leader for container orchestration; plan for Swarm Traefik to take over existing Traefik instance
192.168.1.62 — Raspberry Pi Zero 2 W
• CPU: ARM Quad-Core
• RAM: 512 MB
• Network: 100 Mb Ethernet
• Role: Docker Swarm Worker (label=light)
• Function: Lightweight DNS + Pi-hole + Unbound + auxiliary containers
• Notes: Low-power node for background jobs, DNS redundancy, and monitoring tasks
192.168.1.153 — Apple AirPort Time Capsule
• Network: 1 GbE via WRT3200ACM
• Role: Backup storage and SMB bridge
• Function: Time Machine backups (SMB1)
• Planned: Reshare SMB1 → SMB3 via OMV (.70) for modern clients
• Notes: Source for macOS backups; will integrate into OMV NAS for consolidation
---
## DOCKER SWARM CLUSTER
• Leader: 192.168.1.245 (Pi 4, label=leader)
• Manager: 192.168.1.196 (Apps VM, label=manager)
• Worker (heavy): 192.168.1.81 (Ryzen, Fedora, label=heavy)
• Worker (light): 192.168.1.62 (Pi Zero 2 W, label=light)
Cluster Functions:
• Distributed container orchestration across x86 + ARM
• High-availability DNS via Pi-hole + Unbound replicas
• Unified management and reverse proxy on the manager node
• Specific workload placement using node labels (heavy, light, leader, manager)
• AI/ML workloads pinned to the 'heavy' node for performance
• General application services pinned to the 'leader' node
• Core services like Traefik and Portainer pinned to the 'manager' node
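
A sketch of the label-based placement described above. Node hostnames and the label key (`role`) are placeholders; only the values heavy/light/leader/manager come from this document.

```bash
# Label each node once from a manager.
docker node update --label-add role=heavy   ryzen-fedora
docker node update --label-add role=light   pizero2w
docker node update --label-add role=manager apps-vm
docker node update --label-add role=leader  pi4-haos

# Pin an AI workload to the heavy node.
docker service create --name openwebui \
  --constraint 'node.labels.role == heavy' \
  ghcr.io/open-webui/open-webui:main
```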
---
## STACKS
### Networking Stack
**Traefik:** Reverse Proxy
**whoami:** Service for testing Traefik
### Monitoring Stack
**Prometheus:** Metrics collection
**Grafana:** Metrics visualization
**Alertmanager:** Alerting
**Node-exporter:** Node metrics exporter
**cAdvisor:** Container metrics exporter
### Tools Stack
**Portainer:** Swarm Management
**Dozzle:** Log viewing
**Lazydocker:** Terminal UI for Docker
**TSDProxy:** Tailscale Docker Proxy
**Watchtower:** Container Updates
### Application Stack
**OpenWebUI:** AI Frontend
**Paperless-ngx:** Document Management
**Stirling-PDF:** PDF utility
**SearXNG:** Metasearch engine
### Productivity Stack
**Nextcloud:** Cloud storage and collaboration
---
## SERVICES MAP
**Manager Node (.196):**
**Networking Stack:** Traefik
**Monitoring Stack:** Prometheus, Grafana
**Tools Stack:** Portainer, Dozzle, Lazydocker, TSDProxy, Watchtower
**Leader Node (.245):**
**Application Stack:** Paperless-ngx, Stirling-PDF, SearXNG
**Productivity Stack:** Nextcloud
**Heavy Worker Node (.81):**
**Application Stack:** OpenWebUI
**Light Worker Node (.62):**
**Networking Stack:** whoami
**Other Services:**
**VPN:** Tailscale (Pi4 exit node)
**Virtualization:** Proxmox VE (.57)
**Storage:** OMV NAS (.70) + Time Capsule (.153)
---
## STORAGE & BACKUPS
OMV (.70) — shared Docker volumes, LLM models, media, backup directories
Time Capsule (.153) — legacy SMB1 source; planned SMB3 reshare via OMV
External SSDs/HDDs — portable compute, LLM scratch storage, media archives
Time Machine clients — macOS systems
Planned Workflow:
• Mount Time Capsule SMB1 share in OMV via CIFS
• Reshare through OMV Samba as SMB3
• Sync critical backups to OMV and external drives
• AI models stored on NVMe + OMV volumes for high-speed access
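
A sketch of the SMB1 to SMB3 reshare step in the workflow above. The Time Capsule share name ("Data"), mount point, and credentials are placeholders, and OMV would normally manage the Samba share through its UI rather than a hand-edited smb.conf.

```bash
# Mount the legacy SMB1 share from the Time Capsule on OMV (.70).
mount -t cifs //192.168.1.153/Data /srv/timecapsule \
  -o vers=1.0,username=timecapsule,password=REDACTED,iocharset=utf8

# Reshare the mounted path over modern SMB.
cat <<'EOF' >> /etc/samba/smb.conf
# In [global]: server min protocol = SMB3  (so resharing clients never fall back to SMB1)
[timecapsule]
   path = /srv/timecapsule
   browseable = yes
   read only = no
EOF
systemctl reload smbd
```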
---
## PERFORMANCE STRATEGY
• 2.5 Gb backbone: Ryzen (.81) + Acer (.57) nodes
• 1 Gb nodes: Pi 4 (.245) + Time Capsule (.153)
• 100 Mb node: Pi Zero 2 W (.62)
• ARM nodes for low-power/auxiliary tasks
• x86 nodes for AI, storage, and compute-intensive containers
• Swarm resource labeling for workload isolation
• DNS redundancy and rate-limit protection
• Unified monitoring via Portainer + Home Assistant
• GPU-intensive AI containers pinned to Ryzen node for efficiency
• Traefik migration plan: standalone .245 → Swarm-managed cluster routing
---
## NOTES
• Acer Proxmox hosts OMV (.70) and Apps Manager VM (.196)
• Ryzen (.81) dedicated to AI and heavy Docker tasks
• HAOS Pi 4 (.245) leader, automation hub, and temporary standalone Traefik
• DNS load balanced among .62, .196, and .245
• Time Capsule (.153) planned SMB1→SMB3 reshare via OMV
• Network speed distribution: Ryzen/Acer = 2.5 Gb, Pi 4/Time Capsule = 1 Gb, Pi Zero 2 W = 100 Mb
• LLM models stored on high-speed NVMe on Ryzen, backed up to OMV and external drives
• No personal identifiers included in this record
# END CONFIG
---
## SMART HOME INTEGRATION
### LIGHTING & CONTROLS
• Philips Hue
- Devices: Hue remote only (no bulbs)
- Connectivity: Zigbee
- Automation: Integrated into Home Assistant OS (.245)
- Notes: Remote used to trigger HAOS scenes and routines for other smart devices
• Govee Smart Lights & Sensors
- Devices: RGB LED strips, motion sensors, temperature/humidity sensors
- Connectivity: Wi-Fi
- Automation: Home Assistant via MQTT / cloud integration
- Notes: Motion-triggered lighting and environmental monitoring (see the automation sketch after this list)
• TP-Link / Tapo Smart Devices
- Devices: Tapo lightbulbs, Kasa smart power strip
- Connectivity: Wi-Fi
- Automation: Home Assistant + Kasa/Tapo integration
- Notes: Power scheduling and energy monitoring
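
A sketch of a motion-triggered lighting automation on the HAOS node (.245), as referenced in the Govee notes above. Entity IDs are placeholders; the automation could equally be built in the Home Assistant automation editor instead of appending to /config/automations.yaml.

```bash
cat <<'EOF' >> /config/automations.yaml
- alias: "Hallway motion light"
  trigger:
    - platform: state
      entity_id: binary_sensor.govee_hallway_motion
      to: "on"
  action:
    - service: light.turn_on
      target:
        entity_id: light.govee_strip_hallway
      data:
        brightness_pct: 60
  mode: single
EOF
```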
### AUDIO & VIDEO
• TVs: Multiple 4K Smart TVs
- Platforms: Fire Stick, Apple devices, console inputs
- Connectivity: Ethernet (1 Gb) or Wi-Fi
- Automation: HAOS scenes, volume control, source switching
• Streaming & Consoles:
- Devices: Fire Stick, PS5, Nintendo Switch
- Connectivity: Ethernet or Wi-Fi
- Notes: Automated on/off with Home Assistant, media triggers
### SECURITY & SENSORS
• Vivint Security System
- Devices: Motion detectors, door/window sensors, cameras
- Connectivity: Proprietary protocol + cloud
- Automation: Home Assistant integrations for alerts and scene triggers
• Environmental Sensors
- Devices: Govee temperature/humidity, Tapo sensors
- Connectivity: Wi-Fi
- Automation: Trigger HVAC, lights, or notifications