# HOMELAB CONFIGURATION SUMMARY — UPDATED 2025-10-31
## NETWORK INFRASTRUCTURE
Main Router: TP-Link BE9300 (2.5 Gb WAN + 4× 2.5 Gb LAN)
Secondary Router: Linksys WRT3200ACM (OpenWRT)
Managed Switch: TP-Link TL-SG608E (1 Gb)
Additional: Apple AirPort Time Capsule (192.168.1.153)
Backbone Speed: 2.5 Gb core / 1 Gb secondary
DNS Architecture: 3× Pi-hole + 3× Unbound (192.168.1.196, .245, .62); each Pi-hole forwards to its local Unbound recursive resolver
VPN: Tailscale (Pi 4 as exit node)
Reverse Proxy: Traefik on .196 (Swarm); standalone instance on the Pi 4 (.245) pending Swarm takeover
LAN Subnet: 192.168.1.0/24
Notes: Rate-limit prevention on the Pi-hole instances; Unbound caches responses locally to speed up repeated DNS queries
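With three Pi-hole + Unbound resolvers, a client only needs one of them to answer. The following is a minimal sketch of that failover. It assumes plain UDP on port 53 and a one-second timeout per resolver; both are illustrative choices, not part of the documented setup. The code hand-builds an RFC 1035 A-record query with the standard library and tries .196, .245 and .62 in turn. Calling `first_healthy_resolver("example.com")` from a LAN host returns the IP of the first resolver that replies with NOERROR, or `None` if all three are down.

```python
import socket
import struct

# The three Pi-hole + Unbound resolvers listed above.
RESOLVERS = ["192.168.1.196", "192.168.1.245", "192.168.1.62"]
QUERY_ID = 0x1234  # arbitrary transaction ID that the server echoes back


def build_query(name: str, query_id: int = QUERY_ID) -> bytes:
    """Build a minimal RFC 1035 query for an A record with recursion desired."""
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT/NSCOUNT/ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels terminated by a zero-length label
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def first_healthy_resolver(name, resolvers=RESOLVERS, timeout=1.0):
    """Return the first resolver that answers with NOERROR, or None if none do."""
    query = build_query(name)
    for ip in resolvers:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            try:
                sock.sendto(query, (ip, 53))
                reply, _ = sock.recvfrom(512)
            except OSError:  # timeout, refusal, or unreachable host
                continue
        if len(reply) < 4:
            continue
        reply_id, flags = struct.unpack("!HH", reply[:4])
        is_response = bool(flags & 0x8000)  # QR bit
        rcode = flags & 0x000F              # 0 = NOERROR
        if reply_id == QUERY_ID and is_response and rcode == 0:
            return ip
    return None
```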
---
## NODE OVERVIEW
192.168.1.81 — Ryzen 3700X Node
• CPU: AMD 8C/16T
• RAM: 64 GB now (2 of 4 slots populated with 32 GB DDR4-3200 modules), 80 GB possible; a 4× 8 GB DDR4-3600 kit is available
• GPU: RTX 4060 Ti
• Network: 2.5 GbE onboard
• Role: Docker Swarm Worker (label=heavy)
• Function: AI compute (LM Studio, Llama.cpp, OpenWebUI, Ollama planned)
• OS: Windows 11 + WSL2 / Fedora (Dual Boot)
• Notes: Primary compute node for high-performance AI workloads. Both OS installations act as interchangeable swarm nodes with the same label.
192.168.1.57 — Acer Aspire R14 (Proxmox Host)
• CPU: Intel i5-6200U (2C/4T)
• RAM: 8 GB
• Network: 2.5 GbE via USB adapter
• Role: Proxmox Host
• Function: Virtualization host for Apps VM (.196) and OMV (.70)
• Storage: Local SSDs + OMV shared volumes
• Notes: Lightweight node for VMs and containerized storage services
192.168.1.196 — Apps Manager VM (on Acer Proxmox)
• vCPUs: 4
• RAM: 4 GB min / 6 GB max
• Role: Docker Swarm Manager (label=manager)
• Function: Pi-hole + Unbound + Portainer UI + Traefik reverse proxy
• Architecture: x86 (virtualized)
• Notes: Central orchestration, DNS control, and reverse proxy; Portainer agent installed for remote swarm management
192.168.1.70 — OMV Instance (on Acer)
• vCPUs: 2
• RAM: 2 GB min / 4 GB max
• Role: Network Attached Storage
• Function: Shared Docker volumes, media, VM backups
• Stack: OpenMediaVault 7.x
• Architecture: x86
• Planned: Mount the Time Capsule (.153) SMB1 share and reshare it over SMB3
• Storage: Docker volumes for AI models, backup directories, and media
• Notes: Central NAS for swarm and LLM storage
192.168.1.245 — Raspberry Pi 4 (8 GB)
• CPU: ARM Quad-Core
• RAM: 8 GB
• Network: 1 GbE
• Role: Docker Swarm Leader (label=leader)
• Function: Home Assistant OS + Portainer Agent + HAOS-based Unbound (via Ubuntu container)
• Standalone Services: Traefik, HAOS Unbound
• Notes: Central smart home automation hub and swarm leader for container orchestration; a Swarm-managed Traefik is planned to replace the existing standalone instance
192.168.1.62 — Raspberry Pi Zero 2 W
• CPU: ARM Quad-Core
• RAM: 512 MB
• Network: 100 Mb Ethernet (USB adapter; the board has no onboard Ethernet)
• Role: Docker Swarm Worker (label=light)
• Function: Lightweight DNS + Pi-hole + Unbound + auxiliary containers
• Notes: Low-power node for background jobs, DNS redundancy, and monitoring tasks
192.168.1.153 — Apple AirPort Time Capsule
• Network: 1 GbE via WRT3200ACM
• Role: Backup storage and SMB bridge
• Function: Time Machine backups (SMB1)
• Planned: Reshare SMB1 → SMB3 via OMV (.70) for modern clients
• Notes: Source for macOS backups; will be integrated into the OMV NAS for consolidation
---
## NETWORK UPGRADE & VLAN
* **Switch**: Install a 2.5 Gb PoE managed switch (e.g., Netgear GS110EMX).
* **VLANs**: Create VLAN10 for management and VLAN20 for services; add router ACLs to isolate traffic between them.
* **LACP**: Bond two NICs on the Ryzen node for a 5 Gb aggregate link. The node has only one onboard 2.5 GbE port, so this needs a second NIC and an LACP-capable switch. Any single connection still tops out at 2.5 Gb, because LACP hashes each flow onto one member link.
## STORAGE ENHANCEMENTS
* Deploy a dedicated NAS (e.g., Synology DS920+) with RAID6 and an SSD cache.
* On the Proxmox host, create a mirrored ZFS pool `tank` on the local SSDs (`zpool create tank mirror /dev/sda /dev/sdb`; `/dev/disk/by-id/` paths are safer than `sdX` names). Without `mirror`, the command builds a stripe with no redundancy.
* Mount NAS shares on all nodes (`/mnt/nas`).
* Add a cron job to prune unused AI model caches.
## SERVICE CONSOLIDATION & RESILIENCE
* Convert the standalone Traefik on the Pi 4 to a Docker Swarm service with 2 replicas.
* Deploy a fallback Caddy on the Pi Zero 2 W with a static maintenance page.
* Add healthcheck sidecars to critical containers (Portainer, OpenWebUI).
* Separate persistent volumes per stack (AI models on SSD, Nextcloud on NAS).
## SECURITY HARDENING
* Enable router firewall ACLs for inter-VLAN traffic (allow only required ports).
* Install `fail2ban` on the manager VM.
* Restrict the Portainer UI to VPN-only access and enable 2FA/OAuth.
## MONITORING & AUTOMATION
* Deploy `node-exporter` on the Proxmox host.
* Create Grafana alerts for CPU > 80%, RAM > 85%, and disk > 80%.
* Add a Home Assistant backup automation that saves to the NAS.
* Integrate Tailscale metrics via `tailscale_exporter`.
## OFFSITE BACKUP STRATEGY
* Install `restic` on the manager VM and initialize a Backblaze B2 repository.
* Daily backup script (`/usr/local/bin/backup_daily.sh`) covering the HA config, Portainer DB, and important volumes.
* A systemd timer runs the script daily at 02:00.
---
## DOCKER SWARM CLUSTER
• Leader: 192.168.1.245 (Pi 4, label=leader)
• Manager: 192.168.1.196 (Apps VM, label=manager)
• Worker (Fedora): 192.168.1.81 (Ryzen, label=heavy)
• Worker (Light): 192.168.1.62 (Pi Zero 2 W, label=light)
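The leader (.245) and the manager (.196) are both Swarm managers. Swarm's Raft store only accepts changes while a majority of managers is reachable. Docker's documentation puts the tolerance at (N-1)/2 lost managers for an N-manager swarm, and the short sketch below tabulates that arithmetic. With two managers, quorum needs both, so the swarm tolerates no manager failure, the same as with one. If quorum is lost, existing tasks keep running but the swarm cannot perform management operations; a third manager would let it survive one failure.

```python
def raft_quorum(managers: int) -> int:
    """Smallest majority of managers needed for the Raft store to accept changes."""
    return managers // 2 + 1


def tolerated_failures(managers: int) -> int:
    """Managers that can be lost while a majority is still reachable."""
    return (managers - 1) // 2


for n in (1, 2, 3, 5):
    print(f"{n} manager(s): quorum {raft_quorum(n)}, tolerates {tolerated_failures(n)} loss(es)")
```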
Cluster Functions:
• Distributed container orchestration across x86 + ARM
• High-availability DNS via Pi-hole + Unbound replicas
• Unified management and reverse proxy on the manager node
• Workload placement using node labels (heavy, light, leader, manager)
• AI/ML workloads pinned to the 'heavy' node for performance
• General application services pinned to the 'leader' node
• Core services like Traefik and Portainer pinned to the 'manager' node
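The label-based pinning above maps onto Swarm placement constraints of the form `node.labels.<key> == <value>`. The sketch below builds both the per-service `deploy` section and the `docker node update --label-add` argument list that would attach a label to a node. Several names in it are hypothetical: the label key `role` (the document records only the values heavy, light, leader and manager), the service names, and the node hostname used in any call. The returned dict mirrors the `deploy.placement.constraints` key of a stack file.

```python
# Assumption: labels were added under the key "role"; the document records
# only the values (heavy, light, leader, manager), not the key name.
LABEL_KEY = "role"

# Hypothetical service names, pinned as in the cluster functions and services map.
PLACEMENT = {
    "openwebui": "heavy",
    "paperless-ngx": "leader",
    "searxng": "leader",
    "nextcloud": "leader",
    "traefik": "manager",
    "portainer": "manager",
    "whoami": "light",
}


def deploy_section(service: str) -> dict:
    """The `deploy` mapping a stack file would carry for this service."""
    return {
        "placement": {
            "constraints": [f"node.labels.{LABEL_KEY} == {PLACEMENT[service]}"]
        }
    }


def label_command(node: str, value: str) -> list:
    """argv for attaching the label to a node; must be run on a Swarm manager."""
    return ["docker", "node", "update", "--label-add", f"{LABEL_KEY}={value}", node]
```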
---
## STACKS
### Networking Stack
* **Traefik:** Reverse proxy
* **whoami:** Test service for Traefik
### Monitoring Stack
* **Prometheus:** Metrics collection
* **Grafana:** Metrics visualization
* **Alertmanager:** Alerting
* **Node-exporter:** Node metrics exporter
* **cAdvisor:** Container metrics exporter
### Tools Stack
* **Portainer:** Swarm management
* **Dozzle:** Log viewing
* **Lazydocker:** Terminal UI for Docker
* **TSDProxy:** Tailscale Docker proxy
* **Watchtower:** Container updates
### Application Stack
* **OpenWebUI:** AI frontend
* **Paperless-ngx:** Document management
* **Stirling-PDF:** PDF utility
* **SearXNG:** Metasearch engine
### Productivity Stack
* **Nextcloud:** Cloud storage and collaboration
---
## SERVICES MAP
**Manager Node (.196):**
* **Networking Stack:** Traefik
* **Monitoring Stack:** Prometheus, Grafana
* **Tools Stack:** Portainer, Dozzle, Lazydocker, TSDProxy, Watchtower

**Leader Node (.245):**
* **Application Stack:** Paperless-ngx, Stirling-PDF, SearXNG
* **Productivity Stack:** Nextcloud

**Heavy Worker Node (.81):**
* **Application Stack:** OpenWebUI

**Light Worker Node (.62):**
* **Networking Stack:** whoami

**Other Services:**
* **VPN:** Tailscale (Pi 4 exit node)
* **Virtualization:** Proxmox VE (.57)
* **Storage:** OMV NAS (.70) + Time Capsule (.153)
---
## STORAGE & BACKUPS
OMV (.70) — shared Docker volumes, LLM models, media, backup directories
Time Capsule (.153) — legacy SMB1 source; planned SMB3 reshare via OMV
External SSDs/HDDs — portable compute, LLM scratch storage, media archives
Time Machine clients — macOS systems
Planned Workflow:
• Mount Time Capsule SMB1 share in OMV via CIFS
• Reshare through OMV Samba as SMB3
• Sync critical backups to OMV and external drives
• AI models stored on NVMe + OMV volumes for high-speed access
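The planned SMB1 → SMB3 workflow needs two pieces of configuration on OMV (.70). The first is a CIFS client mount that speaks SMB1 to the Time Capsule. The second is a Samba share that refuses anything older than SMB3. The sketch below renders both as text and assumes a share name of `Data`, plus a hypothetical mount point and credentials path. OMV generates `smb.conf` itself, so in practice the Samba lines would be entered through its SMB/CIFS settings rather than written by hand. The `server min protocol` setting governs only OMV's Samba server, not its SMB1 client mount.

```python
# Hypothetical values: the Time Capsule share name ("Data"), the mount point,
# and the credentials-file path are assumptions, not taken from this setup.
TIME_CAPSULE = "192.168.1.153"
SHARE = "Data"
MOUNT_POINT = "/srv/timecapsule"
CREDENTIALS = "/root/.smb-timecapsule"


def fstab_entry() -> str:
    """/etc/fstab line mounting the SMB1-only Time Capsule share on OMV."""
    options = ",".join([
        "vers=1.0",                    # the Time Capsule only speaks SMB1
        f"credentials={CREDENTIALS}",  # file holding username= / password= lines
        "_netdev",                     # treat as a network mount (wait for network)
    ])
    return f"//{TIME_CAPSULE}/{SHARE} {MOUNT_POINT} cifs {options} 0 0"


def samba_reshare(name: str = "timecapsule") -> str:
    """smb.conf fragment re-exporting the mounted directory over SMB3 only."""
    return "\n".join([
        "[global]",
        "   server min protocol = SMB3",
        "",
        f"[{name}]",
        f"   path = {MOUNT_POINT}",
        "   read only = no",
    ])
```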
---
## PERFORMANCE STRATEGY
• 2.5 Gb backbone: Ryzen (.81) + Acer (.57) nodes
• 1 Gb nodes: Pi 4 (.245) + Time Capsule (.153)
• 100 Mb node: Pi Zero 2 W (.62)
• ARM nodes for low-power/auxiliary tasks
• x86 nodes for AI, storage, and compute-intensive containers
• Swarm resource labeling for workload isolation
• DNS redundancy and rate-limit protection
• Unified monitoring via Portainer + Home Assistant
• GPU-intensive AI containers pinned to Ryzen node for efficiency
• Traefik migration plan: standalone .245 → Swarm-managed cluster routing
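The link speeds above set a floor on how long large files, such as LLM model weights, take to move between nodes. The worked sketch below assumes a hypothetical 8 GB model file and raw line rate with no protocol overhead. At 2.5 Gb the copy takes about 26 s, at 1 Gb about 64 s, and over the Pi Zero 2 W's 100 Mb link about 640 s. A real transfer is bounded by the slowest link on its path and will run somewhat slower than these figures; the `efficiency` parameter is a crude way to model that overhead.

```python
# Line rates from the performance strategy above, in gigabits per second.
LINKS_GBPS = {
    "Ryzen/Acer (2.5 Gb)": 2.5,
    "Pi 4/Time Capsule (1 Gb)": 1.0,
    "Pi Zero 2 W (100 Mb)": 0.1,
}


def transfer_seconds(size_gb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Best-case time to move size_gb gigabytes (decimal) over one link.

    efficiency < 1.0 scales the line rate down to account for protocol overhead.
    """
    return size_gb * 8 / (link_gbps * efficiency)


MODEL_GB = 8  # hypothetical model file size
for name, gbps in LINKS_GBPS.items():
    print(f"{name}: {transfer_seconds(MODEL_GB, gbps):.0f} s")
```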
---
## NOTES
• Acer Proxmox hosts OMV (.70) and Apps Manager VM (.196)
• Ryzen (.81) dedicated to AI and heavy Docker tasks
• HAOS Pi 4 (.245) leader, automation hub, and temporary standalone Traefik
• DNS load balanced among .62, .196, and .245
• Time Capsule (.153) planned SMB1→SMB3 reshare via OMV
• Network speed distribution: Ryzen/Acer = 2.5 Gb, Pi 4/Time Capsule = 1 Gb, Pi Zero 2 W = 100 Mb
• LLM models stored on high-speed NVMe on Ryzen, backed up to OMV and external drives
• No personal identifiers included in this record
# END CONFIG
---
## SMART HOME INTEGRATION
### LIGHTING & CONTROLS
• Philips Hue
- Devices: Hue remote only (no bulbs)
- Connectivity: Zigbee
- Automation: Integrated into Home Assistant OS (.245)
- Notes: Remote used to trigger HAOS scenes and routines for other smart devices
• Govee Smart Lights & Sensors
- Devices: RGB LED strips, motion sensors, temperature/humidity sensors
- Connectivity: Wi-Fi
- Automation: Home Assistant via MQTT / cloud integration
- Notes: Motion-triggered lighting and environmental monitoring
• TP-Link / Tapo Smart Devices
- Devices: Tapo lightbulbs, Kasa smart power strip
- Connectivity: Wi-Fi
- Automation: Home Assistant + Kasa/Tapo integration
- Notes: Power scheduling and energy monitoring
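Inside Home Assistant, the Hue remote's button presses drive scenes through automations. The same scene service can also be called from other nodes through Home Assistant's REST API (`POST /api/services/<domain>/<service>` with a long-lived access token). Below is a minimal standard-library sketch. The token placeholder and the `scene.movie_night` entity are hypothetical, and 8123 is Home Assistant's default port.

```python
import json
import urllib.request

# Home Assistant OS on the Pi 4; 8123 is Home Assistant's default port.
HA_URL = "http://192.168.1.245:8123"
TOKEN = "LONG_LIVED_ACCESS_TOKEN"  # placeholder: created on the HA user profile page


def service_call(domain: str, service: str, entity_id: str) -> urllib.request.Request:
    """Build a POST to Home Assistant's /api/services/<domain>/<service> endpoint."""
    body = json.dumps({"entity_id": entity_id}).encode("utf-8")
    return urllib.request.Request(
        f"{HA_URL}/api/services/{domain}/{service}",
        data=body,
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical scene; an automation bound to a Hue remote button could call the same service.
req = service_call("scene", "turn_on", "scene.movie_night")
```

Sending it is a single `urllib.request.urlopen(req, timeout=5)` call. The sketch omits that call so that building the request has no side effects.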
### AUDIO & VIDEO
• TVs: Multiple 4K Smart TVs
- Platforms: Fire Stick, Apple devices, console inputs
- Connectivity: Ethernet (1 Gb) or Wi-Fi
- Automation: HAOS scenes, volume control, source switching
• Streaming & Consoles:
- Devices: Fire Stick, PS5, Nintendo Switch
- Connectivity: Ethernet or Wi-Fi
- Notes: Automated on/off with Home Assistant, media triggers
### SECURITY & SENSORS
• Vivint Security System
- Devices: Motion detectors, door/window sensors, cameras
- Connectivity: Proprietary protocol + cloud
- Automation: Home Assistant integrations for alerts and scene triggers
• Environmental Sensors
- Devices: Govee temperature/humidity, Tapo sensors
- Connectivity: Wi-Fi
- Automation: Trigger HVAC, lights, or notifications
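Sensor-triggered HVAC control needs hysteresis so that small fluctuations around a threshold do not toggle the equipment repeatedly. The sketch below shows the idea with a hypothetical 24 °C setpoint and a ±0.5 °C band. Cooling switches on at 24.5 °C or above and off at 23.5 °C or below, and readings in between keep the current state. Home Assistant's built-in `generic_thermostat` integration applies the same principle through its `cold_tolerance` and `hot_tolerance` options.

```python
from dataclasses import dataclass


@dataclass
class CoolingControl:
    """On at or above setpoint + band, off at or below setpoint - band."""
    setpoint_c: float = 24.0  # hypothetical target temperature
    band_c: float = 0.5       # hypothetical hysteresis half-width
    cooling: bool = False

    def update(self, reading_c: float) -> bool:
        if not self.cooling and reading_c >= self.setpoint_c + self.band_c:
            self.cooling = True
        elif self.cooling and reading_c <= self.setpoint_c - self.band_c:
            self.cooling = False
        # Readings inside the band leave the current state unchanged
        return self.cooling


ctrl = CoolingControl()
states = [ctrl.update(t) for t in (24.2, 24.6, 24.1, 23.6, 23.4, 24.3)]
```

For the sample readings, the controller returns off, on, on, on, off, off. The dips to 24.1 °C and 23.6 °C do not stop cooling, because both stay above the 23.5 °C off threshold.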