Add Pi-hole with AdGuard DOH/DOT integration, reorganize swarm stacks, add DNS/n8n docs

2025-12-18 15:38:57 +00:00
parent 827f8bbf9d
commit f0c525d0df
44 changed files with 3013 additions and 486 deletions

docs/guides/DNS_SETUP.md Normal file

@@ -0,0 +1,49 @@
# DNS Configuration Guide (Cloudflare & Pi-hole)
To ensure reliable connectivity to your Traefik Swarm services both internally and externally, a "Split Horizon" DNS strategy is used. This configuration ensures that internal clients resolve services to the local LAN IP, while external traffic (if configured) uses the public IP.
## 1. Cloudflare (Public DNS)
Cloudflare manages the public zone for `sterl.xyz`. This is required for:
1. **Let's Encrypt Wildcard Certificates**: Traefik uses the `CF_DNS_API_TOKEN` to create temporary TXT records for DNS-01 validation.
2. **External Access**: If you open ports 80/443 on your router, these records direct traffic to your home.
### Required Records
| Type | Name | Content | Proxy Status |
| :--- | :--- | :--- | :--- |
| **A** | `sterl.xyz` | `[Your Public IP]` | Proxied (Orange Cloud) optional* |
| **CNAME** | `*.sterl.xyz` | `sterl.xyz` | Proxied (Orange Cloud) optional* |
> **Note**: If `Proxied` is enabled, you benefit from Cloudflare's DDoS protection, but your logs will show only Cloudflare IPs unless Traefik is configured to trust Cloudflare's address ranges via `forwardedHeaders.trustedIPs` on its entrypoints.
## 2. Pi-hole (Internal DNS)
Devices inside your home network (`192.168.1.0/24`) should not leave the network and re-enter through the router just to reach a local service (NAT loopback, also called hairpin NAT). Instead, Pi-hole should resolve these domains directly to the Docker Swarm Manager (Traefik).
### The "A Record Shift"
Instead of defining every single service (`grafana.sterl.xyz`, `plex.sterl.xyz`, etc.), we use a **Wildcard DNS Record** in Pi-hole.
**Configuration:**
1. Log in to Pi-hole.
2. Go to **Local DNS** > **DNS Records**.
3. Add the following records:
| Domain | IP Address | Description |
| :--- | :--- | :--- |
| `sterl.xyz` | `192.168.1.196` | Swarm Manager / Traefik Entrypoint |
| `*.sterl.xyz` | `192.168.1.196` | **Wildcard Catch-all** for all subdomains |
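> **Important**: `192.168.1.196` is your designated Traefik entry point (Manager Node). Ensure Traefik is running on this node or reachable via the Swarm Ingress Mesh on this IP.
>
> **Note**: Pi-hole's **Local DNS Records** page stores exact host names. If the `*.sterl.xyz` entry is rejected or does not resolve subdomains, define the wildcard as a dnsmasq rule instead: `address=/sterl.xyz/192.168.1.196`.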
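A wildcard in dnsmasq (the resolver Pi-hole is built on) is written as an `address=/<domain>/<ip>` rule, which answers for the domain and every subdomain. The sketch below only generates that line; the helper name `dnsmasq_wildcard_line` is hypothetical, and where the line has to be placed depends on the Pi-hole version (for example a file under `/etc/dnsmasq.d/` on installs that load that directory, which is an assumption to verify).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a dnsmasq rule that answers for <domain>
# and all of its subdomains with <ip>.
dnsmasq_wildcard_line() {
  local domain="$1" ip="$2"
  printf 'address=/%s/%s\n' "$domain" "$ip"
}
```

Calling `dnsmasq_wildcard_line sterl.xyz 192.168.1.196` prints `address=/sterl.xyz/192.168.1.196`, which can be redirected into the dnsmasq configuration before reloading the resolver.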
### Why this works
* **External Request**: `whoami.sterl.xyz` -> Cloudflare -> Public IP -> Router Port Forward (80/443) -> Traefik VIP.
* **Internal Request**: `whoami.sterl.xyz` -> Pi-hole -> `192.168.1.196` -> Traefik (Directly).
## 3. Verification
From a computer on your network, run:
```bash
nslookup whoami.sterl.xyz
```
**Expected Result**: `192.168.1.196` (The local LAN IP).
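If you see a public IP, either the Pi-hole records are not active or a stale answer is still cached. Flush the local DNS cache (`ipconfig /flushdns` on Windows, or `sudo resolvectl flush-caches` on Linux with systemd-resolved; older releases use `sudo systemd-resolve --flush-caches`).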
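The verification above can be scripted so that it reports whether the answer is an internal or a public address. This is a minimal sketch, assuming `dig` is installed on the client; `is_private_ipv4` and `check_split_horizon` are hypothetical helper names, not part of any tool.

```bash
#!/usr/bin/env bash
# Return 0 if the IPv4 address falls in an RFC 1918 private range.
is_private_ipv4() {
  case "$1" in
    10.*|192.168.*) return 0 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[0-1].*) return 0 ;;
    *) return 1 ;;
  esac
}

# Resolve a name with the system resolver and classify the answer.
check_split_horizon() {
  local name="$1" ip
  # dig +short prints one record per line; the last line skips any CNAME hops.
  ip="$(dig +short "$name" A | tail -n 1)"
  if [ -z "$ip" ]; then
    echo "no answer for $name"
    return 2
  elif is_private_ipv4 "$ip"; then
    echo "$name -> $ip (internal answer)"
  else
    echo "$name -> $ip (public answer, check Pi-hole or flush the cache)"
    return 1
  fi
}
```

Run `check_split_horizon whoami.sterl.xyz` from a LAN client; a non-zero exit status means the client is not receiving the internal record.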


@@ -0,0 +1,203 @@
# n8n Troubleshooting Guide
## Connection Loss / Frequent Disconnects
### Problem
n8n UI shows "Connection Lost" errors and logs contain:
```
ValidationError: The 'X-Forwarded-For' header is set but the Express 'trust proxy' setting is false
```
### Root Cause
n8n runs behind the Traefik reverse proxy, which sets the `X-Forwarded-For` header, but n8n's Express app does not trust proxy headers by default. This breaks rate limiting and causes connection issues.
### Solution
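Add these environment variables to the n8n service configuration. `N8N_PROXY_HOPS` is the setting that addresses the `trust proxy` error: it tells n8n how many reverse proxies sit in front of it.
```yaml
environment:
  - N8N_PROXY_HOPS=1          # One reverse proxy (Traefik) in front of n8n
  - N8N_SECURE_COOKIE=false   # Allows the session cookie over plain HTTP; normally unnecessary when clients use HTTPS
  - N8N_METRICS=true          # Enable metrics endpoint
```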
**Status:** ✅ Fixed in current configuration
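To confirm the error has stopped after redeploying, the service logs can be filtered for the validation message. This is a small sketch; `count_trust_proxy_errors` is a hypothetical helper that reads log text on stdin and prints the number of matching lines.

```bash
#!/usr/bin/env bash
# Count stdin lines that contain the Express trust-proxy validation error.
# grep -c exits non-zero when the count is 0, so the status is normalised.
count_trust_proxy_errors() {
  grep -c "Express 'trust proxy' setting is false" || true
}
```

For example, `docker service logs n8n_n8n --since 10m 2>&1 | count_trust_proxy_errors` should print `0` once the proxy setting is in effect.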
---
## Deprecation Warnings
### 1. SQLite Pool Size
**Warning:**
```
Running SQLite without a pool of read connections is deprecated
```
**Fix:**
```yaml
- DB_SQLITE_POOL_SIZE=10
```
### 2. Task Runners
**Warning:**
```
Running n8n without task runners is deprecated
```
**Fix:**
```yaml
- N8N_RUNNERS_ENABLED=true
```
### 3. Environment Variable Access
**Warning:**
```
The default value of N8N_BLOCK_ENV_ACCESS_IN_NODE will change from false to true
```
**Fix:**
```yaml
- N8N_BLOCK_ENV_ACCESS_IN_NODE=false # Allow Code Node to access env vars
```
### 4. Git Node Bare Repos
**Warning:**
```
Support for bare repositories in the Git Node will be removed
```
**Fix:**
```yaml
- N8N_GIT_NODE_DISABLE_BARE_REPOS=true
```
**Status:** ✅ All fixed in current configuration
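A quick way to check that a stack file carries all four settings is to search it for each variable name. The sketch below assumes the variables are written in the `KEY=value` list form used in this guide; `missing_n8n_env` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Print every variable from this section that is absent from the given file.
# Returns 0 when all are present, 1 otherwise.
missing_n8n_env() {
  local file="$1" var missing=0
  for var in DB_SQLITE_POOL_SIZE N8N_RUNNERS_ENABLED \
             N8N_BLOCK_ENV_ACCESS_IN_NODE N8N_GIT_NODE_DISABLE_BARE_REPOS; do
    if ! grep -q -- "${var}=" "$file"; then
      echo "missing: ${var}"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `missing_n8n_env /workspace/homelab/services/swarm/stacks/n8n-stack.yml` prints nothing and returns 0 when every variable is set.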
---
## Resource Issues
### Out of Memory Errors
If n8n crashes or becomes unresponsive:
**Check current limits:**
```bash
docker service inspect n8n_n8n --format '{{json .Spec.TaskTemplate.Resources}}'
```
**Recommended settings:**
```yaml
resources:
  limits:
    memory: 2G    # Increased from 1G
    cpus: '1.0'   # Increased from 0.5
  reservations:
    memory: 256M
    cpus: '0.1'
```
---
## LM Studio Connection Issues
### Problem
Workflows fail to connect to LM Studio at `http://lm-studio:1234`
### Diagnostics
```bash
# Check if extra_hosts is configured (Swarm stores it as ContainerSpec.Hosts)
docker service inspect n8n_n8n --format '{{json .Spec.TaskTemplate.ContainerSpec.Hosts}}'
# Test from n8n container
docker exec $(docker ps -q -f name=n8n) curl http://lm-studio:1234/v1/models
# Test direct connection
curl http://192.168.1.81:1234/v1/models
```
### Solution
Ensure `extra_hosts` is configured in n8n-stack.yml:
```yaml
extra_hosts:
  - "lm-studio:192.168.1.81"
  - "fedora:192.168.1.81"
```
---
## Deployment
### Apply Configuration Changes
```bash
# Update the stack
docker stack deploy -c /workspace/homelab/services/swarm/stacks/n8n-stack.yml n8n
# Watch service update
docker service ps n8n_n8n --no-trunc
# Check logs
docker service logs n8n_n8n --tail 50 --follow
```
### Verify Health
```bash
# Check service status
docker service ls | grep n8n
# Test health endpoint
curl https://n8n.sj98.duckdns.org/healthz
# Check Traefik routing
docker service logs traefik_traefik --tail 20 | grep n8n
```
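Right after a deploy the health endpoint may not answer until the new task is running, so a single `curl` can fail spuriously. The following sketch retries a command a fixed number of times; `retry` is a hypothetical helper, and the attempt count and delay in the usage line are arbitrary assumptions.

```bash
#!/usr/bin/env bash
# Run a command until it succeeds, at most <attempts> times,
# sleeping <delay> seconds between tries.
retry() {
  local attempts="$1" delay="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    if ((i < attempts)); then sleep "$delay"; fi
  done
  return 1
}
```

For example, `retry 12 5 curl -fsS https://n8n.sj98.duckdns.org/healthz` polls the endpoint for up to about a minute and returns non-zero if it never responds successfully.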
---
## Common Workflow Issues
### Webhook Not Triggering
1. Ensure workflow is **active** (toggle in UI)
2. Check webhook URL format: `https://n8n.sj98.duckdns.org/webhook/<webhook-id>`
3. Verify Traefik routing in logs
4. Test with curl:
```bash
curl -X POST https://n8n.sj98.duckdns.org/webhook/health-check
```
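When the curl test fails, the HTTP status code usually narrows down which step in the list above is at fault. The mapping below is a rough sketch of likely causes, not an exhaustive diagnosis; `webhook_hint` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Map an HTTP status code from a webhook test to a likely cause.
webhook_hint() {
  case "$1" in
    2??) echo "ok: webhook reached n8n" ;;
    404) echo "not found: workflow inactive, wrong path, or no Traefik router matched" ;;
    502|503|504) echo "gateway error: Traefik could not reach the n8n service" ;;
    *) echo "unexpected status: $1" ;;
  esac
}
```

Usage: `webhook_hint "$(curl -s -o /dev/null -w '%{http_code}' -X POST https://n8n.sj98.duckdns.org/webhook/health-check)"`.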
### Execute Command Node Fails
Ensure Docker socket is mounted:
```yaml
volumes:
  - /var/run/docker.sock:/var/run/docker.sock
```
### AI Node Timeout
Increase the timeout in the HTTP Request node options. The value is in milliseconds, so `30000` is 30 seconds:
```json
{
  "timeout": 30000
}
```
---
## Monitoring
### Check n8n Metrics
If `N8N_METRICS=true` is set:
```bash
curl http://localhost:5678/metrics
```
### View Execution History
1. Open n8n UI
2. Go to "Executions"
3. Filter by "Failed" to see errors
4. Click execution to see detailed logs
### Resource Usage
```bash
# Container stats
docker stats $(docker ps -q -f name=n8n)
# Task state and node placement for the service
docker service ps n8n_n8n
```