Docs: Add homelab improvement guide and update README

2025-12-27 19:15:05 -06:00
parent f0c525d0df
commit cf360234c1
2 changed files with 83 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -12,6 +12,10 @@ A complete implementation plan for upgrading a home lab infrastructure with focu
 - Comprehensive monitoring
 - Automated backups

+## 💡 Homelab Improvement Guide
+
+For recommendations on how to improve the efficiency, reliability, and security of your homelab, please see the [Homelab Improvement Guide](./docs/guides/IMPROVEMENT_GUIDE.md).
+
 ## 🗂️ Repository Structure

 ```
--- a/docs/guides/IMPROVEMENT_GUIDE.md
+++ b/docs/guides/IMPROVEMENT_GUIDE.md
@@ -0,0 +1,79 @@
+# Homelab Improvement Guide
+
+This guide provides recommendations for improving the efficiency, reliability, and security of your homelab.
+
+## 1. High Availability
+
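+Several services in your current setup are pinned to a single node by placement constraints, which makes that node a single point of failure for all of them. To improve high availability, we recommend the following:
+
+*   **Remove Single-Node Constraints:** In your Docker Swarm stack files (`applications-stack.yml`, `monitoring-stack.yml`), remove the following placement constraints wherever a service does not genuinely need them:
+    *   `node.labels.leader == true`
+    *   `node.role == manager`
+
+    Keep `node.role == manager` on any service that manages the swarm through the Docker socket, because Swarm management API calls (listing services, nodes, or tasks) only succeed on manager nodes. Also note that a service storing data in a node-local volume will come up with an empty volume if Swarm reschedules it onto a different node; see **Stateful Services** below.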
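+*   **Replicate Services:** Increase the replica count for critical services to at least `2` so they stay available if a node goes down. Only do this for services that can safely run as several concurrent instances: a service that keeps its state in a node-local volume ends up with a separate, diverging copy of that state on each node. To stop Swarm from placing both replicas on the same node, also cap the number of replicas per node. For example, in your `applications-stack.yml`:
+
+    ```yaml
+    services:
+      paperless:
+        # ...
+        deploy:
+          replicas: 2
+          placement:
+            # Needs compose file format 3.8+; with fewer than two eligible nodes the second replica stays pending
+            max_replicas_per_node: 1
+          # ...
+    ```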
+
+*   **Stateful Services:** For stateful services like databases, consider the following options:
+    *   **Distributed Database:** Use a database designed for high availability, such as Galera Cluster for MySQL or Patroni for PostgreSQL.
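+    *   **Shared Storage:** Use a shared storage solution like NFS or GlusterFS that is accessible from all nodes in the swarm. Keep in mind that a single NFS server is itself a single point of failure, and that some databases (SQLite in particular) are unreliable on NFS because of how it handles file locking.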
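+
+A minimal sketch of the shared-storage option, assuming an NFS server that every node can reach. The server address `192.168.1.10`, the export path `/srv/nfs/paperless`, and the volume name `paperless_data` are hypothetical placeholders. The container path assumes the paperless-ngx image. Because the NFS options are part of the volume definition, Docker creates the same NFS-backed volume on whichever node the task lands on:
+
+```yaml
+services:
+  paperless:
+    # ...
+    volumes:
+      - paperless_data:/usr/src/paperless/data
+
+volumes:
+  paperless_data:
+    # The built-in local driver mounts the NFS export on each node that runs the task
+    driver: local
+    driver_opts:
+      type: nfs
+      o: "addr=192.168.1.10,rw,nfsvers=4"   # hypothetical server address
+      device: ":/srv/nfs/paperless"          # hypothetical export path
+```
+
+The export must allow access from every node's address, and file ownership on the share must match the user the container runs as.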
+
+## 2. Hardware Efficiency
+
+*   **Resource Limit Tuning:** Your current resource limits are a good starting point, but they can be tuned further. Use your monitoring stack (Prometheus and Grafana) to analyze the actual CPU and memory usage of each service over time, then adjust `deploy.resources.reservations` and `deploy.resources.limits` in your stack files to match. Oversized reservations waste schedulable capacity, because Swarm will not place a task on a node that cannot satisfy its reservation. Undersized limits get containers CPU-throttled or, for memory, killed.
+
+*   **Node Affinity:** If you have nodes with specific hardware (e.g., GPUs), label those nodes (for example, `docker node update --label-add gpu=true <node-name>`, run from a manager) and use placement constraints to schedule services on them. For example:
+
+    ```yaml
+    services:
+      jellyfin:
+        # ...
+        deploy:
+          placement:
+            constraints:
+              - node.labels.gpu == true
+    ```
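+
+For the resource-tuning step, reservations and limits sit under `deploy.resources` in each service. The figures below are placeholders, not recommendations; derive your own from the usage Prometheus records:
+
+```yaml
+services:
+  paperless:
+    # ...
+    deploy:
+      resources:
+        # Capacity Swarm sets aside when scheduling; the node must have this much free
+        reservations:
+          cpus: "0.25"     # placeholder value
+          memory: 256M     # placeholder value
+        # Hard ceiling at runtime: CPU is throttled, memory overuse triggers an OOM kill
+        limits:
+          cpus: "1.0"      # placeholder value
+          memory: 1G       # placeholder value
+```
+
+One common approach is to set the reservation near typical observed usage and the limit with some headroom above the observed peak.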
+
+## 3. Security
+
+*   **Secret Management:**
+    *   **Paperless Secret Key:** The `PAPERLESS_SECRET_KEY` in `applications-stack.yml` should be stored as a Docker secret.
+        1.  Create the secret:
+            ```bash
+            openssl rand -hex 32 | docker secret create paperless_secret_key -
+            ```
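+        2.  Update your `applications-stack.yml`: remove the plain `PAPERLESS_SECRET_KEY` variable, point `PAPERLESS_SECRET_KEY_FILE` at the mounted secret, and declare the secret at the top level of the file:
+            ```yaml
+            services:
+              paperless:
+                # ...
+                secrets:
+                  - paperless_secret_key
+                environment:
+                  # ...
+                  # List-style entries are KEY=value strings, not KEY: value mappings
+                  - PAPERLESS_SECRET_KEY_FILE=/run/secrets/paperless_secret_key
+
+            # Top-level declaration; "external" means it was created with `docker secret create`
+            secrets:
+              paperless_secret_key:
+                external: true
+            ```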
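+    *   **Backup Credentials:** The Backblaze B2 credentials in `backup_daily.sh` should also be stored as Docker secrets. Docker secrets are only available to Swarm services, so this works when the backup runs as a service rather than directly on a host. The script can then read the files under `/run/secrets/` and export their contents as `B2_ACCOUNT_ID` and `B2_ACCOUNT_KEY`, the environment variables restic uses for its B2 backend. If the script keeps running on a host, store the credentials in a file readable only by root instead of hard-coding them in the script.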
+
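+*   **Network Isolation:** Docker Swarm has no equivalent of Kubernetes network policies. Instead, restrict traffic between services by giving each group of services its own overlay network and attaching every service only to the networks it needs, so that, for example, a database is reachable only from the application that uses it. Traffic between nodes on an overlay network can additionally be encrypted with the `encrypted` driver option. This adds an extra layer of security to your homelab.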
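+
+In Swarm, a service can only reach services that share an overlay network with it, so isolation comes down to network membership. The fragment below is a minimal sketch of that idea. It keeps a hypothetical `paperless-db` service on an internal, encrypted backend network, while Paperless also joins a `frontend` network for your reverse proxy. The service and network names are assumptions; map them onto the services in your own stack files:
+
+```yaml
+services:
+  paperless:
+    # ...
+    networks:
+      - frontend
+      - paperless_backend
+
+  paperless-db:  # hypothetical database service
+    # ...
+    networks:
+      - paperless_backend  # not on frontend: only services on paperless_backend can reach it
+
+networks:
+  frontend:
+    driver: overlay
+  paperless_backend:
+    driver: overlay
+    # No route to the outside world from this network
+    internal: true
+    driver_opts:
+      # IPsec-encrypt traffic on this network between nodes
+      encrypted: "true"
+```
+
+Encryption costs some CPU, so it is worth reserving for networks that carry sensitive traffic.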
+
+## 4. Quality of Life
+
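+*   **Automated Backup Verification:** Extend your `backup_daily.sh` script with a step that automatically verifies the integrity of your backups. `restic check` verifies the repository's structure and metadata. Adding `--read-data-subset` (for example, `--read-data-subset=10%`) also reads and verifies a portion of the actual backup data on each run.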
+
+*   **Centralized Logging:** For easier log analysis, consider setting up a centralized logging solution such as the ELK stack (Elasticsearch, Logstash, Kibana) or Grafana Loki. Loki can use the Grafana instance you already run as its front end.
+
+*   **Documentation:** Create a diagram of your network architecture and service dependencies. This will make it easier to understand and troubleshoot your homelab.
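+
+If you choose Loki, one way to ship container logs is the Grafana Loki Docker logging driver. It must be installed as a plugin on every node (the Loki documentation gives the install command), and the sketch below assumes it was installed under the alias `loki`. The host name `loki.example.lan` is a hypothetical placeholder. The URL must be reachable from each node's Docker daemon, not only from inside an overlay network:
+
+```yaml
+# Reusable logging settings (extension fields need compose file format 3.4 or later)
+x-logging: &loki-logging
+  driver: loki
+  options:
+    # Loki's push endpoint; the host name is a hypothetical placeholder
+    loki-url: "http://loki.example.lan:3100/loki/api/v1/push"
+
+services:
+  paperless:
+    # ...
+    logging: *loki-logging
+```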
+
+## 5. `tsdproxy`
+
+*   **Review Configuration:** Running `tsdproxy` across a multi-host Docker Swarm can require extra configuration. Review your `tsdproxy` configuration to ensure it is working correctly, and check its logs for errors (with `docker service logs` if it runs as a Swarm service, or `docker logs` otherwise).
+*   **Consult Documentation:** If you encounter issues, consult the official `tsdproxy` documentation and GitHub issues for troubleshooting tips.