Tellsanguis a45c3734a0 Ajout plugin zoom image et combinaison affichage PNG avec téléchargement PDF

- Install docusaurus-plugin-image-zoom for interactive image zoom
- Display PNG diagrams with zoom capability
- Add PDF download links below each diagram
- Keep both PNG (for display) and PDF (for download) versions
- Update both French and English documentation pages

2025-11-25 17:13:03 +01:00

4.3 KiB

Raw Blame History

sidebar_position
2

First Version: Single-Machine "HA" Homelab (Initial Project)

:::info Full English translation coming soon. :::

Introduction

Important note: This page describes the initial project I had planned to experiment with Kubernetes. This project evolved into a different final decision: a 3-node Proxmox cluster (see 3-Node Proxmox Cluster).

The initial idea was to create a transitional step toward a complete distributed infrastructure, experimenting with Kubernetes (K3S), Infrastructure as Code (OpenTofu/Terraform), Git, and CI/CD pipelines, while remaining on a single physical machine.

Objectives

Practical Learning

This single-machine infrastructure allows acquiring hands-on experience with:

Kubernetes (K3S): Installation, configuration, and management
Infrastructure as Code: OpenTofu/Terraform for declarative infrastructure
GitOps and CI/CD: Automated deployments with Forgejo Actions
Observability: Prometheus, Grafana, Loki stack

Network Architecture

📥 Download PDF

What Can Be Learned

This single-machine infrastructure allows acquiring essential skills:

Kubernetes deployments and management
DevOps practices (IaC, GitOps, CI/CD)
Monitoring and logging
Automation with Ansible and OpenTofu

Limitations of This Approach

1. No Real High Availability (HA)

Main limitation: With a single machine, there is no redundancy:

Single point of failure (SPOF)
Maintenance requires downtime
No automatic failover

2. Distributed Storage Impossible to Test

Critical limitation: Distributed storage (Ceph, Linstor DRBD, Longhorn with replication) requires at least 3 nodes:

Ceph: Requires 3 nodes minimum (ideally 5+) for quorum and replication
Linstor DRBD: Needs multiple nodes for synchronous data replication
Longhorn (replication): Cannot replicate data to other nodes

3. Limited Scalability

Cannot add worker nodes to increase capacity
Hardware limitations of single machine
No experience with horizontal auto-scaling

4. Simplified Network

All pods on the same physical machine
Negligible network latency
No multi-node CNI complexity

5. No Realistic Failure Simulation

Cannot simulate node failure
No automatic failover testing
No disaster recovery validation

Why Start with Single Machine?

Despite limitations, this approach has significant advantages:

1. Cost and Simplicity

Reduced investment (no need to buy 3-5 servers immediately)
Lower power consumption
Simplified maintenance

2. Progressive Learning Curve

Manageable complexity
Simplified debugging
Less costly mistakes

3. Architecture Validation

Test which services work well on K8S
Optimize resource configurations
Identify incompatibilities before scaling

4. Preparation for Evolution

This version serves as a foundation for the complete cluster:

Reusable IaC code
Tested and validated Kubernetes manifests
Operational CI/CD pipelines

Evolution Toward Real Cluster

Once stabilized, evolution toward multi-node cluster becomes natural:

Minimum for functional HA cluster:

3 nodes (1 control plane + 2 workers, or 3 mixed nodes)
Gigabit network switch
Distributed storage (Ceph ideally requires 5 nodes)

Migration strategy:

Add second node to existing cluster
Test pod distribution between nodes
Add third node to enable HA
Deploy Ceph or Linstor for distributed storage
Migrate critical workloads with replication

Conclusion

This single-machine "HA" version is an essential pedagogical step before deploying a real Kubernetes cluster:

Positive points:

Learn Kubernetes without multi-node complexity
Validate architecture and configurations
Reduced cost and simplified maintenance
Solid foundation to evolve toward complete cluster

Assumed limitations:

No real high availability
Distributed storage impossible to test (Ceph, Linstor)
Limited scalability
No realistic failure simulation

This approach allows methodical progression toward a complete cloud-native infrastructure while mastering each step of the process.

:::note Detailed English translation of this page is in progress. :::

4.3 KiB Raw Blame History