Commit graph

9 commits

Author SHA1 Message Date
Tellsanguis
8e33ae0c2d fix(terraform): Use linstor_storage for production VMs
Some checks failed
CD - Deploy Infrastructure / Terraform Validation (push) Successful in 17s
CD - Deploy Infrastructure / Deploy on pve1 (push) Failing after 1m24s
CD - Deploy Infrastructure / Deploy on pve2 (push) Failing after 2m28s
CD - Deploy Infrastructure / Deploy on pve3 (push) Successful in 2m29s
CD - Deploy Infrastructure / Validate K3s Cluster (push) Has been skipped
CD - Deploy Infrastructure / Deployment Notification (push) Failing after 1s
Configuration stockage:
- k3s-server-1 (acemagician): linstor_storage (HA)
- k3s-server-2 (elitedesk): linstor_storage (HA)
- etcd-witness (thinkpad): local-lvm (pas de satellite LINSTOR)

Clone depuis templates locaux (VMID 10000/10001/10002) vers stockage cible.
2025-11-26 18:53:39 +01:00
Tellsanguis
67d46bceac fix(terraform): Use local-lvm storage for VM disks
Some checks failed
CD - Deploy Infrastructure / Terraform Validation (push) Successful in 16s
CD - Deploy Infrastructure / Deploy on pve1 (push) Failing after 2m23s
CD - Deploy Infrastructure / Deploy on pve2 (push) Failing after 2m35s
CD - Deploy Infrastructure / Deploy on pve3 (push) Failing after 10s
CD - Deploy Infrastructure / Validate K3s Cluster (push) Has been skipped
CD - Deploy Infrastructure / Deployment Notification (push) Failing after 1s
LINSTOR ne supporte pas le clonage de VMs (ni full ni linked clone).
Solution: utiliser local-lvm pour les disques des VMs.

- Ajout variables k3s_server_X_storage_pool avec default local-lvm
- Mise à jour du workflow deploy.yml
- Retour à full_clone = true
2025-11-26 18:04:58 +01:00
Tellsanguis
2ccccc5ce1 fix(terraform): Configure storage for LINSTOR cluster topology
Some checks failed
CD - Deploy Infrastructure / Terraform Validation (push) Failing after 5s
CD - Deploy Infrastructure / Deploy on pve1 (push) Has been skipped
CD - Deploy Infrastructure / Deploy on pve2 (push) Has been skipped
CD - Deploy Infrastructure / Deploy on pve3 (push) Has been skipped
CD - Deploy Infrastructure / Validate K3s Cluster (push) Has been skipped
CD - Deploy Infrastructure / Deployment Notification (push) Failing after 1s
Storage configuration changes:
- Add full_clone=true for better LINSTOR compatibility
- Add replicate=1 to disk config for k3s servers on LINSTOR
- Configure etcd-witness to use local-lvm storage on thinkpad
- Add etcd_witness_storage_pool variable with local-lvm default

Fixes:
- etcd-witness now uses local storage since thinkpad is LINSTOR controller only
- k3s-server-1 and k3s-server-2 use LINSTOR replicated storage on acemagician/elitedesk
- Explicit replication flag helps LINSTOR create resources correctly
2025-11-26 17:51:03 +01:00
Tellsanguis
912e27c30f fix(cd): Add OpenTofu setup step to all deployment jobs
Some checks failed
CD - Deploy Infrastructure / Terraform Validation (push) Successful in 17s
CD - Deploy Infrastructure / Deploy on pve1 (push) Failing after 9s
CD - Deploy Infrastructure / Deploy on pve2 (push) Failing after 8s
CD - Deploy Infrastructure / Deploy on pve3 (push) Failing after 8s
CD - Deploy Infrastructure / Validate K3s Cluster (push) Has been skipped
CD - Deploy Infrastructure / Deployment Notification (push) Failing after 1s
Deployment jobs were failing with 'tofu: command not found'. Added Setup OpenTofu step to deploy-pve1, deploy-pve2, and deploy-pve3 jobs.
2025-11-13 20:03:49 +01:00
Tellsanguis
aaedb0db3a fix(cd): Replace reusable workflow with inline CI jobs
Some checks failed
CD - Deploy Infrastructure / Terraform Validation (push) Successful in 19s
CD - Deploy Infrastructure / Deploy on pve1 (push) Failing after 2s
CD - Deploy Infrastructure / Deploy on pve2 (push) Failing after 2s
CD - Deploy Infrastructure / Deploy on pve3 (push) Failing after 3s
CD - Deploy Infrastructure / Validate K3s Cluster (push) Has been skipped
CD - Deploy Infrastructure / Deployment Notification (push) Failing after 1s
Forgejo does not fully support reusable workflows (uses:). Duplicated the Terraform validation job directly in the CD workflow to avoid the blocking state.
2025-11-13 20:00:53 +01:00
Tellsanguis
9103a64669 fix(ci): Rename secrets to avoid FORGEJO_ prefix restriction
Some checks failed
CD - Deploy Infrastructure / ci (push) Waiting to run
CD - Deploy Infrastructure / Deploy on pve1 (push) Blocked by required conditions
CD - Deploy Infrastructure / Deploy on pve2 (push) Blocked by required conditions
CD - Deploy Infrastructure / Deploy on pve3 (push) Blocked by required conditions
CD - Deploy Infrastructure / Validate K3s Cluster (push) Blocked by required conditions
CD - Deploy Infrastructure / Deployment Notification (push) Blocked by required conditions
CI - Validation / Terraform Validation (push) Successful in 1m4s
CI - Validation / Ansible Validation (push) Has been cancelled
CI - Validation / Kubernetes Validation (push) Has been cancelled
CI - Validation / Security Scan (push) Has been cancelled
Forgejo does not allow secret names starting with FORGEJO_. Renamed:
- FORGEJO_TOKEN -> GIT_TOKEN
- FORGEJO_REPO_URL -> GIT_REPO_URL
2025-11-13 19:41:46 +01:00
Tellsanguis
739854a371 feat(ci): Update deployment workflow for Ubuntu 24.04 and LINSTOR
Some checks failed
CD - Deploy Infrastructure / ci (push) Waiting to run
CD - Deploy Infrastructure / Deploy on pve1 (push) Blocked by required conditions
CD - Deploy Infrastructure / Deploy on pve2 (push) Blocked by required conditions
CD - Deploy Infrastructure / Deploy on pve3 (push) Blocked by required conditions
CD - Deploy Infrastructure / Validate K3s Cluster (push) Blocked by required conditions
CD - Deploy Infrastructure / Deployment Notification (push) Blocked by required conditions
CI - Validation / Terraform Validation (push) Failing after 1m4s
CI - Validation / Ansible Validation (push) Successful in 1m27s
CI - Validation / Kubernetes Validation (push) Successful in 8s
CI - Validation / Security Scan (push) Successful in 17s
Update all deployment jobs to use Ubuntu 24.04 LTS template and LINSTOR storage backend for improved reliability.
2025-11-13 19:06:25 +01:00
Tellsanguis
78d3a46d31 feat(ci): Add HA support for node failures
Some checks are pending
CI - Validation / Terraform Validation (push) Waiting to run
CI - Validation / Ansible Validation (push) Waiting to run
CI - Validation / Kubernetes Validation (push) Waiting to run
CI - Validation / Security Scan (push) Waiting to run
Modified CI/CD workflows to gracefully handle Proxmox node failures:

CI Workflow (ci.yml):
- Terraform Plan only runs on main branch (faster CI on feature branches)
- Plan failures on unavailable nodes don't block validation
- Added warning message when plan fails

Deploy Workflow (deploy.yml):
- Added continue-on-error to all deploy jobs (pve1, pve2, pve3)
- Modified cluster validation to require 2/3 nodes (quorum)
- Enhanced deployment summary with success counter
- Exit codes: 0 if >=2 nodes, 1 if 1 node, 1 if 0 nodes

This ensures the infrastructure remains operational even when one
Proxmox node is down, maintaining HA principles.
2025-11-07 11:32:42 +01:00
Tellsanguis
850045e7ed feat: Initial commit 2025-11-07 09:33:38 +01:00