Commit graph

8 commits

Author SHA1 Message Date
Tellsanguis
eb4c28e413 feat(cicd): Add automatic management of Linstor DRBD resources
Some checks failed
CD - Deploy Infrastructure / Terraform Validation (push) Failing after 5s
CD - Deploy Infrastructure / Deploy on pve1 (push) Has been skipped
CD - Deploy Infrastructure / Deploy on pve2 (push) Has been skipped
CD - Deploy Infrastructure / Deploy on pve3 (push) Has been skipped
CD - Deploy Infrastructure / Validate K3s Cluster (push) Has been skipped
CD - Deploy Infrastructure / Deployment Notification (push) Failing after 1s
- Create Python script to manage DRBD resources before deployment
  * Checks that the Linstor resources exist
  * Creates the resources with replication if needed
  * Grows the size if it is insufficient
  * Fixed names: pm-a7f3c8e1 (VMID 1000) and pm-b4d2f9a3 (VMID 1001)

- Modify CI/CD workflow to integrate the Python script
  * Add SSH configuration step with the LINSTOR_SSH_PRIVATE_KEY secret
  * Run the script before tofu apply on pve1 and pve2

- Fix Terraform configuration of the VMs
  * Add vga { type = "std" } for Standard VGA on all VMs
  * Add cpu { type = "host" } for better performance
  * Add replace_triggered_by to detect config changes
  * Add force_create = true on pve3 to handle the existing VM

- Resolve identified problems
  * "No Bootable Device" - Resolved with Standard VGA and CPU host
  * "vmId already in use" - Resolved with force_create on etcd-witness
  * VM modification detection - Resolved with replace_triggered_by

SSH documentation created in cicd_backup/SETUP_SSH_LINSTOR.md
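
The check / create / grow decision described in this commit message can be sketched roughly as follows. This is only an illustration, not the script from the repository: the names `DesiredResource`, `plan_action` and `plan_all` are hypothetical, and the 100 GiB target size is an assumed value, since the commit message does not state the real sizes. Only the resource names and VMIDs come from the message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesiredResource:
    name: str      # fixed DRBD resource name from the commit message
    vmid: int      # VMID the resource belongs to
    size_gib: int  # ASSUMPTION: target size, not given in the commit message


def plan_action(desired: DesiredResource, existing_sizes_gib: dict) -> str:
    """Decide what to do for one resource, given the sizes that already exist."""
    current = existing_sizes_gib.get(desired.name)
    if current is None:
        return "create"  # resource is missing: create it (with replication)
    if current < desired.size_gib:
        return "grow"    # resource exists but is too small: increase its size
    return "keep"        # resource exists and is large enough


def plan_all(desired_list, existing_sizes_gib: dict) -> dict:
    """Map every desired resource name to its planned action."""
    return {d.name: plan_action(d, existing_sizes_gib) for d in desired_list}


# Names and VMIDs from the commit message; sizes are hypothetical.
DESIRED = [
    DesiredResource("pm-a7f3c8e1", 1000, 100),
    DesiredResource("pm-b4d2f9a3", 1001, 100),
]

if __name__ == "__main__":
    # Hypothetical cluster state: one resource exists but is too small.
    for name, action in plan_all(DESIRED, {"pm-a7f3c8e1": 50}).items():
        print(name, action)
```

Keeping the decision separate from whatever actually talks to Linstor over SSH makes the logic testable without a cluster; how the real script queries and modifies resources is not shown here.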
2025-11-27 18:53:23 +01:00
Tellsanguis
42be2b3b6b fix(terraform): Configure cluster nodes and storage
Some checks failed
CD - Deploy Infrastructure / Terraform Validation (push) Successful in 17s
CD - Deploy Infrastructure / Deploy on pve1 (push) Failing after 1m4s
CD - Deploy Infrastructure / Deploy on pve2 (push) Failing after 2m26s
CD - Deploy Infrastructure / Deploy on pve3 (push) Failing after 1m47s
CD - Deploy Infrastructure / Validate K3s Cluster (push) Has been skipped
CD - Deploy Infrastructure / Deployment Notification (push) Failing after 1s
2025-11-26 19:33:19 +01:00
Tellsanguis
912e27c30f fix(cd): Add OpenTofu setup step to all deployment jobs
Some checks failed
CD - Deploy Infrastructure / Terraform Validation (push) Successful in 17s
CD - Deploy Infrastructure / Deploy on pve1 (push) Failing after 9s
CD - Deploy Infrastructure / Deploy on pve2 (push) Failing after 8s
CD - Deploy Infrastructure / Deploy on pve3 (push) Failing after 8s
CD - Deploy Infrastructure / Validate K3s Cluster (push) Has been skipped
CD - Deploy Infrastructure / Deployment Notification (push) Failing after 1s
Deployment jobs were failing with 'tofu: command not found'. Added Setup OpenTofu step to deploy-pve1, deploy-pve2, and deploy-pve3 jobs.
2025-11-13 20:03:49 +01:00
Tellsanguis
aaedb0db3a fix(cd): Replace reusable workflow with inline CI jobs
Some checks failed
CD - Deploy Infrastructure / Terraform Validation (push) Successful in 19s
CD - Deploy Infrastructure / Deploy on pve1 (push) Failing after 2s
CD - Deploy Infrastructure / Deploy on pve2 (push) Failing after 2s
CD - Deploy Infrastructure / Deploy on pve3 (push) Failing after 3s
CD - Deploy Infrastructure / Validate K3s Cluster (push) Has been skipped
CD - Deploy Infrastructure / Deployment Notification (push) Failing after 1s
Forgejo does not fully support reusable workflows (uses:). Duplicated the Terraform validation job directly in the CD workflow to avoid the blocking state.
2025-11-13 20:00:53 +01:00
Tellsanguis
9103a64669 fix(ci): Rename secrets to avoid FORGEJO_ prefix restriction
Some checks failed
CD - Deploy Infrastructure / ci (push) Waiting to run
CD - Deploy Infrastructure / Deploy on pve1 (push) Blocked by required conditions
CD - Deploy Infrastructure / Deploy on pve2 (push) Blocked by required conditions
CD - Deploy Infrastructure / Deploy on pve3 (push) Blocked by required conditions
CD - Deploy Infrastructure / Validate K3s Cluster (push) Blocked by required conditions
CD - Deploy Infrastructure / Deployment Notification (push) Blocked by required conditions
CI - Validation / Terraform Validation (push) Successful in 1m4s
CI - Validation / Ansible Validation (push) Has been cancelled
CI - Validation / Kubernetes Validation (push) Has been cancelled
CI - Validation / Security Scan (push) Has been cancelled
Forgejo does not allow secret names starting with FORGEJO_. Renamed:
- FORGEJO_TOKEN -> GIT_TOKEN
- FORGEJO_REPO_URL -> GIT_REPO_URL
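
A small pre-flight check along these lines could catch such names before they reach the workflow. It is a sketch under assumptions: `is_allowed_secret_name` and `rename_secret` are hypothetical helpers, and only the `FORGEJO_` prefix named in this commit message is checked; Forgejo may reject other names as well.

```python
# Prefix the commit message reports as rejected by Forgejo.
RESERVED_PREFIX = "FORGEJO_"

# Renames applied in this commit.
RENAMES = {
    "FORGEJO_TOKEN": "GIT_TOKEN",
    "FORGEJO_REPO_URL": "GIT_REPO_URL",
}


def is_allowed_secret_name(name: str) -> bool:
    """Return False for names that start with the reserved prefix."""
    return not name.startswith(RESERVED_PREFIX)


def rename_secret(name: str) -> str:
    """Return the replacement name, or the name unchanged if no rename applies."""
    return RENAMES.get(name, name)


if __name__ == "__main__":
    for old in RENAMES:
        new = rename_secret(old)
        print(old, "->", new, "allowed:", is_allowed_secret_name(new))
```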
2025-11-13 19:41:46 +01:00
Tellsanguis
739854a371 feat(ci): Update deployment workflow for Ubuntu 24.04 and LINSTOR
Some checks failed
CD - Deploy Infrastructure / ci (push) Waiting to run
CD - Deploy Infrastructure / Deploy on pve1 (push) Blocked by required conditions
CD - Deploy Infrastructure / Deploy on pve2 (push) Blocked by required conditions
CD - Deploy Infrastructure / Deploy on pve3 (push) Blocked by required conditions
CD - Deploy Infrastructure / Validate K3s Cluster (push) Blocked by required conditions
CD - Deploy Infrastructure / Deployment Notification (push) Blocked by required conditions
CI - Validation / Terraform Validation (push) Failing after 1m4s
CI - Validation / Ansible Validation (push) Successful in 1m27s
CI - Validation / Kubernetes Validation (push) Successful in 8s
CI - Validation / Security Scan (push) Successful in 17s
Update all deployment jobs to use Ubuntu 24.04 LTS template and LINSTOR storage backend for improved reliability.
2025-11-13 19:06:25 +01:00
Tellsanguis
78d3a46d31 feat(ci): Add HA support for node failures
Some checks are pending
CI - Validation / Terraform Validation (push) Waiting to run
CI - Validation / Ansible Validation (push) Waiting to run
CI - Validation / Kubernetes Validation (push) Waiting to run
CI - Validation / Security Scan (push) Waiting to run
Modified CI/CD workflows to gracefully handle Proxmox node failures:

CI Workflow (ci.yml):
- Terraform Plan only runs on main branch (faster CI on feature branches)
- Plan failures on unavailable nodes don't block validation
- Added warning message when plan fails

Deploy Workflow (deploy.yml):
- Added continue-on-error to all deploy jobs (pve1, pve2, pve3)
- Modified cluster validation to require 2/3 nodes (quorum)
- Enhanced deployment summary with success counter
- Exit codes: 0 if >=2 nodes succeed, 1 if only 1 or 0 nodes succeed

This ensures the infrastructure remains operational even when one
Proxmox node is down, maintaining HA principles.
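
The quorum rule from the deploy workflow (success when at least 2 of the 3 nodes deploy) can be illustrated with a short sketch. The function names `deployment_exit_code` and `deployment_summary` are hypothetical, and the actual workflow presumably implements this in its own job steps rather than in Python.

```python
def deployment_exit_code(results: dict, quorum: int = 2) -> int:
    """Return 0 when at least `quorum` nodes deployed successfully, else 1."""
    succeeded = sum(1 for ok in results.values() if ok)
    return 0 if succeeded >= quorum else 1


def deployment_summary(results: dict) -> str:
    """Build a one-line success counter for the deployment summary."""
    succeeded = sum(1 for ok in results.values() if ok)
    return f"{succeeded}/{len(results)} nodes deployed"


if __name__ == "__main__":
    # Hypothetical run in which pve3 is down.
    results = {"pve1": True, "pve2": True, "pve3": False}
    print(deployment_summary(results))
    raise SystemExit(deployment_exit_code(results))
```

With continue-on-error set on each deploy job, a single failed node no longer stops the pipeline, so a final check of this kind is what decides whether the run counts as a success.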
2025-11-07 11:32:42 +01:00
Tellsanguis
850045e7ed feat: Initial commit
2025-11-07 09:33:38 +01:00