feat(ci): Add HA support for node failures

Modified CI/CD workflows to gracefully handle Proxmox node failures:

CI Workflow (ci.yml):
- Terraform Plan only runs on main branch (faster CI on feature branches)
- Plan failures on unavailable nodes don't block validation
- Added warning message when plan fails

Deploy Workflow (deploy.yml):
- Added continue-on-error to all deploy jobs (pve1, pve2, pve3)
- Modified cluster validation to require 2/3 nodes (quorum)
- Enhanced deployment summary with success counter
- Exit codes: 0 if at least 2 nodes deploy successfully, 1 otherwise (1 or 0 nodes)
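
The quorum rule described above can be sketched in plain shell. This is a minimal illustration, not the code from deploy.yml: the function names `count_successes` and `check_quorum`, the literal result string `success`, and the summary wording are all assumptions made for the example.

```shell
#!/usr/bin/env bash
# Hypothetical quorum check; names and result strings are illustrative
# and are not taken from deploy.yml.

# count_successes RESULT...
# Prints how many of the arguments are exactly "success".
count_successes() {
  local ok=0 result
  for result in "$@"; do
    if [ "$result" = "success" ]; then
      ok=$((ok + 1))
    fi
  done
  echo "$ok"
}

# check_quorum RESULT...
# Prints a one-line summary and returns 0 when at least 2 results
# are "success", 1 otherwise.
check_quorum() {
  local required=2 ok
  ok=$(count_successes "$@")
  echo "Deployment summary: ${ok}/$# nodes succeeded"
  [ "$ok" -ge "$required" ]
}
```

Calling `check_quorum success success failure` returns 0, while `check_quorum success failure failure` returns 1. How the three per-node results are collected depends on the workflow and is left out here.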

This keeps validation and deployment working even when one
Proxmox node is down, in line with the cluster's HA design.
Tellsanguis 2025-11-07 11:32:42 +01:00
parent 495bf44ca5
commit 78d3a46d31
2 changed files with 38 additions and 8 deletions


@@ -35,7 +35,7 @@ jobs:
 done
 - name: Terraform Plan
-if: github.event_name == 'push'
+if: github.event_name == 'push' && github.ref == 'refs/heads/main'
 run: |
 for dir in terraform/pve*; do
 if [ -d "$dir" ]; then
@@ -44,7 +44,7 @@ jobs:
 cd "$dir" && \
 cp ../terraform.tfvars.example terraform.tfvars && \
 tofu init && \
-tofu plan -out="tfplan-$(basename $dir)"
+tofu plan -out="tfplan-$(basename $dir)" || echo "WARNING: Plan failed for $(basename $dir) - node may be unavailable"
 )
 fi
 done
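
One detail of the plan step is worth checking. In a shell list, `&&` and `||` have equal precedence and associate left to right, so the `|| echo` fallback is attached to the whole chain. The warning is therefore also printed when `cd`, `cp`, or `tofu init` fails, not only when `tofu plan` fails. If only plan failures should be tolerated, the steps could be separated roughly as below. This is a sketch, not part of the commit: the helper name `plan_dir` and the exit code 2 for setup failures are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper; plan_dir and the exit code 2 are illustrative choices.

# plan_dir DIR
# Runs init and plan in DIR inside a subshell. Setup failures exit with 2.
# A failed plan only prints the warning and is treated as success.
plan_dir() {
  local dir=$1 name
  name=$(basename "$dir")
  (
    cd "$dir" || exit 2
    cp ../terraform.tfvars.example terraform.tfvars || exit 2
    tofu init || exit 2
    tofu plan -out="tfplan-$name" || {
      echo "WARNING: Plan failed for $name - node may be unavailable"
      exit 0
    }
  )
}
```

A loop such as `for dir in terraform/pve*; do plan_dir "$dir" || exit 1; done` would then stop on a broken setup but continue past a node that cannot be planned.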