Modified CI/CD workflows to gracefully handle Proxmox node failures:
CI Workflow (ci.yml):
- Terraform Plan runs only on the main branch (faster CI on feature branches)
- Plan failures on unavailable nodes don't block validation
- Added warning message when plan fails
Deploy Workflow (deploy.yml):
- Added continue-on-error to all deploy jobs (pve1, pve2, pve3)
- Modified cluster validation to require 2/3 nodes (quorum)
- Enhanced deployment summary with success counter
- Exit codes: 0 if at least 2 nodes deploy successfully, 1 otherwise
This keeps deployments from being blocked when one Proxmox node is
down, in line with HA principles.
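The 2-of-3 quorum rule for the deploy workflow can be sketched as follows. This is a minimal Python illustration of the exit-code logic only, not the actual step in deploy.yml; the function name, the `REQUIRED_NODES` constant, and the shape of the `results` mapping are assumptions made for the example.

```python
REQUIRED_NODES = 2  # assumed quorum for a three-node cluster (2 of 3)


def deployment_exit_code(results, required=REQUIRED_NODES):
    """Return 0 when at least `required` nodes deployed successfully, else 1.

    `results` is a hypothetical mapping of node name -> True/False,
    where True means the deploy job for that node succeeded.
    """
    succeeded = sum(1 for ok in results.values() if ok)
    return 0 if succeeded >= required else 1


def deployment_summary(results):
    """Build a one-line success counter, e.g. for a job summary."""
    succeeded = sum(1 for ok in results.values() if ok)
    return f"{succeeded}/{len(results)} nodes deployed"
```

With pve3 down, `deployment_exit_code({"pve1": True, "pve2": True, "pve3": False})` returns 0, so the workflow would still pass. With only one node up it returns 1.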
Added VM configuration objects for all three nodes:
- k3s_server_1_config: 6 cores, 12GB RAM, 100G disk
- k3s_server_2_config: 6 cores, 12GB RAM, 100G disk
- etcd_witness_config: 2 cores, 2GB RAM, 20G disk
Removed undeclared 'management_bridge' variable that was causing
warnings in terraform plan.
This allows terraform plan to execute successfully in CI with
the example configuration file.
The v3.0 provider is not yet available as a stable release in the
OpenTofu/Terraform registry. Downgraded to v2.9, which is the latest
stable version.
Also fixed minor yamllint issues in flux.yml:
- Added space after comment marker
- Removed trailing blank line
Fixed yamllint errors and warnings across all Ansible files:
- Reformatted long lines to stay within the 80-character limit
- Standardized boolean values to use true/false instead of yes/no
- Fixed YAML folding syntax for multiline strings
- Removed erroneous triple quotes in k3s-server tasks
This resolves all yamllint issues reported by the CI pipeline.