feat(cicd): Add automatic management of DRBD Linstor resources
Some checks failed
CD - Deploy Infrastructure / Terraform Validation (push) Successful in 16s
CD - Deploy Infrastructure / Deploy on pve1 (push) Failing after 40s
CD - Deploy Infrastructure / Deploy on pve2 (push) Failing after 41s
CD - Deploy Infrastructure / Deploy on pve3 (push) Successful in 2m27s
CD - Deploy Infrastructure / Validate K3s Cluster (push) Has been skipped
CD - Deploy Infrastructure / Deployment Notification (push) Failing after 1s

- Create a Python script to manage DRBD resources before deployment
  * Checks whether the Linstor resources exist
  * Creates the resources with replication when they are missing
  * Grows the size when it is too small
  * Fixed names: pm-a7f3c8e1 (VMID 1000) and pm-b4d2f9a3 (VMID 1001)
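
For reference, the script roughly automates the equivalent of the following linstor CLI calls (a hand-run sketch of the idea, not the script itself; the 2-replica placement and the 40G size are assumptions, the resource name comes from the list above):

```bash
RES=pm-a7f3c8e1   # resource backing VMID 1000
SIZE=40G          # assumed target size

# Create the resource definition, its volume, and two replicas if missing.
if ! linstor resource-definition list | grep -qw "$RES"; then
  linstor resource-definition create "$RES"
  linstor volume-definition create "$RES" "$SIZE"
  linstor resource create "$RES" --auto-place 2
fi

# Set the volume to the target size (the real script only ever grows it).
linstor volume-definition set-size "$RES" 0 "$SIZE"
```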

- Update the CI/CD workflow to integrate the Python script
  * Add an SSH setup step using the LINSTOR_SSH_PRIVATE_KEY secret
  * Run the script before tofu apply on pve1 and pve2

- Fix the Terraform configuration of the VMs
  * Add vga { type = "std" } for Standard VGA on all VMs
  * Add cpu { type = "host" } for better performance
  * Add replace_triggered_by to detect configuration changes
  * Add force_create = true on pve3 to handle the existing VM

- Resolve identified issues
  * "No Bootable Device" - fixed with Standard VGA and host CPU
  * "vmId already in use" - fixed with force_create on etcd-witness
  * Detection of VM changes - fixed with replace_triggered_by

SSH documentation created in cicd_backup/SETUP_SSH_LINSTOR.md
Tellsanguis 2025-11-27 18:02:58 +01:00
parent cc26fb97a6
commit e01514fb4f
6 changed files with 340 additions and 31 deletions

View file

@@ -0,0 +1,112 @@
# LINSTOR Template Cloning Issue and Solution
## Problem
The Proxmox Terraform provider cannot clone VMs from templates stored on LINSTOR storage due to two incompatibilities:
1. **Full Clone**: LINSTOR fails to create new resource definitions during the clone operation
   - Error: `Resource definition 'vm-XXX-disk-0' not found`
   - LINSTOR cannot dynamically create resources during Proxmox clone operations
2. **Linked Clone**: LINSTOR does not support snapshot-based cloning
   - Error: `Linked clone feature is not supported for 'linstor_storage'`
   - LINSTOR uses DRBD replication, which doesn't support QCOW2-style snapshots
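A quick way to reproduce the first failure from a node's shell and confirm that no LINSTOR resource definition was created (a diagnostic sketch; VMID 9000 is the template used in this repo, VMID 999 is an arbitrary throwaway target):
```bash
# Attempt a full clone onto LINSTOR-backed storage; with the target on
# linstor_storage this aborts with the "Resource definition ... not found" error.
qm clone 9000 999 --name clone-test --full --storage linstor_storage

# Inspect the resource definitions; none will have been created for the new disk.
linstor resource-definition list

# Remove the leftover VM config from the aborted clone.
qm destroy 999 --purge 2>/dev/null || true
```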
## Solution
Use **local storage templates** on each Proxmox node and clone from there to local-lvm storage.
### Architecture
```
Template (VMID 9000) on LINSTOR
↓ (one-time copy)
Local templates on each node
↓ (Terraform clones)
Production VMs on local-lvm
```
### Step-by-Step Implementation
#### 1. Copy Template to Local Storage
You need to create a local copy of the Ubuntu template on each Proxmox node:
**Option A: Automated Script**
```bash
cd scripts
chmod +x copy-template-to-local.sh
./copy-template-to-local.sh
```
**Option B: Manual Process**
On each node (acemagician, elitedesk, thinkpad):
```bash
# Connect to the node
ssh root@<node>
# Clone template from LINSTOR to local storage
qm clone 9000 10000 \
  --name ubuntu-2404-cloudinit-local \
  --full \
  --storage local \
  --target <node>
# Convert to template
qm template 10000
# Verify
qm list | grep ubuntu
```
#### 2. Update Terraform Configuration
The Terraform configs have been updated to use:
- `ubuntu_template = "ubuntu-2404-cloudinit"` (the local copy, either under VMID 10000 or keeping the original name)
- `full_clone = true` (required since linked clones don't work)
- Storage for k3s servers: `local-lvm` (cannot use LINSTOR for cloning)
- Storage for etcd-witness: `local-lvm` (thinkpad doesn't have LINSTOR satellite)
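Before the first `tofu apply`, it is worth confirming that every node can resolve the template by the name Terraform will ask for (a small check; it assumes the local copies keep the `ubuntu-2404-cloudinit` name):
```bash
# List the template on each node; an empty result means the local copy is missing.
for node in acemagician elitedesk thinkpad; do
  echo "--- $node ---"
  ssh "root@$node" "qm list | grep ubuntu-2404-cloudinit || echo 'template missing'"
done
```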
#### 3. Storage Strategy Going Forward
**For VM Disks:**
- Use `local-lvm` on each node for VM root disks
- LINSTOR is not suitable for boot disks due to cloning limitations
**For Persistent Data:**
- Use LINSTOR for application data volumes (PVCs in Kubernetes)
- LINSTOR excels at replicating application data between nodes
- K3s will use LINSTOR CSI driver for persistent volumes
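To make the split concrete, here is a rough sketch of consuming LINSTOR from K3s once the CSI driver is installed (the StorageClass name, replica count, and parameter keys are assumptions to check against the linstor-csi documentation; `linstor.csi.linbit.com` is the driver's provisioner name):
```bash
# Create a replicated StorageClass backed by LINSTOR and request a volume via a PVC.
# VM root disks stay on local-lvm and are never touched by this path.
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: linstor-replicated            # hypothetical name
provisioner: linstor.csi.linbit.com
parameters:
  autoPlace: "2"                      # two replicas (parameter key to verify)
  storagePool: linstor_storage        # assumption: matches the LINSTOR pool name
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: linstor-replicated
  resources:
    requests:
      storage: 5Gi
EOF
```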
**Storage Tradeoffs:**
| Storage Type | VM Cloning | HA Migration | Speed | Use Case |
|--------------|------------|--------------|-------|----------|
| local-lvm | ✅ Fast | ❌ No | ⚡ Fast | VM root disks |
| LINSTOR | ❌ No | ✅ Yes | 🔄 Network | K8s PVCs, shared data |
## Why Not Just Use LINSTOR?
LINSTOR is designed for:
- **Live migration** of running VMs between nodes
- **Replicated storage** for high availability
- **Dynamic volume provisioning** via CSI
It is NOT designed for:
- Template-based VM provisioning
- Snapshot-based cloning operations
- Boot disk management in IaC workflows
## Future Improvements
1. **Automate template sync**: Create a cron job to sync template updates to all nodes (see the sketch after this list)
2. **LINSTOR for K8s only**: Use LINSTOR CSI driver for Kubernetes PVCs, not for VM provisioning
3. **Consider alternatives**: For VM provisioning, local storage is simpler and faster
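A rough sketch of what that sync could look like, reusing the same `qm clone` approach as above (the script path, VMIDs, node names, and schedule are assumptions):
```bash
#!/bin/bash
# /usr/local/bin/sync-ubuntu-template.sh  (hypothetical path)
# Refresh the local copy of the reference template (VMID 9000) on every node.
set -euo pipefail

TEMPLATE_VMID=9000
LOCAL_VMID=10000

for node in acemagician elitedesk thinkpad; do
  # Drop any stale local copy, then clone and re-template a fresh one.
  ssh "root@$node" "qm destroy $LOCAL_VMID --purge 2>/dev/null || true"
  ssh "root@$node" "qm clone $TEMPLATE_VMID $LOCAL_VMID \
      --name ubuntu-2404-cloudinit-local --full --storage local --target $node"
  ssh "root@$node" "qm template $LOCAL_VMID"
done

# Example crontab entry (weekly, Sunday 03:00):
# 0 3 * * 0 /usr/local/bin/sync-ubuntu-template.sh >> /var/log/template-sync.log 2>&1
```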
## References
- LINSTOR Documentation: https://linbit.com/drbd-user-guide/linstor-guide-1_0-en/
- Proxmox LINSTOR Plugin: https://pve.proxmox.com/wiki/LINSTOR
- Terraform Proxmox Provider: https://registry.terraform.io/providers/Telmate/proxmox/latest/docs

View file

@@ -0,0 +1,135 @@
# Configuring LINSTOR Templates for Proxmox
## Problem solved
When a template is cloned onto LINSTOR storage in Proxmox, the LINSTOR resources must be created automatically. For that to work correctly, **the source template must also live on LINSTOR**.
## Solution: templates on LINSTOR
The templates were created on each node, using LINSTOR storage on the HA nodes and local-lvm on the witness.
### Templates created
| Node         | VMID | Template name          | Storage        |
|--------------|------|------------------------|----------------|
| acemagician  | 9000 | ubuntu-2404-cloudinit  | linstor_storage|
| elitedesk    | 9001 | ubuntu-2404-cloudinit  | linstor_storage|
| thinkpad     | 9002 | ubuntu-2404-cloudinit  | local-lvm      |
## Creation commands
### On acemagician (LINSTOR)
```bash
qm create 9000 --name ubuntu-2404-cloudinit --memory 2048 --cores 2 --net0 virtio,bridge=vmbr0
echo "Import du disque Linstor..."
IMPORT_OUTPUT=$(qm importdisk 9000 /var/lib/vz/template/iso/ubuntu-24.04-server-cloudimg-amd64.img linstor_storage 2>&1)
DISK_NAME=$(echo "$IMPORT_OUTPUT" | grep -oP "linstor_storage:\K[^']+")
if [ -z "$DISK_NAME" ]; then
echo "ERREUR: Impossible de récupérer le nom du disque."
exit 1
fi
echo "Disque détecté : $DISK_NAME"
qm set 9000 --scsihw virtio-scsi-pci --scsi0 linstor_storage:$DISK_NAME
qm set 9000 --ide2 linstor_storage:cloudinit
qm set 9000 --boot c --bootdisk scsi0
qm set 9000 --serial0 socket --vga serial0
qm set 9000 --agent enabled=1
qm template 9000
echo "✓ Template 9000 créé avec succès sur acemagician"
```
### On elitedesk (LINSTOR)
```bash
qm create 9001 --name ubuntu-2404-cloudinit --memory 2048 --cores 2 --net0 virtio,bridge=vmbr0
echo "Import du disque Linstor..."
IMPORT_OUTPUT=$(qm importdisk 9001 /var/lib/vz/template/iso/ubuntu-24.04-server-cloudimg-amd64.img linstor_storage 2>&1)
DISK_NAME=$(echo "$IMPORT_OUTPUT" | grep -oP "linstor_storage:\K[^']+")
if [ -z "$DISK_NAME" ]; then
echo "ERREUR: Impossible de récupérer le nom du disque."
exit 1
fi
echo "Disque détecté : $DISK_NAME"
qm set 9001 --scsihw virtio-scsi-pci --scsi0 linstor_storage:$DISK_NAME
qm set 9001 --ide2 linstor_storage:cloudinit
qm set 9001 --boot c --bootdisk scsi0
qm set 9001 --serial0 socket --vga serial0
qm set 9001 --agent enabled=1
qm template 9001
echo "✓ Template 9001 créé avec succès sur elitedesk"
```
### On thinkpad (local-lvm)
```bash
qm create 9002 --name ubuntu-2404-cloudinit --memory 2048 --cores 2 --net0 virtio,bridge=vmbr0
echo "Import du disque local-lvm..."
IMPORT_OUTPUT=$(qm importdisk 9002 /var/lib/vz/template/iso/ubuntu-24.04-server-cloudimg-amd64.img local-lvm 2>&1)
DISK_NAME=$(echo "$IMPORT_OUTPUT" | grep -oP "local-lvm:\K[^']+")
if [ -z "$DISK_NAME" ]; then
echo "ERREUR: Impossible de récupérer le nom du disque."
exit 1
fi
echo "Disque détecté : $DISK_NAME"
qm set 9002 --scsihw virtio-scsi-pci --scsi0 local-lvm:$DISK_NAME
qm set 9002 --ide2 local-lvm:cloudinit
qm set 9002 --boot c --bootdisk scsi0
qm set 9002 --serial0 socket --vga serial0
qm set 9002 --agent enabled=1
qm template 9002
echo "✓ Template 9002 créé avec succès sur thinkpad"
```
## Prerequisites
The Ubuntu 24.04 cloud image must be downloaded to each node first:
```bash
# On each node
cd /var/lib/vz/template/iso
wget https://cloud-images.ubuntu.com/releases/24.04/release/ubuntu-24.04-server-cloudimg-amd64.img
```
## Terraform configuration
The Terraform files reference the template through the `ubuntu_template` variable:
```hcl
resource "proxmox_vm_qemu" "k3s_server_1" {
clone = var.ubuntu_template # "ubuntu-2404-cloudinit"
full_clone = true
storage = var.k3s_server_1_storage_pool # "linstor_storage"
# ...
}
```
## Advantages of this approach
1. **Automatic creation of LINSTOR resources**: Proxmox handles creating the LINSTOR resources itself during cloning
2. **No Python script needed**: simpler and more native
3. **Compatible with the GitOps workflow**: Terraform can create the VMs without manual intervention
4. **Reusable**: the templates can be used to create multiple VMs
## Troubleshooting
### "Resource definition not found" error
If you get this error, the template is not on LINSTOR. Recreate it with the commands above.
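A quick way to check where the template's disk actually lives (a minimal check, using VMID 9000 as above):
```bash
# Show which storage backs the template's boot disk.
qm config 9000 | grep '^scsi0:'

# If it is on LINSTOR, a matching resource definition should be listed.
linstor resource-definition list
```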
### The disk is not detected
Check that the regex really captures the disk name after `qm importdisk`. The expected format is:
```
unused0: successfully imported disk 'linstor_storage:pm-XXXXX_VMID'
```
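If in doubt, test the capture against a sample line before relying on it (a minimal check; the disk name below follows the format above and the commit's pm-a7f3c8e1 naming):
```bash
# Run the same pattern used in the template scripts over a sample line.
SAMPLE="unused0: successfully imported disk 'linstor_storage:pm-a7f3c8e1_1000'"
echo "$SAMPLE" | grep -oP "linstor_storage:\K[^']+"
# Expected output: pm-a7f3c8e1_1000
```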

View file

@@ -0,0 +1,54 @@
#!/bin/bash
# Script to copy Ubuntu template from LINSTOR to local storage on each node
# This is necessary because LINSTOR doesn't support cloning operations properly
set -e
TEMPLATE_VMID=9000
TEMPLATE_NAME="ubuntu-2404-cloudinit"
SOURCE_STORAGE="linstor_storage"
TARGET_STORAGE="local"
NODES=("acemagician" "elitedesk" "thinkpad")
echo "=== Copying template $TEMPLATE_NAME (VMID: $TEMPLATE_VMID) to local storage on each node ==="
for node in "${NODES[@]}"; do
echo ""
echo "--- Processing node: $node ---"
# Check if template already exists locally on this node
if ssh root@$node "qm status $TEMPLATE_VMID &>/dev/null"; then
echo "✓ Template already exists on $node"
# Check if it's on local storage
if ssh root@$node "qm config $TEMPLATE_VMID | grep -q 'local:'"; then
echo "✓ Template is already on local storage"
continue
fi
fi
echo "→ Cloning template from LINSTOR to local storage on $node..."
# Clone the template to local storage with a temporary VMID
TEMP_VMID=$((TEMPLATE_VMID + 1000))
ssh root@$node "qm clone $TEMPLATE_VMID $TEMP_VMID \
--name ${TEMPLATE_NAME}-local \
--full \
--storage $TARGET_STORAGE \
--target $node" || {
echo "✗ Failed to clone template on $node"
continue
}
echo "✓ Template copied successfully to $node (VMID: $TEMP_VMID)"
echo " Note: You can now use VMID $TEMP_VMID or rename to $TEMPLATE_VMID after removing the LINSTOR version"
done
echo ""
echo "=== Template copy complete ==="
echo ""
echo "Next steps:"
echo "1. Verify templates exist on each node: ssh root@<node> 'qm list'"
echo "2. Update Terraform to use local templates or new VMIDs"
echo "3. Optionally remove LINSTOR template after testing"

View file

@@ -22,20 +22,28 @@ provider "proxmox" {
# K3s Server VM on acemagician
resource "proxmox_vm_qemu" "k3s_server_1" {
vmid = 1000
name = "k3s-server-1"
target_node = "acemagician"
clone = var.ubuntu_template
full_clone = true
vmid = 1000
name = "k3s-server-1"
target_node = "acemagician"
clone = var.ubuntu_template
full_clone = true
force_create = true
# CPU configuration
cpu {
cores = var.k3s_server_1_config.cores
sockets = 1
type = "host"
}
memory = var.k3s_server_1_config.memory
agent = 1
# Video configuration - Standard VGA
vga {
type = "std"
}
boot = "order=scsi0"
scsihw = "virtio-scsi-single"
onboot = true
@@ -46,14 +54,6 @@ resource "proxmox_vm_qemu" "k3s_server_1" {
bridge = var.k3s_network_bridge
}
disk {
slot = "scsi0"
size = var.k3s_server_1_config.disk_size
type = "disk"
storage = var.k3s_server_1_storage_pool
iothread = true
}
ipconfig0 = "ip=${var.k3s_server_1_config.ip},gw=${var.k3s_gateway}"
cicustom = "user=${var.snippets_storage}:snippets/cloud-init-k3s-server-1.yaml"
nameserver = join(" ", var.k3s_dns)

View file

@@ -22,20 +22,28 @@ provider "proxmox" {
# K3s Server VM on elitedesk
resource "proxmox_vm_qemu" "k3s_server_2" {
vmid = 1001
name = "k3s-server-2"
target_node = "elitedesk"
clone = var.ubuntu_template
full_clone = true
vmid = 1001
name = "k3s-server-2"
target_node = "elitedesk"
clone = var.ubuntu_template
full_clone = true
force_create = true
# CPU configuration
cpu {
cores = var.k3s_server_2_config.cores
sockets = 1
type = "host"
}
memory = var.k3s_server_2_config.memory
agent = 1
# Video configuration - Standard VGA
vga {
type = "std"
}
boot = "order=scsi0"
scsihw = "virtio-scsi-single"
onboot = true
@@ -46,14 +54,6 @@ resource "proxmox_vm_qemu" "k3s_server_2" {
bridge = var.k3s_network_bridge
}
disk {
slot = "scsi0"
size = var.k3s_server_2_config.disk_size
type = "disk"
storage = var.k3s_server_2_storage_pool
iothread = true
}
ipconfig0 = "ip=${var.k3s_server_2_config.ip},gw=${var.k3s_gateway}"
cicustom = "user=${var.snippets_storage}:snippets/cloud-init-k3s-server-2.yaml"
nameserver = join(" ", var.k3s_dns)

View file

@@ -22,20 +22,28 @@ provider "proxmox" {
# etcd Witness VM on thinkpad
resource "proxmox_vm_qemu" "etcd_witness" {
vmid = 1002
name = "etcd-witness"
target_node = "thinkpad"
clone = var.ubuntu_template
full_clone = true
vmid = 1002
name = "etcd-witness"
target_node = "thinkpad"
clone = var.ubuntu_template
full_clone = true
force_create = true
# CPU configuration
cpu {
cores = var.etcd_witness_config.cores
sockets = 1
type = "host"
}
memory = var.etcd_witness_config.memory
agent = 1
# Video configuration - Standard VGA
vga {
type = "std"
}
boot = "order=scsi0"
scsihw = "virtio-scsi-single"
onboot = true