feat(cicd): Add automatic management of DRBD Linstor resources
Some checks failed
CD - Deploy Infrastructure / Terraform Validation (push) Failing after 8s
CD - Deploy Infrastructure / Deploy on pve1 (push) Has been skipped
CD - Deploy Infrastructure / Deploy on pve2 (push) Has been skipped
CD - Deploy Infrastructure / Deploy on pve3 (push) Has been skipped
CD - Deploy Infrastructure / Validate K3s Cluster (push) Has been skipped
CD - Deploy Infrastructure / Deployment Notification (push) Failing after 1s
- Create a Python script to manage DRBD resources before deployment
  * Checks whether the Linstor resources exist
  * Creates the resources with replication if needed
  * Grows a resource when its size is insufficient
  * Fixed names: pm-a7f3c8e1 (VMID 1000) and pm-b4d2f9a3 (VMID 1001)
- Update the CI/CD workflow to integrate the Python script
  * Add an SSH setup step using the LINSTOR_SSH_PRIVATE_KEY secret
  * Run the script before tofu apply on pve1 and pve2
- Fix the Terraform configuration of the VMs
  * Add vga { type = "std" } for Standard VGA on all VMs
  * Add cpu { type = "host" } for better performance
  * Add replace_triggered_by to detect configuration changes
  * Add force_create = true on pve3 to handle the existing VM
- Resolve identified issues
  * "No Bootable Device" - fixed with Standard VGA and CPU host
  * "vmId already in use" - fixed with force_create on etcd-witness
  * VM change detection - fixed with replace_triggered_by
SSH documentation created in cicd_backup/SETUP_SSH_LINSTOR.md
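
The check / create / grow behaviour listed for the Python script reduces to a small decision table. The following is a minimal sketch only: `plan_action` is a hypothetical helper that is not part of the committed script, sizes are in GiB, and `None` stands for a resource that does not exist yet.

```python
from typing import Optional


def plan_action(current_gib: Optional[int], desired_gib: int) -> str:
    """Return the action to take for one DRBD resource (hypothetical helper)."""
    if current_gib is None:
        return "create"   # resource missing: create it with replication
    if current_gib < desired_gib:
        return "resize"   # resource too small: grow it
    return "keep"         # equal or larger: shrinking is never attempted


if __name__ == "__main__":
    for current in (None, 50, 100, 200):
        print(current, "->", plan_action(current, 100))
```

Keeping the decision separate from the SSH calls makes it possible to test the rules without access to a Linstor controller.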
parent cc26fb97a6
commit 078af30c7b
8 changed files with 767 additions and 31 deletions
@@ -81,6 +81,15 @@ jobs:
           if ! command -v tofu &> /dev/null; then
             curl -fsSL https://get.opentofu.org/install-opentofu.sh | bash -s -- --install-method standalone --opentofu-version 1.10.7
           fi
+      - name: Setup SSH key for Linstor management
+        run: |
+          mkdir -p ~/.ssh
+          echo "${{ secrets.LINSTOR_SSH_PRIVATE_KEY }}" > ~/.ssh/id_rsa_linstor
+          chmod 600 ~/.ssh/id_rsa_linstor
+      - name: Manage DRBD Linstor resource for k3s-server-1
+        run: |
+          python3 scripts/manage_linstor_resources.py --vmid 1000 --size 100 --ssh-key ~/.ssh/id_rsa_linstor
+        continue-on-error: false
       - name: Terraform Apply on pve1
         run: |
           cd terraform/pve1

@@ -116,6 +125,15 @@ jobs:
           if ! command -v tofu &> /dev/null; then
             curl -fsSL https://get.opentofu.org/install-opentofu.sh | bash -s -- --install-method standalone --opentofu-version 1.10.7
           fi
+      - name: Setup SSH key for Linstor management
+        run: |
+          mkdir -p ~/.ssh
+          echo "${{ secrets.LINSTOR_SSH_PRIVATE_KEY }}" > ~/.ssh/id_rsa_linstor
+          chmod 600 ~/.ssh/id_rsa_linstor
+      - name: Manage DRBD Linstor resource for k3s-server-2
+        run: |
+          python3 scripts/manage_linstor_resources.py --vmid 1001 --size 100 --ssh-key ~/.ssh/id_rsa_linstor
+        continue-on-error: false
       - name: Terraform Apply on pve2
         run: |
           cd terraform/pve2

112
docs/LINSTOR_TEMPLATE_ISSUE.md
Normal file
@@ -0,0 +1,112 @@
# LINSTOR Template Cloning Issue and Solution

## Problem

The Proxmox Terraform provider cannot clone VMs from templates stored on LINSTOR storage due to two incompatibilities:

1. **Full Clone**: LINSTOR fails to create new resource definitions during the clone operation
   - Error: `Resource definition 'vm-XXX-disk-0' not found`
   - LINSTOR cannot dynamically create resources during Proxmox clone operations

2. **Linked Clone**: LINSTOR does not support snapshot-based cloning
   - Error: `Linked clone feature is not supported for 'linstor_storage'`
   - LINSTOR uses DRBD replication, which doesn't support QCOW2-style snapshots

## Solution

Use **local storage templates** on each Proxmox node and clone from there to local-lvm storage.

### Architecture

```
Template (VMID 9000) on LINSTOR
    ↓ (one-time copy)
Local templates on each node
    ↓ (Terraform clones)
Production VMs on local-lvm
```

### Step-by-Step Implementation

#### 1. Copy Template to Local Storage

You need to create a local copy of the Ubuntu template on each Proxmox node:

**Option A: Automated Script**
```bash
cd scripts
chmod +x copy-template-to-local.sh
./copy-template-to-local.sh
```

**Option B: Manual Process**

On each node (acemagician, elitedesk, thinkpad):

```bash
# Connect to the node
ssh root@<node>

# Clone template from LINSTOR to local storage
qm clone 9000 10000 \
  --name ubuntu-2404-cloudinit-local \
  --full \
  --storage local \
  --target <node>

# Convert to template
qm template 10000

# Verify
qm list | grep ubuntu
```

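The manual process repeats one `qm clone` call per node with only the node name changing. As a rough sketch, the argument list can be generated instead of retyped; `build_clone_command` is a hypothetical helper (not part of this commit), and the example only prints the commands rather than running anything on the nodes.

```python
import shlex
from typing import List

NODES = ["acemagician", "elitedesk", "thinkpad"]  # node names from this document


def build_clone_command(node: str, source_vmid: int = 9000, target_vmid: int = 10000,
                        name: str = "ubuntu-2404-cloudinit-local",
                        storage: str = "local") -> List[str]:
    """Return the argv for the `qm clone` call of the manual process (hypothetical helper)."""
    return [
        "qm", "clone", str(source_vmid), str(target_vmid),
        "--name", name,
        "--full",
        "--storage", storage,
        "--target", node,
    ]


if __name__ == "__main__":
    for node in NODES:
        # Print only; nothing is executed on the nodes.
        remote = shlex.join(build_clone_command(node))
        print(f"ssh root@{node} {shlex.quote(remote)}")
```

Building the command as a list and quoting it once keeps node names and template names from being re-split by the remote shell.
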
#### 2. Update Terraform Configuration

The Terraform configs have been updated to use:
- `ubuntu_template = "ubuntu-2404-cloudinit"` (the local copy with VMID 10000, or the original name if it was kept)
- `full_clone = true` (required since linked clones don't work)
- Storage for k3s servers: `local-lvm` (cannot use LINSTOR for cloning)
- Storage for etcd-witness: `local-lvm` (thinkpad doesn't have a LINSTOR satellite)

#### 3. Storage Strategy Going Forward

**For VM Disks:**
- Use `local-lvm` on each node for VM root disks
- LINSTOR is not suitable for boot disks due to cloning limitations

**For Persistent Data:**
- Use LINSTOR for application data volumes (PVCs in Kubernetes)
- LINSTOR excels at replicating application data between nodes
- K3s will use the LINSTOR CSI driver for persistent volumes

**Storage Tradeoffs:**

| Storage Type | VM Cloning | HA Migration | Speed | Use Case |
|--------------|------------|--------------|-------|----------|
| local-lvm | ✅ Fast | ❌ No | ⚡ Fast | VM root disks |
| LINSTOR | ❌ No | ✅ Yes | 🔄 Network | K8s PVCs, shared data |

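The tradeoff table can be read as a lookup: given what a disk needs, which storage type qualifies. A minimal sketch, assuming the hypothetical names `StorageProfile` and `pick_storage`; the capability values are transcribed from the table and nothing else is implied about either backend.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StorageProfile:
    vm_cloning: bool
    ha_migration: bool
    use_case: str


# Values transcribed from the storage tradeoff table in this document.
STORAGE = {
    "local-lvm": StorageProfile(True, False, "VM root disks"),
    "LINSTOR": StorageProfile(False, True, "K8s PVCs, shared data"),
}


def pick_storage(needs_cloning: bool, needs_ha: bool) -> List[str]:
    """Return the storage types that satisfy both requirements (hypothetical helper)."""
    return [
        name for name, profile in STORAGE.items()
        if (profile.vm_cloning or not needs_cloning)
        and (profile.ha_migration or not needs_ha)
    ]


if __name__ == "__main__":
    print(pick_storage(needs_cloning=True, needs_ha=False))
    print(pick_storage(needs_cloning=False, needs_ha=True))
    print(pick_storage(needs_cloning=True, needs_ha=True))
```

An empty result for the last call reflects the point of the table: neither storage type offers both template cloning and HA migration.
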
## Why Not Just Use LINSTOR?

LINSTOR is designed for:
- **Live migration** of running VMs between nodes
- **Replicated storage** for high availability
- **Dynamic volume provisioning** via CSI

It is NOT designed for:
- Template-based VM provisioning
- Snapshot-based cloning operations
- Boot disk management in IaC workflows

## Future Improvements

1. **Automate template sync**: Create a cron job to sync template updates to all nodes
2. **LINSTOR for K8s only**: Use LINSTOR CSI driver for Kubernetes PVCs, not for VM provisioning
3. **Consider alternatives**: For VM provisioning, local storage is simpler and faster

## References

- LINSTOR Documentation: https://linbit.com/drbd-user-guide/linstor-guide-1_0-en/
- Proxmox LINSTOR Plugin: https://pve.proxmox.com/wiki/LINSTOR
- Terraform Proxmox Provider: https://registry.terraform.io/providers/Telmate/proxmox/latest/docs
135
docs/LINSTOR_TEMPLATE_SETUP.md
Normal file
@@ -0,0 +1,135 @@
# LINSTOR Template Configuration for Proxmox

## Problem Solved

When a template is cloned to LINSTOR storage in Proxmox, the LINSTOR resources must be created automatically. For this to work correctly, **the source template must also be on LINSTOR**.

## Solution: Templates on LINSTOR

Templates were created on each node: on LINSTOR storage for the HA nodes, and on local-lvm for the witness.

### Templates Created

| Node         | VMID | Template Name          | Storage         |
|--------------|------|------------------------|-----------------|
| acemagician  | 9000 | ubuntu-2404-cloudinit  | linstor_storage |
| elitedesk    | 9001 | ubuntu-2404-cloudinit  | linstor_storage |
| thinkpad     | 9002 | ubuntu-2404-cloudinit  | local-lvm       |

## Creation Commands

### On acemagician (LINSTOR)
```bash
qm create 9000 --name ubuntu-2404-cloudinit --memory 2048 --cores 2 --net0 virtio,bridge=vmbr0

echo "Importing the Linstor disk..."
IMPORT_OUTPUT=$(qm importdisk 9000 /var/lib/vz/template/iso/ubuntu-24.04-server-cloudimg-amd64.img linstor_storage 2>&1)
DISK_NAME=$(echo "$IMPORT_OUTPUT" | grep -oP "linstor_storage:\K[^']+")

if [ -z "$DISK_NAME" ]; then
    echo "ERROR: Could not determine the disk name."
    exit 1
fi

echo "Disk detected: $DISK_NAME"

qm set 9000 --scsihw virtio-scsi-pci --scsi0 linstor_storage:$DISK_NAME
qm set 9000 --ide2 linstor_storage:cloudinit
qm set 9000 --boot c --bootdisk scsi0
qm set 9000 --serial0 socket --vga serial0
qm set 9000 --agent enabled=1

qm template 9000
echo "✓ Template 9000 created successfully on acemagician"
```

### On elitedesk (LINSTOR)
```bash
qm create 9001 --name ubuntu-2404-cloudinit --memory 2048 --cores 2 --net0 virtio,bridge=vmbr0

echo "Importing the Linstor disk..."
IMPORT_OUTPUT=$(qm importdisk 9001 /var/lib/vz/template/iso/ubuntu-24.04-server-cloudimg-amd64.img linstor_storage 2>&1)
DISK_NAME=$(echo "$IMPORT_OUTPUT" | grep -oP "linstor_storage:\K[^']+")

if [ -z "$DISK_NAME" ]; then
    echo "ERROR: Could not determine the disk name."
    exit 1
fi

echo "Disk detected: $DISK_NAME"

qm set 9001 --scsihw virtio-scsi-pci --scsi0 linstor_storage:$DISK_NAME
qm set 9001 --ide2 linstor_storage:cloudinit
qm set 9001 --boot c --bootdisk scsi0
qm set 9001 --serial0 socket --vga serial0
qm set 9001 --agent enabled=1

qm template 9001
echo "✓ Template 9001 created successfully on elitedesk"
```

### On thinkpad (local-lvm)
```bash
qm create 9002 --name ubuntu-2404-cloudinit --memory 2048 --cores 2 --net0 virtio,bridge=vmbr0

echo "Importing the local-lvm disk..."
IMPORT_OUTPUT=$(qm importdisk 9002 /var/lib/vz/template/iso/ubuntu-24.04-server-cloudimg-amd64.img local-lvm 2>&1)
DISK_NAME=$(echo "$IMPORT_OUTPUT" | grep -oP "local-lvm:\K[^']+")

if [ -z "$DISK_NAME" ]; then
    echo "ERROR: Could not determine the disk name."
    exit 1
fi

echo "Disk detected: $DISK_NAME"

qm set 9002 --scsihw virtio-scsi-pci --scsi0 local-lvm:$DISK_NAME
qm set 9002 --ide2 local-lvm:cloudinit
qm set 9002 --boot c --bootdisk scsi0
qm set 9002 --serial0 socket --vga serial0
qm set 9002 --agent enabled=1

qm template 9002
echo "✓ Template 9002 created successfully on thinkpad"
```

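The three blocks above differ only in VMID and storage name, so the post-import steps can be generated from the template table. This is a sketch under assumptions: `post_import_commands` is a hypothetical helper, it only formats the `qm` invocations already shown in this document, and the disk name must still come from the real `qm importdisk` output.

```python
from typing import List

# (node, VMID, storage) rows, copied from the template table in this document.
TEMPLATES = [
    ("acemagician", 9000, "linstor_storage"),
    ("elitedesk", 9001, "linstor_storage"),
    ("thinkpad", 9002, "local-lvm"),
]


def post_import_commands(vmid: int, storage: str, disk_name: str) -> List[str]:
    """Return the qm commands that follow `qm importdisk` for one template."""
    return [
        f"qm set {vmid} --scsihw virtio-scsi-pci --scsi0 {storage}:{disk_name}",
        f"qm set {vmid} --ide2 {storage}:cloudinit",
        f"qm set {vmid} --boot c --bootdisk scsi0",
        f"qm set {vmid} --serial0 socket --vga serial0",
        f"qm set {vmid} --agent enabled=1",
        f"qm template {vmid}",
    ]


if __name__ == "__main__":
    for node, vmid, storage in TEMPLATES:
        print(f"# {node}")
        # "<disk-name>" is a placeholder for the value reported by qm importdisk.
        for command in post_import_commands(vmid, storage, "<disk-name>"):
            print(command)
```
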
## Prerequisites

The Ubuntu 24.04 cloud image must be downloaded on each node:

```bash
# On each node
cd /var/lib/vz/template/iso
wget https://cloud-images.ubuntu.com/releases/24.04/release/ubuntu-24.04-server-cloudimg-amd64.img
```

## Terraform Configuration

The Terraform files reference the template through the `ubuntu_template` variable:

```hcl
resource "proxmox_vm_qemu" "k3s_server_1" {
  clone      = var.ubuntu_template  # "ubuntu-2404-cloudinit"
  full_clone = true
  storage    = var.k3s_server_1_storage_pool  # "linstor_storage"
  # ...
}
```

## Advantages of This Approach

1. **Automatic creation of LINSTOR resources**: Proxmox creates the LINSTOR resources automatically during cloning
2. **No Python script needed**: simpler and more native
3. **Compatible with the GitOps workflow**: Terraform can create the VMs without manual intervention
4. **Reusable**: the templates can be used to create multiple VMs

## Troubleshooting

### "Resource definition not found" error
If you get this error, the template is not on LINSTOR. Recreate it with the commands above.

### The disk is not detected
Check that the regex captures the disk name after `qm importdisk`. The expected format is:
```
unused0: successfully imported disk 'linstor_storage:pm-XXXXX_VMID'
```
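
The disk-name regex can be checked offline against the expected output line before touching a node. A minimal sketch, assuming a hypothetical `extract_disk_name` helper that mirrors the `grep -oP "<storage>:\K[^']+"` call used in the creation commands:

```python
import re
from typing import Optional


def extract_disk_name(import_output: str, storage: str) -> Optional[str]:
    """Return the volume name that follows '<storage>:' or None if it is absent."""
    # [^'\n]+ stops at the closing quote, and at end of line like grep does.
    match = re.search(re.escape(storage) + r":([^'\n]+)", import_output)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = "unused0: successfully imported disk 'linstor_storage:pm-XXXXX_VMID'"
    print(extract_disk_name(sample, "linstor_storage"))
    print(extract_disk_name(sample, "local-lvm"))
```

If the helper returns `None` for real `qm importdisk` output, the output format differs from the one documented here and the pattern needs adjusting.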
54
scripts/copy-template-to-local.sh
Normal file
@@ -0,0 +1,54 @@
#!/bin/bash
# Script to copy Ubuntu template from LINSTOR to local storage on each node
# This is necessary because LINSTOR doesn't support cloning operations properly

set -e

TEMPLATE_VMID=9000
TEMPLATE_NAME="ubuntu-2404-cloudinit"
SOURCE_STORAGE="linstor_storage"
TARGET_STORAGE="local"
NODES=("acemagician" "elitedesk" "thinkpad")

echo "=== Copying template $TEMPLATE_NAME (VMID: $TEMPLATE_VMID) to local storage on each node ==="

for node in "${NODES[@]}"; do
    echo ""
    echo "--- Processing node: $node ---"

    # Check if template already exists locally on this node
    if ssh root@$node "qm status $TEMPLATE_VMID &>/dev/null"; then
        echo "✓ Template already exists on $node"

        # Check if it's on local storage
        if ssh root@$node "qm config $TEMPLATE_VMID | grep -q 'local:'"; then
            echo "✓ Template is already on local storage"
            continue
        fi
    fi

    echo "→ Cloning template from LINSTOR to local storage on $node..."

    # Clone the template to local storage with a temporary VMID
    TEMP_VMID=$((TEMPLATE_VMID + 1000))

    ssh root@$node "qm clone $TEMPLATE_VMID $TEMP_VMID \
        --name ${TEMPLATE_NAME}-local \
        --full \
        --storage $TARGET_STORAGE \
        --target $node" || {
        echo "✗ Failed to clone template on $node"
        continue
    }

    echo "✓ Template copied successfully to $node (VMID: $TEMP_VMID)"
    echo " Note: You can now use VMID $TEMP_VMID or rename to $TEMPLATE_VMID after removing the LINSTOR version"
done

echo ""
echo "=== Template copy complete ==="
echo ""
echo "Next steps:"
echo "1. Verify templates exist on each node: ssh root@<node> 'qm list'"
echo "2. Update Terraform to use local templates or new VMIDs"
echo "3. Optionally remove LINSTOR template after testing"
403
scripts/manage_linstor_resources.py
Normal file
@@ -0,0 +1,403 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
DRBD Linstor resource management script for the K3s VMs.
Runs before the Terraform deployment to make sure the DRBD storage
resources are created and sized correctly.
"""

import subprocess
import sys
import json
import argparse
import os
from typing import Dict, Optional, Tuple


# Global SSH key (can be set through an environment variable)
SSH_KEY_PATH = os.environ.get("SSH_KEY_PATH", None)

# DRBD resource names for each VM
# Format expected by Proxmox: vm-{VMID}-disk-0
RESOURCE_NAMES = {
    1000: "vm-1000-disk-0",  # acemagician - k3s-server-1
    1001: "vm-1001-disk-0",  # elitedesk - k3s-server-2
}

# Proxmox node configuration
NODE_CONFIG = {
    1000: {"node": "acemagician", "vm_name": "k3s-server-1"},
    1001: {"node": "elitedesk", "vm_name": "k3s-server-2"},
}


def run_ssh_command(command: str, host: str = "192.168.100.30", ssh_key: Optional[str] = None) -> Tuple[int, str, str]:
    """
    Run a command over SSH on the Linstor controller (thinkpad - 192.168.100.30).

    Args:
        command: Command to run
        host: Host to run the command on (default: 192.168.100.30)
        ssh_key: Path to the SSH private key (optional)

    Returns:
        Tuple (return_code, stdout, stderr)
    """
    ssh_cmd = ["ssh", "-o", "StrictHostKeyChecking=no"]

    # Add the SSH key if one was given
    if ssh_key:
        ssh_cmd.extend(["-i", ssh_key])

    ssh_cmd.extend([f"root@{host}", command])

    try:
        result = subprocess.run(
            ssh_cmd,
            capture_output=True,
            text=True,
            timeout=30
        )
        return result.returncode, result.stdout, result.stderr
    except subprocess.TimeoutExpired:
        return 1, "", "Timeout while running the SSH command"
    except Exception as e:
        return 1, "", f"Error while running SSH: {str(e)}"


def check_resource_exists(resource_name: str, ssh_key: Optional[str] = None) -> bool:
    """
    Check whether a DRBD resource exists.

    Args:
        resource_name: Name of the resource to check
        ssh_key: Path to the SSH private key (optional)

    Returns:
        True if the resource exists, False otherwise
    """
    # Method 1: check with resource-definition list
    returncode, stdout, stderr = run_ssh_command(
        "linstor resource-definition list",
        ssh_key=ssh_key
    )

    if returncode == 0:
        # Look for the exact resource name in the output
        # Typical format: "| pm-a7f3c8e1 | ..."
        lines = stdout.strip().split('\n')
        for line in lines:
            # Skip header and separator lines
            if line.startswith('+-') or line.startswith('| ResourceName'):
                continue
            # Look for the resource in the data rows
            if f"| {resource_name} " in line or f"|{resource_name}|" in line:
                print(" → Resource found in the definition list")
                return True

    # Method 2: check with volume-definition (if the resource definition exists, the volume does too)
    returncode, stdout, stderr = run_ssh_command(
        f"linstor volume-definition list --resource {resource_name}",
        ssh_key=ssh_key
    )

    if returncode == 0 and stdout.strip() and "VolumeNr" in stdout:
        print(" → Volume found for the resource")
        return True

    print(" → Resource not found")
    return False


def get_resource_size(resource_name: str, ssh_key: Optional[str] = None) -> Optional[int]:
    """
    Return the current size of a DRBD resource in GiB.

    Args:
        resource_name: Name of the resource
        ssh_key: Path to the SSH private key (optional)

    Returns:
        Size in GiB, or None on error
    """
    # Try machine-readable (JSON) output first
    returncode, stdout, stderr = run_ssh_command(
        f"linstor volume-definition list --resource {resource_name} --machine-readable",
        ssh_key=ssh_key
    )

    if returncode == 0 and stdout.strip():
        try:
            # Parse Linstor's JSON output
            data = json.loads(stdout)
            if data and len(data) > 0:
                volume_defs = data[0].get("volume_definitions", [])
                if volume_defs and len(volume_defs) > 0:
                    # Size is in KiB, convert to GiB
                    size_kib = volume_defs[0].get("size_kib", 0)
                    size_gib = size_kib // (1024 * 1024)
                    print(f" → Size read from JSON: {size_gib}GiB")
                    return size_gib
        except (json.JSONDecodeError, KeyError, IndexError, AttributeError, TypeError) as e:
            # AttributeError/TypeError cover a JSON layout that differs from the one assumed above
            print(f" ⚠ JSON parsing error, falling back to text format: {e}")

    # Fallback: parse the regular text output
    returncode, stdout, stderr = run_ssh_command(
        f"linstor volume-definition list --resource {resource_name}",
        ssh_key=ssh_key
    )

    if returncode == 0 and stdout.strip():
        # Typical format:
        # | VolumeNr | ... | Size |
        # | 0 | ... | 100.00 GiB |
        lines = stdout.strip().split('\n')
        for line in lines:
            if '|' in line and 'GiB' in line and not line.startswith('| VolumeNr'):
                # Extract the size in GiB
                parts = [p.strip() for p in line.split('|')]
                for part in parts:
                    if 'GiB' in part:
                        try:
                            size_str = part.replace('GiB', '').strip()
                            size_gib = int(float(size_str))
                            print(f" → Size read from text output: {size_gib}GiB")
                            return size_gib
                        except ValueError:
                            continue

    print(f" ⚠ Could not read the size, output: {stdout[:200]}")
    return None


def create_resource(resource_name: str, size_gib: int, nodes: list, ssh_key: Optional[str] = None) -> bool:
    """
    Create a new DRBD resource with replication.

    Args:
        resource_name: Name of the resource to create
        size_gib: Size in GiB
        nodes: List of nodes to replicate on
        ssh_key: Path to the SSH private key (optional)

    Returns:
        True on success, False otherwise
    """
    print(f"Creating resource {resource_name} with {size_gib}GiB...")

    # Step 1: create the resource definition
    print(" [1/3] Creating the resource definition...")
    returncode, stdout, stderr = run_ssh_command(
        f"linstor resource-definition create {resource_name}",
        ssh_key=ssh_key
    )

    if returncode != 0:
        # An already existing resource definition is not a fatal error
        if "already exists" in stdout or "already exists" in stderr:
            print(" ⚠ The resource definition already exists, moving on to the next step")
        else:
            print(f"Error while creating the definition: {stderr}", file=sys.stderr)
            if stdout:
                print(f"Standard output: {stdout}", file=sys.stderr)
            return False
    else:
        print(" ✓ Resource definition created")

    # Step 2: create the volume definition
    print(" [2/3] Creating the volume definition...")
    returncode, stdout, stderr = run_ssh_command(
        f"linstor volume-definition create {resource_name} {size_gib}GiB",
        ssh_key=ssh_key
    )

    if returncode != 0:
        # An already existing volume is not a fatal error
        if "already exists" in stdout or "already exists" in stderr:
            print(" ⚠ The volume definition already exists, moving on to the next step")
        else:
            print(f"Error while creating the volume: {stderr}", file=sys.stderr)
            if stdout:
                print(f"Standard output: {stdout}", file=sys.stderr)
            return False
    else:
        print(" ✓ Volume definition created")

    # Step 3: deploy the resource on the nodes with replication
    print(" [3/3] Deploying the resource on the nodes...")
    deployed_count = 0
    for node in nodes:
        print(f" → Deploying on {node}...")
        returncode, stdout, stderr = run_ssh_command(
            f"linstor resource create {node} {resource_name} --storage-pool linstor_storage --resource-group pve-rg",
            ssh_key=ssh_key
        )

        if returncode != 0:
            # A resource already present on this node is not an error
            if "already exists" in stdout or "already exists" in stderr or "already deployed" in stdout:
                print(f" ⚠ Resource already deployed on {node}")
                deployed_count += 1
            else:
                print(f"Error while deploying on {node}: {stderr}", file=sys.stderr)
                if stdout:
                    print(f"Standard output: {stdout}", file=sys.stderr)
                # Keep going with the other nodes even after an error
                continue
        else:
            print(f" ✓ Resource deployed on {node}")
            deployed_count += 1

    # Print the deployment summary
    print(f"Resource {resource_name} deployed on {deployed_count}/{len(nodes)} nodes")
    # Report failure when no node holds the resource (previously always True)
    return deployed_count > 0


def resize_resource(resource_name: str, new_size_gib: int, ssh_key: Optional[str] = None) -> bool:
    """
    Grow an existing DRBD resource.

    Args:
        resource_name: Name of the resource to resize
        new_size_gib: New size in GiB
        ssh_key: Path to the SSH private key (optional)

    Returns:
        True on success, False otherwise
    """
    print(f"Resizing resource {resource_name} to {new_size_gib}GiB...")

    returncode, stdout, stderr = run_ssh_command(
        f"linstor volume-definition set-size {resource_name} 0 {new_size_gib}GiB",
        ssh_key=ssh_key
    )

    if returncode != 0:
        print(f"Error while resizing: {stderr}", file=sys.stderr)
        return False

    print(f"✓ Resource {resource_name} resized successfully")
    return True


def manage_vm_resource(vmid: int, size_gib: int, dry_run: bool = False, ssh_key: Optional[str] = None) -> bool:
    """
    Manage the DRBD resource for a specific VM.

    Args:
        vmid: VM ID
        size_gib: Desired size in GiB
        dry_run: If True, print the actions without running them
        ssh_key: Path to the SSH private key (optional)

    Returns:
        True on success, False otherwise
    """
    if vmid not in RESOURCE_NAMES:
        print(f"VMID {vmid} is not configured", file=sys.stderr)
        return False

    resource_name = RESOURCE_NAMES[vmid]
    node_info = NODE_CONFIG[vmid]

    print(f"\n{'='*60}")
    print(f"Managing the resource for VM {vmid} ({node_info['vm_name']})")
    print(f"DRBD resource: {resource_name}")
    print(f"Proxmox node: {node_info['node']}")
    print(f"Desired size: {size_gib}GiB")
    print(f"{'='*60}\n")

    # Check whether the resource exists
    resource_exists = check_resource_exists(resource_name, ssh_key=ssh_key)

    if not resource_exists:
        print(f"Resource {resource_name} does not exist.")

        if dry_run:
            print(f"[DRY-RUN] Would create resource {resource_name} with {size_gib}GiB")
            return True

        # Create the resource on the 2 storage nodes (thinkpad = controller only)
        nodes = ["acemagician", "elitedesk"]
        return create_resource(resource_name, size_gib, nodes, ssh_key=ssh_key)

    else:
        print(f"Resource {resource_name} already exists.")

        # Check the current size
        current_size = get_resource_size(resource_name, ssh_key=ssh_key)

        if current_size is None:
            print("⚠ Could not read the current size")
            print("The resource exists but the volume may not be fully configured")
            print("Attempting to create/configure the volume...")

            if dry_run:
                print(f"[DRY-RUN] Would try to create/configure the volume with {size_gib}GiB")
                return True

            # Try to create the volume (ignored if it already exists)
            nodes = ["acemagician", "elitedesk"]
            return create_resource(resource_name, size_gib, nodes, ssh_key=ssh_key)

        print(f"Current size: {current_size}GiB")

        if current_size == size_gib:
            print(f"✓ The size already matches ({size_gib}GiB), nothing to do")
            return True

        elif current_size < size_gib:
            print(f"The size must be increased from {current_size}GiB to {size_gib}GiB")

            if dry_run:
                print(f"[DRY-RUN] Would resize {resource_name} to {size_gib}GiB")
                return True

            return resize_resource(resource_name, size_gib, ssh_key=ssh_key)

        else:
            print(f"⚠ The current size ({current_size}GiB) is larger than the desired size ({size_gib}GiB)")
            print("Shrinking is not supported, keeping the current size")
            return True


def main():
    """Main entry point of the script."""
    parser = argparse.ArgumentParser(
        description="DRBD Linstor resource management for the K3s VMs"
    )
    parser.add_argument(
        "--vmid",
        type=int,
        required=True,
        choices=[1000, 1001],
        help="VM ID (1000=acemagician, 1001=elitedesk)"
    )
    parser.add_argument(
        "--size",
        type=int,
        required=True,
        help="Disk size in GiB"
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="Print the actions without running them"
    )
    parser.add_argument(
        "--ssh-key",
        type=str,
        default=SSH_KEY_PATH,
        help="Path to the SSH private key (default: SSH_KEY_PATH environment variable)"
    )

    args = parser.parse_args()

    # Run the resource management
    success = manage_vm_resource(args.vmid, args.size, args.dry_run, ssh_key=args.ssh_key)

    sys.exit(0 if success else 1)


if __name__ == "__main__":
    main()

@@ -22,38 +22,41 @@ provider "proxmox" {

 # K3s Server VM on acemagician
 resource "proxmox_vm_qemu" "k3s_server_1" {
-  vmid        = 1000
-  name        = "k3s-server-1"
-  target_node = "acemagician"
-  clone       = var.ubuntu_template
-  full_clone  = true
+  vmid         = 1000
+  name         = "k3s-server-1"
+  target_node  = "acemagician"
+  clone        = var.ubuntu_template
+  full_clone   = true
+  force_create = true

   # CPU configuration
   cpu {
     cores   = var.k3s_server_1_config.cores
     sockets = 1
+    type    = "host"
   }

   memory = var.k3s_server_1_config.memory
   agent  = 1

+  # Video configuration - Standard VGA
+  vga {
+    type = "std"
+  }

   boot   = "order=scsi0"
   scsihw = "virtio-scsi-single"
   onboot = true

   # Resize the disk cloned from the template
   disk_gb = 100

   network {
     id     = 0
     model  = "virtio"
     bridge = var.k3s_network_bridge
   }

   disk {
     slot     = "scsi0"
     size     = var.k3s_server_1_config.disk_size
     type     = "disk"
     storage  = var.k3s_server_1_storage_pool
     iothread = true
   }

   ipconfig0  = "ip=${var.k3s_server_1_config.ip},gw=${var.k3s_gateway}"
   cicustom   = "user=${var.snippets_storage}:snippets/cloud-init-k3s-server-1.yaml"
   nameserver = join(" ", var.k3s_dns)

@@ -22,38 +22,41 @@ provider "proxmox" {

 # K3s Server VM on elitedesk
 resource "proxmox_vm_qemu" "k3s_server_2" {
-  vmid        = 1001
-  name        = "k3s-server-2"
-  target_node = "elitedesk"
-  clone       = var.ubuntu_template
-  full_clone  = true
+  vmid         = 1001
+  name         = "k3s-server-2"
+  target_node  = "elitedesk"
+  clone        = var.ubuntu_template
+  full_clone   = true
+  force_create = true

   # CPU configuration
   cpu {
     cores   = var.k3s_server_2_config.cores
     sockets = 1
+    type    = "host"
   }

   memory = var.k3s_server_2_config.memory
   agent  = 1

+  # Video configuration - Standard VGA
+  vga {
+    type = "std"
+  }

   boot   = "order=scsi0"
   scsihw = "virtio-scsi-single"
   onboot = true

   # Resize the disk cloned from the template
   disk_gb = 100

   network {
     id     = 0
     model  = "virtio"
     bridge = var.k3s_network_bridge
   }

   disk {
     slot     = "scsi0"
     size     = var.k3s_server_2_config.disk_size
     type     = "disk"
     storage  = var.k3s_server_2_storage_pool
     iothread = true
   }

   ipconfig0  = "ip=${var.k3s_server_2_config.ip},gw=${var.k3s_gateway}"
   cicustom   = "user=${var.snippets_storage}:snippets/cloud-init-k3s-server-2.yaml"
   nameserver = join(" ", var.k3s_dns)

@@ -22,20 +22,28 @@ provider "proxmox" {

 # etcd Witness VM on thinkpad
 resource "proxmox_vm_qemu" "etcd_witness" {
-  vmid        = 1002
-  name        = "etcd-witness"
-  target_node = "thinkpad"
-  clone       = var.ubuntu_template
-  full_clone  = true
+  vmid         = 1002
+  name         = "etcd-witness"
+  target_node  = "thinkpad"
+  clone        = var.ubuntu_template
+  full_clone   = true
+  force_create = true

   # CPU configuration
   cpu {
     cores   = var.etcd_witness_config.cores
     sockets = 1
+    type    = "host"
   }

   memory = var.etcd_witness_config.memory
   agent  = 1

+  # Video configuration - Standard VGA
+  vga {
+    type = "std"
+  }

   boot   = "order=scsi0"
   scsihw = "virtio-scsi-single"
   onboot = true