# LINSTOR Template Cloning Issue and Solution
## Problem
The Proxmox Terraform provider cannot clone VMs from templates stored on LINSTOR storage due to two incompatibilities:
1. **Full Clone**: LINSTOR fails to create new resource definitions during the clone operation
   - Error: `Resource definition 'vm-XXX-disk-0' not found`
   - LINSTOR cannot dynamically create resources during Proxmox clone operations
2. **Linked Clone**: LINSTOR does not support snapshot-based cloning
   - Error: `Linked clone feature is not supported for 'linstor_storage'`
   - LINSTOR exposes raw DRBD block devices, which do not support QCOW2-style backing-file snapshots
## Solution
Use **local storage templates** on each Proxmox node and clone from there to local-lvm storage.
### Architecture
```
Template (VMID 9000) on LINSTOR
↓ (one-time copy)
Local templates on each node
↓ (Terraform clones)
Production VMs on local-lvm
```
### Step-by-Step Implementation
#### 1. Copy Template to Local Storage
You need to create a local copy of the Ubuntu template on each Proxmox node:
**Option A: Automated Script**
```bash
cd scripts
chmod +x copy-template-to-local.sh
./copy-template-to-local.sh
```
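The contents of `copy-template-to-local.sh` are not reproduced in this document. As a rough illustration only, a dry-run version of such a script could look like the sketch below. It assumes it is run on the node that currently hosts template 9000, and it gives each node its own target VMID because VMIDs are unique across a Proxmox cluster. The `BASE_VMID` numbering scheme and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch; NOT the actual scripts/copy-template-to-local.sh.
set -eu

SOURCE_VMID=9000     # LINSTOR-backed template (from this document)
BASE_VMID=10000      # assumed scheme: nodes get 10000, 10001, 10002, ...
NODES="acemagician elitedesk thinkpad"

# Print the commands that would create one local template on a node.
# Arguments: node name, target VMID.
clone_commands() {
  echo "qm clone ${SOURCE_VMID} ${2} --name ubuntu-2404-cloudinit-local --full --storage local --target ${1}"
  echo "ssh root@${1} qm template ${2}"
}

# Walk the node list, incrementing the VMID for each node.
plan_all() {
  vmid="$BASE_VMID"
  for node in $NODES; do
    clone_commands "$node" "$vmid"
    vmid=$((vmid + 1))
  done
}

# Dry run: only prints the commands, nothing is executed.
plan_all
```

Printing the commands first makes it easy to review the VMID assignment before anything touches the cluster.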
**Option B: Manual Process**
On each node (acemagician, elitedesk, thinkpad):
```bash
# Connect to the node
ssh root@<node>
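# Clone template from LINSTOR to local storage.
# VMIDs are unique across a Proxmox cluster, so use a different
# target VMID on each node (10000 is shown as an example).
qm clone 9000 10000 \
  --name ubuntu-2404-cloudinit-local \
  --full \
  --storage local \
  --target <node>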
# Convert to template
qm template 10000
# Verify
qm list | grep ubuntu
```
#### 2. Update Terraform Configuration
The Terraform configs have been updated to use:
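- `ubuntu_template = "ubuntu-2404-cloudinit"` (must match the name of the local template copy; the manual steps above name the copy `ubuntu-2404-cloudinit-local`, so adjust one or the other if they differ)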
- `full_clone = true` (required since linked clones don't work)
- Storage for k3s servers: `local-lvm` (cannot use LINSTOR for cloning)
- Storage for etcd-witness: `local-lvm` (thinkpad doesn't have LINSTOR satellite)
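Because the configuration refers to the template by name, it can help to confirm that the name exists on a node before running Terraform. The helper below is a minimal sketch: `vmid_for_name` is a hypothetical function name, and it assumes the usual `qm list` layout with a header row followed by `VMID NAME STATUS ...` columns.

```bash
#!/usr/bin/env bash
# Print the VMID(s) whose NAME column exactly equals the given name.
# Reads `qm list`-style text on stdin; the first line is treated as a header.
vmid_for_name() {
  awk -v name="$1" 'NR > 1 && $2 == name { print $1 }'
}

# Usage on a Proxmox node:
#   qm list | vmid_for_name ubuntu-2404-cloudinit
```

Empty output means no VM or template with that exact name exists on the node, which usually points to a mismatch between `ubuntu_template` and the name given to `qm clone`.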
#### 3. Storage Strategy Going Forward
**For VM Disks:**
- Use `local-lvm` on each node for VM root disks
- LINSTOR is not suitable for boot disks due to cloning limitations
**For Persistent Data:**
- Use LINSTOR for application data volumes (PVCs in Kubernetes)
- LINSTOR excels at replicating application data between nodes
- K3s will use LINSTOR CSI driver for persistent volumes
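To make the persistent-data side concrete, the sketch below renders a Kubernetes StorageClass for the LINSTOR CSI driver. The class name, pool name, and replica count passed in are hypothetical placeholders, and the parameter keys are the ones I understand the LINSTOR CSI driver to accept; check them against the LINSTOR guide for the installed version.

```bash
#!/usr/bin/env bash
# Render a StorageClass manifest for the LINSTOR CSI driver to stdout.
# Arguments: class name, LINSTOR storage pool name, replica count.
render_storageclass() {
  cat <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ${1}
provisioner: linstor.csi.linbit.com
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
parameters:
  linstor.csi.linbit.com/storagePool: "${2}"
  linstor.csi.linbit.com/placementCount: "${3}"
EOF
}

# Usage (names are placeholders):
#   render_storageclass linstor-replicated k3s-pool 2 | kubectl apply -f -
```

Rendering to stdout keeps the manifest reviewable before it is applied. PVCs that reference this class would then get DRBD-replicated volumes while VM root disks stay on `local-lvm`.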
**Storage Tradeoffs:**
| Storage Type | VM Cloning | HA Migration | Speed | Use Case |
|--------------|------------|--------------|-------|----------|
| local-lvm | ✅ Fast | ❌ No | ⚡ Fast | VM root disks |
| LINSTOR | ❌ No | ✅ Yes | 🔄 Network | K8s PVCs, shared data |
## Why Not Just Use LINSTOR?
LINSTOR is designed for:
- **Live migration** of running VMs between nodes
- **Replicated storage** for high availability
- **Dynamic volume provisioning** via CSI
It is NOT designed for:
- Template-based VM provisioning
- Snapshot-based cloning operations
- Boot disk management in IaC workflows
## Future Improvements
1. **Automate template sync**: Create a cron job to sync template updates to all nodes
2. **LINSTOR for K8s only**: Use LINSTOR CSI driver for Kubernetes PVCs, not for VM provisioning
3. **Consider alternatives**: For VM provisioning, local storage is simpler and faster
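For the template sync idea in item 1, a cron entry could be generated along these lines. This is a sketch only: the checkout path, log path, and weekly schedule are assumptions, and the sync script would first need to be safe to re-run, for example by removing or replacing an existing local template.

```bash
#!/usr/bin/env bash
# Hypothetical paths; adjust to where the repository is checked out.
SYNC_SCRIPT="/root/Homelab/scripts/copy-template-to-local.sh"
SYNC_LOG="/var/log/template-sync.log"

# Print a crontab line that runs the sync every Sunday at 03:00.
cron_entry() {
  echo "0 3 * * 0 ${SYNC_SCRIPT} >> ${SYNC_LOG} 2>&1"
}

# Install without clobbering existing entries (run manually on one node):
#   (crontab -l 2>/dev/null; cron_entry) | crontab -
```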
## References
- LINSTOR Documentation: https://linbit.com/drbd-user-guide/linstor-guide-1_0-en/
- Proxmox LINSTOR Plugin: https://pve.proxmox.com/wiki/LINSTOR
- Terraform Proxmox Provider: https://registry.terraform.io/providers/Telmate/proxmox/latest/docs