[BUG] Persistent Volumes are created on nodes that are disabled/unschedulable #7727
Description
Describe the bug
3/6 of my nodes are marked as disabled (in Longhorn). I create a 3-replica statefulset with a `volumeClaimTemplate` pointing to a `storageClass` with `dataLocality: strict-local`. Chances are that at least one of the volumes, and therefore its pod, will be scheduled on a disabled node. This occurs regardless of whether `volumeBindingMode` is set to `Immediate` or `WaitForFirstConsumer`.
My assumption here is that since the kube-scheduler isn't aware of the underlying storage, it will create the pods wherever it thinks is best. However, I don't believe Longhorn should be accepting/creating volumes on disabled nodes. I would presume it can choose where to create the volume, and the pod will follow (at least with `Immediate`).
There are no anti-affinity rules, topology constraints, or anything else interfering here. I disabled one node per zone, and the volumes are still scheduled with the zones in mind; it just ignores the disabled status of the Longhorn node.
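For concreteness, the `storageClass` involved looks roughly like the sketch below (the name `longhorn-strict-local` is a placeholder rather than my exact manifest; note that `strict-local` requires a single replica):

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-strict-local             # placeholder name
provisioner: driver.longhorn.io
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer   # same behaviour observed with Immediate
parameters:
  numberOfReplicas: "1"                   # strict-local only supports a single replica
  dataLocality: "strict-local"
```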
To Reproduce
- Have a majority of nodes disabled/unschedulable, to increase the odds of them being selected
- Create a `storageClass` with `dataLocality` set to `strict-local` (I presume the other modes are fine, since if a replica is scheduled on a bad node it will just be moved somewhere else)
- Create a `statefulSet` with `replicas` set to the number of remaining schedulable nodes, and with a `volumeClaimTemplate` for the previously created `storageClass` (see the sketch below)
- Wait for a pod/volume to be scheduled on an unschedulable node
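A minimal statefulSet along these lines is enough to hit it; the names, image, and storage size here are illustrative placeholders, not my actual workload:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example                      # placeholder
spec:
  serviceName: example
  replicas: 3                        # = number of non-disabled nodes
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
        - name: app
          image: busybox             # placeholder workload
          command: ["sleep", "3600"]
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: longhorn-strict-local   # the storageClass above
        resources:
          requests:
            storage: 1Gi
```

Where each replica actually landed can be confirmed from the `replicas.longhorn.io` objects in the `longhorn-system` namespace.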
I can provide more details if required, though it is pretty easy for me to recreate. If the kube-scheduler is playing into this, you could probably influence placement with requests/limits/soft affinities/etc. to coerce the pods and reproduce it.
Expected behavior
Volumes will refuse to schedule on disabled/unschedulable nodes. Pods relying on those volumes will therefore be able to start, and not be stuck in `Pending`.
Support bundle for troubleshooting
Can provide logs if that would be useful.
Environment
- Longhorn version: v1.5.3
- Impacted volume (PV):
- Installation method (e.g. Rancher Catalog App/Helm/Kubectl): Helm chart, deployed with kluctl
- Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: v1.28.4
  - Number of control plane nodes in the cluster:
  - Number of worker nodes in the cluster: 6
- Node config
  - OS type and version:
  - Kernel version: 6.1.69
  - CPU per node:
  - Memory per node:
  - Disk type (e.g. SSD/NVMe/HDD):
  - Network bandwidth between the nodes (Gbps):
- Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal):
- Number of Longhorn volumes in the cluster:
Additional context
Mentioned it in this discussion (with some other somewhat related problems) - #7717