
[BUG] Persistent Volumes are created on nodes that are disabled/unschedulable #7727

Open
@MisguidedEmails

Description

Describe the bug

3 of my 6 nodes are marked as disabled (in Longhorn). I create a 3-replica StatefulSet with a volumeClaimTemplate pointing to a StorageClass with dataLocality: strict-local. Chances are that at least one of the volumes, and therefore its pod, will be scheduled on a disabled node. This occurs regardless of whether volumeBindingMode is set to Immediate or WaitForFirstConsumer.
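
Roughly the StorageClass in question, as a sketch (the name and exact parameter values are illustrative, not copied from my manifests):

```yaml
# Sketch of a Longhorn StorageClass with strict-local data locality.
# strict-local requires a single replica per volume.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-strict-local              # hypothetical name
provisioner: driver.longhorn.io
volumeBindingMode: WaitForFirstConsumer    # Immediate behaves the same for this bug
parameters:
  numberOfReplicas: "1"
  dataLocality: "strict-local"
  staleReplicaTimeout: "30"
```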

My assumption here is that since the kube-scheduler isn't aware of the underlying storage, it will create the pods wherever it thinks is best. Still, I don't believe Longhorn should be accepting/creating volumes on disabled nodes; I would presume it can choose where to create the volume, and the pod will follow (at least with Immediate).

There are no anti-affinity rules, topology constraints, or anything else interfering here. I disabled one node per zone, and the volumes are still scheduled with the zones in mind; Longhorn just ignores the disabled status of its nodes.
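
To be clear about what "disabled" means here: scheduling is turned off on the Longhorn node object, i.e. something along these lines (node name hypothetical):

```yaml
# A Longhorn node with scheduling disabled (normally toggled via the Longhorn UI).
apiVersion: longhorn.io/v1beta2
kind: Node
metadata:
  name: worker-4              # hypothetical node name
  namespace: longhorn-system
spec:
  allowScheduling: false      # the "disabled" state referred to in this report
```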

To Reproduce

  • Have a majority of nodes disabled/unschedulable, to increase the odds of them being selected
  • Create a StorageClass with dataLocality set to strict-local (I presume the other modes are fine, since if a replica lands on a bad node it will simply be moved somewhere else)
  • Create a StatefulSet with replicas set to the number of remaining schedulable nodes, and with a volumeClaimTemplate for the previously created StorageClass (see the sketch after this list)
  • Wait for a pod/volume to be scheduled on an unschedulable node
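
A minimal StatefulSet along these lines (image, names, and sizes are placeholders, not my real workload):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: repro
spec:
  serviceName: repro
  replicas: 3                          # number of schedulable (non-disabled) nodes
  selector:
    matchLabels:
      app: repro
  template:
    metadata:
      labels:
        app: repro
    spec:
      containers:
        - name: app
          image: busybox:1.36
          command: ["sh", "-c", "tail -f /dev/null"]
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: longhorn-strict-local   # the StorageClass sketched above
        resources:
          requests:
            storage: 1Gi
```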

I can provide more details if required, though it is pretty easy for me to reproduce. If the kube-scheduler is playing into this, you could probably influence placement with requests/limits/soft affinities/etc. to coerce the pods onto a disabled node and reproduce it.
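
For example, a soft node affinity like the one below, added under the pod template spec, should be enough to bias the kube-scheduler toward a Longhorn-disabled node (hostname is hypothetical):

```yaml
# Preference (not a hard requirement) for a node that is disabled in Longhorn.
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker-4          # a node with Longhorn scheduling disabled
```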

Expected behavior

Volumes should refuse to schedule on disabled/unschedulable nodes. Pods relying on those volumes would then be able to start, instead of being stuck in Pending.

Support bundle for troubleshooting

Can provide logs if that would be useful.

Environment

  • Longhorn version: v1.5.3
  • Impacted volume (PV):
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl): Helm chart - deployed with kluctl
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: v1.28.4
    • Number of control plane nodes in the cluster:
    • Number of worker nodes in the cluster: 6
  • Node config
    • OS type and version:
    • Kernel version: 6.1.69
    • CPU per node:
    • Memory per node:
    • Disk type (e.g. SSD/NVMe/HDD):
    • Network bandwidth between the nodes (Gbps):
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal):
  • Number of Longhorn volumes in the cluster:

Additional context

Mentioned this in discussion #7717 (along with some other somewhat related problems).
