
[IMPROVEMENT] Assign the pods to the same node where the strict-local volume is present #5448

Closed
@volford-bence

Description

Is your improvement request related to a feature? Please describe (👍 if you like this request)

In Longhorn 1.4, the strict-local data locality feature was introduced, which looks promising, especially for distributed databases such as MongoDB or Elasticsearch.
I started experimenting with it and found a problem.
Deploying a StatefulSet works the first time, but when you redeploy it, the pods are not guaranteed to be scheduled to the same nodes, so they cannot start, because the single replica of each volume is only present on its original node.

Describe the solution you'd like

I would like Longhorn to be able to control (or at least influence) pod scheduling when the volume is strict-local, so that the pod is always scheduled to the node where its single replica is present. This would be useful for StatefulSets (and for other workload types as well), and I don't know how it can be achieved at the moment.
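
For illustration, the only manual workaround I can think of today is pinning the pod to the node that currently holds the replica with plain Kubernetes scheduling, e.g. a nodeSelector on kubernetes.io/hostname. A minimal sketch is below; the node, image and claim names are placeholders, and the pinning has to be kept in sync by hand whenever the replica moves, which is exactly what I would like Longhorn to handle automatically.

apiVersion: v1
kind: Pod
metadata:
  name: mongodb-pinned
spec:
  # Manual pinning to the node that currently holds the strict-local replica.
  # "worker-node-1" is a placeholder; it must be updated by hand if the
  # replica ever ends up on another node.
  nodeSelector:
    kubernetes.io/hostname: worker-node-1
  containers:
    - name: mongodb
      image: mongo:6.0
      volumeMounts:
        - name: data
          mountPath: /data/db
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: mongodb-data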

Describe alternatives you've considered

Because the strict-local volume causes problems when the StatefulSet is redeployed, I have to stay with the original longhorn StorageClass with dataLocality disabled.
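
For reference, a sketch of the fallback StorageClass, assuming the upstream defaults (dataLocality disabled, 3 replicas):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn
parameters:
  dataLocality: disabled
  fsType: ext4
  numberOfReplicas: '3'
  staleReplicaTimeout: '30'
provisioner: driver.longhorn.io
reclaimPolicy: Delete
volumeBindingMode: Immediate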

Additional context

I tried to test it with MongoDB, deployed as a StatefulSet with 3 replicas in a 5-node RKE2 cluster (v1.24.10+rke2r1). I created a new StorageClass that is the same as the default longhorn StorageClass, except that dataLocality is set to strict-local and numberOfReplicas to 1:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-local
parameters:
  dataLocality: strict-local
  fromBackup: ''
  fsType: ext4
  numberOfReplicas: '1'
  staleReplicaTimeout: '30'
provisioner: driver.longhorn.io
reclaimPolicy: Delete
volumeBindingMode: Immediate

I deployed the StatefulSet and all the pods were running. Then I removed the StatefulSet but kept the volumes. To make sure the pods would not be scheduled to the same nodes by default, I created a second StatefulSet with 3 replicas. After that, I redeployed the first StatefulSet; its first pod was scheduled to a different node than before and could not start.
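
A minimal sketch of the kind of StatefulSet I used for the test; the image, sizes and names are placeholders, the relevant part is that each pod gets its own strict-local volume through volumeClaimTemplates backed by the longhorn-local StorageClass above:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongodb
spec:
  serviceName: mongodb
  replicas: 3
  selector:
    matchLabels:
      app: mongodb
  template:
    metadata:
      labels:
        app: mongodb
    spec:
      containers:
        - name: mongodb
          image: mongo:6.0
          volumeMounts:
            - name: data
              mountPath: /data/db
  # Each pod gets its own PVC, provisioned as a 1-replica strict-local volume.
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: longhorn-local
        resources:
          requests:
            storage: 10Gi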

An easier reproduction would be this:

  • Create a Deployment with a strict-local volume, but set nodeName in the pod spec to a 'busier' node (see the sketch after this list).
  • After the pod is running, redeploy the Deployment without nodeName set. At that point the default Kubernetes scheduling takes over, and the pod will typically be scheduled to a less busy node. But the pod cannot start, because the single replica is not present on that node.
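
A minimal sketch of this reproduction, assuming the longhorn-local StorageClass from above; the node name, image and sizes are placeholders:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: strict-local-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: longhorn-local
  resources:
    requests:
      storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: strict-local-repro
spec:
  replicas: 1
  selector:
    matchLabels:
      app: strict-local-repro
  template:
    metadata:
      labels:
        app: strict-local-repro
    spec:
      # Step 1: force the pod onto a "busy" node. For step 2, remove this
      # field and re-apply; the scheduler will then likely pick a less busy
      # node, where the single strict-local replica does not exist, and the
      # pod cannot start because the volume cannot be attached there.
      nodeName: busy-node-1
      containers:
        - name: app
          image: busybox:1.36
          command: ["sh", "-c", "sleep infinity"]
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: strict-local-pvc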
