[IMPROVEMENT] labeling nodes hosting replicas to enable podAffinity #9741
Replies: 4 comments 31 replies
-
Have a look here: https://longhorn.io/docs/1.7.2/advanced-resources/deploy/node-selector/
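For context, if the goal is to constrain where the replicas themselves land (rather than the Longhorn components the linked page covers), Longhorn's node tags combined with the StorageClass `nodeSelector` parameter are the usual mechanism. A minimal sketch, assuming nodes have been tagged in Longhorn; the tag name `fast-storage` is made up:

```yaml
# Sketch: volumes created from this class only place replicas on nodes
# carrying the Longhorn node tag "fast-storage" (hypothetical tag name).
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-tagged
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "2"
  staleReplicaTimeout: "2880"
  dataLocality: "best-effort"
  nodeSelector: "fast-storage"   # comma-separated list of Longhorn node tags
```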
-
cc @ChanYiLin
-
@dberardo-com
The concept here involves two separate schedulers: Longhorn decides which nodes host the volume's replicas, while Kubernetes decides which node runs the workload pod.
We can only help on the first one; pod placement is up to the Kubernetes scheduler.
-
I see ... so there is no pod in longhorn-system that is specific to a volume; if there were, having 200 volumes would mean 200 pods of that kind. So no, we can't use such a pod as a target for pod affinity. Does Longhorn perhaps annotate CRDs or change CRD statuses when replicas are scheduled? I am not sure whether that could be a hint in the direction of a custom solution.
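For reference, replica placement is recorded on Longhorn's Replica custom resources in longhorn-system rather than on any per-volume pod, so a custom controller could watch those instead. A trimmed, illustrative example; field and label names follow my reading of the v1beta2 CRD and may differ between versions:

```yaml
# Illustrative, trimmed Replica custom resource. The key point is that the
# hosting node is recorded in spec.nodeID; all names here are hypothetical.
apiVersion: longhorn.io/v1beta2
kind: Replica
metadata:
  name: pvc-1234abcd-r-0a1b2c3d        # hypothetical replica name
  namespace: longhorn-system
  labels:
    longhornvolume: pvc-1234abcd       # volume this replica belongs to
spec:
  volumeName: pvc-1234abcd
  nodeID: worker-2                     # node currently hosting this replica
```

Running `kubectl -n longhorn-system get replicas.longhorn.io` should also show the node per replica in recent versions, which is an easy way to confirm what such a controller would see.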
Well, in general yes, but I am being very pragmatic here: since I know the volume is scheduled by Longhorn, I know I can benefit from data locality if a replica exists on the same node as the pod, and I want to make use of that fact. Anyway, thank you for your explanation; I just wanted to make sure that my use case is not achievable with the current features of Longhorn, and you confirmed it.

I understand a custom solution is needed to achieve what I am looking for, but I am not sure which part of Kubernetes I should address for it. My initial bet would be the descheduler (https://github.com/kubernetes-sigs/descheduler) or perhaps a custom controller, but since I want the solution to be generic and not tied to any particular CRD, a controller may not be the way to go. Is there any other Kubernetes component I can make use of? Is an admission webhook perhaps something to look into? Any suggestions here? This contribution by one GitHub user goes in that direction: #5486 (reply in thread)
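If the admission-webhook route is explored, the registration side might look roughly like the following sketch. It assumes a separate, custom service that reads Longhorn's Replica CRs and patches node affinity into incoming pods; the service name, namespace, and path are all hypothetical, and TLS/caBundle wiring is omitted:

```yaml
# Sketch: register a mutating webhook for pod CREATE so a custom service can
# inject node affinity based on where the volume's replicas currently live.
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: replica-affinity-injector
webhooks:
  - name: replica-affinity.example.com     # hypothetical webhook name
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Ignore                  # don't block pod creation if the hook is down
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
    clientConfig:
      service:
        name: replica-affinity-injector    # hypothetical service implementing the mutation
        namespace: kube-system
        path: /mutate
```

A descheduler-based approach would instead evict pods after the fact, so a mutating webhook is the more direct fit if the constraint should apply at scheduling time.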
-
I am using a volume with 2 replicas on a 3-node cluster, with the best-effort data locality strategy and a ReadWriteOnce access mode.
There is one pod using the volume, which needs fast reads and writes to disk; that is why I chose "best-effort" locality.
This pod is currently free to move among all of the cluster's nodes, but because the underlying volume holds a lot of data, it would be desirable to constrain the pod to just the 2 nodes hosting the volume replicas; otherwise, if the pod starts on a node that does not have a replica, a huge data transfer has to take place.
I thought of using podAffinity in the Pod definition for this, but how can I target the nodes hosting the replicas? Is there any label I can use?
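To make the ask concrete: since the target is a set of nodes rather than other pods, this would effectively be node affinity keyed on a label that tracks replica placement. A sketch of the desired outcome, where the node label is hypothetical; Longhorn does not set such a label today, which is exactly what this request is about:

```yaml
# Sketch of the desired behaviour. The node label key/value is hypothetical;
# nodes hosting a replica of the volume would need to be labeled by some
# (currently non-existent) mechanism for this to work.
apiVersion: v1
kind: Pod
metadata:
  name: app-using-longhorn-volume
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: longhorn.example/replica-of-my-volume   # hypothetical label
                operator: In
                values: ["true"]
  containers:
    - name: app
      image: nginx                           # placeholder image
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: my-longhorn-pvc           # the RWO Longhorn-backed PVC
```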