PV nodeAffinity matchExpressions problem with array items and in operator since 1.27.0 #123465
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label.
/sig storage
In a similar vein, we have noticed the same issue in K8s v1.27 when the …
/sig scheduling
This is because, before #119779, a non-existent node made the get-node call return an error (kubernetes/pkg/scheduler/schedule_one.go, lines 434 to 444 at bd2cb72). After #119779 it is fixed (kubernetes/pkg/scheduler/schedule_one.go, lines 485 to 497 at da6be3b). Since version 1.30, it runs normally. @sanposhiho PTAL, do we need to cherry-pick for the previous versions?
Looks like my PR #119779 accidentally fixed this issue. 😅 380c7f2 is the trigger of this issue, introduced in v1.27; the volumebinding plugin just returns node names in PreFilterResult based on the nodeAffinity in PVs, and thus PreFilterResult could contain node names that don't actually exist. @kubernetes/sig-scheduling-leads
One-line summary: if the nodeAffinity in a PV contains a non-existing node name in its values, the Pod cannot be scheduled.
Questions: …
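To make the trigger easier to follow, here is a minimal, self-contained sketch of the behavior described above. It is not the actual volumebinding code; the type, function, and node names are illustrative. The point is that the candidate set is built straight from the PV's kubernetes.io/hostname values, with no check that a Node of that name exists.

```go
package main

import "fmt"

// pvNodeAffinity models the only part of a PersistentVolume that matters here:
// the hostname values listed under nodeAffinity with the In operator.
type pvNodeAffinity struct {
	hostnameValues []string // values of a "kubernetes.io/hostname In [...]" requirement
}

// candidateNodeNames mirrors the idea behind the volumebinding PreFilter result:
// it passes the PVs' hostname values through as candidate node names, so the
// result can contain names for which no Node object exists.
func candidateNodeNames(pvs []pvNodeAffinity) map[string]struct{} {
	out := map[string]struct{}{}
	for _, pv := range pvs {
		for _, name := range pv.hostnameValues {
			out[name] = struct{}{}
		}
	}
	return out
}

func main() {
	pvs := []pvNodeAffinity{
		{hostnameValues: []string{"test-worker", "prod-worker"}}, // "prod-worker" may not exist in the cluster
	}
	for name := range candidateNodeNames(pvs) {
		fmt.Println("candidate node:", name) // no existence check happens at this point
	}
}
```

Pre-#119779, any name in that candidate set with no backing Node aborted the whole scheduling attempt, which is the symptom reported in this issue.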
If it's a regression, we should consider a cherry-pick. 🤔 Can you explain why your PR fixes the problem?
Before my PR, we took the NodeInfo of the nodes in preFilterResult via … And, after my PR, we are not using …
The pitfall is that when PreFilterResult was introduced, the scheduler directly fetched the nodeNames returned by PreFilter() from its node snapshot, so technically speaking, when a plugin (in-tree or out-of-tree) returns non-existent/illegal nodes, the pod's scheduling flow aborts immediately. And PR #109877 happened to stumble into this pitfall.
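To make the pitfall concrete, here is a hedged, standalone sketch (simplified types, not the real scheduler framework code) contrasting the two ways such a result set can be consumed: looking up every returned name in the snapshot and treating a miss as a hard error aborts the attempt, whereas walking the snapshot and intersecting with the result set, as the minimum fix quoted in a later comment does, simply ignores names with no backing node.

```go
package main

import (
	"errors"
	"fmt"
)

// snapshot is a stand-in for the scheduler's node snapshot: node name -> node info.
var snapshot = map[string]string{"test-worker": "ok"}

// lookupByName models the pre-#119779 consumption: a name with no matching node
// is an error, which aborts the scheduling attempt for the pod.
func lookupByName(preFilterNames []string) ([]string, error) {
	var nodes []string
	for _, name := range preFilterNames {
		if _, ok := snapshot[name]; !ok {
			return nil, errors.New("node not found in snapshot: " + name)
		}
		nodes = append(nodes, name)
	}
	return nodes, nil
}

// filterSnapshot models the post-fix consumption: walk the snapshot and keep only
// nodes whose names appear in the PreFilter result; unknown names are ignored.
func filterSnapshot(preFilterNames map[string]struct{}) []string {
	var nodes []string
	for name := range snapshot {
		if _, ok := preFilterNames[name]; ok {
			nodes = append(nodes, name)
		}
	}
	return nodes
}

func main() {
	names := []string{"test-worker", "prod-worker"} // "prod-worker" does not exist
	if _, err := lookupByName(names); err != nil {
		fmt.Println("old behavior:", err) // the whole attempt aborts
	}
	set := map[string]struct{}{"test-worker": {}, "prod-worker": {}}
	fmt.Println("new behavior:", filterSnapshot(set)) // [test-worker]
}
```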
@Huang-Wei do you mean that you would like to see a smaller version of #119779 for the cherry-pick? Note that it is only about 10 lines of actual change; the rest is comments and tests. Since this is already released code, I would suggest cherry-picking as-is. Also, it's worth testing E2E whether the issue reproduces in v1.30. @dsgnr, do you think you can give it a try?
Yup, I'm in favor of picking the minimum fix, along with a minimum test #124539:

    for _, n := range allNodes {
        if !preRes.NodeNames.Has(n.Node().Name) {
            // We consider Nodes that are filtered out by PreFilterResult as rejected via UnschedulableAndUnresolvable.
            // We have to record them in NodeToStatusMap so that they won't be considered as candidates in the preemption.
            diagnosis.NodeToStatusMap[n.Node().Name] = framework.NewStatus(framework.UnschedulableAndUnresolvable, "node is filtered out by the prefilter result")
            continue
        }
        nodes = append(nodes, n)
    }

b/c in addition to the "fix" part, #119779 actually introduced a slight behavior change. @sanposhiho do you think the above ^^ minimum fix + #124539 makes sense?
#124539 can guard it.
@Huang-Wei @alculquicondor @sanposhiho Could you let me try it? I'll pick the minimum fix and add an e2e test.
@Huang-Wei That small fix should work.
Let's move ahead with the small fix, since we all agree on it.
/reopen
Cherry-pick PR(s) aren't done.
@sanposhiho: Reopened this issue.
FYI, a related crash: #124930.
Is there any news on backporting the solution to older versions (v1.27+)?
This is already backported and released, but the fix for the crash reported in #124930 has not been released yet. Let's close this issue for now, as it is completed.
/close |
@alculquicondor: Closing this issue.
The issue here stems from the observation by Mike Sarfaty above that volume-binding's prefilter assumes it can equate the matchExpressions hostname label values with node names. This doesn't have to be the case; see #125336. Prefiltering will always fail in this situation. AFAICT this issue persists with the current head of the master branch.
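A small sketch of the mismatch being pointed out; the node name and hostname value below are hypothetical, not from the report. A Node's metadata.name and its kubernetes.io/hostname label are not required to be equal, so a candidate set built from the PV's hostname values may never intersect the set of node names.

```go
package main

import "fmt"

// node models just the two identifiers the comment above distinguishes.
type node struct {
	name   string            // metadata.name, what the PreFilterResult is compared against
	labels map[string]string // includes kubernetes.io/hostname, what PV nodeAffinity actually matches
}

func main() {
	n := node{
		name:   "worker-0", // hypothetical node name
		labels: map[string]string{"kubernetes.io/hostname": "worker-0.internal.example"}, // differs from the name
	}
	// A candidate set derived from the PV's hostname values contains the label value,
	// which never equals the node name, so prefiltering rejects every node even
	// though the label itself would satisfy the nodeAffinity requirement.
	pvHostnameValues := map[string]struct{}{"worker-0.internal.example": {}}
	_, matchesByName := pvHostnameValues[n.name]
	_, matchesByLabel := pvHostnameValues[n.labels["kubernetes.io/hostname"]]
	fmt.Println("matches by node name:", matchesByName)       // false
	fmt.Println("matches by hostname label:", matchesByLabel) // true
}
```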
What happened?
Since v1.27.0, the nodeAffinity for PersistentVolumes appears to have added extra validation on whether the values exist, despite using the in operator. This works as expected on 1.26.14 and no longer works from 1.27.0 onwards (tested up to 1.29.2). I am unsure whether this is an intentional breaking change in 1.27.0 or something wrong in our manifests that we have just so happened to get away with for a few years. The only thing that stands out to me in the changelog for 1.27.0 is some changes to the scheduler's Filter plugin, but this may well be unrelated.
What did you expect to happen?
When using the in operator, I'd expect the expression to only apply to nodes where the match is truthy.
How can we reproduce it (as minimally and precisely as possible)?
Note: using kind for easy reproduction.
v1.26.14 - working
Creating a kind cluster:
Since 12614-test-worker exists in the hostname values of this cluster, the pod schedules as expected. The 12614-prod-worker node does not exist in this environment, and so is ignored.
v1.27.0 - not working
Creating a kind cluster:
Same expectations as 1.26.14, but in this case the pod stays stuck in the Pending state due to the following:
I have tested this all the way up to 1.29.2.
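Since the manifests from the report are not reproduced above, here is a hedged sketch of the shape of PV nodeAffinity being discussed, written with the k8s.io/api/core/v1 Go types. The PV name, storage class, path, and capacity are assumptions for illustration; the two hostname values follow the description above (12614-test-worker exists in the kind cluster, 12614-prod-worker does not).

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// A local PV whose nodeAffinity lists one existing and one non-existent
	// hostname with the In operator, matching the reproduction described above.
	pv := corev1.PersistentVolume{
		ObjectMeta: metav1.ObjectMeta{Name: "repro-pv"}, // name is illustrative
		Spec: corev1.PersistentVolumeSpec{
			Capacity:    corev1.ResourceList{corev1.ResourceStorage: resource.MustParse("1Gi")},
			AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
			PersistentVolumeSource: corev1.PersistentVolumeSource{
				Local: &corev1.LocalVolumeSource{Path: "/mnt/data"}, // path is illustrative
			},
			StorageClassName: "local-storage", // assumed storage class name
			NodeAffinity: &corev1.VolumeNodeAffinity{
				Required: &corev1.NodeSelector{
					NodeSelectorTerms: []corev1.NodeSelectorTerm{{
						MatchExpressions: []corev1.NodeSelectorRequirement{{
							Key:      "kubernetes.io/hostname",
							Operator: corev1.NodeSelectorOpIn,
							Values: []string{
								"12614-test-worker", // exists in the kind cluster
								"12614-prod-worker", // does not exist; tolerated on 1.26, Pod stuck Pending on 1.27+
							},
						}},
					}},
				},
			},
		},
	}
	fmt.Printf("%+v\n", pv.Spec.NodeAffinity.Required.NodeSelectorTerms)
}
```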
Anything else we need to know?
No response
Kubernetes version
v1.27.0 through to v1.29.2
Cloud provider
OS version
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)