[scheduler] absent key in NodeToStatusMap implies UnschedulableAndUnresolvable #125197

gabesaba · 2024-05-29T16:19:06Z

What type of PR is this?

/kind bug
/kind regression
/kind api-change

What this PR does / why we need it:

#119779 fixed a bug but caused a performance regression (#124709), observed at 5k nodes. The performance fix #124714 was merged and brought a modest improvement, but we still observe reduced throughput when running a test with 15k nodes and 60k daemonset pods:

baseline (pre #119779): ~470 pods/s
current (with #124714): ~70 pods/s
more perf engineering: ~300 pods/s
this change: ~460 pods/s

This fix attempts to bring us back to baseline performance. We revert #124714 and part of #119779, and we implement option 2 proposed here. There are still two unaddressed O(n) operations (1, 2), but these haven't revealed themselves as performance problems in the wild. To keep this diff as small as possible for cherry-picking, we will defer fixing them to a future minor version. That future change will require a breaking change to the NodeToStatusMap type, to allow a better-than-O(n), or at least a really fast O(n), representation of many nodes with the same status.
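To make the cost concrete, here is a rough Go sketch of the two bookkeeping strategies. It is not the scheduler's code: `Code`, `fillAll`, and `leaveAbsent` are hypothetical names, and the model assumes one map write per node that PreFilter filtered out.

```go
package main

import "fmt"

// Code is a simplified stand-in for the scheduler framework's status code.
type Code int

const (
	Unschedulable Code = iota + 1
	UnschedulableAndUnresolvable
)

// fillAll models the bookkeeping this PR removes: every node outside the
// PreFilter result gets an explicit entry, so the work grows with cluster size.
func fillAll(allNodes []string, allowed map[string]bool) map[string]Code {
	statuses := make(map[string]Code, len(allNodes))
	for _, name := range allNodes {
		if !allowed[name] {
			statuses[name] = UnschedulableAndUnresolvable
		}
	}
	return statuses
}

// leaveAbsent models the new convention: filtered-out nodes get no entry,
// and readers treat a missing key as UnschedulableAndUnresolvable.
func leaveAbsent() map[string]Code {
	return map[string]Code{}
}

func main() {
	const numNodes = 15000 // cluster size from the test described above
	allNodes := make([]string, numNodes)
	for i := range allNodes {
		allNodes[i] = fmt.Sprintf("node-%d", i)
	}
	// A daemonset pod targets a single node, so PreFilter allows just one.
	allowed := map[string]bool{"node-0": true}

	fmt.Println("entries written when filling:", len(fillAll(allNodes, allowed)))
	fmt.Println("entries written when leaving absent:", len(leaveAbsent()))
}
```

Under this model, scheduling 60k daemonset pods on 15k nodes costs on the order of 15,000 × 60,000 = 900 million map writes with the first strategy, and none for filtered-out nodes with the second.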

pair @mskrocki
/assign @alculquicondor, @liggitt, @Huang-Wei
/sig scheduling

Which issue(s) this PR fixes:

Fixes #124709

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Fixes a scheduling performance regression when many nodes exist and prefilter returns 1-2 nodes (e.g. daemonset scheduling)

ACTION REQUIRED: For developers of out-of-tree PostFilter plugins, note that the semantics of NodeToStatusMap are changing: A node with an absent value in the NodeToStatusMap should be interpreted as having an UnschedulableAndUnresolvable status

This reverts commit 9fcd791.
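For out-of-tree PostFilter plugin authors, the new semantics could be handled with a lookup helper along these lines. This is a minimal sketch with simplified stand-in types, not the real framework package: `codeFor` is a hypothetical helper, and the nil-safe `Code()` method is assumed to mirror how the framework treats a nil status as Success.

```go
package main

import "fmt"

// Code is a simplified stand-in for the framework's status code.
type Code int

const (
	Success Code = iota
	Unschedulable
	UnschedulableAndUnresolvable
)

// Status is a simplified stand-in for framework.Status.
type Status struct {
	code Code
}

// Code is nil-safe: a nil *Status reads as Success (assumption, see lead-in).
func (s *Status) Code() Code {
	if s == nil {
		return Success
	}
	return s.code
}

// NodeToStatusMap has the same shape as the type discussed in this PR:
// node name to status.
type NodeToStatusMap map[string]*Status

// codeFor is a hypothetical helper: it returns the recorded code for a node
// and falls back to UnschedulableAndUnresolvable when the key is absent.
func codeFor(m NodeToStatusMap, nodeName string) Code {
	if s, ok := m[nodeName]; ok {
		return s.Code()
	}
	return UnschedulableAndUnresolvable
}

func main() {
	m := NodeToStatusMap{"node-a": &Status{code: Unschedulable}}

	// node-a has an explicit entry; node-b is absent from the map.
	fmt.Println(codeFor(m, "node-a") == Unschedulable)
	fmt.Println(codeFor(m, "node-b") == UnschedulableAndUnresolvable)
}
```

A plugin that previously read `m[nodeName]` directly and treated a missing entry as "no failure recorded" would need this kind of fallback after the change.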

k8s-ci-robot · 2024-05-29T16:19:15Z

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot · 2024-05-29T16:19:16Z

Hi @gabesaba. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

gabesaba · 2024-05-29T16:20:14Z

/assign @alculquicondor
/assign @liggitt
/assign @Huang-Wei

liggitt · 2024-05-29T16:25:31Z

/ok-to-test

alculquicondor · 2024-05-29T17:22:24Z

cc @sanposhiho @chengjoey

alculquicondor · 2024-05-29T17:25:08Z

/release-note-edit

Improved scheduling performance when there are many nodes and prefilter returns 1-2 nodes (e.g. daemonset)

ACTION REQUIRED: For developers of out-of-tree PostFilter plugins, note that the semantics of NodeToStatusMap are changing: A node with an absent value in the NodeToStatusMap should be interpreted as having an UnschedulableAndUnresolvable status

k8s-ci-robot · 2024-05-29T17:25:10Z

@alculquicondor: /release-note-edit must be used with a release note block.

alculquicondor · 2024-05-29T17:25:48Z

/release-note-edit

Improved scheduling performance when there are many nodes and prefilter returns 1-2 nodes (e.g. daemonset)

ACTION REQUIRED: For developers of out-of-tree PostFilter plugins, note that the semantics of NodeToStatusMap are changing: A node with an absent value in the NodeToStatusMap should be interpreted as having an UnschedulableAndUnresolvable status

liggitt · 2024-05-29T18:12:56Z

looks like it needs hack/update-gofmt.sh run:

diff ./pkg/scheduler/schedule_one_test.go.orig ./pkg/scheduler/schedule_one_test.go
--- ./pkg/scheduler/schedule_one_test.go.orig
+++ ./pkg/scheduler/schedule_one_test.go
@@ -2452,8 +2452,7 @@
 				Pod:         st.MakePod().Name("test-prefilter").UID("test-prefilter").Obj(),
 				NumAllNodes: 2,
 				Diagnosis: framework.Diagnosis{
-					NodeToStatusMap: framework.NodeToStatusMap{
-					},
+					NodeToStatusMap: framework.NodeToStatusMap{},
 				},
 			},
 		},

k8s-triage-robot · 2024-05-29T18:23:37Z

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

AxeZhan · 2024-05-31T05:06:10Z

pkg/scheduler/schedule_one.go

-
-	diagnosis := framework.Diagnosis{
-		NodeToStatusMap: make(framework.NodeToStatusMap, len(allNodes)),
+		return nil, diagnosis, err
 	}
 	// Run "prefilter" plugins.
 	preRes, s := fwk.RunPreFilterPlugins(ctx, state, pod)


I can't comment on an unchanged line.
So, based on our implementation, at L460 we should only set the status for all nodes if the status returned by fwk.RunPreFilterPlugins(ctx, state, pod) has the Unschedulable code, right? That can only happen in a scheduler with specific out-of-tree plugins.

In fact, I think we only need to list all nodes and update diagnosis.NodeToStatusMap when RunPreFilterPlugins returns an Unschedulable status.

gabesaba · 2024-05-31

Yes, this sounds right to me. However, I omitted it from this change to keep the diff small, since updating the tests produces a 30-line (+7, -23) diff. The intention is to clean it up in a follow-up PR that won't be cherry-picked.

Do you prefer that I include it in this change, or do you think the original plan makes sense?

AxeZhan · 2024-05-31

Hmm, since we're going to cherry-pick this to recent releases, I agree that this should be as simple as possible (provided it brings back our previous performance).
I think we can leave a comment here and leave it for a follow-up.
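The deferred clean-up discussed in this thread could look roughly like the sketch below. It is an illustration under stated assumptions, not the scheduler's code: `recordPreFilterRejection` and the `listAllNodes` callback are hypothetical, and `Code` is a simplified stand-in for the framework's status code.

```go
package main

import "fmt"

// Code is a simplified stand-in for the scheduler framework's status code.
type Code int

const (
	Unschedulable Code = iota + 1
	UnschedulableAndUnresolvable
)

// recordPreFilterRejection is a hypothetical helper. It lists all nodes and
// fills the map only when PreFilter rejected the pod with Unschedulable.
// For UnschedulableAndUnresolvable the map stays empty, because an absent
// key already implies that status.
func recordPreFilterRejection(code Code, listAllNodes func() []string) map[string]Code {
	statuses := map[string]Code{}
	if code != Unschedulable {
		return statuses // nothing to list, nothing to write
	}
	for _, name := range listAllNodes() {
		statuses[name] = code
	}
	return statuses
}

func main() {
	listCalls := 0
	listAllNodes := func() []string {
		listCalls++
		return []string{"node-a", "node-b"}
	}

	unresolvable := recordPreFilterRejection(UnschedulableAndUnresolvable, listAllNodes)
	fmt.Println("entries:", len(unresolvable))
	fmt.Println("list calls so far:", listCalls)

	unschedulable := recordPreFilterRejection(Unschedulable, listAllNodes)
	fmt.Println("entries:", len(unschedulable))
	fmt.Println("list calls so far:", listCalls)
}
```

Passing the node listing as a callback makes it visible that the O(n) listing is skipped entirely in the common case.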

AxeZhan · 2024-05-31T06:04:14Z

pkg/scheduler/framework/preemption/preemption.go

+				nodeStatuses[nodeName] = framework.NewStatus(framework.UnschedulableAndUnresolvable, "Preemption is not helpful for scheduling")
+				continue
+			}
+			potentialNodes = append(potentialNodes, node)


Forgive me if this is a dumb question:
Since we only add nodes that are in framework.NodeToStatusMap to potentialNodes, why don't we iterate over framework.NodeToStatusMap directly, instead of iterating over allNodes and checking whether each node is in framework.NodeToStatusMap?

gabesaba · 2024-05-31

On the contrary, your comment is absolutely right :)

If we decide not to fill in the map (depending on the resolution of #125197 (comment)), I will implement this.

alculquicondor · 2024-05-31

I'm OK with filling up the map for this cherry-pickable patch, but we should follow this suggestion for 1.31.

AxeZhan · 2024-06-01

So, in this PR we are still filling up the map to minimize changes and facilitate cherry-picking.
However, in version 1.31 we will change the behavior so that we no longer fill up the map with UnschedulableAndUnresolvable statuses. Am I understanding correctly?

gabesaba · 2024-06-03

Correct. In the NodeToStatusMap returned by nodesWherePreemptionMayHelp, the UnschedulableAndUnresolvable status is only used for the error message (#nodes, and #"Preemption is not helpful for scheduling"). I think we can pipe this information in a more efficient way.

AxeZhan · 2024-06-03

Right, sgtm.
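The suggestion in this thread, iterating over the status map instead of over all nodes, can be sketched as follows. The function names `scanAllNodes` and `scanMap` are hypothetical and the types are simplified stand-ins. The sketch also assumes every key in the map names an existing node, and it sorts the results only to make the two variants comparable.

```go
package main

import (
	"fmt"
	"sort"
)

// Code is a simplified stand-in for the scheduler framework's status code.
type Code int

const (
	Unschedulable Code = iota + 1
	UnschedulableAndUnresolvable
)

// scanAllNodes models the loop kept in this PR: visit every node and skip
// those that are absent from the map or marked UnschedulableAndUnresolvable.
func scanAllNodes(allNodes []string, statuses map[string]Code) []string {
	var potential []string
	for _, name := range allNodes {
		code, ok := statuses[name]
		if !ok || code == UnschedulableAndUnresolvable {
			continue
		}
		potential = append(potential, name)
	}
	sort.Strings(potential)
	return potential
}

// scanMap models the suggested follow-up: visit only nodes with an explicit
// entry, so the cost depends on the map size rather than the cluster size.
func scanMap(statuses map[string]Code) []string {
	var potential []string
	for name, code := range statuses {
		if code == UnschedulableAndUnresolvable {
			continue
		}
		potential = append(potential, name)
	}
	sort.Strings(potential)
	return potential
}

func main() {
	allNodes := []string{"node-a", "node-b", "node-c"}
	statuses := map[string]Code{
		"node-a": Unschedulable,
		"node-b": UnschedulableAndUnresolvable,
		// node-c is absent, which implies UnschedulableAndUnresolvable.
	}
	fmt.Println(scanAllNodes(allNodes, statuses))
	fmt.Println(scanMap(statuses))
}
```

Both variants select the same nodes here. The second one only stays cheap if the map is not filled for every node, which is why the thread ties it to the 1.31 follow-up.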

gabesaba · 2024-06-03T12:10:41Z

/retest

alculquicondor · 2024-06-03

/lgtm
/approve

Please prepare cherry-picks for all supported versions

k8s-ci-robot · 2024-06-03T12:18:50Z

LGTM label has been added.

Git tree hash: 539b029ec9b38a8a86a8a5e323e17bee4b648401

k8s-ci-robot · 2024-06-03T12:19:12Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, gabesaba

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~pkg/scheduler/OWNERS~~ [alculquicondor]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Revert "scheduler: preallocation for NodeToStatusMap"

7ea3bf4

This reverts commit 9fcd791.

k8s-ci-robot requested review from damemi and kerthcet May 29, 2024 16:19

k8s-ci-robot added the cncf-cla: yes and needs-triage labels May 29, 2024

k8s-ci-robot added the needs-priority and needs-ok-to-test labels May 29, 2024

k8s-ci-robot assigned alculquicondor, Huang-Wei and liggitt May 29, 2024

k8s-ci-robot added the ok-to-test label and removed the needs-ok-to-test label May 29, 2024

k8s-ci-robot added the release-note-action-required label and removed the release-note label May 29, 2024

AxeZhan mentioned this pull request May 31, 2024

register unschedulable plugin for those plugins that PreFilter's PreFilterResult filter out some nodes #122251

Merged

AxeZhan reviewed May 31, 2024

Don't fill in NodeToStatusMap with UnschedulableAndUnresolvable

c8f0ea1

gabesaba force-pushed the prefilter_perf branch from 268919d to c8f0ea1 Compare May 31, 2024 15:57

alculquicondor reviewed Jun 3, 2024

k8s-ci-robot added the lgtm label Jun 3, 2024

k8s-ci-robot added the approved label Jun 3, 2024

This was referenced Jun 3, 2024

scheduler_perf: measure the degradation of daemonset scheduling #125293

Merged

Throughput degradation scheduling daemonset pods #124709

Closed

k8s-ci-robot merged commit 8bd36c6 into kubernetes:master Jun 3, 2024
14 checks passed

k8s-ci-robot added this to the v1.31 milestone Jun 3, 2024

gabesaba deleted the prefilter_perf branch June 3, 2024 15:06

AxeZhan mentioned this pull request Jun 4, 2024

[Scheduler] Change back to original way of calculating EvaluatedNodes. #125303

Merged

k8s-ci-robot added a commit that referenced this pull request Jun 5, 2024

Merge pull request #125306 from gabesaba/automated-cherry-pick-of-#12…

ed1cda0

…5197-upstream-release-1.30 Cherry pick of #125197: [scheduler] absent key in NodeToStatusMap implies UnschedulableAndUnresolvable

k8s-ci-robot added a commit that referenced this pull request Jun 5, 2024

Merge pull request #125309 from gabesaba/cherry-pick-#125197-1.27

31894ff

Cherry pick of #125197: [scheduler] absent key in NodeToStatusMap implies UnschedulableAndUnresolvable

k8s-ci-robot added a commit that referenced this pull request Jun 5, 2024

Merge pull request #125307 from gabesaba/cherry-pick-#125197-1.29

734c5c7

Cherry pick of #125197: [scheduler] absent key in NodeToStatusMap implies UnschedulableAndUnresolvable

k8s-ci-robot added a commit that referenced this pull request Jun 5, 2024

Merge pull request #125308 from gabesaba/cherry-pick-#125197-1.28

c446306

Cherry pick of #125197: [scheduler] absent key in NodeToStatusMap implies UnschedulableAndUnresolvable

This was referenced Jun 5, 2024

[scheduler] Improve Handling of Node Status #125345

Closed

REQUEST: New membership for gabesaba kubernetes/org#5008

Closed

This was referenced Jul 8, 2024

WIP: Improve usage of nodeToStatusMap in scheduler #125954

Closed

Change structure of NodeToStatus map in scheduler #126022

Merged

liggitt mentioned this pull request Oct 21, 2024

scheduler: preallocation for NodeToStatusMap #124714

Merged
