cleanup: removes unnomination logic for lower preemptors on the same node #128067
base: master
Conversation
Skipping CI for Draft Pull Request.
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the `triage/accepted` label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Force-pushed from b060178 to 9e49e1c.
/hold
/test pull-kubernetes-integration
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: sanposhiho. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Force-pushed from 2b312e9 to 257aa11.
@alculquicondor What do you think about this change?
cc @dom4ha
So, we know that, after some time, the lower preemptor will go through the scheduling cycle again (kubernetes/pkg/scheduler/schedule_one.go, lines 493 to 502 at 429edc5). However, at that point there may be other nodes that can also accept the preemptor and would have a larger score.
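For reference, here is a minimal, self-contained sketch of the flow referenced above, assuming made-up names rather than the actual kubernetes source: the nominated node is evaluated first, and only if it no longer fits does the scheduler fall back to evaluating the other nodes.

```go
// Illustrative stand-ins only, not the real scheduler code around schedule_one.go L493-502.
package main

import "fmt"

type node struct{ name string }

type pod struct {
	name          string
	nominatedNode string // stands in for .status.nominatedNodeName
}

// fits stands in for running the Filter plugins against a single node.
func fits(p pod, n node) bool { return n.name != "node1" } // pretend node1 is now full

// feasibleNodes evaluates the nominated node first; only when it no longer fits does it
// fall back to evaluating every node, so a stale nomination does not pin the pod to an
// infeasible node forever.
func feasibleNodes(p pod, nodes []node) []node {
	if p.nominatedNode != "" {
		for _, n := range nodes {
			if n.name == p.nominatedNode && fits(p, n) {
				return []node{n} // nominated node still works; consider only it
			}
		}
	}
	var out []node
	for _, n := range nodes {
		if fits(p, n) {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	nodes := []node{{"node1"}, {"node2"}}
	p := pod{name: "lower-preemptor", nominatedNode: "node1"}
	fmt.Println(feasibleNodes(p, nodes)) // node1 no longer fits, so node2 is evaluated too
}
```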
Rethinking about it, the best option might be to make preemption of high-priority pods consider lower-priority nominated pods. Let me think it through more deeply over the weekend.
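One possible reading of that idea, as a rough sketch with hypothetical names (not an existing scheduler API): when simulating a node for a high-priority preemptor, count the requests of lower-priority pods nominated to that node instead of ignoring them.

```go
// Hypothetical helper; a simplified single-resource model, not the real preemption code.
package main

import "fmt"

type pod struct {
	name          string
	priority      int32
	requests      int64  // simplified single-resource request
	nominatedNode string // stands in for .status.nominatedNodeName
}

// remainingFor returns the capacity left for the preemptor on nodeName if lower-priority
// nominated pods are treated as if they already occupied the node.
func remainingFor(preemptor pod, nodeName string, capacity int64, nominated []pod) int64 {
	remaining := capacity
	for _, p := range nominated {
		if p.nominatedNode == nodeName && p.priority < preemptor.priority {
			remaining -= p.requests
		}
	}
	return remaining
}

func main() {
	nominated := []pod{{name: "low", priority: 1, requests: 2, nominatedNode: "node1"}}
	high := pod{name: "high", priority: 100, requests: 4}
	left := remainingFor(high, "node1", 5, nominated)
	fmt.Printf("capacity left for %s on node1: %d (fits: %v)\n", high.name, left, left >= high.requests)
}
```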
It would be good to consider how the Autoscaler handles nominated nodes before deciding to remove this logic. If some of the low priority pods no longer fit the nominated node, the Autoscaler should be aware that it still needs to consider them in its simulation elsewhere, but not necessarily tied to this particular node, which would be overallocated. Another argument for keeping this logic is that such overallocation would block scheduling of even lower priority pods which could fit. If unnominated pods still fit, they could still be scheduled in the same place. The only worry is potentially wasted ongoing preemptions and the risk that unnominated pods won't be rescheduled, but I'm not sure about that yet.
PR needs rebase.
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
When the current scheduler runs a preemption for node1, it removes `.status.nominatedNodeName` from all the other lower priority Pods nominated to that node, based on the assumption that they no longer fit the node because of the new higher priority preemptor. But this implementation doesn't actually work as intended: the comment in the code claims that clearing the nomination requeues those Pods so they get rescheduled elsewhere, but this is not true. The update to the Pod's status is ignored and doesn't trigger requeueing, so the current implementation carries the risk that Pods robbed of their nominatedNodeName won't be retried.
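To illustrate that failure mode, here is a self-contained sketch using hypothetical types (not the real scheduling-queue code): an update handler that reacts only to changes affecting schedulability drops the status-only update that clears the nomination, so the Pod is never moved back to the activeQ.

```go
// Hypothetical types; a minimal model of "status-only updates are ignored".
package main

import "fmt"

type pod struct {
	name          string
	spec          string // anything the filter treats as a "significant" change
	nominatedNode string // stands in for .status.nominatedNodeName
}

type schedulingQueue struct{ activeQ []pod }

// onPodUpdate mimics an event handler that drops status-only updates.
func (q *schedulingQueue) onPodUpdate(old, updated pod) {
	if old.spec != updated.spec {
		q.activeQ = append(q.activeQ, updated) // requeue: the pod might now be schedulable
		return
	}
	// Status-only update: ignored, so the unnominated pod is not retried from here.
}

func main() {
	q := &schedulingQueue{}
	old := pod{name: "low-prio", spec: "v1", nominatedNode: "node1"}
	updated := old
	updated.nominatedNode = "" // preemption for a higher-priority pod cleared the nomination
	q.onPodUpdate(old, updated)
	fmt.Println("activeQ length:", len(q.activeQ)) // 0: the pod was not requeued
}
```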
So, we have two options here:

(1) Keep the unnomination, but actually requeue the unnominated Pods. For that we could either:
a. Utilize `PodsToActivate`. But it requires a change to how `PodsToActivate` is used, because currently `PodsToActivate` takes effect only when the scheduling cycle finishes successfully, meaning we don't handle it in the PostFilter path today (ref; see the sketch below).
b. Create a new function in `framework.Handle` that moves Pods to the activeQ, like `PodsToActivate` does.

(2) Simply remove the unnomination logic. Even if a lower priority Pod keeps a stale `.status.nominatedNodeName`, the scheduler tries other nodes as well later (ref).

I'm open to discussion about which option we should take; I made this PR pick up (2) because it also saves the API calls that clear `.status.nominatedNodeName`, probably benefitting the performance.
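As a rough illustration of the limitation behind option a, the sketch below uses simplified stand-in types (not the framework's actual `PodsToActivate` API): the activation set is flushed only when the cycle succeeds, so anything recorded from a failed cycle's PostFilter step is lost unless that path is also changed to flush it.

```go
// Illustrative-only model of "activations are flushed only on a successful cycle".
package main

import "fmt"

type cycleState struct {
	podsToActivate map[string]bool // pods a plugin asked to move back to the activeQ
}

type scheduler struct{ activeQ []string }

// flushActivations moves every recorded pod to the activeQ and clears the set.
func (s *scheduler) flushActivations(cs *cycleState) {
	for name := range cs.podsToActivate {
		s.activeQ = append(s.activeQ, name)
	}
	cs.podsToActivate = map[string]bool{}
}

// scheduleOne flushes the activation set only when the cycle succeeds; a pod recorded
// during a failed cycle's PostFilter (preemption) step is silently dropped, which is the
// usage change option a would require.
func (s *scheduler) scheduleOne(cs *cycleState, succeeded bool) {
	if succeeded {
		s.flushActivations(cs)
	}
}

func main() {
	s := &scheduler{}
	cs := &cycleState{podsToActivate: map[string]bool{"unnominated-pod": true}}
	s.scheduleOne(cs, false) // a cycle that failed and ran preemption in PostFilter
	fmt.Println("activeQ:", s.activeQ) // []: the recorded pod was never activated
}
```

Option b sidesteps this by letting a plugin move Pods to the activeQ directly, at the cost of widening the `framework.Handle` surface.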
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: