
WIP Honor pod priorities when starting pods #125918

Open · wants to merge 2 commits into base: master

Conversation

huutomerkki

What type of PR is this?

/kind bug

What this PR does / why we need it:

This PR changes the order in which pods are started so that pod priorities are honored, with higher-priority pods started first. If no priorities are used, the order falls back to pod creation timestamps, with the oldest pods started first.

Which issue(s) this PR fixes:

Fixes #125815

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 5, 2024
@k8s-ci-robot
Contributor

Welcome @huutomerkki!

It looks like this is your first PR to kubernetes/kubernetes 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/kubernetes has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Contributor

Hi @huutomerkki. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jul 5, 2024
@k8s-ci-robot k8s-ci-robot requested review from dchen1107 and odinuge July 5, 2024 13:18
@k8s-ci-robot k8s-ci-robot added area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jul 5, 2024
@ffromani
Contributor

ffromani commented Jul 6, 2024

/ok-to-test

Thanks for your PR. The two obvious questions are: how can we make sure this change doesn't break existing flows (e.g., do we have sufficient test coverage)? And since this change admittedly only partially addresses the problem, how can we assess the gains?

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 6, 2024
@bart0sh
Contributor

bart0sh commented Jul 29, 2024

/triage accepted
/priority important-longterm

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jul 29, 2024
@huutomerkki
Author

huutomerkki commented Jul 31, 2024

/ok-to-test

Thanks for your PR. The two obvious questions are: how can we make sure this change doesn't break existing flows (e.g., do we have sufficient test coverage)? And since this change admittedly only partially addresses the problem, how can we assess the gains?

Thanks for checking it out and sorry for the delay in answering.

For the first question: for a flow to be affected, it would have to rely on the order in which the kubelet tries to start pods and make assumptions based on it. This order is not guaranteed to be the actual starting order of the pods, so I would assume there is a rather limited number of valid use cases for this. If there is an obvious one I'm missing, I'd be more than happy to write tests to cover it; the same goes for editing the existing test cases to cover this change if that is deemed necessary.

As for assessing the gains of this change: I'll try to get my hands on some data that I can share. Without it, it's hard to quantify the gains, but I'd like to point out that this change is aimed at larger and heavier applications; there is most likely no noticeable benefit for applications consisting of just a handful of pods. This change ensures that the order in which pod starts are attempted is always optimal, which might not be very visible in action. If nothing else, it should make the restart process and restart times more uniform and predictable.

@kashifest

@ffromani do you have any other concerns on this patch ?

@huutomerkki huutomerkki force-pushed the minna/start-pods-in-priority-order branch from af2c966 to 8d35728 Compare October 8, 2024 05:51
@huutomerkki
Author

/retest

@ffromani
Contributor

ffromani commented Dec 3, 2024

For the first question: for a flow to be affected, it would have to rely on the order in which the kubelet tries to start pods and make assumptions based on it. This order is not guaranteed to be the actual starting order of the pods, so I would assume there is a rather limited number of valid use cases for this. If there is an obvious one I'm missing, I'd be more than happy to write tests to cover it; the same goes for editing the existing test cases to cover this change if that is deemed necessary.

As for assessing the gains of this change: I'll try to get my hands on some data that I can share. Without it, it's hard to quantify the gains, but I'd like to point out that this change is aimed at larger and heavier applications; there is most likely no noticeable benefit for applications consisting of just a handful of pods. This change ensures that the order in which pod starts are attempted is always optimal, which might not be very visible in action. If nothing else, it should make the restart process and restart times more uniform and predictable.

Hey! Do we have any updates on the gains this PR brings? While I'm generally in favor of this PR, it admittedly only partially addresses a problem in a critical and not strongly tested kubelet flow, so demonstrating solid gains would be a big plus towards merging.

Other than that, I think we should consider adding e2e coverage. We can start with minimal non-regression coverage; I for one would call that sufficient.

@kad
Member

kad commented Dec 3, 2024

Overall the idea looks OK, but a general question: for two pods with the same priority, should they still be sorted by creation time?

@SergeyKanzhelev
Member

I want to make sure to reflect one point we discussed at SIG Node today. With this change, if a kubelet restart happens at the precise moment when a new Pod is scheduled, we will have a big change in behavior: instead of an eviction message we will get a pod admission failure. This change of behavior will be confusing.

To mitigate this risk, can you please write down all the conditions that need to be met for this to happen? Is it probable? Is it likely to happen in specific circumstances (e.g., in a deployment where one pod restarts the kubelet to apply a new config, resulting in a more critical pod being scheduled)?

Maybe the right fix is to change the ordering only for pods that were running before and keep the new ones at the end. Or maybe some other solution along these lines will help.
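The mitigation suggested above could be sketched roughly like this. This is purely illustrative; the `Pod` struct, the `WasRunning` flag, and `orderForRestart` are hypothetical names, not the kubelet's real API. Previously running pods are priority-sorted among themselves, and pods first seen after the restart are kept at the end, so a newly scheduled pod cannot jump ahead of existing workloads.

```go
package main

import (
	"fmt"
	"sort"
)

// Pod is a minimal illustrative stand-in for the kubelet's pod type.
type Pod struct {
	Name       string
	Priority   int32
	WasRunning bool // true if the pod was already running before the kubelet restart
}

// orderForRestart sorts each group by priority (higher first) but keeps all
// newly scheduled pods after the previously running ones.
func orderForRestart(pods []Pod) []Pod {
	byPriority := func(ps []Pod) {
		sort.SliceStable(ps, func(i, j int) bool { return ps[i].Priority > ps[j].Priority })
	}
	var running, fresh []Pod
	for _, p := range pods {
		if p.WasRunning {
			running = append(running, p)
		} else {
			fresh = append(fresh, p)
		}
	}
	byPriority(running)
	byPriority(fresh)
	return append(running, fresh...)
}

func main() {
	pods := []Pod{
		{Name: "new-critical", Priority: 1000, WasRunning: false},
		{Name: "old-low", Priority: 0, WasRunning: true},
		{Name: "old-high", Priority: 100, WasRunning: true},
	}
	for _, p := range orderForRestart(pods) {
		// old-high and old-low come first; new-critical last despite its priority
		fmt.Println(p.Name)
	}
}
```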

@huutomerkki huutomerkki force-pushed the minna/start-pods-in-priority-order branch from 8d35728 to b1fc578 Compare December 31, 2024 12:41
@k8s-ci-robot k8s-ci-robot added area/kubeadm area/test sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Dec 31, 2024
Signed-off-by: Minna Färm <minna.farm@est.tech>
…eature

Signed-off-by: Minna Färm <minna.farm@est.tech>
@huutomerkki huutomerkki force-pushed the minna/start-pods-in-priority-order branch from b1fc578 to 9adc6e3 Compare December 31, 2024 12:45
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: huutomerkki
Once this PR has been reviewed and has the lgtm label, please assign tallclair for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@huutomerkki huutomerkki changed the title Honor pod priorities when starting pods WIP Honor pod priorities when starting pods Dec 31, 2024
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 31, 2024
@huutomerkki
Author

Adding the work I've done so far here, as I am changing orgs. I had some issues with rebasing, so there are some accidental labels on here; feel free to remove them if you know how.

The current status of the e2e test is that the pods are not being restarted; I would love another set of eyes to tell me why.

Overall the idea looks OK, but a general question: for two pods with the same priority, should they still be sorted by creation time?

I see no reason to use another method to determine that, as two pods with the same priority should be interchangeable in starting order.

I want to make sure to reflect one point we discussed at SIG Node today. With this change, if a kubelet restart happens at the precise moment when a new Pod is scheduled, we will have a big change in behavior: instead of an eviction message we will get a pod admission failure. This change of behavior will be confusing.

To mitigate this risk, can you please write down all the conditions that need to be met for this to happen? Is it probable? Is it likely to happen in specific circumstances (e.g., in a deployment where one pod restarts the kubelet to apply a new config, resulting in a more critical pod being scheduled)?

Maybe the right fix is to change the ordering only for pods that were running before and keep the new ones at the end. Or maybe some other solution along these lines will help.

Can you elaborate on this? I don't understand why the behaviour would change.

@k8s-ci-robot
Contributor

@huutomerkki: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubernetes-typecheck 9adc6e3 link true /test pull-kubernetes-typecheck
pull-kubernetes-linter-hints 9adc6e3 link false /test pull-kubernetes-linter-hints
pull-kubernetes-verify 9adc6e3 link true /test pull-kubernetes-verify

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Member

@neolit123 neolit123 left a comment


/remove-sig cluster-lifecycle
/remove-area kubeadm

@k8s-ci-robot k8s-ci-robot removed sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. area/kubeadm labels Jan 2, 2025
Projects
Status: Waiting on Author
Development

Successfully merging this pull request may close these issues.

Pod starting order after restart doesn't follow pod priorities
8 participants