Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only system-node-critical pods should be OOM Killed last #99729

Merged
merged 1 commit into from
Apr 11, 2021

Conversation

ravisantoshgudimetla
Copy link
Contributor

@ravisantoshgudimetla ravisantoshgudimetla commented Mar 3, 2021

What type of PR is this?

/kind feature
/kind design

What this PR does / why we need it:

Allow only system-node-critical pods to have -997 OOM score

Which issue(s) this PR fixes:

Fixes #99727

Special notes for your reviewer:

Does this PR introduce a user-facing change?

System-cluster-critical pods should not get a low OOM Score. 

As of now both system-node-critical and system-cluster-critical pods have -997 OOM score, making them one of the last processes to be OOMKilled. By definition system-cluster-critical pods can be scheduled elsewhere if there is a resource crunch on the node where as system-node-critical pods cannot be rescheduled. This was the reason for system-node-critical to have higher priority value than system-cluster-critical.  This change allows only system-node-critical priority class to have low OOMScore.

action required
If the user wants to have the pod to be OOMKilled last and the pod has system-cluster-critical priority class, it has to be changed to system-node-critical priority class to preserve the existing behavior

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


cc @sjenning @rphillips @smarterclayton @wking

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. kind/design Categorizes issue or PR as related to design. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 3, 2021
@ravisantoshgudimetla
Copy link
Contributor Author

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 3, 2021
@k8s-ci-robot k8s-ci-robot requested review from dims and yifan-gu March 3, 2021 21:38
@rphillips
Copy link
Member

/priority important-soon
/retest
/triage accepted

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 3, 2021
@rphillips
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 3, 2021
@smarterclayton
Copy link
Contributor

Before we merge this can we reconstruct the reasoning that led to system-cluster-critical getting this oomscore? It's been long enough that I don't remember everything but there was definitely thought put into it. If we think it's correct to make a change, we need the same people who cared to also weigh in

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 3, 2021
@ravisantoshgudimetla
Copy link
Contributor Author

The reasoning was mentioned here. The main goal was a way to map OOMScore to priority of the pod which already expresses criticality of the pod. It was also mentioned that we can change the heuristic later, if we decide to do so.

@rphillips
Copy link
Member

/assign @sjenning

@derekwaynecarr
Copy link
Member

I agree that system-node-critical should have oom_score_adj value that differentiates it as more critical than system-cluster-critical, but I also worry this change to treat the system-cluster-critical with the same score as all Burstable QoS tiers isn't ideal either.

If we change system-node-critical -997, and system-cluster-critical -996, it seems to have the same effect, lower surprise, and still leaves us open to evolve further.

@ehashman
Copy link
Member

ehashman commented Apr 5, 2021

@ehashman we don't backport features which this clearly is.

Ah, while I know this was tagged as feature, it looked like a bugfix to me. But yeah, I suppose this being "action required:" might make it ineligible for k8s backport.

@ehashman
Copy link
Member

ehashman commented Apr 9, 2021

I think @smarterclayton's comments have been addressed, can we remove the hold?

@dims
Copy link
Member

dims commented Apr 9, 2021

/hold cancel

we can debate on back port in the cherry pick :)

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 9, 2021
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

1 similar comment
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@ravisantoshgudimetla
Copy link
Contributor Author

Yeah, Clayton's comments are addressed, we can remove the hold. Thanks for doing it @dims

As far as the backport goes, this will be a behavioral change but from the history of implementation, this was a miss during the initial implementation as @sjenning pointed out, so it is worth backport. Let me know if you folks think otherwise.

@ravisantoshgudimetla
Copy link
Contributor Author

ravisantoshgudimetla commented Apr 9, 2021

/cherry-pick release-1.20

@ravisantoshgudimetla
Copy link
Contributor Author

/cherry-pick release-1.21

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@ehashman
Copy link
Member

ehashman commented Apr 9, 2021

@ravisantoshgudimetla k8s prow doesn't have cherry-pick functionality. You will need to use the ./hack/cherry_pick_pull.sh tool. More info here.

@ravisantoshgudimetla
Copy link
Contributor Author

Containerd job is failing continuously with the following error:

W0409 15:54:57.780] ERROR: (gcloud.compute.scp) [/usr/bin/scp] exited with return code [1].

Hopefully it passes this time

/retest

@ravisantoshgudimetla
Copy link
Contributor Author

@ravisantoshgudimetla k8s prow doesn't have cherry-pick functionality. You will need to use the ./hack/cherry_pick_pull.sh tool. More info here.

Thank you @ehashman

Long time since I worked on a upstream cherry-pick. Thank you for the pointer

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

2 similar comments
@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@ravisantoshgudimetla
Copy link
Contributor Author

Seems this test is broken:

#100978

/hold

Putting a hold for now.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 10, 2021
@ravisantoshgudimetla
Copy link
Contributor Author

/hold cancel

#100978 addressed.

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 11, 2021
@ravisantoshgudimetla
Copy link
Contributor Author

/retest

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/design Categorizes issue or PR as related to design. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-action-required Denotes a PR that introduces potentially breaking changes that require user action. sig/node Categorizes an issue or PR as relevant to SIG Node. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Development

Successfully merging this pull request may close these issues.

Only system-node-critical pods should have low OOM Score
10 participants