fix: discard a pod at Pop() when the pod is being scheduled #127016

Merged
merged 1 commit into from
Sep 5, 2024


sanposhiho
Member

@sanposhiho sanposhiho commented Aug 30, 2024

What type of PR is this?

/kind bug

What this PR does / why we need it:

inFlightPods should have only one element per Pod.
However, the scheduler had a bug (#118226) in which multiple Pod objects for the same Pod mistakenly existed in different places (the binding cycle and the queue).

This kind of bug would result in a serious memory leak because Done() wouldn't be able to clean up Pods in InFlightEvents correctly.

So, in this PR, we discard the Pod at Pop() if it's already being scheduled (which shouldn't happen unless we introduce another bug like #118226),
so that a duplicated Pod issue in the future at least wouldn't cause a huge memory leak.
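
To illustrate the idea, here is a minimal sketch, not the scheduler's actual code. The `queue` type, its `pending` slice, and the use of Pod UIDs as plain strings are all hypothetical simplifications; the sketch assumes that `inFlightPods` maps each Pod to its single element in an `inFlightEvents` list, and that `Done()` removes that element. Under those assumptions, overwriting the map entry for a duplicate would orphan the first list element, so `Pop()` skips the duplicate instead.

```go
package main

import (
	"container/list"
	"fmt"
)

// queue is a hypothetical, heavily simplified stand-in for the scheduler's
// active queue. Pods are represented only by their UID strings.
type queue struct {
	pending        []string                 // Pods waiting to be popped
	inFlightPods   map[string]*list.Element // at most one entry per Pod being scheduled
	inFlightEvents *list.List               // entries recorded while Pods are in flight
}

func newQueue() *queue {
	return &queue{
		inFlightPods:   map[string]*list.Element{},
		inFlightEvents: list.New(),
	}
}

// Add enqueues a Pod. It deliberately does not deduplicate, so that a
// duplicate can be simulated.
func (q *queue) Add(uid string) { q.pending = append(q.pending, uid) }

// Pop returns the next Pod that is not already being scheduled.
// A Pod that is already in inFlightPods is discarded rather than allowed to
// overwrite the existing entry, which would orphan its list element.
func (q *queue) Pop() (string, bool) {
	for len(q.pending) > 0 {
		uid := q.pending[0]
		q.pending = q.pending[1:]
		if _, ok := q.inFlightPods[uid]; ok {
			// Duplicate of a Pod that is being scheduled: drop it and try the next one.
			continue
		}
		q.inFlightPods[uid] = q.inFlightEvents.PushBack(uid)
		return uid, true
	}
	return "", false
}

// Done removes the Pod's single entry from inFlightEvents and inFlightPods.
func (q *queue) Done(uid string) {
	elem, ok := q.inFlightPods[uid]
	if !ok {
		return
	}
	q.inFlightEvents.Remove(elem)
	delete(q.inFlightPods, uid)
}

func main() {
	q := newQueue()
	q.Add("pod-a")
	q.Add("pod-a") // simulated duplicate of a Pod that will already be in flight
	q.Add("pod-b")

	first, _ := q.Pop()
	second, _ := q.Pop() // the duplicate "pod-a" is discarded here
	fmt.Println("popped:", first, second)

	q.Done(first)
	q.Done(second)
	fmt.Println("entries left in inFlightEvents:", q.inFlightEvents.Len())
}
```

In this sketch, every element pushed to `inFlightEvents` stays reachable through `inFlightPods`, so `Done()` can always clean it up.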

Which issue(s) this PR fixes:

Part of #120622

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Fix a potential memory leak in QueueingHint (alpha feature)

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 30, 2024
@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 30, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Aug 30, 2024
@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 30, 2024
@sanposhiho
Member Author

I'll add a test for it. I'm just opening it now to show it to @KunWuLuan and other people.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sanposhiho

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 30, 2024
@sanposhiho
Member Author

/hold

To make sure this PR goes through an approver's approval.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 30, 2024
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Aug 30, 2024
@sanposhiho sanposhiho force-pushed the multiple-inflightpods branch from 9c570de to 9b40008 Compare August 30, 2024 09:21
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 30, 2024
@sanposhiho sanposhiho marked this pull request as ready for review August 30, 2024 09:22
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 30, 2024
@sanposhiho
Member Author

/cc @macsko @alculquicondor

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Sep 5, 2024
@sanposhiho sanposhiho force-pushed the multiple-inflightpods branch from fc98923 to 62ead30 Compare September 5, 2024 07:20
@sanposhiho
Member Author

/retitle fix: discard a pod at Pop() when the pod is being scheduled

@k8s-ci-robot k8s-ci-robot changed the title fix: allow inFlightPods to have multiple elements per Pod fix: discard a pod at Pop() when the pod is being scheduled Sep 5, 2024
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 5, 2024
@sanposhiho sanposhiho force-pushed the multiple-inflightpods branch from 62ead30 to 6a230c8 Compare September 5, 2024 07:46
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 5, 2024
@macsko
Member

macsko commented Sep 5, 2024

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 5, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 9d08ff05cbfad5821d57f018a4baf702b5c5211b

@sanposhiho sanposhiho force-pushed the multiple-inflightpods branch from 6a230c8 to 6d357d2 Compare September 5, 2024 13:30
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 5, 2024
@sanposhiho
Member Author

@alculquicondor addressed

@alculquicondor
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 5, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: b0306242b81ca3d89cc162c13c228b07d935c623

@sanposhiho
Member Author

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 5, 2024
@k8s-ci-robot k8s-ci-robot merged commit dfb763b into kubernetes:master Sep 5, 2024
14 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.32 milestone Sep 5, 2024
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
5 participants