Exposure the race condition on the pod preemption #94358
/release-note-none
@chendave: you cannot set the release note label to "release-note-none" because the PR has the label "kind/deprecation".
/remove-kind api-change
/release-note-none
/cc @Huang-Wei @ahg-g @alculquicondor @soulxu I will propose a fix later; the preemption could eventually converge if we skip the move request when the
@chendave: GitHub didn't allow me to request PR reviews from the following users: soulxu. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.
Force-pushed from 00ff4e8 to 7325fdf.
if q.podBackoffQ.Len() != 1 {
	t.Error("Expected 1 items to be in podBackoffQ")
}
// the lowPriority pod is popped and got scheduled while the highPriorityPod is stuck in the
This is by design, to avoid starvation caused by high-priority pods. Let's discuss in the original issue. However, we intend to reduce the impact of this with #94009.
The problem here is that the highPriorityPod should not be in backoffQ in the first place; it is moved to backoffQ only because a move request happened first. Popping the lowPriority pod here is fine, but there should be a way for the preemption to converge. Please see the code that follows:
// another pod is added to activeQ.
q.Add(&unschedulablePod)
if q.activeQ.Len() != 1 {
	t.Error("Expected 1 item to be in activeQ")
}
It's possible that a new pod is added to the queue while the highPriorityPod is still backing off. After the backoff time is up, the highPriorityPod needs to preempt the lowPriority pod again; if a move request is detected at that point, the highPriorityPod is put into backoff again.
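For readers following the thread, the requeue rule under discussion can be sketched as a toy model. This is only an illustration, not the scheduler's actual queue: `toyQueue`, its fields, and its methods are hypothetical names, and the one rule it encodes is the assumption stated above, namely that a pod whose scheduling attempt failed is sent to the backoff queue if a move request was seen at or after the cycle in which that pod was popped, and to the unschedulable queue otherwise.

```go
package main

import "fmt"

// toyQueue is a hypothetical, simplified stand-in for the scheduling queue.
// It tracks only the two counters needed to illustrate the discussion.
type toyQueue struct {
	schedulingCycle  int64 // incremented on every pop
	moveRequestCycle int64 // cycle at which the last move request was seen
	backoffQ         []string
	unschedulableQ   []string
}

// newToyQueue starts with no move request recorded (-1).
func newToyQueue() *toyQueue {
	return &toyQueue{moveRequestCycle: -1}
}

// pop simulates taking a pod for a scheduling attempt and returns the
// cycle in which that attempt started.
func (q *toyQueue) pop() int64 {
	q.schedulingCycle++
	return q.schedulingCycle
}

// moveRequest simulates a cluster event that asks unschedulable pods to be
// retried. It records the current scheduling cycle.
func (q *toyQueue) moveRequest() {
	q.moveRequestCycle = q.schedulingCycle
}

// addUnschedulable requeues a pod whose attempt (started at podCycle) failed
// and reports which queue it landed in.
func (q *toyQueue) addUnschedulable(pod string, podCycle int64) string {
	if q.moveRequestCycle >= podCycle {
		q.backoffQ = append(q.backoffQ, pod)
		return "backoffQ"
	}
	q.unschedulableQ = append(q.unschedulableQ, pod)
	return "unschedulableQ"
}

func main() {
	q := newToyQueue()

	// No move request during the attempt: the pod waits in unschedulableQ.
	cycle := q.pop()
	fmt.Println(q.addUnschedulable("otherPod", cycle))

	// A move request arrives while the attempt is in flight: the pod is
	// sent to backoffQ instead, which is the situation described above.
	cycle = q.pop()
	q.moveRequest()
	fmt.Println(q.addUnschedulable("highPriorityPod", cycle))
}
```

In this model, every move request that lands during the high-priority pod's attempt sends it back to backoff, which is why the thread asks whether preemption can converge.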
I prefer this to be split into a separate PR so we can already merge the tests using the deterministic clock.
- Fix a potential test flake.
- Add a new test case to expose the race condition on pod preemption.
- Take the chance to fix a typo.
Signed-off-by: Dave Chen <dave.chen@arm.com>
Force-pushed from 7325fdf to 12de87f.
/retest
}
// the lowPriority pod is popped and got scheduled while the highPriorityPod is stuck in the
// backoffQ.
q.Pop()
I'm still stuck on the point that the low-priority pod shouldn't get a chance to be scheduled successfully, since by the time the low-priority pod is being scheduled, the resources have already been taken by the high-priority pod.
Please think about the case where the high-priority pod is still in the unschedulableQ or backoffQ and the low-priority pod is the only one in the activeQ.
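The case described here can be sketched with another toy model. It rests on one assumption taken from this thread: a pop only considers pods in the active queue, so a pod waiting in the backoff queue is invisible to it regardless of priority. The names `toyPod`, `toyQueues`, and `popActive` are hypothetical and do not correspond to the scheduler's real types.

```go
package main

import (
	"fmt"
	"sort"
)

// toyPod is a hypothetical pod with just a name and a priority.
type toyPod struct {
	name     string
	priority int
}

// toyQueues is a hypothetical pair of queues; only activeQ is popped from.
type toyQueues struct {
	activeQ  []toyPod
	backoffQ []toyPod
}

// popActive removes and returns the highest-priority pod in activeQ.
// Pods in backoffQ are never considered. ok is false if activeQ is empty.
func (q *toyQueues) popActive() (pod toyPod, ok bool) {
	if len(q.activeQ) == 0 {
		return toyPod{}, false
	}
	sort.SliceStable(q.activeQ, func(i, j int) bool {
		return q.activeQ[i].priority > q.activeQ[j].priority
	})
	pod = q.activeQ[0]
	q.activeQ = q.activeQ[1:]
	return pod, true
}

func main() {
	q := &toyQueues{
		activeQ:  []toyPod{{name: "lowPriorityPod", priority: 1}},
		backoffQ: []toyPod{{name: "highPriorityPod", priority: 100}},
	}
	// The only candidate is the low-priority pod, so it is the one popped,
	// even though a higher-priority pod is waiting in backoffQ.
	if pod, ok := q.popActive(); ok {
		fmt.Println("popped:", pod.name)
	}
	fmt.Println("still backing off:", q.backoffQ[0].name)
}
```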
/hold
/close
@chendave: Closed this PR.
Signed-off-by: Dave Chen <dave.chen@arm.com>
What type of PR is this?
Add one of the following kinds:
/kind bug
What this PR does / why we need it:
Which issue(s) this PR fixes:
Ref #93505
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: