kube-scheduler updates pod status mistakenly during preemption #126643

Closed
@xiazhan

Description

What happened?

When a pod whose qosClass is "BestEffort" is preempted by a higher-priority pod whose qosClass is "Burstable", the victim pod's qosClass is updated to "Burstable", because the scheduler updates the victim's status with content from the higher-priority pod before deleting the victim pod. The pod then gets stuck in the terminating state after deletion because of this check. Supposedly, kubelet would reconcile the status back very soon. However, the deletion request stops kubelet from doing that.
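The mix-up described above can be modeled in a few lines of Go. This is only an illustrative sketch, not the scheduler's actual code: `Pod`, `PodStatus`, `withCondition`, `buggyVictimStatus`, and `expectedVictimStatus` are hypothetical names, and the sketch assumes the status write exists to attach a condition such as `DisruptionTarget` to the victim.

```go
package main

import "fmt"

// PodStatus is a hypothetical, heavily reduced stand-in for a pod's status.
type PodStatus struct {
	QOSClass   string
	Conditions []string
}

// Pod is a hypothetical stand-in holding only what this sketch needs.
type Pod struct {
	Name   string
	Status PodStatus
}

// withCondition returns a copy of base with cond appended to its conditions.
func withCondition(base PodStatus, cond string) PodStatus {
	out := PodStatus{QOSClass: base.QOSClass}
	out.Conditions = append(out.Conditions, base.Conditions...)
	out.Conditions = append(out.Conditions, cond)
	return out
}

// buggyVictimStatus models the reported behavior: the status written to the
// victim is built from the preemptor's status, so the preemptor's qosClass
// leaks into the victim.
func buggyVictimStatus(preemptor, _ Pod) PodStatus {
	return withCondition(preemptor.Status, "DisruptionTarget")
}

// expectedVictimStatus models the expected behavior: the status written to
// the victim is built from the victim's own status.
func expectedVictimStatus(_, victim Pod) PodStatus {
	return withCondition(victim.Status, "DisruptionTarget")
}

func main() {
	preemptor := Pod{Name: "high-priority", Status: PodStatus{QOSClass: "Burstable"}}
	victim := Pod{Name: "victim", Status: PodStatus{QOSClass: "BestEffort"}}

	fmt.Println("buggy:", buggyVictimStatus(preemptor, victim).QOSClass)       // buggy: Burstable
	fmt.Println("expected:", expectedVictimStatus(preemptor, victim).QOSClass) // expected: BestEffort
}
```

In this model the only difference between the two paths is which pod's status serves as the base for the update, which matches the symptom that the victim ends up with the preemptor's qosClass.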

What did you expect to happen?

The victim pod's qosClass shouldn't be changed and the pod should be deleted successfully.

How can we reproduce it (as minimally and precisely as possible)?

Reproduced on 1.29.4.

Reproduce steps:
Scale the node count to 1.

  • Create the victim pod (here I use a cronjob to create a pod with empty resources).
  • Create a higher priority class (p1) with the "Preempt" policy.
  • Create extra pods (using a deployment) with priorityClassName set to "p1" so that the pod count exceeds the node's maximum pod number. These pods also have cpu/memory requests set.
  • The victim pod will be preempted, then stuck in the "terminating" state.
    cronjob.txt
    deployment.txt
    pc.txt
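To see why the two kinds of pods in the steps above land in different QoS classes, here is a simplified Go sketch of the classification rules: no cpu/memory requests or limits gives "BestEffort", limits on every container with matching requests gives "Guaranteed", and anything else gives "Burstable". The `Resources` type and `qosClass` function are hypothetical, quantities are compared as plain strings (so "500m" and "0.5" would not match here), and the request values are made up because the attached manifests are not reproduced in this text.

```go
package main

import "fmt"

// Resources is a hypothetical stand-in for one container's resource settings.
type Resources struct {
	Requests map[string]string
	Limits   map[string]string
}

// qosClass applies a simplified version of the pod QoS rules, looking only
// at cpu and memory.
func qosClass(containers []Resources) string {
	anySet := false
	guaranteed := true
	for _, c := range containers {
		for _, name := range []string{"cpu", "memory"} {
			req, hasReq := c.Requests[name]
			lim, hasLim := c.Limits[name]
			if hasReq || hasLim {
				anySet = true
			}
			// Guaranteed needs a limit for each resource, and any explicit
			// request must equal that limit.
			if !hasLim || (hasReq && req != lim) {
				guaranteed = false
			}
		}
	}
	switch {
	case !anySet:
		return "BestEffort"
	case guaranteed:
		return "Guaranteed"
	default:
		return "Burstable"
	}
}

func main() {
	// Victim: a single container with empty resources, as in the cronjob step.
	victim := []Resources{{}}
	// Preemptor: cpu/memory requests set, no limits (values are hypothetical).
	preemptor := []Resources{{
		Requests: map[string]string{"cpu": "100m", "memory": "64Mi"},
	}}

	fmt.Println("victim:", qosClass(victim))       // victim: BestEffort
	fmt.Println("preemptor:", qosClass(preemptor)) // preemptor: Burstable
}
```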

Anything else we need to know?

No response

Kubernetes version

$ kubectl version
Client Version: v1.29.4
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.4

Cloud provider

Azure

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

Labels

  • kind/bug: Categorizes issue or PR as related to a bug.
  • needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
  • sig/scheduling: Categorizes an issue or PR as relevant to SIG Scheduling.
