Running pods with devices are terminated if kubelet is restarted #118559
Closed
Description
What happened?
In the KubeVirt project, we now see a regression when running on Kubernetes 1.25.10, 1.26.5, and 1.27.2. If kubelet is restarted on a node, all existing running workloads that use devices are terminated with UnexpectedAdmissionError:
Warning UnexpectedAdmissionError 45s kubelet Allocate failed due to no healthy devices present; cannot allocate unhealthy devices devices.kubevirt.io/kvm, which is unexpected
Normal Killing 42s kubelet Stopping container compute
KubeVirt runs virtual machines inside pods and uses a device plugin to advertise devices such as /dev/kvm on the nodes.
Presumably, this PR changed the behavior: #116376
Original issue: #109595
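To make the suspected failure mode concrete, here is a toy model of it. This is not kubelet code: `ToyDeviceManager`, `register_plugin`, `restart`, and `admit` are hypothetical names. It also assumes that, after a restart, running pods go through admission again before the device plugin has re-registered its devices. Only the error text is taken from the event above.

```python
from dataclasses import dataclass, field


class UnexpectedAdmissionError(Exception):
    """Named after the event reason in this report; raised when admission fails."""


@dataclass
class ToyDeviceManager:
    # resource name -> set of device IDs a plugin currently reports as healthy
    healthy: dict = field(default_factory=dict)
    # (pod, resource) -> set of device IDs already assigned to that pod;
    # stands in for allocation state that survives a kubelet restart
    allocated: dict = field(default_factory=dict)

    def register_plugin(self, resource, devices):
        """A device plugin (re-)registers and reports its healthy devices."""
        self.healthy[resource] = set(devices)

    def restart(self):
        """Model a kubelet restart: allocations persist, health info is gone
        until the plugin registers again (assumption of this sketch)."""
        self.healthy.clear()

    def admit(self, pod, resource):
        """Admit a pod that requests one device of `resource`."""
        healthy = self.healthy.get(resource, set())
        # The check below runs even for pods that already hold a device,
        # which is the behavior this report describes as a regression.
        if not healthy:
            raise UnexpectedAdmissionError(
                "Allocate failed due to no healthy devices present; "
                f"cannot allocate unhealthy devices {resource}"
            )
        key = (pod, resource)
        if key not in self.allocated:
            used = {
                dev
                for (_, res), devs in self.allocated.items()
                if res == resource
                for dev in devs
            }
            free = sorted(healthy - used)
            if not free:
                raise UnexpectedAdmissionError(f"no free devices for {resource}")
            self.allocated[key] = {free[0]}
        return self.allocated[key]
```

In this model, a pod admitted before `restart()` fails re-admission right after it, and succeeds again once `register_plugin()` has been called. That matches the symptom reported here, although the real ordering inside kubelet may differ.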
What did you expect to happen?
A kubelet restart should not interrupt running workloads.
How can we reproduce it (as minimally and precisely as possible)?
With KubeVirt:
- run a KubeVirt VM
- pkill kubelet
- observe that the workload pod gets terminated
Or with https://github.com/k8stopologyawareschedwg/sample-device-plugin:
- make deploy
- make test-both
- pkill kubelet
- the pod gets restarted
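When checking either reproduction, the failure shows up as an event on the pod. The helper below is a rough sketch for spotting it in captured event lines. It assumes the whitespace-separated column layout of the two lines quoted above (type, reason, age, source, message) with a single-token age such as `45s`. Lines with a multi-token age would need extra handling. `PodEvent`, `parse_event_line`, and `admission_failures` are hypothetical names.

```python
from typing import List, NamedTuple, Optional


class PodEvent(NamedTuple):
    kind: str      # e.g. "Warning" or "Normal"
    reason: str    # e.g. "UnexpectedAdmissionError"
    age: str       # e.g. "45s" (assumed to be one token)
    source: str    # e.g. "kubelet"
    message: str   # remainder of the line


def parse_event_line(line: str) -> Optional[PodEvent]:
    """Split one event line into five columns; return None if it is too short."""
    parts = line.split(None, 4)
    if len(parts) < 5:
        return None
    return PodEvent(*parts)


def admission_failures(lines: List[str]) -> List[PodEvent]:
    """Return the events whose reason is UnexpectedAdmissionError."""
    events = (parse_event_line(line) for line in lines)
    return [e for e in events if e is not None and e.reason == "UnexpectedAdmissionError"]
```

Feeding it the two event lines from the description should flag the first line and ignore the `Killing` event.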
Anything else we need to know?
No response
Kubernetes version
This affects the 1.25.x, 1.26.x and 1.27.x branches (observed on 1.25.10, 1.26.5, and 1.27.2).
Cloud provider
N/A