Kubelet: Cleanup returns early when there is a busy orphaned pod directory. #29078
Description
Today, I found that the mirror pod node e2e test could never pass on my desktop. When the static pod file is removed, the mirror pod on the apiserver is expected to be removed as well, but that didn't happen.
It turns out that on my machine there is an orphaned pod directory, possibly left over from a previous run:
$ ls /var/lib/kubelet/pods
3feec45e-4bc0-11e6-bea0-8cdcd43ac064
$ sudo ls /var/lib/kubelet/pods/3feec45e-4bc0-11e6-bea0-8cdcd43ac064/volumes
kubernetes.io~empty-dir
The pod directory was never successfully deleted:
Failed to remove orphaned pod "3feec45e-4bc0-11e6-bea0-8cdcd43ac064" dir; err: remove /var/lib/kubelet/pods/3feec45e-4bc0-11e6-bea0-8cdcd43ac064/volumes/kubernetes.io~empty-dir/restart-count: device or resource busy
The output of mount:
$ mount
...
tmpfs on /var/lib/kubelet/pods/3feec45e-4bc0-11e6-bea0-8cdcd43ac064/volumes/kubernetes.io~empty-dir/restart-count type tmpfs (rw)
The output of fuser:
$ sudo fuser -v /var/lib/kubelet/pods/3feec45e-4bc0-11e6-bea0-8cdcd43ac064/volumes/kubernetes.io~empty-dir/restart-count
USER PID ACCESS COMMAND
/var/lib/kubelet/pods/3feec45e-4bc0-11e6-bea0-8cdcd43ac064/volumes/kubernetes.io~empty-dir/restart-count:
root kernel mount /var/lib/kubelet/pods/3feec45e-4bc0-11e6-bea0-8cdcd43ac064/volumes/kubernetes.io~empty-dir/restart-count
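For illustration, here is a minimal Go sketch of how a cleanup routine could detect mounts that are still present under an orphaned pod directory before trying to remove it. The helper name mountsUnder and the sample table are hypothetical and not part of the kubelet; the sketch assumes input in the /proc/mounts format and ignores octal escapes such as \040 in mount paths.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// mountsUnder returns the mount points in mountTable that are dir itself or
// lie beneath it. mountTable is expected in /proc/mounts format:
// "device mountpoint fstype options dump pass", one entry per line.
// This is a hypothetical helper, not a kubelet function.
func mountsUnder(mountTable, dir string) []string {
	dir = strings.TrimSuffix(dir, "/")
	var found []string
	sc := bufio.NewScanner(strings.NewReader(mountTable))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 {
			continue // skip blank or malformed lines
		}
		mp := fields[1]
		// Match dir exactly or as a path prefix, so that "/pods/abc"
		// does not match "/pods/abcd".
		if mp == dir || strings.HasPrefix(mp, dir+"/") {
			found = append(found, mp)
		}
	}
	return found
}

func main() {
	// Sample table; on Linux the real data would come from /proc/mounts.
	table := "proc /proc proc rw 0 0\n" +
		"tmpfs /var/lib/kubelet/pods/3feec45e-4bc0-11e6-bea0-8cdcd43ac064/volumes/kubernetes.io~empty-dir/restart-count tmpfs rw 0 0\n"
	podDir := "/var/lib/kubelet/pods/3feec45e-4bc0-11e6-bea0-8cdcd43ac064"
	for _, mp := range mountsUnder(table, podDir) {
		fmt.Println("still mounted:", mp)
	}
}
```

A caller could skip or unmount the reported paths instead of failing on "device or resource busy".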
After the deletion fails, the kubelet cleanup function returns immediately: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet.go#L2077.
Because this error is permanent, the mirror pod cleanup code that follows it never runs: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet.go#L2082.
- In this case, kubelet should continue with the rest of the cleanup process.
- Pod directory cleanup should handle this kind of orphaned pod directory.
- Why are busy volumes left behind and never cleaned up?
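The first point above can be sketched in Go as follows. This is only an illustration of the "continue after a failed step" pattern, not the actual kubelet code: the cleanupStep type, runAllCleanupSteps, errBusy, and the step names are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errBusy simulates the permanent removal failure from this report.
var errBusy = errors.New("device or resource busy")

// cleanupStep is a hypothetical named cleanup action; it is not a kubelet type.
type cleanupStep struct {
	name string
	run  func() error
}

// runAllCleanupSteps runs every step even if an earlier one fails, and
// returns the failures combined into one error (nil if all succeeded).
func runAllCleanupSteps(steps []cleanupStep) error {
	var msgs []string
	for _, s := range steps {
		if err := s.run(); err != nil {
			// Record the failure and keep going instead of returning early.
			msgs = append(msgs, fmt.Sprintf("%s: %v", s.name, err))
		}
	}
	if len(msgs) == 0 {
		return nil
	}
	return errors.New(strings.Join(msgs, "; "))
}

func main() {
	mirrorPodsCleaned := false
	steps := []cleanupStep{
		{"remove orphaned pod dirs", func() error { return errBusy }},
		{"delete orphaned mirror pods", func() error {
			mirrorPodsCleaned = true
			return nil
		}},
	}
	err := runAllCleanupSteps(steps)
	fmt.Println("mirror pods cleaned:", mirrorPodsCleaned)
	fmt.Println("error:", err)
}
```

With this structure the pod directory failure is still reported, but it no longer blocks the mirror pod cleanup.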
@yujuhong @saad-ali
/cc @kubernetes/sig-node @kubernetes/sig-storage