kubelet: storage: don't hang kubelet on unresponsive nfs #35038
Conversation
@k8s-bot gci gke e2e test this
@sjenning -- per our chat, a comment that describes the problem on this PR with an actual example set of directories would be helpful for simpletons like myself to follow the core problem.
@k8s-bot gci gke e2e test this
@derekwaynecarr I updated the first comment with a more detailed example of how this happens IRL.
@k8s-bot gci gke e2e test this
LGTM
fyi @kubernetes/sig-node
any chance of this fix being cherry-picked?
@k8s-bot gci gke e2e test this
Removing label
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]
Jenkins GCI GKE smoke e2e failed for commit da3683e. Full PR test history. The magic incantation to run this job again is @k8s-bot gci gke e2e test this.
Automatic merge from submit-queue |
Thanks for digging into this and getting a workaround out for 1.5 @sjenning! Let's get this cherry-picked back to 1.4. |
@sjenning, during some testing, I noticed that your fix still won't solve the problem of the kubelet hanging on unresponsive nfs. When a node has some directories mounted to exports from an nfs server running in a container, the `ReadDirNoStat()` function will hang too after the nfs server container is deleted. Could you please check again and see whether you get a different result? Thank you! cc @kubernetes/sig-storage
Fixes #31272
Currently, due to the nature of nfs, an unresponsive nfs volume in a pod can wedge the kubelet such that additional pods cannot be run.
The discussion thus far surrounding this issue was to wrap the `lstat` call, the syscall that ends up hanging in uninterruptible sleep, in a goroutine and to limit the number of goroutines that hang to one per pod per volume.

However, in my investigation, I found that the callsites that request a listing of the volumes from a particular volume plugin directory don't care about the properties provided by the `lstat` call. They only care about whether or not a directory exists.

Given that constraint, this PR just avoids the `lstat` call by using `Readdirnames()` instead of `ReadDir()` or `ReadDirNoExit()`.
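As a minimal sketch of the approach (the helper name and package are illustrative, not the merged kubelet code), assuming Go's `os.File.Readdirnames`:

```go
// Sketch: list entry names without stat-ing the entries. Unlike ReadDir,
// Readdirnames never calls syscall.Lstat on each entry, so a wedged nfs
// mount point among the entries cannot hang the caller. Note that dirPath
// itself must live on responsive storage (e.g. the kubelet's local disk).
package volumeutil

import "os"

// readDirNames returns the names of the entries in dirPath.
func readDirNames(dirPath string) ([]string, error) {
	f, err := os.Open(dirPath)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	return f.Readdirnames(-1) // -1 reads all remaining entry names
}
```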
More detail for reviewers
Consider the pod mounted nfs volume at `/var/lib/kubelet/pods/881341b5-9551-11e6-af4c-fa163e815edd/volumes/kubernetes.io~nfs/myvol`. The kubelet wedges because when we do a `ReadDir()` or `ReadDirNoExit()`, it calls `syscall.Lstat` on `myvol`, which requires communication with the nfs server. If the nfs server is unreachable, this call hangs forever.

However, for our code, we only care about the names of the files/directories contained in the `kubernetes.io~nfs` directory, not any of the more detailed information the `Lstat` call provides. Getting the names can be done with `Readdirnames()`, which doesn't need to involve the nfs server.

@pmorie @eparis @ncdc @derekwaynecarr @saad-ali @thockin @vishh @kubernetes/rh-cluster-infra
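To make that concrete, here is a hedged usage sketch against the example path above (the `main` wrapper is illustrative):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	// The nfs plugin directory from the example above. Only this parent
	// directory, which lives on local disk, is opened and read; the myvol
	// mount point itself is never touched.
	dir := filepath.Join("/var/lib/kubelet/pods",
		"881341b5-9551-11e6-af4c-fa163e815edd",
		"volumes", "kubernetes.io~nfs")

	f, err := os.Open(dir)
	if err != nil {
		fmt.Println(err)
		return
	}
	defer f.Close()

	// Returns []string{"myvol"} without issuing syscall.Lstat on myvol,
	// so an unreachable nfs server cannot wedge the caller.
	names, err := f.Readdirnames(-1)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(names)
}
```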