
[1.3.2] Pod stuck in the ContainerCreating state while mounting secret timeout #29617

Closed
taraspos opened this issue Jul 26, 2016 · 7 comments


taraspos commented Jul 26, 2016

Hello, I'm running GKE version 1.3.2 and my cluster has stopped being able to mount Secrets as volumes.

I keep getting:

 timeout expired waiting for volumes to attach/mount for pod

After checking the instance logs in Stackdriver, I can see the following message:

Failed cleaning pods: [remove /var/lib/kubelet/pods/08433a42-5339-11e6-95d5-42010a84003a/volumes/kubernetes.io~secret/test-secret: device or resource busy, remove /var/lib/kubelet/pods/3fc204e9-5337-11e6-95d5-42010a84003a/volumes/kubernetes.io~secret/test-secret: device or resource busy, remove /var/lib/kubelet/pods/5320c80e-5332-11e6-95d5-42010a84003a/volumes/kubernetes.io~secret/test-secret: device or resource busy]
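The "device or resource busy" errors above suggest the secret's tmpfs volume is still mounted when the kubelet tries to remove its directory. A rough way to confirm that on the affected node (a sketch only, assuming SSH access to the node; the path is the one from the kubelet error above):

# list secret volumes still mounted on this node
mount | grep 'kubernetes.io~secret'

# check whether the directory the kubelet failed to remove is still a mount point
mount | grep '08433a42-5339-11e6-95d5-42010a84003a'

# if it is, unmounting it manually lets the cleanup proceed
sudo umount /var/lib/kubelet/pods/08433a42-5339-11e6-95d5-42010a84003a/volumes/kubernetes.io~secret/test-secret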

Secrets with other names work normally, but the secret named test-secret makes the pod hang:

NAME                                 READY     STATUS              RESTARTS   AGE       NODE
jnlp-slave-7b37c907f1541   0/1       ContainerCreating

and it is not being restarted either.

UPDATE:
Looks like the issue is related to #29059.
The default token keeps being remounted, my secret never gets mounted, and the pod then fails with the timeout.

I just downgraded the GKE nodes to version 1.2.5 and everything seems OK for now.

Please take a look at this bug since it's pretty critical: pod creation hangs randomly due to the inability to mount the volume.
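The report does not include the manifests involved, but a minimal example of the kind of setup being described (a secret mounted into a pod as a volume) might look like the sketch below. The secret contents, pod name, and image are placeholders; only the secret name test-secret is taken from the logs above.

# create the secret (contents here are placeholders)
kubectl create secret generic test-secret --from-literal=key=value

# pod that mounts the secret as a volume; this is the mount that times out
kubectl create -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: secret-mount-demo
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: secret-vol
      mountPath: /etc/secret
      readOnly: true
  volumes:
  - name: secret-vol
    secret:
      secretName: test-secret
EOF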

@taraspos taraspos changed the title GKE unable to mount secrets [1.3.2] Failed to mount secret due to timeout. Jul 27, 2016

taraspos commented Aug 1, 2016

Still getting pods randomly stuck while mounting secrets on GKE 1.3.3.

@taraspos taraspos changed the title [1.3.2] Failed to mount secret due to timeout. [1.3.2] Pod stuck in the ContainerCreating state while mounting secret timeout Aug 1, 2016

saad-ali commented Aug 1, 2016

@trane9991 Based on what you provided, you hit an issue where, if a secret fails to unmount, it cannot be mounted into any other pod. It was fixed by #28939.

The fix is in v1.3.4, which should be in GKE soon.
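A quick way to check whether a cluster's nodes have picked up a release containing the fix (a sketch, assuming kubectl access to the cluster):

# server version reported by the master
kubectl version

# kubelet version reported by each node
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.kubeletVersion}{"\n"}{end}'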


saad-ali commented Aug 1, 2016

Closing this issue. Please re-open if you run into it in v1.3.4+

@saad-ali saad-ali closed this as completed Aug 1, 2016
@boydgreenfield

I'm experiencing this exact issue on 1.3.4 (albeit on AWS running Debian jessie, not GKE).

Has anyone successfully found a workaround that doesn't involve patching k8s itself?

We get random pods stuck in ContainerCreating (5-10% or so), which quite badly breaks our use case (spinning up containers on user request). We were previously on ~1.2.3 with no problems. I did downgrade to ~1.2.6 but hit the same stalled container creation hang.

(Sorry for lack of precise version numbers, on a phone)


saad-ali commented Aug 3, 2016

@boydgreenfield Based on what you've said, you're experiencing this issue with both v1.2 and v1.3, which means it is likely not the issue described above. If you can provide more information I can help you debug: What version of master/node are you running? How are you deploying your pods? For the pods stuck in ContainerCreating, could you grab the /var/log/kubelet.log files from those nodes during the incident and the /var/log/kube-controller-manager.log file from the master?
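A rough sequence for collecting what's requested above (a sketch, assuming SSH access to the nodes and master; the pod and host names are placeholders):

# find which node the stuck pod is scheduled on
kubectl get pod <stuck-pod-name> -o wide

# kubelet log from that node, captured during the incident
scp <node-host>:/var/log/kubelet.log kubelet-<node-host>.log

# controller-manager log from the master
scp <master-host>:/var/log/kube-controller-manager.log kube-controller-manager.log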

@boydgreenfield

@saad-ali – I was on a mobile device earlier, but will respond with logs for 1.3 in the morning Pacific. Apologies for the less-than-complete error message this evening.


saad-ali commented Aug 3, 2016

@boydgreenfield No problem! And I'd love to see the 1.2 logs as well if you're experiencing the issue on both and still have access to them.
