-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Kubernetes 1.7.0 local storage teardown failed #48331
Comments
@CallMeFoxie There are no sig labels on this issue. Please add a sig label by: |
/sig storage |
Whoops the dirty fix to force umount does not work as it may fail "as it was already unmounted". Trying out a patch which reads /proc/mounts instead, it seems to be workign much better, can remove/add pods to those PV(C)s without any issue:
|
I experienced this issue while writing some e2e tests - #47999 These tests can be used to reproduce the issue on GCE. The run-one-command pod tests succeed (e.g. "when one pod requests one prebound PVC), while the pods that are created and deleted independently end up encountering this bug. For example, |
I am unsure if this fix fixes the GCE issues as well as I've never used GCE (or AWS for that matter) but it seems to fix it for self hosted kubernetes. Can make it into PR if it's a reasonable fix? Only unsure about the err at opening /proc/mounts - not sure how it behaves on GCE/AWS/within docker. |
@CallMeFoxie - Let's reuse listProcMounts(procMountsPath) to get all the info from /proc/mounts. Yes. Let's get a PR, and I'll test it on GCE. You want to submit the PR or should I? |
Here's the code I'm testing.
|
If I understand correctly, the issue is if the underlying directory is a just a dir and not a mount point, and you bind mount that, then IsLikelyNotMountPoint() will detect it as a directory instead of a mount. If that's the case, I'm curious why Ian's tests are only running into this problem in the case of two pods accessing the same volume. |
For the fix, I would prefer that a new method is used instead of IsLikelyNotMountPoint(). It is being used by other plugins and having to iterate through all the mounts on a node could impact performance. Also it is only the local storage plugin where the underyling directory may not be a mount point. |
@msau42 The tests are failing even if I run one pod after the other. I'll create a test that just uses one pod with create and destroy (instead of one-command pod) to see if I can repro the error. |
Ok, sounds like this is just a generic issue then with non-mount point local volumes then, which fits in with the root cause. |
Btw, I noticed in the common So we should definitely get a fix into the next 1.7 patch. cc @saad-ali |
Actually nevermind, the unmount calls |
/assign @ianchakeres |
…ssue Automatic merge from submit-queue Add local volume bug to known issues **What this PR does / why we need it**: Update known issues with local volume issue kubernetes#48331
Automatic merge from submit-queue Local storage teardown fix **What this PR does / why we need it**: Local storage uses bindmounts and the method IsLikelyNotMountPoint does not detect these as mountpoints. Therefore, local PVs are not properly unmounted when they are deleted. **Which issue this PR fixes**: fixes #48331 **Special notes for your reviewer**: You can use these e2e tests to reproduce the issue and validate the fix works appropriately #47999 The existing method IsLikelyNotMountPoint purposely does not check mountpoints reliability (https://github.com/kubernetes/kubernetes/blob/4c5b22d4c6b630ff1d76b1d15d74c6597c0aa037/pkg/util/mount/mount_linux.go#L161), since the number of mountpoints can be large. https://github.com/kubernetes/kubernetes/blob/4c5b22d4c6b630ff1d76b1d15d74c6597c0aa037/pkg/util/mount/mount.go#L46 This implementation changes the behavior for local storage to detect mountpoints reliably, and avoids changing the behavior for any other callers to a UnmountPath. **Release note**: ``` Fixes bind-mount teardown failure with non-mount point Local volumes (issue #48331). ```
Is there a way to know the release for which this fix is scheduled? Thanks |
Automatic merge from submit-queue (batch tested with PRs 50670, 50332) e2e test for local storage mount point **What this PR does / why we need it**: We discovered that kubernetes can treat local directories and actual mountpoints differently. For example, #48331. The current local storage e2e tests use directories. This PR introduces a test that creates a tmpfs and mounts it, and runs one of the local storage e2e tests. **Which issue this PR fixes**: fixes #49126 **Special notes for your reviewer**: I cherrypicked PR #50177, since local storage e2e tests are broken in master on 2017-08-08 due to "no such host" error. This PR replaces NodeExec with SSH commands. You can run the tests using the following commands: ``` $ NUM_NODES=1 KUBE_FEATURE_GATES="PersistentLocalVolumes=true" go run hack/e2e.go -- -v --up $ go run hack/e2e.go -- -v --test --test_args="--ginkgo.focus=\[Feature:LocalPersistentVolumes\]" ``` Here are the summary of results from my test run: ``` Ran 9 of 651 Specs in 387.905 seconds SUCCESS! -- 9 Passed | 0 Failed | 0 Pending | 642 Skipped PASS Ginkgo ran 1 suite in 6m29.369318483s Test Suite Passed 2017/08/08 11:54:01 util.go:133: Step './hack/ginkgo-e2e.sh --ginkgo.focus=\[Feature:LocalPersistentVolumes\]' finished in 6m32.077462612s ``` **Release note**: `NONE`
Automatic merge from submit-queue Local storage teardown fix **What this PR does / why we need it**: Local storage uses bindmounts and the method IsLikelyNotMountPoint does not detect these as mountpoints. Therefore, local PVs are not properly unmounted when they are deleted. **Which issue this PR fixes**: fixes #48331 **Special notes for your reviewer**: You can use these e2e tests to reproduce the issue and validate the fix works appropriately kubernetes/kubernetes#47999 The existing method IsLikelyNotMountPoint purposely does not check mountpoints reliability (https://github.com/kubernetes/kubernetes/blob/4c5b22d4c6b630ff1d76b1d15d74c6597c0aa037/pkg/util/mount/mount_linux.go#L161), since the number of mountpoints can be large. https://github.com/kubernetes/kubernetes/blob/4c5b22d4c6b630ff1d76b1d15d74c6597c0aa037/pkg/util/mount/mount.go#L46 This implementation changes the behavior for local storage to detect mountpoints reliably, and avoids changing the behavior for any other callers to a UnmountPath. **Release note**: ``` Fixes bind-mount teardown failure with non-mount point Local volumes (issue kubernetes/kubernetes#48331). ```
Automatic merge from submit-queue Local storage teardown fix **What this PR does / why we need it**: Local storage uses bindmounts and the method IsLikelyNotMountPoint does not detect these as mountpoints. Therefore, local PVs are not properly unmounted when they are deleted. **Which issue this PR fixes**: fixes #48331 **Special notes for your reviewer**: You can use these e2e tests to reproduce the issue and validate the fix works appropriately kubernetes/kubernetes#47999 The existing method IsLikelyNotMountPoint purposely does not check mountpoints reliability (https://github.com/kubernetes/kubernetes/blob/4c5b22d4c6b630ff1d76b1d15d74c6597c0aa037/pkg/util/mount/mount_linux.go#L161), since the number of mountpoints can be large. https://github.com/kubernetes/kubernetes/blob/4c5b22d4c6b630ff1d76b1d15d74c6597c0aa037/pkg/util/mount/mount.go#L46 This implementation changes the behavior for local storage to detect mountpoints reliably, and avoids changing the behavior for any other callers to a UnmountPath. **Release note**: ``` Fixes bind-mount teardown failure with non-mount point Local volumes (issue kubernetes/kubernetes#48331). ```
/kind bug
Tearing down (for example simple updating pods in stateful set) of local-storage PVC seems to be broken:
I've managed to track it down to
pkg/volume/local/local.go calling through some pkg/volume/util/util.go mounter.IsLikelyNotMountPoint(dir)
which literally has written in its comments:
which makes it detect it as a normal directory rather than a mount dir (as it is a normal directory... just bind mounted)
Environment:
kubectl version
): v1.7.0Testing out a (dirty?) patch as we're speaking
The text was updated successfully, but these errors were encountered: