Fix GCI mounter issue #38124
Conversation
Remove break in the mounter script to make sure gc runs multiple times
// Find all references to the device.
var refs []string
if deviceName == "" {
	glog.Warningf("could not determine device for path: %q", mountPath)
} else {
	for i := range mps {
		if mps[i].Device == deviceName && mps[i].Path != slTarget {
			refs = append(refs, mps[i].Path)
			mpPrefix := getPodsDirPrefix(mps[i].Path, options.DefaultKubeletPodsDirName)
			if mpPrefix == podsDirPrefix {
My worry with this is that it may skip checks on non-rkt mounts as well. Let's see if we can minimize this.
Changed to filter out the mount refs based on the string "gci-mounter" in the path.
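For context, a minimal sketch of that substring-based filter; filterMounterRefs is an illustrative name, not the PR's actual helper:

package mount

import "strings"

// filterMounterRefs drops references that were propagated into the
// gci-mounter rkt container, identified by the marker in their path.
func filterMounterRefs(refs []string) []string {
	var filtered []string
	for _, ref := range refs {
		if strings.Contains(ref, "gci-mounter") {
			continue // mount propagated into the mounter container
		}
		filtered = append(filtered, ref)
	}
	return filtered
}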
Jenkins verification failed for commit 6cb0ae2. Full PR test history.
Jenkins GKE smoke e2e failed for commit a22f52e. Full PR test history.
@@ -130,14 +130,24 @@ func GetMountRefs(mounter Interface, mountPath string) ([]string, error) {
	}
}

// TODO: this is a workaround for the unmount device issue caused by gci mounter.
This is a total hack. gci_mounter is a GKE-specific deployment detail. We should not be leaking that logic into upstream.
We discussed a few options, but they all seem hacky...
@@ -31,7 +31,7 @@ function gc {
	# Rkt pods end up creating new copies of mounts on the host. Hence it is ideal to clean them up right away.
	attempt=0
	until [ $attempt -ge 5 ]; do
-		${RKT_BINARY} gc --grace-period=0s &> /dev/null && break
Why are you making this change?
Otherwise, it only runs gc once.
Why is that a problem?
When I tested it, sometimes the container was still in a running state at the moment gc was executed. Giving it a few tries provides more chances to garbage collect the containers. I also thought it was intended to execute 5 times.
It was only intended to run multiple times if any of the previous attempts fail. Blocking for 5 seconds doesn't sound like a good idea. Is it possible to detect incomplete gc and then retry?
Also, why is GC of concern?
If the container is not garbage collected, it leaves behind extra mounts, which causes two problems:
- The mount table becomes very large when there are many mount points, which might be a performance concern since we list the mount points during unmount operations.
- Without this PR's fix, unmount device will fail because of those unrelated mount references: the unmount device function first checks whether there are other mount references for the mount path and fails the operation if any exist (see the sketch below).
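For illustration only, a minimal sketch of the kind of reference check described in the second point; canUnmountDevice and the import path are assumptions, not the actual operation_executor code, though GetMountRefs has the signature shown in the diff above.

package volume

import (
	"fmt"

	"k8s.io/kubernetes/pkg/util/mount"
)

// canUnmountDevice mirrors the guard described above: unmounting the device
// is refused while other mount references to it remain. Mounts propagated
// into the mounter container show up as extra references and trip this check.
func canUnmountDevice(mounter mount.Interface, deviceMountPath string) error {
	refs, err := mount.GetMountRefs(mounter, deviceMountPath)
	if err != nil {
		return err
	}
	if len(refs) > 0 {
		return fmt.Errorf("device at %s is still referenced by %v", deviceMountPath, refs)
	}
	return nil
}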
(1) might be of concern. Do we have any data to back up that concern?
I don't get (2). Why is umount failing if other shared recursive mounts exist? Is the umount syscall failing or is it our own custom logic?
It is our own logic that checks the mount references.
Can't we fix that logic then? As long as there is no separate mount point with the same device (bind mount), we can safely unmount. Check the mount reference number in /proc/self/mountinfo to make that decision.
If a mount appears in two locations via a shared mount, then /proc/self/mountinfo will explicitly indicate that. Here is an example.
$ cat /proc/self/mountinfo
...
90 80 0:66 / /tmp/original/tmpfs rw,relatime shared:2 - tmpfs tmpfs rw
103 101 0:66 / /tmp/shared/tmpfs rw,relatime shared:2 - tmpfs tmpfs rw
Look at the mount propagation type followed by a propagation number. If the number is the same, then that mount can be ignored. A umount from any one of those paths will result in the mount disappearing from both locations.
$ sudo umount /tmp/original/tmpfs
vishnuk@vishnuk-linux:~$ cat /proc/self/mountinfo
...
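A rough sketch of the peer-group check suggested here, assuming the mountinfo field layout from proc(5); the helper peerGroup is hypothetical:

package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// peerGroup returns the shared peer-group tag (e.g. "shared:2") for the given
// mount point, or "" if the mount is not shared. Optional fields sit between
// the mount point (field 5) and the "-" separator in each mountinfo line.
func peerGroup(mountPoint string) (string, error) {
	f, err := os.Open("/proc/self/mountinfo")
	if err != nil {
		return "", err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) < 7 || fields[4] != mountPoint {
			continue
		}
		for _, opt := range fields[6:] { // optional fields end at "-"
			if opt == "-" {
				break
			}
			if strings.HasPrefix(opt, "shared:") {
				return opt, nil
			}
		}
	}
	return "", scanner.Err()
}

func main() {
	// Two paths reporting the same tag are the same mount seen twice.
	for _, p := range []string{"/tmp/original/tmpfs", "/tmp/shared/tmpfs"} {
		tag, err := peerGroup(p)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("%s -> %s\n", p, tag)
	}
}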
The current solution used here checks the mount path against the shared mount path created by the rkt container. If the shared mount path contains the original mount path, it will be filtered out during GetMountRefs. See comments #38124 (comment)
@k8s-bot cvm gke e2e test this
Please finalize this PR ASAP; the cutoff for 1.5 is Tuesday.
I'm not comfortable with this PR because of #38124 (comment)
I am wondering whether it is enough to only check the propagation type and its value. Isn't it possible that some other mount points also have the same value? By checking the mount path strings (see the example below), the shared mount path should contain the full original mount path. So it looks like checking the mount paths together with the peer group X value (shared:X) might be good enough to filter out the propagated mount refs.
351 314 8:16 / /var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/nfs-disk rw,relatime shared:133 - ext4 /dev/sdb rw,data=ordered
353 28 8:16 / /var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/nfs-disk rw,relatime shared:133 - ext4 /dev/sdb rw,data=ordered
352 911 8:16 / /var/lib/rkt/pods/run/3496eadc-74d9-4349-9370-db7636f5f963/stage1/rootfs/opt/stage2/gci-mounter/rootfs/var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/nfs-disk rw,relatime shared:133 - ext4 /dev/sdb rw,data=ordered
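A minimal sketch of the combined check proposed above; isPropagatedRef is a hypothetical helper, and the peer-group tags are assumed to come from parsing /proc/self/mountinfo as shown earlier:

package mount

import "strings"

// isPropagatedRef reports whether ref looks like a shared-mount copy of
// origPath: same "shared:X" peer group and a path that contains the full
// original mount path, per the heuristic described above.
func isPropagatedRef(ref, refPeerGroup, origPath, origPeerGroup string) bool {
	if refPeerGroup == "" || refPeerGroup != origPeerGroup {
		return false
	}
	return ref != origPath && strings.Contains(ref, origPath)
}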
Updated the PR. PTAL.
Jenkins unit/integration failed for commit 887cd40. Full PR test history.
This is a workaround for the unmount device issue caused by the gci mounter. In a GCI cluster, if the gci mounter is used for mounting, the container started by the mounter script causes additional mounts to be created in the container. Since these mounts are irrelevant to the original mounts, they should not be considered when checking the mount references. By comparing the mount path prefix, those additional mounts can be filtered out. We plan to work on a better approach to solve this issue.
@vishh Without this change, when anyone uses GlusterFS via the mounter, all other mounts will fail to unmount. That's pretty bad. We need this fixed for the 1.5 release ASAP. This is hacky and I'd like to have a cleaner solution, but it gets the job done. @jingxu97 please open an issue to revisit and fix this code with a more robust solution.
/lgtm
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]
Automatic merge from submit-queue
I hope by merging this in a hurry, we did not introduce new bugs :)
+1 We'll let it bake overnight in master. If the world hasn't burned down, we'll cherry pick this in the AM (Wed morning PST).
It would help if Jing could add more comments to the PR explaining the logic she introduced, using some examples.
Commit found in the "release-1.5" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error, find help to get your PR picked.
Automatic merge from submit-queue
Fix unmountDevice issue caused by shared mount in GCI
This is a fix on top of #38124. In this fix, we move the logic that filters out shared mount references into operation_executor's UnmountDevice function, to avoid this code path being used by other volume types such as rbd, azure, etc. This filter function should only be needed during unmount device for the GCI image.
…24-upstream-release-1.4 Automated cherry pick of #38124
Two fixes in this PR:
Remove break in the mounter script to make sure gc runs multiple times
To make sure existing containers created by the mounter script get garbage collected, we make rkt gc run multiple times.
Fix unmount issue caused by GlusterFS mounter
When a GlusterFS volume is mounted on a GCI image cluster, the GCI mounter script creates a container and then executes mount inside it. Because mounting GlusterFS runs a glusterfs daemon, the container created by the mounter stays in the running state and rkt garbage collection will not collect it. Because of the shared mount propagation setting, any mounts on the host are reflected into the container, which can lead to a very large mount table. Also, in the current unmount device logic, we check the mount references before issuing the operations. In this case, unmount device fails because of the extra mount points in the container. The following gives an example of the original mount path and the shared mount point created in the container.
a. original mount
/var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/nfs-disk rw,relatime shared:133 - ext4 /dev/sdb rw,data=ordered
b. shared mount in container
/var/lib/rkt/pods/run/3496eadc-74d9-4349-9370-db7636f5f963/stage1/rootfs/opt/stage2/gci-mounter/rootfs/var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/nfs-disk rw,relatime shared:133 - ext4 /dev/sdb rw,data=ordered
The current workaround for this issue: since these shared mounts are irrelevant to the original mounts, they should not be considered when checking the mount references. By comparing the mount paths, the shared mount path contains the full original mount path, so those entries can be filtered out (see the sketch below).
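As a rough illustration of this workaround (the helper name filterPropagatedRefs is made up; the real change lives in GetMountRefs and UnmountDevice), a reference is treated as a propagated copy when its path ends with the full original mount path:

package mount

import "strings"

// filterPropagatedRefs drops mount references that are shared-mount copies of
// origPath created inside the mounter container. Per the example above, such
// a copy's path is the container rootfs prefix plus the full original mount
// path, so it ends with origPath.
func filterPropagatedRefs(refs []string, origPath string) []string {
	var out []string
	for _, ref := range refs {
		if ref != origPath && strings.HasSuffix(ref, origPath) {
			continue // propagated copy; ignore when counting references
		}
		out = append(out, ref)
	}
	return out
}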