fix 111933 #112012
Conversation
@cvvz: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Hi @cvvz. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/release-note-none
pkg/volume/csi/csi_mounter.go
Outdated
```go
if err := removeMountDir(c.plugin, dir); err != nil {
	return errors.New(log("Unmounter.TearDownAt failed to clean mount dir [%s]: %v", dir, err))
}

volID := c.volumeID
```
This does not seem correct: if `NodeUnpublish` fails, then subsequent calls to `NodeUnpublish` won't even start, because the directory and json files have already been removed.
I do not think fixing it here is the right solution; we need to think about it from the perspective of how cleanup should work if drivers are not yet available or the volume is not yet mounted. IMO the fix has to be made in the reconstruction code, not here.
Actually, the csi mount directory will be removed anyway by the housekeeping handler in `removeOrphanedPodVolumeDirs`; this function will even remove the entire pod volume dir, which is /var/lib/kubelet/pods/<POD_UID>/volumes. The reason the directory was not deleted successfully and was left over is that there is a file called `vol_data.json` in it, and this function can only delete directories; when any file exists, it returns an error. So here I put `removeMountDir` before the call out to the csi driver to make sure `vol_data.json` is cleaned up no matter whether `NodeUnpublish` succeeds or not; this way, the orphaned pod volume will eventually be cleaned up by the housekeeping handler.
Anyway, the goal is to clean up the orphaned Pod volume; if we can achieve that in this simple way, there is no need to make things complicated.
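A minimal, standard-library-only sketch of the filesystem behavior described in this comment (it is not kubelet code): a directory-only removal fails while a leftover vol_data.json is still present, and succeeds once the file is gone.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	dir, err := os.MkdirTemp("", "podvol")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	// Simulate the leftover metadata file inside the pod volume dir.
	if err := os.WriteFile(filepath.Join(dir, "vol_data.json"), []byte("{}"), 0o644); err != nil {
		panic(err)
	}

	// A directory-only removal (rmdir semantics) fails while the file exists.
	if err := os.Remove(dir); err != nil {
		fmt.Println("cannot remove non-empty dir:", err)
	}

	// Once vol_data.json is cleaned up, the directory itself can be removed.
	_ = os.Remove(filepath.Join(dir, "vol_data.json"))
	if err := os.Remove(dir); err == nil {
		fmt.Println("dir removed after vol_data.json was cleaned up")
	}
}
```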
I agree with @gnufied, this doesn't seem like the right place to fix it.
What happens when the directory is still mounted? Removal of the directory would fail then, but with this change it means `TearDownAt` would return an error before calling `NodeUnpublishVolume` to unmount the volume, and it would never get unmounted. So I think this breaks the main use case (unmount the volume and then remove the mountpoint).
Actually, when the directory is still mounted, `removeMountDir` won't return an error, and the volume is still unmounted by the csi driver first. Every cleanup action in kubelet should consider the mount state in the first place to ensure safety.
Considering that the csi driver is responsible for unmounting and cleaning the mount directory, which is also mentioned in the comment, maybe we should only remove `vol_data.json` here, which should be done by kubelet, rather than remove the mount dir.
pkg/kubelet/kubelet_volumes.go
Outdated
```diff
 	podVolDir := kl.getPodVolumesDir(uid)
-	if err := removeall.RemoveDirsOneFilesystem(kl.mounter, podVolDir); err != nil {
+	if err := removeall.RemoveAllOneFilesystemCommon(kl.mounter, podVolDir, func(path string) error {
```
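A hypothetical, standard-library-only sketch of the "remove with a per-file handler" idea the diff above is going for; it is not the removeall package API, and removeAllWithHandler is an illustrative name. The handler decides what to do with regular files (such as vol_data.json) that a directory-only removal would otherwise fail on.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// removeAllWithHandler removes a directory tree, but delegates every regular
// file it meets to the handler instead of failing outright.
func removeAllWithHandler(root string, handler func(path string) error) error {
	entries, err := os.ReadDir(root)
	if err != nil {
		return err
	}
	for _, e := range entries {
		p := filepath.Join(root, e.Name())
		if e.IsDir() {
			if err := removeAllWithHandler(p, handler); err != nil {
				return err
			}
			continue
		}
		if err := handler(p); err != nil {
			return err
		}
	}
	return os.Remove(root)
}

func main() {
	dir, _ := os.MkdirTemp("", "podvol")
	_ = os.WriteFile(filepath.Join(dir, "vol_data.json"), []byte("{}"), 0o644)

	err := removeAllWithHandler(dir, func(path string) error {
		if filepath.Base(path) == "vol_data.json" {
			// kubelet-owned metadata: allow it to be deleted here.
			return os.Remove(path)
		}
		return fmt.Errorf("unexpected file %s", path)
	})
	fmt.Println("removed:", err == nil)
}
```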
This change actually re-introduces a rather serious race condition where `vol_data.json` may be deleted by `removeOrphanedPodVolumeDirs` before `NodeUnstageVolume` has a chance to run, which causes the unmount to fail, and kubelet ends up not being able to clean the mountpoints on the node: #101911
So this is not the right place to remove `vol_data.json`; it will result in unmounts failing in some scenarios.
You may want to take a look at the volume reconstruction code in volumemanager. This is a tough bug in a complex part of the code, and the fix is not obvious to me, but when I saw it before, `syncStates` failed here because the /mount subdir did not exist. It then calls `cleanupMounts`, but that failed because the volume failed to be reconstructed and it couldn't create the unmounter at that point. If you're able to reproduce the problem, you can check whether you're seeing the same failure scenario in kubelet.log.
IMO, we either need to be able to reconstruct the volume to unmount it in this scenario, or at least find some way to create the unmounter so that `cleanupMounts` can remove `vol_data.json` even when the /mount subdir does not exist. Sorry if this direction is still a little vague, but that's as far as I've been able to dig into this.
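To illustrate why deleting the file too early is so damaging, here is a small, self-contained sketch of the kind of metadata kubelet keeps in vol_data.json; the field names and the driver name are assumptions for the example, not a claim about the exact file format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// volData illustrates the kind of metadata kubelet stores in vol_data.json so
// it can rebuild a CSI unmounter after a restart; field names are assumptions
// for this example only.
type volData struct {
	DriverName   string `json:"driverName"`
	VolumeHandle string `json:"volumeHandle"`
	SpecVolID    string `json:"specVolID"`
}

func main() {
	raw := []byte(`{"driverName":"hostpath.csi.example.com","volumeHandle":"vol-123","specVolID":"pvc-1234"}`)

	var v volData
	if err := json.Unmarshal(raw, &v); err != nil {
		panic(err)
	}
	// If this file is deleted before NodeUnstageVolume runs, kubelet no longer
	// knows which driver or volume handle to call, and the unmount gets stuck.
	fmt.Printf("%+v\n", v)
}
```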
@dobsonj Thank you so much for the explanation. I am excited that there are still so many things I didn't know 😄. I will dig into it again to find the right solution.
@dobsonj @gnufied Actually, not only will the volume directory /mount be left over, the /globalmount directory will be left over as well, because in this scenario the volume manager performed neither `NodeUnstageVolume` nor `NodeUnpublishVolume`:
When reconstructing, the volume directory /mount is not a mountpoint, so the volume manager tries to do cleanup by calling `NodeUnpublishVolume` and cleaning up the /mount directory and `vol_data.json`, but the csi driver has not registered yet, which makes the cleanup fail. And no matter whether it succeeds or fails, `NodeUnstageVolume` won't be called, so /globalmount will be left over.
I think there may be two solutions:
- Let the reconstruction succeed no matter whether /mount is a mountpoint or not. After adding the volume to ASW, the volume manager's reconcile process will do `NodeUnstageVolume` and `NodeUnpublishVolume` later. However, I wonder if there is a special reason for making reconstruction fail when the mountpoint does not exist.
- When the unmounter executes unmounting, clean up the directory and `vol_data.json` without calling `NodeUnpublishVolume` and `NodeUnstageVolume` if the directory is already not a mountpoint (see the mountpoint-check sketch after this list).
WDYT? Which one is better, or any other idea?
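A rough, Linux-only sketch of the kind of mountpoint check the second option relies on, using only the Go standard library rather than kubelet's mount-utils helpers; isLikelyMountPoint is an illustrative name, and the device-comparison heuristic ignores bind mounts.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// isLikelyMountPoint reports whether path sits on a different device than its
// parent directory (the classic, bind-mount-blind mountpoint heuristic). If
// it returns false, the unmounter could skip the CSI calls and just remove
// the directory and vol_data.json.
func isLikelyMountPoint(path string) (bool, error) {
	st, err := os.Stat(path)
	if err != nil {
		return false, err
	}
	parent, err := os.Stat(filepath.Dir(path))
	if err != nil {
		return false, err
	}
	return st.Sys().(*syscall.Stat_t).Dev != parent.Sys().(*syscall.Stat_t).Dev, nil
}

func main() {
	// /proc normally lives on its own filesystem, so this should print true.
	mounted, err := isLikelyMountPoint("/proc")
	fmt.Println(mounted, err)
}
```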
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: cvvz. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
@dobsonj @gnufied Hi, I've updated the PR, PTAL. The latest solution is to not check the existence of the mount point during reconstruction. If the mount point does not exist, reconstruction will succeed and the reconstructed volume will be updated into ASW, where it will be unmounted and cleaned up in the subsequent reconcile.
So why did we previously make reconstruction fail when the mountpoint does not exist? I found the original issue #51982: it was meant to fix the problem that, after rebooting the node, ASW and DSW would end up identical once reconstruction and the ASW update succeeded, so reconcile would never remount a volume that was no longer a mount point. However, this can no longer happen, since during reconstruction we do not update ASW if the volume is in DSW.
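A hypothetical, much-simplified model of the reconstruction behavior described above; the type and function names are illustrative stand-ins, not kubelet's actual volumemanager types. The only point is that reconstruction records the volume in the actual state of world (ASW) instead of failing when the path is not a mountpoint, leaving the unmount and cleanup to the subsequent reconcile.

```go
package main

import "fmt"

// reconstructedVolume and actualStateOfWorld are illustrative stand-ins, not
// kubelet's real volumemanager types.
type reconstructedVolume struct {
	volumeName   string
	isMountPoint bool
}

type actualStateOfWorld struct {
	volumes []reconstructedVolume
}

func (asw *actualStateOfWorld) addVolume(v reconstructedVolume) {
	asw.volumes = append(asw.volumes, v)
}

// reconstruct records the volume in ASW even when the path is no longer a
// mountpoint; the reconciler is then expected to unmount and clean it up.
func reconstruct(asw *actualStateOfWorld, name string, isMountPoint bool) {
	asw.addVolume(reconstructedVolume{volumeName: name, isMountPoint: isMountPoint})
}

func main() {
	asw := &actualStateOfWorld{}
	// Previously this case made reconstruction fail and left /mount and
	// vol_data.json behind; now the volume still reaches ASW.
	reconstruct(asw, "pvc-1234", false)
	fmt.Printf("%+v\n", asw.volumes)
}
```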
@cvvz: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
If you still need this PR, please rebase; if not, please close it.
What type of PR is this?
/kind bug
What this PR does / why we need it:
Fixes #111933: do not check the existence of the mount point during reconstruction. If the mount point does not exist, reconstruction will succeed and the reconstructed volume will be updated into ASW, where it will be unmounted and cleaned up in the subsequent reconcile.
Which issue(s) this PR fixes:
Fixes #111933
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: