
fix 111933 #112012

Closed
wants to merge 1 commit

Conversation

cvvz
Member

@cvvz cvvz commented Aug 24, 2022

What type of PR is this?

/kind bug

What this PR does / why we need it:

Fixes #111933: do not check the existence of the mount point during reconstruction. If the mount point does not exist, reconstruction still succeeds and the reconstructed volume is added to the ASW, where it will be unmounted and cleaned up in the subsequent reconcile.
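As a rough illustration of the idea (a minimal sketch with simplified, partly hypothetical names, not the actual kubelet reconciler code), reconstruction stops treating a missing mount point as a fatal error and lets the volume reach the ASW so normal teardown can handle it:

```go
package main

import (
	"fmt"
	"os"

	mount "k8s.io/mount-utils"
)

// reconstructedVolume is a simplified stand-in for the volume manager's
// reconstructed-volume record (hypothetical fields, not the real struct).
type reconstructedVolume struct {
	volumeName string
	mountPath  string
}

// reconstructVolume sketches the behaviour this PR proposes: a missing mount
// point no longer aborts reconstruction; the volume is still returned so it
// can be added to the ASW and torn down by the next reconcile pass.
func reconstructVolume(m mount.Interface, volumeName, mountPath string) (*reconstructedVolume, error) {
	notMnt, err := m.IsLikelyNotMountPoint(mountPath)
	if err != nil && !os.IsNotExist(err) {
		return nil, err
	}
	if notMnt || os.IsNotExist(err) {
		// Before this PR, reconstruction returned an error here, which left
		// vol_data.json and the pod volume directory orphaned when the CSI
		// driver was not (yet) available to unmount anything.
		fmt.Printf("mount point %s missing; reconstructing anyway for later cleanup\n", mountPath)
	}
	return &reconstructedVolume{volumeName: volumeName, mountPath: mountPath}, nil
}

func main() {
	vol, err := reconstructVolume(mount.New(""), "pvc-123",
		"/var/lib/kubelet/pods/<POD_UID>/volumes/kubernetes.io~csi/pvc-123/mount")
	fmt.Println(vol, err)
}
```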

Which issue(s) this PR fixes:

Fixes #111933

Special notes for your reviewer:

Does this PR introduce a user-facing change?


Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 24, 2022
@k8s-ci-robot
Contributor

@cvvz: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 24, 2022
@k8s-ci-robot
Contributor

Hi @cvvz. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Aug 24, 2022
@k8s-ci-robot k8s-ci-robot added sig/storage Categorizes an issue or PR as relevant to SIG Storage. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 24, 2022
@yangjunmyfm192085
Contributor

/release-note-none

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Aug 24, 2022
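For context, the review thread below is attached to this excerpt from the CSI unmounter's TearDownAt; in this revision of the PR, removeMountDir (which also removes vol_data.json) runs before the NodeUnpublishVolume call: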
if err := removeMountDir(c.plugin, dir); err != nil {
return errors.New(log("Unmounter.TearDownAt failed to clean mount dir [%s]: %v", dir, err))
}

volID := c.volumeID
Member

This does not seem correct: if NodeUnpublish fails, subsequent calls to NodeUnpublish won't even start, because the directory and the JSON file have already been removed.

I do not think fixing it here is the right solution; we need to think about it from the perspective of how cleanup should work if the driver is not yet available or the volume is not yet mounted. IMO, the fix has to be made in the reconstruction code, not here.

Member Author

@cvvz cvvz Aug 24, 2022

Actually, the CSI mount directory will be removed anyway by the housekeeping handler in removeOrphanedPodVolumeDirs; that function will even remove the entire pod volume dir, /var/lib/kubelet/pods/<POD_UID>/volumes. The reason the directory was not deleted and was left over is that there is a file called vol_data.json inside it, and the function can only delete directories; when any file exists, it returns an error. So here I put removeMountDir before the call to the CSI driver to make sure vol_data.json is cleaned up whether NodeUnpublish succeeds or not. This way, the orphaned pod volume will eventually be cleaned up by the housekeeping handler.

Anyway, the goal is to clean up the orphaned pod volume; if we can do that in this simple way, there is no need to make things more complicated.
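To illustrate the failure mode described above, here is a minimal, standalone sketch (not the real removeall package) of a directories-only cleanup that is blocked by a single leftover file such as vol_data.json:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// removeDirsOnly mimics the housekeeping behaviour described above: it walks
// the pod volume directory and deletes empty directories, but refuses to touch
// regular files, so a leftover vol_data.json keeps the whole tree alive.
func removeDirsOnly(root string) error {
	entries, err := os.ReadDir(root)
	if err != nil {
		return err
	}
	for _, e := range entries {
		p := filepath.Join(root, e.Name())
		if !e.IsDir() {
			return fmt.Errorf("expected only directories, found file %q", p)
		}
		if err := removeDirsOnly(p); err != nil {
			return err
		}
	}
	return os.Remove(root)
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: removedirs <dir>")
		return
	}
	// e.g. /var/lib/kubelet/pods/<POD_UID>/volumes with a stray vol_data.json inside
	if err := removeDirsOnly(os.Args[1]); err != nil {
		fmt.Println("cleanup blocked:", err)
	}
}
```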

Member

I agree with @gnufied, this doesn't seem like the right place to fix it.
What happens when the directory is still mounted? Removal of the directory would then fail, and with this change TearDownAt would return an error before calling NodeUnpublishVolume to unmount the volume, so it would never get unmounted. I think this breaks the main use case (unmount the volume and then remove the mountpoint).

Member Author

@cvvz cvvz Aug 24, 2022

Actually, when the directory is still mounted, removeMountDir won't return an error; the volume is still unmounted by the CSI driver first. Every cleanup action in kubelet should consider the mount state in the first place to ensure safety.

Considering that the CSI driver is responsible for unmounting and cleaning the mount directory, as also mentioned in the comment, maybe we should only remove vol_data.json here, which should be done by kubelet, rather than remove the mount dir.

Member Author

@cvvz cvvz Sep 2, 2022

@gnufied @dobsonj I agree that we should not break the original logic of TearDownAt, and I have updated the PR. This time I put the deletion of the volume data file in the kubelet housekeeping process, which is responsible for cleaning up orphaned pod volumes. Could you PTAL?

@k8s-ci-robot k8s-ci-robot added area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. labels Sep 2, 2022
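For reference, the review thread below is attached to this hunk from the updated revision, which switches the kubelet housekeeping cleanup from the directories-only removal to a variant that takes a per-file callback: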
  podVolDir := kl.getPodVolumesDir(uid)
- if err := removeall.RemoveDirsOneFilesystem(kl.mounter, podVolDir); err != nil {
+ if err := removeall.RemoveAllOneFilesystemCommon(kl.mounter, podVolDir, func(path string) error {
Member

@dobsonj dobsonj Sep 2, 2022

This change actually re-introduces a rather serious race condition where vol_data.json may be deleted by removeOrphanedPodVolumeDirs before NodeUnstageVolume has a chance to run, which causes unmount to fail and leaves kubelet unable to clean the mountpoints on the node: #101911
So this is not the right place to remove vol_data.json; it will result in unmounts failing in some scenarios.
You may want to take a look at the volume reconstruction code in volumemanager. This is a tough bug in a complex part of the code, and the fix is not obvious to me, but when I saw it before, syncStates failed here because the /mount subdir did not exist. It then calls cleanupMounts, but that failed because the volume could not be reconstructed and the unmounter could not be created at that point. If you're able to reproduce the problem, you can check whether you're seeing the same failure scenario in kubelet.log.
IMO, we either need to be able to reconstruct the volume so it can be unmounted in this scenario, or at least find some way to create the unmounter so that cleanupMounts can remove vol_data.json even when the /mount subdir does not exist. Sorry if this direction is still a little vague, but that's as far as I've been able to dig into this.

Member Author

@cvvz cvvz Sep 4, 2022

@dobsonj Thank you so much for the explanation. I am excited that there are still so many things I didn't know 😄, and I will dig into it again to find the right solution.

Member Author

@cvvz cvvz Sep 5, 2022

@dobsonj @gnufied Actually, not only is the volume directory /mount left over, the /globalmount directory is left over as well, because in this scenario the volume manager performed neither NodeUnstageVolume nor NodeUnpublishVolume:

During reconstruction, the volume directory /mount is not a mount point, so the volume manager tries to clean up by calling NodeUnpublishVolume and removing the /mount directory and vol_data.json, but the CSI driver has not registered yet, so the cleanup fails. In fact, whether it succeeds or fails, NodeUnstageVolume won't be called, so /globalmount will be left over.

I think there may be two solutions:

  1. Let reconstruction succeed whether /mount is a mount point or not. After the volume is added to the ASW, the volume manager's reconcile process will perform NodeUnpublishVolume and NodeUnstageVolume later. However, I wonder whether there is a special reason for making reconstruction fail when the mount point does not exist.
  2. When the unmounter executes unmounting, and the directory is already not a mount point, clean up the directory and vol_data.json without calling NodeUnpublishVolume and NodeUnstageVolume (see the sketch after this list).
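A rough sketch of option 2 (illustrative names and a standalone helper, not the actual CSI unmounter code): if the target is no longer a mount point, skip the CSI call and only remove the kubelet-owned leftovers.

```go
package csisketch

import (
	"os"

	mount "k8s.io/mount-utils"
)

// TearDownAtSketch illustrates option 2 above: if the target directory is no
// longer a mount point, skip the CSI NodeUnpublishVolume call and only remove
// the leftover directory and vol_data.json; otherwise unmount first, as today.
func TearDownAtSketch(m mount.Interface, targetDir, volDataFile string, nodeUnpublish func() error) error {
	notMnt, err := m.IsLikelyNotMountPoint(targetDir)
	if err != nil && !os.IsNotExist(err) {
		return err
	}
	if err == nil && !notMnt {
		// Still mounted: let the CSI driver unmount first, as in the current flow.
		if err := nodeUnpublish(); err != nil {
			return err
		}
	}
	// Not (or no longer) a mount point: remove kubelet-owned state only.
	if err := os.Remove(volDataFile); err != nil && !os.IsNotExist(err) {
		return err
	}
	if err := os.Remove(targetDir); err != nil && !os.IsNotExist(err) {
		return err
	}
	return nil
}
```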

WDYT? Which one is better or any other idea?

@cvvz cvvz requested review from gnufied and dobsonj and removed request for xing-yang, verult and gnufied September 6, 2022 23:29
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: cvvz
Once this PR has been reviewed and has the lgtm label, please assign saad-ali for approval by writing /assign @saad-ali in a comment. For more information see: The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@cvvz
Member Author

cvvz commented Sep 7, 2022

@dobsonj @gnufied Hi, I've updated the PR, PTAL.

The latest solution is to not check the existence of the mount point during reconstruction. If the mount point does not exist, reconstruction still succeeds and the reconstructed volume is added to the ASW, where it will be unmounted and cleaned up in the subsequent reconcile.

So why did reconstruction previously fail when the mount point does not exist? I found the original issue, #51982: it was fixing the problem that, after a node reboot, the ASW and DSW would be identical once reconstruction and the ASW update succeeded, so reconcile would never remount a volume that is no longer a mount point. However, this can no longer happen, since during reconstruction we do not update the ASW if the volume is in the DSW.
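For illustration, a minimal pseudo-Go sketch of the guard described above (simplified interfaces, not the actual volume manager types): reconstruction skips anything still present in the DSW, so only orphaned volumes are added to the ASW for later unmount and cleanup.

```go
package reconcilersketch

// Minimal illustrative types; the real volume manager interfaces are richer.
type podVolume struct{ podName, volumeSpecName string }

type desiredStateOfWorld interface {
	VolumeExists(podName, volumeSpecName string) bool
}

type actualStateOfWorld interface {
	MarkVolumeAsMounted(v podVolume)
}

// syncStatesSketch outlines the reconstruction pass assumed in the comment
// above: volumes still present in the DSW are skipped, so adding a volume
// whose mount point is gone to the ASW cannot recreate the #51982 problem;
// only orphaned volumes are reconstructed, and reconcile later unmounts and
// cleans them up.
func syncStatesSketch(onDisk []podVolume, dsw desiredStateOfWorld, asw actualStateOfWorld) {
	for _, v := range onDisk {
		if dsw.VolumeExists(v.podName, v.volumeSpecName) {
			continue // still desired: reconcile handles (re)mounting
		}
		asw.MarkVolumeAsMounted(v) // orphaned: record it so reconcile tears it down
	}
}
```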

@cvvz cvvz requested review from gnufied and dobsonj and removed request for dobsonj and gnufied September 13, 2022 02:06
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 8, 2022
@k8s-ci-robot
Contributor

@cvvz: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@dims
Member

dims commented Dec 12, 2022

If you still need this PR, please rebase; if not, please close the PR.

@cvvz cvvz closed this Dec 13, 2022
Labels
area/kubelet
cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
kind/bug (Categorizes issue or PR as related to a bug.)
needs-ok-to-test (Indicates a PR that requires an org member to verify it is safe to test.)
needs-priority (Indicates a PR lacks a `priority/foo` label and requires one.)
needs-rebase (Indicates a PR cannot be merged because it has merge conflicts with HEAD.)
needs-triage (Indicates an issue or PR lacks a `triage/foo` label and requires one.)
release-note-none (Denotes a PR that doesn't merit a release note.)
sig/node (Categorizes an issue or PR as relevant to SIG Node.)
sig/storage (Categorizes an issue or PR as relevant to SIG Storage.)
size/S (Denotes a PR that changes 10-29 lines, ignoring generated files.)
Development

Successfully merging this pull request may close these issues.

CSI volumes left overs are not cleaned up if CSI plugin is attachable
6 participants