Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: set "host is down" as corrupted mount #203

Closed
wants to merge 1 commit into from

Conversation

andyzhangx
Copy link
Member

What type of PR is this?
/kind bug

What this PR does / why we need it:
fix: set "host is down" as corrupted mount

When SMB server is down, there is no way to terminate pod which is using SMB mount, would get following error. This PR regard host is down as corrupted mount dir, and then I would work out another PR in k8s to skip UnmountVolume.TearDown process if host is down

Apr 20 11:11:52 aks-nonzone-17963928-vmss000000 kubelet[8516]: E0420 11:11:52.618206    8516 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/smb.csi.k8s.io^pvc-da830ab0-e8d2-4e0e-89f4-906a4e4398a4 podName:90ee5993-7f80-4fe3-9d88-7bc89b19cbfe nodeName:}" failed. No retries permitted until 2021-04-20 11:11:56.61815874 +0000 UTC m=+337.879942743 (durationBeforeRetry 4s). Error: "UnmountVolume.TearDown failed for volume \"persistent-storage\" (UniqueName: \"kubernetes.io/csi/smb.csi.k8s.io^pvc-da830ab0-e8d2-4e0e-89f4-906a4e4398a4\") pod \"90ee5993-7f80-4fe3-9d88-7bc89b19cbfe\" (UID: \"90ee5993-7f80-4fe3-9d88-7bc89b19cbfe\") : kubernetes.io/csi: mounter.TearDownAt failed to clean mount dir [/var/lib/kubelet/pods/90ee5993-7f80-4fe3-9d88-7bc89b19cbfe/volumes/kubernetes.io~csi/pvc-da830ab0-e8d2-4e0e-89f4-906a4e4398a4/mount]: stat /var/lib/kubelet/pods/90ee5993-7f80-4fe3-9d88-7bc89b19cbfe/volumes/kubernetes.io~csi/pvc-da830ab0-e8d2-4e0e-89f4-906a4e4398a4/mount: host is down"

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Release note:

fix: set "host is down" as corrupted mount

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 20, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andyzhangx

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested review from jingxu97 and msau42 April 20, 2021 11:48
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Apr 20, 2021
@andyzhangx
Copy link
Member Author

/assign @jingxu97 @msau42

@andyzhangx
Copy link
Member Author

Another PR in k/k that could finally solves the SMB server down issue when terminating pods:

~/go/src/k8s.io/kubernetes# git diff
diff --git a/pkg/volume/csi/csi_mounter.go b/pkg/volume/csi/csi_mounter.go
index 4195ab38759..192ceec9677 100644
--- a/pkg/volume/csi/csi_mounter.go
+++ b/pkg/volume/csi/csi_mounter.go
@@ -364,6 +364,10 @@ func (c *csiMountMgr) TearDownAt(dir string) error {

        // clean mount point dir
        if err := removeMountDir(c.plugin, dir); err != nil {
+               if isCorruptedDir(dir) {
+                       klog.Warningf(log("dir[%s] is corrupted, error: %v, skip mount dir removal", dir, err))
+                       return nil
+               }
                return errors.New(log("mounter.TearDownAt failed to clean mount dir [%s]: %v", dir, err))
        }
        klog.V(4).Infof(log("mounter.TearDownAt successfully unmounted dir [%s]", dir))
diff --git a/vendor/k8s.io/utils/mount/mount_helper_unix.go b/vendor/k8s.io/utils/mount/mount_helper_unix.go
index 11b70ebc298..2c14a27c719 100644
--- a/vendor/k8s.io/utils/mount/mount_helper_unix.go
+++ b/vendor/k8s.io/utils/mount/mount_helper_unix.go
@@ -52,7 +52,7 @@ func IsCorruptedMnt(err error) bool {
                underlyingError = pe.Err
        }

-       return underlyingError == syscall.ENOTCONN || underlyingError == syscall.ESTALE || underlyingError == syscall.EIO || underlyingError == syscall.EACCES
+       return underlyingError == syscall.ENOTCONN || underlyingError == syscall.ESTALE || underlyingError == syscall.EIO || underlyingError == syscall.EACCES || underlyingError == syscall.EHOSTDOWN
 }

 // MountInfo represents a single line in /proc/<pid>/mountinfo.

@msau42
Copy link
Member

msau42 commented Apr 20, 2021

We moved the mount library back into kubernetes/staging. This repo is no longer being used for mount utils.

@andyzhangx
Copy link
Member Author

thanks, this PR would fix the issue: kubernetes/kubernetes#101305

@andyzhangx andyzhangx closed this Apr 21, 2021
@andyzhangx andyzhangx reopened this Apr 22, 2021
@andyzhangx
Copy link
Member Author

@msau42 I found this repo is still used by SMB CSI driver: https://github.com/kubernetes-csi/csi-driver-smb/blob/2fd1672aa1c57f6a1243cba89ae6c4187b425135/go.mod#L28, shall we merge this PR?

@msau42
Copy link
Member

msau42 commented Apr 22, 2021

I think it's better if smb driver switches to the new repo: https://github.com/kubernetes/mount-utils

There's already a few PR differences between these two repos.

@andyzhangx
Copy link
Member Author

I think it's better if smb driver switches to the new repo: https://github.com/kubernetes/mount-utils

There's already a few PR differences between these two repos.

@msau42 so this repo is deprecated? I am quite confused about these two repos

@msau42
Copy link
Member

msau42 commented Apr 23, 2021

Yes this repo is deprecated. @brahmaroutu can you send a PR to deprecate/remove this?

@andyzhangx
Copy link
Member Author

close this PR, opened another PR: kubernetes/mount-utils#3

@andyzhangx
Copy link
Member Author

andyzhangx commented Apr 23, 2021

maybe we could merge this PR this time since some CSI drivers still depend on this?

@jingxu97
Copy link
Contributor

maybe we could merge this PR this time since some CSI drivers still depend on this?

i think most driver still uses kubernetes/mount-utils, worth to discuss how to migrate all of them?

@msau42
Copy link
Member

msau42 commented Apr 23, 2021

The interfaces should be mostly the same. It should just be changing the package name.

@andyzhangx andyzhangx closed this Jun 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants