-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: set "host is down" as corrupted mount #101398
Conversation
/kind bug |
/retest |
/test pull-kubernetes-e2e-aks-engine-azure-file |
/retest |
1 similar comment
/retest |
@andyzhangx The follow up PR will be in the CSI driver repo right? So as on |
@gnufied I found this should be the only fix in CSI driver repo since SMB CSI driver also uses this |
/lgtm |
@@ -52,7 +52,7 @@ func IsCorruptedMnt(err error) bool { | |||
underlyingError = pe.Err | |||
} | |||
|
|||
return underlyingError == syscall.ENOTCONN || underlyingError == syscall.ESTALE || underlyingError == syscall.EIO || underlyingError == syscall.EACCES | |||
return underlyingError == syscall.ENOTCONN || underlyingError == syscall.ESTALE || underlyingError == syscall.EIO || underlyingError == syscall.EACCES || underlyingError == syscall.EHOSTDOWN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder though - if we should separate corrupted mounts from socket/host not connected errors. But may be we could consider them same. cc @chakri-nelluri
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the host is down does that actually mean the mount is gone and cleaned up on the client side? Wondering if this will cause stale/leaking mounts and we need to do something like a force unmount instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the way how this host is down
mount point is unmounted in CSI driver, SMB CSI driver invoke CleanupMountPoint
func to unmount mount point:
- without this PR,
PathExists
returnfalse, err
directly, and thenCleanupMountPoint
returnsWarning: Unmount skipped because path does not exist
(actually do nothing) - with this PR,
PathExists
returntrue, err
, and finally invokedoCleanupMountPoint
to unmount, that will fix this issue.
So by this PR, CSI driver should unmount corrupted mount point if using CleanupMountPoint
func. @jingxu97
kubernetes/staging/src/k8s.io/mount-utils/mount_helper_common.go
Lines 137 to 147 in ea07644
func PathExists(path string) (bool, error) { | |
_, err := os.Stat(path) | |
if err == nil { | |
return true, nil | |
} else if os.IsNotExist(err) { | |
return false, nil | |
} else if IsCorruptedMnt(err) { | |
return true, err | |
} | |
return false, err | |
} |
kubernetes/staging/src/k8s.io/mount-utils/mount_helper_common.go
Lines 31 to 41 in ea07644
func CleanupMountPoint(mountPath string, mounter Interface, extensiveMountPointCheck bool) error { | |
pathExists, pathErr := PathExists(mountPath) | |
if !pathExists && pathErr == nil { | |
klog.Warningf("Warning: Unmount skipped because path does not exist: %v", mountPath) | |
return nil | |
} | |
corruptedMnt := IsCorruptedMnt(pathErr) | |
if pathErr != nil && !corruptedMnt { | |
return fmt.Errorf("Error checking path: %v", pathErr) | |
} | |
return doCleanupMountPoint(mountPath, mounter, extensiveMountPointCheck, corruptedMnt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about Windows case? Is windows version of function can already handle this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jingxu97 good point, I added windows fix.
While this specific SMB unmount issue actually does not exists on Windows since CleanupMountPoint
on Windows just removes directly. Anyway, should also fix IsCorruptedMnt
issue on Windows.
func CleanupMountPoint(m *mount.SafeFormatAndMount, target string, extensiveMountCheck bool) error {
proxy, ok := m.Interface.(*mounter.CSIProxyMounter)
if !ok {
return fmt.Errorf("could not cast to csi proxy class")
}
return proxy.Rmdir(target)
}
add HOSTDOWN code for Windows
/retest |
/lgtm /assign @jingxu97 |
/retest |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: andyzhangx, jingxu97 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest Review the full test history for this PR. Silence the bot with an |
…101398-upstream-release-1.21 Automated cherry pick of #101398: fix: set "host is down" as corrupted mount
…101398-upstream-release-1.20 Automated cherry pick of #101398: fix: set "host is down" as corrupted mount
this PR is cherry-picked to 1.20.7, 1.21.1, 1.22.0 releases |
What type of PR is this?
/kind bug
What this PR does / why we need it:
fix: set "host is down" as corrupted mount
When SMB server is down, there is no way to terminate pod which is using SMB mount, would get following error. This PR regard
host is down
as corrupted mount dir.Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: