Skip to content

[BUG] Block device volume failed to unmount when it is detached unexpectedly #3778

Closed
@guangbochen

Description

Describe the bug

LH's block device volume failed to unmount when it is detached unexpectedly (e.g. during Kubernetes upgrade, Docker reboot, or network disconnect), and therefore the pod is stuck in the terminating state.

To Reproduce

Steps to reproduce the behavior:

  1. create a single pod deployment and mount a PV/PVC with the block device mode.
  2. trying to mimic the unexpected detached scenario, e.g kill the longhorn instance manager that the pod is currently running on.
  3. the pod will get terminated but will stuck on the Terminating state
virt-launcher-lh-test-7gww9                          0/1     Terminating   0          3h28m
  1. check the LH longhorn manager and the other LH pods, there is not obvious errors.
  2. check the kubelet logs(e.g., pod startTime: "2022-03-23T08:11:04Z", volume remountRequestedAt: "2022-03-23T09:37:10Z") and found the volume is busy and failed to unmount.
E0323 09:36:59.035738    2734 nestedpendingoperations.go:335] Operation for "{volumeName:kubernetes.io/csi/driver.longhorn.io^pvc-b5a67cc8-a751-4ee4-ae9b-0ba3
4885be77 podName:dd9cedeb-01dd-4a7b-afeb-57c8c1aed9e4 nodeName:}" failed. No retries permitted until 2022-03-23 09:37:03.03570295 +0000 UTC m=+534617.23041224
8 (durationBeforeRetry 4s). Error: UnmapVolume.UnmapBlockVolume failed for volume "disk-0" (UniqueName: "kubernetes.io/csi/driver.longhorn.io^pvc-b5a67cc8-a75
1-4ee4-ae9b-0ba34885be77") pod "dd9cedeb-01dd-4a7b-afeb-57c8c1aed9e4" (UID: "dd9cedeb-01dd-4a7b-afeb-57c8c1aed9e4") : blkUtil.DetachFileDevice failed. globalU
nmapPath:/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-b5a67cc8-a751-4ee4-ae9b-0ba34885be77/dev, podUID: dd9cedeb-01dd-4a7b-afeb-57c8c1aed9e4,
bindMount: true: failed to unmount linkPath /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-b5a67cc8-a751-4ee4-ae9b-0ba34885be77/dev/dd9cedeb-01d
d-4a7b-afeb-57c8c1aed9e4: unmount failed: exit status 32
Unmounting arguments: /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-b5a67cc8-a751-4ee4-ae9b-0ba34885be77/dev/dd9cedeb-01dd-4a7b-afeb-57c8c1aed9
e4
Output: umount: /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-b5a67cc8-a751-4ee4-ae9b-0ba34885be77/dev/dd9cedeb-01dd-4a7b-afeb-57c8c1aed9e4: ta
rget is busy.
E0323 09:37:00.155861    2734 kubelet_volumes.go:245] "There were many similar errors. Turn up verbosity to see them." err="orphaned pod \"4dbd88a6-c14b-41b5-
a65b-233c4e0b8f87\" found, but error not a directory occurred when trying to remove the volumes dir" numErrs=2
E0323 09:37:02.157018    2734 kubelet_volumes.go:245] "There were many similar errors. Turn up verbosity to see them." err="orphaned pod \"4dbd88a6-c14b-41b5-
a65b-233c4e0b8f87\" found, but error not a directory occurred when trying to remove the volumes dir" numErrs=2
I0323 09:37:03.058180    2734 reconciler.go:194] "operationExecutor.UnmountVolume started for volume \"disk-0\" (UniqueName: \"kubernetes.io/csi/driver.longho
rn.io^pvc-b5a67cc8-a751-4ee4-ae9b-0ba34885be77\") pod \"dd9cedeb-01dd-4a7b-afeb-57c8c1aed9e4\" (UID: \"dd9cedeb-01dd-4a7b-afeb-57c8c1aed9e4\") "
W0323 09:37:03.058472    2734 volume_path_handler_linux.go:61] couldn't find loopback device which takes file descriptor lock. Skip detaching device. device p
ath: "/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-b5a67cc8-a751-4ee4-ae9b-0ba34885be77/dev/dd9cedeb-01dd-4a7b-afeb-57c8c1aed9e4"
W0323 09:37:03.058494    2734 volume_path_handler.go:217] Warning: Unmap skipped because symlink does not exist on the path: /var/lib/kubelet/pods/dd9cedeb-01
dd-4a7b-afeb-57c8c1aed9e4/volumeDevices/kubernetes.io~csi/pvc-b5a67cc8-a751-4ee4-ae9b-0ba34885be77
E0323 09:37:03.074826    2734 nestedpendingoperations.go:335] Operation for "{volumeName:kubernetes.io/csi/driver.longhorn.io^pvc-b5a67cc8-a751-4ee4-ae9b-0ba3
4885be77 podName:dd9cedeb-01dd-4a7b-afeb-57c8c1aed9e4 nodeName:}" failed. No retries permitted until 2022-03-23 09:37:11.074796266 +0000 UTC m=+534625.2695055
51 (durationBeforeRetry 8s). Error: UnmapVolume.UnmapBlockVolume failed for volume "disk-0" (UniqueName: "kubernetes.io/csi/driver.longhorn.io^pvc-b5a67cc8-a7
51-4ee4-ae9b-0ba34885be77") pod "dd9cedeb-01dd-4a7b-afeb-57c8c1aed9e4" (UID: "dd9cedeb-01dd-4a7b-afeb-57c8c1aed9e4") : blkUtil.DetachFileDevice failed. global
UnmapPath:/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-b5a67cc8-a751-4ee4-ae9b-0ba34885be77/dev, podUID: dd9cedeb-01dd-4a7b-afeb-57c8c1aed9e4,
 bindMount: true: failed to unmount linkPath /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-b5a67cc8-a751-4ee4-ae9b-0ba34885be77/dev/dd9cedeb-01
dd-4a7b-afeb-57c8c1aed9e4: unmount failed: exit status 32
Unmounting arguments: /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-b5a67cc8-a751-4ee4-ae9b-0ba34885be77/dev/dd9cedeb-01dd-4a7b-afeb-57c8c1aed9
e4
Output: umount: /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-b5a67cc8-a751-4ee4-ae9b-0ba34885be77/dev/dd9cedeb-01dd-4a7b-afeb-57c8c1aed9e4: ta
rget is busy.
E0323 09:37:04.159256    2734 kubelet_volumes.go:245] "There were many similar errors. Turn up verbosity to see them." err="orphaned pod \"4dbd88a6-c14b-41b5-
a65b-233c4e0b8f87\" found, but error not a directory occurred when trying to remove the volumes dir" numErrs=2
I0323 09:37:05.291550    2734 scope.go:110] "RemoveContainer" containerID="355e61c379199d6e41e7cb5361edaf6a683e39f299b4aa5536b9131a13788c9f"
E0323 09:37:06.160342    2734 kubelet_volumes.go:245] "There were many similar errors. Turn up verbosity to see them." err="orphaned pod \"4dbd88a6-c14b-41b5-
a65b-233c4e0b8f87\" found, but error not a directory occurred when trying to remove the volumes dir" numErrs=2
E0323 09:37:08.154310    2734 kubelet_volumes.go:245] "There were many similar errors. Turn up verbosity to see them." err="orphaned pod \"4dbd88a6-c14b-41b5-
a65b-233c4e0b8f87\" found, but error not a directory occurred when trying to remove the volumes dir" numErrs=2
E0323 09:37:10.160535    2734 kubelet_volumes.go:245] "There were many similar errors. Turn up verbosity to see them." err="orphaned pod \"4dbd88a6-c14b-41b5-
a65b-233c4e0b8f87\" found, but error not a directory occurred when trying to remove the volumes dir" numErrs=2
I0323 09:37:11.150165    2734 reconciler.go:194] "operationExecutor.UnmountVolume started for volume \"disk-0\" (UniqueName: \"kubernetes.io/csi/driver.longho
rn.io^pvc-b5a67cc8-a751-4ee4-ae9b-0ba34885be77\") pod \"dd9cedeb-01dd-4a7b-afeb-57c8c1aed9e4\" (UID: \"dd9cedeb-01dd-4a7b-afeb-57c8c1aed9e4\") "
W0323 09:37:11.150620    2734 volume_path_handler_linux.go:61] couldn't find loopback device which takes file descriptor lock. Skip detaching device. device p
ath: "/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-b5a67cc8-a751-4ee4-ae9b-0ba34885be77/dev/dd9cedeb-01dd-4a7b-afeb-57c8c1aed9e4"
W0323 09:37:11.150651    2734 volume_path_handler.go:217] Warning: Unmap skipped because symlink does not exist on the path: /var/lib/kubelet/pods/dd9cedeb-01
dd-4a7b-afeb-57c8c1aed9e4/volumeDevices/kubernetes.io~csi/pvc-b5a67cc8-a751-4ee4-ae9b-0ba34885be77
E0323 09:37:11.167357    2734 nestedpendingoperations.go:335] Operation for "{volumeName:kubernetes.io/csi/driver.longhorn.io^pvc-b5a67cc8-a751-4ee4-ae9b-0ba3
4885be77 podName:dd9cedeb-01dd-4a7b-afeb-57c8c1aed9e4 nodeName:}" failed. No retries permitted until 2022-03-23 09:37:27.167332992 +0000 UTC m=+534641.3620422
75 (durationBeforeRetry 16s). Error: UnmapVolume.UnmapBlockVolume failed for volume "disk-0" (UniqueName: "kubernetes.io/csi/driver.longhorn.io^pvc-b5a67cc8-a
751-4ee4-ae9b-0ba34885be77") pod "dd9cedeb-01dd-4a7b-afeb-57c8c1aed9e4" (UID: "dd9cedeb-01dd-4a7b-afeb-57c8c1aed9e4") : blkUtil.DetachFileDevice failed. globa
lUnmapPath:/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-b5a67cc8-a751-4ee4-ae9b-0ba34885be77/dev, podUID: dd9cedeb-01dd-4a7b-afeb-57c8c1aed9e4
, bindMount: true: failed to unmount linkPath /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-b5a67cc8-a751-4ee4-ae9b-0ba34885be77/dev/dd9cedeb-0
1dd-4a7b-afeb-57c8c1aed9e4: unmount failed: exit status 32
Unmounting arguments: /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-b5a67cc8-a751-4ee4-ae9b-0ba34885be77/dev/dd9cedeb-01dd-4a7b-afeb-57c8c1aed9
e4
Output: umount: /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-b5a67cc8-a751-4ee4-ae9b-0ba34885be77/dev/dd9cedeb-01dd-4a7b-afeb-57c8c1aed9e4: ta
rget is busy.
E0323 09:37:12.157239    2734 kubelet_volumes.go:245] "There were many similar errors. Turn up verbosity to see them." err="orphaned pod \"4dbd88a6-c14b-41b5-
a65b-233c4e0b8f87\" found, but error not a directory occurred when trying to remove the volumes dir" numErrs=2

Expected behavior

Should be able to unmount the volume successfully.

Log or Support bundle

support.zip

Environment

  • Longhorn version: v1.2.4
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl):
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: v1.22.7+rke2r1
    • Number of management node in the cluster: 3
    • Number of worker node in the cluster: 1
  • Node config
    • OS type and version: Harvester v1.0.1

Additional context

umount: /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-b5a67cc8-a751-4ee4-ae9b-0ba34885be77/dev/ab369da5-c126-4600-a0b0-2311088296ea: target is busy.

manual umount with -l will work.

Metadata

Labels

area/csiCSI related like control/node driver, sidecarsarea/kubernetesKubernetes related like K8s version compatibilityarea/upstreamUpstream related like tgt upstream librarykind/bugpriority/0Must be implement or fixed in this release (managed by PO)require/auto-e2e-testRequire adding/updating auto e2e test cases if they can be automatedrequire/knowledge-baseRequire adding knowledge base document

Type

No type

Projects

  • Status

    Closed

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions