[BACKPORT][v1.5.5][IMPROVEMENT] Improve logging in CSI plugin when mount fails. #8286
Created this backport issue manually, since part of the previous work on the environment script was already backported.

Pre Ready-For-Testing Checklist
Hi @james-munson, I can reproduce the RWX pod mount failure on Ubuntu:
> uname -r
5.15.0-94-generic
>
> k get volume -A
NAMESPACE NAME STATE ROBUSTNESS SCHEDULED SIZE NODE AGE
longhorn-system pvc-d5ca45f2-7d92-429b-b7bd-14b084c2d48b attached healthy 1073741824 cha 117s
>
> k -n longhorn-system get volume pvc-d5ca45f2-7d92-429b-b7bd-14b084c2d48b -o yaml | grep accessMode
accessMode: rwx
>
> k get pods
NAME READY STATUS RESTARTS AGE
longhorn-nfs-installation-t7pfk 1/1 Running 0 4m47s
longhorn-iscsi-installation-m7jb4 1/1 Running 0 4m47s
test-deployment-754dd9fc66-68qsp 0/1 ContainerCreating 0 2m20s
>
> k describe pod test-deployment-754dd9fc66-68qsp | grep Event -A 20
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 2m37s default-scheduler 0/1 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
Warning FailedScheduling 2m35s default-scheduler 0/1 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
Normal Scheduled 2m33s default-scheduler Successfully assigned default/test-deployment-754dd9fc66-68qsp to cha
Normal SuccessfulAttachVolume 2m17s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-d5ca45f2-7d92-429b-b7bd-14b084c2d48b"
Warning FailedMount 8s (x9 over 2m17s) kubelet MountVolume.MountDevice failed for volume "pvc-d5ca45f2-7d92-429b-b7bd-14b084c2d48b" : rpc error: code = Internal desc = mount failed: exit status 32
Mounting command: /usr/local/sbin/nsmounter
Mounting arguments: mount -t nfs -o vers=4.1,noresvport,timeo=600,retrans=5,softerr 10.43.132.6:/pvc-d5ca45f2-7d92-429b-b7bd-14b084c2d48b /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/6381505fa73cc1887c25ca8b3079326a4843af3ac8af2f500e818253fefc3afb/globalmount
Output: mount.nfs: Protocol not supported

Support bundle: supportbundle_5e584417-1cc9-44ad-8d2f-6c8c3661f36f_2024-04-02T02-19-38Z.zip

In addition, use kernel
I think I see the problem. Testing the fixup.
This is strange. I'm having a hard time inducing the failure. I installed my test build of longhorn-manager for both daemonsets (longhorn-manager and longhorn-csi-plugin), changed the kernel on one of my
The pod events:
On the node itself,
And in the pod, everything is happy.
I'm not sure what's going on.
Repeated the test with
@james-munson Is it reproducible if you use Longhorn 1.6.0 instead?
Actually, that was with 1.6.0 for everything but my custom longhorn-manager and csi-plugin. Perhaps I'll give it a try with a 1.5.x release, or compare test procedures with @chriscchien.
So, since I can't reproduce the kernel-based
In particular,
which shows what we wanted to capture in the logs.
With longhorn/longhorn-manager#2724 committed, this should be testable again.
Verified pass on Longhorn v1.5.x (longhorn-manager
Deploy Longhorn v1.5.x on Ubuntu
Backport #7931.
Specifically, the improvement to CSI host namespace handling and to CSI logging of the environment when a mount fails.
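The kind of environment logging described above can be sketched in Go, the language longhorn-manager is written in. This is a hypothetical illustration, not the code from #7931 or longhorn/longhorn-manager#2724: the helper names `supportsFilesystem` and `describeMountEnvironment` are invented, and reading `/proc/sys/kernel/osrelease` and `/proc/filesystems` is an assumed choice of which host facts to capture. Because containers share the host kernel, both files reflect the node's kernel even when read from inside the CSI plugin pod. Note that `nfs4` usually only appears in `/proc/filesystems` once the NFSv4 client module is loaded, so its absence is a hint, not proof, that the kernel cannot mount `vers=4.1`.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// supportsFilesystem reports whether fsType is listed in the given contents of
// /proc/filesystems. Each line is either "<fstype>" or "nodev\t<fstype>",
// so the filesystem name is always the last whitespace-separated field.
func supportsFilesystem(procFilesystems, fsType string) bool {
	scanner := bufio.NewScanner(strings.NewReader(procFilesystems))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) == 0 {
			continue
		}
		if fields[len(fields)-1] == fsType {
			return true
		}
	}
	return false
}

// describeMountEnvironment (hypothetical helper) returns a one-line summary of
// host facts that may help diagnose an NFS mount failure. Unreadable files are
// reported as "unknown" instead of aborting, since this runs on an error path.
func describeMountEnvironment(osReleasePath, filesystemsPath string) string {
	kernel := "unknown"
	if b, err := os.ReadFile(osReleasePath); err == nil {
		kernel = strings.TrimSpace(string(b))
	}
	nfs4 := "unknown"
	if b, err := os.ReadFile(filesystemsPath); err == nil {
		nfs4 = fmt.Sprintf("%t", supportsFilesystem(string(b), "nfs4"))
	}
	return fmt.Sprintf("kernel=%s nfs4InProcFilesystems=%s", kernel, nfs4)
}

func main() {
	// Stand-in for the error returned by the mount call in the report above.
	mountErr := fmt.Errorf("mount failed: exit status 32")
	fmt.Printf("%v; host environment: %s\n", mountErr,
		describeMountEnvironment("/proc/sys/kernel/osrelease", "/proc/filesystems"))
}
```

Appending a summary like this to the mount error would place the kernel release next to a message such as `mount.nfs: Protocol not supported` in the CSI plugin log, which is the correlation the reproduction above needed.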