[BUG] DataEngineV2 Unable to attach a PV to a pod in the newer kernel #7190
Comments
@xinydev:
Yes, I have already inserted it. If the kernel module is not inserted, there will be an error log like:
Thanks @xinydev.
Would the module version be an issue? The instance manager pod is using nvme-cli v2.5 while the host is using v1.9.
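As a quick cross-check (a sketch; the namespace and the pod-name placeholder are assumptions), the two versions can be compared like this:
# Version reported by nvme-cli inside the instance-manager pod
kubectl -n longhorn-system exec <instance-manager-pod> -- nvme version
# Version reported by nvme-cli on the host
nvme version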
Tried to reproduce the issue and found the error when executing
dmesg shows
A new check of hostid and hostnqn was introduced in Linux kernel v6.5.
Probably in the
Yes, this is a solution, but a random
Don't know, but from the comment in the code it should be
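For context, nvme-cli reads these values from /etc/nvme/hostnqn and /etc/nvme/hostid. A minimal sketch of creating persistent values on the host (assuming nvme-cli is installed and uuidgen is available):
# Generate and persist a host NQN where nvme-cli expects it
nvme gen-hostnqn > /etc/nvme/hostnqn
# Generate and persist a host ID alongside it; per the discussion above,
# kernels since v6.5 check these values for consistency
uuidgen > /etc/nvme/hostid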
Pre Ready-For-Testing Checklist
longhorn/go-spdk-helper#60
Verified passed on master-head (longhorn-instance-manager c61da18). Created an ubuntu-22.04 cluster and upgraded the kernel version to
Then followed https://longhorn.io/docs/1.5.3/spdk/quick-start/ to set up the v2 engine environment and create a v2 volume with a pod. Everything works without problems.
This issue can be triggered if the 'extras' kernel module package is not up to date. In a recent case, nodes provided by the cloud vendor were rebooted and their OS kernel version was updated automatically, which triggered the error described in this ticket.
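On Ubuntu, a hedged sketch of bringing the extra modules back in line with the running kernel (the package naming follows Ubuntu's convention):
# Install the extra kernel modules matching the currently running kernel
sudo apt-get install linux-modules-extra-$(uname -r)
# Reload the NVMe-over-TCP module afterwards
sudo modprobe nvme-tcp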
Describe the bug (🐛 if you encounter this issue)
When using data engine v2, the pod stays in the ContainerCreating state, and the instance-manager logs an nvme discover execution failure. This problem only occurs on 6.5.0-060500rc6-generic; everything works normally after rolling back to 5.15.0-88-generic.
Logs are in the additional context below.
It seems there might be a compatibility issue between the NVMe userspace tool and the NVMe driver, but I am not sure in which version this issue first appeared.
To Reproduce
I found this problem when following the quick start on Ubuntu 20.04 (6.5.0-060500rc6-generic); after rolling back to kernel 5.15.0-88-generic, the problem was gone. A minimal sketch for confirming the symptom follows below.
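The pod and namespace names here are assumptions based on the quick start, not values from the original report:
uname -r                            # expect 6.5.0-060500rc6-generic
kubectl get pod volume-test         # stays in ContainerCreating
kubectl -n longhorn-system logs <instance-manager-pod> | grep -i discover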
Expected behavior
The pod reaches the Running state.
Support bundle for troubleshooting
Environment
Longhorn version: v1.5.3
Installation method (e.g. Rancher Catalog App/Helm/Kubectl):
helm install longhorn . -n longhorn --create-namespace --set="defaultSettings.defaultReplicaCount=1,defaultSettings.v2DataEngine=true,longhornUI.replicas=1,persistence.defaultClassReplicaCount=1,csi.attacherReplicaCount=1,csi.provisionerReplicaCount=1,csi.resizerReplicaCount=1,csi.snapshotterReplicaCount=1"
Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version:
kubeadm, flannel, single node
Node config
Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): KVM
Number of Longhorn volumes in the cluster: 1
Impacted Longhorn resources:
Additional context
instance-manager log
k describe pod volume-test
In the instance-manager pod
On the host
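For reference, the discovery call in question generally takes the form below; the transport address and service ID are placeholders, not values from the original logs:
nvme discover -t tcp -a <traddr> -s <trsvcid>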