vhost: when using qemu and vhost, an IO error may occur when the vhost is restarted. #3504
Comments
@ygtzf Do you use the vhost recovery feature in your case, i.e., starting a new vhost process while QEMU is running?
@changpe1 Yes, the live recovery feature is in our vhost version and we use it as well. QEMU keeps running while we use systemctl to restart the vhost service.
@changpe1 Could you please provide some guidance or methods to investigate the issue? We've encountered this problem multiple times in our production environment. Thank you very much.
You can run fio tests in your VM and restart the vhost process to reproduce this issue. Then check your log for lines such as "VHOST_CONFIG: (/tmp/nvm-15x3hhwnspnm13-disk0.sock) vring base idx:8 last_used_idx:51594 last_avail_idx:51594." and see whether any vring shows different values for 'last_used_idx' and 'last_avail_idx'. Differing values mean there are inflight IOs in the shared memory; we need to know whether the failing request comes from those inflight IOs or is just a garbage value.
@changpe1 I've been running tests for many days, but it's difficult to reproduce this issue.
I know it's difficult to reproduce such an issue in a production environment. From my experience with this issue, most likely there were outstanding IOs during the hot upgrade. When stopping the vhost service, vhost should first stop fetching new requests from the VRINGs and then wait for all outstanding IOs sent to the backend to complete. You can watch the normal cases to see if your vhost restart process follows the steps above. I would not suggest to use
Thank you for the reply. We have added additional logging to the shutdown and start-up paths, and we are also comparing the hot upgrade process you mentioned to see if we can identify any potential issues. |
Sighting report
We use vhost to provide a vhost-user-blk backend server for QEMU. Recently we encountered a problem: when the vhost service was restarted, the VM served by QEMU hit an IO error and its file system was remounted read-only.
We found some error logs in the vhost log. The memory addresses were far too large, so there were clearly some invalid accesses:
rte_vhost_user.c: 653:vhost_vring_desc_payload_to_iov: ERROR: gpa_to_vva(0xe1ec329e50c1e048) == NULL
rte_vhost_user.c: 653:vhost_vring_desc_payload_to_iov: ERROR: gpa_to_vva((nil)) == NULL
I see that these memory addresses are taken from the vring. Can we conclude that the guest or QEMU mapped a wrong address?
Expected Behavior
Current Behavior
Possible Solution
Steps to Reproduce
So far, we have not found any way to reproduce this problem, but there have been several cases in production.
Context (Environment including OS version, SPDK version, etc.)
QEMU version: v8.2.0
SPDK version: v24.01