gpustat reports only the first program on nv driver 535 #161
Comments
I see, thanks for the report. I will make an update to support the new NVIDIA driver. Probably the same issue as #157.
This bug is due to breaking changes in the NVIDIA Driver R535.xx series (see the TL;DR for the affected versions).
NVIDIA Driver Changes:
Cross-ref: XuehaiPan/nvitop#88 (comment)
Hi, I'll be using nvidia-ml-py 12.535.77 for now. Many thanks for the help.
NVIDIA 535.43, 535.86 can display process information correctly only with nvidia-ml-py==12.535.77. Display a warning message when an incompatible combination is detected. See #161 for more details.
We won't be adding monkey-patching because managing all the combinations is extremely complex. The buggy NVIDIA drivers (535.43 and 535.86) and nvidia-ml-py 12.535.77 should be avoided, although a workaround exists. I've added a warning message that is shown when such an incompatible driver/pynvml combination is found.
nvidia-ml-py==12.535.77 is a buggy version that breaks the struct for process information and should not be used (unless the NVIDIA driver is *also* buggy: 535.43, 535.54, or 535.86). The latest version, nvidia-ml-py==12.535.108, fixes the problem and is still compatible with our supported drivers (R450+). To ensure that users who install gpustat 1.2.0 get a correct version of nvidia-ml-py, we bump up the requirement. See #160 and #161 for more details.
TL;DR
NVIDIA drivers >= 535.43 and < 535.98 are broken. Avoid these buggy drivers by upgrading (or downgrading) your NVIDIA driver to a version outside this range.
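For readers who want to check their own setup, the rule described in this thread can be sketched roughly as follows. This is only an illustration and not gpustat's actual implementation: the function names are hypothetical, the broken driver range is taken from the TL;DR above, and the pairing rule (nvidia-ml-py 12.535.77 works only with the broken drivers, while other versions work only with the unaffected ones) is my reading of the comments above.

```python
def parse_version(text):
    """Turn a version string such as '535.54.03' into (535, 54, 3)."""
    return tuple(int(part) for part in text.strip().split("."))


# Assumption: range copied from the TL;DR (>= 535.43, < 535.98).
BROKEN_DRIVER_MIN = (535, 43)   # inclusive
BROKEN_DRIVER_MAX = (535, 98)   # exclusive
# The nvidia-ml-py release described in this thread as buggy.
BUGGY_PYNVML = (12, 535, 77)


def is_driver_broken(driver_version):
    """Hypothetical helper: is the driver inside the broken range?"""
    version = parse_version(driver_version)
    return BROKEN_DRIVER_MIN <= version < BROKEN_DRIVER_MAX


def is_incompatible(driver_version, pynvml_version):
    """Hypothetical helper: True when exactly one of the two sides is buggy.

    A broken driver needs the buggy nvidia-ml-py, and an unaffected driver
    needs a non-buggy nvidia-ml-py; any mismatch is flagged.
    """
    driver_broken = is_driver_broken(driver_version)
    pynvml_buggy = parse_version(pynvml_version) == BUGGY_PYNVML
    return driver_broken != pynvml_buggy


if __name__ == "__main__":
    # The combination from the original report, with a newer nvidia-ml-py.
    if is_incompatible("535.54.03", "12.535.108"):
        print("warning: incompatible NVIDIA driver / nvidia-ml-py versions")
```

In a real environment the two version strings could be obtained with pynvml.nvmlSystemGetDriverVersion() (after nvmlInit()) and importlib.metadata.version("nvidia-ml-py"); the sketch takes plain strings so it can be run without a GPU.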
Original bug report
Hi, we recently updated our driver to version 535.54.03 with CUDA 12.2, and since then gpustat reports only the first program's info even when multiple programs are running on the same GPU.
Screenshots or Program Output
Please provide the output of gpustat --debug and nvidia-smi, or attach screenshots if applicable.
Environment information: