
[Bug] Memory Allocation Anomaly Across Devices in OrtCUDAProviderOptions #20544

Closed
@hashJoe

Description


Describe the issue

I am sharing below the OrtCUDAProviderOptions I use to select the GPU device for computation on a server with multiple GPUs.

When setting the deviceId, I encounter buggy memory allocations.

  1. For example, setting the following OrtCUDAProviderOptions:
        OrtCUDAProviderOptions cudaOptions = new OrtCUDAProviderOptions(6);
        cudaOptions.add("cudnn_conv_algo_search", "DEFAULT");
        options.addCUDA(cudaOptions);

results in:

GPU0: 1545MiB / 24576MiB
GPU6: 3MiB / 24576MiB

It discards the deviceId set to 6 and falls back to device 0.

  2. Whereas commenting out cudaOptions.add("cudnn_conv_algo_search", "DEFAULT"); so it becomes:
        OrtCUDAProviderOptions cudaOptions = new OrtCUDAProviderOptions(6);
//        cudaOptions.add("cudnn_conv_algo_search", "DEFAULT");
        options.addCUDA(cudaOptions);

results in:

GPU0: 545MiB / 24576MiB (0% Util)
GPU6: 1545MiB / 24576MiB

Here the correct GPU is selected, but 545MiB is still allocated on device 0 without that device being utilized.

  3. Finally, keeping cudaOptions.add("cudnn_conv_algo_search", "DEFAULT"); but selecting the device via cudaOptions.add("device_id", String.valueOf(6)); instead of specifying it directly in the constructor, as in:
        OrtCUDAProviderOptions cudaOptions = new OrtCUDAProviderOptions();
        cudaOptions.add("cudnn_conv_algo_search", "DEFAULT");      // this has no effect anymore! The flag is not considered
        cudaOptions.add("device_id", String.valueOf(6));
        options.addCUDA(cudaOptions);

results in:

GPU0: 545MiB / 24576MiB (0% Util)
GPU6: 1545MiB / 24576MiB

which is the same as example 2.

To confirm whether cudaOptions.add("cudnn_conv_algo_search", "DEFAULT"); is executed or ignored in example 3, I ran some experiments; it turned out the option is no longer considered and is shadowed by the cudaOptions.add("device_id", String.valueOf(6)); call added afterwards.

There are two problems here:

  1. Using cudaOptions.add("cudnn_conv_algo_search", "DEFAULT"); results in selecting the wrong device, in this case device 0, every time.
  2. Some initial memory is allocated on device 0, even though the correct deviceId is used for the computation.

As a workaround, I export only one visible CUDA device (via CUDA_VISIBLE_DEVICES) to avoid this problem.

To reproduce

Unfortunately the model cannot be provided, but I can write and supply a toy example plus model if needed.

Urgency

No response

Platform

Linux

OS Version

Ubuntu 22.04.4 LTS

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

v1.17.3

ONNX Runtime API

Java

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

cuda: 11.2, cudnn: 8.1.1

Model File

No response

Is this a quantized model?

No

Activity

added labels api:Java (issues related to the Java API) and ep:CUDA (issues related to the CUDA execution provider) on May 2, 2024
changed the title from "[Bug] OrtCUDAProviderOptions allocates memory on different devices" to "[Bug] Memory Allocation Anomaly Across Devices in OrtCUDAProviderOptions" on May 2, 2024
Craigacp (Contributor) commented on May 2, 2024

Ok, I think I understand what's going on there. I had expected the C API's UpdateCUDAProviderOptions function to be something I could use to append options to a CUDA options struct, but it looks like what it actually does is delete the old one and only set the options specified in the update call (https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/cuda/cuda_provider_factory.cc#L236). The Java code calls UpdateCUDAProviderOptions each time the add method is called, so the new option overwrites all the old ones. That's pretty annoying, but I can fix it on the Java side. I'll put a fix together next week.
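The overwrite semantics described above can be sketched with a toy model (invented names, not the real ORT internals): the native update call replaces the whole option set, and since the pre-fix Java binding invoked it once per add(), only the last key/value pair survived.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the behavior described above; all names are illustrative.
public class OverwriteDemo {
    // Stand-in for the native CUDA provider options struct.
    static Map<String, String> nativeOptions = new HashMap<>();

    // Mirrors UpdateCUDAProviderOptions as described: delete the old
    // options, then set only the ones supplied in this call.
    static void updateCUDAProviderOptions(Map<String, String> opts) {
        nativeOptions = new HashMap<>(opts);
    }

    // Mirrors the pre-fix Java add(): a full native update per option.
    static void add(String key, String value) {
        Map<String, String> single = new HashMap<>();
        single.put(key, value);
        updateCUDAProviderOptions(single);
    }

    public static void main(String[] args) {
        add("cudnn_conv_algo_search", "DEFAULT");
        add("device_id", "6");
        // The second add() wiped out cudnn_conv_algo_search; in example 1
        // the constructor-set device_id was wiped the same way, which is
        // why execution fell back to device 0.
        System.out.println(nativeOptions);
    }
}
```

This matches the observations in the bug report: whichever option is set last is the only one the native layer ever sees.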

hashJoe (Author) commented on May 2, 2024

Thanks for the clarification!
Regarding the 2nd problem, does the call to UpdateCUDAProviderOptions allocate some memory on device 0 (maybe for initialization) before setting the main device to 6 and proceeding with the main computation? And is it possible to pass multiple options at once to avoid calling the add method several times?

Craigacp (Contributor) commented on May 2, 2024

It is possible in the native code to pass multiple options at once, but not how I've written the Java binding to that native code. The Java object tracks all the options that are set, so I need to modify the SessionOptions.addCUDA call to call a new method on OrtCUDAProviderOptions which calls update once with the aggregated parameters before it's passed in.
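The aggregated-update approach described above could look roughly like this (a sketch with invented names, not the actual patch): add() only records options on the Java side, and a single native update is issued with all of them when the options object is attached to the session.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the aggregated-update fix; all names are illustrative.
public class AggregatedOptionsDemo {
    // Stand-in for the native CUDA provider options struct.
    static Map<String, String> nativeOptions = new LinkedHashMap<>();

    // The native update still replaces the whole option set...
    static void updateCUDAProviderOptions(Map<String, String> opts) {
        nativeOptions = new LinkedHashMap<>(opts);
    }

    // ...but add() now only buffers the option on the Java side.
    static Map<String, String> pending = new LinkedHashMap<>();
    static void add(String key, String value) {
        pending.put(key, value);
    }

    // One aggregated update when the options are attached to the session,
    // i.e. roughly what SessionOptions.addCUDA would trigger.
    static void applyToNative() {
        updateCUDAProviderOptions(pending);
    }

    public static void main(String[] args) {
        add("cudnn_conv_algo_search", "DEFAULT");
        add("device_id", "6");
        applyToNative();
        System.out.println(nativeOptions); // both options now survive
    }
}
```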

WRT the memory allocation on GPU zero, that might be an artifact of how CUDA & ORT work; I think the primary GPU tends to end up with some driver- and code-related stuff in general, but someone with more CUDA expertise might be able to help there.

hashJoe (Author) commented on May 3, 2024

I did additional tests regarding the memory allocation problem on GPU 0, where I simply inserted the following code, using the CUDA API, before calling anything ORT-related:

    int status = cudart.cudaSetDevice(6);      // set cuda device
    checkCuda(cudart.CUDA_SUCCESS, status, "cudaSetDevice");      // check for exception

and ORT code still allocates on deviceId=6:

    OrtCUDAProviderOptions cudaOptions = new OrtCUDAProviderOptions();
    cudaOptions.add("device_id", String.valueOf(6));
    options.addCUDA(cudaOptions);

and the problem above is solved: nothing is allocated on GPU 0 anymore. However, the same amount of memory is still allocated on GPU 6 (1545MiB). I would have expected the memory to be aggregated with what was previously allocated on GPU 0, becoming something like ~2000MiB, if it were CUDA-driver-related.

In other words, I assume there is also a bug where ORT starts allocating on GPU 0 for the main computation before detecting the deviceId set by the user?

hashJoe (Author) commented on May 3, 2024

Additionally, without the CUDA code:

    int status = cudart.cudaSetDevice(6);      // set cuda device
    checkCuda(cudart.CUDA_SUCCESS, status, "cudaSetDevice");      // check for exception

When GPU 0 is free, 545MiB is allocated on it; however, when it is occupied by another process and not much space is left, a smaller portion (~100MiB) is allocated instead.

added a commit that references this issue on May 5, 2024
a366920
added a commit that references this issue on May 7, 2024
155fe02
added a commit that references this issue on May 9, 2024
9a827d7


      [Bug] Memory Allocation Anomaly Across Devices in OrtCUDAProviderOptions · Issue #20544 · microsoft/onnxruntime