Description
Describe the issue
I am sharing below my `OrtCUDAProviderOptions`, which I use to set the GPU device used for computation on a server with multiple GPUs. When setting the `deviceId`, I encounter buggy memory allocations.
- For example, setting the following `OrtCUDAProviderOptions`:

  ```java
  OrtCUDAProviderOptions cudaOptions = new OrtCUDAProviderOptions(6);
  cudaOptions.add("cudnn_conv_algo_search", "DEFAULT");
  options.addCUDA(cudaOptions);
  ```

  results in:

  ```
  GPU0: 1545MiB / 24576MiB
  GPU6:    3MiB / 24576MiB
  ```

  It discards the `deviceId` being set to 6 and uses 0 instead.
- Whereas commenting out `cudaOptions.add("cudnn_conv_algo_search", "DEFAULT");`:

  ```java
  OrtCUDAProviderOptions cudaOptions = new OrtCUDAProviderOptions(6);
  // cudaOptions.add("cudnn_conv_algo_search", "DEFAULT");
  options.addCUDA(cudaOptions);
  ```

  results in:

  ```
  GPU0:  545MiB / 24576MiB (0% Util)
  GPU6: 1545MiB / 24576MiB
  ```

  Here the correct GPU is selected, but 545MiB is still allocated on device 0 without the device being utilized.
- Finally, keeping `cudaOptions.add("cudnn_conv_algo_search", "DEFAULT");` but adding `cudaOptions.add("device_id", String.valueOf(6));` to select the device instead of specifying it directly in the constructor:

  ```java
  OrtCUDAProviderOptions cudaOptions = new OrtCUDAProviderOptions();
  cudaOptions.add("cudnn_conv_algo_search", "DEFAULT"); // this has no effect anymore! The flag is not considered
  cudaOptions.add("device_id", String.valueOf(6));
  options.addCUDA(cudaOptions);
  ```

  results in:

  ```
  GPU0:  545MiB / 24576MiB (0% Util)
  GPU6: 1545MiB / 24576MiB
  ```

  which is the same as example 2.
To confirm whether `cudaOptions.add("cudnn_conv_algo_search", "DEFAULT");` is being executed or ignored in example 3, I ran some experiments; it turned out the option is no longer considered and is shadowed by `cudaOptions.add("device_id", String.valueOf(6));`, which is added afterwards.
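The shadowing described above behaves as if each `add` call replaced the entire native option set with just the newly supplied key. A minimal plain-Java model of that behaviour (an illustration for reasoning about the symptom, not the actual ORT code):

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of last-write-wins option updates: every update
// clears the existing option struct and keeps only the keys passed
// in the current call.
final class UpdateModel {
    static final Map<String, String> struct = new HashMap<>();

    static void update(Map<String, String> newOpts) {
        struct.clear();          // previously set options are discarded
        struct.putAll(newOpts);  // only this call's options survive
    }
}
```

Under this model, applying one update per `add` call leaves only the last option (`device_id`) in effect, which matches example 3, where `cudnn_conv_algo_search` ends up ignored.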
There are two problems here:

- Using `cudaOptions.add("cudnn_conv_algo_search", "DEFAULT");` results in selecting the wrong device, in this case device 0, every time.
- Some initial memory is allocated on device 0, even though the correct `deviceId` has been used for the computation.
As a workaround, I am exporting only one visible CUDA device to avoid this problem.
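The workaround above can be expressed as a shell snippet (the launch command is a placeholder, not part of the report):

```shell
# Expose only physical GPU 6 to the process; ORT then sees a single
# device and must allocate on it. Inside the process that GPU is
# enumerated as device 0, so the options become OrtCUDAProviderOptions(0).
export CUDA_VISIBLE_DEVICES=6
# java -jar my-ort-app.jar   # placeholder launch command
```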
To reproduce
Unfortunately the model cannot be provided, but I can write a toy example + model and supply it if needed.
Urgency
No response
Platform
Linux
OS Version
Ubuntu 22.04.4 LTS
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
v1.17.3
ONNX Runtime API
Java
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
cuda: 11.2, cudnn: 8.1.1
Model File
No response
Is this a quantized model?
No
Activity
Craigacp commented on May 2, 2024
Ok, I think I understand what's going on there. I had expected the C API's `UpdateCUDAProviderOptions` function to be something I could use to append options to a CUDA options struct, but it looks like what it actually does is delete the old one and only set the options specified in the update call (https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/cuda/cuda_provider_factory.cc#L236). The Java code calls `UpdateCUDAProviderOptions` each time the `add` method is called, so the new option overwrites all the old ones. That's pretty annoying, but I can fix it on the Java side. I'll put a fix together next week.

hashJoe commented on May 2, 2024
Thanks for the clarification!
Regarding the 2nd problem, does the call to `UpdateCUDAProviderOptions` allocate some memory on device 0 (maybe for initialization) before setting the main device to 6 and proceeding with the main computation? And is it possible to pass multiple options at once to avoid calling the `add` method several times?

Craigacp commented on May 2, 2024
It is possible in the native code to pass multiple options at once, but not how I've written the Java binding to that native code. The Java object tracks all the options that are set, so I need to modify the `SessionOptions.addCUDA` call to call a new method on `OrtCUDAProviderOptions` which calls update once with the aggregated parameters before it's passed in.

WRT the memory allocation on GPU zero, that might be an artifact of how CUDA & ORT work; I think the primary GPU tends to end up with some driver & code related stuff in general, but someone with more CUDA expertise might be able to help there.
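The buffering fix described in this comment could look roughly like the sketch below (class and method names are hypothetical, not the actual onnxruntime Java source): options are only collected on the Java side, and a single aggregated set is handed to the native layer when the options object is attached to a `SessionOptions`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: add() only buffers key/value pairs; nothing
// reaches the native layer until the options are attached, at which
// point one update carries device_id AND every other option together.
final class BufferedCudaOptions {
    private final Map<String, String> pending = new LinkedHashMap<>();

    BufferedCudaOptions(int deviceId) {
        pending.put("device_id", String.valueOf(deviceId));
    }

    void add(String key, String value) {
        pending.put(key, value);   // buffered only — no native call here
    }

    // Invoked once (e.g. by addCUDA): the single native update now
    // sees the full, aggregated option set.
    Map<String, String> aggregated() {
        return new LinkedHashMap<>(pending);
    }
}
```

With this shape, the `device_id` from the constructor and later `add` calls such as `cudnn_conv_algo_search` can no longer overwrite each other.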
hashJoe commented on May 3, 2024
I did additional tests regarding the memory allocation problem on gpu 0, where I simply inserted code using the CUDA API before calling anything that is ORT related. The ORT code still allocates on `deviceId=6`, and the problem above is solved: nothing is allocated on gpu 0 anymore. However, the same memory size is still allocated on gpu 6 --> 1545MiB. I would have expected the memory now to be aggregated with what had been allocated on gpu 0, becoming maybe something like ~2000MiB, if it were some CUDA-driver-related allocation.

In other words, I assume there is also a bug where ORT starts allocating on gpu 0 for the main computation before detecting the `deviceId` set by the user?

hashJoe commented on May 3, 2024
Additionally, without the CUDA code: when gpu 0 is free, 545MiB is allocated on it; however, when it is occupied by another process and not much space is left, a smaller portion of memory (~100MiB) is allocated instead.
[java] CUDA & TensorRT options fix (#20549)