
[CUDA] Build nhwc ops by default #22648

Merged: tianleiwu merged 3 commits into main from tlwu/build_cuda_nhwc_ops_by_default on Nov 6, 2024

Conversation

tianleiwu (Contributor) commented on Oct 29, 2024

Description

  • Build CUDA NHWC ops by default.
  • Deprecate --enable_cuda_nhwc_ops in build.py and add a --disable_cuda_nhwc_ops option.

Note that this requires cuDNN 9.x. If you build with cuDNN 8, the NHWC ops are disabled automatically.
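As a point of reference, a build that opts out of the new default might look like the following. This is a minimal sketch, not taken from the PR: the CUDA and cuDNN paths are placeholders, and only --disable_cuda_nhwc_ops is introduced by this change.

```bash
# Hypothetical Linux build invocation; --cuda_home/--cudnn_home paths are placeholders.
./build.sh --config Release --build_wheel --parallel \
  --use_cuda --cuda_home /usr/local/cuda --cudnn_home /usr/local/cuda \
  --disable_cuda_nhwc_ops
```

Without --disable_cuda_nhwc_ops, the same command now compiles the NHWC kernels by default (given cuDNN 9.x).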

Motivation and Context

In general, NHWC convolution is faster than NCHW on NVIDIA GPUs with Tensor Cores, so this can improve performance for vision models.

This is the first step toward preferring NHWC for CUDA in the 1.21 release. The next step is to benchmark popular vision models; if NHWC helps on most models and devices, prefer_nhwc=1 will be made the default CUDA provider option.
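For context, here is a minimal Python sketch (not part of this PR) of how the NHWC preference can already be requested per session through the CUDA execution provider options; "model.onnx" is a placeholder path.

```python
# Minimal sketch: request NHWC layout from the CUDA execution provider.
# "model.onnx" is a placeholder; prefer_nhwc only has an effect in builds
# that include the NHWC ops (cuDNN 9.x, NHWC ops not disabled at build time).
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",
    providers=[("CUDAExecutionProvider", {"prefer_nhwc": "1"})],
)
print(session.get_providers())  # confirms CUDAExecutionProvider was registered
```

Making prefer_nhwc=1 the default would remove the need to pass this option explicitly.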

tianleiwu requested a review from a team as a code owner on October 29, 2024, 22:15
tianleiwu marked this pull request as draft on October 29, 2024, 22:15
tianleiwu marked this pull request as ready for review on October 31, 2024, 02:39
tianleiwu changed the title from "Build cuda nhwc ops by default" to "[CUDA] Build nhwc ops by default" on Oct 31, 2024
tianleiwu requested reviews from jywu-msft and snnn on October 31, 2024, 17:54
jywu-msft requested a review from chilo-ms on November 5, 2024, 18:08
tianleiwu merged commit 72186bb into main on Nov 6, 2024
90 of 91 checks passed
tianleiwu deleted the tlwu/build_cuda_nhwc_ops_by_default branch on November 6, 2024, 17:54
ishwar-raut1 pushed a commit to ishwar-raut1/onnxruntime that referenced this pull request Nov 19, 2024
guschmue pushed a commit that referenced this pull request Dec 2, 2024
ankitm3k pushed a commit to intel/onnxruntime that referenced this pull request Dec 11, 2024
ankitm3k pushed a commit to intel/onnxruntime that referenced this pull request Dec 11, 2024
ankitm3k pushed a commit to intel/onnxruntime that referenced this pull request Dec 11, 2024