
[js/webgpu] Optimize transpose #21964

Merged · 2 commits merged into microsoft:main on Sep 4, 2024
Conversation

@qjia7 (Contributor) commented Sep 3, 2024

Description

Fixes bugs in the previous implementation and adds more situations that take the optimized path.

The following situations now take the optimized path:

  1. 2D inputs, or inputs that can be squeezed to 2D
  2. Channels-last or channels-first transpose. For example, a channels-last transpose [1, 256, 512, 512] -> [1, 512, 512, 256] becomes the 2D transpose [256, 512x512] -> [512x512, 256].
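The reduction in case 2 can be checked with a small sketch. This is a hypothetical illustration, not the actual ORT WebGPU kernel (which runs as a WGSL compute shader); it shows that a channels-last permutation [0, 2, 3, 1] on an NCHW tensor with N = 1 produces exactly the same flat data as a plain 2D transpose of the data viewed as [C, H*W]:

```typescript
// Naive 4D permute of row-major data; perm maps each output axis to an input axis.
function permute4d(data: number[], dims: number[], perm: number[]): number[] {
  const outDims = perm.map((p) => dims[p]);
  const inStrides = [dims[1] * dims[2] * dims[3], dims[2] * dims[3], dims[3], 1];
  const out = new Array<number>(data.length);
  let o = 0;
  for (let a = 0; a < outDims[0]; a++)
    for (let b = 0; b < outDims[1]; b++)
      for (let c = 0; c < outDims[2]; c++)
        for (let d = 0; d < outDims[3]; d++) {
          const outIdx = [a, b, c, d];
          let inOffset = 0;
          // Output coordinate on axis i is the coordinate on input axis perm[i].
          for (let axis = 0; axis < 4; axis++) inOffset += outIdx[axis] * inStrides[perm[axis]];
          out[o++] = data[inOffset];
        }
  return out;
}

// Plain 2D transpose of data viewed as a [rows, cols] matrix.
function transpose2d(data: number[], rows: number, cols: number): number[] {
  const out = new Array<number>(data.length);
  for (let r = 0; r < rows; r++)
    for (let c = 0; c < cols; c++) out[c * rows + r] = data[r * cols + c];
  return out;
}

const dims = [1, 3, 2, 2]; // small stand-in for [1, 256, 512, 512]
const data = Array.from({ length: 12 }, (_, i) => i);
const viaPermute = permute4d(data, dims, [0, 2, 3, 1]); // NCHW -> NHWC
const via2d = transpose2d(data, 3, 4); // [C, H*W] -> [H*W, C]
console.log(JSON.stringify(viaPermute) === JSON.stringify(via2d)); // true
```

Because the two results are byte-identical, the kernel can dispatch the well-tiled 2D transpose path instead of the generic N-D permutation.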

Motivation and Context

For the SD Turbo demo, the total transpose time drops from 122.09 ms to 39.98 ms, and the corresponding share of total time drops from 11.05% to 3.89%.

This PR also helps #21618: the total transpose time in that demo drops from 70.25 ms to 17.32 ms on my iGPU.

@qjia7 (Contributor, Author) commented Sep 3, 2024

@guschmue @fs-eire @satyajandhyala Please take a look, thanks.

@fs-eire (Contributor) commented Sep 3, 2024

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline

@fs-eire (Contributor) commented Sep 3, 2024

/azp run Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline

@fs-eire (Contributor) commented Sep 3, 2024

/azp run Big Models,Linux Android Emulator QNN CI Pipeline,Android CI Pipeline,iOS CI Pipeline,ONNX Runtime React Native CI Pipeline

Azure Pipelines: successfully started running 1 pipeline(s). (2 similar comments)

@guschmue guschmue added the ep:WebGPU ort-web webgpu provider label Sep 3, 2024
@satyajandhyala (Contributor)

Can we add test cases that exercise the newly added code, if they do not already exist?

@guschmue previously approved these changes Sep 3, 2024
@qjia7 (Contributor, Author) commented Sep 4, 2024

> Can we add test cases that exercise the newly added code, if they do not already exist?

Done. Please take another look, thanks.

@fs-eire (Contributor) commented Sep 4, 2024

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline

@fs-eire (Contributor) commented Sep 4, 2024

/azp run Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline

@fs-eire (Contributor) commented Sep 4, 2024

/azp run Big Models,Linux Android Emulator QNN CI Pipeline,Android CI Pipeline,iOS CI Pipeline,ONNX Runtime React Native CI Pipeline

Azure Pipelines: successfully started running 1 pipeline(s). (2 similar comments)

@fs-eire fs-eire merged commit a80bfed into microsoft:main Sep 4, 2024
50 of 53 checks passed
@qjia7 qjia7 deleted the opt_transpose branch November 18, 2024 07:10
Labels
ep:WebGPU ort-web webgpu provider
4 participants