-
-
Notifications
You must be signed in to change notification settings - Fork 55.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add RISC-V HAL implementation for meanStdDev #26624
Conversation
Interesting implementation! Have you considered how this performs on various datatypes? |
Hi @kevinl03 , so far we have only implemented SIMD optimizations for 8UC1, 8UC4 and 32FC1 types, and it is possible to implemente for other types using similar methods, as is done with Universal Intrinsic: https://github.com/opencv/opencv/blob/4.x/modules/core/src/mean.simd.hpp However, we also note that in the performance test, only the above three types are included. We guess it's because they're the most commonly used type. And I'm not sure that implementing SIMD optimizations for other types would be as beneficial in practice. Anyhow, we can add the performance test cases and implement RVV-specific optimizations for other types in |
I believe we don't need these for now. |
Add RISC-V HAL implementation for meanStdDev opencv#26624 `meanStdDev` benefits from the Universal Intrinsic backend of RVV, but we also found that the performance on the `8UC4` type is worse than the scalar version when there is a mask, and there is no optimization implementation on `32FC1`. This patch implements `meanStdDev` function in RVV_HAL using native intrinsic, significantly optimizing the performance for `8UC1`, `8UC4` and `32FC1`. This patch is tested on BPI-F3 for both gcc 14.2 and clang 19.1. ``` $ opencv_test_core --gtest_filter="*MeanStdDev*" $ opencv_perf_core --gtest_filter="Size_MatType_meanStdDev* ``` ![1734077611879](https://github.com/user-attachments/assets/71c85c9d-1db1-470d-81d1-bf546e27ad86) ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [ ] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake
meanStdDev
benefits from the Universal Intrinsic backend of RVV, but we also found that the performance on the8UC4
type is worse than the scalar version when there is a mask, and there is no optimization implementation on32FC1
.This patch implements
meanStdDev
function in RVV_HAL using native intrinsic, significantly optimizing the performance for8UC1
,8UC4
and32FC1
.This patch is tested on BPI-F3 for both gcc 14.2 and clang 19.1.
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.