Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add RISC-V HAL implementation for meanStdDev #26624

Merged
merged 3 commits into from
Dec 18, 2024
Merged

Conversation

hanliutong
Copy link
Contributor

meanStdDev benefits from the Universal Intrinsic backend of RVV, but we also found that the performance on the 8UC4 type is worse than the scalar version when there is a mask, and there is no optimization implementation on 32FC1.

This patch implements meanStdDev function in RVV_HAL using native intrinsic, significantly optimizing the performance for 8UC1, 8UC4 and 32FC1.

This patch is tested on BPI-F3 for both gcc 14.2 and clang 19.1.

$ opencv_test_core --gtest_filter="*MeanStdDev*"
$ opencv_perf_core --gtest_filter="Size_MatType_meanStdDev*

1734077611879

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@kevinl03
Copy link

Interesting implementation! Have you considered how this performs on various datatypes?

@hanliutong
Copy link
Contributor Author

Hi @kevinl03 , so far we have only implemented SIMD optimizations for 8UC1, 8UC4 and 32FC1 types, and it is possible to implemente for other types using similar methods, as is done with Universal Intrinsic: https://github.com/opencv/opencv/blob/4.x/modules/core/src/mean.simd.hpp

However, we also note that in the performance test, only the above three types are included. We guess it's because they're the most commonly used type. And I'm not sure that implementing SIMD optimizations for other types would be as beneficial in practice.

Anyhow, we can add the performance test cases and implement RVV-specific optimizations for other types in meanStdDev if really need to. What do you think? @asmorkalov @mshabunin

@asmorkalov asmorkalov added this to the 4.11.0 milestone Dec 18, 2024
@mshabunin
Copy link
Contributor

Anyhow, we can add the performance test cases and implement RVV-specific optimizations for other types in meanStdDev if really need to

I believe we don't need these for now.

@asmorkalov asmorkalov merged commit 3fbaad3 into opencv:4.x Dec 18, 2024
21 of 31 checks passed
vrabaud pushed a commit to vrabaud/opencv that referenced this pull request Dec 18, 2024
Add RISC-V HAL implementation for meanStdDev opencv#26624

`meanStdDev` benefits from the Universal Intrinsic backend of RVV, but we also found that the performance on the `8UC4` type is worse than the scalar version when there is a mask, and there is no optimization implementation on `32FC1`.

This patch implements `meanStdDev` function in RVV_HAL using native intrinsic, significantly optimizing the performance for `8UC1`, `8UC4` and `32FC1`.

This patch is tested on BPI-F3 for both gcc 14.2 and clang 19.1.
```
$ opencv_test_core --gtest_filter="*MeanStdDev*"
$ opencv_perf_core --gtest_filter="Size_MatType_meanStdDev*
```

![1734077611879](https://github.com/user-attachments/assets/71c85c9d-1db1-470d-81d1-bf546e27ad86)

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants