-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8320998: RISC-V: C2 RoundDoubleModeV #21164
base: master
Are you sure you want to change the base?
Conversation
👋 Welcome back dzhang! A progress list of the required criteria for merging this PR into |
❗ This change is not yet ready to be integrated. |
@DingliZhang The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
csrwi(CSR_FRM, C2_MacroAssembler::rdn); | ||
break; | ||
case RoundDoubleModeNode::rmode_rint: | ||
csrwi(CSR_FRM, C2_MacroAssembler::rne); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to set the CSR here as FRM
has been set to Round to Nearest mode when enter Java code. Check JDK-8330094 for more details. And if you set FRM
to some other rounding modes, you will need to restore it to Round to Nearest mode after processing. But the problem is that modifying CSR on RISC-V is very costly. Guess that's one of the reasons why the JMH result is not obvious.
Hi @RealFYang Thanks for review! I ran two sets of tests on the k1, the first for adding the judgment at the end of diff --git a/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp
index 402624a825e..50dbd5e3232 100644
--- a/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp
+++ b/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp
@@ -3116,7 +3116,6 @@ void C2_MacroAssembler::round_double_mode_v(VectorRegister dst, BasicType dst_bt
csrwi(CSR_FRM, C2_MacroAssembler::rdn);
break;
case RoundDoubleModeNode::rmode_rint:
- csrwi(CSR_FRM, C2_MacroAssembler::rne);
break;
default:
ShouldNotReachHere();
@@ -3145,4 +3144,8 @@ void C2_MacroAssembler::round_double_mode_v(VectorRegister dst, BasicType dst_bt
// If got conversion overflow return src
vmerge_vvm(dst, src, dst);
+
+ if (round_mode != RoundDoubleModeNode::rmode_rint) {
+ csrwi(CSR_FRM, C2_MacroAssembler::rne);
+ }
} Result:
The second set is to add predicate to enable intrinsic commands only when rounding mode is rint, e.g.: diff --git a/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp
index 402624a825e..94f0f8e49f3 100644
--- a/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp
+++ b/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp
@@ -3108,19 +3108,7 @@ void C2_MacroAssembler::round_double_mode_v(VectorRegister dst, BasicType dst_bt
// Here we use similar modes to double->long and long->double conversions
// Different mode for long->double conversion matter only if long value was not representable as double,
// we got long value as a result of double->long conversion so, it is definitely representable
- switch (round_mode) {
- case RoundDoubleModeNode::rmode_ceil:
- csrwi(CSR_FRM, C2_MacroAssembler::rup);
- break;
- case RoundDoubleModeNode::rmode_floor:
- csrwi(CSR_FRM, C2_MacroAssembler::rdn);
- break;
- case RoundDoubleModeNode::rmode_rint:
- csrwi(CSR_FRM, C2_MacroAssembler::rne);
- break;
- default:
- ShouldNotReachHere();
- }
+ assert(round_mode == 0, "rounding mode should be rint");
// Conversion from double to long
vfcvt_x_f_v(dst, src);
diff --git a/src/hotspot/cpu/riscv/riscv_v.ad b/src/hotspot/cpu/riscv/riscv_v.ad
index 78acc55aabb..1da0f080fef 100644
--- a/src/hotspot/cpu/riscv/riscv_v.ad
+++ b/src/hotspot/cpu/riscv/riscv_v.ad
@@ -4746,7 +4746,8 @@ instruct vround_d(vReg dst, vReg src, fRegD tmp, vRegMask_V0 v0) %{
// ------------------------------ RoundDouble ----------------------------------
instruct vroundD(vReg dst, vReg src, immI rmode, vRegMask_V0 v0, vReg tmp1, vReg tmp2) %{
- predicate(n->bottom_type()->is_vect()->element_basic_type() == T_DOUBLE);
+ predicate(n->bottom_type()->is_vect()->element_basic_type() == T_DOUBLE &&
+ n->in(2)->get_int() == 0);
match(Set dst (RoundDoubleModeV src rmode));
ins_cost(VEC_COST);
effect(TEMP_DEF dst, TEMP tmp1, TEMP tmp2, TEMP v0); Result:
We can see that both methods cause performance degradation when rounding mode is ceil and floor. So I'm turning this PR into a draft for further research, perhaps a 512bit development board is needed to verify if this method of adding this intrinsic instruction is feasible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, just a general question about the test, in #17745, some tests are added, because the existing tests do not help to catch all the failures. Seems this pr only uses existing tests to verify, not sure if it's sufficient, it's good to have an answer for it.
Thanks for the review! |
Hi all,
This patch will add RoundDoubleModeV intrinsics for riscv64. The vector implementation is similar to the scalar version. Please take a look and have some reviews. Thanks a lot!
Just like #17745, current test shows that, it bring performance gain when vlenb >= 32 (which is on k1), but bring regression when vlenb == 16 (which is on k230). So I only enable the intrinsic when vlenb >= 32.
Please compare the data below, thanks!
Test
Test on k1
test/hotspot/jtreg/compiler/c2/cr6340864/TestDoubleVect.java
test/hotspot/jtreg/compiler/floatingpoint/TestRound.java
test/jdk/java/lang/Math/RoundTests.java
test/micro/org/openjdk/bench/java/math/FpRoundingBenchmark.java
Test on qemu(enable RVV1.0)
test/jdk/jdk/incubator/vector/*
Performance - with Intrinsic
on k1
Benchmark on k1 (+intrinsic)
Benchmark on k1 (-intrinsic)
on k230
Benchmark on k230 (+intrinsic, enable intrinsic even when vlenb == 16)
Benchmark on k230 (-intrinsic, enable intrinsic even when vlenb == 16)
Performance - without Intrinsic
on k1, intrinsic disabled due to -UseSuperWord
Benchmark on k1, -UseSuperWord (+intrinsic)
Benchmark on k1, -UseSuperWord (-intrinsic)
on k230, intrinsic disabled due to -UseSuperWord
Benchmark on k230, -UseSuperWord (+intrinsic)
on k230, intrinsic disabled due to vlenb == 16
Benchmark on k230, +UseSuperWord (+intrinsic)
Progress
Issue
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/21164/head:pull/21164
$ git checkout pull/21164
Update a local copy of the PR:
$ git checkout pull/21164
$ git pull https://git.openjdk.org/jdk.git pull/21164/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 21164
View PR using the GUI difftool:
$ git pr show -t 21164
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/21164.diff
Webrev
Link to Webrev Comment