Cranelift: Constant propagate floats #8954

primoly · 2024-07-13T20:16:36Z

Adds constant propagation of the following instructions for $F32 and $F64:
fadd, fsub, fmul, fdiv, sqrt, ceil, floor, trunc, nearest.

fmin and fmax are still missing. Those are tricky due to their handling and propagation of NaNs: cranelift_codegen::ir::InstBuilder::fmin
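To illustrate why fmin needs more care than a plain comparison, here is a minimal sketch of a fold on Rust `f32` values that declines to fold whenever a NaN is involved. The name `fold_fmin` is hypothetical, and ordering -0.0 below +0.0 is an assumption based on WebAssembly's `fmin` semantics; this is not the code from the PR.

```rust
/// Hypothetical fold for `fmin` on two f32 constants (illustration only).
/// Returns `None` when either operand is NaN, so no NaN bit pattern is
/// ever chosen at compile time.
fn fold_fmin(a: f32, b: f32) -> Option<f32> {
    if a.is_nan() || b.is_nan() {
        return None;
    }
    if a == b {
        // Equal non-NaN values are either bit-identical or zeros of
        // differing sign. OR-ing the bits keeps the sign bit if either
        // operand has it, so -0.0 wins over +0.0 (assumed Wasm behaviour).
        return Some(f32::from_bits(a.to_bits() | b.to_bits()));
    }
    Some(if a < b { a } else { b })
}
```

Rust's own `f32::min` returns the non-NaN operand when only one input is NaN, which is one reason it cannot be used directly for an instruction that must produce NaN in that case.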

github-actions · 2024-07-13T22:44:39Z

Subscribe to Label Action

cc @cfallin, @fitzgen

This issue or pull request has been labeled: "cranelift", "isle"

Thus the following users have been cc'd because of the following labels:

cfallin: isle
fitzgen: isle

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.


fitzgen

Thanks! Looks good to me, modulo one clarification below.

fitzgen · 2024-07-15T19:28:33Z

cranelift/codegen/src/opts/cprop.isle

@@ -280,7 +278,53 @@
 (decl pure u64_bswap64 (u64) u64)
 (extern constructor u64_bswap64 u64_bswap64)

-;; Constant fold bitwise float operations (fneg/fabs/fcopysign)
+;; Constant fold float operations
+;; TODO: fmin, fmax, fcmp, fma, demote, promote, from and to ops


fma, and any other relaxed SIMD opcode that is non-deterministic, is a little bit tricky. We would want to double check the required Wasm semantics, and whether the non-determinism is allowed to "change" throughout the program or not. If it is spec'd to be non-deterministic, but always has the same behavior for a given evaluation of the N non-deterministic choices, then const folding could be problematic if we are compiling on an aarch64 machine but running on an x64 machine, for example. It would essentially be observable to the program whether the constant folding was happening or not, and I'm not sure if that is allowed by the spec.

So we should answer this question, and then document the answer here.

bjorn3 · Jul 15, 2024

> If it is spec'd to be non-deterministic, but always has the same behavior for a given evaluation of the N non-deterministic choices, then const folding could be problematic if we are compiling on an aarch64 machine but running on an x64 machine, for example.

IMHO it would be problematic either way. It would break reproducibility of the compiled executable/.cwasm file across compiler host architectures.

fitzgen · Jul 15, 2024

FWIW, I tried to answer this question for myself and didn't come away with anything firm.

Filed WebAssembly/relaxed-simd#155 to ask for clarification.

I think this TODO comment can just add a note about the potential gotcha with non-deterministic instructions and link to that issue.

afonso360 · Jul 15, 2024

I think we may already be able to observe const propagation with this PR. At least on RISC-V any NaN producing operation is guaranteed to produce the canonical NaN, but I think some other platforms preserve the payload bits.

fitzgen · Jul 15, 2024

> At least on RISC-V any NaN producing operation is guaranteed to produce the canonical NaN, but I think some other platforms preserve the payload bits.

NaNs don't have any guarantee around having the same behavior across instructions, AFAIK. But the relaxed SIMD instructions might, or at least I'm not sure, which is why I filed that issue.

> It would break reproducibility of the compiled executable/.cwasm file across compiler host architectures.

I guess this is a philosophical choice. It isn't clear to me whether we want to provide the weaker (deterministic cross compilation given the same host architecture) or the stronger (deterministic cross compilation regardless of host architecture) guarantee.

If the latter, then we can't do compile-time evaluation of any floating point operation that could involve NaNs.

primoly · Jul 15, 2024

I read the docs as saying that fma in Cranelift is deterministic with regard to rounding. However, I don’t know if this is actually honoured by codegen, since properly emulating fma’s rounding might be too expensive. If it turns out that fma is (either set or list) non-deterministic (ignoring propagation of NaNs), then the documentation in InstBuilder has to be updated.

bjorn3 · Jul 15, 2024

> If the latter, then we can't do compile-time evaluation of any floating point operation that could involve NaNs.

You could still attempt evaluation and bail out of the optimization if the output turns out to be NaN.

fitzgen · Jul 15, 2024

Yeah, that's true. We could make all these external constructors partial and have them return None if the operation is given or produces a NaN.
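A minimal sketch of what such a partial constructor could look like, written against plain Rust `f32` rather than Cranelift's `Ieee32`. The name `fold_fadd` is hypothetical and this is not the PR's implementation; it only shows the "return None if given or producing a NaN" shape.

```rust
/// Hypothetical partial fold for `fadd` on f32 constants.
/// Returns `None` when either input or the result is NaN, so the caller
/// leaves the original instruction in place instead of folding it.
fn fold_fadd(a: f32, b: f32) -> Option<f32> {
    if a.is_nan() || b.is_nan() {
        return None;
    }
    let r = a + b;
    // Non-NaN inputs can still produce NaN, e.g. inf + (-inf).
    if r.is_nan() {
        None
    } else {
        Some(r)
    }
}
```

With this shape, an ISLE rule that calls the constructor simply fails to match when `None` comes back, and the instruction is evaluated at run time on the target.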

primoly · 2024-07-17T09:17:14Z

Reading the WebAssembly spec with regards to NaNs^[1][2] here are some remarks and questions.

When all the NaNs in the input are canonical (NaN_canon), then the output is canonical as well.^[1] So there is no non-determinism in this case.^[3] If at least one of the inputs is non-canonical, the output is non-deterministic, but must be an arithmetic NaN (which of course can be a NaN_canon).^[2] I don’t know if this is actually handled correctly by Cranelift, because always propagating NaNs is incorrect, since non-arithmetic NaNs must be turned into arithmetic ones. Unless it is guaranteed that Ieee32 and Ieee64 NaNs are always arithmetic, but I don’t think this is the case. (opened #8967)

In Cranelift the functions used by the constant propagation of float in this PR (as well as #8625) convert the Ieee32/Ieee64 to Rust f32/f64, perform the operation and then convert the result back to Ieee32/Ieee64 by reinterpreting their bits. So the question: What happens to different kinds of NaNs during both calculation and conversions? Do they propagate, are they canonicalised or is this non-deterministic?
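The convert, compute, convert-back pattern described above can be sketched as follows. `Bits32` is a hypothetical stand-in for `Ieee32` (it only stores the raw bits), and `fold_fmul` is an illustrative name, not a function from the PR.

```rust
/// Hypothetical stand-in for Cranelift's `Ieee32`: just the raw
/// IEEE 754 binary32 bit pattern.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Bits32(u32);

impl Bits32 {
    /// Reinterpret the stored bits as a Rust `f32`.
    fn as_f32(self) -> f32 {
        f32::from_bits(self.0)
    }

    /// Reinterpret a Rust `f32` as raw bits.
    fn from_f32(x: f32) -> Self {
        Bits32(x.to_bits())
    }
}

/// Fold a multiplication by going through the host's `f32` arithmetic.
fn fold_fmul(a: Bits32, b: Bits32) -> Bits32 {
    Bits32::from_f32(a.as_f32() * b.as_f32())
}
```

The two conversions are plain bit reinterpretations; the step where NaN sign and payload become host-dependent is the arithmetic in between, which is why the question about NaN handling matters for cross-compilation.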

To avoid const prop leading to different results than before, @bjorn3 mentioned that we could just ignore const prop for all instructions involving NaNs. Actually, you could still include operations that involve only NaN_canon, since there propagation and canonicalisation would be the same. Except there still remains the problem with the sign.^[3]

Personally, I would prefer to do the constant propagation in all cases (and just pick one spec-compliant way of dealing with NaNs), since programs should never assume platform-specific behaviour (list non-determinism) for instructions that are defined to be set non-deterministic. I expect that constant NaNs (and especially non-NaN_canon ones) will almost never occur in practice, so my reason is just the avoidance of too much special casing.

[1] https://webassembly.github.io/spec/core/exec/numerics.html#nan-propagation

[2] https://webassembly.github.io/spec/core/syntax/values.html#syntax-float

[3] Not quite: The sign is not part of the NaN_canon, so there are effectively two NaN_canon: +NaN_canon and −NaN_canon. The sign of the output is non-deterministic for all operations except fneg, fabs and fcopysign.
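The NaN categories discussed in this comment can be made concrete on binary32 bit patterns. The sketch below assumes the spec's definitions as summarised above: a canonical NaN has only the most significant payload bit set (with either sign), and an arithmetic NaN has at least that bit set. All names are illustrative.

```rust
const F32_EXP_MASK: u32 = 0x7F80_0000; // all exponent bits
const F32_PAYLOAD_MASK: u32 = 0x007F_FFFF; // all mantissa (payload) bits
const F32_CANON_PAYLOAD: u32 = 0x0040_0000; // most significant payload bit

/// NaN: exponent all ones and a non-zero payload.
fn is_nan_bits(bits: u32) -> bool {
    (bits & F32_EXP_MASK) == F32_EXP_MASK && (bits & F32_PAYLOAD_MASK) != 0
}

/// Canonical NaN: payload is exactly the top payload bit; sign is ignored,
/// so both +NaN_canon and -NaN_canon qualify.
fn is_canonical_nan(bits: u32) -> bool {
    is_nan_bits(bits) && (bits & F32_PAYLOAD_MASK) == F32_CANON_PAYLOAD
}

/// Arithmetic NaN: the top payload bit is set; lower bits are arbitrary.
fn is_arithmetic_nan(bits: u32) -> bool {
    is_nan_bits(bits) && (bits & F32_CANON_PAYLOAD) != 0
}
```

Every canonical NaN is also arithmetic under these definitions, while a pattern such as `0x7F80_0001` is a NaN that is neither.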

fitzgen · 2024-07-17T16:24:50Z

I would be more comfortable with aborting the compile-time evaluation if either given or producing a NaN. It is more conservative, and we can always have follow-up discussions/PRs focused on just that issue. I also suspect there isn't too much additional performance to gain w.r.t. NaNs.

FWIW, it seems like compile-time evaluating relaxed SIMD instructions like fma is probably a no-go: WebAssembly/relaxed-simd#155 (comment)

primoly · 2024-07-18T18:17:39Z

I’ve updated the PR to always check whether the result is NaN and to skip constant folding in that case. I also changed fneg, fabs and fcopysign to not convert to Rust f32/f64 but to operate on the bits directly, since otherwise NaNs could have their payload and even sign changed. I also added fmin and fmax folding.
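Operating on the bits directly means these three operations only ever touch the sign bit, so a NaN payload passes through unchanged. A minimal sketch for binary32, with hypothetical function names (the PR's actual helpers live on `Ieee32`/`Ieee64`):

```rust
const SIGN: u32 = 0x8000_0000; // sign bit of an IEEE 754 binary32 value

/// Flip the sign bit; every other bit, including a NaN payload, is kept.
fn fneg_bits(x: u32) -> u32 {
    x ^ SIGN
}

/// Clear the sign bit.
fn fabs_bits(x: u32) -> u32 {
    x & !SIGN
}

/// Take the magnitude bits of `x` and the sign bit of `y`.
fn fcopysign_bits(x: u32, y: u32) -> u32 {
    (x & !SIGN) | (y & SIGN)
}
```

Because no floating-point arithmetic is involved, the result is identical on every host, which is why these operations can be folded even for NaN inputs.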

Regarding fma: I still believe that it is a deterministic instruction when not involving NaNs, but since this PR doesn’t include it, this is something we can discuss in the future.

fitzgen · 2024-07-18T18:24:17Z

> Regarding fma: I still believe that it is a deterministic instruction when not involving NaNs, but since this PR doesn’t include it, this is something we can discuss in the future.

Yes, my bad, I was thinking of the relaxed_madd Wasm instruction.

fitzgen

Looks good modulo a rebase to address conflicts with main and a couple tiny nitpicks. Thanks!!

fitzgen · 2024-07-18T18:31:13Z

cranelift/codegen/src/isle_prelude.rs

+        }
+
+        fn f64_sqrt(&mut self, n: Ieee64) -> Option<Ieee64> {
+            Some(n.sqrt()).filter(|r| !r.is_nan())


I think it is worth defining helpers like

impl Ieee{32,64} {
    pub(crate) fn non_nan(self) -> Option<Self> {
        Some(self).filter(|f| !f.is_nan())
    }
}

to clean up some of this repetition.
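As a self-contained illustration of the suggested helper, the sketch below uses a hypothetical newtype `F64Const` in place of `Ieee64`, so it compiles outside Cranelift. Only the `non_nan` body is taken from the suggestion above; everything else is assumed for the example.

```rust
/// Hypothetical stand-in for `Ieee64`, wrapping a Rust `f64`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct F64Const(f64);

impl F64Const {
    fn is_nan(self) -> bool {
        self.0.is_nan()
    }

    fn sqrt(self) -> Self {
        F64Const(self.0.sqrt())
    }

    /// `Some(self)` unless the value is NaN.
    fn non_nan(self) -> Option<Self> {
        Some(self).filter(|f| !f.is_nan())
    }
}

/// With the helper, each fold collapses to a single method chain.
fn fold_sqrt(n: F64Const) -> Option<F64Const> {
    n.sqrt().non_nan()
}
```

Compared with repeating `Some(...).filter(|r| !r.is_nan())` in every constructor, the helper keeps the NaN policy in one place.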

fitzgen · 2024-07-18T19:10:43Z

cranelift/codegen/src/opts/cprop.isle

+;; Note: With the exception of fabs, fneg and copysign,
+;; constant folding is only performed when
+;; the result of an instruction isn't NaN.


Can you follow this with a sentence saying why that is? Something like

We want the NaN bit patterns produced by an instruction to be consistent, and compile-time evaluation in a cross-compilation scenario risks producing different NaN bit patterns than the target would have at run-time.

fitzgen

Perfect! Thanks again!

const propagate fadd, fsub, fmul, fdiv

a8d038c

primoly requested a review from a team as a code owner July 13, 2024 20:16

primoly requested review from abrown and removed request for a team July 13, 2024 20:16

add sqrt, ceil, floor, trunc, nearest

c44661c

primoly changed the title ~~Constant propagate fadd, fsub, fmul and fdiv~~ Cranelift: Constant propagate floats Jul 13, 2024

github-actions bot added the labels cranelift (Issues related to the Cranelift code generator) and isle (Related to the ISLE domain-specific language) Jul 13, 2024

todo

f4c16ca

fitzgen reviewed Jul 15, 2024

alexcrichton mentioned this pull request Jul 17, 2024

Does Cranelift honour the requirement of WebAssembly that NaN results of most float ops must be arithmetic? #8967

Closed

primoly added 2 commits July 18, 2024 00:49

bail if result is NaN

75f099f

add fmin, fmax

001c0ef

fitzgen reviewed Jul 18, 2024

primoly added 2 commits July 18, 2024 22:02

non_nan helper methods

5d87e65

explain why no const folding of NaNs

45d6b26

fitzgen approved these changes Jul 18, 2024

primoly added 2 commits July 18, 2024 22:33

Merge remote-tracking branch 'upstream/main' into cprop-float

da4bf97

use f32/f64 round_ties_even methods

f5c4c0b

Those methods are stable since Rust version 1.77.0
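For reference, a minimal sketch of how `round_ties_even` could back a fold for `nearest`, which rounds halfway cases to the even integer. The function name `fold_nearest` is hypothetical, and the NaN filter mirrors the policy adopted in this PR rather than reproducing its code. It needs Rust 1.77.0 or later.

```rust
/// Hypothetical fold for `nearest` on an f32 constant.
/// `round_ties_even` rounds halfway cases to the nearest even integer,
/// unlike `round`, which rounds them away from zero.
fn fold_nearest(x: f32) -> Option<f32> {
    Some(x.round_ties_even()).filter(|r| !r.is_nan())
}
```

For example, `fold_nearest(2.5)` yields `Some(2.0)`, whereas `2.5f32.round()` is `3.0`.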

fitzgen approved these changes Jul 19, 2024

fitzgen added this pull request to the merge queue Jul 19, 2024

Merged via the queue into bytecodealliance:main with commit 542af68 Jul 19, 2024
37 checks passed
