-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cranelift: Constant propagate floats #8954
Conversation
fadd
, fsub
, fmul
and fdiv
Subscribe to Label Action
This issue or pull request has been labeled: "cranelift", "isle"
Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Looks good to me, modulo one clarification below.
@@ -280,7 +278,53 @@ | |||
(decl pure u64_bswap64 (u64) u64) | |||
(extern constructor u64_bswap64 u64_bswap64) | |||
|
|||
;; Constant fold bitwise float operations (fneg/fabs/fcopysign) | |||
;; Constant fold float operations | |||
;; TODO: fmin, fmax, fcmp, fma, demote, promote, from and to ops |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fma
, and any other relaxed SIMD opcode that is non-deterministic, is a little bit tricky. We would want to double check the required Wasm semantics, and whether the non-determinism is allowed to "change" throughout the program or not. If it is spec'd to be non-deterministic, but always has the same behavior for a given evaluation of the N non-deterministic choices, then const folding could be problematic if we are compiling on an aarch64 machine but running on an x64 machine, for example. It would essentially be observable to the program whether the constant folding was happening or not, and I'm not sure if that is allowed by the spec.
So we should answer this question, and then document the answer here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is spec'd to be non-deterministic, but always has the same behavior for a given evaluation of the N non-deterministic choices, then const folding could be problematic if we are compiling on an aarch64 machine but running on an x64 machine, for example.
IMHO it would be problematic either way. It would break reproducibility of the compiled executable/.cwasm
file across compiler host architectures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I tried to answer this question for myself and didn't come away with anything firm.
Filed WebAssembly/relaxed-simd#155 to ask for clarification.
I think this TODO
comment can just add a note about the potential gotcha with non-deterministic instructions and link to that issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we may already be able to observe const propagation with this PR already. At least on RISC-V any NaN producing operation is guaranteed to produce the canonical NaN, but I think some other platforms preserve the payload bits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least on RISC-V any NaN producing operation is guaranteed to produce the canonical NaN, but I think some other platforms preserve the payload bits.
NaN
s don't have any guarantee around having the same behavior across instructions, AFAIK. But the relaxed SIMD instructions might, or at least I'm not sure, which is why I filed that issue.
It would break reproducibility of the compiled executable/
.cwasm
file across compiler host architectures.
I guess this is a philosophical choice. It isn't clear to me whether we want to provide the weaker (deterministic cross compilation given the same host architecture) or the stronger (deterministic cross compilation regardless of host architecture) guarantee.
If the latter, then we can't do compile-time evaluation of any floating point operation that could involve NaN
s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read the docs as fma
in Cranelift being deterministic with regards to rounding. However I don’t know if this is actually honoured by codegen, since properly emulating fma
’s rounding might be too expensive. If it turns out that fma
is (either set or list) non-deterministic (ignoring propagation of NaNs), then the documentation in InstructionBuilder
has to be updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the latter, then we can't do compile-time evaluation of any floating point operation that could involve NaNs.
You could still attempt evaluation and bail out optimizing if the output turns out to be NaN.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's true. We could make all these external constructors partial
and have them return None
if the operation is given or produces a NaN
.
Reading the WebAssembly spec with regards to NaNs[1][2] here are some remarks and questions. When all the NaNs in the input are canonical (NaNcanon), then the output is canonical as well.[1] So there is no non-determinism in this case.[3] If at least one of the inputs is non-canonical, the output is non-deterministic, but must be an arithmetic NaN (which of course can be a NaNcanon).[2] I don’t know if this is actually handled correctly by Cranelift, because always propagating NaNs is incorrect, since non-arithmetic NaNs must be turned into arithmetic ones. Unless it is guaranteed that In Cranelift the functions used by the constant propagation of float in this PR (as well as #8625) convert the To avoid const prop leading to different results than before @bjorn3 mentioned that well could just ignore const prop for all instructions involving NaNs. Actually you could still include operations that involve only NaNcanon, since there propagation and canonicalisation would be the same. Except there still remains the problem with the sign.[3] Personally, I would prefer to do the constant propagation in all cases (and just pick one spec compliant way of dealing with NaNs), since programs should never assume platform specific behaviour (list non-determinism) for instructions that are defined to be set non-deterministic. I except that constant NaNs (and especially non NaNcanon) will almost never occur in practice, so my reason is just the avoidance of to much special casing. [1] https://webassembly.github.io/spec/core/exec/numerics.html#nan-propagation [2] https://webassembly.github.io/spec/core/syntax/values.html#syntax-float [3] Not quite: The sign is not part of the NaNcanon, so there are effectively two NaNcanon: +NaNcanon and −NaNcanon. The sign of the output is non-deterministic for all operations except |
I would be more comfortable with aborting the compile-time evaluation if either given or producing a FWIW, it seems like compile-time evaluating relaxed SIMD instructions like |
I’ve updated the PR to always check whether the result is NaN and don’t do constant folding in that case. I also changed Regarding |
Yes, my b I was thinking of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good modulo a rebase to address conflicts with main
and a couple tiny nitpicks. Thanks!!
} | ||
|
||
fn f64_sqrt(&mut self, n: Ieee64) -> Option<Ieee64> { | ||
Some(n.sqrt()).filter(|r| !r.is_nan()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is worth worth defining helpers like
impl Ieee{32,64} {
pub(crate) fn non_nan(self) -> Option<Self> {
Some(self).filter(|f| !f.is_nan())
}
}
to clean up some of this repetition.
;; Note: With the exception of fabs, fneg and copysign, | ||
;; constant folding is only performed when | ||
;; the result of an instruction isn't NaN. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you follow this with a sentence saying why that is? Something like
We want the NaN bit patterns produced by an instruction to be consistent, and compile-time evaluation in a cross-compilation scenario risks producing different NaN bit patterns than the target would have at run-time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect! Thanks again!
Those methods are stable since Rust version 1.77.0
Adds constant propagation of the following instructions for
$F32
and$F64
:fadd
,fsub
,fmul
,fdiv
,sqrt
,ceil
,floor
,trunc
,nearest
.fmin
andfmax
are still missing. Those are tricky due to their handling and propagation of NaNs: cranelift_codegen::ir::InstBuilder::fmin