Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cranelift: Constant propagate floats #8954

Merged
merged 9 commits into from
Jul 19, 2024

Conversation

primoly
Copy link
Contributor

@primoly primoly commented Jul 13, 2024

Adds constant propagation of the following instructions for $F32 and $F64:
fadd, fsub, fmul, fdiv, sqrt, ceil, floor, trunc, nearest.

fmin and fmax are still missing. Those are tricky due to their handling and propagation of NaNs: cranelift_codegen::ir::InstBuilder::fmin

@primoly primoly requested a review from a team as a code owner July 13, 2024 20:16
@primoly primoly requested review from abrown and removed request for a team July 13, 2024 20:16
@primoly primoly changed the title Constant propagate fadd, fsub, fmul and fdiv Cranelift: Constant propagate floats Jul 13, 2024
@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator isle Related to the ISLE domain-specific language labels Jul 13, 2024
Copy link

Subscribe to Label Action

cc @cfallin, @fitzgen

This issue or pull request has been labeled: "cranelift", "isle"

Thus the following users have been cc'd because of the following labels:

  • cfallin: isle
  • fitzgen: isle

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

Copy link
Member

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks good to me, modulo one clarification below.

@@ -280,7 +278,53 @@
(decl pure u64_bswap64 (u64) u64)
(extern constructor u64_bswap64 u64_bswap64)

;; Constant fold bitwise float operations (fneg/fabs/fcopysign)
;; Constant fold float operations
;; TODO: fmin, fmax, fcmp, fma, demote, promote, from and to ops
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fma, and any other relaxed SIMD opcode that is non-deterministic, is a little bit tricky. We would want to double check the required Wasm semantics, and whether the non-determinism is allowed to "change" throughout the program or not. If it is spec'd to be non-deterministic, but always has the same behavior for a given evaluation of the N non-deterministic choices, then const folding could be problematic if we are compiling on an aarch64 machine but running on an x64 machine, for example. It would essentially be observable to the program whether the constant folding was happening or not, and I'm not sure if that is allowed by the spec.

So we should answer this question, and then document the answer here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is spec'd to be non-deterministic, but always has the same behavior for a given evaluation of the N non-deterministic choices, then const folding could be problematic if we are compiling on an aarch64 machine but running on an x64 machine, for example.

IMHO it would be problematic either way. It would break reproducibility of the compiled executable/.cwasm file across compiler host architectures.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I tried to answer this question for myself and didn't come away with anything firm.

Filed WebAssembly/relaxed-simd#155 to ask for clarification.

I think this TODO comment can just add a note about the potential gotcha with non-deterministic instructions and link to that issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may already be able to observe const propagation with this PR already. At least on RISC-V any NaN producing operation is guaranteed to produce the canonical NaN, but I think some other platforms preserve the payload bits.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least on RISC-V any NaN producing operation is guaranteed to produce the canonical NaN, but I think some other platforms preserve the payload bits.

NaNs don't have any guarantee around having the same behavior across instructions, AFAIK. But the relaxed SIMD instructions might, or at least I'm not sure, which is why I filed that issue.

It would break reproducibility of the compiled executable/.cwasm file across compiler host architectures.

I guess this is a philosophical choice. It isn't clear to me whether we want to provide the weaker (deterministic cross compilation given the same host architecture) or the stronger (deterministic cross compilation regardless of host architecture) guarantee.

If the latter, then we can't do compile-time evaluation of any floating point operation that could involve NaNs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read the docs as fma in Cranelift being deterministic with regards to rounding. However I don’t know if this is actually honoured by codegen, since properly emulating fma’s rounding might be too expensive. If it turns out that fma is (either set or list) non-deterministic (ignoring propagation of NaNs), then the documentation in InstructionBuilder has to be updated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the latter, then we can't do compile-time evaluation of any floating point operation that could involve NaNs.

You could still attempt evaluation and bail out optimizing if the output turns out to be NaN.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's true. We could make all these external constructors partial and have them return None if the operation is given or produces a NaN.

@primoly
Copy link
Contributor Author

primoly commented Jul 17, 2024

Reading the WebAssembly spec with regards to NaNs[1][2] here are some remarks and questions.

When all the NaNs in the input are canonical (NaNcanon), then the output is canonical as well.[1] So there is no non-determinism in this case.[3] If at least one of the inputs is non-canonical, the output is non-deterministic, but must be an arithmetic NaN (which of course can be a NaNcanon).[2] I don’t know if this is actually handled correctly by Cranelift, because always propagating NaNs is incorrect, since non-arithmetic NaNs must be turned into arithmetic ones. Unless it is guaranteed that Ieee32 and Ieee64 NaNs are always arithmetic, but I don’t think this is the case. (opened #8967)

In Cranelift the functions used by the constant propagation of float in this PR (as well as #8625) convert the Ieee32/Ieee64 to Rust f32/f64, perform the operation and then convert the result back to Ieee32/Ieee64 by reinterpreting their bits. So the question: What happens to different kinds of NaNs during both calculation and conversions? Do they propagate, are they canonicalised or is this non-deterministic?

To avoid const prop leading to different results than before @bjorn3 mentioned that well could just ignore const prop for all instructions involving NaNs. Actually you could still include operations that involve only NaNcanon, since there propagation and canonicalisation would be the same. Except there still remains the problem with the sign.[3]

Personally, I would prefer to do the constant propagation in all cases (and just pick one spec compliant way of dealing with NaNs), since programs should never assume platform specific behaviour (list non-determinism) for instructions that are defined to be set non-deterministic. I except that constant NaNs (and especially non NaNcanon) will almost never occur in practice, so my reason is just the avoidance of to much special casing.

[1] https://webassembly.github.io/spec/core/exec/numerics.html#nan-propagation

[2] https://webassembly.github.io/spec/core/syntax/values.html#syntax-float

[3] Not quite: The sign is not part of the NaNcanon, so there are effectively two NaNcanon: +NaNcanon and −NaNcanon. The sign of the output is non-deterministic for all operations except fneg, fabs and fcopysign.

@fitzgen
Copy link
Member

fitzgen commented Jul 17, 2024

I would be more comfortable with aborting the compile-time evaluation if either given or producing a NaN. It is more conservative, and we can always have follow up discussions/PRs focused on just that issue. I also suspect there isn't too much additional performance to gain w.r.t NaNs.


FWIW, it seems like compile-time evaluating relaxed SIMD instructions like fma is probably a no-go: WebAssembly/relaxed-simd#155 (comment)

@primoly
Copy link
Contributor Author

primoly commented Jul 18, 2024

I’ve updated the PR to always check whether the result is NaN and don’t do constant folding in that case. I also changed neg abs and copysign to not convert to Rust f32/f64 but operate on the bits directly, since otherwise NaNs could have their payload and even sign changed. I also added fmin and fmax folding.

Regarding fma: I still believe that it is a deterministic instruction when not involving NaNs, but since this PR doesn’t include it, this is something we can discuss in the future.

@fitzgen
Copy link
Member

fitzgen commented Jul 18, 2024

Regarding fma: I still believe that it is a deterministic instruction when not involving NaNs, but since this PR doesn’t include it, this is something we can discuss in the future.

Yes, my b I was thinking of the relaxed_madd Wasm instruction.

Copy link
Member

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good modulo a rebase to address conflicts with main and a couple tiny nitpicks. Thanks!!

}

fn f64_sqrt(&mut self, n: Ieee64) -> Option<Ieee64> {
Some(n.sqrt()).filter(|r| !r.is_nan())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is worth worth defining helpers like

impl Ieee{32,64} {
    pub(crate) fn non_nan(self) -> Option<Self> {
        Some(self).filter(|f| !f.is_nan())
    }
}

to clean up some of this repetition.

Comment on lines 282 to 284
;; Note: With the exception of fabs, fneg and copysign,
;; constant folding is only performed when
;; the result of an instruction isn't NaN.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you follow this with a sentence saying why that is? Something like

We want the NaN bit patterns produced by an instruction to be consistent, and compile-time evaluation in a cross-compilation scenario risks producing different NaN bit patterns than the target would have at run-time.

Copy link
Member

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect! Thanks again!

@fitzgen fitzgen added this pull request to the merge queue Jul 19, 2024
Merged via the queue into bytecodealliance:main with commit 542af68 Jul 19, 2024
37 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift Issues related to the Cranelift code generator isle Related to the ISLE domain-specific language
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants