Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cranellift: remove Baldrdash support and related features. #4571

Merged
merged 5 commits into from
Aug 2, 2022

Conversation

cfallin
Copy link
Member

@cfallin cfallin commented Aug 1, 2022

As noted in Mozilla's bugzilla bug 1781425 [1], the SpiderMonkey team
has recently determined that their current form of integration with
Cranelift is too hard to maintain, and they have chosen to remove it
from their codebase. If and when they decide to build updated support
for Cranelift, they will adopt different approaches to several details
of the integration.

In the meantime, after discussion with the SpiderMonkey folks, they
agree that it makes sense to remove the bits of Cranelift that exist
to support the integration ("Baldrdash"), as they will not need
them. Many of these bits are difficult-to-maintain special cases that
are not actually tested in Cranelift proper: for example, the
Baldrdash integration required Cranelift to emit function bodies
without prologues/epilogues, and instead communicate very precise
information about the expected frame size and layout, then stitched
together something post-facto. This was brittle and caused a lot of
incidental complexity ("fallthrough returns", the resulting special
logic in block-ordering); this is just one example. As another
example, one particular Baldrdash ABI variant processed stack args in
reverse order, so our ABI code had to support both traversal
orders. We had a number of other Baldrdash-specific settings as well
that did various special things.

This PR removes Baldrdash ABI support, the fallthrough_return
instruction, and pulls some threads to remove now-unused bits as a
result of those two, with the understanding that the SpiderMonkey folks
will build new functionality as needed in the future and we can perhaps
find cleaner abstractions to make it all work.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1781425

@cfallin cfallin requested review from fitzgen and bnjbvr August 1, 2022 19:11
As noted in Mozilla's bugzilla bug 1781425 [1], the SpiderMonkey team
has recently determined that their current form of integration with
Cranelift is too hard to maintain, and they have chosen to remove it
from their codebase. If and when they decide to build updated support
for Cranelift, they will adopt different approaches to several details
of the integration.

In the meantime, after discussion with the SpiderMonkey folks, they
agree that it makes sense to remove the bits of Cranelift that exist
to support the integration ("Baldrdash"), as they will not need
them. Many of these bits are difficult-to-maintain special cases that
are not actually tested in Cranelift proper: for example, the
Baldrdash integration required Cranelift to emit function bodies
without prologues/epilogues, and instead communicate very precise
information about the expected frame size and layout, then stitched
together something post-facto. This was brittle and caused a lot of
incidental complexity ("fallthrough returns", the resulting special
logic in block-ordering); this is just one example. As another
example, one particular Baldrdash ABI variant processed stack args in
reverse order, so our ABI code had to support both traversal
orders. We had a number of other Baldrdash-specific settings as well
that did various special things.

This PR removes Baldrdash ABI support, the `fallthrough_return`
instruction, and pulls some threads to remove now-unused bits as a
result of those two, with the  understanding that the SpiderMonkey folks
will build new functionality as needed in the future and we can perhaps
find cleaner abstractions to make it all work.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1781425
@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:x64 Issues related to x64 codegen cranelift:meta Everything related to the meta-language. labels Aug 1, 2022
cranelift/docs/ir.md Outdated Show resolved Hide resolved
Copy link
Member

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

@cfallin
Copy link
Member Author

cfallin commented Aug 1, 2022

There is something strange going on with the lldb test (gdb passes) where it can't find a line number for a breakpoint anymore. It seems like this should be unrelated and I'm wondering if it has to do with a silent version change somewhere. I'll poke at a bit, but if I can't work it out I may just disable the lldb test, since debug support in general is tier-3.

…ons.

The debugger tests invoke `wasmtime` from within each test case under
the control of a debugger (gdb or lldb). Some of these tests started to
inexplicably fail in CI with unrelated changes, and the failures were
only inconsistently reproducible locally. It seems to be cache related:
if we disable cached compilation on the nested `wasmtime` invocations,
the tests consistently pass.
@cfallin
Copy link
Member Author

cfallin commented Aug 2, 2022

I was having difficulty investigating the lldb failure locally because it seemed flaky (inconsistently failed) until I realized this depended on me moving the branch around and rebuilding -- hence on the state of the compilation cache. It turns out our debugger tests invoke a nested wasmtime process under a debugger but don't disable the compilation cache, and this appears to cause issues with missing debug info.

I'm not sure if this implies an issue with the caching key or not (cc @alexcrichton perhaps?) but let's see if the latest commit at least fixes the tests here...

@cfallin
Copy link
Member Author

cfallin commented Aug 2, 2022

If this does fix the issue then I'm happy to either split out the last commit into another PR or include it here with your updated r+, @fitzgen...

Copy link
Member

@bnjbvr bnjbvr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few suggestions for more things that could be simplified, but could be done in a follow-up if preferable. Cheers, and fare thee well baldrdash!

cranelift/codegen/src/isa/aarch64/abi.rs Outdated Show resolved Hide resolved
cranelift/codegen/src/isa/aarch64/abi.rs Show resolved Hide resolved
cranelift/codegen/src/isa/aarch64/abi.rs Outdated Show resolved Hide resolved
cranelift/codegen/src/isa/aarch64/inst/emit.rs Outdated Show resolved Hide resolved
cranelift/README.md Show resolved Hide resolved
cranelift/codegen/src/isa/s390x/inst/emit.rs Outdated Show resolved Hide resolved
cranelift/codegen/src/isa/x64/lower.rs Outdated Show resolved Hide resolved
cranelift/codegen/src/machinst/blockorder.rs Show resolved Hide resolved
@cfallin cfallin enabled auto-merge (squash) August 2, 2022 17:07
@cfallin cfallin merged commit 43f1765 into bytecodealliance:main Aug 2, 2022
@cfallin cfallin deleted the remove-baldrdash branch August 2, 2022 19:38
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Aug 4, 2022
This commit fixes bytecodealliance#4600 in a somewhat roundabout fashion. Currently the
`main` branch of Wasmtime exhibits unusual behavior:

* If `./ci/run-tests.sh` is run then the `cache_accounts_for_opt_level`
  test does not fail.
* If `cargo test -p wasmtime --lib` is run, however, then the test
  fails.

This test is indeed being run as part of `./ci/run-tests.sh` and it's
also passing in CI. The exact failure is that part of the debuginfo
support we have takes an existing ELF image, copies it, and then appends
some information to inform profilers/gdb about the image. This code is
all quite old at this point and not 100% optimal, but that's at least
where we're at.

The problem is that the appended `ProgramHeader64` is not aligned
correctly during `cargo test -p wasmtime --lib`, which is the panic that
happens causing the test to fail. The reason, however, that this test
passes with `./ci/run-tests.sh` is that the alignment of
`ProgramHeader64` is 1 instead of 8. The reason for that is that the
`object` crate has an `unaligned` feature which forcibly unaligns all
primitives to 1 byte instead of their natural alignment. During `cargo
test -p wasmtime --lib` this feature is not enabled but during
`./ci/run-tests.sh` this feature is enabled. The feature is currently
enabled through inclusion of the `backtrace` crate which only happens
for some tests in some crates.

The alignment issue explains why the test fails on a single crate test
but fails on the whole workspace tests. The next issue I investigated
was if this test ever passed. It turns out that on v0.39.0 this test
passed, and the regression to main was introduced during bytecodealliance#4571. That
PR, however, has nothing to do with any of this! The reason that this
showed up as causing a "regression" however is because it changed
cranelift settings which changed the size of serialized metadata at the
end of a Wasmtime cache object.

Wasmtime compiled artifacts are ELF images with Wasmtime-specific
metadata appended after them. This appended metadata was making its way
all the way through to the gdbjit image itself which mean that while the
end of the ELF file itself was properly aligned the space after the
Wasmtime metadata was not aligned. This metadata changes in size over
time as Cranelift settings change which explains why bytecodealliance#4571 was the
"source" of the regression.

The fix in this commit is to discard the extra Wasmtime metadata when
creating an `MmapVec` representing the underlying ELF image. This is
already supported with `MmapVec::drain` so it was relatively easy to
insert that. This means that the gdbjit image starts with just the ELF
file itself which is always aligned at the end, which gets the test
passing with/without the `unaligned` feature in the `object` crate.
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Aug 5, 2022
This commit fixes bytecodealliance#4600 in a somewhat roundabout fashion. Currently the
`main` branch of Wasmtime exhibits unusual behavior:

* If `./ci/run-tests.sh` is run then the `cache_accounts_for_opt_level`
  test does not fail.
* If `cargo test -p wasmtime --lib` is run, however, then the test
  fails.

This test is indeed being run as part of `./ci/run-tests.sh` and it's
also passing in CI. The exact failure is that part of the debuginfo
support we have takes an existing ELF image, copies it, and then appends
some information to inform profilers/gdb about the image. This code is
all quite old at this point and not 100% optimal, but that's at least
where we're at.

The problem is that the appended `ProgramHeader64` is not aligned
correctly during `cargo test -p wasmtime --lib`, which is the panic that
happens causing the test to fail. The reason, however, that this test
passes with `./ci/run-tests.sh` is that the alignment of
`ProgramHeader64` is 1 instead of 8. The reason for that is that the
`object` crate has an `unaligned` feature which forcibly unaligns all
primitives to 1 byte instead of their natural alignment. During `cargo
test -p wasmtime --lib` this feature is not enabled but during
`./ci/run-tests.sh` this feature is enabled. The feature is currently
enabled through inclusion of the `backtrace` crate which only happens
for some tests in some crates.

The alignment issue explains why the test fails on a single crate test
but fails on the whole workspace tests. The next issue I investigated
was if this test ever passed. It turns out that on v0.39.0 this test
passed, and the regression to main was introduced during bytecodealliance#4571. That
PR, however, has nothing to do with any of this! The reason that this
showed up as causing a "regression" however is because it changed
cranelift settings which changed the size of serialized metadata at the
end of a Wasmtime cache object.

Wasmtime compiled artifacts are ELF images with Wasmtime-specific
metadata appended after them. This appended metadata was making its way
all the way through to the gdbjit image itself which mean that while the
end of the ELF file itself was properly aligned the space after the
Wasmtime metadata was not aligned. This metadata changes in size over
time as Cranelift settings change which explains why bytecodealliance#4571 was the
"source" of the regression.

The fix in this commit is to discard the extra Wasmtime metadata when
creating an `MmapVec` representing the underlying ELF image. This is
already supported with `MmapVec::drain` so it was relatively easy to
insert that. This means that the gdbjit image starts with just the ELF
file itself which is always aligned at the end, which gets the test
passing with/without the `unaligned` feature in the `object` crate.
alexcrichton added a commit that referenced this pull request Aug 10, 2022
This commit fixes #4600 in a somewhat roundabout fashion. Currently the
`main` branch of Wasmtime exhibits unusual behavior:

* If `./ci/run-tests.sh` is run then the `cache_accounts_for_opt_level`
  test does not fail.
* If `cargo test -p wasmtime --lib` is run, however, then the test
  fails.

This test is indeed being run as part of `./ci/run-tests.sh` and it's
also passing in CI. The exact failure is that part of the debuginfo
support we have takes an existing ELF image, copies it, and then appends
some information to inform profilers/gdb about the image. This code is
all quite old at this point and not 100% optimal, but that's at least
where we're at.

The problem is that the appended `ProgramHeader64` is not aligned
correctly during `cargo test -p wasmtime --lib`, which is the panic that
happens causing the test to fail. The reason, however, that this test
passes with `./ci/run-tests.sh` is that the alignment of
`ProgramHeader64` is 1 instead of 8. The reason for that is that the
`object` crate has an `unaligned` feature which forcibly unaligns all
primitives to 1 byte instead of their natural alignment. During `cargo
test -p wasmtime --lib` this feature is not enabled but during
`./ci/run-tests.sh` this feature is enabled. The feature is currently
enabled through inclusion of the `backtrace` crate which only happens
for some tests in some crates.

The alignment issue explains why the test fails on a single crate test
but fails on the whole workspace tests. The next issue I investigated
was if this test ever passed. It turns out that on v0.39.0 this test
passed, and the regression to main was introduced during #4571. That
PR, however, has nothing to do with any of this! The reason that this
showed up as causing a "regression" however is because it changed
cranelift settings which changed the size of serialized metadata at the
end of a Wasmtime cache object.

Wasmtime compiled artifacts are ELF images with Wasmtime-specific
metadata appended after them. This appended metadata was making its way
all the way through to the gdbjit image itself which mean that while the
end of the ELF file itself was properly aligned the space after the
Wasmtime metadata was not aligned. This metadata changes in size over
time as Cranelift settings change which explains why #4571 was the
"source" of the regression.

The fix in this commit is to discard the extra Wasmtime metadata when
creating an `MmapVec` representing the underlying ELF image. This is
already supported with `MmapVec::drain` so it was relatively easy to
insert that. This means that the gdbjit image starts with just the ELF
file itself which is always aligned at the end, which gets the test
passing with/without the `unaligned` feature in the `object` crate.
jameysharp added a commit to jameysharp/wasmtime that referenced this pull request Apr 22, 2024
Before this, Cranelift ABI code would emit a stack-load instruction for
every stack argument and add all register arguments to the `args`
pseudo-instruction, whether those arguments were used or not.

However, we already know which arguments are used at that point because
we need the analysis for load-sinking, so it's easy to filter the unused
arguments out.

This avoids generating loads that are immediately dead, which is good
for the generated code. It also slightly reduces the size of the
register allocation problem, which is a small win in compile time.

This also changes which registers RA2 chooses in some cases because it
no longer considers unused defs from the `args` pseudo-instruction.

There was an existing method named `arg_is_needed_in_body` which sounded
like it should be the right place to implement this. However, that
method was only used for Baldrdash integration and has been a stub since
that integration was removed in bytecodealliance#4571. Also it didn't have access to the
`value_ir_uses` map needed here. But the place where that method was
called does have access to that map and was perfect for this.

Thanks to @elliottt for doing the initial investigation of this change
with me.
jameysharp added a commit to jameysharp/wasmtime that referenced this pull request Apr 22, 2024
Before this, Cranelift ABI code would emit a stack-load instruction for
every stack argument and add all register arguments to the `args`
pseudo-instruction, whether those arguments were used or not.

However, we already know which arguments are used at that point because
we need the analysis for load-sinking, so it's easy to filter the unused
arguments out.

This avoids generating loads that are immediately dead, which is good
for the generated code. It also slightly reduces the size of the
register allocation problem, which is a small win in compile time.

This also changes which registers RA2 chooses in some cases because it
no longer considers unused defs from the `args` pseudo-instruction.

There was an existing method named `arg_is_needed_in_body` which sounded
like it should be the right place to implement this. However, that
method was only used for Baldrdash integration and has been a stub since
that integration was removed in bytecodealliance#4571. Also it didn't have access to the
`value_ir_uses` map needed here. But the place where that method was
called does have access to that map and was perfect for this.

Also, don't emit a `dummy_use` pseudo-instruction for the vmctx if it's
otherwise unused everywhere, as we want to drop it from the `args`
instruction in that case and then RA2 complains that it's used without
being defined.

Furthermore, don't emit debug info specially for the vmctx parameter,
because it's already emitted for all block parameters including vmctx.

Thanks to @elliottt for doing the initial investigation of this change
with me, and to @cfallin for helping me track down the `dummy_use` false
dependency.
github-merge-queue bot pushed a commit that referenced this pull request Apr 22, 2024
Before this, Cranelift ABI code would emit a stack-load instruction for
every stack argument and add all register arguments to the `args`
pseudo-instruction, whether those arguments were used or not.

However, we already know which arguments are used at that point because
we need the analysis for load-sinking, so it's easy to filter the unused
arguments out.

This avoids generating loads that are immediately dead, which is good
for the generated code. It also slightly reduces the size of the
register allocation problem, which is a small win in compile time.

This also changes which registers RA2 chooses in some cases because it
no longer considers unused defs from the `args` pseudo-instruction.

There was an existing method named `arg_is_needed_in_body` which sounded
like it should be the right place to implement this. However, that
method was only used for Baldrdash integration and has been a stub since
that integration was removed in #4571. Also it didn't have access to the
`value_ir_uses` map needed here. But the place where that method was
called does have access to that map and was perfect for this.

Also, don't emit a `dummy_use` pseudo-instruction for the vmctx if it's
otherwise unused everywhere, as we want to drop it from the `args`
instruction in that case and then RA2 complains that it's used without
being defined.

Furthermore, don't emit debug info specially for the vmctx parameter,
because it's already emitted for all block parameters including vmctx.

Thanks to @elliottt for doing the initial investigation of this change
with me, and to @cfallin for helping me track down the `dummy_use` false
dependency.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:area:x64 Issues related to x64 codegen cranelift:meta Everything related to the meta-language. cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants