Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shield compiled modules from their appended metadata #4609

Merged
merged 1 commit into from
Aug 10, 2022

Conversation

alexcrichton
Copy link
Member

This commit fixes #4600 in a somewhat roundabout fashion. Currently the
main branch of Wasmtime exhibits unusual behavior:

  • If ./ci/run-tests.sh is run then the cache_accounts_for_opt_level
    test does not fail.
  • If cargo test -p wasmtime --lib is run, however, then the test
    fails.

This test is indeed being run as part of ./ci/run-tests.sh and it's
also passing in CI. The exact failure is that part of the debuginfo
support we have takes an existing ELF image, copies it, and then appends
some information to inform profilers/gdb about the image. This code is
all quite old at this point and not 100% optimal, but that's at least
where we're at.

The problem is that the appended ProgramHeader64 is not aligned
correctly during cargo test -p wasmtime --lib, which is the panic that
happens causing the test to fail. The reason, however, that this test
passes with ./ci/run-tests.sh is that the alignment of
ProgramHeader64 is 1 instead of 8. The reason for that is that the
object crate has an unaligned feature which forcibly unaligns all
primitives to 1 byte instead of their natural alignment. During cargo test -p wasmtime --lib this feature is not enabled but during
./ci/run-tests.sh this feature is enabled. The feature is currently
enabled through inclusion of the backtrace crate which only happens
for some tests in some crates.

The alignment issue explains why the test fails on a single crate test
but fails on the whole workspace tests. The next issue I investigated
was if this test ever passed. It turns out that on v0.39.0 this test
passed, and the regression to main was introduced during #4571. That
PR, however, has nothing to do with any of this! The reason that this
showed up as causing a "regression" however is because it changed
cranelift settings which changed the size of serialized metadata at the
end of a Wasmtime cache object.

Wasmtime compiled artifacts are ELF images with Wasmtime-specific
metadata appended after them. This appended metadata was making its way
all the way through to the gdbjit image itself which mean that while the
end of the ELF file itself was properly aligned the space after the
Wasmtime metadata was not aligned. This metadata changes in size over
time as Cranelift settings change which explains why #4571 was the
"source" of the regression.

The fix in this commit is to discard the extra Wasmtime metadata when
creating an MmapVec representing the underlying ELF image. This is
already supported with MmapVec::drain so it was relatively easy to
insert that. This means that the gdbjit image starts with just the ELF
file itself which is always aligned at the end, which gets the test
passing with/without the unaligned feature in the object crate.

This commit fixes bytecodealliance#4600 in a somewhat roundabout fashion. Currently the
`main` branch of Wasmtime exhibits unusual behavior:

* If `./ci/run-tests.sh` is run then the `cache_accounts_for_opt_level`
  test does not fail.
* If `cargo test -p wasmtime --lib` is run, however, then the test
  fails.

This test is indeed being run as part of `./ci/run-tests.sh` and it's
also passing in CI. The exact failure is that part of the debuginfo
support we have takes an existing ELF image, copies it, and then appends
some information to inform profilers/gdb about the image. This code is
all quite old at this point and not 100% optimal, but that's at least
where we're at.

The problem is that the appended `ProgramHeader64` is not aligned
correctly during `cargo test -p wasmtime --lib`, which is the panic that
happens causing the test to fail. The reason, however, that this test
passes with `./ci/run-tests.sh` is that the alignment of
`ProgramHeader64` is 1 instead of 8. The reason for that is that the
`object` crate has an `unaligned` feature which forcibly unaligns all
primitives to 1 byte instead of their natural alignment. During `cargo
test -p wasmtime --lib` this feature is not enabled but during
`./ci/run-tests.sh` this feature is enabled. The feature is currently
enabled through inclusion of the `backtrace` crate which only happens
for some tests in some crates.

The alignment issue explains why the test fails on a single crate test
but fails on the whole workspace tests. The next issue I investigated
was if this test ever passed. It turns out that on v0.39.0 this test
passed, and the regression to main was introduced during bytecodealliance#4571. That
PR, however, has nothing to do with any of this! The reason that this
showed up as causing a "regression" however is because it changed
cranelift settings which changed the size of serialized metadata at the
end of a Wasmtime cache object.

Wasmtime compiled artifacts are ELF images with Wasmtime-specific
metadata appended after them. This appended metadata was making its way
all the way through to the gdbjit image itself which mean that while the
end of the ELF file itself was properly aligned the space after the
Wasmtime metadata was not aligned. This metadata changes in size over
time as Cranelift settings change which explains why bytecodealliance#4571 was the
"source" of the regression.

The fix in this commit is to discard the extra Wasmtime metadata when
creating an `MmapVec` representing the underlying ELF image. This is
already supported with `MmapVec::drain` so it was relatively easy to
insert that. This means that the gdbjit image starts with just the ELF
file itself which is always aligned at the end, which gets the test
passing with/without the `unaligned` feature in the `object` crate.
@github-actions github-actions bot added the wasmtime:api Related to the API of the `wasmtime` crate itself label Aug 4, 2022
@github-actions
Copy link

github-actions bot commented Aug 4, 2022

Subscribe to Label Action

cc @peterhuene

This issue or pull request has been labeled: "wasmtime:api"

Thus the following users have been cc'd because of the following labels:

  • peterhuene: wasmtime:api

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

@alexcrichton alexcrichton requested a review from peterhuene August 9, 2022 17:55
Copy link
Contributor

@jameysharp jameysharp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed several read-throughs of both the commit message and the patch to understand this, but eventually I was able to convince myself that this patch does what you said it needed to: limit the returned SerializedModule::artifacts to refer only to the ELF object, and not any trailers.

I think it was MmapVec::drain that caused me the most trouble as it only loosely resembles the corresponding method on Vec. Perhaps a future improvement would be to model that method after Vec::split_off instead?

@alexcrichton alexcrichton merged commit fd28d94 into bytecodealliance:main Aug 10, 2022
@alexcrichton alexcrichton deleted the fix-a-test branch August 10, 2022 14:58
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Aug 10, 2022
alexcrichton added a commit that referenced this pull request Aug 15, 2022
* Rename `MmapVec::drain` to `split_off`

As suggested on #4609

* Fix tests

* Make MmapVec::split_off work like Vec::split_off

Co-authored-by: Jamey Sharp <jsharp@fastly.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
wasmtime:api Related to the API of the `wasmtime` crate itself
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Wasmtime: cache_accounts_for_opt_level failing on main
2 participants