Shield compiled modules from their appended metadata #4609

alexcrichton · 2022-08-04T14:57:46Z

This commit fixes #4600 in a somewhat roundabout fashion. Currently the
main branch of Wasmtime exhibits unusual behavior:

If ./ci/run-tests.sh is run then the cache_accounts_for_opt_level
test does not fail.
If cargo test -p wasmtime --lib is run, however, then the test
fails.

This test is indeed being run as part of ./ci/run-tests.sh and it's
also passing in CI. The exact failure is that part of the debuginfo
support we have takes an existing ELF image, copies it, and then appends
some information to inform profilers/gdb about the image. This code is
all quite old at this point and not 100% optimal, but that's at least
where we're at.

The problem is that the appended ProgramHeader64 is not aligned
correctly during cargo test -p wasmtime --lib, which is the panic that
happens causing the test to fail. The reason, however, that this test
passes with ./ci/run-tests.sh is that the alignment of
ProgramHeader64 is 1 instead of 8. The reason for that is that the
object crate has an unaligned feature which forcibly unaligns all
primitives to 1 byte instead of their natural alignment. During cargo test -p wasmtime --lib this feature is not enabled but during
./ci/run-tests.sh this feature is enabled. The feature is currently
enabled through inclusion of the backtrace crate which only happens
for some tests in some crates.

The alignment issue explains why the test fails on a single crate test
but fails on the whole workspace tests. The next issue I investigated
was if this test ever passed. It turns out that on v0.39.0 this test
passed, and the regression to main was introduced during #4571. That
PR, however, has nothing to do with any of this! The reason that this
showed up as causing a "regression" however is because it changed
cranelift settings which changed the size of serialized metadata at the
end of a Wasmtime cache object.

Wasmtime compiled artifacts are ELF images with Wasmtime-specific
metadata appended after them. This appended metadata was making its way
all the way through to the gdbjit image itself which mean that while the
end of the ELF file itself was properly aligned the space after the
Wasmtime metadata was not aligned. This metadata changes in size over
time as Cranelift settings change which explains why #4571 was the
"source" of the regression.

The fix in this commit is to discard the extra Wasmtime metadata when
creating an MmapVec representing the underlying ELF image. This is
already supported with MmapVec::drain so it was relatively easy to
insert that. This means that the gdbjit image starts with just the ELF
file itself which is always aligned at the end, which gets the test
passing with/without the unaligned feature in the object crate.

This commit fixes bytecodealliance#4600 in a somewhat roundabout fashion. Currently the `main` branch of Wasmtime exhibits unusual behavior: * If `./ci/run-tests.sh` is run then the `cache_accounts_for_opt_level` test does not fail. * If `cargo test -p wasmtime --lib` is run, however, then the test fails. This test is indeed being run as part of `./ci/run-tests.sh` and it's also passing in CI. The exact failure is that part of the debuginfo support we have takes an existing ELF image, copies it, and then appends some information to inform profilers/gdb about the image. This code is all quite old at this point and not 100% optimal, but that's at least where we're at. The problem is that the appended `ProgramHeader64` is not aligned correctly during `cargo test -p wasmtime --lib`, which is the panic that happens causing the test to fail. The reason, however, that this test passes with `./ci/run-tests.sh` is that the alignment of `ProgramHeader64` is 1 instead of 8. The reason for that is that the `object` crate has an `unaligned` feature which forcibly unaligns all primitives to 1 byte instead of their natural alignment. During `cargo test -p wasmtime --lib` this feature is not enabled but during `./ci/run-tests.sh` this feature is enabled. The feature is currently enabled through inclusion of the `backtrace` crate which only happens for some tests in some crates. The alignment issue explains why the test fails on a single crate test but fails on the whole workspace tests. The next issue I investigated was if this test ever passed. It turns out that on v0.39.0 this test passed, and the regression to main was introduced during bytecodealliance#4571. That PR, however, has nothing to do with any of this! The reason that this showed up as causing a "regression" however is because it changed cranelift settings which changed the size of serialized metadata at the end of a Wasmtime cache object. Wasmtime compiled artifacts are ELF images with Wasmtime-specific metadata appended after them. This appended metadata was making its way all the way through to the gdbjit image itself which mean that while the end of the ELF file itself was properly aligned the space after the Wasmtime metadata was not aligned. This metadata changes in size over time as Cranelift settings change which explains why bytecodealliance#4571 was the "source" of the regression. The fix in this commit is to discard the extra Wasmtime metadata when creating an `MmapVec` representing the underlying ELF image. This is already supported with `MmapVec::drain` so it was relatively easy to insert that. This means that the gdbjit image starts with just the ELF file itself which is always aligned at the end, which gets the test passing with/without the `unaligned` feature in the `object` crate.

github-actions · 2022-08-04T15:57:48Z

Subscribe to Label Action

cc @peterhuene

This issue or pull request has been labeled: "wasmtime:api"

Thus the following users have been cc'd because of the following labels:

peterhuene: wasmtime:api

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

jameysharp

I needed several read-throughs of both the commit message and the patch to understand this, but eventually I was able to convince myself that this patch does what you said it needed to: limit the returned SerializedModule::artifacts to refer only to the ELF object, and not any trailers.

I think it was MmapVec::drain that caused me the most trouble as it only loosely resembles the corresponding method on Vec. Perhaps a future improvement would be to model that method after Vec::split_off instead?

As suggested on bytecodealliance#4609

* Rename `MmapVec::drain` to `split_off` As suggested on #4609 * Fix tests * Make MmapVec::split_off work like Vec::split_off Co-authored-by: Jamey Sharp <jsharp@fastly.com>

github-actions bot added the wasmtime:api Related to the API of the `wasmtime` crate itself label Aug 4, 2022

alexcrichton requested a review from peterhuene August 9, 2022 17:55

jameysharp approved these changes Aug 9, 2022

View reviewed changes

alexcrichton merged commit fd28d94 into bytecodealliance:main Aug 10, 2022

alexcrichton deleted the fix-a-test branch August 10, 2022 14:58

alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Aug 10, 2022

Rename MmapVec::drain to split_off

3f96245

As suggested on bytecodealliance#4609

alexcrichton mentioned this pull request Aug 10, 2022

Rename MmapVec::drain to split_off #4673

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Shield compiled modules from their appended metadata #4609

Shield compiled modules from their appended metadata #4609

alexcrichton commented Aug 4, 2022

github-actions bot commented Aug 4, 2022

jameysharp left a comment

Shield compiled modules from their appended metadata #4609

Shield compiled modules from their appended metadata #4609

Conversation

alexcrichton commented Aug 4, 2022

github-actions bot commented Aug 4, 2022

Subscribe to Label Action

jameysharp left a comment

Choose a reason for hiding this comment