Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add shared library support #429

Merged
merged 14 commits into from
Sep 28, 2023
Merged

add shared library support #429

merged 14 commits into from
Sep 28, 2023

Conversation

dicej
Copy link
Contributor

@dicej dicej commented Jul 28, 2023

This adds support for building WASI shared libraries per https://github.com/WebAssembly/tool-conventions/blob/main/DynamicLinking.md.

For the time being, the goal is to allow "pseudo-dynamic" linking using the Component Model per
https://github.com/WebAssembly/component-model/blob/main/design/mvp/examples/SharedEverythingDynamicLinking.md. This requires all libraries to be available when the component is created, but still allows runtime symbol resolution via dlopen/dlsym backed by a static lookup table. This is sufficient to support Python native extensions, for example. A complete demo using wit-component is available at https://github.com/dicej/component-linking-demo.

This commit adds support for building libc.so, libc++.so, and libc++abi.so alongside their static counterparts.

Notes:

  • I had to refactor errno support a bit to avoid a spurious _ZTH5errno (AKA "thread-local initialization routine for errno") import in libc++.so.
  • Long double print and scan are included by default in libc.so rather than in a separate library.
  • __main_argc_argv is now a weak symbol since it's not relevant for reactors.
  • dlopen/dlsym rely on a lookup table provided by the "dynamic" linker via __wasm_set_libraries. Not all flags are supported yet, and unrecognized flags will result in an error.
  • This requires https://reviews.llvm.org/D153293, which we will need to backport to LLVM 16 until 17 is released. I'll open a wasi-sdk PR with that change and various Makefile tweaks to support shared libraries.
  • libc.so is temporarily disabled for the wasi-threads build until someone can make wasi_thread_start.s position-independent.

Makefile Outdated
@@ -8,7 +8,7 @@ NM ?= $(patsubst %clang,%llvm-nm,$(filter-out ccache sccache,$(CC)))
ifeq ($(origin AR), default)
AR = $(patsubst %clang,%llvm-ar,$(filter-out ccache sccache,$(CC)))
endif
EXTRA_CFLAGS ?= -O2 -DNDEBUG
EXTRA_CFLAGS ?= -O2 -DNDEBUG -fPIC -fvisibility=default
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The -fPIC overhead also applies to the original non-shared libc.a which is not ideal IMHO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's a topic that's come up a few times. @sunfishcode took a look at libc.a before and after -fPIC was added and concluded that it wasn't a big deal from his perspective. The alternative is to build everything twice -- once for static and once for dynamic. Either way is fine with me, as long as we have consensus.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any disadvantages of building everything twice other than doubled build time? If not, I think that'd be my preference

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that I'm aware of. @sunfishcode or @alexcrichton do you have any concerns about building the libc and libc++ code twice (with and without -fPIC)? (Actually four times, since we're already building twice for threads/no-threads)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have strong opinions either way. The -fPIC overhead from the builds I've seen was low. But also, I don't think building twice is a problem either, unless it makes something more complex.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I apologize for not reviewing this earlier I've only gotten around to this just now!

One point I would say in favor of only having one build is that by having two sysroots everyone is now presented with a choice of which one to use and it's possible to get it wrong. That can add downstream complexity to builds, configuration, etc. Having only one build solves this because you can't make the wrong choice and there's only one canonical version of libc.a

I see now though I'm late to the punch and you've already updated this to build two copies. I don't feel particularly strongly either way, but I think that this is something to watch out for when integrating this into other toolchains. If it's onerous/difficult to have multiple copies of libc then I think it's worth reconsidering. One example of this is that to use this from Rust there will have to be new target triples, and no preexisting target I think differentiates target triples purely on dynamic linking of the same C standard library.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because you can't make the wrong choice

maybe it is the compiler driver's responsibility to automatically pick between .a and .so from the same sysroot libdir, based on whether it is doing dynamic linking or not?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While possible, that's forcing all compiler drivers, embedders, etc, to have duplicate copies and sysroots. For example the rust standard library would need to be compiled for both dynamic linking and not and store both sysroots, it's not purely a concern for WASI. Rust doesn't currently do this for any other target that I'm aware of, and if that's the case it's unlikely to happen for just the WASI target.

To me at least this cost is not worth the minor savings of code size in libc which optimizing compilers/engines will all strip away anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apologies in advance given my rust experience is limited and error-prone; afaik rust-std already ships both libstd-*.so / libstd-*.rlib for various targets, so it looks like a natural conclusion to ship both -fPIC/non--fPIC variants of libstd for wasm, and for wasm32-wasi, ship two variants of libc as well (in the same sysroot libdir, to be chosen at link-time)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed! Internally though those two artifacts have the same machine code. The standard library is compiled once simultaneously with a dylib/rlib output which means that LLVM generates one copy of all the code for the standard library and then that object file is linked into two locations. This is enabled because all Rust platforms (AFAIK) generate position-independent code by default and eschew the minor size-related optimizations from generating static code. That's left for users to build their own toolchains if that size difference matters.

I'll also admit though I'm probably quite biased on this as I wrote the support for rlibs/dylibs and am the reason that the standard library ships two artifacts for example.

dicej added a commit to dicej/rust that referenced this pull request Jul 28, 2023
See WebAssembly/wasi-libc#429 for details.

Signed-off-by: Joel Dice <joel.dice@fermyon.com>
@yamt
Copy link
Contributor

yamt commented Jul 29, 2023

I had to refactor errno support a bit to avoid a spurious _ZTH5errno (AKA "thread-local initialization routine for errno") import in libc++.so.

can you explain a bit?

with the patch reverted, i got:

wasm-ld: error: libcxx/src/CMakeFiles/cxx_shared.dir/atomic.cpp.o: relocation R_WASM_MEMORY_ADDR_TLS_SLEB cannot be used against an undefined symbol `errno`

is this what you meant?

Long double print and scan are included by default in libc.so rather than in a separate library.

i'd suggest to drop this because it's an unrelated change.

dlclose/dlsym rely on a lookup table provided by the "dynamic" linker via __wasm_set_libraries. Not all flags are supported yet, and unrecognized flags will result in an error.

i'd suggest to drop this as it seems like a kludge for the specific demo.

libc.so is temporarily disabled for the wasi-threads build until someone can make wasi_thread_start.s position-independent.

it's fine as wasi-threads as of today is not compatible witt multiple module linking setup like this.

Copy link
Contributor

@yamt yamt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, please revert whitespace changes.

Makefile Outdated
$(LIBWASI_EMULATED_PROCESS_CLOCKS_OBJS) \
$(LIBWASI_EMULATED_GETPID_OBJS) \
$(LIBWASI_EMULATED_SIGNAL_OBJS) \
$(LIBWASI_EMULATED_SIGNAL_MUSL_OBJS)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

including EMULATED things in libc is an unrelated change. please drop.

Copy link
Contributor Author

@dicej dicej Aug 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to make sure application programmers can use these features in a shared library context if they need them. Putting them in libc.so would be the most convenient option, but I'm open to alternatives. Is your main concern that they make libc.so larger that it otherwise needs to be? If so, would you prefer that they go in a separate library, e.g. libwasi-emulated.so?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to make sure application programmers can use these features in a shared library context if they need them. Putting them in libc.so would be the most convenient option, but I'm open to alternatives. Is your main concern that they make libc.so larger that it otherwise needs to be? If so, would you prefer that they go in a separate library, e.g. libwasi-emulated.so?

sort of.
i feel it's better to keep things consistent between .a and .so where reasonable.
yes, i prefer to have libwasi-emulated-process-clocks.so etc.

@dicej
Copy link
Contributor Author

dicej commented Aug 7, 2023

I had to refactor errno support a bit to avoid a spurious _ZTH5errno (AKA "thread-local initialization routine for errno") import in libc++.so.

can you explain a bit?

with the patch reverted, i got:

wasm-ld: error: libcxx/src/CMakeFiles/cxx_shared.dir/atomic.cpp.o: relocation R_WASM_MEMORY_ADDR_TLS_SLEB cannot be used against an undefined symbol `errno`

is this what you meant?

Originally, I hit a different issue such that libc++.so would build without trouble, but import _ZTH5errno, leading to an undefined symbol error when using libc++.so. However, something must have changed since then because now I'm getting the same error you reported above.

Long double print and scan are included by default in libc.so rather than in a separate library.

i'd suggest to drop this because it's an unrelated change.

If we don't build libc.so with long double print/scan support, how will applications enable it? They can't link against libc-printscan-long-double.a because that will result in duplicate definitions, and I believe we'd have the same problem if we built a libc-printscan-long-double.so since its definitions will also conflict with those of libc.so.

dlclose/dlsym rely on a lookup table provided by the "dynamic" linker via __wasm_set_libraries. Not all flags are supported yet, and unrecognized flags will result in an error.

i'd suggest to drop this as it seems like a kludge for the specific demo.

Would you please elaborate on why you feel it's a kludge? Do you have a different approach in mind for supporting dlopen, dlsym, etc. that would work in a WASI runtime?

@dicej dicej force-pushed the dynamic-linking branch from 5362139 to c68659c Compare August 7, 2023 20:12
@dicej
Copy link
Contributor Author

dicej commented Aug 7, 2023

also, please revert whitespace changes.

Done!

@dicej
Copy link
Contributor Author

dicej commented Aug 7, 2023

Regarding the CI failures: stock LLVM versions prior to 17 simply can't build .so files, so ideally we'd just skip building libc.so in that case. Probably the easiest option would be to not include libc.so in the default makefile target, i.e. only build it when explicitly requested. Does that sound reasonable?

@yamt
Copy link
Contributor

yamt commented Aug 8, 2023

I had to refactor errno support a bit to avoid a spurious _ZTH5errno (AKA "thread-local initialization routine for errno") import in libc++.so.

can you explain a bit?
with the patch reverted, i got:

wasm-ld: error: libcxx/src/CMakeFiles/cxx_shared.dir/atomic.cpp.o: relocation R_WASM_MEMORY_ADDR_TLS_SLEB cannot be used against an undefined symbol `errno`

is this what you meant?

Originally, I hit a different issue such that libc++.so would build without trouble, but import _ZTH5errno, leading to an undefined symbol error when using libc++.so. However, something must have changed since then because now I'm getting the same error you reported above.

ok.

Long double print and scan are included by default in libc.so rather than in a separate library.

i'd suggest to drop this because it's an unrelated change.

If we don't build libc.so with long double print/scan support, how will applications enable it? They can't link against libc-printscan-long-double.a because that will result in duplicate definitions, and I believe we'd have the same problem if we built a libc-printscan-long-double.so since its definitions will also conflict with those of libc.so.

why?
it's common to have a shared library which override some symbols in other library. not specific to wasm.

dlclose/dlsym rely on a lookup table provided by the "dynamic" linker via __wasm_set_libraries. Not all flags are supported yet, and unrecognized flags will result in an error.

i'd suggest to drop this as it seems like a kludge for the specific demo.

Would you please elaborate on why you feel it's a kludge? Do you have a different approach in mind for supporting dlopen, dlsym, etc. that would work in a WASI runtime?

yes, different (and IMO more natural) approaches are possible.
eg. https://github.com/yamt/toywasm/tree/master/examples/libdl

@yamt
Copy link
Contributor

yamt commented Aug 8, 2023

Regarding the CI failures: stock LLVM versions prior to 17 simply can't build .so files, so ideally we'd just skip building libc.so in that case. Probably the easiest option would be to not include libc.so in the default makefile target, i.e. only build it when explicitly requested. Does that sound reasonable?

making it a non-default target for now sounds reasonable to me.

@dicej
Copy link
Contributor Author

dicej commented Aug 8, 2023

If we don't build libc.so with long double print/scan support, how will applications enable it? They can't link against libc-printscan-long-double.a because that will result in duplicate definitions, and I believe we'd have the same problem if we built a libc-printscan-long-double.so since its definitions will also conflict with those of libc.so.

why? it's common to have a shared library which override some symbols in other library. not specific to wasm.

They're currently not weak symbols, but yes, we could make them weak in libc.so and strong in libc-printscan-long-double.so.

Would you please elaborate on why you feel it's a kludge? Do you have a different approach in mind for supporting dlopen, dlsym, etc. that would work in a WASI runtime?

yes, different (and IMO more natural) approaches are possible. eg. https://github.com/yamt/toywasm/tree/master/examples/libdl

I'll move dl.c into a separate PR so we can discuss it separately.

@dicej
Copy link
Contributor Author

dicej commented Aug 8, 2023

They're currently not weak symbols, but yes, we could make them weak in libc.so and strong in libc-printscan-long-double.so.

Well, I tried this, but I can't seem to make it work -- at least not the way I had planned.

My expectation was that adding __attribute__((__weak__)) to the relevant symbols would result in an WASM_DYLINK_EXPORT_INFO entry being added the dylink.0 for each symbol, with a flags field that includes WASM_SYM_BINDING_WEAK. However, it seems that wasm-ld does not generate such entries for any weak definitions (i.e. exports) -- only for weak imports. So when the dynamic linker sees the same symbol exported by both libc.so and libc-printscan-long-double.so, it doesn't know which one takes precedence.

I'm not sure if that's a bug or intended behavior. @sbc100 do you know? Is the intention that the dynamic linker should use e.g. WASM_DYLINK_NEEDED to determine search order when more than one library exports the same symbol (i.e. rather than weak vs. strong symbols)?

@sbc100
Copy link
Member

sbc100 commented Aug 8, 2023

They're currently not weak symbols, but yes, we could make them weak in libc.so and strong in libc-printscan-long-double.so.

Well, I tried this, but I can't seem to make it work -- at least not the way I had planned.

My expectation was that adding __attribute__((__weak__)) to the relevant symbols would result in an WASM_DYLINK_EXPORT_INFO entry being added the dylink.0 for each symbol, with a flags field that includes WASM_SYM_BINDING_WEAK. However, it seems that wasm-ld does not generate such entries for any weak definitions (i.e. exports) -- only for weak imports. So when the dynamic linker sees the same symbol exported by both libc.so and libc-printscan-long-double.so, it doesn't know which one takes precedence.

I'm not sure if that's a bug or intended behavior. @sbc100 do you know? Is the intention that the dynamic linker should use e.g. WASM_DYLINK_NEEDED to determine search order when more than one library exports the same symbol (i.e. rather than weak vs. strong symbols)?

Yes, this order in which libraries are loaded determines which one gets to define a given symbol. At last this is how to works on linux and this is the behaviour we try to emulate in emscripten.

However, in think I I don't see why we wouldn't just have a single version of libc.so which has the printscan-long-double stuff. Trying to make libc.so smaller seems a little pointless given that it is already a completely superset of the entire libc. By definition isn't can't do any DCE to be tuned to a specific app. Unless the difference between the size of the two builds of libc.so is really enormous I don't think it makes sense to split in the same way we do for the static build.

Anther way of putting it: The point of libc.so is that the size cost of libc.so itself is not paid by the individual application but instead is amortized over all libc consumers.

@dicej
Copy link
Contributor Author

dicej commented Aug 8, 2023

Thanks for the insight, @sbc100. I agree that putting everything in libc.so is a good place to start given its intended use.

For reference, the sizes of libc.so with and without the printscan-long-double and emulated stuff are 636222 and 632310 bytes, respectively, so the difference is quite minimal.

Makefile Outdated
@@ -85,90 +85,90 @@ LIBC_TOP_HALF_MUSL_SRC_DIR = $(LIBC_TOP_HALF_MUSL_DIR)/src
LIBC_TOP_HALF_MUSL_INC = $(LIBC_TOP_HALF_MUSL_DIR)/include
LIBC_TOP_HALF_MUSL_SOURCES = \
$(addprefix $(LIBC_TOP_HALF_MUSL_SRC_DIR)/, \
misc/a64l.c \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this file still seems to contain a ton of whitespace changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I thought I had fixed that; will try again.

Makefile Outdated
$(SYSROOT_LIB)/libc.so: $(SYSROOT_LIB)/libc.so.a $(BUILTINS_LIB)
$(CC) -nostdlib -shared -o $@ -Wl,--whole-archive $< -Wl,--no-whole-archive $(BUILTINS_LIB)

$(SYSROOT_LIB)/libc.so.a: \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's this "so.a"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's to avoid the Windows command line overflow issue by collecting the .o files into a .a file prior to making the .so:

%.a:
	@mkdir -p "$(@D)"
	# On Windows, the commandline for the ar invocation got too long, so it needs to be split up.
	$(AR) crs $@ $(wordlist 1, 199, $(sort $^))
	$(AR) crs $@ $(wordlist 200, 399, $(sort $^))
	$(AR) crs $@ $(wordlist 400, 599, $(sort $^))
	$(AR) crs $@ $(wordlist 600, 799, $(sort $^))
	# This might eventually overflow again, but at least it'll do so in a loud way instead of
	# silently dropping the tail.
	$(AR) crs $@ $(wordlist 800, 100000, $(sort $^))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll move it into OBJDIR so it isn't included in the distribution.

@@ -3,6 +3,7 @@
#include <sysexits.h>

// The user's `main` function, expecting arguments.
__attribute__((__weak__, nodebug))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why nodebug?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I probably copied the whole line from elsewhere; I'll remove that part.

@@ -1988,18 +1997,18 @@
#define YESSTR 0x50002
#define YXDOMAIN ns_r_yxdomain
#define YXRRSET ns_r_yxrrset
#define _ALLOCA_H
#define _ALLOCA_H
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ton of unrelated whitespace changes

@yamt
Copy link
Contributor

yamt commented Aug 9, 2023

Thanks for the insight, @sbc100. I agree that putting everything in libc.so is a good place to start given its intended use.

For reference, the sizes of libc.so with and without the printscan-long-double and emulated stuff are 636222 and 632310 bytes, respectively, so the difference is quite minimal.

well, my understanding is that the reason why the "emulated" stuff are separated is not size.
it's the fact they are emulations.

i have no strong opinions on long-double things.

@yamt
Copy link
Contributor

yamt commented Aug 9, 2023

If we don't build libc.so with long double print/scan support, how will applications enable it? They can't link against libc-printscan-long-double.a because that will result in duplicate definitions, and I believe we'd have the same problem if we built a libc-printscan-long-double.so since its definitions will also conflict with those of libc.so.

why? it's common to have a shared library which override some symbols in other library. not specific to wasm.

They're currently not weak symbols, but yes, we could make them weak in libc.so and strong in libc-printscan-long-double.so.

weak symbols are not related for this purpose. it's merely symbol search order.

Would you please elaborate on why you feel it's a kludge? Do you have a different approach in mind for supporting dlopen, dlsym, etc. that would work in a WASI runtime?

yes, different (and IMO more natural) approaches are possible. eg. https://github.com/yamt/toywasm/tree/master/examples/libdl

I'll move dl.c into a separate PR so we can discuss it separately.

can you move dlfcn.h related changes too?

@yamt
Copy link
Contributor

yamt commented Aug 9, 2023

I had to refactor errno support a bit to avoid a spurious _ZTH5errno (AKA "thread-local initialization routine for errno") import in libc++.so.

can you explain a bit?
with the patch reverted, i got:

wasm-ld: error: libcxx/src/CMakeFiles/cxx_shared.dir/atomic.cpp.o: relocation R_WASM_MEMORY_ADDR_TLS_SLEB cannot be used against an undefined symbol `errno`

is this what you meant?

Originally, I hit a different issue such that libc++.so would build without trouble, but import _ZTH5errno, leading to an undefined symbol error when using libc++.so. However, something must have changed since then because now I'm getting the same error you reported above.

this is not relevant as far as we don't build shared+threads, right?
i'd suggest to drop this part too.

This adds support for building WASI shared libraries per
https://github.com/WebAssembly/tool-conventions/blob/main/DynamicLinking.md.

For the time being, the goal is to allow "pseudo-dynamic" linking using the
Component Model per
https://github.com/WebAssembly/component-model/blob/main/design/mvp/examples/SharedEverythingDynamicLinking.md.
This requires all libraries to be available when the component is created, but
still allows runtime symbol resolution via `dlopen`/`dlsym` backed by a static
lookup table.  This is sufficient to support Python native extensions, for
example.  A complete demo using `wit-component` is available at
https://github.com/dicej/component-linking-demo.

This commit adds support for building `libc.so`, `libc++.so`, and `libc++abi.so`
alongside their static counterparts.

Notes:

- I had to refactor `errno` support a bit to avoid a spurious `_ZTH5errno` (AKA "thread-local initialization routine for errno") import in `libc++.so`.
- Long double print and scan are included by default in `libc.so` rather than in a separate library.
- `__main_argc_argv` is now a weak symbol since it's not relevant for reactors.
- `dlopen`/`dlsym` rely on a lookup table provided by the "dynamic" linker via `__wasm_set_libraries`.  Not all flags are supported yet, and unrecognized flags will result in an error.
- This requires https://reviews.llvm.org/D153293, which we will need to backport to LLVM 16 until 17 is released.  I'll open a `wasi-sdk` PR with that change and various Makefile tweaks to support shared libraries.
- `libc.so` is temporarily disabled for the `wasi-threads` build until someone can make `wasi_thread_start.s` position-independent.

Signed-off-by: Joel Dice <joel.dice@fermyon.com>

build `-fPIC` .o files separately from non-`-fPIC` ones

This allows us to build both libc.so and libc.a without incurring indirection
penalties in the latter.

Signed-off-by: Joel Dice <joel.dice@fermyon.com>

only build libc.so when explicitly requested

Shared library support in LLVM for non-Emscripten Wasm targets will be added in
version 17, which has not yet been released, so we should not attempt to build
libc.so by default (at least not yet).

Signed-off-by: Joel Dice <joel.dice@fermyon.com>

remove dl.c

I'll open a separate PR for this later.

Signed-off-by: Joel Dice <joel.dice@fermyon.com>

update `check-symbols` files

Signed-off-by: Joel Dice <joel.dice@fermyon.com>
@dicej dicej force-pushed the dynamic-linking branch from a7c7c09 to 0a5a707 Compare August 9, 2023 21:45
@dicej
Copy link
Contributor Author

dicej commented Aug 9, 2023

Originally, I hit a different issue such that libc++.so would build without trouble, but import _ZTH5errno, leading to an undefined symbol error when using libc++.so. However, something must have changed since then because now I'm getting the same error you reported above.

this is not relevant as far as we don't build shared+threads, right? i'd suggest to drop this part too.

I just tried that by updating the wasi-sdk Makefile to only build libc++.so and libc++abi.so for the non-threads build. That got me past the "relocation R_WASM_MEMORY_ADDR_TLS_SLEB cannot be used" issue, but now I'm hitting the _ZTH5errno issue again. So it seems it is relevant, even when we avoid shared+threads.

I'm open to ideas on this, but so far it seems that the errno changes are necessary.

Signed-off-by: Joel Dice <joel.dice@fermyon.com>
@dicej dicej force-pushed the dynamic-linking branch from a7a239f to 481368b Compare August 9, 2023 23:19
@yamt
Copy link
Contributor

yamt commented Aug 10, 2023

Originally, I hit a different issue such that libc++.so would build without trouble, but import _ZTH5errno, leading to an undefined symbol error when using libc++.so. However, something must have changed since then because now I'm getting the same error you reported above.

this is not relevant as far as we don't build shared+threads, right? i'd suggest to drop this part too.

I just tried that by updating the wasi-sdk Makefile to only build libc++.so and libc++abi.so for the non-threads build. That got me past the "relocation R_WASM_MEMORY_ADDR_TLS_SLEB cannot be used" issue, but now I'm hitting the _ZTH5errno issue again. So it seems it is relevant, even when we avoid shared+threads.

I'm open to ideas on this, but so far it seems that the errno changes are necessary.

ok.

_ZTH5errno is for dynamic initialization (ie. calling ctor) for C++ thread_local.
i wonder why clang generates these stuff even for int. maybe it's mandated by ABI? i dunno.

anyway, it seems that it's a bug for us to use C++ thread_local because it's incompatible with the C version which uses _Thread_local.

possible solutions i can think of right now are:
a. use _Thread_local in C++ too.
b. use __errno_location for C++. (follow #347)
c. use __errno_location for everything, including C.

for now, my suggestion is option a. or b. because:

  • the option c. involves a bit overhead for C
  • i'm conservative about changing libc ABI

@yamt
Copy link
Contributor

yamt commented Aug 10, 2023

i wonder why clang generates these stuff even for int. maybe it's mandated by ABI? i dunno.

answering myself, as C++ allows more complex initializers than C, accessing extern thread-local vars need to prepare for them.

@dicej
Copy link
Contributor Author

dicej commented Aug 10, 2023

possible solutions i can think of right now are: a. use _Thread_local in C++ too. b. use __errno_location for C++. (follow #347) c. use __errno_location for everything, including C.

I went with solution (a) by changing

#ifdef __cplusplus
extern thread_local int errno;
#else
extern _Thread_local int errno;
#endif

into just

extern _Thread_local int errno;

That seems to work fine. Does anyone have any concerns about that change? Is there a deeper reason we've been using thread_local for C++ besides being idiomatic?

@yamt pointed out there's an easier way to address the `_ZTH5errno` issue I
described in an earlier commit: use `_Thread_local` for both C and C++.  This
gives us a simpler ABI and avoids needing to import a thread-local initializer
for `errno` in libc++.so.

Signed-off-by: Joel Dice <joel.dice@fermyon.com>
@dicej
Copy link
Contributor Author

dicej commented Aug 10, 2023

I believe I've addressed all the feedback so far. Does anyone have other concerns?

dicej and others added 2 commits August 10, 2023 15:02
This and `__main_argc_argv` are only relevant for commands (not reactors), so it
makes sense to scope them accordingly.  In addition, the latter was being
imported from libc.so, forcing applications to provide it even if it wasn't
relevant.

Signed-off-by: Joel Dice <joel.dice@fermyon.com>
$(SYSROOT_LIB)/libwasi-emulated.so
endif

libc_so: include_dirs $(LIBC_SO)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this the main target to build shared library?
which target builds pic version of crt1*.o?

Copy link
Contributor Author

@dicej dicej Aug 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this the main target to build shared library?

Yes

which target builds pic version of crt1*.o?

My understanding is that crt1.o is intended to be linked into the main module, which doesn't necessarily need to be PIC since it's the one exporting memory, not importing it. I'll admit I've been focusing on reactor-style apps, though, so if there's something missing to support command-style apps I'm happy to add it.

Copy link
Contributor

@yamt yamt Aug 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which target builds pic version of crt1*.o?

My understanding is that crt1.o is intended to be linked into the main module, which doesn't necessarily need to be PIC since it's the one exporting memory, not importing it. I'll admit I've been focusing on reactor-style apps, though, so if there's something missing to support command-style apps I'm happy to add it.

i was assuming that -pie was used to produce a dynamically linked executable.
https://github.com/WebAssembly/tool-conventions/blob/main/DynamicLinking.md#llvm-implementation
but as you say, it might be possible to produce a non-pie dynamically linked executable.
i haven't tried.

btw, a reactor module is linked with crt1-reactor.o as well.
i guess you meant shared libraries, for which if crt1-reactor.o should be linked isn't obvious.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but as you say, it might be possible to produce a non-pie dynamically linked executable.
i haven't tried.

it seems clang can somehow produce non-pie dynamically linked main module.
eg.

-fPIC
-nodefaultlibs
-Xlinker --unresolved-symbols=import-dynamic
-Xlinker --export-table
-Xlinker --growable-table

a few things are not obvious to me though:

  • what to do for __stack_pointer. should the main module export it? or import it?
  • how about __heap_base and __heap_end?

on the other hand, pie main module is more straightforward and known to work as it's what emscripten -sMAIN_MODULE uses.
to produce pie main modules, we need pic (or at least dynamic-no-pic) version of crt1.

how do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I believe the main module must export __stack_pointer, __heap_base, and __heap_end (or at least some module must export them, because shared libraries will import them in general).

I'd be fine with building crt1-*.o with -fPIC unconditionally -- it's a very small amount of code, so I doubt there will be a noticeable performance penalty. Does anyone have concerns about that?

Copy link
Member

@sbc100 sbc100 Aug 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think crt1.o (or something like it) will always be needed in order to define the _initialize function (who's primary job, at least in the llvm world is to call __wasm_call_ctors).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think crt1.o (or something like it) will always be needed in order to define the _initialize function (who's primary job, at least in the llvm world is to call __wasm_call_ctors).

wasm-tools component link also synthesizes an "init" module which contains a start function that calls __wasm_apply_data_relocs, __wasm_call_ctors, initializes GOT globals, etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in the long run we should use something like _initialize exported rather than __wasm_call_ctors (which I see as an llvm-specific thing).

__wasm_apply_data_relocs is a tricky one because (at least in cyclic module graphs) we need to call that on the whole module graph before we call __wasm_call_ctors on the whole module graph.. so perhaps we need to standardize on some kind of phase 1 and phase 2 _initialize functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed an update to build crt1-*.o with -fPIC.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a data point, emscripten currently always builds the main module as -pie and then the runtime provides things like __stack_pointer, __heap_end, etc.

toywasm libdyld does the same.

I've been trying for a while to convert emscripten to use statically linked main modules but I have not had any success yet. See emscripten-core/emscripten#12682

here's my similar attempt: yamt/toywasm#112

@@ -28,7 +28,7 @@ extern hidden const struct __locale_struct __c_dot_utf8_locale;
hidden const struct __locale_map *__get_locale(int, const char *);
hidden const char *__mo_lookup(const void *, size_t, const char *);
hidden const char *__lctrans(const char *, const struct __locale_map *);
hidden const char *__lctrans_cur(const char *);
const char *__lctrans_cur(const char *);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this intended?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - libc-top-half/musl/src/string/strsignal.c (which is included in LIBWASI_EMULATED_SIGNAL_MUSL_SOURCES) uses LCTRANS_CUR (which is defined to be __lctrans_cur), which means libc.so must export it so libwasi-emulated.so can import it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this warrents and #ifdef __wasilibc_unmodified_upstream along with a comment.

This is another reason why splitting libc into multiple libraries is maybe not the best idea.

dicej and others added 2 commits August 15, 2023 11:37
This ensures they can be used in a PIE or PIC context.

Signed-off-by: Joel Dice <joel.dice@fermyon.com>
@dicej
Copy link
Contributor Author

dicej commented Aug 15, 2023

Okay, so CI's failing again, and I think it's because of my most recent change to build crt1-*.o with -fPIC. It seems that LLVM 10 generates different code than the patched version of LLVM 16 that I've been using to test. Specifically, the latter produces code that references __memory_base while the former does not, hence the undefined-symbols.txt mismatch.

This wasn't an issue earlier because the check-symbols target only looked at .a and .o files that were built without -fPIC, but now we're building crt1-*.o with -fPIC, leading to a slightly different list of undefined symbols depending on the compiler used.

Any suggestions to address this?

@sbc100
Copy link
Member

sbc100 commented Aug 15, 2023

Okay, so CI's failing again, and I think it's because of my most recent change to build crt1-*.o with -fPIC. It seems that LLVM 10 generates different code than the patched version of LLVM 16 that I've been using to test. Specifically, the latter produces code that references __memory_base while the former does not, hence the undefined-symbols.txt mismatch.

This wasn't an issue earlier because the check-symbols target only looked at .a and .o files that were built without -fPIC, but now we're building crt1-*.o with -fPIC, leading to a slightly different list of undefined symbols depending on the compiler used.

Any suggestions to address this?

We normally solve these kinds of problems by excluded certain symbols from the comparison.

See the comment down in the check-symbols target:

wasi-libc/Makefile

Lines 641 to 642 in 9f51a71

@# Ignore certain llvm builtin symbols such as those starting with __mul
@# since these dependencies can vary between llvm versions.

Whether this symbol appears varies between LLVM versions.

Signed-off-by: Joel Dice <joel.dice@fermyon.com>
}

// Allocate memory for storing the argument chars.
char *argv_buf = malloc(argv_buf_size);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this mean to rely on --gc-sections? i'm not sure if it's a good idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm not sure I understand what you mean. My understanding is that a command-style app will need to define a either main function that takes no arguments or one that takes argc and argv arguments. If the former, this __main_void will not be linked in at all (since it's a weak symbol). If the latter, this function will (and should) be used. Under what circumstances would --gc-sections affect the result?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand either, but regardless, --gc-sections is the default in wasm-ld.

Furthermore, -ffunctions-sections and -fdata-sections are also enabled by default in the llvm backend, and actually there is no way to disable -ffunctions-sections.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i meant it could break --no-gc-sections.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yamt Can you provide an example of a case where this would break --no-gc-sections? I'm still having trouble seeing the problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is certainly an option... I can see an argument though for wanting to keep libclang_rt.builtins-wasm32.a pure. Are there any other examples of files that we don't want to in the shared library but make more sense linking statically?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is certainly an option... I can see an argument though for wanting to keep libclang_rt.builtins-wasm32.a pure. Are there any other examples of files that we don't want to in the shared library but make more sense linking statically?

Nothing else has come up to my knowledge.

BTW, having __main_void in the shared library is not really a problem -- just slightly awkward since it won't be used for reactors. As long as __main_argc_argv is declared weak (which is what started this whole discussion), there's no burden left to the application developer. So we always have an option of going back to where we started and adding a comment explaining the situation per #429 (comment).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess that putting back __main_void to libc involves the least effort for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sbc100 How would you feel if I put __main_void back in libc for the time being? I can add a comment about why the __main_argc_argv declaration is weak and that our long-term goal is to move __main_void to a crt*.o file that is linked in as appropriate in a --no-gc-sections-friendly way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that sounds fine, given that moving proved problematic.

err = __wasi_args_get((uint8_t **)argv, (uint8_t *)argv_buf);
if (err != __WASI_ERRNO_SUCCESS) {
free(argv_buf);
free(argv);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this change but I wonder why we don't use alloca here to avoid pulling in malloc?

The emscripten version does this: https://github.com/emscripten-core/emscripten/blob/5ab750c822e47b0878e939f32d0f297edf632cf0/system/lib/standalone/__main_void.c#L41

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iirc the default stack size set by wasm-ld is 65536 which is exceptionally small, so using that for argv_buf sounds pretty risky.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the fact that we have a small default stack size should be used as an argument for not using the stack in places where is makes sense (like this one).

Folks who want to accept long argv lists can easily link with -Wl,-zstack-size=XX. I think the downsize of linking in a malloc implementation into all programs high enough to warrant making this change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd propose a better way to avoid the code size of malloc would be to replace malloc with an implementation that just does bump-pointer allocation and free is a no-op. That way, we don't need to roll the dice with how big we think command-line arguments might get, and it works for programs that have other malloc calls too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting some kind of auto-detection of the need to real malloc vs a bump allocator? Or some kind of opt-in for folks who think they only need a bump allocator? (e.g. -lfake-malloc?)

Copy link
Member

@sbc100 sbc100 Aug 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we could do it with weak symbols.

Code like this could call maybe_malloc and maybe_free, which are provided weakly by bumpmalloc.c but strongly as aliases of malloc/free in dlmalloc.c? That might work... but would require those aliases to be added to all malloc/free providers (dlmalloc being one of many)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking more like just -lfake-malloc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that would be unfortunate since it would result is a "bloated-by-default" outcome for small programs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand many such small programs (.e.g hello world) are not that important since they don't represent real world uses?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect most real-world programs call malloc somewhere anyway. The other side of it is, if we had more people maintaining wasi-libc, more things would become possible.

dicej and others added 2 commits August 23, 2023 09:38
@dicej
Copy link
Contributor Author

dicej commented Aug 25, 2023

Any other concerns about this PR? I believe I've addressed everything raised so far.

Copy link
Member

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm % a couple of final comments

That extra complexity we carry from splitting up libc in to chucks for different EMULATED_XX parts makes me a little sad but that is the directions we have chosen for now.

I do wonder if we should reconsider that and just emulate everything in a single libc.a/libc.so. How much value are we getting from bring stingy with emulation support by default? (I don't think we need to let that questions/discussion block that landing of this PR)

@@ -28,7 +28,7 @@ extern hidden const struct __locale_struct __c_dot_utf8_locale;
hidden const struct __locale_map *__get_locale(int, const char *);
hidden const char *__mo_lookup(const void *, size_t, const char *);
hidden const char *__lctrans(const char *, const struct __locale_map *);
hidden const char *__lctrans_cur(const char *);
const char *__lctrans_cur(const char *);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this warrents and #ifdef __wasilibc_unmodified_upstream along with a comment.

This is another reason why splitting libc into multiple libraries is maybe not the best idea.

@yamt
Copy link
Contributor

yamt commented Aug 28, 2023

+1 for sbc100's __wasilibc_unmodified_upstream comment.

otherwise, lgtm.

Signed-off-by: Joel Dice <joel.dice@fermyon.com>
Copy link
Contributor

@yamt yamt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@xphoenix
Copy link

xphoenix commented Sep 25, 2023

What is the status of this work?
Im very interested in "Shared Everything" builds: will this become available in wasi-libc in the near future or I should use branch?

@dicej
Copy link
Contributor Author

dicej commented Sep 25, 2023

What is the status of this work?

It's ready to go AFAICT; just waiting for an approval from a maintainer. @sunfishcode and/or any other maintainers: any further concerns before this can be merged?

@abrown abrown closed this Sep 28, 2023
@abrown abrown reopened this Sep 28, 2023
@abrown abrown merged commit d4dae89 into WebAssembly:main Sep 28, 2023
8 checks passed
alexcrichton added a commit to alexcrichton/wasi-libc that referenced this pull request May 7, 2024
This commit updates all of the `*.a` archives built for the
`wasm32-wasip2` target to use `-fPIC` instead of the default non-PIC
flag. This makes them suitable for linking into either a dynamic library
or a non-dynamic binary without recompiling libc. This is intended to
help integrate shared libraries into other precompiled languages, such
as Rust's standard library, as it makes its way upstream.

Lots of discussion about this also happened on WebAssembly#429 so I wanted to
highlight that this is only happening for the wasm32-wasip2 target and
no other targets. For example neither wasm32-wasi nor wasm32-wasip1 are
affected. Only users of wasm32-wasip2, of which there aren't too too
many, are possibly affected by the runtime overheads here. I have not
measured the hypothetical runtime myself, however.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants