add an assembler to the toolchain #21169

andrewrk · 2024-08-22T20:10:32Z

Prerequisite for #16270.

Builds of Zig that do not link against LLVM and Clang still need to be able to compile assembly files.

The existing commands already work, and they already support compiling assembly files: zig build-obj, zig build-exe, zig build-lib. The logic needs to be modified to use Zig's own assembler rather than invoking Clang as a subprocess.

For the x86 family specifically, let us jump on the Intel syntax train, embracing that as the better syntax. However, we also want to be able to compile the multitude of existing files from the wild without any changes. So it will need to support AT&T syntax as well.

I suggest we start by borrowing LLVM's CPU instruction data via another tool in the tools/ directory. At some point the backends should start using this data as well instead of using an ad-hoc parser, but that will be a follow-up issue.

In order to close this issue, Zig must use its own assembler for all input files, never calling the clang binary for assembly.

alexrp · 2024-12-13T04:30:15Z

Should this include a C preprocessor? A lot of assembly files in the wild (.S) are written with the assumption that they'll be run through one.

andrewrk · 2024-12-13T21:19:26Z

Yes I think so. Aro implements a C preprocessor.

Slackadays · 2024-12-24T02:01:36Z

Is a RISC-V assembler in the scope of this issue?

Rexicon226 · 2024-12-24T02:02:41Z

> Is a RISC-V assembler in the scope of this issue?

Yes, all targets that Zig supports are in the scope of this issue.

Slackadays · 2024-12-24T02:10:40Z

I'm already writing a RISC-V assembler to make my own project independent of GCC/LLVM because there are absolutely no others out there, so I'd love to help with the same here. However, it's in C++ and I don't know any Zig, so porting might be the best strategy. Here's a direct link to it: https://github.com/Slackadays/Chata/blob/main/libchata/src/assembler.cpp

alexrp · 2024-12-24T10:23:57Z

> Yes I think so. Aro implements a C preprocessor.

But this would have implications for whether the assembler is in-tree or in a separate repo like ziglang/translate-c, right? What's the thinking there?

Slackadays · 2024-12-24T17:19:24Z

> But this would have implications for whether the assembler is in-tree or in a separate repo like ziglang/translate-c, right? What's the thinking there?

Why would this matter? The preprocessor could easily be its own thing since it doesn't actually need to know any C, just the C preprocessor language. Then, the assembler could choose to use it or not depending on the input file, and all's good.
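As a rough illustration of "use it or not depending on the input file", here is a minimal Python sketch. It assumes the GCC/Clang driver convention, where an uppercase `.S` suffix means the file must be run through the C preprocessor and a lowercase `.s` suffix means it is assembled as-is. The function name `needs_preprocessing` is hypothetical and is not taken from Zig or Aro.

```python
from pathlib import PurePath

# Assumption: mirror the GCC/Clang driver convention.
#   foo.S -> assembly that must be preprocessed first
#   foo.s -> assembly that is passed to the assembler unchanged
PREPROCESSED_SUFFIXES = {".S"}
PLAIN_SUFFIXES = {".s"}


def needs_preprocessing(path: str) -> bool:
    """Return True if the assembly file should go through the C preprocessor.

    Hypothetical helper for illustration only; the comparison is
    case-sensitive on purpose, since case is the whole distinction.
    """
    suffix = PurePath(path).suffix
    if suffix in PREPROCESSED_SUFFIXES:
        return True
    if suffix in PLAIN_SUFFIXES:
        return False
    raise ValueError(f"not an assembly file: {path}")
```

With this split, the preprocessor stays an optional stage in front of the assembler, whether it ends up being a library or a separate binary.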

alexrp · 2024-12-24T17:44:52Z

It matters because Aro (and its preprocessor) is not going to keep being an in-tree dependency.

Slackadays · 2024-12-24T18:39:06Z

So let's assume Aro is no longer an in-tree dependency. Then it is now a separate repo, which doesn't change anything because Aro's preprocessor can be its own binary or library, say zigcpp for Zig C PreProcessor. At this point, whether the preprocessor is a binary or library is merely an implementation detail because it doesn't change the end result. But since the preprocessor isn't something users typically run on their own it might be simpler to just have it as a separate library.

andrewrk · 2024-12-24T22:54:15Z

I've just pushed the sans-aro branch. I hope that helps to provide guidance to this discussion.

alexrp · 2024-12-24T23:10:55Z

Thanks, that's helpful. Seems like a reasonable direction.

andrewrk · 2024-12-24T23:13:36Z

Assemblers can start as independent processes (lib/compiler/foo.zig) and then we can determine how to integrate them into new inline assembly (#10761).

They should parse into MIR and use the common MIR lowering code because that will be the method of integration with the compiler.

Instruction data (i.e. arch/x86_64/encodings.zig) should take advantage of ZON as soon as possible (#20271) since it will provide a faster and more memory efficient representation than a large zig source file with the same data.

alexrp · 2024-12-26T05:20:36Z

> They should parse into MIR and use the common MIR lowering code because that will be the method of integration with the compiler.

Hmm, I don't know if I agree that MIR is at the right level of abstraction for this - at least as it is today.

For inline assembly, it's probably fine, since we likely don't want to allow a lot of the nonsense that you can get away with in GCC-style inline assembly. I imagine that for #10761, for the most part, we will want to limit inline assembly to just machine code and data embedded directly in between instructions.

But for a full assembler, you're kind of in crazy land. You can be emitting machine code in a function and then do .pushsection into some completely unrelated section, emit whatever into it, do .popsection, and go right back to emitting machine code where you were previously. And of course, you can manipulate symbol state like ELF visibility at any point. (You might enjoy reading this page.)

As I understand it, MIR currently has a function view, but a full assembler really needs a whole-object view, and it doesn't seem to me like MIR is the right tool for the job.
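To make the whole-object point concrete, here is a toy Python sketch of the state an assembler has to carry for `.pushsection` / `.popsection`. It is a simplification under stated assumptions: the class `ObjectState` and its methods are hypothetical, real assemblers also track subsections, flags, and symbol state, and nothing here reflects Zig's MIR or any existing implementation.

```python
class ObjectState:
    """Toy whole-object view: one byte buffer per section plus a section stack."""

    def __init__(self, initial: str = ".text"):
        self.sections = {initial: bytearray()}
        self.current = initial
        self.stack = []  # sections saved by pushsection

    def pushsection(self, name: str) -> None:
        # Remember where we were, then switch to (or create) the new section.
        self.stack.append(self.current)
        self.sections.setdefault(name, bytearray())
        self.current = name

    def popsection(self) -> None:
        if not self.stack:
            raise ValueError(".popsection without matching .pushsection")
        self.current = self.stack.pop()

    def emit(self, data: bytes) -> None:
        # Bytes always land in whichever section is current.
        self.sections[self.current] += data


obj = ObjectState()
obj.emit(b"\x90")            # x86 nop, goes into .text
obj.pushsection(".rodata")   # detour into an unrelated section
obj.emit(b"hi\x00")
obj.popsection()             # back to .text, mid-function
obj.emit(b"\xc3")            # x86 ret, continues the same .text stream
```

The function's bytes end up contiguous in `.text` even though the source interleaved another section in the middle, which is the kind of bookkeeping a per-function representation has no natural place for.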

andrewrk added the enhancement label Aug 22, 2024

andrewrk added this to the 0.15.0 milestone Aug 22, 2024

This was referenced Aug 22, 2024

make the main zig executable no longer depend on LLVM, LLD, and Clang libraries #16270

Open

CPU features are not passed to clang when assembling #10411

Open

alexrp mentioned this issue Aug 26, 2024

parse inline assembly syntax according to a set of dialects; integrate inline assembly more closely with the zig language #10761

Open
