ReleaseSafe executables for aarch64-linux only prints out first step of backtrace #22161

Closed
leroycep opened this issue Dec 6, 2024 · 9 comments · Fixed by #22180
@leroycep
Contributor

leroycep commented Dec 6, 2024

Zig Version

0.14.0-dev.2379+6188cb8e5

Steps to Reproduce and Observed Behavior

// ./main.zig
const std = @import("std");

noinline fn foo(x: u32) u32 {
    return x * x;
}

noinline fn bar() u32 {
    return foo(std.math.maxInt(u32));
}

pub fn main() !void {
    std.debug.print("{}", .{bar()});
}
~/code/zig/build/example-elfsymtab-backtrace> ../stage3/bin/zig build-exe main.zig -target aarch64-linux-none -OReleaseSafe; try { /usr/bin/qemu-aarch64 ./main }
thread 3134781 panic: integer overflow
/home/geemili/code/zig/build/example-elfsymtab-backtrace/main.zig:4:14: 0x1018b87 in foo (main)
    return x * x;
             ^
qemu: uncaught target signal 6 (Aborted) - core dumped

Expected Behavior

Full backtrace, like in debug mode:

~/code/zig/build/example-elfsymtab-backtrace> ../stage3/bin/zig build-exe main.zig -target aarch64-linux-none; try { /usr/bin/qemu-aarch64 ./main }
thread 3136170 panic: integer overflow
/home/geemili/code/zig/build/example-elfsymtab-backtrace/main.zig:4:14: 0x104bf63 in foo (main)
    return x * x;
             ^
/home/geemili/code/zig/build/example-elfsymtab-backtrace/main.zig:8:15: 0x104a667 in bar (main)
    return foo(std.math.maxInt(u32));
              ^
/home/geemili/code/zig/build/example-elfsymtab-backtrace/main.zig:12:32: 0x104a637 in main (main)
    std.debug.print("{}", .{bar()});
                               ^
/home/geemili/code/zig/lib/std/start.zig:617:37: 0x104a60f in posixCallMainAndExit (main)
            const result = root.main() catch |err| {
                                    ^
???:?:?: 0x0 in ??? (???)
qemu: uncaught target signal 6 (Aborted) - core dumped
@leroycep leroycep added the bug Observed behavior contradicts documented or intended behavior label Dec 6, 2024
@leroycep
Contributor Author

leroycep commented Dec 6, 2024

Just thought to check out the behavior on 0.13:

~/code/zig/build/example-elfsymtab-backtrace> zig build-exe main.zig -target aarch64-linux-none; try { /usr/bin/qemu-aarch64 ./main }
thread 3179766 panic: integer overflow
/home/geemili/code/zig/build/example-elfsymtab-backtrace/main.zig:4:14: 0x1049bc7 in foo (main)
    return x * x;
             ^
/home/geemili/code/zig/build/example-elfsymtab-backtrace/main.zig:8:15: 0x10473ef in bar (main)
    return foo(std.math.maxInt(u32));
              ^
/home/geemili/code/zig/build/example-elfsymtab-backtrace/main.zig:12:32: 0x10473bf in main (main)
    std.debug.print("{}", .{bar()});
                               ^
/usr/lib/zig/std/start.zig:524:37: 0x1047397 in posixCallMainAndExit (main)
            const result = root.main() catch |err| {
                                    ^
???:?:?: 0x0 in ??? (???)
qemu: uncaught target signal 6 (Aborted) - core dumped
~/code/zig/build/example-elfsymtab-backtrace> zig build-exe main.zig -target aarch64-linux-none -OReleaseSafe; try { /usr/bin/qemu-aarch64 ./main }
thread 3179827 panic: integer overflow
qemu: uncaught target signal 6 (Aborted) - core dumped
~/code/zig/build/example-elfsymtab-backtrace> zig build-exe main.zig -target aarch64-linux-none -OReleaseSafe -fno-strip; try { /usr/bin/qemu-aarch64 ./main }
thread 3179901 panic: integer overflow
qemu: uncaught target signal 6 (Aborted) - core dumped
~/code/zig/build/example-elfsymtab-backtrace> zig version
0.13.0

Debug stack traces still work, but ReleaseSafe binaries produce no stack trace at all.

@alexrp
Member

alexrp commented Dec 7, 2024

It works with -fno-omit-frame-pointer. Same story for arm-linux-none and riscv64-linux-none.

However, this is not the case for other targets, e.g. powerpc64le-linux-none.
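For projects that build through build.zig rather than the command line, the same workaround can probably be applied per module. This is a minimal sketch, assuming the std.Build API as it looked around 0.13/0.14 (`addExecutable` taking `root_source_file`, and a writable `omit_frame_pointer` field on `root_module`); the artifact name and source path are placeholders, and other Zig versions may need different spellings:

```zig
const std = @import("std");

pub fn build(b: *std.Build) void {
    const target = b.standardTargetOptions(.{});
    const optimize = b.standardOptimizeOption(.{});

    // Placeholder name and path; adjust to the project.
    const exe = b.addExecutable(.{
        .name = "main",
        .root_source_file = b.path("main.zig"),
        .target = target,
        .optimize = optimize,
    });

    // Assumed equivalent of passing -fno-omit-frame-pointer on the command line:
    // keep frame pointers regardless of the optimization mode.
    exe.root_module.omit_frame_pointer = false;

    b.installArtifact(exe);
}
```

Building with something like `zig build -Dtarget=aarch64-linux-none -Doptimize=ReleaseSafe` should then behave like the command-line invocation with the flag added.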

@leroycep
Contributor Author

leroycep commented Dec 7, 2024

Oh, I hadn't realized that -fomit-frame-pointer worked on architectures other than x86.

Hmm, I might check if it works on even older versions of zig.

@alexrp
Member

alexrp commented Dec 7, 2024

I think we should strongly consider defaulting omit_frame_pointer to false for any configuration that includes debug symbols or unwind tables.

Specifically, this:

zig/src/Package/Module.zig

Lines 205 to 210 in 4894ac4

    const omit_frame_pointer = b: {
        if (options.inherited.omit_frame_pointer) |x| break :b x;
        if (options.parent) |p| break :b p.omit_frame_pointer;
        if (optimize_mode == .Debug) break :b false;
        break :b true;
    };

Should instead be something like:

    const omit_frame_pointer = b: {
        if (options.inherited.omit_frame_pointer) |x| break :b x;
        if (options.parent) |p| break :b p.omit_frame_pointer;
        if (unwind_tables) break :b false;
        if (!strip) break :b false;
        break :b true;
    };

@leroycep
Contributor Author

leroycep commented Dec 7, 2024

Turning on frame pointers by default would also match several Linux distributions (Fedora, Ubuntu, Arch) that have turned frame pointers back on by default; see Brendan Gregg's post on The Return of Frame Pointers.

@alexrp
Member

alexrp commented Dec 7, 2024

Hmmmm... maybe just having frame pointers on by default except for ReleaseSmall would make sense. If the overhead for the overwhelming majority of applications is around 1%, this seems like a completely reasonable call to me, especially given the debugging and profiling advantages.

alexrp added a commit to alexrp/zig that referenced this issue Dec 7, 2024
Frame pointers make both debugging and profiling work better, and the overhead
is reportedly 1% or less for typical programs [0]. I think the pros outweigh the
cons here. People who *really* care about that 1% can simply use the
-fomit-frame-pointer option to reclaim it. For ReleaseSmall, though, it makes
sense to omit frame pointers by default for the sake of code size, as we already
strip the binary in this case anyway.

Closes ziglang#22161.

[0] https://www.brendangregg.com/blog/2024-03-17/the-return-of-the-frame-pointers.html
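The default described in the commit message can be sketched as a small standalone function. This is only an illustration of the resolution order discussed in this thread (explicit flag, then parent module, then a default based on the optimization mode). `resolveOmitFramePointer` and its parameters are hypothetical names, and this is not the code in src/Package/Module.zig, which may take more factors into account:

```zig
const std = @import("std");

/// Hypothetical stand-in for the resolution order discussed in this thread.
fn resolveOmitFramePointer(
    explicit: ?bool, // from -fomit-frame-pointer / -fno-omit-frame-pointer, if given
    parent: ?bool, // value inherited from a parent module, if any
    optimize_mode: std.builtin.OptimizeMode,
) bool {
    if (explicit) |x| return x;
    if (parent) |p| return p;
    // Proposed default: keep frame pointers everywhere except ReleaseSmall.
    return optimize_mode == .ReleaseSmall;
}

test "frame pointers are kept by default outside ReleaseSmall" {
    try std.testing.expect(!resolveOmitFramePointer(null, null, .ReleaseSafe));
    try std.testing.expect(resolveOmitFramePointer(null, null, .ReleaseSmall));
}
```

Under this sketch an explicit flag always wins, so anyone who wants the last bit of performance can still opt out.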
@leroycep
Contributor Author

leroycep commented Dec 7, 2024

I think -fno-omit-frame-pointer will fix stack traces for most cases, but it still means that DWARF unwinding is broken on aarch64, which is probably something that should be fixed?

@alexrp
Member

alexrp commented Dec 7, 2024

DWARF-based unwinding should ideally work for every target, yeah. But what makes you think DWARF unwinding actually is broken for aarch64?

@andrewrk andrewrk added this to the 0.14.0 milestone Dec 7, 2024
@andrewrk andrewrk removed the bug Observed behavior contradicts documented or intended behavior label Dec 7, 2024
@leroycep
Contributor Author

leroycep commented Dec 7, 2024

I don't know, but I do suspect it. It doesn't seem to be bad DWARF debug info, because gdb can generate a stack trace:

~/code/zig-elf-symbol-debuginfo/debug/example-elfsymtab-backtrace> zig build-exe main.zig -target aarch64-linux-none -OReleaseSafe
~/code/zig-elf-symbol-debuginfo/debug/example-elfsymtab-backtrace> ~/Downloads/ziglang-qemu-static/qemu-linux-x86_64-9.2.0-rc1/bin/qemu-aarch64 -g 12345 ./main
thread 3864598 panic: integer overflow
/home/geemili/code/zig-elf-symbol-debuginfo/debug/example-elfsymtab-backtrace/main.zig:4:14: 0x1018bbf in foo (main)
    return x * x;
             ^
qemu-aarch64: QEMU: Terminated via GDBstub
~/code/zig-elf-symbol-debuginfo/debug/example-elfsymtab-backtrace> ~/Downloads/ziglang-qemu-static/qemu-linux-x86_64-9.2.0-rc1/bin/qemu-aarch64 -g 12345 ./main
nu ❯ gdb -ex "set confirm off" -ex "target remote aitxero:12345" -ex "continue" -ex "bt" -ex "continue" -ex "quit"
GNU gdb (GDB) 15.2
Copyright (C) 2024 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word".
Remote debugging using aitxero:12345
Reading /home/geemili/code/zig-elf-symbol-debuginfo/debug/example-elfsymtab-backtrace/main from remote target...
warning: File transfers from remote targets can be slow. Use "set sysroot" to access files locally instead.
Reading /home/geemili/code/zig-elf-symbol-debuginfo/debug/example-elfsymtab-backtrace/main from remote target...
Reading symbols from target:/home/geemili/code/zig-elf-symbol-debuginfo/debug/example-elfsymtab-backtrace/main...
Reading /nix/store/wplxj47zl5y5hslzcy8ibwvhvxs87i69-gdb-15.2/lib/debug/.build-id/35/261b9dc23e3b6de986ae686cf55fd868b67edc.debug from remote target...
Downloading separate debug info for system-supplied DSO at 0x7f2b22dd6000
start._start () at /home/geemili/code/zig-elf-symbol-debuginfo/lib/std/start.zig:248                                                        
warning: 248	/home/geemili/code/zig-elf-symbol-debuginfo/lib/std/start.zig: No such file or directory
Continuing.

Program received signal SIGABRT, Aborted.
posix.sigprocmask (flags=2, set=0x7f2b22dbff60, oldset=0x0) at /home/geemili/code/zig-elf-symbol-debuginfo/lib/std/posix.zig:5728
warning: 5728	/home/geemili/code/zig-elf-symbol-debuginfo/lib/std/posix.zig: No such file or directory
#0  posix.sigprocmask (flags=2, set=0x7f2b22dbff60, oldset=0x0) at /home/geemili/code/zig-elf-symbol-debuginfo/lib/std/posix.zig:5728
#1  posix.raise (sig=6 '\006') at /home/geemili/code/zig-elf-symbol-debuginfo/lib/std/posix.zig:729
#2  0x000000000103cf4c in posix.abort () at /home/geemili/code/zig-elf-symbol-debuginfo/lib/std/posix.zig:673
#3  0x000000000103c638 in debug.defaultPanic (error_return_trace=0x0, first_trace_addr=...)
    at /home/geemili/code/zig-elf-symbol-debuginfo/lib/std/io.zig:324
#4  0x0000000001018bc0 in main.foo (x=4294967295) at main.zig:4
#5  0x0000000001018a28 in main.bar () at main.zig:8
#6  0x0000000001018a20 in main.main () at main.zig:12
#7  0x0000000001018968 in start.callMain () at /home/geemili/code/zig-elf-symbol-debuginfo/lib/std/start.zig:617
#8  start.callMainWithArgs () at /home/geemili/code/zig-elf-symbol-debuginfo/lib/std/start.zig:577
#9  start.posixCallMainAndExit (argc_argv_ptr=<optimized out>) at /home/geemili/code/zig-elf-symbol-debuginfo/lib/std/start.zig:532
#10 0x0000000000000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Continuing.

Program terminated with signal SIGABRT, Aborted.
The program no longer exists.

However, if it is in the DWARF unwinding, then it's either a miscompilation or some kind of memory referencing issue.

It reminds me of an error I introduced on x86_64 during #22077, which made release binaries not work:

~/code/zig/build-example-elfsymtab-backtrace> ./main                                                                                                                                    11/26/2024 07:37:00 PM
thread 76315 panic: integer overflow
Unwind error at address `exe:0x101b443` (error.MissingFDE), trace may be incomplete

Error: nu::shell::core_dumped

  × External command core dumped
   ╭─[entry #12:1:1]
 1 │ ./main
   · ───┬──
   ·    ╰── core dumped with SIGABRT (6)
   ╰────

~/code/zig/build-example-elfsymtab-backtrace> readelf -S main | rg eh_frame                                                                                                          -6 11/26/2024 07:37:05 PM
  [ 2] .eh_frame_hdr     PROGBITS         0000000001007da4  00007da4
  [ 3] .eh_frame         PROGBITS         0000000001008448  00008448

Which I then fixed with this change:

diff --git a/lib/std/debug/SelfInfo.zig b/lib/std/debug/SelfInfo.zig
index 81e489dfc4..1e06ac10cb 100644
--- a/lib/std/debug/SelfInfo.zig
+++ b/lib/std/debug/SelfInfo.zig
@@ -1112,7 +1112,7 @@
         _ = allocator;
         _ = address;
         return switch (this.*) {
-            .dwarf => |dwarf_info| &dwarf_info.dwarf,
+            .dwarf => |*dwarf_info| &dwarf_info.dwarf,
             .symtab => null,
         };
     }

But it isn't exactly the same, obviously.
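For readers unfamiliar with the difference that one-character fix makes, here is a minimal sketch of by-value versus by-pointer capture in a switch prong. `Dwarf`, `DwarfInfo`, and `Module` are hypothetical stand-ins, not the real types in SelfInfo.zig:

```zig
const std = @import("std");

// Hypothetical types, standing in for the ones in SelfInfo.zig.
const Dwarf = struct { sections_loaded: u32 = 0 };
const DwarfInfo = struct { dwarf: Dwarf };

const Module = union(enum) {
    dwarf: DwarfInfo,
    symtab: void,

    fn getDwarf(this: *Module) ?*const Dwarf {
        return switch (this.*) {
            // `|*dwarf_info|` captures a pointer into `this.*`, so the returned
            // address stays valid for as long as the union itself does.
            // `|dwarf_info|` captures by value, so its address is not guaranteed
            // to point into the union and must not escape the prong.
            .dwarf => |*dwarf_info| &dwarf_info.dwarf,
            .symtab => null,
        };
    }
};

test "pointer capture points into the union" {
    var module: Module = .{ .dwarf = .{ .dwarf = .{} } };
    const ptr = module.getDwarf().?;
    try std.testing.expect(ptr == &module.dwarf.dwarf);
}
```

With the by-value form, optimized builds are free to hand back the address of a temporary, which would be consistent with a failure that only shows up in release binaries.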
