Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always generate GEP i8 / ptradd for struct offsets #121665

Merged
merged 5 commits into from
Mar 4, 2024

Conversation

erikdesjardins
Copy link
Contributor

@erikdesjardins erikdesjardins commented Feb 27, 2024

This implements #98615, and goes a bit further to remove struct_gep entirely.

Upstream LLVM is in the beginning stages of migrating to ptradd. LLVM 19 will canonicalize all constant-offset GEPs to i8, which has roughly the same effect as this change.

Fixes #121719.

Split out from #121577.

r? @nikic

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 27, 2024
@rustbot
Copy link
Collaborator

rustbot commented Feb 27, 2024

Some changes occurred in compiler/rustc_codegen_gcc

cc @antoyo, @GuillaumeGomez

Comment on lines +97 to +115
// typedef struct va_list {
// void * stack; // next stack param
// void * gr_top; // end of GP arg reg save area
// void * vr_top; // end of FP/SIMD arg reg save area
// int gr_offs; // offset from gr_top to next GP register arg
// int vr_offs; // offset from vr_top to next FP/SIMD register arg
// } va_list;
let va_list_addr = list.immediate();
let va_list_layout = list.deref(bx.cx).layout;
let va_list_ty = va_list_layout.llvm_type(bx);

// There is no padding between fields since `void*` is size=8 align=8, `int` is size=4 align=4.
// See https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst
// Table 1, Byte size and byte alignment of fundamental data types
// Table 3, Mapping of C & C++ built-in data types
let ptr_offset = 8;
let i32_offset = 4;
let gr_top = bx.inbounds_ptradd(va_list_addr, bx.cx.const_usize(ptr_offset));
let vr_top = bx.inbounds_ptradd(va_list_addr, bx.cx.const_usize(2 * ptr_offset));
let gr_offs = bx.inbounds_ptradd(va_list_addr, bx.cx.const_usize(3 * ptr_offset));
let vr_offs = bx.inbounds_ptradd(va_list_addr, bx.cx.const_usize(3 * ptr_offset + i32_offset));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally I'd feel bad about this kind of manual layout computation, but this file is already packed full of hardcoded offsets and assumptions.

At some point we should re-evaluate whether we can use the LLVM va_arg directly.

To that point, I'm confused about the comment

The LLVM va_arg instruction is lacking in some instances, so we should only use it as a fallback.

...does Clang also implement this logic manually?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. LLVM types are not rich enough to encode all the necessary ABI requirements to make something like the va_arg instruction work correctly. It would probably be best to remove it entirely.

Comment on lines 105 to 109
let llval = match self.layout.abi {
_ if offset.bytes() == 0 => {
// Unions and newtypes only use an offset of 0.
// Also handles the first field of Scalar, ScalarPair, and Vector layouts.
self.llval
}
Abi::ScalarPair(..) => {
// FIXME(nikic): Generate this for all ABIs.
bx.inbounds_gep(bx.type_i8(), self.llval, &[bx.const_usize(offset.bytes())])
}
Abi::Scalar(_) | Abi::Vector { .. } if field.is_zst() => {
// ZST fields (even some that require alignment) are not included in Scalar,
// ScalarPair, and Vector layouts, so manually offset the pointer.
bx.gep(bx.cx().type_i8(), self.llval, &[bx.const_usize(offset.bytes())])
}
Abi::Scalar(_) => {
// All fields of Scalar layouts must have been handled by this point.
// Vector layouts have additional fields for each element of the vector, so don't panic in that case.
bug!(
"offset of non-ZST field `{:?}` does not match layout `{:#?}`",
field,
self.layout
);
}
_ => {
let ty = bx.backend_type(self.layout);
bx.struct_gep(ty, self.llval, bx.cx().backend_field_index(self.layout, ix))
}
let llval = if offset.bytes() == 0 {
self.llval
} else {
bx.inbounds_ptradd(self.llval, bx.const_usize(offset.bytes()))
};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old code used non-inbounds GEP for ZSTs. But it's okay for inbounds to point one-past-the end:

The base pointer has an in bounds address of an allocated object, which means that it points into an allocated object, or to its end.

...which handles the ZST being "outside" of e.g. struct Foo(u64, ()) (where it is at an offset of 8 bytes), so I think using inbounds for ZST fields is okay. Unless ZSTs can be arbitrarily far outside the bounds of a layout? (@oli-obk?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Miri doesn't catch it yet: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=682be508e94b238472eccdb9e1fdef3b

And it seems to be an open question on whether it is ok or not: rust-lang/unsafe-code-guidelines#93

Tho looks like it's leaning towards it being ok I think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only the initial field offset in that example (x.1) will go through this code, so I think it would be fine...but I'll change it back to non-inbounds for ZSTs anyways, so that the only functional change in this PR is the GEP i8/ptradd change, and since ZST field offsets should never be performance critical, so it doesn't really matter.

compiler/rustc_codegen_ssa/src/mir/place.rs Outdated Show resolved Hide resolved
@Noratrieb
Copy link
Member

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 27, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 27, 2024
Always generate GEP i8 / ptradd for struct offsets

This implements rust-lang#98615, and goes a bit further to remove `struct_gep` entirely.

Upstream LLVM is in the beginning stages of [migrating to `ptradd`](https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699). LLVM 19 will [canonicalize](llvm/llvm-project#68882) all constant-offset GEPs to i8, which has roughly the same effect as this change.

Split out from rust-lang#121577.

r? `@nikic`
@bors
Copy link
Contributor

bors commented Feb 27, 2024

⌛ Trying commit 4724cd4 with merge 85edff9...

@bors
Copy link
Contributor

bors commented Feb 27, 2024

☀️ Try build successful - checks-actions
Build commit: 85edff9 (85edff996ef166238dcfa53fcba0386f7a6cc2f5)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (85edff9): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.3% [0.2%, 0.4%] 3
Regressions ❌
(secondary)
0.5% [0.1%, 1.0%] 15
Improvements ✅
(primary)
-0.6% [-1.6%, -0.3%] 11
Improvements ✅
(secondary)
-0.6% [-1.2%, -0.2%] 10
All ❌✅ (primary) -0.4% [-1.6%, 0.4%] 14

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
3.1% [3.1%, 3.1%] 1
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) - - 0

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
2.4% [2.3%, 2.5%] 2
Improvements ✅
(primary)
-1.0% [-1.6%, -0.6%] 20
Improvements ✅
(secondary)
-1.0% [-1.6%, -0.7%] 3
All ❌✅ (primary) -1.0% [-1.6%, -0.6%] 20

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.5% [0.1%, 1.0%] 12
Regressions ❌
(secondary)
0.4% [0.0%, 1.0%] 18
Improvements ✅
(primary)
-0.4% [-0.7%, -0.0%] 9
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.1% [-0.7%, 1.0%] 21

Bootstrap: 649.3s -> 644.43s (-0.75%)
Artifact size: 311.08 MiB -> 311.01 MiB (-0.02%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Feb 27, 2024
@nikic
Copy link
Contributor

nikic commented Mar 1, 2024

We did not encounter any significant issues when we started canonicalizing these upstream, so I think it's fairly safe to do this step now. We could wait until the LLVM 19 release, but it doesn't really seem necessary.

@bors r+ rollup=never

@bors
Copy link
Contributor

bors commented Mar 1, 2024

📌 Commit 4724cd4 has been approved by nikic

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 1, 2024
@RalfJung
Copy link
Member

RalfJung commented Mar 2, 2024

This PR is at the top of the bors queue but not being processed...

@RalfJung RalfJung closed this Mar 2, 2024
@RalfJung RalfJung reopened this Mar 2, 2024
@nikic
Copy link
Contributor

nikic commented Mar 2, 2024

@bors r+

@bors
Copy link
Contributor

bors commented Mar 2, 2024

📌 Commit 4016510 has been approved by nikic

It is now in the queue for this repository.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (70aa0b8): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.3% [0.2%, 0.4%] 3
Regressions ❌
(secondary)
0.5% [0.2%, 1.0%] 12
Improvements ✅
(primary)
-0.7% [-1.6%, -0.3%] 14
Improvements ✅
(secondary)
-0.5% [-1.0%, -0.3%] 12
All ❌✅ (primary) -0.5% [-1.6%, 0.4%] 17

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-1.6% [-1.6%, -1.6%] 1
All ❌✅ (primary) - - 0

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-1.2% [-1.6%, -0.9%] 4
Improvements ✅
(secondary)
-7.3% [-7.4%, -7.2%] 2
All ❌✅ (primary) -1.2% [-1.6%, -0.9%] 4

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.5% [0.1%, 1.0%] 12
Regressions ❌
(secondary)
1.0% [0.0%, 1.3%] 37
Improvements ✅
(primary)
-0.4% [-0.7%, -0.0%] 9
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.1% [-0.7%, 1.0%] 21

Bootstrap: 649.87s -> 644.02s (-0.90%)
Artifact size: 175.04 MiB -> 174.95 MiB (-0.06%)

@erikdesjardins erikdesjardins deleted the ptradd branch March 4, 2024 01:55
@nnethercote
Copy link
Contributor

Are the binary size increases expected?

@erikdesjardins
Copy link
Contributor Author

It's not unexpected per se, since an intended consequence of this is more GEPs being merged, which can push things under inlining/vectorization thresholds.

It looks like almost all of the size regressions are nearly the exact same magnitude:

image
js diff
[...document.querySelectorAll('tr.toggle')].forEach(e => {
    const [before, after] = [...e.querySelectorAll('td.numeric')];

    const td = document.createElement('td');
    const diff = after.querySelector('abbr').title - before.querySelector('abbr').title;

    td.textContent = diff > 0 ? '+' + diff : diff;
    
    e.append(td);
});

which suggests that something in the std "runtime" is getting slightly bigger.

I'll diff one of those and see if I can figure out what it is

@Mark-Simulacrum Mark-Simulacrum added the perf-regression-triaged The performance regression has been triaged. label Mar 5, 2024
GuillaumeGomez pushed a commit to GuillaumeGomez/rust that referenced this pull request Mar 5, 2024
Always generate GEP i8 / ptradd for struct offsets

This implements rust-lang#98615, and goes a bit further to remove `struct_gep` entirely.

Upstream LLVM is in the beginning stages of [migrating to `ptradd`](https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699). LLVM 19 will [canonicalize](llvm/llvm-project#68882) all constant-offset GEPs to i8, which has roughly the same effect as this change.

Fixes rust-lang#121719.

Split out from rust-lang#121577.

r? `@nikic`
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 6, 2024
Use GEP inbounds for ZST and DST field offsets

ZST field offsets have been non-`inbounds` since I made [this old layout change](https://github.com/rust-lang/rust/pull/73453/files#diff-160634de1c336f2cf325ff95b312777326f1ab29fec9b9b21d5ee9aae215ecf5). Before that, they would have been `inbounds` due to using `struct_gep`. Using `inbounds` for ZSTs likely doesn't matter for performance, but I'd like to remove the special case.

DST field offsets have been non-`inbounds` since the alignment-aware DST field offset computation was first [implemented](erikdesjardins@a2557d4#diff-04fd352da30ca186fe0bb71cc81a503d1eb8a02ca17a3769e1b95981cd20964aR1188) in 1.6 (back then `GEPi()` would be used for `inbounds`), but I don't think there was any reason for it.

Split out from rust-lang#121577 / rust-lang#121665.

r? `@oli-obk`

cc `@RalfJung` -- is there some weird situation where field offsets can't be `inbounds`?

Note that it's fine for `inbounds` offsets to be one-past-the-end, so it's okay even if there's a ZST as the last field in the layout:

> The base pointer has an in bounds address of an allocated object, which means that it points into an allocated object, or to its end. [(link)](https://llvm.org/docs/LangRef.html#getelementptr-instruction)

For rust-lang/unsafe-code-guidelines#93, zero-offset GEP is (now) always `inbounds`:

> Note that getelementptr with all-zero indices is always considered to be inbounds, even if the base pointer does not point to an allocated object. [(link)](https://llvm.org/docs/LangRef.html#getelementptr-instruction)
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 8, 2024
Use GEP inbounds for ZST and DST field offsets

ZST field offsets have been non-`inbounds` since I made [this old layout change](https://github.com/rust-lang/rust/pull/73453/files#diff-160634de1c336f2cf325ff95b312777326f1ab29fec9b9b21d5ee9aae215ecf5). Before that, they would have been `inbounds` due to using `struct_gep`. Using `inbounds` for ZSTs likely doesn't matter for performance, but I'd like to remove the special case.

DST field offsets have been non-`inbounds` since the alignment-aware DST field offset computation was first [implemented](erikdesjardins@a2557d4#diff-04fd352da30ca186fe0bb71cc81a503d1eb8a02ca17a3769e1b95981cd20964aR1188) in 1.6 (back then `GEPi()` would be used for `inbounds`), but I don't think there was any reason for it.

Split out from rust-lang#121577 / rust-lang#121665.

r? `@oli-obk`

cc `@RalfJung` -- is there some weird situation where field offsets can't be `inbounds`?

Note that it's fine for `inbounds` offsets to be one-past-the-end, so it's okay even if there's a ZST as the last field in the layout:

> The base pointer has an in bounds address of an allocated object, which means that it points into an allocated object, or to its end. [(link)](https://llvm.org/docs/LangRef.html#getelementptr-instruction)

For rust-lang/unsafe-code-guidelines#93, zero-offset GEP is (now) always `inbounds`:

> Note that getelementptr with all-zero indices is always considered to be inbounds, even if the base pointer does not point to an allocated object. [(link)](https://llvm.org/docs/LangRef.html#getelementptr-instruction)
@erikdesjardins
Copy link
Contributor Author

erikdesjardins commented Mar 10, 2024

The regression is all in backtrace-related code:

_ZN3std12backtrace_rs9symbolize5gimli7Context3new                                                              +288 bytes (17760 -> 18048) +047 insts (3525 -> 3572), +0 funcs ( 1 ->  1)
_ZN9addr2line16ResUnit$LT$R$GT$25find_function_or_location28_$u7b$$u7b$closure$u7d$$u7d$                       +256 bytes (11792 -> 12048) +023 insts (2507 -> 2530), +0 funcs ( 1 ->  1)
_ZN3std12backtrace_rs9symbolize5gimli7resolve                                                                  +240 bytes (14800 -> 15040) +025 insts (2709 -> 2734), +0 funcs ( 1 ->  1)
_ZN4core6option19Option$LT$$RF$T$GT$6cloned                                                                    +176 bytes (    0 ->   176) +056 insts (   0 ->   56), +1 funcs ( 0 ->  1)
_ZN5gimli4read8rnglists20RngListIter$LT$R$GT$4next                                                             +160 bytes ( 3248 ->  3408) +043 insts ( 813 ->  856), +0 funcs ( 1 ->  1)
_ZN9addr2line11render_file                                                                                     +112 bytes (  848 ->   960) +034 insts ( 201 ->  235), +0 funcs ( 1 ->  1)
_ZN93_$LT$gimli..read..line..LineProgramHeader$LT$R$C$Offset$GT$$u20$as$u20$core..clone..Clone$GT$5clone       +096 bytes ( 1120 ->  1216) +033 insts ( 244 ->  277), +0 funcs ( 1 ->  1)
_ZN5gimli4read5dwarf13Unit$LT$R$GT$3new                                                                        +064 bytes (10224 -> 10288) +016 insts (2063 -> 2079), +0 funcs ( 1 ->  1)
_ZN5gimli4read4unit18Attribute$LT$R$GT$5value                                                                  +032 bytes ( 1328 ->  1360) +006 insts ( 382 ->  388), +0 funcs ( 1 ->  1)
_ZN9addr2line30LoopingLookup$LT$T$C$L$C$F$GT$10new_lookup                                                      +032 bytes ( 1536 ->  1568) -011 insts ( 352 ->  341), +0 funcs ( 1 ->  1)
_ZN9addr2line5Lines5parse                                                                                      +016 bytes ( 9920 ->  9936) +012 insts (2144 -> 2156), +0 funcs ( 1 ->  1)
_ZN5gimli4read5dwarf14Dwarf$LT$R$GT$11attr_string                                                              +016 bytes (  592 ->   608) +004 insts ( 165 ->  169), +0 funcs ( 1 ->  1)
_ZN5alloc7raw_vec11finish_grow                                                                                 +016 bytes (  576 ->   592) +003 insts ( 188 ->  191), +0 funcs ( 4 ->  4)
_ZN3std2io5Write18write_all_vectored                                                                           +016 bytes ( 1328 ->  1344) +003 insts ( 347 ->  350), +0 funcs ( 2 ->  2)
_ZN9addr2line16ResUnit$LT$R$GT$18dwarf_and_unit_dwo                                                            +016 bytes ( 1376 ->  1392) +002 insts ( 289 ->  291), +0 funcs ( 1 ->  1)
_ZN11miniz_oxide7inflate4core10decompress                                                                      +016 bytes ( 7888 ->  7904) +001 insts (1913 -> 1914), +0 funcs ( 1 ->  1)
_ZN5alloc7raw_vec19RawVec$LT$T$C$A$GT$7reserve21do_reserve_and_handle                                          -016 bytes (  944 ->   928) +008 insts ( 232 ->  240), +0 funcs ( 5 ->  5)
_ZN5gimli4read6abbrev13Abbreviations6insert                                                                    -016 bytes ( 4864 ->  4848) +006 insts ( 918 ->  924), +0 funcs ( 1 ->  1)
_ZN9addr2line8function17Function$LT$R$GT$14parse_children                                                      -016 bytes ( 5376 ->  5360) +003 insts (1088 -> 1091), +0 funcs ( 1 ->  1)
_ZN3std4path7PathBuf14_set_extension                                                                           -016 bytes (  624 ->   608) +000 insts ( 167 ->  167), +0 funcs ( 1 ->  1)
_ZN5alloc3ffi5c_str7CString19_from_vec_unchecked                                                               -016 bytes (  352 ->   336) +000 insts (  97 ->   97), +0 funcs ( 1 ->  1)
_ZN91_$LT$std..sys_common..backtrace.._print..DisplayBacktrace$u20$as$u20$core..fmt..Display$GT$3fmt           -016 bytes (  592 ->   576) -002 insts ( 123 ->  121), +0 funcs ( 1 ->  1)
_ZN69_$LT$std..sys..pal..unix..stdio..Stderr$u20$as$u20$std..io..Write$GT$14write_vectored                     -016 bytes (   96 ->    80) -004 insts (  29 ->   25), +0 funcs ( 1 ->  1)
_ZN5alloc7raw_vec19RawVec$LT$T$C$A$GT$16reserve_for_push                                                       -032 bytes ( 3296 ->  3264) +027 insts ( 834 ->  861), +0 funcs (17 -> 17)

(Option::cloned doesn't get inlined...in _ZN9addr2line11render_file.)

(Generated with this ad-hoc script. There's probably some existing way to do this though, isn't there...)

It's kind of interesting that the regression is spread across a bunch of different functions but all in backtrace-related code. Although I guess it's just because one function was optimized differently and the butterfly effect cascaded to everything else in its call graph.

Looking at gimli7Context3new, it's very hard to see anything meaningful in the asm diff, even with all constants and registers normalized, hundreds of instructions are moved around. It doesn't look obviously worse, just...perturbed. Same with find_function_or_location.

So I don't really see anything actionable here.

It does make me wonder if we should be compiling backtrace-related crates with opt-level=s though--there's a ton of enormous functions and just scanning through (not necessarily in the changed functions) I see some unrolled loops. Size is probably more of a priority than runtime performance for backtrace code.

@Kobzol
Copy link
Contributor

Kobzol commented Mar 10, 2024

(Generated with this ad-hoc script. There's probably some existing way to do this though, isn't there...)

$  cargo run --bin collector binary_stats --symbols <rustc>

But it will show slightly different stuff than your script.

bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 10, 2024
Add #[inline] to Option::copied/cloned

In rust-lang#121665 (comment), I noticed that `Option::cloned` stopped being inlined in some backtrace code. Let's see if this helps.

r? `@ghost`
@workingjubilee
Copy link
Member

It does make me wonder if we should be compiling backtrace-related crates with opt-level=s though--there's a ton of enormous functions and just scanning through (not necessarily in the changed functions) I see some unrolled loops. Size is probably more of a priority than runtime performance for backtrace code.

Runtime performance isn't zero concern for backtrace code, oddly enough (mainly for the "forced capture" case), but backtrace-rs contains algorithmic optimizations which already make the appropriate trades, so yes, if you can get a significantly smaller code size by optimizing in that direction, that would be good.

We can also look at working with the addr2line and gimli maintainers to reduce the code size a bit, if we can identify the heaviest-hitters in those crates.

bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 11, 2024
Use ptradd for vtable indexing

Extension of rust-lang#121665.

After this, the only remaining usages of GEP are [this](https://github.com/rust-lang/rust/blob/cd81f5b27ee00b49d413db50b5e6af871cebcf23/compiler/rustc_codegen_llvm/src/intrinsic.rs#L909-L920) kinda janky Emscription EH code, which I'll change in a future PR, and array indexing / pointer offsets, where there isn't yet a canonical `ptradd` form. (Out of curiosity I tried converting the latter to `ptradd(ptr, mul(size, index))`, but that causes codegen regressions right now.)

r? `@nikic`
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Mar 11, 2024
…=davidtwco

Remove the unused `field_remapping` field from `TypeLowering`

The `field_remapping` field of `TypeLowering` has  been unused since rust-lang#121665. This PR removes it, then replaces the `TypeLowering` struct with its only remaining member `&'ll Type`.
workingjubilee added a commit to workingjubilee/rustc that referenced this pull request Mar 11, 2024
…=davidtwco

Remove the unused `field_remapping` field from `TypeLowering`

The `field_remapping` field of `TypeLowering` has  been unused since rust-lang#121665. This PR removes it, then replaces the `TypeLowering` struct with its only remaining member `&'ll Type`.
workingjubilee added a commit to workingjubilee/rustc that referenced this pull request Mar 11, 2024
Use ptradd for vtable indexing

Extension of rust-lang#121665.

After this, the only remaining usages of GEP are [this](https://github.com/rust-lang/rust/blob/cd81f5b27ee00b49d413db50b5e6af871cebcf23/compiler/rustc_codegen_llvm/src/intrinsic.rs#L909-L920) kinda janky Emscription EH code, which I'll change in a future PR, and array indexing / pointer offsets, where there isn't yet a canonical `ptradd` form. (Out of curiosity I tried converting the latter to `ptradd(ptr, mul(size, index))`, but that causes codegen regressions right now.)

r? `@nikic`
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Mar 11, 2024
Rollup merge of rust-lang#122320 - erikdesjardins:vtable, r=nikic

Use ptradd for vtable indexing

Extension of rust-lang#121665.

After this, the only remaining usages of GEP are [this](https://github.com/rust-lang/rust/blob/cd81f5b27ee00b49d413db50b5e6af871cebcf23/compiler/rustc_codegen_llvm/src/intrinsic.rs#L909-L920) kinda janky Emscription EH code, which I'll change in a future PR, and array indexing / pointer offsets, where there isn't yet a canonical `ptradd` form. (Out of curiosity I tried converting the latter to `ptradd(ptr, mul(size, index))`, but that causes codegen regressions right now.)

r? `@nikic`
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Mar 11, 2024
Rollup merge of rust-lang#122166 - beetrees:remove-field-remapping, r=davidtwco

Remove the unused `field_remapping` field from `TypeLowering`

The `field_remapping` field of `TypeLowering` has  been unused since rust-lang#121665. This PR removes it, then replaces the `TypeLowering` struct with its only remaining member `&'ll Type`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Common branch folding for enum types fails when there are more than three enum variants.