Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LoongArch: support relaxation of pcalau12i/ld.d to pcalau12i/addi.d or pcaddi #1322

Merged
merged 7 commits into from
Aug 5, 2024

Conversation

ywgrit
Copy link
Contributor

@ywgrit ywgrit commented Aug 3, 2024

If we get symval by accessing to local got, then we can omit one memory loading.
The first phase: pcalau12i/ld.d to pcala12i/addi.d
The second phase: pcalau12i/addi.d to pcaddi

@ywgrit
Copy link
Contributor Author

ywgrit commented Aug 3, 2024

@MQ-mengqing

@ywgrit ywgrit closed this Aug 3, 2024
@ywgrit ywgrit reopened this Aug 3, 2024
u32 insn2 = *(ul32 *)(isec.contents.data() + rels[i].r_offset + 4);

// relax pcalau12i/ld.d to pcalau12i/addi.d
if (get_rd(insn1) != get_rd(insn2))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For more reliable, check rj is also necessary.


i64 dist = compute_distance(ctx, sym, isec, r);
// the second phase: relax pcalau12i/addi.d to pcaddi
if (dist % 4 == 0 && -(1 << 21) < dist && dist < (1 << 21))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be an eage case "-(1<<21) == dist".

rels[i+2].r_offset == rel.r_offset + 4) {
u32 insn1 = *(ul32 *)(contents.data() + rel.r_offset);
u32 insn2 = *(ul32 *)(contents.data() + rels[i+2].r_offset);
u32 rd = get_rd(insn1);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there should be a range check for pcalau12i.

@ywgrit ywgrit force-pushed the pull_to_upstream branch from 79b1e2d to 3688c43 Compare August 3, 2024 03:42
// pcalau12i/ld.d has been relaxed to pcalau12i/addi.d
// reloc the pcalau12i as R_LARCH_PLACA_HI20
write_j20(loc, hi20(S + A, P));
break;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should directly write to loc + 4 and then i += 3 here, instead of doing the same check in case R_LARCH_GOT_PC_LO12.

Comment on lines 408 to 410
sym.is_local(ctx) &&
!sym.is_absolute() &&
!sym.is_ifunc() &&
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you want to do the same as we do for R_RISCV_GOT_HI20. Did you check that in arch-riscv.cc? We have sym.is_pcrel_linktime_const(ctx) to check if the symbol's PC-relative address is link-time constant.

!sym.is_absolute() &&
!sym.is_ifunc() &&
ctx.arg.relax &&
rels[i+1].r_type == R_LARCH_RELAX &&
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You want to pay more attention to the details -- add spaces around +. This is how we format it in other places.

rels[i+3].r_type == R_LARCH_RELAX &&
rels[i+2].r_type == R_LARCH_GOT_PC_LO12 &&
rels[i+2].r_offset == rel.r_offset + 4) {
u32 insn1 = *(ul32 *)(contents.data() + rel.r_offset);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and also indentation.

Comment on lines 5 to 9
.globl get_sym1, get_sym2, get_sym3, get_sym4, get_sym5
get_sym1:
la.global $a0, sym1
ld.w $a0, $a0, 0
ret
get_sym2:
la.global $a0, sym2
ld.w $a0, $a0, 0
ret
get_sym3:
la.global $a0, sym3
ld.w $a0, $a0, 0
ret
get_sym4:
la.global $a0, sym4
ld.w $a0, $a0, 0
ret
get_sym5:
la.global $a0, sym5
ld.w $a0, $a0, 0
ret
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need five symbols for this test, as they are testing exactly the same relaxation.

Comment on lines 367 to 396
if (i >= 2 && get_r_delta(i-1) - get_r_delta(i-2) == 4) {
// pcalau12i/ld.d has been relaxed to pcalau12i/addi.d, and
// then the pair has been relaxed to pcaddi.
// loc stores 'ld.d', rewrite ld.d with pcaddi
*(ul32 *)loc = 0x1800'0000 | get_rd(*(ul32 *)loc);
write_j20(loc, (S + A - P) >> 2);
} else {
if (i >= 2 &&
i + 1 < rels.size() &&
sym.is_local(ctx) &&
!sym.is_absolute() &&
!sym.is_ifunc() &&
ctx.arg.relax &&
rels[i-1].r_type == R_LARCH_RELAX &&
rels[i+1].r_type == R_LARCH_RELAX &&
rels[i-2].r_type == R_LARCH_GOT_PC_HI20 &&
rels[i-2].r_offset == rel.r_offset - 4) {
u32 insn1 = *(ul32 *)(contents.data() + rels[i-2].r_offset);
u32 insn2 = *(ul32 *)(contents.data() + rel.r_offset);
u32 rd = get_rd(insn1);

if (rd == get_rd(insn2)) {
// pcalau12i/ld.d has been relaxed to pcalau12i/addi.d
// rewrite the ld.d with addi.d
*(ul32 *)loc = 0x02c00000 | rd | (rd << 5);
write_k12(loc, S + A);
break;
}
}

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can revert this change.

@ywgrit
Copy link
Contributor Author

ywgrit commented Aug 3, 2024

Do not care about the last two commits now, there are still some issues, I'll commit the new code once I've tested it out.

@ywgrit ywgrit force-pushed the pull_to_upstream branch from 6d525b7 to 3094d7d Compare August 5, 2024 02:47
@rui314 rui314 merged commit 121f917 into rui314:main Aug 5, 2024
12 checks passed
@rui314
Copy link
Owner

rui314 commented Aug 5, 2024

I'll fix a few minor issues after merging this change. Next time, please pay more attention to indentation. Thank you!

@ywgrit
Copy link
Contributor Author

ywgrit commented Aug 5, 2024

I'll fix a few minor issues after merging this change. Next time, please pay more attention to indentation. Thank you!

Yeah, Sorry for that.

@rui314
Copy link
Owner

rui314 commented Aug 5, 2024

Thank you for doing this anyway. Did you have anything other than this, or is this the last one?

@ywgrit
Copy link
Contributor Author

ywgrit commented Aug 5, 2024

Thank you for doing this anyway. Did you have anything other than this, or is this the last one?

TLS LD/GD relaxation is the next plan.

@rui314
Copy link
Owner

rui314 commented Aug 5, 2024

As far as I know, LoongArch (wrongly) handles LD as a synonym for GD, so there should be only GD.

I believe you guys' long-term plan is to support TLSDESC and make it the default. Is this correct?

By the way, are you guys actually using mold as a linker, or just doing this for completeness? Just curious.

@ywgrit
Copy link
Contributor Author

ywgrit commented Aug 5, 2024

As far as I know, LoongArch (wrongly) handles LD as a synonym for GD, so there should be only GD.

Yeah, you are right, rui. At this stage, the implementation of ld is consistent with gd; to be precise, ld is synonymous with gd.

I believe you guys' long-term plan is to support TLSDESC and make it the default. Is this correct?

Yes.

By the way, are you guys actually using mold as a linker, or just doing this for completeness? Just curious.

Since we often need to build other tools via clang, mold greatly improves the speed of building clang for us. Also software like postgres is being tried out.

@rui314
Copy link
Owner

rui314 commented Aug 5, 2024

I actually don't think you can relax TLS GD/LD on LoongArch, because calls to __tls_get_addr are not annotated by any relocation, so you cannot eliminate that function call. You might still be able to relax a R_LARCH_TLS_GD_PC_HI20 and R_LARCH_GOT_PC_LO12 pair into a single instruction, but that optimization is marginal and probably meaningless. __tls_get_addr is expensive anyways, so I wouldn't try to do that relaxation. What were you planning to do?

@ywgrit
Copy link
Contributor Author

ywgrit commented Aug 5, 2024

I actually don't think you can relax TLS GD/LD on LoongArch, because calls to __tls_get_addr are not annotated by any relocation, so you cannot eliminate that function call. You might still be able to relax a R_LARCH_TLS_GD_PC_HI20 and R_LARCH_GOT_PC_LO12 pair into a single instruction, but that optimization is marginal and probably meaningless. __tls_get_addr is expensive anyways, so I wouldn't try to do that relaxation. What were you planning to do?

You're right, actually tls_desc is currently supported by loongarch, though it hasn't been introduced on mold yet. Considering the negligible performance gain from saving an instruction, it seems there is no need to apply relaxation on tls.

@rui314
Copy link
Owner

rui314 commented Aug 5, 2024

Then maybe you want to add TLSDESC support?

@ywgrit
Copy link
Contributor Author

ywgrit commented Aug 6, 2024

Then maybe you want to add TLSDESC support?

Yeah, I want have a try.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants