JIT: Improve x86 unsigned to floating cast codegen #111595

Open
wants to merge 1 commit into
base: main

saucecontrol commented Jan 19, 2025

Fixes #77658

This improves codegen mostly for unsigned integer to floating-point casts, and also catches a few other redundant conversions.

  • Adds support for using AVX-512 vcvtusi2s[sd] for uint -> float/double (only ulong was handled previously) on both x64 and x86.

    -       mov      eax, edx
            vxorps   xmm0, xmm0, xmm0
    -       vcvtsi2sd xmm0, xmm0, rax
    -       vcvtsd2ss xmm0, xmm0, xmm0
    +       vcvtusi2ss xmm0, edx

    -       mov      eax, dword ptr [rbp-0x04]
    -       mov      eax, eax
            vxorps   xmm0, xmm0, xmm0
    -       vcvtsi2sd xmm0, xmm0, rax
    +       vcvtusi2sd xmm0, dword ptr [rbp-0x04]
  • Improves codegen for uint -> float conversions on x64 without AVX-512, removing the intermediate conversion to double.

            mov      eax, edi
            xorps    xmm0, xmm0
    -       cvtsi2sd xmm0, rax
    -       cvtsd2ss xmm0, xmm0
    +       cvtsi2ss xmm0, rax
  • Adds support for direct ulong -> float cast to the x64 SSE2 fallback, resolving a difference in behavior between hardware with AVX-512 vs without, and saving an extra cvtsd2ss instruction.

            xorps    xmm0, xmm0
            mov      rax, rdi
            shr      rax, 1
            mov      rsi, rdi
            and      rsi, 1
            or       rsi, rax
            test     rdi, rdi
            cmovns   rsi, rdi
    -       cvtsi2sd xmm0, rsi
    +       cvtsi2ss xmm0, rsi
            jns      SHORT G_M37561_IG56
    -       addsd    xmm0, xmm0
    +       addss    xmm0, xmm0
     G_M37561_IG56:
    -       cvtsd2ss xmm0, xmm0
  • Removes some redundant float -> double -> float casts.

    -       vcvtss2sd xmm1, xmm1, xmm1
    -       vcvtsd2ss xmm1, xmm1, xmm1
            vbroadcastss xmm1, xmm1
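
The scalar conversions in the bullets above can be sketched in C. This is only an illustration of the arithmetic, not the JIT's implementation: the function names are hypothetical, and it assumes an x64 target using SSE scalar math with the default round-to-nearest-even mode. The reading that the AVX-512 vs. SSE2 behavioral difference comes from double rounding (integer to double, then double to float) is also an assumption based on the old and new instruction sequences.

```c
#include <stdint.h>

/* uint -> float without AVX-512: zero-extend to 64 bits, then a single
   signed 64-bit conversion (the cvtsi2ss shape in the second diff). */
static float u32_to_f32_direct(uint32_t u)
{
    return (float)(int64_t)u;
}

/* Old shape: convert to double, then narrow. Every uint32 fits exactly in
   a double, so only the final narrowing rounds and the result is the same. */
static float u32_to_f32_via_double(uint32_t u)
{
    return (float)(double)(int64_t)u;
}

/* ulong -> float using only signed conversions. When the top bit is set,
   halve the value while OR-ing the dropped bit back in as a sticky bit,
   convert, then double the result (the shr/and/or ... addss sequence). */
static float u64_to_f32_direct(uint64_t u)
{
    if ((int64_t)u >= 0)
        return (float)(int64_t)u;
    uint64_t half = (u >> 1) | (u & 1);
    float f = (float)(int64_t)half;
    return f + f;
}

/* Old shape of the same fallback: do the work in double, then narrow.
   The value is rounded twice, which can change the final float. */
static float u64_to_f32_via_double(uint64_t u)
{
    if ((int64_t)u >= 0)
        return (float)(double)(int64_t)u;
    uint64_t half = (u >> 1) | (u & 1);
    double d = (double)(int64_t)half;
    return (float)(d + d);
}

/* float -> double -> float: widening is exact, so narrowing returns the
   original value and the cast pair is redundant. */
static float f32_roundtrip(float f)
{
    return (float)(double)f;
}
```

For example, with `u = 0x8000008000000001` (2^63 + 2^39 + 1), the direct path rounds up to 2^63 + 2^40, while the path through double first drops the low bit, lands exactly on the float tie point, and rounds down to 2^63. For `uint` inputs the two paths always agree, so removing the intermediate double there only saves an instruction.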

SPMI Diffs

The only code size regressions come from inserting xorps to clear the upper elements of the target register for the AVX-512 unsigned conversion instructions. The xorps was previously omitted but should have been emitted, since the unsigned conversions behave the same as the signed ones (i.e. they preserve/copy the upper elements) and are subject to the same false dependency penalties.

GCC emits the xorps for all conversions; Clang skips it for all conversions in simple examples but may emit it in more complex scenarios.
https://godbolt.org/z/6aY7fdE3d

dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Jan 19, 2025
dotnet-policy-service bot added the community-contribution label (Indicates that the PR has been added by a community member) Jan 19, 2025
@saucecontrol
Member Author

@MihuBot

saucecontrol marked this pull request as ready for review January 19, 2025 22:17
saucecontrol changed the title from "JIT: Improve x86 integral to floating cast codegen" to "JIT: Improve x86 unsigned to floating cast codegen" Jan 19, 2025
@saucecontrol
Member Author

cc @dotnet/jit-contrib this is ready for review

Successfully merging this pull request may close these issues.

Performance: JIT is emitting multiple conversion instructions when using float-math