JIT: Improve x86 unsigned to floating cast codegen #111595
Open
+94
−151
Fixes #77658
This improves codegen mostly for unsigned to floating types but catches a few other redundant conversions.
- Adds support for using AVX-512 `vcvtusi2s[sd]` for uint -> float/double (only ulong was handled previously) on both x64 and x86.
- Improves codegen for uint -> float conversions on x64 without AVX-512, removing the intermediate conversion to double.
- Adds support for direct ulong -> float cast to the x64 SSE2 fallback, resolving a difference in behavior between hardware with AVX-512 vs without, and saving an extra `cvtsd2ss` instruction.
- Removes some redundant float -> double -> float casts.
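To illustrate why a direct ulong -> float cast can behave differently from one that goes through double, here is a minimal C sketch. It is not the JIT's implementation: the function names are hypothetical, and the halve-and-double pattern is the one C compilers commonly emit for this cast on x64 without AVX-512, assumed here to be representative of an SSE2-style fallback.

```c
#include <stdint.h>

/* Hypothetical helper: uint64 -> float using only signed conversions.
   Mirrors the common SSE2-era pattern built on cvtsi2ss. */
static float u64_to_float_direct(uint64_t v)
{
    if ((int64_t)v >= 0) {
        /* High bit clear: the signed conversion is already exact enough. */
        return (float)(int64_t)v;
    }
    /* High bit set: halve the value, folding the dropped low bit back in
       as a sticky bit so the single rounding step still sees it. */
    uint64_t halved = (v >> 1) | (v & 1);
    float f = (float)(int64_t)halved;
    return f + f; /* doubling a float in this range is exact */
}

/* Hypothetical helper: the same cast routed through double, which rounds
   twice (to 53 bits, then to 24 bits). */
static float u64_to_float_via_double(uint64_t v)
{
    return (float)(double)v;
}

/* Hypothetical helper: uint32 -> float by widening to a 64-bit signed
   integer first; every uint32 fits, so no double is needed. */
static float u32_to_float_widened(uint32_t v)
{
    return (float)(int64_t)v;
}
```

For an input such as `(1ULL << 62) + (1ULL << 38) + 1`, the first rounding to double lands exactly on a float tie, so the via-double path rounds down to 2^62 while the direct path rounds up to the next float. That kind of mismatch is what a single-step conversion avoids.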
SPMI Diffs
The only code size regressions are the insertion of `xorps` to clear the upper elements of the target reg for the AVX-512 unsigned conversion instructions. These were previously omitted but should have been there, since the unsigned conversions have the same behavior as the signed ones (i.e. preserving/copying upper elements) and are subject to the same false dependency penalties.

GCC emits the `xorps` for all conversions; Clang skips it for all conversions in simple examples but may emit it in more complex scenarios.

https://godbolt.org/z/6aY7fdE3d
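The upper-element behavior can be observed from C with SSE intrinsics. The sketch below uses the signed `_mm_cvtsi32_ss` intrinsic as a stand-in, on the assumption stated above that the unsigned AVX-512 forms merge in the same way; the function names are hypothetical and an x86 target is required.

```c
#include <xmmintrin.h>

/* Hypothetical helper: convert x into lane 0 of a register that still
   holds older data. Lanes 1..3 are carried over from `prev`, which is
   why the instruction depends on the previous register contents. */
static void convert_keep_upper(const float prev[4], int x, float out[4])
{
    __m128 dst = _mm_loadu_ps(prev);
    dst = _mm_cvtsi32_ss(dst, x);
    _mm_storeu_ps(out, dst);
}

/* Hypothetical helper: zero the destination first (typically an xorps),
   so the conversion no longer depends on stale register contents. */
static void convert_zero_upper(int x, float out[4])
{
    __m128 dst = _mm_setzero_ps();
    dst = _mm_cvtsi32_ss(dst, x);
    _mm_storeu_ps(out, dst);
}
```

The zeroing idiom costs a few bytes of code, but the CPU recognizes it as dependency-breaking, so the conversion does not have to wait on whatever last wrote the destination register.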