[SYCLCompat] Optimize/(fix?) permute_sub_group_by_xor if `logical_sub_group_size == 32` (#16646)

`syclcompat::permute_sub_group_by_xor` was reported to fail intermittently on L0. Closer inspection revealed that the implementation of `permute_sub_group_by_xor` is incorrect for cases where `logical_sub_group_size != 32`, which is one of the test cases. This implies that the test itself is wrong.

This PR first optimizes the part of the implementation that is valid, assuming the Intel SPIR-V builtins are correct: the case `logical_sub_group_size == 32`, which is also the only case a user will realistically program. The aim is to:

- Ensure the only useful case works via the correct, optimized route.
- Check that this improvement doesn't break the suspicious test.

A follow-on PR can fix the other cases, where `logical_sub_group_size != 32`. This is better done later, since:

- The only use case I know of for this is implementing non-uniform group algorithms that we have already implemented (e.g. see #9671), and any user is advised to use those algorithms rather than reimplementing them.
- I think this would require a complete reworking of the test, which would otherwise delay the more important change here.

---------

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
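
For readers unfamiliar with the operation, the following is a minimal host-only sketch of what an XOR permute over a full sub-group is generally understood to do: lane `id` receives the value held by lane `id ^ mask`. It is plain C++, not SYCL, and is not the syclcompat implementation. The function name `xor_permute_model` is hypothetical, the vector length stands in for the sub-group size, and the fallback for an out-of-range partner lane is an assumption made only to keep the sketch well defined.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side model of an XOR permute across one sub-group.
// values[i] is the value held by lane i; values.size() models the
// sub-group size (32 in the case this commit targets).
std::vector<int> xor_permute_model(const std::vector<int>& values,
                                   unsigned mask) {
  std::vector<int> result(values.size());
  for (std::size_t id = 0; id < values.size(); ++id) {
    // Each lane reads from the lane whose index differs by the mask bits.
    std::size_t src = id ^ mask;
    // Assumption: if the partner lane does not exist, keep the lane's
    // own value. Real hardware/builtin behaviour may differ.
    result[id] = src < values.size() ? values[src] : values[id];
  }
  return result;
}
```

With 32 lanes and `mask == 1`, neighbouring lanes swap values; with `mask == 16`, the lower and upper halves of the sub-group swap. Applying the same mask twice returns the original values, which is why this pattern is commonly used for butterfly-style reductions.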