autovec specialization framework (#3393)
Summary:
X-link: facebookresearch/FBGEMM#481

To make auto-vectorized code competitive with asmjit, we need to specialize the generic code for some fixed parameters. We cannot specialize at runtime, so this introduces a framework to specialize for a given set of parameters at compile time and to choose between the existing specializations at runtime.

The framework added here allows specifying lines like the following for a given function. Each parameter can be set to `var` to leave it unspecialized or to `fixed(C)` to create a specialized version with that parameter set to the constant value `C`.

Example:
```
SPECIALIZE(
    /*BIT_RATE=*/fixed(2),
    /*BLOCK_SIZE=*/var,
    /*HAS_WEIGHT=*/fixed(true),
    /*NORMALIZE_BY_LENGTHS=*/var,
    /*PREFETCH=*/var,
    /*IS_WEIGHT_POSITIONAL=*/var,
    /*USE_OFFSETS=*/var,
    /*OUTPUT_STRIDE=*/fixed(int64_t{-1}),
    /*INPUT_STRIDE=*/fixed(int64_t{-1}),
    /*SCALE_BIAS_LAST=*/fixed(true),
    /*NO_BAG=*/fixed(false),
    /*IS_BF16_OUT=*/var,
    /*IS_BF16_IN=*/var)
```

This diff introduces some example specializations for `GenerateEmbeddingSpMDMWithStrides_autovec` and `GenerateEmbeddingSpMDMNBitWithStrides_autovec`, specializing them for `bit_rate` 2 and 4 and block sizes 32, 64, and 128.

This framework should make it easy to tune for common production use cases by specializing the commonly used parameters, or to remove specializations to conserve code size.

Differential Revision: D62984408
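The commit message does not show how `SPECIALIZE`, `var`, and `fixed(C)` are implemented, so the following is only a minimal sketch of the general idea (compile-time specialization plus runtime selection), not the FBGEMM code. All names in it (`Var`, `Fixed`, `weighted_sum`, `try_specialization`, `select_kernel`) are hypothetical, and the toy kernel stands in for the real embedding kernels.

```cpp
// Hypothetical marker: the parameter stays a runtime value (like `var`).
struct Var {
  template <typename T>
  static constexpr bool matches(T) { return true; }
  template <typename T>
  static constexpr T pick(T runtime_value) { return runtime_value; }
};

// Hypothetical marker: the parameter is pinned to constant C (like `fixed(C)`).
template <typename T, T C>
struct Fixed {
  static constexpr bool matches(T runtime_value) { return runtime_value == C; }
  static constexpr T pick(T) { return C; }
};

// Toy generic kernel. When a parameter is Fixed, pick() returns a
// compile-time constant, so the compiler can fold the branch and
// vectorize the loop with a known trip count.
template <typename BlockSize, typename HasWeight>
float weighted_sum(const float* in, int block_size, bool has_weight,
                   float weight) {
  const int n = BlockSize::pick(block_size);
  const bool use_weight = HasWeight::pick(has_weight);
  float acc = 0.0f;
  for (int i = 0; i < n; ++i) {
    acc += use_weight ? in[i] * weight : in[i];
  }
  return acc;
}

using KernelFn = float (*)(const float*, int, bool, float);

// Accept this specialization only if every fixed parameter matches
// the value requested at runtime.
template <typename BlockSize, typename HasWeight>
bool try_specialization(int block_size, bool has_weight, KernelFn& out) {
  if (BlockSize::matches(block_size) && HasWeight::matches(has_weight)) {
    out = &weighted_sum<BlockSize, HasWeight>;
    return true;
  }
  return false;
}

// Runtime selection: try the specializations in order, then fall back
// to the fully generic instantiation, which always matches.
KernelFn select_kernel(int block_size, bool has_weight) {
  KernelFn fn = nullptr;
  if (try_specialization<Fixed<int, 32>, Fixed<bool, true>>(
          block_size, has_weight, fn)) {
    return fn;
  }
  if (try_specialization<Fixed<int, 64>, Var>(block_size, has_weight, fn)) {
    return fn;
  }
  try_specialization<Var, Var>(block_size, has_weight, fn);
  return fn;
}
```

In this sketch, adding or removing one `try_specialization` line in `select_kernel` plays the role of adding or removing one `SPECIALIZE(...)` line: each line costs one extra instantiation in code size and buys a kernel in which the fixed parameters are constants.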