autovec specialization framework #3393
Closed
✅ Deploy Preview for pytorch-fbgemm-docs ready!
This pull request was exported from Phabricator. Differential Revision: D62984408
MatzeB added a commit to MatzeB/FBGEMM that referenced this pull request on Nov 19, 2024
Summary:

X-link: facebookresearch/FBGEMM#481

To make auto-vectorized code competitive with asmjit, we need to specialize the generic code for some fixed parameters. We cannot specialize at runtime, so this introduces a framework that specializes for a given set of parameters at compile time and chooses between the existing specializations at runtime.

The framework added here lets you specify lines like the following for a given function. Each parameter can be set to `var` to leave it unspecialized, or to `fixed(C)` to create a specialized version with that parameter set to the constant value `C`.

Example:
```
SPECIALIZE(
    /*BIT_RATE=*/fixed(2),
    /*BLOCK_SIZE=*/var,
    /*HAS_WEIGHT=*/fixed(true),
    /*NORMALIZE_BY_LENGTHS=*/var,
    /*PREFETCH=*/var,
    /*IS_WEIGHT_POSITIONAL=*/var,
    /*USE_OFFSETS=*/var,
    /*OUTPUT_STRIDE=*/fixed(int64_t{-1}),
    /*INPUT_STRIDE=*/fixed(int64_t{-1}),
    /*SCALE_BIAS_LAST=*/fixed(true),
    /*NO_BAG=*/fixed(false),
    /*IS_BF16_OUT=*/var,
    /*IS_BF16_IN=*/var)
```

This diff introduces some example specializations for `GenerateEmbeddingSpMDMWithStrides_autovec` and `GenerateEmbeddingSpMDMNBitWithStrides_autovec`, specializing them for bit_rate 2 and 4 and block sizes 32, 64, and 128.

This framework should make it easy to tune for common production use cases by specializing the commonly used parameters, or to remove specializations to conserve code size.

Differential Revision: D62984408
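To illustrate the idea in the summary, here is a minimal sketch of compile-time specialization with runtime selection. It is not the code added by this diff: `Var`, `Fixed`, `weighted_sum`, `try_specialization`, and `select_kernel` are hypothetical names made up for this example, and the real `var`, `fixed(C)`, and `SPECIALIZE(...)` may be implemented quite differently. The sketch assumes a toy kernel with only two parameters.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for `var`: the parameter stays a runtime value.
struct Var {
  template <typename T>
  static constexpr bool matches(T) { return true; }
  template <typename T>
  static constexpr T get(T runtime_value) { return runtime_value; }
};

// Hypothetical stand-in for `fixed(C)`: the parameter becomes the constant C.
template <typename T, T C>
struct Fixed {
  static constexpr bool matches(T runtime_value) { return runtime_value == C; }
  static constexpr T get(T) { return C; }  // ignores the runtime value
};

// Toy generic kernel. When a parameter is Fixed, the compiler sees a
// constant trip count or a constant branch condition in this body.
template <typename BlockSize, typename HasWeight>
float weighted_sum(const float* in, int64_t block_size, bool has_weight,
                   float weight) {
  const int64_t n = BlockSize::get(block_size);
  const bool use_weight = HasWeight::get(has_weight);
  float acc = 0.0f;
  for (int64_t i = 0; i < n; ++i) {
    acc += use_weight ? in[i] * weight : in[i];
  }
  return acc;
}

using KernelFn = float (*)(const float*, int64_t, bool, float);

// Picks this specialization if the runtime parameters match it.
template <typename BlockSize, typename HasWeight>
bool try_specialization(int64_t block_size, bool has_weight, KernelFn& out) {
  if (BlockSize::matches(block_size) && HasWeight::matches(has_weight)) {
    out = &weighted_sum<BlockSize, HasWeight>;
    return true;
  }
  return false;
}

// Runtime selection: the first matching entry wins, and the fully
// unspecialized kernel is the fallback because Var matches anything.
KernelFn select_kernel(int64_t block_size, bool has_weight) {
  KernelFn fn = nullptr;
  const bool found =
      try_specialization<Fixed<int64_t, 32>, Fixed<bool, true>>(
          block_size, has_weight, fn) ||
      try_specialization<Fixed<int64_t, 64>, Var>(block_size, has_weight, fn) ||
      try_specialization<Var, Var>(block_size, has_weight, fn);
  assert(found && fn != nullptr);
  (void)found;  // silence unused-variable warnings when NDEBUG is set
  return fn;
}
```

In this sketch, `select_kernel(32, true)` returns the kernel with both parameters fixed, while `select_kernel(7, true)` falls through to the generic one. Adding or removing a line in `select_kernel` trades speed on a common case against code size, which is the trade-off the summary describes.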
MatzeB force-pushed the export-D62984408 branch from 44dd829 to 3d01bc8 on November 19, 2024 01:23
Summary:

X-link: facebookresearch/FBGEMM#481

To make auto-vectorized code competitive with asmjit, we need to specialize the generic code for some fixed parameters. We cannot specialize at runtime, so this introduces a framework that specializes for a given set of parameters at compile time and chooses between the existing specializations at runtime.

The framework added here lets you specify lines like the following for a given function. Each parameter can be set to `var` to leave it unspecialized, or to `fixed(C)` to create a specialized version with that parameter set to the constant value `C`.

Example:
```
SPECIALIZE(
    /*BIT_RATE=*/fixed(2),
    /*BLOCK_SIZE=*/var,
    /*HAS_WEIGHT=*/fixed(true),
    /*NORMALIZE_BY_LENGTHS=*/var,
    /*PREFETCH=*/var,
    /*IS_WEIGHT_POSITIONAL=*/var,
    /*USE_OFFSETS=*/var,
    /*OUTPUT_STRIDE=*/fixed(int64_t{-1}),
    /*INPUT_STRIDE=*/fixed(int64_t{-1}),
    /*SCALE_BIAS_LAST=*/fixed(true),
    /*NO_BAG=*/fixed(false),
    /*IS_BF16_OUT=*/var,
    /*IS_BF16_IN=*/var)
```

This diff introduces some example specializations for `GenerateEmbeddingSpMDMWithStrides_autovec` and `GenerateEmbeddingSpMDMNBitWithStrides_autovec`, specializing them for bit_rate 2 and 4 and block sizes 32, 64, and 128.

This framework should make it easy to tune for common production use cases by specializing the commonly used parameters, or to remove specializations to conserve code size.

Reviewed By: excelle08

Differential Revision: D62984408
MatzeB force-pushed the export-D62984408 branch from 3d01bc8 to 53aec86 on November 20, 2024 18:39
This pull request has been merged in 7c35026.
Summary:

To make auto-vectorized code competitive with asmjit, we need to specialize the generic code for some fixed parameters. We cannot specialize at runtime, so this introduces a framework that specializes for a given set of parameters at compile time and chooses between the existing specializations at runtime.

The framework added here lets you specify `SPECIALIZE(...)` lines for a given function. Each parameter can be set to `var` to leave it unspecialized, or to `fixed(C)` to create a specialized version with that parameter set to the constant value `C`.

This diff introduces some example specializations for `GenerateEmbeddingSpMDMWithStrides_autovec` and `GenerateEmbeddingSpMDMNBitWithStrides_autovec`, specializing them for bit_rate 2 and 4 and block sizes 32, 64, and 128.

This framework should make it easy to tune for common production use cases by specializing the commonly used parameters, or to remove specializations to conserve code size.

Differential Revision: D62984408
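To show why fixing `bit_rate` to 2 or 4 can help an auto-vectorizer, here is a small sketch of an N-bit unpacking loop with one generic version and two specialized ones. Everything in it is an assumption for illustration: `sum_packed` and `sum_packed_dispatch` are hypothetical names, a template argument of `0` is used here to mean "not specialized", and the low-bits-first packing order is chosen for the example rather than taken from FBGEMM.

```cpp
#include <cassert>
#include <cstdint>

// BitRate == 0 means "not specialized": fall back to the runtime value.
// For BitRate 2 or 4, the mask, the shift step, and the elements-per-byte
// count below are all compile-time constants.
template <int BitRate>
int32_t sum_packed(const uint8_t* data, int64_t num_elems, int bit_rate_rt) {
  const int bit_rate = BitRate != 0 ? BitRate : bit_rate_rt;
  const int elems_per_byte = 8 / bit_rate;
  const int mask = (1 << bit_rate) - 1;
  int32_t acc = 0;
  for (int64_t i = 0; i < num_elems; ++i) {
    const uint8_t byte = data[i / elems_per_byte];
    // Assumed layout: element 0 sits in the lowest bits of each byte.
    const int shift = static_cast<int>(i % elems_per_byte) * bit_rate;
    acc += (byte >> shift) & mask;
  }
  return acc;
}

// Runtime selection between the specialized and the generic versions.
int32_t sum_packed_dispatch(const uint8_t* data, int64_t num_elems,
                            int bit_rate) {
  assert(bit_rate > 0 && bit_rate <= 8 && 8 % bit_rate == 0);
  switch (bit_rate) {
    case 2:
      return sum_packed<2>(data, num_elems, bit_rate);
    case 4:
      return sum_packed<4>(data, num_elems, bit_rate);
    default:
      return sum_packed<0>(data, num_elems, bit_rate);
  }
}
```

All three instantiations compute the same result for the same inputs; only the amount of information available to the compiler differs. Dropping a `case` removes one instantiation from the binary, at the cost of routing that bit rate through the generic path.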