autovec specialization framework #3393

Closed

Conversation

@MatzeB (Contributor) commented Nov 19, 2024

Summary:
To make auto-vectorized code competitive with asmjit we need to specialize the generic code for certain fixed parameters. We cannot specialize at runtime, so this introduces a framework to specialize for a given set of parameters at compile time and to choose between the existing specializations at runtime.

The framework added here allows specifying lines like the following for a given function.
Each parameter can be set to `var` to leave it unspecialized, or to `fixed(C)` to create a specialized version with that parameter set to the constant value `C`. Example:

```
SPECIALIZE(
      /*BIT_RATE=*/fixed(2),
      /*BLOCK_SIZE=*/var,
      /*HAS_WEIGHT=*/fixed(true),
      /*NORMALIZE_BY_LENGTHS=*/var,
      /*PREFETCH=*/var,
      /*IS_WEIGHT_POSITIONAL=*/var,
      /*USE_OFFSETS=*/var,
      /*OUTPUT_STRIDE=*/fixed(int64_t{-1}),
      /*INPUT_STRIDE=*/fixed(int64_t{-1}),
      /*SCALE_BIAS_LAST=*/fixed(true),
      /*NO_BAG=*/fixed(false),
      /*IS_BF16_OUT=*/var,
      /*IS_BF16_IN=*/var)
```
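
To make the mechanism concrete, here is a minimal sketch of the pattern such a framework expands to. The names and the use of -1 as a `var` sentinel are hypothetical, not the actual FBGEMM implementation: each `fixed` parameter becomes a compile-time template argument, and a runtime chooser routes calls whose arguments match a specialization to that instantiation, falling back to a fully generic one otherwise.

```
// Hypothetical sketch of what SPECIALIZE expands to; not FBGEMM's actual
// generated code. A negative template argument stands in for `var` on the
// integer parameters; the boolean parameter is shown fully fixed.
template <int BIT_RATE, int BLOCK_SIZE, bool HAS_WEIGHT>
void embedding_kernel_impl(int bit_rate, int block_size) {
  // Use the runtime argument when the parameter is "var" (-1); otherwise
  // the value is a compile-time constant the optimizer can exploit.
  const int br = (BIT_RATE == -1) ? bit_rate : BIT_RATE;
  const int bs = (BLOCK_SIZE == -1) ? block_size : BLOCK_SIZE;
  // ... generic loop over bs elements at bit rate br, with HAS_WEIGHT
  // branches; with the constants baked in, the compiler can unroll and
  // auto-vectorize this loop ...
  (void)br;
  (void)bs;
}

// Runtime chooser: prefer a matching specialization, else fall back to the
// fully generic instantiation.
void embedding_kernel(int bit_rate, int block_size, bool has_weight) {
  if (bit_rate == 2 && block_size == 64 && has_weight) {
    embedding_kernel_impl<2, 64, true>(bit_rate, block_size);   // specialized
  } else if (has_weight) {
    embedding_kernel_impl<-1, -1, true>(bit_rate, block_size);  // generic
  } else {
    embedding_kernel_impl<-1, -1, false>(bit_rate, block_size); // generic
  }
}
```

Because the fixed values are compile-time constants inside each instantiation, the compiler can constant-fold the related branches and auto-vectorize the inner loop, which is what lets the specialized versions approach the asmjit kernels.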

This diff introduces exemplary specializations for `GenerateEmbeddingSpMDMWithStrides_autovec` and `GenerateEmbeddingSpMDMNBitWithStrides_autovec`, specializing them for bit rates 2 and 4 and block sizes 32, 64, and 128 (see the sketch below).
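
For illustration, in the `GenerateEmbeddingSpMDMNBitWithStrides_autovec` case that bit-rate/block-size grid would correspond to one `SPECIALIZE` line per combination, roughly as follows (a sketch of the pattern only, with the remaining parameters elided; the exact lines are in the diff):

```
SPECIALIZE(/*BIT_RATE=*/fixed(2), /*BLOCK_SIZE=*/fixed(32),  /* remaining parameters as above */)
SPECIALIZE(/*BIT_RATE=*/fixed(2), /*BLOCK_SIZE=*/fixed(64),  /* remaining parameters as above */)
SPECIALIZE(/*BIT_RATE=*/fixed(2), /*BLOCK_SIZE=*/fixed(128), /* remaining parameters as above */)
SPECIALIZE(/*BIT_RATE=*/fixed(4), /*BLOCK_SIZE=*/fixed(32),  /* remaining parameters as above */)
SPECIALIZE(/*BIT_RATE=*/fixed(4), /*BLOCK_SIZE=*/fixed(64),  /* remaining parameters as above */)
SPECIALIZE(/*BIT_RATE=*/fixed(4), /*BLOCK_SIZE=*/fixed(128), /* remaining parameters as above */)
```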

This framework should make it easy to tune for common production use cases by specializing the commonly used parameter combinations, or to remove specializations to conserve code size.

Differential Revision: D62984408

netlify bot commented Nov 19, 2024

Deploy Preview for pytorch-fbgemm-docs ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 53aec86 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/673e2cd71b55570008d4b105 |
| 😎 Deploy Preview | https://deploy-preview-3393--pytorch-fbgemm-docs.netlify.app |

@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D62984408

MatzeB added a commit to MatzeB/FBGEMM that referenced this pull request Nov 19, 2024

@facebook-github-bot (Contributor)

This pull request has been merged in 7c35026.
