Module weight quantization #2000

Merged
laggui merged 13 commits into main from feat/quant/module on Jul 15, 2024

Conversation

@laggui (Member) commented Jul 9, 2024

Checklist

  • Confirmed that the run-checks all script has been executed.
  • Made sure the book is up to date with changes in this PR.

Changes

Added static per-tensor module weight quantization via a new quantize_weights method (see the sketch after this list).

  • Added Quantizer, MinMaxCalibration and QuantizationScheme to define the quantization configuration
  • Changed the tensor record implementation to keep DType::QFloat tensors as is (no conversion)
  • Added methods to QTensorOps to support quantized float tensors
  • Added quantization docs
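
For illustration, weight quantization with the new API might look roughly like the following. This is a minimal sketch based on the description above; the module paths, the MinMaxCalibration fields, and the QuantizationScheme variant shown here are assumptions and may differ from the merged code.

```rust
use burn::module::{Module, Quantizer};
use burn::tensor::backend::Backend;
use burn::tensor::quantization::{MinMaxCalibration, QuantizationScheme, QuantizationType};

/// Hypothetical helper: quantize all float weights of a module to int8
/// with per-tensor quantization parameters.
fn quantize_weights_int8<B: Backend, M: Module<B>>(module: M) -> M {
    let mut quantizer = Quantizer {
        // Min-max calibration derives the quantization range from the
        // minimum and maximum values of each weight tensor.
        calibration: MinMaxCalibration {
            dtype: QuantizationType::QInt8,
        },
        // A single set of quantization parameters per tensor (affine mapping).
        scheme: QuantizationScheme::PerTensorAffine(QuantizationType::QInt8),
    };

    // After this call, the weights are stored as DType::QFloat tensors
    // and are kept as is by the record (no conversion on save/load).
    module.quantize_weights(&mut quantizer)
}
```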

Testing

Added unit tests for MinMaxCalibration


codecov bot commented Jul 9, 2024

Codecov Report

Attention: Patch coverage is 20.72727% with 218 lines in your changes missing coverage. Please review.

Project coverage is 84.25%. Comparing base (c30ffcf) to head (ddc8791).
Report is 4 commits behind head on main.

Files | Patch % | Lines
crates/burn-tch/src/ops/qtensor.rs | 0.00% | 53 Missing ⚠️
crates/burn-ndarray/src/ops/qtensor.rs | 0.00% | 29 Missing ⚠️
crates/burn-tch/src/ops/base.rs | 0.00% | 22 Missing ⚠️
crates/burn-autodiff/src/ops/qtensor.rs | 0.00% | 17 Missing ⚠️
crates/burn-candle/src/ops/qtensor.rs | 0.00% | 17 Missing ⚠️
crates/burn-jit/src/ops/qtensor.rs | 0.00% | 16 Missing ⚠️
crates/burn-fusion/src/ops/qtensor.rs | 0.00% | 15 Missing ⚠️
crates/burn-tensor/src/tensor/data.rs | 0.00% | 11 Missing ⚠️
crates/burn-tensor/src/tensor/ops/qtensor.rs | 0.00% | 11 Missing ⚠️
crates/burn-tensor/src/tensor/api/base.rs | 43.75% | 9 Missing ⚠️
... and 5 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2000      +/-   ##
==========================================
- Coverage   84.45%   84.25%   -0.20%     
==========================================
  Files         840      845       +5     
  Lines      104346   105439    +1093     
==========================================
+ Hits        88125    88838     +713     
- Misses      16221    16601     +380     


@laggui force-pushed the feat/quant/module branch from e07739d to b07f980 on July 11, 2024 13:06
@laggui marked this pull request as ready for review July 11, 2024 17:30
@laggui requested a review from nathanielsimard July 11, 2024 17:33
@laggui marked this pull request as draft July 11, 2024 19:21

@laggui (Member, Author) commented Jul 11, 2024

Turns out the weights were still being automatically dequantized to a TensorPrimitive::Float because Param sets tensor.require_grad() when loading, which currently always returns a float primitive (i.e., dequantizes QFloat tensors).

In my tests with TinyLlama, running inference in f16 is slower when we have to dequantize before every op instead of already having the dequantized weights loaded.

So until we introduce layers with ops supported in quantized types (e.g., int8), we should dequantize the weights. Once I figure out how this should be handled (as cleanly and explicitly as possible), I'll re-open for review.

/edit: I think the best way to go about this is documentation. Adding a dequantize() or float() method on modules doesn't seem necessary, so instead I added a subsection describing how this can be achieved (and why someone might want to do it) with a ModuleMapper. This is the solution I am currently using for loading quantized Llama weights.
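
For reference, the dequantize-on-load approach with a ModuleMapper might look roughly like this. It is a minimal sketch, not the exact code from the docs; in particular, the map_float signature (e.g., whether ParamId is passed by value or by reference) may differ between Burn versions.

```rust
use burn::module::{ModuleMapper, ParamId};
use burn::tensor::backend::Backend;
use burn::tensor::Tensor;

/// Mapper that converts every (possibly quantized) float parameter back
/// into a regular float tensor so inference runs on dequantized weights.
struct Dequantize {}

impl<B: Backend> ModuleMapper<B> for Dequantize {
    fn map_float<const D: usize>(&mut self, _id: ParamId, tensor: Tensor<B, D>) -> Tensor<B, D> {
        // For a regular float tensor this is effectively a no-op; for a
        // QFloat tensor it returns the dequantized floating-point values.
        tensor.dequantize()
    }
}

// Hypothetical usage after loading a record that contains quantized weights:
// let model = model.load_record(record).map(&mut Dequantize {});
```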

@laggui marked this pull request as ready for review July 12, 2024 14:44
@nathanielsimard (Member) left a comment

LGTM very clean

@laggui merged commit 3afff43 into main on Jul 15, 2024
15 checks passed
@laggui deleted the feat/quant/module branch July 15, 2024 12:20