[Quantization] Switch to optimum-quanto #31732
Conversation
Thanks for handling this!
Do we have numbers on the use of quanto? This will determine how cautious we should be.
I'd suggest instead having a fallback for one version cycle, e.g. for imports:

```python
if is_optimum_quanto_available():
    from optimum.quanto import y
elif is_quanto_available():
    warnings.warn("Importing from quanto will be deprecated in v4.44. Please install optimum-quanto instead: `pip install optimum-quanto`")
    from quanto import y
```

and update the quanto check:

```python
def is_quanto_available():
    warnings.warn("....")
    ...
```

In the tests etc. we can still use just `is_optimum_quanto_available` directly.
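To make the suggested fallback concrete, here is a minimal sketch of what such availability helpers could look like, assuming a plain importlib-based check; this is an illustration, not the actual `transformers` implementation, and the deprecation version string is deliberately left generic.

```python
import importlib.util
import warnings


def is_optimum_quanto_available() -> bool:
    # optimum-quanto exposes its modules under the `optimum.quanto` namespace.
    return (
        importlib.util.find_spec("optimum") is not None
        and importlib.util.find_spec("optimum.quanto") is not None
    )


def is_quanto_available() -> bool:
    # Legacy standalone package: still detected for one version cycle, but flagged as deprecated.
    if importlib.util.find_spec("quanto") is not None:
        warnings.warn(
            "Importing from quanto will be deprecated in a future release. "
            "Please install optimum-quanto instead: `pip install optimum-quanto`",
            FutureWarning,
        )
        return True
    return False
```

Library call sites would then prefer `is_optimum_quanto_available()` and only fall back to `is_quanto_available()` during the transition, as the snippet above suggests.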
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
It would be great if this PR could be revived :) Btw if you do, I think you can set `is_trainable` to `True` (see the marked line in the script below):

```python
import os

from datasets import load_dataset
from optimum.quanto import QuantizedModelForCausalLM, qint2, qint4, qint8
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from transformers.utils.quantization_config import QuantizationMethod

os.environ["CUDA_VISIBLE_DEVICES"] = "0"
device = "cpu"

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
data = load_dataset("ybelkada/english_quotes_copy")
data = data.map(lambda samples: tokenizer(samples["quote"]), batched=True)


def main(weights):
    config = LoraConfig(
        r=16,
        lora_alpha=32,
        target_modules=["q_proj", "v_proj"],
        lora_dropout=0.05,
        bias="none",
        task_type="CAUSAL_LM",
    )
    model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m", device_map=device)
    QuantizedModelForCausalLM.quantize(model, weights=weights)

    # Stub quantizer attributes so that PEFT treats the quanto-quantized model as trainable.
    class QuantizationConfig:
        quant_method = QuantizationMethod.QUANTO

    class HfQuantizer:
        is_trainable = True  # <======= is False right now
        quantization_config = QuantizationConfig()

    model.hf_quantizer = HfQuantizer()
    model.is_quantized = True

    model = get_peft_model(model, config)
    assert "QLinear" in repr(model)
    assert "lora.Linear" in repr(model)

    trainer = Trainer(
        model=model,
        train_dataset=data["train"],
        args=TrainingArguments(
            per_device_train_batch_size=4,
            gradient_accumulation_steps=4,
            warmup_steps=2,
            max_steps=5,
            learning_rate=2e-4,
            logging_steps=1,
            output_dir="/tmp/peft/quanto",
        ),
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    model.config.use_cache = False
    trainer.train()


if __name__ == "__main__":
    main(qint2)
    main(qint4)
    main(qint8)
```

(I do get a …)
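Note on the workaround above: the stub `HfQuantizer` / `QuantizationConfig` classes exist only to make the quanto-quantized model look trainable to PEFT, because the real quanto quantizer currently reports `is_trainable = False`. Once the quantizer itself returns `True` (as changed later in this PR), that monkey-patching should no longer be necessary.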
Hi there. I am considering pushing one last `quanto` release. This is because there are more than 50k downloads of the obsolete `quanto` package. I am however worried about the implications in `transformers`.
Thanks for the heads-up! I will spend a bit of time today to finish this PR!
Hey! 🤗 Thanks for your contribution to the `transformers` library!

Before merging this pull request, slow tests CI should be triggered. To enable this:

(For maintainers) The documentation for slow tests CI on PRs is here.
src/transformers/cache_utils.py
```python
# We have two different APIs since in optimum-quanto we don't use AffineQuantizer anymore
if is_optimum_quanto_available():
    from optimum.quanto import QBitsTensor

    qtensor = QBitsTensor.quantize(tensor, self.qtype, axis, self.q_group_size, scale, zeropoint)
    return qtensor
elif is_quanto_available():
    logger.warning_once(
        "Importing from quanto will be deprecated in v4.47. Please install optimum-quanto instead `pip install optimum-quanto`"
    )
    from quanto import AffineQuantizer

    qtensor = AffineQuantizer.apply(tensor, self.qtype, axis, self.q_group_size, scale, zeropoint)
```
Does this sound good @dacorvo? I checked `quantize_weights` but there are breaking changes between main and the latest version, with args being removed. Also, you introduced `WeightQBitsTensor` recently in `quantize_weights`.
This will break on the next release: I am doing it right now so that at least you can use the latest API.
Here is the new release: https://github.com/huggingface/optimum-quanto/releases/tag/v0.2.5
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Generally LGTM, thanks Marc (and David for the recent release).
Just a question, should the `is_trainable` property also be updated (see my comment above)?

transformers/src/transformers/quantizers/quantizer_quanto.py, lines 195 to 196 in 22266be:

```python
def is_trainable(self, model: Optional["PreTrainedModel"] = None):
    return False
```
Oh indeed! I'll change it!
LGTM, thanks !
LGTM - nice handling of the versions. Thanks for adding!
Force-pushed from 08da020 to f4ab762.
```diff
@@ -193,7 +213,7 @@ def _process_model_after_weight_loading(self, model):

     @property
     def is_trainable(self, model: Optional["PreTrainedModel"] = None):
-        return False
+        return True
```
Switching to optimum-quanto makes them trainable? 👁️ 👁️
If yes, shouldn't that be version-specific? Or am I missing something?!
Trainable with peft! No changes in quanto, just in peft. cc @BenjaminBossan and this comment: #31732 (comment)
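For readers following along, here is a minimal, hedged sketch of what this enables: fine-tuning a quanto-quantized model with PEFT directly, without the monkey-patching from the earlier script. The model name and LoRA hyperparameters are illustrative only, not taken from this PR.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, QuantoConfig

# Quantize the weights to int8 at load time through transformers' quanto integration.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",
    quantization_config=QuantoConfig(weights="int8"),
)

# With the quantizer reporting is_trainable=True, PEFT can accept the quantized model directly.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```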
* switch to optimum-quanto rebase squash
* fix import check
* again
* test try-except
* style
What does this PR do?

This PR updates the quanto library package since we moved it under `optimum`. The new package name is optimum-quanto. Now, we need to import from `optimum.quanto` instead of `quanto`. quanto will be deprecated in v4.47, as suggested by Amy!

Tests are passing! We only check for optimum-quanto now, but I checked locally that there were no issues with quanto.
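As a closing illustration of the migration described above, here is a minimal sketch; the try/except fallback mirrors the one-version-cycle transition discussed in this thread, and the model name is just an example.

```python
# Old import (being deprecated): from quanto import quantize, qint8
# New import after this PR: the same objects live under the optimum namespace.
try:
    from optimum.quanto import qint8, quantize  # pip install optimum-quanto
except ImportError:
    # Transitional fallback to the obsolete standalone package.
    from quanto import qint8, quantize

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
quantize(model, weights=qint8)  # quantize the model weights to int8 in place
```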