
[Quantization] Switch to optimum-quanto #31732

Merged: 6 commits into main from update-quanto-library on Oct 2, 2024
Conversation

@SunMarc (Member) commented on Jul 1, 2024:

What does this PR do?

This PR updates the quanto library package, since we moved it under optimum. The new package name is optimum-quanto, and we now need to import from optimum.quanto instead of quanto. quanto will be deprecated in v4.47, as suggested by Amy!

Tests are passing! We only check for optimum-quanto now, but I checked locally that there were no issues with quanto.
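
For reference, a minimal sketch of the import change (assumes optimum-quanto is installed; quantize and qint8 are just illustrative symbols from the quanto API):

# Old package:  pip install quanto          ->  from quanto import quantize, qint8
# New package:  pip install optimum-quanto  ->  from optimum.quanto import quantize, qint8
from optimum.quanto import quantize, qint8  # the new import path this PR switches to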

@SunMarc SunMarc requested review from amyeroberts and dacorvo July 1, 2024 15:59
@amyeroberts (Collaborator) left a comment:

Thanks for handling this!

Do we have numbers on the use of quanto? This will determine how cautious we should be.

I'd suggest instead having a fallback for one version cycle, e.g. for imports:

if is_optimum_quanto_available():
    from optimum.quanto import y
elif is_quanto_available():
    warnings.warn("Importing from quanto will be deprecated in v4.44. Please install optimum-quanto instead: `pip install optimum-quanto`")
    from quanto import y

and update the quanto check:

def is_quanto_available():
    warnings.warn("....")
    ...

In the tests etc., we can still just use is_optimum_quanto_available directly.
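
For context, transformers implements these checks with an internal _is_package_available helper; the sketch below is a simplified, self-contained approximation of what such availability checks look like, not the library's exact code:

import importlib.metadata
import importlib.util
import warnings


def is_optimum_quanto_available():
    # The package installs as "optimum-quanto" but imports as optimum.quanto.
    if importlib.util.find_spec("optimum") is None:
        return False
    try:
        importlib.metadata.version("optimum-quanto")
        return True
    except importlib.metadata.PackageNotFoundError:
        return False


def is_quanto_available():
    # Deprecated path: warn and fall back to the old package, as suggested above.
    warnings.warn(
        "Importing from quanto will be deprecated. Please install optimum-quanto: "
        "`pip install optimum-quanto`"
    )
    return importlib.util.find_spec("quanto") is not None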

(Review thread on src/transformers/generation/utils.py: outdated, resolved.)
github-actions (bot) commented on Aug 1, 2024:

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot closed this Aug 10, 2024
@SunMarc SunMarc reopened this Aug 12, 2024
@github-actions github-actions bot closed this Aug 21, 2024
@BenjaminBossan (Member) commented on Sep 6, 2024:

It would be great if this PR could be revived :)

Btw, if you do, I think you can set is_trainable to True. Even though the PEFT support for quanto is still in PR state, it already works as is. The reason is that quanto QLinear modules are subclasses of nn.Linear, so PEFT just applies a normal lora.Linear layer. Some features like merging won't work (hence the need for the PR), but training and inference work. Here is some code for testing, which passed with PEFT v0.12.0:

import os

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, DataCollatorForLanguageModeling, TrainingArguments, Trainer
from transformers.utils.quantization_config import QuantizationMethod
from peft import LoraConfig, get_peft_model
from optimum.quanto import QuantizedModelForCausalLM, qint2, qint4, qint8


os.environ["CUDA_VISIBLE_DEVICES"] = "0"

device = "cpu"
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
data = load_dataset("ybelkada/english_quotes_copy")
data = data.map(lambda samples: tokenizer(samples["quote"]), batched=True)


def main(weights):
    config = LoraConfig(
        r=16,
        lora_alpha=32,
        target_modules=["q_proj", "v_proj"],
        lora_dropout=0.05,
        bias="none",
        task_type="CAUSAL_LM",
    )

    model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m", device_map=device)
    QuantizedModelForCausalLM.quantize(model, weights=weights)

    # Minimal stand-ins so that PEFT treats the model as quanto-quantized.
    class QuantizationConfig:
        quant_method = QuantizationMethod.QUANTO

    class HfQuantizer:
        is_trainable = True  #                                  <======= is False right now
        quantization_config = QuantizationConfig()

    model.hf_quantizer = HfQuantizer()
    model.is_quantized = True

    model = get_peft_model(model, config)
    assert "QLinear" in repr(model)      # quantized base layers are present
    assert "lora.Linear" in repr(model)  # PEFT wrapped them with regular LoRA layers

    trainer = Trainer(
        model=model,
        train_dataset=data["train"],
        args=TrainingArguments(
            per_device_train_batch_size=4,
            gradient_accumulation_steps=4,
            warmup_steps=2,
            max_steps=5,
            learning_rate=2e-4,
            logging_steps=1,
            output_dir="/tmp/peft/quanto",
        ),
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    model.config.use_cache = False
    trainer.train()


if __name__ == "__main__":
    main(qint2)
    main(qint4)
    main(qint8)

(I do get a RuntimeError: CUDA error: an illegal memory access was encountered from quanto when running with CUDA, but that's a separate issue)
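
The subclass relationship this relies on can be checked directly; a quick sketch, assuming optimum-quanto is installed (in some versions QLinear may need to be imported from optimum.quanto.nn instead):

import torch.nn as nn
from optimum.quanto import QLinear

# quanto's QLinear inherits from nn.Linear, which is why PEFT's LoRA
# dispatch wraps it with a regular lora.Linear layer.
assert issubclass(QLinear, nn.Linear)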

@SunMarc SunMarc reopened this Sep 6, 2024
@dacorvo (Contributor) commented on Sep 30, 2024:

Hi there. I am considering pushing one last quanto version that:

  • has optimum-quanto as its single dependency,
  • raises an exception whenever a symbol is imported from quanto (instead of optimum.quanto).

This is because the obsolete quanto version still sees more than 50k downloads every week, and I would like to redirect users to the correct library.

I am, however, worried about the implications in transformers, so it would be great to either merge this pull request or make sure it is harmless.
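
A hypothetical sketch of what such a final redirect release could look like (this is illustrative, not dacorvo's actual implementation):

# quanto/__init__.py in a hypothetical final release of the old package
raise ImportError(
    "quanto has moved to optimum-quanto. Install it with "
    "`pip install optimum-quanto` and import from optimum.quanto instead."
)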

@SunMarc (Member, Author) commented on Sep 30, 2024:

(Quoting @dacorvo's comment above in full.)

Thanks for the heads-up! I will spend a bit of time today to finish this PR!

@HuggingFaceDocBuilderDev

Hey! 🤗 Thanks for your contribution to the transformers library!

Before merging this pull request, slow tests CI should be triggered. To enable this:

  • Add the run-slow label to the PR
  • When your PR is ready for merge and all reviewers' comments have been addressed, push an empty commit with the command [run-slow] followed by a comma-separated list of all the models to be tested, e.g. [run-slow] model_to_test_1, model_to_test_2
    • If the pull request affects a lot of models, put at most 10 models in the commit message
  • A transformers maintainer will then approve the workflow to start the tests

(For maintainers) The documentation for slow tests CI on PRs is here.
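
For example, such an empty commit can be created with `git commit --allow-empty -m "[run-slow] quanto"` (the model name here is only an illustration).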

Comment on lines 789 to 803

# We have two different APIs, since in optimum-quanto we don't use AffineQuantizer anymore
if is_optimum_quanto_available():
    from optimum.quanto import QBitsTensor

    qtensor = QBitsTensor.quantize(tensor, self.qtype, axis, self.q_group_size, scale, zeropoint)
    return qtensor
elif is_quanto_available():
    logger.warning_once(
        "Importing from quanto will be deprecated in v4.47. Please install optimum-quanto instead: `pip install optimum-quanto`"
    )
    from quanto import AffineQuantizer

    qtensor = AffineQuantizer.apply(tensor, self.qtype, axis, self.q_group_size, scale, zeropoint)

@SunMarc (Member, Author) commented on Sep 30, 2024:

Does this sound good, @dacorvo? I checked quantize_weights, but there are breaking changes between main and the latest version, with args being removed. Also, you recently introduced WeightQBitsTensor in quantize_weights.
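
One common way to cope with this kind of API churn is to gate on the installed package version; a hedged sketch (the cutoff "0.2.5" and the branch contents are placeholders, not what this PR actually settled on):

from importlib.metadata import version

from packaging.version import parse

# Illustrative version gate: choose a code path based on the installed
# optimum-quanto release. "0.2.5" is a placeholder cutoff.
if parse(version("optimum-quanto")) >= parse("0.2.5"):
    use_new_api = True   # e.g. the reworked quantize_weights signature
else:
    use_new_api = False  # e.g. the older QBitsTensor.quantize path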

@dacorvo (Contributor) replied:

This will break on the next release: I am doing it right now so that at least you can use the latest API.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@BenjaminBossan (Member) left a comment:

Generally LGTM, thanks Marc (and David for the recent release).

Just a question: should the is_trainable property also be updated (see my comment above)?

def is_trainable(self, model: Optional["PreTrainedModel"] = None):
    return False

(Resolved review threads on src/transformers/cache_utils.py and src/transformers/quantizers/quantizer_quanto.py.)
@SunMarc (Member, Author) commented on Oct 1, 2024:

(Quoting @BenjaminBossan's question above about the is_trainable property, #31732 (comment).)

Oh indeed! I'll change it!

@SunMarc SunMarc requested a review from dacorvo October 1, 2024 14:39
@dacorvo (Contributor) left a comment:

LGTM, thanks!

@amyeroberts (Collaborator) left a comment:

LGTM - nice handling of the versions. Thanks for adding!

@SunMarc SunMarc force-pushed the update-quanto-library branch from 08da020 to f4ab762 Compare October 2, 2024 11:42
@@ -193,7 +213,7 @@ def _process_model_after_weight_loading(self, model):

     @property
     def is_trainable(self, model: Optional["PreTrainedModel"] = None):
-        return False
+        return True

A collaborator commented:

Switching to optimum-quanto makes them trainable? 👁️ 👁️
If yes, shouldn't this be version-specific? Or am I missing something?!

@SunMarc (Member, Author) commented on Oct 2, 2024:

Trainable with PEFT! No changes in quanto, just in peft. cc @BenjaminBossan and this comment: #31732 (comment)

@SunMarc SunMarc merged commit cac4a48 into main Oct 2, 2024
25 checks passed
@SunMarc SunMarc deleted the update-quanto-library branch October 2, 2024 13:14
NielsRogge pushed a commit to NielsRogge/transformers that referenced this pull request Oct 21, 2024
* switch to optimum-quanto rebase squach

* fix import check

* again

* test try-except

* style
BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024, and to innovationcore/transformers Dec 6, 2024, with the same commit messages as above.