[RFC] MMRazor Quantization Design #347
Comments
@ZhangZhiPku Thank you very much for such a thoughtful reply; it shows professional insight and expertise in quantization. Let us also share some of the thinking behind MMRazor's quantization.
Regarding the five key points in your reply, our overall response is as follows:
Hi, I have a question about this quantization scheme for mmdet and mmseg. There are only config files for mmcls, but no references for mmdet and mmseg. For example, there is a "skipped_methods" field in the config file. Please give me any insights.
Thanks for your work.
Thanks for the wonderful work! From the proposal I have not understood whether there is PyTorch-CUDA support for the quantization. The backend list gives three examples, and I wonder if a native PyTorch-CUDA backend is supported or if one would go through TensorRT?
Motivation
To design and implement a better quantization component for MMRazor together with the community.
To collect more requirements and suggestions through this RFC (Request for Comments) before releasing quantization.
Overview
MMRazor quantization will be an algorithm platform, not just a set of basic quantization function APIs. We hope it will help you in the following ways:
Compress and deploy your model faster.
Produce better models with our quantization algorithms.
Implement novel quantization algorithms more easily.
Goals
Support implementing mainstream QAT and PTQ algorithms, such as LSQ, AdaRound, and so on.
Support a complete working pipeline from quantization to deployment. You can deploy quantized models on multiple backends with MMDeploy.
Adapt to OpenMMLab 2.0, so that upstream OpenMMLab repositories can be supported in a unified way without extra code.
Easy to use. You can quantize your model just by modifying the config and running a script, rather than modifying your source model.
Algorithms
We plan to support the following quantization algorithms in the future. You are welcome to propose your requirements.
QAT
LSQ
LSQ+
IAO
......
PTQ
Adaround
BRECQ
QDrop
......
Main features
We list some main features to be supported in the future. Comments are welcome.
Quantization types: QAT and PTQ (static/dynamic)
Quantization bits: 1 ~ 32. Note: 1 bit here is not binarization, just common quantization.
Quantization methods (uniform quantization):
Multiple backends:
Due to limited manpower, some of the algorithms and features to be supported will be implemented over the next several versions; you are welcome to create PRs to speed up development.
Most features will be included in the first release, except dynamic quantization and support for more backends. The quantization algorithms will be released in stages over the next two versions.
Release plan
We will release our first version in December 2022 if everything goes well.
Design and Implement
We will implement our design by extending PyTorch's basic quantization function APIs and torch.fx. Some PyTorch modules will be inherited and some new modules will be created.
User-friendly config
We will use `Qscheme` to convert the user-friendly config into API-oriented parameters. A demo config is as follows.
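Since the demo config is not reproduced here, below is only a minimal sketch of what such a user-friendly config could look like. Apart from `skipped_methods`, which is mentioned in the discussion above, all field names and values (`global_qconfig`, `w_qscheme`, `a_qscheme`, the algorithm and quantizer types) are illustrative assumptions, not the final MMRazor API.

```python
# Hypothetical sketch of a user-friendly quantization config (field names are assumptions).
global_qconfig = dict(
    w_qscheme=dict(bit=8, is_symmetry=True, is_per_channel=True),    # weight quantization scheme
    a_qscheme=dict(bit=8, is_symmetry=False, is_per_channel=False),  # activation quantization scheme
)

model = dict(
    type='mmrazor.MMArchitectureQuant',                 # assumed algorithm wrapper name
    architecture='configs/_base_/some_classifier.py',   # the float model to be quantized (placeholder)
    quantizer=dict(
        type='mmrazor.TensorRTQuantizer',                # assumed backend-specific quantizer
        global_qconfig=global_qconfig,
        tracer=dict(
            type='mmrazor.CustomTracer',
            # methods that torch.fx cannot trace are skipped during tracing
            skipped_methods=['mmcls.models.heads.ClsHead._get_loss'],
        ),
    ),
)
```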
Usage
The entrance for quantization algorithms is the same as for other model compression algorithms, as follows.
QAT: `tools/train.py`
PTQ: `tools/test.py`
The entrance for deploying a quantized model is `mmdeploy/tools/deploy.py`. So you can run these scripts to implement the whole pipeline from quantization to deployment. For more details about the commands, please refer to the quantization document to be released.
Core modules
Observer
In `forward`, observers update the statistics of the observed tensor. They should also provide a `calculate_qparams` function that computes the quantization parameters from the collected statistics.
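As a concrete illustration of this contract, here is a minimal min-max observer sketch built on PyTorch's `ObserverBase` (assuming a recent PyTorch with the `torch.ao.quantization` namespace); the class itself is hypothetical, not MMRazor's implementation.

```python
import torch
from torch.ao.quantization.observer import ObserverBase  # PyTorch's observer base class


class SketchMinMaxObserver(ObserverBase):
    """Minimal min-max observer sketch (illustrative, not the MMRazor implementation)."""

    def __init__(self, dtype=torch.quint8):
        super().__init__(dtype=dtype)
        self.register_buffer('min_val', torch.tensor(float('inf')))
        self.register_buffer('max_val', torch.tensor(float('-inf')))

    def forward(self, x):
        # Update the statistics of the observed tensor; the input passes through unchanged.
        detached = x.detach().float()
        self.min_val = torch.minimum(self.min_val, detached.min())
        self.max_val = torch.maximum(self.max_val, detached.max())
        return x

    def calculate_qparams(self):
        # Compute scale / zero_point from the collected statistics (asymmetric 8-bit).
        qmin, qmax = 0, 255
        scale = (self.max_val - self.min_val).clamp(min=1e-8) / float(qmax - qmin)
        zero_point = (qmin - torch.round(self.min_val / scale)).clamp(qmin, qmax).to(torch.int64)
        return scale, zero_point
```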
Fake quantize
In `forward`, fake quantize modules update the statistics of the observed tensor and fake quantize the input. They should also provide a `calculate_qparams` function that computes the quantization parameters from the collected statistics. In fake quantize, you can implement some algorithms' special operations.
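Similarly, a fake quantize module wraps an observer and quantize-dequantizes the input in `forward`. The sketch below uses PyTorch's `FakeQuantizeBase` and `MinMaxObserver` and is again only an illustrative assumption about the interface, not MMRazor's code.

```python
import torch
from torch.ao.quantization.fake_quantize import FakeQuantizeBase
from torch.ao.quantization.observer import MinMaxObserver


class SketchFakeQuantize(FakeQuantizeBase):
    """Minimal fake-quantize sketch (illustrative, not MMRazor's implementation)."""

    def __init__(self):
        super().__init__()
        # The attached observer collects the statistics used for the quantization parameters.
        self.activation_post_process = MinMaxObserver()

    def calculate_qparams(self):
        # Compute scale / zero_point from the observer's collected statistics.
        return self.activation_post_process.calculate_qparams()

    def forward(self, x):
        if self.observer_enabled[0] == 1:
            self.activation_post_process(x)            # update statistics
        if self.fake_quant_enabled[0] == 1:
            scale, zero_point = self.calculate_qparams()
            # Algorithm-specific tricks (e.g. LSQ's learnable step size) would hook in here.
            x = torch.fake_quantize_per_tensor_affine(
                x, float(scale), int(zero_point), 0, 255)
        return x
```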
Quantizer
Quantizers implement some core quantization function APIs for algorithms, such as `qconfig_convert`, `prepare`, `convert_model`, `fuse_model`, and so on. What is more, different quantizers can handle different deployment backends, so the target backend can be configured in the config.
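For illustration, a backend-aware quantizer could be sketched on top of torch.fx's `prepare_fx`/`convert_fx` flow (assuming PyTorch >= 1.13). The class and method names here are assumptions; the real MMRazor quantizer may expose different APIs such as `qconfig_convert` and `fuse_model`.

```python
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx


class SketchBackendQuantizer:
    """Illustrative backend-aware quantizer skeleton (names are assumptions)."""

    def __init__(self, backend='fbgemm'):
        # Each backend would map to its own qconfig / supported ops in a real quantizer.
        self.backend = backend
        self.qconfig_mapping = get_default_qconfig_mapping(backend)

    def prepare(self, model, example_inputs):
        # Trace the model with torch.fx, fuse patterns such as conv + bn,
        # and insert observers / fake-quantize modules.
        # Note: prepare_fx signatures differ slightly across PyTorch versions.
        return prepare_fx(model, self.qconfig_mapping, example_inputs)

    def convert_model(self, prepared_model):
        # Replace observed modules with quantized ones for deployment.
        return convert_fx(prepared_model)
```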
Algorithm
Algorithms provide some core APIs for quantization loops to implement quantization pipelines, such as `calib_step`, `prepare`, `convert`, and so on. Algorithms also maintain the traced graphs and `forward` with those graphs.
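A rough sketch of how an algorithm could tie the architecture, the quantizer, and the traced graph together is shown below. All names are assumptions for illustration, and the object is duck-typed against the quantizer sketch above rather than MMRazor's actual API.

```python
import torch


class SketchQuantAlgorithm(torch.nn.Module):
    """Illustrative algorithm wrapper: owns the float model, the quantizer and the
    traced graph, and exposes the steps used by the quantization loops."""

    def __init__(self, architecture, quantizer, example_inputs):
        super().__init__()
        self.architecture = architecture
        self.quantizer = quantizer
        # The algorithm maintains the traced / prepared graph and forwards with it.
        self.prepared = quantizer.prepare(architecture, example_inputs)

    def calib_step(self, data):
        # One calibration step: a plain forward pass so observers collect statistics.
        with torch.no_grad():
            return self.prepared(data)

    def convert(self):
        # Hand the prepared graph back to the quantizer to get a deployable model.
        return self.quantizer.convert_model(self.prepared)

    def forward(self, data):
        return self.prepared(data)
```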
Quantization loops
Quantization loops inherit MMEngine's `TrainLoop` and `TestLoop`, adding some core quantization steps such as `calibrate`, `prepare`, `convert`. There are also special steps for some quantization algorithms, such as subgraph reconstruction.
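As a rough illustration of the extra steps such a loop adds on top of a normal test loop (a hypothetical helper that assumes an algorithm object with `calib_step` and `convert` as sketched above, not MMEngine's or MMRazor's actual loop), PTQ boils down to calibrate-then-convert:

```python
import torch


def run_ptq(algorithm, calib_dataloader, num_calib_batches=32):
    """Sketch of a PTQ flow: calibrate first, then convert, then the usual
    evaluation can run on the returned quantized model."""
    algorithm.eval()
    with torch.no_grad():
        for i, data in enumerate(calib_dataloader):
            if i >= num_calib_batches:
                break
            algorithm.calib_step(data)   # observers collect activation statistics
    quantized = algorithm.convert()      # swap observed modules for quantized ones
    return quantized
```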
How to trace the model automatically
Because torch.fx has its own limitations, some models' `forward` cannot be traced when it contains special cases such as dynamic control flow. To trace the model automatically, we customize a `CustomTracer` and an `UntracedMethodRegistry`. `UntracedMethodRegistry` can be used as a decorator to make decorated methods skipped by `CustomTracer`. What is more, the methods to be skipped can be configured in our configs; please refer to the chapter "User-friendly config" to learn about its usage. So the solution is as follows.
Collect the untraceable code into a function or a method and make the rest of the pipeline traceable. In OpenMMLab 2.0, we refactored some model interfaces to preliminarily adapt to torch.fx.
Specify these methods to be skipped in our configs (see the sketch below).
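The sketch below illustrates the idea with `torch.fx.wrap` on a free function: the untraceable dynamic judgment is collected into one function and kept as a single call node instead of being traced into. MMRazor's `UntracedMethodRegistry` plays the analogous role for methods and is configured through `skipped_methods`, so this is only a simplified stand-in, not the actual implementation.

```python
import torch
import torch.fx


# Step 1: collect the untraceable code (dynamic judgment) into one function.
def select_branch(x: torch.Tensor) -> torch.Tensor:
    # Data-dependent control flow like this breaks symbolic tracing.
    if x.sum() > 0:
        return x * 2
    return x - 1


# Step 2: register it to be skipped. torch.fx.wrap keeps the call as a single
# node instead of tracing into it.
torch.fx.wrap('select_branch')


class ToyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 4)

    def forward(self, x):
        return select_branch(self.linear(x))


graph_module = torch.fx.symbolic_trace(ToyModel())  # now traces successfully
print(graph_module.graph)
```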
WIP code
For more details about the implementation, please refer to the branch: https://github.com/open-mmlab/mmrazor/tree/quantize
Note:
The quantize branch is under development; the code may change at any time.