add int8 bert model (onnx#481)

* add int8 bert model

  Signed-off-by: mengniwa <mengni.wang@intel.com>

* update readme

  Signed-off-by: mengniwa <mengni.wang@intel.com>

Co-authored-by: Wenbing Li <10278425+wenbingl@users.noreply.github.com>
fmonpelat · Nov 12, 2021 · 2063d79
1 parent 5f7b9ca

Showing 5 changed files with 50 additions and 5 deletions.
diff --git a/text/machine_comprehension/bert-squad/README.md b/text/machine_comprehension/bert-squad/README.md
@@ -10,10 +10,17 @@ BERT (Bidirectional Encoder Representations from Transformers) applies Transform
 
 ## Model
 
- |Model        |Download  |Download (with sample test data)| ONNX version |Opset version|
-| ------------- | ------------- | ------------- | ------------- | ------------- |
-|BERT-Squad| [416 MB](model/bertsquad-8.onnx) |  [385 MB](model/bertsquad-8.tar.gz) |  1.3 | 8|
-|BERT-Squad| [416 MB](model/bertsquad-10.onnx) |  [384 MB](model/bertsquad-10.tar.gz) |  1.5 | 10|
+ |Model        |Download  |Download (with sample test data)| ONNX version |Opset version| Accuracy|
+| ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
+|BERT-Squad| [416 MB](model/bertsquad-8.onnx) |  [385 MB](model/bertsquad-8.tar.gz) |  1.3 | 8| |
+|BERT-Squad| [416 MB](model/bertsquad-10.onnx) |  [384 MB](model/bertsquad-10.tar.gz) |  1.5 | 10| |
+|BERT-Squad| [416 MB](model/bertsquad-12.onnx) |  [384 MB](model/bertsquad-12.tar.gz) |  1.9 | 12| 80.67171|
+|BERT-Squad-int8| [119 MB](model/bertsquad-12-int8.onnx) |  [101 MB](model/bertsquad-12-int8.tar.gz) |  1.9 | 12| 80.43519|
+> Compared with the fp32 BERT-Squad model, BERT-Squad-int8 has a relative accuracy drop of 0.29% and a 1.81x performance improvement.
+>
+> Note that performance depends on the test hardware.
+> 
+> The performance data here was collected on an Intel® Xeon® Platinum 8280 processor (1 socket, 4 cores per instance) running CentOS Linux 8.3, with a batch size of 1.
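The 0.29% drop is relative to the fp32 accuracy, i.e. (80.67171 - 80.43519) / 80.67171. A minimal awk sketch that reproduces it from the two table values; `accuracy_drop_pct` is an illustrative name, not part of this repository:

```shell
# Illustrative helper (not part of this repository):
# relative accuracy drop in percent = (fp32 - quantized) / fp32 * 100
accuracy_drop_pct() {
  awk -v fp32="$1" -v q="$2" 'BEGIN { printf "%.2f\n", (fp32 - q) / fp32 * 100 }'
}

# Accuracy values taken from the table above
accuracy_drop_pct 80.67171 80.43519   # prints 0.29
```

The 1.81x performance figure cannot be derived from the table; it comes from the benchmark setup described in the note.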
 
 Dependencies
 * [tokenization.py](dependencies/tokenization.py)
@@ -110,13 +117,39 @@ Metric is Exact Matching (EM) of 80.7, computed over SQuAD v1.1 dev data, for th
 ## Training
 Fine-tuned the model using SQuAD-1.1 dataset. Look at [BertTutorial.ipynb](https://github.com/onnx/tensorflow-onnx/blob/master/tutorials/BertTutorial.ipynb) for more information for converting the model from tensorflow to onnx and for fine-tuning
 
+## Quantization
+BERT-Squad-int8 is obtained by quantizing the BERT-Squad model (opset 12). We use [Intel® Neural Compressor](https://github.com/intel/neural-compressor) with the ONNX Runtime backend to perform quantization. See the [instructions](https://github.com/intel-innersource/frameworks.ai.lpot.intel-lpot/blob/master/examples/onnxrt/onnx_model_zoo/bert-squad/readme.md) for how to use Intel® Neural Compressor for quantization.
+
+### Environment
+* onnx: 1.9.0
+* onnxruntime: 1.8.0
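To reproduce the environment listed above, the two versions can be pinned in a requirements file. A sketch only; the file name `requirements.txt` is an assumption, and the install command is left as a comment so the snippet does not touch the network:

```shell
# Pin the versions from the Environment section (file name is an assumption)
cat > requirements.txt <<'EOF'
onnx==1.9.0
onnxruntime==1.8.0
EOF

# Install afterwards with:
#   python -m pip install -r requirements.txt
```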
+
+### Prepare model
+```shell
+wget https://github.com/onnx/models/raw/master/text/machine_comprehension/bert-squad/model/bertsquad-12.onnx
+```
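The model files in this repository are large binaries. Depending on how a file is served, a download can return a small Git LFS pointer text file (whose first line starts with `version https://git-lfs`) instead of the model itself, and loading it will then fail. A hypothetical check for that case; `check_not_lfs_pointer` is an illustrative name, not an existing tool:

```shell
# Hypothetical helper: fail if the file is missing/empty or looks like a
# Git LFS pointer rather than the actual ONNX model.
check_not_lfs_pointer() {
  f="$1"
  if [ ! -s "$f" ]; then
    echo "missing or empty: $f" >&2
    return 1
  fi
  # LFS pointer files are plain text starting with the spec URL
  if head -c 100 "$f" | grep -q '^version https://git-lfs'; then
    echo "Git LFS pointer, not a model: $f" >&2
    return 1
  fi
  return 0
}

# Usage after the wget step above:
#   check_not_lfs_pointer bertsquad-12.onnx && echo "model downloaded"
```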
+
+### Model quantize
+```bash
+# --input_model takes the model path (*.onnx)
+bash run_tuning.sh --input_model=/path/to/model \
+                   --output_model=/path/to/model_tune \
+                   --dataset_location=/path/to/SQuAD/dataset \
+                   --config=bert.yaml
+```
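Before starting a long tuning run, it can help to confirm that the paths passed to `run_tuning.sh` exist. A hypothetical pre-flight sketch: `preflight_tuning` is not part of Intel® Neural Compressor or this repository. It only checks that the model path ends in `.onnx` (per the comment in the command above) and that the model, dataset, and config paths exist; it assumes nothing about their contents:

```shell
# Hypothetical pre-flight check; validates local paths only.
preflight_tuning() {
  input_model="$1"; dataset_location="$2"; config="$3"
  case "$input_model" in
    *.onnx) ;;
    *) echo "input_model should be an .onnx file: $input_model" >&2; return 1 ;;
  esac
  [ -f "$input_model" ] || { echo "model not found: $input_model" >&2; return 1; }
  [ -e "$dataset_location" ] || { echo "dataset not found: $dataset_location" >&2; return 1; }
  [ -f "$config" ] || { echo "config not found: $config" >&2; return 1; }
  return 0
}

# Example:
#   preflight_tuning /path/to/bertsquad-12.onnx /path/to/SQuAD/dataset bert.yaml \
#     && bash run_tuning.sh --input_model=/path/to/bertsquad-12.onnx \
#          --output_model=/path/to/model_tune \
#          --dataset_location=/path/to/SQuAD/dataset \
#          --config=bert.yaml
```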
 
 ## References
 * **BERT** Model from the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805)
 
 * [BERT Tutorial](https://github.com/onnx/tensorflow-onnx/blob/master/tutorials/BertTutorial.ipynb)
+
+* [Intel® Neural Compressor](https://github.com/intel/neural-compressor)
+
 ## Contributors
-[Kundana Pillari](https://github.com/kundanapillari)
+* [Kundana Pillari](https://github.com/kundanapillari)
+* [mengniwang95](https://github.com/mengniwang95) (Intel)
+* [airMeng](https://github.com/airMeng) (Intel)
+* [ftian1](https://github.com/ftian1) (Intel)
+* [hshen14](https://github.com/hshen14) (Intel)
 
 ## License
 Apache 2.0
diff --git a/text/machine_comprehension/bert-squad/model/bertsquad-12-int8.onnx b/text/machine_comprehension/bert-squad/model/bertsquad-12-int8.onnx
diff --git a/text/machine_comprehension/bert-squad/model/bertsquad-12-int8.tar.gz b/text/machine_comprehension/bert-squad/model/bertsquad-12-int8.tar.gz
diff --git a/text/machine_comprehension/bert-squad/model/bertsquad-12.onnx b/text/machine_comprehension/bert-squad/model/bertsquad-12.onnx
diff --git a/text/machine_comprehension/bert-squad/model/bertsquad-12.tar.gz b/text/machine_comprehension/bert-squad/model/bertsquad-12.tar.gz