Add example of python code to readme of transformers tools (#3966)
* Use shorter name for tools
* Use optimizer_cli
* Add comments about -i parameter
tianleiwu authored May 17, 2020
1 parent 769c11f commit 56700be
Showing 1 changed file with 28 additions and 5 deletions: onnxruntime/python/tools/transformers/README.md

This tool can be installed using pip as follows:
```console
pip install onnxruntime-tools
```

In your Python code, you can use it like the following:

```python
from onnxruntime_tools import optimizer
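
# Optimize the GPT-2 graph; num_heads and hidden_size must match the exported model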
optimized_model = optimizer.optimize_model("gpt2.onnx", model_type='gpt2', num_heads=12, hidden_size=768)
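
# Convert weights from float32 to float16 for a smaller model and faster GPU inference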
optimized_model.convert_model_float32_to_float16()
optimized_model.save_model_to_file("gpt2_fp16.onnx")
```

You can also use a command like the following to optimize the model:
```console
python -m onnxruntime_tools.optimizer_cli --input gpt2.onnx --output gpt2_opt.onnx --model_type gpt2
```

If you want to use the latest script, you can get the script files from [here](https://github.com/microsoft/onnxruntime/tree/master/onnxruntime/python/tools/transformers/). Then run it like the following:
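
A minimal sketch of such an invocation, assuming the standalone optimizer.py accepts the same flags as the module entry point shown above:

```console
python optimizer.py --input gpt2.onnx --output gpt2_opt.onnx --model_type gpt2
```
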
For tf2onnx, please refer to [its BERT tutorial](https://github.com/onnx/tensorflow-onnx).

Example of using the script optimizer.py to optimize a BERT-large model to run on a V100 GPU:
```console
python -m onnxruntime_tools.optimizer_cli --input bert_large.onnx --output bert_large_fp16.onnx --num_heads 16 --hidden_size 1024 --float16
```

### Options
For GPT2 models, the current optimization does not support past state (both inputs and outputs).

## Benchmark

The benchmark script requires PyTorch to be installed.
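
One way to install it, assuming a pip-based environment (see pytorch.org for platform-specific commands):

```console
pip install torch
```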

You can run the benchmark script to see the inference speed of OnnxRuntime. Here is an example that runs the benchmark on the pretrained model bert-base-cased on GPU.

```console
python -m onnxruntime_tools.transformers.benchmark -g -m bert-base-cased -o -v -b 0
python -m onnxruntime_tools.transformers.benchmark -g -m bert-base-cased -o
python -m onnxruntime_tools.transformers.benchmark -g -m bert-base-cased -e torch
python -m onnxruntime_tools.transformers.benchmark -g -m bert-base-cased -e torchscript
```
The first command generates ONNX models (both before and after optimization) but does not run performance tests, since batch size is set to 0. The other three commands run performance tests on three engines: OnnxRuntime, PyTorch, and PyTorch+TorchScript.

If you remove the -o parameter, the optimizer is not used in the benchmark.
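
For example, this sketch benchmarks the unoptimized ONNX model with OnnxRuntime (the same command as above, minus -o):

```console
python -m onnxruntime_tools.transformers.benchmark -g -m bert-base-cased
```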

If your GPU (like V100 or T4) has Tensor Cores, you can append --fp16 to the above commands to enable mixed precision using float16.
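
For example, appending --fp16 to the OnnxRuntime command above:

```console
python -m onnxruntime_tools.transformers.benchmark -g -m bert-base-cased -o --fp16
```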

If you want to benchmark on CPU, remove the -g option from the commands.
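
For example, the OnnxRuntime command above without -g:

```console
python -m onnxruntime_tools.transformers.benchmark -m bert-base-cased -o
```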

Note that our current benchmark of the GPT2 model disables past state in both inputs and outputs.

By default, the ONNX model has only one input (input_ids). You can use the -i parameter to test models with more inputs. For example, adding "-i 3" to the command line tests a BERT model with 3 inputs (input_ids, token_type_ids, and attention_mask). The performance results might differ. This option only supports OnnxRuntime right now.
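
For example, appending -i 3 to the OnnxRuntime command above:

```console
python -m onnxruntime_tools.transformers.benchmark -g -m bert-base-cased -o -i 3
```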

## Model Verification

