Add example of python code to readme of transformers tools (#3966)
* Use shorter name for tools
* Use optimizer_cli
* Add comments about -i parameter
tianleiwu authored May 17, 2020
1 parent 769c11f commit 56700be
Showing 1 changed file with 28 additions and 5 deletions: onnxruntime/python/tools/transformers/README.md

This tool can be installed using pip as follows:
```console
pip install onnxruntime-tools
```

In your Python code, you can use it like the following:

```python
from onnxruntime_tools import optimizer
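
# Optimize the GPT-2 graph; num_heads and hidden_size must match the exported model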
optimized_model = optimizer.optimize_model("gpt2.onnx", model_type='gpt2', num_heads=12, hidden_size=768)
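
# Convert weights from float32 to float16 for a smaller model and faster GPU inference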
optimized_model.convert_model_float32_to_float16()
optimized_model.save_model_to_file("gpt2_fp16.onnx")
```

You can also use a command like the following to optimize the model:
```console
python -m onnxruntime_tools.optimizer_cli --input gpt2.onnx --output gpt2_opt.onnx --model_type gpt2
```

If you want to use the latest script, you can get the script files from [here](https://github.com/microsoft/onnxruntime/tree/master/onnxruntime/python/tools/transformers/). Then run it like the following:
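
A minimal sketch of such an invocation, assuming the standalone optimizer.py accepts the same flags as the module entry point shown above:

```console
python optimizer.py --input gpt2.onnx --output gpt2_opt.onnx --model_type gpt2
```
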
For tf2onnx, please refer to [its BERT tutorial](https://github.com/onnx/tensorflow-onnx).

Example of using the script optimizer.py to optimize a BERT-large model to run on a V100 GPU:
```console
python -m onnxruntime_tools.optimizer_cli --input bert_large.onnx --output bert_large_fp16.onnx --num_heads 16 --hidden_size 1024 --float16
```

### Options
For GPT2 models, the current optimization does not support past state (both inputs and outputs).

## Benchmark

The benchmark script requires PyTorch to be installed.
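
One way to install it, assuming a pip-based environment (see pytorch.org for platform-specific commands):

```console
pip install torch
```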

You can run the benchmark script to see the inference speed of OnnxRuntime. Here is an example that runs the benchmark on the pretrained model bert-base-cased on GPU.

```console
python -m onnxruntime_tools.transformers.benchmark -g -m bert-base-cased -o -v -b 0
python -m onnxruntime_tools.transformers.benchmark -g -m bert-base-cased -o
python -m onnxruntime_tools.transformers.benchmark -g -m bert-base-cased -e torch
python -m onnxruntime_tools.transformers.benchmark -g -m bert-base-cased -e torchscript
```
The first command generates ONNX models (both before and after optimization) but does not run performance tests, since batch size is set to 0. The other three commands run performance tests on three engines: OnnxRuntime, PyTorch, and PyTorch+TorchScript.

If you remove the -o parameter, the optimizer is not used in the benchmark.
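
For example, this sketch benchmarks the unoptimized ONNX model with OnnxRuntime (the same command as above, minus -o):

```console
python -m onnxruntime_tools.transformers.benchmark -g -m bert-base-cased
```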

If your GPU (like V100 or T4) has Tensor Cores, you can append --fp16 to the above commands to enable mixed precision using float16.
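
For example, appending --fp16 to the OnnxRuntime command above:

```console
python -m onnxruntime_tools.transformers.benchmark -g -m bert-base-cased -o --fp16
```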

If you want to benchmark on CPU, remove the -g option from the commands.
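
For example, the OnnxRuntime command above without -g:

```console
python -m onnxruntime_tools.transformers.benchmark -m bert-base-cased -o
```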

Note that our current benchmark of the GPT2 model disables past state in both inputs and outputs.

By default, the ONNX model has only one input (input_ids). You can use the -i parameter to test models with more inputs. For example, adding "-i 3" to the command line tests a BERT model with 3 inputs (input_ids, token_type_ids, and attention_mask). The performance results might differ. This option only supports OnnxRuntime right now.
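
For example, appending -i 3 to the OnnxRuntime command above:

```console
python -m onnxruntime_tools.transformers.benchmark -g -m bert-base-cased -o -i 3
```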

## Model Verification

