Transformer-based language model for text generation.
This GPT-2 model generates text without any extra code or algorithm: the beam search is embedded directly in the ONNX model, so no post-processing code is needed at inference time.
| Model | Download | Compressed | ONNX version | Opset version |
|---|---|---|---|---|
| gpt2-lm-head-bs | 635 MB | N/A (similar size) | 1.7 | 12 |
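Because the search loop lives inside the graph, the decoding controls are exposed as ordinary graph inputs. A quick way to confirm this is to list the graph's inputs and outputs with the `onnx` package (a minimal sketch; the file path matches the inference example below):

```python
import onnx

# Minimal sketch: list the graph inputs/outputs of the all-in-one model.
model = onnx.load('model/gpt2-lm-head-bs-12.onnx')
print('inputs: ', [i.name for i in model.graph.input])
print('outputs:', [o.name for o in model.graph.output])
```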
Hugging Face PyTorch GPT-2-with-lm-head + conversion script ==> ONNX GPT-2-LM-HEAD-BS.onnx. The full conversion script lives in onnxruntime-extensions; generation parameters can be adjusted by editing the corresponding values in the script.
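For orientation, the sketch below shows only the first stage of such a conversion: exporting the plain Hugging Face LM-head model with `torch.onnx.export` (the actual script in onnxruntime-extensions additionally folds the beam-search loop into the exported graph; the file name and dummy input here are illustrative):

```python
import torch
from transformers import GPT2LMHeadModel

# Illustrative sketch only: export the plain LM-head model without the
# embedded beam search that the onnxruntime-extensions script adds.
model = GPT2LMHeadModel.from_pretrained('gpt2')
model.config.use_cache = False      # export logits only, no past key/values
model.config.return_dict = False    # return plain tuples for tracing
model.eval()

dummy_ids = torch.ones(1, 8, dtype=torch.long)
torch.onnx.export(
    model, (dummy_ids,), 'gpt2-lm-head.onnx',
    input_names=['input_ids'], output_names=['logits'],
    dynamic_axes={'input_ids': {0: 'batch', 1: 'sequence'},
                  'logits': {0: 'batch', 1: 'sequence'}},
    opset_version=12)
```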
Running this model is straightforward: with onnxruntime-extensions, end-to-end inference takes only a few lines of code.
```python
from onnxruntime_extensions import PyOrtFunction

# Load the all-in-one model; the tokenizer output feeds it directly.
gpt2_all = PyOrtFunction.from_model('model/gpt2-lm-head-bs-12.onnx')
encdict = tokenizer('What is the best story', padding=True, return_tensors='np')
# Inputs: token ids, attention mask (as float32), and the maximum output length.
outputs = gpt2_all(encdict['input_ids'], encdict['attention_mask'].astype('float32'), 30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
The tokenizer used in the code above can be obtained as follows:
```python
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
tokenizer.padding_side = "left"            # pad on the left so generation continues from the prompt
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token; reuse EOS
```
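Since the tokenizer left-pads and the model takes an explicit attention mask, batched prompts work with the same call. A hedged sketch reusing `gpt2_all` and `tokenizer` from above (the prompts are placeholders):

```python
# Batched generation with the same function; prompts are illustrative.
prompts = ['What is the best story', 'The weather today is']
enc = tokenizer(prompts, padding=True, return_tensors='np')
outputs = gpt2_all(enc['input_ids'], enc['attention_mask'].astype('float32'), 30)
for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True))
```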
Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language Models are Unsupervised Multitask Learners. 2019.
This model is converted directly from huggingface/transformers.
Wenbing Li
Apache 2.0 License