This project (https://onnx.ai/onnx-mlir/) provides compiler technology to transform a valid Open Neural Network Exchange (ONNX) graph into code that implements the graph with minimum runtime support. It implements the ONNX standard and is based on the underlying LLVM/MLIR compiler technology.
| System | Build Status | Model Zoo Status |
|---|---|---|
| s390x-Linux | | |
| ppc64le-Linux | | |
| amd64-Linux | | |
| amd64-Windows | | |
| amd64-macOS | | |
This project contributes:
- an ONNX Dialect that can be integrated in other projects,
- compiler interfaces that lower ONNX graphs into MLIR files/LLVM bytecodes/C & Java libraries,
- an onnx-mlir driver to perform these lowerings,
- and a python/C/C++/Java runtime environment.
Current levels of support for the code generation of ONNX operations are listed here for a generic CPU and IBM's Telum integrated AI accelerator.
For ongoing discussions, we use an #onnx-mlir-discussion Slack channel established under the Linux Foundation AI and Data Workspace.
We use GitHub Issues for requests for comments, questions, or bug reports.
Security-related issues are reported using the channels listed in the SECURITY page.
We hold informal weekly meetings on Tuesdays, 8-9pm EST, where we discuss current issues and progress. Meetings use WebEx and everyone is welcome to attend. Please email alexe@us.ibm.com to be added to the meeting invite or to request a 15-30 min time slot to discuss a specific topic of interest.
The preferred approach to using and developing ONNX-MLIR is to use Docker Images and Containers, as getting the proper code dependencies may be tricky on some systems. Our instructions on using ONNX-MLIR with Docker are here.
If you intend to develop code, you should look at our workflow document, which helps you set up your Docker environment in a way that lets you contribute code easily.
ONNX-MLIR runs natively on Linux, macOS, and Windows. Detailed instructions are provided below.
python >= 3.8
gcc >= 6.4
protobuf >= 3.18.3
cmake >= 3.13.4
make >= 4.2.1 or ninja >= 1.10.2
java >= 1.11 (optional)
Look here for help to set up the prerequisite software.
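The minimum versions listed above can be checked mechanically. The following is a minimal sketch, not part of ONNX-MLIR: the helper names `parse_version` and `meets_minimum` are hypothetical, and the sketch assumes that each tool prints a dotted version number somewhere in its `--version` output.

```python
import re

# Minimum versions copied from the prerequisite list above
# (java is optional and omitted here).
MINIMUM_VERSIONS = {
    "python": (3, 8),
    "gcc": (6, 4),
    "protobuf": (3, 18, 3),
    "cmake": (3, 13, 4),
    "make": (4, 2, 1),
    "ninja": (1, 10, 2),
}


def parse_version(text):
    """Return the first dotted version number in `text` as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(tool, version_output):
    """Check a tool's reported version against the documented minimum.

    `version_output` is the text printed by the tool, for example the
    output of `cmake --version` (assumed format, not verified per tool).
    """
    return parse_version(version_output) >= MINIMUM_VERSIONS[tool]
```

For instance, `meets_minimum("cmake", "cmake version 3.22.1")` compares `(3, 22, 1)` against `(3, 13, 4)`. Tuples are compared element by element, so a shorter version such as `4.2` is treated as lower than `4.2.1`.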
At any point in time, ONNX-MLIR depends on a specific commit of the LLVM project that has been shown to work with the project. Periodically the maintainers need to move to a more recent LLVM level. Among other things, this requires updating the LLVM commit string in clone-mlir.sh. When updating ONNX-MLIR, it is good practice to check that the commit string of your MLIR/LLVM checkout is the same as the one listed in that file. See instructions here when the third-party ONNX also needs to be updated.
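The commit check described above can be scripted. The sketch below is illustrative only: it assumes that clone-mlir.sh contains the full 40-character commit hash somewhere in its text, and that the LLVM checkout is an ordinary git repository. The function names and any paths passed to them are hypothetical.

```python
import re
import subprocess


def extract_commit(script_text):
    """Return the first 40-character hex string in the script, or None.

    Assumption: the pinned LLVM commit appears as a full SHA-1 hash.
    """
    match = re.search(r"\b[0-9a-f]{40}\b", script_text)
    return match.group(0) if match else None


def local_head(repo_dir):
    """Return the commit hash currently checked out in `repo_dir`."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "rev-parse", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def commits_match(script_text, head_hash):
    """True if the hash pinned in the script equals the checked-out hash."""
    pinned = extract_commit(script_text)
    return pinned is not None and pinned == head_hash
```

A typical use would read the script with `open(...).read()`, call `local_head` on the llvm-project directory, and pass both results to `commits_match`.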
Directions to install MLIR and ONNX-MLIR are dependent on your OS.
After installation, an onnx-mlir executable should appear in the build/Debug/bin or build/Release/bin directory.
If you have difficulties building, rebuilding, or testing onnx-mlir, check this page for helpful hints.
onnx-mlir is used as follows:
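To locate the executable from a script, one option is to probe both build directories mentioned above. This is a minimal sketch with a hypothetical helper name. Trying Release before Debug is an arbitrary choice, and the `.exe` suffix is an assumption for Windows builds.

```python
from pathlib import Path


def find_onnx_mlir(root):
    """Return the path of the onnx-mlir executable under `root`, or None.

    Looks in build/Release/bin and build/Debug/bin, in that order.
    """
    for config in ("Release", "Debug"):
        # "onnx-mlir.exe" is an assumed name for Windows builds.
        for name in ("onnx-mlir", "onnx-mlir.exe"):
            candidate = Path(root) / "build" / config / "bin" / name
            if candidate.is_file():
                return candidate
    return None
```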
OVERVIEW: ONNX-MLIR modular optimizer driver
USAGE: onnx-mlir [options] <input file>
OPTIONS:
Generic Options:
--help - Display available options (--help-hidden for more)
--help-list - Display list of available options (--help-list-hidden for more)
--version - Display the version of this program
ONNX-MLIR Options:
These are frontend options.
Choose target to emit:
--EmitONNXBasic - Ingest ONNX and emit the basic ONNX operations without inferred shapes.
--EmitONNXIR - Ingest ONNX and emit corresponding ONNX dialect.
--EmitMLIR - Lower the input to MLIR built-in transformation dialect.
--EmitLLVMIR - Lower the input to LLVM IR (LLVM MLIR dialect).
--EmitObj - Compile the input to an object file.
--EmitLib - Compile and link the input into a shared library (default).
--EmitJNI - Compile the input to a jar file.
Optimization levels:
--O0 - Optimization level 0 (default).
--O1 - Optimization level 1.
--O2 - Optimization level 2.
--O3 - Optimization level 3.
The full list of options is given by the -help option. The - and the -- prefixes for flags can be used interchangeably. Note that, as with most compilers, the default optimization level is -O0. We recommend using -O3 for most applications.
Options are also read from the ONNX_MLIR_FLAGS environment variable. For example, ONNX_MLIR_FLAGS="-O3" will ensure -O3 for all compilations.
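When onnx-mlir is driven from a script, the environment variable can be set for a single invocation rather than globally. The sketch below is one possible way to do this with Python's standard library. The helper names are hypothetical, the default executable path `./onnx-mlir` is an assumption, and only flags documented above are used.

```python
import os
import subprocess


def build_env(flags, base_env=None):
    """Return a copy of the environment with ONNX_MLIR_FLAGS set to `flags`."""
    env = dict(os.environ if base_env is None else base_env)
    env["ONNX_MLIR_FLAGS"] = flags
    return env


def build_command(model_path, emit_flag="--EmitLib", onnx_mlir="./onnx-mlir"):
    """Return the argument list for one onnx-mlir invocation."""
    return [onnx_mlir, emit_flag, model_path]


def compile_model(model_path, flags="-O3", **kwargs):
    """Run onnx-mlir on `model_path` with `flags` passed via the environment."""
    command = build_command(model_path, **kwargs)
    return subprocess.run(command, env=build_env(flags), check=True)
```

For example, `compile_model("add.onnx")` would run `./onnx-mlir --EmitLib add.onnx` with ONNX_MLIR_FLAGS set to `-O3`, leaving the parent process environment unchanged.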
For example, use the following command to lower an ONNX model (e.g., add.onnx) to ONNX dialect:
./onnx-mlir --EmitONNXIR add.onnx
The output should look like:
module {
  func.func @main_graph(%arg0: tensor<10x10x10xf32>, %arg1: tensor<10x10x10xf32>) -> tensor<10x10x10xf32> {
    %0 = "onnx.Add"(%arg0, %arg1) : (tensor<10x10x10xf32>, tensor<10x10x10xf32>) -> tensor<10x10x10xf32>
    return %0 : tensor<10x10x10xf32>
  }
}
An example based on the add operation is found here, which builds an ONNX model using a python script, and then provides a main program to load the model's values, compute, and print the model's output.
An end-to-end example is provided here, which trains, compiles, and executes a simple MNIST example using our C/C++, Python, or Java interface.
Documentation is provided in the docs sub-directory; the DocumentList page provides an organized list of documents. Information is also provided on our public-facing onnx.ai/onnx-mlir pages.
We welcome contributions from the community. Please consult the CONTRIBUTING page for help on how to proceed.
ONNX-MLIR requires committers to sign their code using the Developer Certificate of Origin (DCO).
Practically, each git commit needs to be signed; see here for specific instructions.
The ONNX-MLIR code of conduct is described at https://onnx.ai/codeofconduct.html.
- The onnx-mlir-serving project implements a gRPC server written in C++ to serve onnx-mlir compiled models. Benefiting from its C++ implementation, ONNX Serving has very low latency overhead and high throughput.