
QLinearAdd Op Request #5895

Open
cjvolzka opened this issue Feb 1, 2024 · 3 comments
Labels
operator Issues related to ONNX operators

Comments

cjvolzka (Contributor) commented Feb 1, 2024

QLinearAdd Operator Request

Describe the operator

An Add operator for quantized data. It takes zero_point and scale input tensors for the Add inputs (A and B) and for the output (C). This op exists in ONNX Runtime but not in the ONNX standard operators:
https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#com.microsoft.QLinearAdd
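
For reference, the op amounts to a dequantize-add-requantize on quantized tensors. A minimal numpy sketch of that reference behavior (assuming int8 data and per-tensor scale/zero_point; an actual kernel would use integer-only arithmetic rather than going through float):

```python
import numpy as np

def qlinear_add_ref(a, a_scale, a_zero_point,
                    b, b_scale, b_zero_point,
                    c_scale, c_zero_point):
    # Dequantize both int8 inputs to float.
    a_fp = (a.astype(np.int32) - a_zero_point) * a_scale
    b_fp = (b.astype(np.int32) - b_zero_point) * b_scale
    # Add in float, then requantize with the output scale/zero_point.
    c = np.rint((a_fp + b_fp) / c_scale) + c_zero_point
    return np.clip(c, -128, 127).astype(np.int8)
```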

Several int8 .onnx models in the ONNX Model Zoo validated directory use this op. With one exception, every model that has QLinearMatMul also has QLinearAdd.

onnx-mlir would like to support these models, but we only support official ONNX operators.

Can this operator be constructed using existing onnx operators?

Unsure.

Is this operator used by any model currently? Which one?

Several I found offhand in the ONNX Model Zoo:
bvlcalexnet-12-int8
mnist-12-int8
vgg16-12-int8

Are you willing to contribute it? (Y/N)

N

Notes

ONNX already has QLinearMatMul and QLinearConv, which these models also use, but appears to be missing QLinearAdd.

cjvolzka added the operator label Feb 1, 2024
cjvolzka changed the title from QLinearAdd Op to QLinearAdd Op Request Feb 1, 2024
justinchuby (Contributor) commented:

Can this be represented using the Dequantize-Add-Quantize pattern?
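
For concreteness, a sketch of that pattern built with onnx.helper (tensor names here are illustrative; the scale/zero_point tensors would be graph inputs or initializers):

```python
from onnx import helper

# Dequantize both int8 inputs, add in float, then requantize the result.
nodes = [
    helper.make_node("DequantizeLinear", ["A", "A_scale", "A_zero_point"], ["A_fp"]),
    helper.make_node("DequantizeLinear", ["B", "B_scale", "B_zero_point"], ["B_fp"]),
    helper.make_node("Add", ["A_fp", "B_fp"], ["C_fp"]),
    helper.make_node("QuantizeLinear", ["C_fp", "C_scale", "C_zero_point"], ["C"]),
]
```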

cjvolzka (Contributor, Author) commented Feb 1, 2024

No. Dequantizing would turn them back into floats, so you'd lose both the memory savings and the performance benefit of integer math on the operation. You'd also have to recalculate scales to requantize, which would add time. Also, the QLinear* ops are for models that were quantized at training time, so dequantizing and requantizing would incur accuracy hits that would defeat the point of the quantization-aware training of the original model.

Also, from what I've seen, when you have QLinear* ops you start with a QuantizeLinear, do a series of QLinear* ops on the values, and then there's a DequantizeLinear at the end. Everything between the QuantizeLinear and the DequantizeLinear should stay quantized and use the scales and offsets set at training time.

gramalingam (Contributor) commented:

Just as a background explanation: there has been a shift towards using the pattern Justin describes: at the model level, an op on quantized tensor(s) is expressed as "Dequantize => op => Quantize" first. Then, a backend can rewrite this pattern into a "CustomQuantizedOp" if it has support for doing so.

The reason was to avoid introducing QLinearX for many different ops X (like Add, Mul, Sub, Div, Relu, etc.), which would be very disruptive. However, if the industry converges on ops that are worth explicitly supporting in quantized form, they may be worth adding at some point. I am not sure we are there yet, but opinions are welcome.
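
To illustrate the kind of backend rewrite described above, here is a rough sketch, assuming the Dequantize => Add => Quantize pattern from earlier in the thread and using the ONNX Runtime contrib QLinearAdd as a stand-in for a backend-specific quantized op. A real pass would also preserve topological order and check that the intermediate outputs have no other consumers:

```python
from onnx import helper

def fuse_qdq_add(graph):
    """Rewrite DequantizeLinear -> Add -> QuantizeLinear into one fused
    quantized add node (here the com.microsoft QLinearAdd contrib op)."""
    by_output = {out: n for n in graph.node for out in n.output}
    for q in list(graph.node):
        if q.op_type != "QuantizeLinear":
            continue
        add = by_output.get(q.input[0])
        if add is None or add.op_type != "Add":
            continue
        dq_a, dq_b = by_output.get(add.input[0]), by_output.get(add.input[1])
        if not (dq_a and dq_b
                and dq_a.op_type == dq_b.op_type == "DequantizeLinear"):
            continue
        fused = helper.make_node(
            "QLinearAdd",
            # A, A_scale, A_zero_point, B, B_scale, B_zero_point, C_scale, C_zero_point
            inputs=list(dq_a.input) + list(dq_b.input) + list(q.input[1:]),
            outputs=list(q.output),
            domain="com.microsoft",
        )
        for n in (dq_a, dq_b, add, q):
            graph.node.remove(n)
        graph.node.append(fused)
```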
