
QLinearAdd Op Request #5895

Open
cjvolzka opened this issue Feb 1, 2024 · 3 comments
Labels
operator Issues related to ONNX operators

Comments

cjvolzka (Contributor) commented Feb 1, 2024

QLinearAdd Operator Request

Describe the operator

An Add operator for quantized data. It takes zero_point and scale input tensors for the Add inputs (A and B) and for the output (C). This op exists in ONNX Runtime but not in the ONNX standard operators:
https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#com.microsoft.QLinearAdd
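
For reference, the op amounts to a dequantize-add-requantize on quantized tensors. A minimal numpy sketch of that reference behavior (assuming int8 data and per-tensor scale/zero_point; an actual kernel would use integer-only arithmetic rather than going through float):

```python
import numpy as np

def qlinear_add_ref(a, a_scale, a_zero_point,
                    b, b_scale, b_zero_point,
                    c_scale, c_zero_point):
    # Dequantize both int8 inputs to float.
    a_fp = (a.astype(np.int32) - a_zero_point) * a_scale
    b_fp = (b.astype(np.int32) - b_zero_point) * b_scale
    # Add in float, then requantize with the output scale/zero_point.
    c = np.rint((a_fp + b_fp) / c_scale) + c_zero_point
    return np.clip(c, -128, 127).astype(np.int8)
```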

Several int8 .onnx models in the ONNX Model Zoo validated directory use this op. With one exception, every model that has QLinearMatMul also has QLinearAdd.

onnx-mlir would like to support these models, but we only support official ONNX operators.

Can this operator be constructed using existing onnx operators?

Unsure.

Is this operator used by any model currently? Which one?

Several I found offhand in the ONNX Model Zoo:
bvlcalexnet-12-int8
mnist-12-int8
vgg16-12-int8

Are you willing to contribute it? (Y/N)

N

Notes

ONNX already has QLinearMatMul and QLinearConv, which these models also use, but appears to be missing QLinearAdd.

cjvolzka added the operator label Feb 1, 2024
cjvolzka changed the title from QLinearAdd Op to QLinearAdd Op Request Feb 1, 2024
justinchuby (Contributor) commented:

Can this be represented using the Dequantize-Add-Quantize pattern?
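
For concreteness, a sketch of that pattern built with onnx.helper (tensor names here are illustrative; the scale/zero_point tensors would be graph inputs or initializers):

```python
from onnx import helper

# Dequantize both int8 inputs, add in float, then requantize the result.
nodes = [
    helper.make_node("DequantizeLinear", ["A", "A_scale", "A_zero_point"], ["A_fp"]),
    helper.make_node("DequantizeLinear", ["B", "B_scale", "B_zero_point"], ["B_fp"]),
    helper.make_node("Add", ["A_fp", "B_fp"], ["C_fp"]),
    helper.make_node("QuantizeLinear", ["C_fp", "C_scale", "C_zero_point"], ["C"]),
]
```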

cjvolzka (Contributor, Author) commented Feb 1, 2024

No. Dequantizing would turn them back into floats, so you'd lose both the memory savings and the performance benefit of integer math on the operation. You'd also have to recalculate scales to requantize, which would add time. Also, the QLinear* ops are for models that were quantized at training time, so dequantizing and requantizing would incur accuracy hits that would defeat the point of the quantization-aware training of the original model.

Also, from what I've seen, when you have QLinear* ops you start with a QuantizeLinear, do a series of QLinear* ops on the values, and then there's a DequantizeLinear at the end. Everything between the QuantizeLinear and the DequantizeLinear should stay quantized and use the scales and offsets set at training time.

gramalingam (Contributor) commented:

Just as a background explanation: there has been a shift towards using the pattern Justin describes: at the model level, an op on quantized tensor(s) is expressed as "Dequantize => op => Quantize" first. Then, a backend can rewrite this pattern into a "CustomQuantizedOp" if it has support for doing so.

The reason was to avoid introducing QLinearX for many different ops X (like Add, Mul, Sub, Div, Relu, etc.), which would be very disruptive. However, if the industry converges on ops that are worth explicitly supporting in quantized form, they may be worth adding at some point. I am not sure we are there yet, but opinions are welcome.
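
To illustrate the kind of backend rewrite described above, here is a rough sketch, assuming the Dequantize => Add => Quantize pattern from earlier in the thread and using the ONNX Runtime contrib QLinearAdd as a stand-in for a backend-specific quantized op. A real pass would also preserve topological order and check that the intermediate outputs have no other consumers:

```python
from onnx import helper

def fuse_qdq_add(graph):
    """Rewrite DequantizeLinear -> Add -> QuantizeLinear into one fused
    quantized add node (here the com.microsoft QLinearAdd contrib op)."""
    by_output = {out: n for n in graph.node for out in n.output}
    for q in list(graph.node):
        if q.op_type != "QuantizeLinear":
            continue
        add = by_output.get(q.input[0])
        if add is None or add.op_type != "Add":
            continue
        dq_a, dq_b = by_output.get(add.input[0]), by_output.get(add.input[1])
        if not (dq_a and dq_b
                and dq_a.op_type == dq_b.op_type == "DequantizeLinear"):
            continue
        fused = helper.make_node(
            "QLinearAdd",
            # A, A_scale, A_zero_point, B, B_scale, B_zero_point, C_scale, C_zero_point
            inputs=list(dq_a.input) + list(dq_b.input) + list(q.input[1:]),
            outputs=list(q.output),
            domain="com.microsoft",
        )
        for n in (dq_a, dq_b, add, q):
            graph.node.remove(n)
        graph.node.append(fused)
```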
