SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Updated Oct 31, 2024 - Python
A script to convert floating-point CNN models into generalized low-precision ShiftCNN representation
Low-precision (quantized) YOLOv5
Code for DNN feature map compression paper
LinearCosine: Adding beats multiplying for lower-precision efficient cosine similarity