View Qubitium's full-sized avatar


Quantized Attention that achieves speedups of 2.1x and 2.7x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

Python 327 12 Updated Oct 24, 2024

The code and data for "MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark" [NeurIPS 2024]

Python 117 21 Updated Oct 24, 2024

Official inference framework for 1-bit LLMs

C++ 10,109 682 Updated Oct 25, 2024

Convert PDF to markdown quickly with high accuracy

Python 17,338 991 Updated Oct 25, 2024

EfficientQAT: Efficient Quantization-Aware Training for Large Language Models

Python 212 15 Updated Oct 8, 2024

Tile primitives for speedy kernels

Cuda 1,560 60 Updated Oct 28, 2024

Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"

Python 155 14 Updated Oct 16, 2024
Python 219 22 Updated Oct 21, 2024

llama3 implementation one matrix multiplication at a time

Jupyter Notebook 13,624 1,092 Updated May 23, 2024

Code for the paper "Entanglement-induced provable and robust quantum learning advantages"

Jupyter Notebook 4 Updated Oct 7, 2024

Entropy Based Sampling and Parallel CoT Decoding

TypeScript 2,859 296 Updated Oct 28, 2024

VPTQ, a flexible and extreme low-bit quantization algorithm

Python 469 26 Updated Oct 28, 2024

Powerful system container and virtual machine manager

Go 4,366 932 Updated Oct 28, 2024

A MAD laboratory to improve AI architecture designs 🧪

Python 92 6 Updated May 2, 2024

A dungeon crawler designed for a quantum computer

OpenQASM 70 3 Updated Aug 21, 2020

A port of DOOM for a quantum computer

C++ 634 20 Updated Sep 30, 2024

GPTQ-based LLM model compression/quantization toolkit with accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.

Python 112 26 Updated Oct 28, 2024

StutterFormer is an AI model that aims to be able to receive a speech sample with stuttering disfluencies, and return it with the disfluencies attenuated or eliminated.

Jupyter Notebook 12 Updated Feb 10, 2023

Filament is a real-time physically based rendering engine for Android, iOS, Windows, Linux, macOS, and WebGL2

C++ 17,771 1,884 Updated Oct 28, 2024

Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).

Cuda 187 15 Updated Oct 28, 2024

A throughput-oriented high-performance serving framework for LLMs

Cuda 615 24 Updated Sep 21, 2024

Aidan Bench attempts to measure <big_model_smell> in LLMs.

Python 87 5 Updated Oct 17, 2024

High performance AI inference stack. Built for production. @ziglang / @openxla / MLIR / @bazelbuild

Zig 1,605 57 Updated Oct 25, 2024

markdown parser and HTML renderer for Go

Go 1,401 173 Updated Sep 30, 2024

Shiva library: Implementation in Rust of a parser and generator for documents of any type

Rust 274 14 Updated Oct 25, 2024

Efficient Triton Kernels for LLM Training

Python 3,315 181 Updated Oct 26, 2024

Control fault/locate indicators in disk slots in enclosures (SES devices)

Python 53 21 Updated Aug 27, 2024

Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models

Python 2,054 145 Updated Aug 1, 2024

A framework for Privacy Preserving Machine Learning

Python 1,538 280 Updated Oct 19, 2024