MUSA Templates for Linear Algebra Subroutines

C++ 2 1 Updated Sep 30, 2024

Production-ready LLM model compression/quantization toolkit with accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.

Python 117 26 Updated Nov 5, 2024
Python 889 90 Updated Nov 5, 2024

Amnezia VPN Client (Desktop+Mobile)

C++ 5,641 347 Updated Nov 5, 2024

Fast Hadamard transform in CUDA, with a PyTorch interface

C 105 14 Updated May 24, 2024

Virtual environment stacks for Python

Python 82 1 Updated Nov 5, 2024

NanoGPT (124M) quality in 2.4B tokens

Python 887 63 Updated Nov 5, 2024

LLM training in simple, raw C/CUDA

Cuda 24,301 2,742 Updated Oct 2, 2024

MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024.

Python 1,097 60 Updated Oct 31, 2024

Fast and memory-efficient exact attention

Python 14,061 1,313 Updated Nov 5, 2024

Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.

Python 4,119 205 Updated Nov 5, 2024

Quantized Attention that achieves speedups of 2.1x and 2.7x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

Python 358 14 Updated Nov 3, 2024

The code and data for "MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark" [NeurIPS 2024]

Python 121 21 Updated Oct 24, 2024

Official inference framework for 1-bit LLMs

C++ 10,799 732 Updated Oct 31, 2024

Convert PDF to markdown quickly with high accuracy

Python 17,518 1,006 Updated Nov 5, 2024

EfficientQAT: Efficient Quantization-Aware Training for Large Language Models

Python 219 16 Updated Oct 8, 2024

Tile primitives for speedy kernels

Cuda 1,625 65 Updated Nov 1, 2024

Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"

Python 164 17 Updated Oct 16, 2024
Python 223 22 Updated Oct 21, 2024

llama3 implementation one matrix multiplication at a time

Jupyter Notebook 13,670 1,092 Updated May 23, 2024

Code for the paper "Entanglement-induced provable and robust quantum learning advantages"

Jupyter Notebook 5 Updated Oct 7, 2024

Entropy Based Sampling and Parallel CoT Decoding

Python 2,929 305 Updated Nov 5, 2024

VPTQ, A Flexible and Extreme low-bit quantization algorithm

Python 485 27 Updated Nov 1, 2024

Powerful system container and virtual machine manager

Go 4,372 930 Updated Nov 5, 2024

A MAD laboratory to improve AI architecture designs 🧪

Python 92 6 Updated May 2, 2024

A dungeon crawler designed for a quantum computer

OpenQASM 70 3 Updated Aug 21, 2020

A port of DOOM for a quantum computer

C++ 641 21 Updated Sep 30, 2024

StutterFormer is an AI model that aims to be able to receive a speech sample with stuttering disfluencies, and return it with the disfluencies attenuated or eliminated.

Jupyter Notebook 13 Updated Feb 10, 2023

Filament is a real-time physically based rendering engine for Android, iOS, Windows, Linux, macOS, and WebGL2

C++ 17,784 1,890 Updated Nov 5, 2024