ModelCloud.ai
- Earth/Epoch 2.0
- https://modelcloud.ai
- @qubitium
Stars
Quantized Attention that achieves speedups of 2.1x and 2.7x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
The code and data for "MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark" [NeurIPS 2024]
Convert PDF to markdown quickly with high accuracy
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"
llama3 implementation one matrix multiplication at a time
Code for the paper "Entanglement-induced provable and robust quantum learning advantages"
Entropy Based Sampling and Parallel CoT Decoding
VPTQ, a flexible and extreme low-bit quantization algorithm
Powerful system container and virtual machine manager
A MAD laboratory to improve AI architecture designs 🧪
A dungeon crawler designed for a quantum computer
GPTQ-based LLM model compression/quantization toolkit with accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
StutterFormer is an AI model that takes a speech sample containing stuttering disfluencies and returns it with the disfluencies attenuated or eliminated.
Filament is a real-time physically based rendering engine for Android, iOS, Windows, Linux, macOS, and WebGL2
Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).
A throughput-oriented high-performance serving framework for LLMs
Aidan Bench attempts to measure <big_model_smell> in LLMs.
High performance AI inference stack. Built for production. @ziglang / @openxla / MLIR / @bazelbuild
Shiva library: a Rust implementation of a parser and generator for documents of any type
Efficient Triton Kernels for LLM Training
Control fault/locate indicators in disk slots in enclosures (SES devices)
Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
A framework for Privacy Preserving Machine Learning