ModelCloud.ai
- Earth/Epoch 2.0
- https://modelcloud.ai
- @qubitium
Stars
MooreThreads / mutlass
Forked from NVIDIA/cutlass
MUSA Templates for Linear Algebra Subroutines
Production-ready LLM model compression/quantization toolkit with accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
Fast Hadamard transform in CUDA, with a PyTorch interface
NanoGPT (124M) quality in 2.4B tokens
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024.
Fast and memory-efficient exact attention
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
Quantized Attention that achieves speedups of 2.1x and 2.7x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
The code and data for "MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark" [NeurIPS 2024]
Convert PDF to markdown quickly with high accuracy
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"
llama3 implementation one matrix multiplication at a time
Code for the paper "Entanglement-induced provable and robust quantum learning advantages"
Entropy Based Sampling and Parallel CoT Decoding
VPTQ, A Flexible and Extreme low-bit quantization algorithm
Powerful system container and virtual machine manager
A MAD laboratory to improve AI architecture designs 🧪
A dungeon crawler designed for a quantum computer
StutterFormer is an AI model that aims to be able to receive a speech sample with stuttering disfluencies, and return it with the disfluencies attenuated or eliminated.
Filament is a real-time physically based rendering engine for Android, iOS, Windows, Linux, macOS, and WebGL2