Stars
Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024]
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
LLM training code for Databricks foundation models
Pretrain and finetune ANY AI model of ANY size on multiple GPUs and TPUs with zero code changes.
Unsupervised text tokenizer for Neural Network-based text generation.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal models, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Robust Speech Recognition via Large-Scale Weak Supervision
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization.
⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)
Masked Structural Growth for 2x Faster Language Model Pre-training
Contextual Position Encoding but with some custom CUDA Kernels https://arxiv.org/abs/2405.18719
Awesome-LLM: a curated list of Large Language Models
A curated reading list of research in Mixture-of-Experts (MoE).
Pygments is a generic syntax highlighter written in Python
Language Savant. If your repository's language is being reported incorrectly, send us a pull request!
Plug-and-play implementation of "Textbooks Are All You Need", ready for training, inference, and dataset generation
Python package built to ease deep learning on graphs, built on top of existing DL frameworks.
Integrate cutting-edge LLM technology quickly and easily into your apps
A relation-aware semantic parsing model from English to SQL
DAMO-ConvAI: the official repository containing the codebase for Alibaba DAMO Conversational AI.
Development repository for the Triton language and compiler
A high-throughput and memory-efficient inference and serving engine for LLMs
Samples for CUDA developers demonstrating features in the CUDA Toolkit
Modeling, training, eval, and inference code for OLMo
Official style files for papers submitted to venues of the Association for Computational Linguistics