Stars
Scalable data preprocessing and curation toolkit for LLMs
RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). It combines the best of RNN and transformer: great performance, fast inference,…
We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs.
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models
Emu Series: Generative Multimodal Models from BAAI
800,000 step-level correctness labels on LLM solutions to MATH problems
Code implementation of synthetic continued pretraining
The official repo of INF-34B models trained by INF Technology.
General technology for enabling AI capabilities w/ LLMs and MLLMs
O1 Replication Journey: A Strategic Progress Report – Part I
Visualization of MCTS algorithm applied to Tic-tac-toe.
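For orientation, here is a minimal, generic MCTS sketch in Python (not the repository's code): the game-state interface (legal_moves, play, is_terminal, result) is assumed, and the rollout reward is treated as being from the root player's perspective.

```python
import math
import random

class Node:
    """One node of the search tree: a game state plus visit statistics."""
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = {}                     # move -> Node
        self.visits, self.value = 0, 0.0
        self.untried = list(state.legal_moves())

def uct_select(node, c=1.41):
    # Pick the child maximizing the UCB1 score: exploitation + exploration.
    return max(node.children.values(),
               key=lambda ch: ch.value / ch.visits
                              + c * math.sqrt(math.log(node.visits) / ch.visits))

def mcts(root_state, iterations=1000):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded and has children.
        while not node.untried and node.children:
            node = uct_select(node)
        # 2. Expansion: add one unexplored child.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.state.play(move), parent=node)
            node.children[move] = child
            node = child
        # 3. Simulation: random rollout to a terminal state.
        state = node.state
        while not state.is_terminal():
            state = state.play(random.choice(list(state.legal_moves())))
        reward = state.result()   # assumed from the root player's perspective;
                                  # a two-player version would flip the sign per ply
        # 4. Backpropagation: update statistics along the selected path.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Recommend the most-visited move at the root.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```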
zeyugao / transformers
Forked from huggingface/transformers. 🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Official repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale"
[NeurIPS 2023] We use large language models as a commonsense world model and heuristic policy within Monte Carlo Tree Search, enabling better-reasoned decision-making for daily task planning problems.
Trainable PyTorch framework for developing protein, RNA and complex models.
An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
A small repository demonstrating the use of WebDataset with ImageNet
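A typical WebDataset pipeline looks roughly like the sketch below; the shard URL pattern is hypothetical, and the real repo builds its own ImageNet .tar shards.

```python
import webdataset as wds
from torchvision import transforms

# Hypothetical shard pattern; real ImageNet shards would be created separately.
urls = "imagenet-train-{000000..000146}.tar"

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

dataset = (
    wds.WebDataset(urls)
    .shuffle(1000)                       # shuffle within an in-memory buffer
    .decode("pil")                       # decode image bytes into PIL images
    .to_tuple("jpg", "cls")              # (image, label) from each tar sample
    .map_tuple(preprocess, lambda y: y)  # resize/crop so samples batch cleanly
)

loader = wds.WebLoader(dataset, batch_size=64, num_workers=4)
```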
terashuf shuffles multi-terabyte text files using limited memory
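The core idea (shuffle fixed-size chunks in memory, spill them to disk, then interleave the chunks at random) can be sketched as below; this is a simplified illustration, not terashuf's actual code, and picking chunks uniformly in pass two only approximates a fair shuffle.

```python
import os
import random
import tempfile

def external_shuffle(in_path, out_path, max_lines_in_memory=1_000_000, seed=0):
    """Shuffle a huge line-oriented file using bounded memory (two-pass sketch)."""
    rng = random.Random(seed)
    chunk_paths = []
    # Pass 1: read fixed-size chunks, shuffle each in memory, spill to disk.
    with open(in_path) as f:
        while True:
            lines = [line for _, line in zip(range(max_lines_in_memory), f)]
            if not lines:
                break
            lines = [l if l.endswith("\n") else l + "\n" for l in lines]
            rng.shuffle(lines)
            tmp = tempfile.NamedTemporaryFile("w", delete=False)
            tmp.writelines(lines)
            tmp.close()
            chunk_paths.append(tmp.name)
    # Pass 2: repeatedly pick a random chunk and emit its next line.
    # (Weighting the choice by lines remaining per chunk would be closer to uniform.)
    readers = [open(p) for p in chunk_paths]
    with open(out_path, "w") as out:
        while readers:
            r = rng.choice(readers)
            line = r.readline()
            if line:
                out.write(line)
            else:
                r.close()
                readers.remove(r)
    for p in chunk_paths:
        os.remove(p)
```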
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, with automatic mixed precision (including fp8) and easy-to-configure FSDP and DeepSpeed support
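A minimal training-loop sketch with Accelerate; the toy model and data are placeholders just to keep the example self-contained.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

# Toy model and data so the sketch runs on its own.
model = torch.nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataset = TensorDataset(torch.randn(256, 16), torch.randint(0, 2, (256,)))
dataloader = DataLoader(dataset, batch_size=32)

accelerator = Accelerator()  # picks up device / distributed config automatically
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for x, y in dataloader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(x), y)
    accelerator.backward(loss)  # use instead of loss.backward() so AMP/DDP hooks apply
    optimizer.step()
```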
[ACL'24] Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning
Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples"
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization…
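A rough usage sketch following the library's quickstart pattern, assuming a Hopper/Ada GPU and that the DelayedScaling recipe defaults are acceptable:

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# FP8 scaling recipe; HYBRID uses E4M3 for forward and E5M2 for backward tensors.
fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)

layer = te.Linear(1024, 1024, bias=True).cuda()
x = torch.randn(8, 1024, device="cuda", requires_grad=True)

# Matmuls inside this context run in FP8 on supported GPUs.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)
y.sum().backward()
```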
Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
Ring attention implementation with flash attention
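The numerics behind this, accumulating attention over key/value blocks with an online softmax so no device ever holds the full sequence, can be sketched in a single process as follows; the actual ring communication and fused flash-attention kernels are omitted.

```python
import torch

def blockwise_attention(q, k, v, block_size=256):
    """Exact attention computed one key/value block at a time.

    Single-process sketch of the online-softmax accumulation that ring
    attention distributes around a ring of devices.
    """
    scale = q.shape[-1] ** -0.5
    out = torch.zeros_like(q)                                   # running output
    lse = torch.full(q.shape[:-1], float("-inf"),
                     device=q.device, dtype=q.dtype)            # running log-sum-exp
    for start in range(0, k.shape[-2], block_size):
        kb = k[..., start:start + block_size, :]
        vb = v[..., start:start + block_size, :]
        scores = q @ kb.transpose(-1, -2) * scale               # (..., n, b)
        blk_lse = torch.logsumexp(scores, dim=-1)               # (..., n)
        blk_out = torch.softmax(scores, dim=-1) @ vb            # (..., n, d)
        new_lse = torch.logaddexp(lse, blk_lse)
        # Rescale the running output and fold in this block's contribution.
        out = (out * torch.exp(lse - new_lse).unsqueeze(-1)
               + blk_out * torch.exp(blk_lse - new_lse).unsqueeze(-1))
        lse = new_lse
    return out

# Matches full softmax attention up to floating-point error.
q, k, v = (torch.randn(2, 1024, 64) for _ in range(3))
ref = torch.softmax(q @ k.transpose(-1, -2) * q.shape[-1] ** -0.5, dim=-1) @ v
assert torch.allclose(blockwise_attention(q, k, v), ref, atol=1e-5)
```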