📖A curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.
A tool for bandwidth measurements on NVIDIA GPUs.
A collection of benchmarks to measure basic GPU capabilities
Dynamic Memory Management for Serving LLMs without PagedAttention
Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group)
⏰ Collaboratively track deadlines of conferences recommended by CCF (website, Python CLI, WeChat applet)
InternEvo is an open-source, lightweight training framework that aims to support model pre-training without extensive dependencies.
Instruct-tune LLaMA on consumer hardware
🎉 Modern CUDA Learn Notes with PyTorch: fp32, fp16, bf16, fp8/int8, flash_attn, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8.
Curated collection of papers in machine learning systems
NVIDIA Linux open GPU kernel module source
This repository contains the experimental PyTorch native float8 training UX
A list of ICs and IPs for AI, Machine Learning and Deep Learning.
Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals
Examples demonstrating available options to program multiple GPUs in a single node or a cluster