C-TC
  • Zurich, Switzerland

Shared course slides for competitive programming (algorithm contests)

3,901 765 Updated Aug 30, 2024

📖 A curated list of Awesome LLM Inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

2,559 175 Updated Sep 27, 2024

A tool for bandwidth measurements on NVIDIA GPUs.

C++ 289 28 Updated Jun 14, 2024

Unified Collective Communication Library

C 193 95 Updated Sep 17, 2024

Hands-On Practical MLIR Tutorial

C++ 296 40 Updated Oct 20, 2023

Collection of benchmarks to measure basic GPU capabilities

Jupyter Notebook 249 38 Updated Jun 21, 2024
HTML 106 17 Updated Sep 23, 2024

Dynamic Memory Management for Serving LLMs without PagedAttention

C 192 13 Updated Sep 24, 2024

Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group)

C 1,123 421 Updated Sep 28, 2024

Open Fabric Interfaces

C 554 374 Updated Sep 28, 2024

NCCL Profiling Kit

Python 104 11 Updated Jul 1, 2024

Material for gpu-mode lectures

Jupyter Notebook 2,566 256 Updated Sep 23, 2024

⏰ Collaboratively track deadlines of conferences recommended by CCF (website, Python CLI, WeChat applet). If you find it useful, please star this project, thanks~

Vue 5,976 419 Updated Sep 27, 2024

InternEvo is an open-source, lightweight training framework that aims to support model pre-training without the need for extensive dependencies.

Python 285 47 Updated Sep 27, 2024

Instruct-tune LLaMA on consumer hardware

Jupyter Notebook 18,566 2,215 Updated Jul 29, 2024

🎉 Modern CUDA Learn Notes with PyTorch: fp32, fp16, bf16, fp8/int8, flash_attn, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.

Cuda 1,193 127 Updated Sep 28, 2024

This project aims to share the technical principles behind large models, along with hands-on experience.

HTML 9,357 916 Updated Sep 22, 2024

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support

Python 7,752 937 Updated Sep 26, 2024

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.

Python 7,690 451 Updated May 3, 2024

Open MPI main development repository

C 2,130 858 Updated Sep 24, 2024

CUDA Library Samples

Cuda 1,540 324 Updated Sep 10, 2024

Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8.

Jupyter Notebook 34 3 Updated Jul 17, 2024

Curated collection of papers in machine learning systems

127 7 Updated Jul 25, 2024

NVIDIA Linux open GPU kernel module source

C 15,058 1,254 Updated Sep 26, 2024

Hardware locality (hwloc)

C 567 173 Updated Sep 26, 2024

Tile primitives for speedy kernels

Cuda 1,503 58 Updated Sep 28, 2024

This repository contains the experimental PyTorch native float8 training UX

Python 212 20 Updated Aug 1, 2024

A list of ICs and IPs for AI, Machine Learning and Deep Learning.

PHP 1,630 273 Updated Jun 5, 2024

Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals

Python 11,614 387 Updated Sep 20, 2024

Examples demonstrating available options to program multiple GPUs in a single node or a cluster

Cuda 534 106 Updated Aug 14, 2024