66RING
😈
Chaos !ncoming

Highlights

  • Pro

Organizations

@ChaosDaily @LosersDelight @aovim


Hands-On Practical MLIR Tutorial

C++ 296 40 Updated Oct 20, 2023

Checkpoint/Restore tool

C 2,900 582 Updated Sep 26, 2024

Code repo for "CritiPrefill: A Segment-wise Criticality-based Approach for Prefilling Acceleration in LLMs".

Python 8 Updated Sep 15, 2024

VPTQ, A Flexible and Extreme low-bit quantization algorithm

Python 55 3 Updated Sep 26, 2024
Python 15 1 Updated Sep 28, 2024

The Official Implementation of Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference

Python 10 Updated Aug 16, 2024

A throughput-oriented high-performance serving framework for LLMs

Cuda 533 21 Updated Sep 21, 2024

LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs

Python 1,070 92 Updated Sep 27, 2024

A portable embedded database using Arrow.

Rust 626 41 Updated Sep 28, 2024

FP8 flash attention implemented on the Ada architecture using the CUTLASS library

Cuda 46 3 Updated Aug 12, 2024

mimalloc is a compact general purpose allocator with excellent performance.

C 10,432 842 Updated Aug 22, 2024

xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) on multi-GPU Clusters

Python 517 46 Updated Sep 28, 2024

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 11,076 940 Updated Aug 21, 2024

SquirrelFS: A crash-consistent Rust file system for persistent memory (OSDI 24)

C 42 2 Updated Aug 26, 2024

Dynamic Memory Management for Serving LLMs without PagedAttention

C 192 13 Updated Sep 24, 2024

Fast Matrix Multiplications for Lookup Table-Quantized LLMs

Cuda 164 5 Updated Sep 15, 2024

A Fast and Extensible DRAM Simulator, with built-in support for modeling many different DRAM technologies including DDRx, LPDDRx, GDDRx, WIOx, HBMx, and various academic proposals. Described in the…

C++ 566 209 Updated Aug 29, 2023

A lightweight library for portable low-level GPU computation using WebGPU.

C++ 3,681 175 Updated Sep 23, 2024

PyTorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at DeepMind

Python 108 2 Updated Aug 23, 2024

[HotStorage'24] Can Modern LLMs Tune and Configure LSM-based Key-Value Stores?

Python 14 1 Updated Jun 6, 2024

The repo for NSDI24 paper: SIEVE is Simpler than LRU: an Efficient Turn-Key Eviction Algorithm for Web Caches

C 46 2 Updated Aug 2, 2024

The Operating System for JudgeDuck -- Stable and Accurate Judge System

C++ 196 5 Updated Apr 26, 2024

Long-short token decoding: a 4x speedup for long-context LLMs in about a hundred lines of core code. Open source for learning.

Python 6 Updated Jul 24, 2024

Allocation visualization as an SVG graph

C++ 129 17 Updated Jul 17, 2024

LLM101n: Let's build a Storyteller

28,967 1,585 Updated Aug 1, 2024

An easy-to-use LLM quantization and inference toolkit based on GPTQ algorithm (weight-only quantization).

Python 92 19 Updated Sep 26, 2024

[ACL 2024] A novel QAT with Self-Distillation framework to enhance ultra low-bit LLMs.

Python 74 8 Updated May 16, 2024

GLM-4 series: Open Multilingual Multimodal Chat LMs

Python 4,729 382 Updated Sep 26, 2024

A large-scale simulation framework for LLM inference

Python 241 27 Updated Aug 24, 2024
Jupyter Notebook 61 6 Updated Jul 23, 2024