Tags: DefTruth/Awesome-LLM-Inference

v2.6.10

🔥🔥[FFPA] FFPA: Yet another Faster Flash Prefill Attention with O(1) SRAM complexity for headdim > 256, ~1.5x faster than SDPA EA (@DefTruth) (#111)

v2.6.9

🔥🔥[HadaCore] HadaCore: Tensor Core Accelerated Hadamard Transform Kernel (#108)

v2.6.8

🔥[BatchLLM] BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching (#104)

v2.6.7

🔥[KV Cache Recomputation] Efficient LLM Inference with I/O-Aware Partial KV Cache Recomputation (#102)

v2.6.6

🔥[SparseInfer] SparseInfer: Training-free Prediction of Activation Sparsity for Fast LLM Inference (#100)

v2.6.5

🔥🔥[TP: Comm Compression] Communication Compression for Tensor Parallel LLM Inference (#94)

v2.6.4

🔥[VL-Cache] VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration

v2.6.3

🔥[Tensor Product] Acceleration of Tensor-Product Operations with Tensor Cores (#90)

v2.6.2

🔥[FastAttention] FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs for Efficient Inference (#88)

v2.6.1

🔥[ParallelSpec] ParallelSpec: Parallel Drafter for Efficient Speculative Decoding (#84)