Tags: DefTruth/Awesome-LLM-Inference

v2.6.10

🔥🔥[FFPA] FFPA: Yet another Faster Flash Prefill Attention with O(1) SRAM complexity for headdim > 256, ~1.5x faster than SDPA EA (@DefTruth) (#111)

v2.6.9

🔥🔥[HadaCore] HadaCore: Tensor Core Accelerated Hadamard Transform Kernel (#108)

v2.6.8

🔥[BatchLLM] BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching (#104)

v2.6.7

🔥[KV Cache Recomputation] Efficient LLM Inference with I/O-Aware Partial KV Cache Recomputation (#102)

v2.6.6

🔥[SparseInfer] SparseInfer: Training-free Prediction of Activation Sparsity for Fast LLM Inference (#100)

v2.6.5

🔥🔥[TP: Comm Compression] Communication Compression for Tensor Parallel LLM Inference (#94)

v2.6.4

🔥[VL-Cache] VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration

v2.6.3

🔥[Tensor Product] Acceleration of Tensor-Product Operations with Tensor Cores (#90)

v2.6.2

🔥[FastAttention] FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs for Efficient Inference (#88)

v2.6.1

🔥[ParallelSpec] ParallelSpec: Parallel Drafter for Efficient Speculative Decoding (#84)