Reproduce the PPoPP'23 paper "Stream-K: Work-centric Parallel Decomposition for Dense Matrix-Matrix Multiplication on the GPU" with TVM TIR and TL, which can help us optimize GEMM performance for small shapes, where classic tile-based launches leave SMs idle. A sketch of the work-centric decomposition follows below.
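As a quick reference for the decomposition this repo reproduces, here is a minimal CUDA sketch of Stream-K's work assignment as described in the paper: the combined K-iteration workload of every output tile is divided evenly across a fixed grid of CTAs, so one CTA may finish a fragment of a tile or sweep across several tiles. The tile sizes, grid size, and kernel name below are illustrative assumptions, not values from this repo.

```cuda
#include <cstdio>

// Each CTA computes its contiguous slice of the linear iteration
// space covering all output tiles, then maps the slice start back
// to an (output tile, k-iteration) pair.
__global__ void stream_k_assign(int tiles_m, int tiles_n,
                                int iters_per_tile) {
    long long total_iters =
        (long long)tiles_m * tiles_n * iters_per_tile;
    // Even split of the iteration space over gridDim.x CTAs.
    long long iter_start = blockIdx.x * total_iters / gridDim.x;
    long long iter_stop  = (blockIdx.x + 1LL) * total_iters / gridDim.x;
    if (threadIdx.x == 0) {
        int first_tile = (int)(iter_start / iters_per_tile);
        int first_k    = (int)(iter_start % iters_per_tile);
        printf("CTA %d: iters [%lld, %lld), starts at tile %d, k-iter %d\n",
               blockIdx.x, iter_start, iter_stop, first_tile, first_k);
    }
}

int main() {
    // 4096^3 GEMM with 128x128x32 tiles on a hypothetical 108-SM GPU.
    int tiles_m = 4096 / 128, tiles_n = 4096 / 128;
    int iters_per_tile = 4096 / 32;
    stream_k_assign<<<108, 32>>>(tiles_m, tiles_n, iters_per_tile);
    cudaDeviceSynchronize();
    return 0;
}
```

Tiles whose iterations are split across CTAs need a fix-up step (e.g. AtomicAdd accumulation of partial results), which is what several of the TODO items below target.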
Dependencies:
```bash
pip install git+https://github.com/microsoft/BitBLAS.git
```
TODO Items:
- Implement Float16 Tensor Core GEMM.
- Add vectorized AtomicAdd for the partial-tile fix-up (see the sketch after this list).
- Implement auto-tuning.
- Add block reduction.
- Add Ladder layout propagation.
- Apply padding when the tile shape is not aligned for swizzling.
- Implement Q4 (4-bit) support.
- Implement fast dequantization with BitBLAS Fast Dequantize.
- Enhance the TL thread-level mma abstraction.
- Implement a dequantize template and integrate it with BitBLAS.
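For the vectorized AtomicAdd item above, here is a hedged CUDA sketch of one possible fp16 fix-up path: reinterpreting the output as `__half2` so a single 32-bit atomic accumulates two elements per call. The kernel name and buffer layout are hypothetical; `atomicAdd` on `__half2` requires sm_60+ and the scalar `__half` overload requires sm_70+.

```cuda
#include <cuda_fp16.h>

// Accumulate a partial-tile result into C two fp16 elements at a
// time. Assumes C and partial are 4-byte aligned so the __half2
// reinterpretation is valid; names are illustrative only.
__global__ void fixup_atomic_vec2(half* __restrict__ C,
                                  const half* __restrict__ partial,
                                  int n) {
    int i = (blockIdx.x * blockDim.x + threadIdx.x) * 2;
    if (i + 1 < n) {
        // One 32-bit atomic covers two fp16 lanes at once.
        __half2 v = *reinterpret_cast<const __half2*>(partial + i);
        atomicAdd(reinterpret_cast<__half2*>(C + i), v);
    } else if (i < n) {
        // Odd tail element falls back to a scalar fp16 atomic.
        atomicAdd(C + i, partial[i]);
    }
}
```

Compared with one scalar atomic per element, the paired form halves the number of atomic transactions on the fix-up path, which matters most for the small shapes this repo targets.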