Pulse · triton-lang/triton · GitHub

January 26, 2025 – February 2, 2025

Overview

83 Active pull requests

14 Active issues

66 Pull requests merged by 23 people

[PROTON-DEV] Improve profile interface
#5793 merged Feb 2, 2025
[AMD] NFC: Drop unused constructors in elementwise patterns
#5789 merged Feb 2, 2025
[BACKEND] Refactor shared memory layout representation
#5786 merged Feb 1, 2025
Use os.sep in filter_traceback function
#5781 merged Feb 1, 2025
Tutorial 09 Descriptor Kernel
#5779 merged Feb 1, 2025
[Proton] Fixed pc sampling error
#5787 merged Feb 1, 2025
[BC Breaking] Add output dtype to tl.sum with default
#5763 merged Feb 1, 2025
[Pipeliner] Fix condition for pipelining loads
#5780 merged Feb 1, 2025
[AMD] Add GFX950 fp32 to bf16 Conversion Ops
#5782 merged Feb 1, 2025
[BACKEND] bump llvm to ffe3129e9bdc146ee4d91e849173d1c64b1ae974
#5784 merged Feb 1, 2025
[Layouts] Remove sketchy remat condition
#5783 merged Feb 1, 2025
Do not reorder transpose of dot operand that is used in ops other than dotOp
#5686 merged Jan 31, 2025
[DEV] Don't use .ONESHELL in Makefile
#5775 merged Jan 31, 2025
[AMD] Emit AMD specific intrinsics for dot
#4594 merged Jan 31, 2025
[AMD] Rewrite canonicalize pointers to use 1:N conversion
#5329 merged Jan 31, 2025
[PROTON] Skip warnings caused by legacy clang compilers
#5778 merged Jan 31, 2025
Revert "[LAYOUTS] Generalise HoistLayoutConversion to work with arbit…
#5776 merged Jan 31, 2025
[TOOLS] Fixed bug in AOT compiler
#5771 merged Jan 31, 2025
Fix __builtin_clz implementation on Windows
#5774 merged Jan 31, 2025
[LAYOUTS] Generalise HoistLayoutConversion to work with arbitrary layouts and chains of ops
#5673 merged Jan 31, 2025
[AMD][BACKEND] Bugfix to small tile pingpong
#5759 merged Jan 31, 2025
[PROTON] Explicitly list all cpp files
#5756 merged Jan 31, 2025
[ANALYSIS][DEBUG] Output theoretical vs actual peak memory allocation size
#5658 merged Jan 31, 2025
[DRIVER] Pass correct SM and PTX versions to llvm
#5770 merged Jan 31, 2025
[Triton] Change xor_sum to use @jit (NFC)
#5769 merged Jan 31, 2025
[DOC] Update core maintainers list
#5767 merged Jan 30, 2025
Use env builtin implementation from LLVM's lit utility for platform independence
#5762 merged Jan 30, 2025
[PROTON] Add the -diff option to proton-viewer
#5740 merged Jan 30, 2025
[BACKEND] Canonicalize ReshapeOp even if not allowing reorder
#5752 merged Jan 30, 2025
[DEV] Unify Makefile and cuda CI commands
#5753 merged Jan 30, 2025
[PIPELINE] Limit number of buffers for register operands
#5755 merged Jan 30, 2025
Reapply "[Layouts] Propagate layouts into conditionals (#5610)"
#5725 merged Jan 30, 2025
[Proton][Dialect] Middle-end Proton operator definitions
#5754 merged Jan 30, 2025
Improve thread locality for reduction ops (#5671)
#5757 merged Jan 30, 2025
[PROTON] Reworked the mechanism for finding libraries for profiling backends.
#5751 merged Jan 30, 2025
[Frontend][Diagnostics] Improve emitting diagnostic information
#5581 merged Jan 30, 2025
[LAYOUTS] Create a trait that implements Layout equality by comparing the LLs
#5747 merged Jan 29, 2025
[BACKEND] Limit vector size to scratch size for convert_layout
#5746 merged Jan 29, 2025
[backend] NFC: Split architecture dependent and independent parts of FMA dot conversion
#5655 merged Jan 29, 2025
[BACKEND] bump to llvm/llvm-project@c118864223c6
#5684 merged Jan 29, 2025
Optimize reduce(reshape_1D)
#5748 merged Jan 29, 2025
[AMD][BACKEND] Disable pingpong with non-local_load input.
#5718 merged Jan 29, 2025
Revert "[PROTON] Prefer the default library path when loading profiler backends"
#5749 merged Jan 29, 2025
Revert "[Coalesce] Fix the default order to be row major "
#5744 merged Jan 29, 2025
[NVIDIA] Use correct commit type for TMA
#5738 merged Jan 29, 2025
[BACKEND] Deprecate SharedToDotOperandMMAv2OrV3.cpp
#5734 merged Jan 29, 2025
[BC Breaking] Make tl.ravel keep element orders by default
#5743 merged Jan 29, 2025
[PROTON-DEV] Add the instrumentation mode and clean up dependencies
#5742 merged Jan 29, 2025
[PROTON] Prefer the default library path when loading profiler backends
#5739 merged Jan 29, 2025
[NVIDIA] Prefer nvvm intrinsics over custom PTX
#5733 merged Jan 29, 2025
[PROTON] Add max flops formula for sm_100
#5736 merged Jan 29, 2025
[NVIDIA] Use native bf16 ops
#5732 merged Jan 28, 2025
[Coalesce] Fix the default order to be row major
#5707 merged Jan 28, 2025
[FRONTEND] Restore error traceback filtering
#5731 merged Jan 28, 2025
[NFC] replace TritonGPUToLLVM/Utility.h macros with TritonLLVMOpBuilder
#5717 merged Jan 28, 2025
[PROTON] Change the output format of pc sampling lines
#5711 merged Jan 28, 2025
[PTXAS] Fix ptxas lineinfo option
#5705 merged Jan 28, 2025
[FRONTEND] Allow JITFunctions as arguments to other JITFunctions
#5723 merged Jan 28, 2025
[PROTON] Correct misuse of strip
#5716 merged Jan 28, 2025
[README] Add instructions to build torch for Blackwell
#5727 merged Jan 28, 2025
Add support for Nvidia Blackwell GPUs
#5724 merged Jan 28, 2025
Add missing include header
#5721 merged Jan 28, 2025
[AMD][Buffer Ops] Leverage MLIR infra for errors in more places
#5719 merged Jan 28, 2025
[AMD] more efficient fp32 to bf16 type conversion
#5633 merged Jan 27, 2025
[NFC] Make attrs type for AOT compilation the same as what is normally used in Triton
#5702 merged Jan 27, 2025
Revert "[Layouts] Propagate layouts into conditionals (#5610)"
#5710 merged Jan 27, 2025

17 Pull requests opened by 17 people

Bump actions/checkout from 3 to 4
#5714 opened Jan 27, 2025
Use `torch.Tensor` with unsigned types directly instead of `TensorWrapper`
#5715 opened Jan 27, 2025
[WIP] Support shared encoding defined with linear layout
#5720 opened Jan 27, 2025
[WIP][Pipeliner] Enable automatic loop fusion
#5726 opened Jan 28, 2025
[AMD] AsyncCopyGlobalToLocal lowering to global.load.lds
#5729 opened Jan 28, 2025
[AMD] Initial support for LDS transpose load instructions
#5750 opened Jan 29, 2025
[AMD] Added `ConcatOp` to AMDGPU Dialect
#5760 opened Jan 30, 2025
[release/3.2.x] Get proper PTX version for CUDA >= 12.6
#5765 opened Jan 30, 2025
[WIP][PIPELINE] Remove outer loop pipelining transformation
#5766 opened Jan 30, 2025
[OPTIMIZER] Fix insertion location in HoistLayoutConversion pattern
#5772 opened Jan 31, 2025
[BACKEND] Update LLVM version to https://github.com/llvm/llvm-project/commit/4573c857da88b3210d497d9a88a89351a74b5964
#5777 opened Jan 31, 2025
[LAYOUTS] Remove HoistLayoutConversion in favour of backwardsRemat
#5788 opened Feb 1, 2025
[INTERPRETER] Support tuple arguments in interpreter
#5790 opened Feb 2, 2025
[PROTON] Parallelize proton tests
#5792 opened Feb 2, 2025
[NFC] Move element bit width into NVMMASharedEncoding
#5794 opened Feb 2, 2025
[INTERP] Support tensor descriptor ops
#5795 opened Feb 3, 2025
[mlir][dialect] Refactor DotLike trait into a DotOpInterface + Enable verification of scaled_dot
#5796 opened Feb 3, 2025

8 Issues closed by 4 people

jit issue when INTERPRETER=1
#5056 closed Feb 2, 2025
import te raise error
#5722 closed Jan 31, 2025
Assertion error when lowering a reduce->reshape->reshape->broadcast pattern to LLIR
#5745 closed Jan 29, 2025
Triton does not really enable -ftz
#5735 closed Jan 29, 2025
Not able to install dependencies file. : triton=3.0.0
#5741 closed Jan 29, 2025
New LLVM version will conflict with macros defined in TritonGPUToLLVM/Utility.h
#5691 closed Jan 28, 2025
Fused Kernel of Conv1d and MM Get Wrong Results.
#5630 closed Jan 28, 2025
TypeError: object of type 'int' has no len()
#5704 closed Jan 27, 2025

6 Issues opened by 6 people

Triton interpreter cannot handle parameters that alias
#5791 opened Feb 2, 2025
"ImportError: cannot import name 'backends' from 'triton.backends' (unknown location)" for triton installed from source
#5773 opened Jan 31, 2025
Misreport "Cannot have `return` statements inside `while` or `for`" if values returned by function are disregarded
#5768 opened Jan 30, 2025
[3.2.x] `ptx_get_version` cannot handle CUDA>12.6
#5737 opened Jan 29, 2025
No module named `triton.backends.nvidia._C` when installing from `triton-3.2.0-cp312-cp312-linux_x86_64.whl`
#5728 opened Jan 28, 2025
[RFC] Improve performance for layer-norm in tutorial
#5712 opened Jan 27, 2025

12 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

[Proton][Dialect] Add Proton Device Memory Buffer Init and Allocate Pass
#5606 commented on Feb 1, 2025 • 6 new comments
[AMD-Pipeline] Add multi-stage global/local prefetch
#5353 commented on Jan 28, 2025 • 3 new comments
[AMD] Enable pingpong scheduling by default
#5696 commented on Feb 1, 2025 • 2 new comments
The Precision Issue of the GELU Operator
#5692 commented on Jan 27, 2025 • 0 new comments
Cannot find 2.0.0.dev20221202 version
#4511 commented on Jan 29, 2025 • 0 new comments
Is Triton unable to install in python 3.10 versions?
#1057 commented on Jan 31, 2025 • 0 new comments
[WIP][SWP] Print recurring dependencies when reporting scheduling conflicts
#5375 commented on Jan 30, 2025 • 0 new comments
[AMD] refactor convert buffer ops
#5563 commented on Jan 31, 2025 • 0 new comments
[WIP] [AMD] Remove "remove unsupported conversions" pass
#5674 commented on Jan 31, 2025 • 0 new comments
[Proton][Dialect] Middle-end support of the Proton Dialect and the frontend Python package
#5677 commented on Feb 1, 2025 • 0 new comments
Fix assertion in ScanLowering for num_ctas>1
#5680 commented on Jan 29, 2025 • 0 new comments
[WIP][AMD] Add MFMA and WMMA layouts to LinearEncodingTest
#5698 commented on Jan 29, 2025 • 0 new comments