Insights: triton-lang/triton
Overview
66 Pull requests merged by 23 people
-
[PROTON-DEV] Improve profile interface
#5793 merged
Feb 2, 2025 -
[AMD] NFC: Drop unused constructors in elementwise patterns
#5789 merged
Feb 2, 2025 -
[BACKEND] Refactor shared memory layout representation
#5786 merged
Feb 1, 2025 -
Use `os.sep` in `filter_traceback` function
#5781 merged
Feb 1, 2025 -
Tutorial 09 Descriptor Kernel
#5779 merged
Feb 1, 2025 -
[Proton] Fixed pc sampling error
#5787 merged
Feb 1, 2025 -
[BC Breaking] Add output dtype to `tl.sum` with default
#5763 merged
Feb 1, 2025 -
[Pipeliner] Fix condition for pipelining loads
#5780 merged
Feb 1, 2025 -
[AMD] Add GFX950 fp32 to bf16 Conversion Ops
#5782 merged
Feb 1, 2025 -
[BACKEND] bump llvm to ffe3129e9bdc146ee4d91e849173d1c64b1ae974
#5784 merged
Feb 1, 2025 -
[Layouts] Remove sketchy remat condition
#5783 merged
Feb 1, 2025 -
Do not reorder transpose of dot operand that is used in ops other than dotOp
#5686 merged
Jan 31, 2025 -
[DEV] Don't use .ONESHELL in Makefile
#5775 merged
Jan 31, 2025 -
[AMD] Emit AMD specific intrinsics for dot
#4594 merged
Jan 31, 2025 -
[AMD] Rewrite canonicalize pointers to use 1:N conversion
#5329 merged
Jan 31, 2025 -
[PROTON] Skip warnings caused by legacy clang compilers
#5778 merged
Jan 31, 2025 -
Revert "[LAYOUTS] Generalise HoistLayoutConversion to work with arbit…
#5776 merged
Jan 31, 2025 -
[TOOLS] Fixed bug in AOT compiler
#5771 merged
Jan 31, 2025 -
Fix `__builtin_clz` implementation on Windows
#5774 merged
Jan 31, 2025 -
[LAYOUTS] Generalise HoistLayoutConversion to work with arbitrary layouts and chains of ops
#5673 merged
Jan 31, 2025 -
[AMD][BACKEND] Bugfix to small tile pingpong
#5759 merged
Jan 31, 2025 -
[PROTON] Explicitly list all `cpp` files
#5756 merged
Jan 31, 2025 -
[ANALYSIS][DEBUG] Output theoretical vs actual peak memory allocation size
#5658 merged
Jan 31, 2025 -
[DRIVER] Pass correct SM and PTX versions to llvm
#5770 merged
Jan 31, 2025 -
[Triton] Change `xor_sum` to use `@jit` (NFC)
#5769 merged
Jan 31, 2025 -
[DOC] Update core maintainers list
#5767 merged
Jan 30, 2025 -
Use `env` builtin implementation from LLVM's lit utility for platform independence
#5762 merged
Jan 30, 2025 -
[PROTON] Add the `-diff` option to `proton-viewer`
#5740 merged
Jan 30, 2025 -
[BACKEND] Canonicalize ReshapeOp even if not allowing reorder
#5752 merged
Jan 30, 2025 -
[DEV] Unify Makefile and cuda CI commands
#5753 merged
Jan 30, 2025 -
[PIPELINE] Limit number of buffers for register operands
#5755 merged
Jan 30, 2025 -
Reapply "[Layouts] Propagate layouts into conditionals (#5610)"
#5725 merged
Jan 30, 2025 -
[Proton][Dialect] Middle-end Proton operator definitions
#5754 merged
Jan 30, 2025 -
Improve thread locality for reduction ops (#5671)
#5757 merged
Jan 30, 2025 -
[PROTON] Reworked the mechanism for finding libraries for profiling backends.
#5751 merged
Jan 30, 2025 -
[Frontend][Diagnostics] Improve emitting diagnostic information
#5581 merged
Jan 30, 2025 -
[LAYOUTS] Create a trait that implements Layout equality by comparing the LLs
#5747 merged
Jan 29, 2025 -
[BACKEND] Limit vector size to scratch size for convert_layout
#5746 merged
Jan 29, 2025 -
[backend] NFC: Split architecture dependant and independant parts of FMA dot conversion
#5655 merged
Jan 29, 2025 -
[BACKEND] bump to llvm/llvm-project@c118864223c6
#5684 merged
Jan 29, 2025 -
Optimize reduce(reshape_1D)
#5748 merged
Jan 29, 2025 -
[AMD][BACKEND] Disable pingpong with non-local_load input.
#5718 merged
Jan 29, 2025 -
Revert "[PROTON] Prefer the default library path when loading profiler backends"
#5749 merged
Jan 29, 2025 -
Revert "[Coalesce] Fix the default order to be row major "
#5744 merged
Jan 29, 2025 -
[NVIDIA] Use correct commit type for TMA
#5738 merged
Jan 29, 2025 -
[BACKEND] Deprecate `SharedToDotOperandMMAv2OrV3.cpp`
#5734 merged
Jan 29, 2025 -
[BC Breaking] Make tl.ravel keep element orders by default
#5743 merged
Jan 29, 2025 -
[PROTON-DEV] Add the instrumentation mode and clean up dependencies
#5742 merged
Jan 29, 2025 -
[PROTON] Prefer the default library path when loading profiler backends
#5739 merged
Jan 29, 2025 -
[NVIDIA] Prefer nvvm intrinsics over custom PTX
#5733 merged
Jan 29, 2025 -
[PROTON] Add max flops formula for sm_100
#5736 merged
Jan 29, 2025 -
[NVIDIA] Use native bf16 ops
#5732 merged
Jan 28, 2025 -
[Coalesce] Fix the default order to be row major
#5707 merged
Jan 28, 2025 -
[FRONTEND] Restore error traceback filtering
#5731 merged
Jan 28, 2025 -
[NFC] replace TritonGPUToLLVM/Utility.h macros with TritonLLVMOpBuilder
#5717 merged
Jan 28, 2025 -
[PROTON] Change the output format of pc sampling lines
#5711 merged
Jan 28, 2025 -
[PTXAS] Fix ptxas lineinfo option
#5705 merged
Jan 28, 2025 -
[FRONTEND] Allow JITFunctions as arguments to other JITFunctions
#5723 merged
Jan 28, 2025 -
[PROTON] Correct misuse of `strip`
#5716 merged
Jan 28, 2025 -
[README] Add instructions to build torch for Blackwell
#5727 merged
Jan 28, 2025 -
Add support for Nvidia Blackwell GPUs
#5724 merged
Jan 28, 2025 -
Add missing include header
#5721 merged
Jan 28, 2025 -
[AMD][Buffer Ops] Leverage MLIR infra for errors in more places
#5719 merged
Jan 28, 2025 -
[AMD] more efficient fp32 to bf16 type conversion
#5633 merged
Jan 27, 2025 -
[NFC] Make `attrs` type for AOT compilation the same as what is normally used in Triton
#5702 merged
Jan 27, 2025 -
Revert "[Layouts] Propagate layouts into conditionals (#5610)"
#5710 merged
Jan 27, 2025
17 Pull requests opened by 17 people
-
Bump actions/checkout from 3 to 4
#5714 opened
Jan 27, 2025 -
Use `torch.Tensor` with unsigned types directly instead of `TensorWrapper`
#5715 opened
Jan 27, 2025 -
[WIP] Support shared encoding defined with linear layout
#5720 opened
Jan 27, 2025 -
[WIP][Pipeliner] Enable automatic loop fusion
#5726 opened
Jan 28, 2025 -
[AMD] AsyncCopyGlobalToLocal lowering to global.load.lds
#5729 opened
Jan 28, 2025 -
[AMD] Initial support for LDS transpose load instructions
#5750 opened
Jan 29, 2025 -
[AMD] Added `ConcatOp` to AMDGPU Dialect
#5760 opened
Jan 30, 2025 -
[release/3.2.x] Get proper PTX version for CUDA >= 12.6
#5765 opened
Jan 30, 2025 -
[WIP][PIPELINE] Remove outer loop pipelining transformation
#5766 opened
Jan 30, 2025 -
[OPTIMIZER] Fix insertion location in HoistLayoutConversion pattern
#5772 opened
Jan 31, 2025 -
[LAYOUTS] Remove HoistLayoutConversion in favour of backwardsRemat
#5788 opened
Feb 1, 2025 -
[INTERPRETER] Support tuple arguments in interpreter
#5790 opened
Feb 2, 2025 -
[PROTON] Parallelize proton tests
#5792 opened
Feb 2, 2025 -
[NFC] Move element bit width into NVMMASharedEncoding
#5794 opened
Feb 2, 2025 -
[INTERP] Support tensor descriptor ops
#5795 opened
Feb 3, 2025 -
[mlir][dialect] Refactor DotLike trait into a DotOpInterface + Enable verification of scaled_dot
#5796 opened
Feb 3, 2025
8 Issues closed by 4 people
-
jit issue when INTERPRETER=1
#5056 closed
Feb 2, 2025 -
import te raise error
#5722 closed
Jan 31, 2025 -
Assertion error when lowering a reduce->reshape->reshape->broadcast pattern to LLIR
#5745 closed
Jan 29, 2025 -
Triton does not really enable -ftz
#5735 closed
Jan 29, 2025 -
Not able to install dependencies file. : triton=3.0.0
#5741 closed
Jan 29, 2025 -
New LLVM version will conflict with macros defined in TritonGPUToLLVM/Utility.h
#5691 closed
Jan 28, 2025 -
Fused Kernel of Conv1d and MM Get Wrong Results.
#5630 closed
Jan 28, 2025 -
TypeError: object of type 'int' has no len()
#5704 closed
Jan 27, 2025
6 Issues opened by 6 people
-
Triton interpreter cannot handle parameters that alias
#5791 opened
Feb 2, 2025 -
[3.2.x] `ptx_get_version` cannot handle CUDA>12.6
#5737 opened
Jan 29, 2025 -
[RFC] Improve performance for layer-norm in turtorial
#5712 opened
Jan 27, 2025
12 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
[Proton][Dialect] Add Proton Device Memory Buffer Init and Allocate Pass
#5606 commented on
Feb 1, 2025 • 6 new comments -
[AMD-Pipeline] Add multi-stage global/local prefetch
#5353 commented on
Jan 28, 2025 • 3 new comments -
[AMD] Enable pingpong scheduling by default
#5696 commented on
Feb 1, 2025 • 2 new comments -
The Precision Issue of the GELU Operator
#5692 commented on
Jan 27, 2025 • 0 new comments -
Cannot find 2.0.0.dev20221202 version
#4511 commented on
Jan 29, 2025 • 0 new comments -
Is Triton unable to install in python 3.10 versions?
#1057 commented on
Jan 31, 2025 • 0 new comments -
[WIP][SWP] Print recurring dependencies when reporting scheduling conflicts
#5375 commented on
Jan 30, 2025 • 0 new comments -
[AMD] refactor convert buffer ops
#5563 commented on
Jan 31, 2025 • 0 new comments -
[WIP] [AMD] Remove "remove unsupported conversions" pass
#5674 commented on
Jan 31, 2025 • 0 new comments -
[Proton][Dialect] Middle-end support of the Proton Dialect and the frontend Python package
#5677 commented on
Feb 1, 2025 • 0 new comments -
Fix assertion in ScanLowering for num_ctas>1
#5680 commented on
Jan 29, 2025 • 0 new comments -
[WIP][AMD] Add MFMA and WMMA layouts to LinearEncodingTest
#5698 commented on
Jan 29, 2025 • 0 new comments