Summary
This release brings major performance improvements to tensor operations, particularly in matrix multiplication and convolution, along with experimental ROCm/HIP and SPIR-V support enabled by CubeCL runtimes. It also introduces foundational features for multi-backend compatibility and adds new quantization operations.
Support for ONNX models has been expanded, with additional operators and bug fixes for better operator coverage.
As with previous releases, this version includes various bug fixes, further performance optimizations, new tensor operations, and enhanced documentation.
Module & Tensor
- Remove copy restriction for const generic modules (#2222) @laggui
- Add deform_conv2d as implemented in torchvision (#2147) @wingertge
- Add dim checks on output rank for unsqueeze and stack (#2331) @laggui
- Add Softmin (#2358) @NoahSchiro
- Add `round`, `floor`, `ceil` for float tensor (#2372) @med1844 (see the sketch after this list)
- Make tensor sync (#2392) @kingwingfly
- Add `tensor.one_hot` int operation (#2413) @tsanona
- [Breaking] Change LR schedulers to return the initial LR at first `.step()` (#2337) @towerpark
- Move LrSchedule generic to make it easier to use (#2309) @ArthurBrussee
- Add quantization ops default implementation (#2125 #2275 #2301) @laggui
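
A minimal sketch of the new element-wise rounding ops from #2372, shown here on the `NdArray` backend for illustration (the backend choice and example values are arbitrary, and the `ndarray` feature is assumed):

```rust
use burn::backend::NdArray;
use burn::tensor::Tensor;

fn main() {
    // Device for the ndarray (CPU) backend; illustrative backend choice.
    let device = Default::default();

    // Float tensor with fractional values (arbitrary example data).
    let x = Tensor::<NdArray, 1>::from_floats([1.4, 2.5, -0.7], &device);

    // Element-wise rounding ops added in this release (#2372).
    let rounded = x.clone().round();
    let floored = x.clone().floor();
    let ceiled = x.ceil();

    println!("{rounded} {floored} {ceiled}");
}
```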
Bug Fixes
- Avoid 0 denominator in interpolate frac (#2224) @laggui
- Nonzero should return an empty vec for zero tensors (#2212) @laggui
- Change ndarray mask_where implementation to correctly deal with NaNs (#2272) @laggui
- Fix mask_where broadcasted input (#2381) @laggui
- Make powf broadcastable (#2398) @laggui
Backends
- Add candle `CudaDevice` and `MetalDevice` to avoid creating a new unique device each time (#2290) @laggui
- Add fusion mix precision (#2247) @nathanielsimard
- Add SPIR-V compiler backend to `burn-wgpu` (#2386) @wingertge
- Add burn-hip (#2399) @syl20bnr
- Add `BackendRouter` to handle multiple backends on the way to distributed (#2353 #2419) @laggui (see the backend-generic sketch after this list)
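
Since all backends implement the same `Backend` trait, user code stays backend-generic; a hedged sketch of the idea using the stable `NdArray` and `Wgpu` backends as stand-ins (the `ndarray` and `wgpu` features are assumed, and the experimental ROCm/HIP and SPIR-V paths would slot in the same way via their own backend types and feature flags, not shown here):

```rust
use burn::backend::{NdArray, Wgpu};
use burn::tensor::{backend::Backend, Tensor};

// Backend-generic function: the concrete backend is just a type parameter,
// so the same code runs on NdArray, Wgpu, or any other `Backend` impl.
fn sum_of_squares<B: Backend>(x: Tensor<B, 1>) -> Tensor<B, 1> {
    x.powf_scalar(2.0).sum()
}

fn main() {
    let cpu_device = Default::default();
    let gpu_device = Default::default();

    // Same data on two different backends; swap the type to swap the backend.
    let a = Tensor::<NdArray, 1>::from_floats([1.0, 2.0, 3.0], &cpu_device);
    let b = Tensor::<Wgpu, 1>::from_floats([1.0, 2.0, 3.0], &gpu_device);

    println!("{}", sum_of_squares(a));
    println!("{}", sum_of_squares(b));
}
```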
Bug Fixes
- Fix autodiff memory leak (#2347) @nathanielsimard
- Fix autodiff abs NaN when output is 0 (#2249) @AsherJingkongChen
Documentation & Examples
- Add documentation for custom `cubecl` kernels, update some outdated docs (#2404) @wingertge
- Add comments to burn fusion (#2130) @cBournhonesque
- Improve doc for burn-tch (#2288) @kingwingfly
- Improve regression example (#2405) @laggui
- Create CITATION.cff (#2231) @antimora
- Enable doc_auto_cfg to show feature-req-hint in docs.rs (#2271) @kingwingfly
Fixes
- Fix tensor data elem type conversion in book (#2211) @laggui
- Fix target convert in batcher and align guide imports (#2215) @laggui
- Fix huber loss documentation (#2232) @kingwingfly
- Fix debugger settings doc in contributor book (#2223) @tiruka
- Fix Raspberry Pi Pico example not compiling (#2220) @BjornTheProgrammer
- Fix path in book (#2262) @mehmetalianil
- Fix unresolved import `regression` (#2285) @tiruka
- Fix burn book links (#2303 #2327) @laggui @tiruka
- Contributor Book: Fix the link of primitive types in the "Serialization" page (#2362) @towerpark
- Fix simple regression batch targets (#2379) @wangjiawen2013
- Fix xtask args which are unmodified when upgrading xtask commands (#2364) @tiruka
ONNX Support
- Add gather support for multi-dim indices (rank > 1) (#2199) @alteredoxide
- Allow onnx-import expand op with non-const shapes (#2189) @hexd0t
- Improve ONNX import tensor shape tracking (#2213) @hexd0t
- Add missing output padding to conv transpose ONNX (#2216) @laggui
- Fix ONNX where op for scalar inputs (#2218) @hexd0t
- Simplify scope tracking in burn-import (#2207) @skewballfox
- Add ONNX op Trilu (#2323) @tiruka
- Add ConvTranspose1d ONNX op (#2349) @tiruka
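
The expanded operator coverage applies to the usual `burn-import` workflow; a minimal `build.rs` sketch for reference, assuming `burn-import` is declared as a build dependency (the model path below is a placeholder):

```rust
// build.rs — generates Rust model code from an ONNX file at build time.
use burn_import::onnx::ModelGen;

fn main() {
    ModelGen::new()
        // Placeholder path; point this at your own ONNX model.
        .input("src/model/my_model.onnx")
        // Generated code is written under OUT_DIR/model/.
        .out_dir("model/")
        .run_from_script();
}
```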
Enhancements
- Improve slice kernel performance (#2252) @nathanielsimard
- Fix burn-jit conv2d excessive loop unrolling (#2263) @AsherJingkongChen
- Introduce autotuning to `conv2d` and `conv_transpose2d` with a new `im2col`/`GEMM` algorithm (#2287) @wingertge
- Further data locality optimizations for implicit GEMM (#2300) @wingertge
- Add utility methods to split gradients to GradientParams (#2311) @ArthurBrussee
- Add bounds checking to implicit GEMM to allow arbitrary input shapes (#2354) @wingertge
- Initialize accumulator to bias for implicit GEMM to save an expensive `float_add` (#2383) @wingertge
Refactoring
- Select kernel from CPA to CubeCL (#2168) @mepatrick73
- Migrate cubecl macro (#2266) @wingertge
- Remove primitives const D generic (#2298) @laggui
- Refactor elemwise fusion (#2344) @nathanielsimard
- Refactor Adaptive Avg Pool to CubeCL (#2351) @nathanielsimard
- Refactor pooling kernels (#2356) @nathanielsimard
- Refactor burn-tensor: Split conv backward ops to allow conditional gradient computation (#2278) @AsherJingkongChen
Miscellaneous
- Fix panic messages being invisible in tui mode (#2226) @PaulWagener
- Refactor xtask to use tracel-xtask and refactor CI workflow (#2063) @syl20bnr
- Automatic minimum rust version in README (#2227) @syl20bnr
- Set MSRV to 1.81 (#2388) @nathanielsimard
- Don't panic when the progress is > 1.0 (#2229) @PaulWagener
- Fix compile for dataset crate with vision feature (#2228) @PaulWagener
- Update CI workflow for last version of setup-linux action (#2248) @syl20bnr
- [CI] Fix llvmpipe, lavapipe install for valgrind and vulnerabilities (#2264) @syl20bnr
- Use CliMetricsRenderer when not in a terminal (#2307) @lancelet
- Update rusqlite and associated libraries (#2328) @paulirotta
- Fix missing fusion feature flag @nathanielsimard
- Move conv autotune under feature flag (except key) (#2330) @laggui
- Add should_run for convs instead of panicking (#2403) @ArthurBrussee
- Make changes for latest ratatui version (#2421) @laggui
- Add Windows/WindowsIterator/WindowsDataset (#2338) @NicoZweifel