Summary
This release brings major performance improvements to tensor operations, particularly in matrix multiplication and convolution, along with experimental ROCm/HIP and SPIR-V support enabled by CubeCL runtimes. It also introduces foundational features for multi-backend compatibility and adds new quantization operations.
Support for ONNX models has been expanded, with additional operators and bug fixes for better operator coverage.
As with previous releases, this version includes various bug fixes, further performance optimizations, new tensor operations, and enhanced documentation.
Module & Tensor
- Remove copy restriction for const generic modules (#2222) @laggui
- Add deform_conv2d as implemented in torchvision (#2147) @wingertge
- Add dim checks on output rank for unsqueeze and stack (#2331) @laggui
- Add Softmin (#2358) @NoahSchiro
- Add `round`, `floor`, `ceil` for float tensor (#2372) @med1844 (see the sketch after this list)
- Make tensor sync (#2392) @kingwingfly
- Add `tensor.one_hot` int operation (#2413) @tsanona
- [Breaking] Change LR schedulers to return the initial LR at first `.step()` (#2337) @towerpark
- Move LrSchedule generic to make it easier to use (#2309) @ArthurBrussee
- Add quantization ops default implementation (#2125 #2275 #2301) @laggui
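
A minimal sketch of the new element-wise rounding ops from #2372, shown here on the `NdArray` backend for illustration (the backend choice and example values are arbitrary, and the `ndarray` feature is assumed):

```rust
use burn::backend::NdArray;
use burn::tensor::Tensor;

fn main() {
    // Device for the ndarray (CPU) backend; illustrative backend choice.
    let device = Default::default();

    // Float tensor with fractional values (arbitrary example data).
    let x = Tensor::<NdArray, 1>::from_floats([1.4, 2.5, -0.7], &device);

    // Element-wise rounding ops added in this release (#2372).
    let rounded = x.clone().round();
    let floored = x.clone().floor();
    let ceiled = x.ceil();

    println!("{rounded} {floored} {ceiled}");
}
```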
Bug Fixes
- Avoid 0 denominator in interpolate frac (#2224) @laggui
- Nonzero should return an empty vec for zero tensors (#2212) @laggui
- Change ndarray mask_where implementation to correctly deal with NaNs (#2272) @laggui
- Fix mask_where broadcasted input (#2381) @laggui
- Make powf broadcastable (#2398) @laggui
Backends
- Add candle `CudaDevice` and `MetalDevice` to avoid creating a new unique device each time (#2290) @laggui
- Add fusion mix precision (#2247) @nathanielsimard
- Add SPIR-V compiler backend to `burn-wgpu` (#2386) @wingertge
- Add burn-hip (#2399) @syl20bnr
- Add `BackendRouter` to handle multiple backends on the way to distributed (#2353 #2419) @laggui (see the backend-generic sketch after this list)
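
Since all backends implement the same `Backend` trait, user code stays backend-generic; a hedged sketch of the idea using the stable `NdArray` and `Wgpu` backends as stand-ins (the `ndarray` and `wgpu` features are assumed, and the experimental ROCm/HIP and SPIR-V paths would slot in the same way via their own backend types and feature flags, not shown here):

```rust
use burn::backend::{NdArray, Wgpu};
use burn::tensor::{backend::Backend, Tensor};

// Backend-generic function: the concrete backend is just a type parameter,
// so the same code runs on NdArray, Wgpu, or any other `Backend` impl.
fn sum_of_squares<B: Backend>(x: Tensor<B, 1>) -> Tensor<B, 1> {
    x.powf_scalar(2.0).sum()
}

fn main() {
    let cpu_device = Default::default();
    let gpu_device = Default::default();

    // Same data on two different backends; swap the type to swap the backend.
    let a = Tensor::<NdArray, 1>::from_floats([1.0, 2.0, 3.0], &cpu_device);
    let b = Tensor::<Wgpu, 1>::from_floats([1.0, 2.0, 3.0], &gpu_device);

    println!("{}", sum_of_squares(a));
    println!("{}", sum_of_squares(b));
}
```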
Bug Fixes
- Fix autodiff memory leak (#2347) @nathanielsimard
- Fix autodiff abs NaN when output is 0 (#2249) @AsherJingkongChen
Documentation & Examples
- Add documentation for custom `cubecl` kernels, update some outdated docs (#2404) @wingertge
- Add comments to burn fusion (#2130) @cBournhonesque
- Improve doc for burn-tch (#2288) @kingwingfly
- Improve regression example (#2405) @laggui
- Create CITATION.cff (#2231) @antimora
- Enable doc_auto_cfg to show feature-req-hint in docs.rs (#2271) @kingwingfly
Fixes
- Fix tensor data elem type conversion in book (#2211) @laggui
- Fix target convert in batcher and align guide imports (#2215) @laggui
- Fix huber loss documentation (#2232) @kingwingfly
- Fix debugger settings doc in contributor book (#2223) @tiruka
- Fix Raspberry Pi Pico example not compiling (#2220) @BjornTheProgrammer
- Fix path in book (#2262) @mehmetalianil
- Fix unresolved import `regression` (#2285) @tiruka
- Fix burn book links (#2303 #2327) @laggui @tiruka
- Contributor Book: Fix the link of primitive types in the "Serialization" page (#2362) @towerpark
- Fix simple regression batch targets (#2379) @wangjiawen2013
- Fix xtask args which are unmodified when upgrading xtask commands (#2364) @tiruka
ONNX Support
- Add gather support for multi-dim indices (rank > 1) (#2199) @alteredoxide
- Allow onnx-import expand op with non-const shapes (#2189) @hexd0t
- Improve ONNX import tensor shape tracking (#2213) @hexd0t
- Add missing output padding to conv transpose ONNX (#2216) @laggui
- Fix ONNX where op for scalar inputs (#2218) @hexd0t
- Simplify scope tracking in burn-import (#2207) @skewballfox
- Add ONNX op Trilu (#2323) @tiruka
- Add ConvTranspose1d ONNX op (#2349) @tiruka
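
The expanded operator coverage applies to the usual `burn-import` workflow; a minimal `build.rs` sketch for reference, assuming `burn-import` is declared as a build dependency (the model path below is a placeholder):

```rust
// build.rs — generates Rust model code from an ONNX file at build time.
use burn_import::onnx::ModelGen;

fn main() {
    ModelGen::new()
        // Placeholder path; point this at your own ONNX model.
        .input("src/model/my_model.onnx")
        // Generated code is written under OUT_DIR/model/.
        .out_dir("model/")
        .run_from_script();
}
```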
Enhancements
- Improve slice kernel performance (#2252) @nathanielsimard
- Fix burn-jit conv2d excessive loop unrolling (#2263) @AsherJingkongChen
- Introduce autotuning to `conv2d` and `conv_transpose2d` with a new `im2col`/`GEMM` algorithm (#2287) @wingertge
- Further data locality optimizations for implicit GEMM (#2300) @wingertge
- Add utility methods to split gradients to GradientParams (#2311) @ArthurBrussee
- Add bounds checking to implicit GEMM to allow arbitrary input shapes (#2354) @wingertge
- Initialize accumulator to bias for implicit GEMM to save an expensive `float_add` (#2383) @wingertge
Refactoring
- Select kernel from CPA to CubeCL (#2168) @mepatrick73
- Migrate cubecl macro (#2266) @wingertge
- Remove primitives const D generic (#2298) @laggui
- Refactor elemwise fusion (#2344) @nathanielsimard
- Refactor Adaptive Avg Pool to CubeCL (#2351) @nathanielsimard
- Refactor pooling kernels (#2356) @nathanielsimard
- Refactor burn-tensor: Split conv backward ops to allow conditional gradient computation (#2278) @AsherJingkongChen
Miscellaneous
- Fix panic messages being invisible in tui mode (#2226) @PaulWagener
- Refactor xtask to use tracel-xtask and refactor CI workflow (#2063) @syl20bnr
- Automatic minimum rust version in README (#2227) @syl20bnr
- Set MSRV to 1.81 (#2388) @nathanielsimard
- Don't panic when the progress is > 1.0 (#2229) @PaulWagener
- Fix compile for dataset crate with vision feature (#2228) @PaulWagener
- Update CI workflow for last version of setup-linux action (#2248) @syl20bnr
- [CI] Fix llvmpipe, lavapipe install for valgrind and vulnerabilities (#2264) @syl20bnr
- Use CliMetricsRenderer when not in a terminal (#2307) @lancelet
- Update rusqlite and associated libraries (#2328) @paulirotta
- Fix missing fusion feature flag @nathanielsimard
- Move conv autotune under feature flag (except key) (#2330) @laggui
- Add should_run for convs instead of panicking (#2403) @ArthurBrussee
- Make changes for latest ratatui version (#2421) @laggui
- Add Windows/WindowsIterator/WindowsDataset (#2338) @NicoZweifel