Releases · coreylowman/dfdx
v0.13.0 - `dtypes` module & adds `AMP<F>` dtype
What's Changed
- Make `storage_traits::TensorToArray` pub by @AndrejOrsula in #817
- accurate-gelu by @jcrist1 in #813
- Fixing examples/04-gradients.rs by @coreylowman in #824
- Moving optim kernels to tensor ops by @coreylowman in #828
- [Breaking] Adds `AMP<F>` dtype by @coreylowman in #811 (see the sketch below)
- Adding documentation to dtypes module and amp by @coreylowman in #834
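A minimal sketch of what the new dtype looks like in use, assuming the `f16` feature is enabled and that `AMP<f16>` can be sampled and used like any other `Dtype` (paths come from the `dtypes` module mentioned above; exact bounds may differ):

```rust
use dfdx::prelude::*;
use dfdx::dtypes::{f16, AMP};

fn main() {
    let dev = AutoDevice::default();

    // Elementwise ops run in half precision; AMP<F> is meant to upgrade
    // precision internally where plain f16 tends to overflow (e.g. reductions).
    let x: Tensor<Rank2<2, 3>, AMP<f16>, _> = dev.sample_normal();
    let y = x.clone() * x;
    println!("{:?}", y.array());
}
```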
New Contributors
- @AndrejOrsula made their first contribution in #817
- @jcrist1 made their first contribution in #813
Full Changelog: v0.12.1...v0.13.0
v0.12.1 - Re-export f16 dtype & making more APIs public
What's Changed
- Allow models to be backward compatible through #799 by @nkoppel in #808
- Various small fixes by @nkoppel in #814
- Making all shape traits public by @coreylowman in #816
Full Changelog: v0.12.0...v0.12.1
v0.12.0 - Adds f16 dtype
Breaking changes
- [Breaking] Adding Tensor::try_realize, and Tensor::realize no longer returns Result by @coreylowman in #758
- [Breaking] ReshapeTo::reshape_like and ReshapeTo::try_reshape_like now panic instead of returning option by @coreylowman in #766
- [Breaking] Adding dilation/groups to Conv2D. Adding dilation to Pool2D by @coreylowman in #767
- [Breaking] Use `gemm` for matmul. Removes support for matrixmultiply & MKL by @coreylowman in #776
- [Breaking] Moving storage GAT to trait level generic. Split DeviceStorage into multiple traits by @coreylowman in #782
- [Breaking] Adding dilation/groups to ConvTranspose2D by @coreylowman in #783
What's Changed
- Adding f16 as Dtype by @coreylowman in #696 (see the sketch below)
- Adding example by @sirandreww in #740
- Adds TryConcatAlong to support Concat along any axis by @coreylowman in #750
- Changed CUDA_ARCH in compatibility.cuh by @jafioti in #752
- Allow `broadcast_like` to accept tensors OR shapes by @VasanthakumarV in #751
- Removing rerun build.rs for output destination by @coreylowman in #754
- Fixing compatibility for compute cap 70-75 by @coreylowman in #757
- Adds TriangleTensor and CmpKernel traits to Device bound by @coreylowman in #760
- Using Bernoulli distribution in dropout - makes dropout reproducible across dtypes by @coreylowman in #761
- Fixes bug with f16 mean where number of elements reduced was f16::INF by @coreylowman in #763
- Placeholder f16 gemm speedups by @coreylowman in #765
- MultiHeadAttention 3d impl now broadcasts to 4d instead of duplicating logic by @coreylowman in #768
- Moving `cudarc?/f16` behind `f16` feature by @coreylowman in #774
- impl Clone for Adam, SGD, RMSprop by @coreylowman in #775
- Properly setting read_dst for gemm in forward/backward pass by @coreylowman in #777
- Adds rayon dependency. Using `gemm::Parallelism::Rayon(rayon::current_num_threads())` by @coreylowman in #778
- Add LogSoftmax by @kurnevsky in #769
- Moving some tests off nightly. Adding docs to conv2d op by @coreylowman in #779
- Adding better error messages if nvidia-smi/nvcc are not found by @coreylowman in #784
- Using for loop with gridDim.x * blockDim.x as increment by @coreylowman in #787
- Removing __hmax and __hmin compat functions by @coreylowman in #788
- Uses grid striding in fill_with by @coreylowman in #790
- Exposed NumpyDType publicly by @jafioti in #791
- Fixing weight shape for grouped Conv2D by @coreylowman in #797
- Bump half/cudarc versions by @coreylowman in #805
- Using Groups in conv weight init by @coreylowman in #806
- Add scalar support to TensorCollection by @nkoppel in #799
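To illustrate the two headline changes, the `f16` dtype and the reworked `realize`/`try_realize`, here is a rough sketch. It assumes the `f16` feature is enabled and `half` is a direct dependency (the re-export arrives in v0.12.1), and that `try_realize` hands back the original tensor on a shape mismatch; exact signatures may differ:

```rust
use dfdx::prelude::*;

fn main() {
    let dev = AutoDevice::default();

    // A statically shaped tensor stored as f16.
    let a: Tensor<Rank2<3, 4>, half::f16, _> = dev.sample_uniform();
    println!("{:?}", a.array());

    // A runtime-shaped tensor: try_realize returns a Result, while realize
    // now panics if the runtime dims don't match the requested shape.
    let b: Tensor<(usize, usize), f32, _> = dev.zeros_like(&(3usize, 4usize));
    assert!(b.try_realize::<Rank2<3, 4>>().is_ok());
}
```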
New Contributors
- @sirandreww made their first contribution in #740
- @kurnevsky made their first contribution in #769
Full Changelog: v0.11.2...v0.12.0
v0.11.2 - Tensor caching & other nice features
What's Changed
- Simplify upscale cuda kernels by @coreylowman in #680
- JIT compiling stack/concat cuda kernels by @coreylowman in #684
- Initial merging of nvidia-smi and nvcc checks by @quietlychris in #685
- feat: use `Cow` when appropriate by @Alexandcoats in #688
- Add const generic `NUM_THREADS` arg to launch_cfg by @VasanthakumarV in #691
- feat: add `Tensorlike` to clean up spooky ghosts by @Alexandcoats in #689
- Add `contiguous` and `try_contiguous` methods by @VasanthakumarV in #690
- (feat) add device access method by @ccaven in #692
- Add examples of runtime dimensions to `examples/02-ops.rs` by @VasanthakumarV in #698
- Prevent over-allocation for broadcasted outputs of sum_to by @nkoppel in #699
- Adds caching layer to tensor allocations by @coreylowman in #670
- Handle \r in build.rs by @ViliamVadocz in #702
- Disabling cache by default & adds enable_cache() by @coreylowman in #704
- Typos in feature_flags.rs by @mauvray in #710
- Adds `mat * vec` impl for matmul by @coreylowman in #716
- Adds better assertion macros for testing by @coreylowman in #714
- Combining multiple github workflows for reuse by @coreylowman in #717
- Changing nn ToDtype to use generic on method by @coreylowman in #719
- Flatten2D now accepts generic batch dim by @coreylowman in #720
- Uses `impl Into<E>` for scalar binary ops when possible by @coreylowman in #722
- Adds cudnn section to feature flags by @coreylowman in #723
- Impls for (T,) by @opfromthestart in #725
- Fixing dependencies for no-std by @coreylowman in #736
- Adds rust 1.65 as the minimum rust compiler version by @coreylowman in #737
- Moves scalar comparison to use the same method as tensor comparison. Deprecates `try_scalar_*`/`scalar_*` by @coreylowman in #738
- Run CI for all kinds of pushes - not only pull request related ones by @YannickFricke in #739
- Adds `Tensor::to_device` to support sending tensors of any shape to any device by @coreylowman in #741 (see the sketch below)
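A small sketch of two of the additions above, the opt-in allocation cache and `Tensor::to_device`. The CUDA block assumes your crate forwards a `cuda` feature to dfdx and that a working toolchain is present; method names follow the PR titles, so treat it as approximate:

```rust
use dfdx::prelude::*;

fn main() {
    let cpu = Cpu::default();
    // The tensor allocation cache is disabled by default (#704); opt in explicitly.
    cpu.enable_cache();

    let x: Tensor<Rank2<2, 3>, f32, _> = cpu.sample_normal();
    println!("{:?}", x.array());

    // Send a tensor of any shape to another device and back (#741).
    #[cfg(feature = "cuda")]
    {
        let gpu = Cuda::default();
        let x_gpu = x.clone().to_device(&gpu);
        let _x_cpu = x_gpu.to_device(&cpu);
    }
}
```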
New Contributors
- @VasanthakumarV made their first contribution in #691
- @ccaven made their first contribution in #692
- @mauvray made their first contribution in #710
- @YannickFricke made their first contribution in #739
Full Changelog: v0.11.1...v0.11.2
v0.11.1 - cudnn, optimizations, and new ops/nn layers
What's Changed
- Fix bug in gather cuda kernel by @nkoppel in #588
- feat(device): introduce AutoDevice type by @kakoc in #579
- Use Recursive Macros to Implement Shape Operation Traits. by @nkoppel in #583
- Add ToDtype tensor operation by @nkoppel in #582
- Using 128 threads by default for cuda kernels by @coreylowman in #599
- Add Slice tensor operation. by @nkoppel in #602
- Optimizing conv kernels a bit by @coreylowman in #605
- feat: add upper/lower triangles (tril and triu) allocations by @Alexandcoats in #568
- Adds Tensor::roll by @coreylowman in #608
- Using multiple streams for matmul with cuda by @coreylowman in #610
- Fix no-std support by @Alexandcoats in #615
- Adds matrixmultiply/std to std feature by @kstavro in #618
- Implement concat for usize arrays; add concat to Device. by @nkoppel in #621
- Allow conv2d and pool2d to use dynamic dimensions for width and height. by @nkoppel in #620
- Switch to using nvcc --list-gpu-code for build.rs compute_cap by @quietlychris in #619
- Fix bug in reshape on cuda by @nkoppel in #622
- Don't always do try_min in pool_global.rs by @nkoppel in #623
- Revert "Switch to using nvcc --list-gpu-code for build.rs compute_cap… by @coreylowman in #624
- Adds `restrided` in favor of `get_unstrided_index` -> `get_strided_index` by @coreylowman in #628
- Combines multiple calls to get_strided_index into a single loop by @coreylowman in #629
- Reducing number of buffers sent to cuda for some operations by @coreylowman in #611
- Optimizing conv2d more by @coreylowman in #631
- Add ability to include smaller last batch by @nkoppel in #632
- Upscale2D and ConvTrans2d by @opfromthestart in #603
- impl Dtype for all Unit types except bool by @coreylowman in #635
- Allow convtrans2d to use dynamic dimensions by @nkoppel in #639
- JIT compiling kernel for to_dtype & reshape by @coreylowman in #634
- Optimize conv transpose kernels to do same thing as conv by @coreylowman in #641
- Reworking crate level documentation by @coreylowman in #644
- Adds synchronize to DeviceStorage by @coreylowman in #645
- adding usize dtype to cuda_kernel by @zojeda in #648
- Add PReLU and LeakyReLU by @opfromthestart in #586
- Moving logsumexp normalization off of graph by @coreylowman in #652
- Adding CmpKernels to Device, more documentation by @coreylowman in #653
- Removing bounds checking from cpu conv kernel folding by @coreylowman in #650
- Allow upscale2d to use dynamic dimensions by @nkoppel in #654
- Adding integration test for resnet18 by @coreylowman in #655
- Removing some unnecessary blanket impls by @coreylowman in #656
- Fixes conv transpose stride bug, adds more docs to upscale2d by @coreylowman in #658
- Some QOL fixes by @opfromthestart in #659
- Optimizing softmax & log_softmax by @coreylowman in #660
- Reuse f(x) for unary operations when possible. by @coreylowman in #661
- Allocating gradients in backward op by @coreylowman in #663
- Adds `Tensor::recip` (`1 / x`) by @coreylowman in #665 (see the sketch after this list)
- Reshape layer by @opfromthestart in #666
- Re-using tensor storage when possible by @coreylowman in #664
- Adds cudnn feature flag. Removes "test-cuda" feature flag. Using cuDNN for convolutions by @coreylowman in #651
- Always attempting allocation reuse during inference by @coreylowman in #673
- Clarify reshape behavior in docs by @coreylowman in #674
- Have SplitInto keep tapes of each head separate by @nkoppel in #676
- Using arch option in nvrtc by @coreylowman in #675
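A short sketch tying a few of these together: `AutoDevice` from #579, the `ToDtype` op from #582, and `Tensor::recip` from #665. Written against the post-rewrite API described in these notes; names may differ slightly:

```rust
use dfdx::prelude::*;

fn main() {
    // AutoDevice resolves to Cuda when the `cuda` feature is on, Cpu otherwise.
    let dev = AutoDevice::default();

    let x = dev.tensor([1.0f32, 2.0, 4.0, 8.0]);

    // Elementwise reciprocal (1 / x) from #665.
    let r = x.clone().recip();
    // Cast the element type with the ToDtype op from #582.
    let r64 = r.to_dtype::<f64>();

    println!("{:?}", r64.array());
}
```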
New Contributors
- @kakoc made their first contribution in #579
- @quietlychris made their first contribution in #619
- @opfromthestart made their first contribution in #603
- @zojeda made their first contribution in #648
Full Changelog: v0.11.0...v0.11.1
v0.11.0 - Cuda support, mixed const/runtime tensors, and device rewrite
What's Changed
- AddInto by @Dimev in #256
- added 5d & 6d tensors by @M1ngXU in #283
- Remove phantom by @M1ngXU in #282
- remove tensor bound by @Dimev in #297
- Adding nightly to cargo-test by @JYudelson1 in #294
- Devices/Dyn dimensions refactor by @coreylowman in #304
- Add instructions for running the mnist example. by @infalmo in #310
- Removes Dyn. Use usize directly by @coreylowman in #315
- Making f32 default dtype for Tensor, updating examples/docstrings by @coreylowman in #316
- Only running gha on push by @coreylowman in #317
- Adding Unit and HasUnitType. Reducing bounds for Dtype by @coreylowman in #313
- Removing build_test_device. Using TestDevice everywhere by @coreylowman in #324
- Adding SampleTensor, Removing RandTensor/RandnTensor by @coreylowman in #327
- Removing usages of tensor aliases by @coreylowman in #328
- Moving intel-mkl stuff into sub module in build.rs by @coreylowman in #329
- Adding Cuda device and skeleton cuda kernel impls by @coreylowman in #322
- Implementing abs/exp/div/sum_to cuda kernels by @coreylowman in #331
- permute_to and broadcast_to cuda kernels by @coreylowman in #343
- Add cuda implementations for unary and binary tensor operations in #341 and #334 by @nkoppel in #346
- Using atomicAdd in binary op backwards to properly handle strides by @coreylowman in #350
- Resolve #352 and #347 by @nkoppel in #354
- Implement reshape cuda kernel (resolves #336) by @nkoppel in #356
- Add missing device generic in transformer test by @ViliamVadocz in #358
- Add select and gather cuda kernels. by @nkoppel in #359
- Upgrade to cudarc 0.6.0 by @coreylowman in #361
- Add tests for binary broadcasted add and fix bugs to allow them to pass. by @nkoppel in #357
- run GHA on pull_request by @coreylowman in #364
- matmul cuda kernels by @coreylowman in #342
- Adding dynamic example. by @Narsil in #368
- Add cuda kernels for min_to/max_to by @coreylowman in #370
- Adding dropout cuda kernel by @coreylowman in #372
- Adding ConstDim and ConstShape for tensor creation by @coreylowman in #373
- Fixing computation of lda/ldb/ldc with cblas by @coreylowman in #375
- Modify sum_to cuda kernel to not need atomic adds in backwards by @nkoppel in #367
- Simplifying `trait Conv2DKernel` and Cpu implementation by @coreylowman in #376
- (#344) Implement cuda kernels for optimizers by @nkoppel in #378
- Fix max_to and min_to edge case with negative zero by @ViliamVadocz in #380
- Add cuda kernels for conv2d by @coreylowman in #369
- Rework pool2d internals & add pool2d cuda kernels by @coreylowman in #384
- Implement Shape for arrays (#377) by @nkoppel in #385
- Efficient cuda kernels for reductions by @nkoppel in #382
- Improving compilation times of deeply nested const generic modules by @coreylowman in #391
- Fixing remainder of cuda tests & fixing cblas/cublas matmul with strides [1,1] by @coreylowman in #393
- Adding Cuda device usage to mnist example by @coreylowman in #396
- Adding GeLU operator (used in Gpt2) by @Narsil in #397
- Removing codecov from workflows/readme by @coreylowman in #403
- Reorganize tensor_ops, and add cuda_utils.cuh by @nkoppel in #398
- Some small optimizations for conv2d on cpu by @coreylowman in #404
- Removing Device generic from Gradients & optimizers by @coreylowman in #402
- Add ToDevice and OnDevice to simplify nn api (#388) by @nkoppel in #394
- Removes `ModuleBuilder`, Adds `BuildModule` & `BuildOnDevice` by @coreylowman in #405
- Enable multi-core matmul by @infalmo in #417
- Fix GELU CUDA kernel compilation by @ViliamVadocz in #409
- Adding nn.Embedding layer. by @Narsil in #406
- Removing defaults for Tensor Dtype & Device generic parameters by @coreylowman in #418
- Removing Default for optimizers & adding &M to constructors by @coreylowman in #422
- Adding runtime assertion in `try_binary_op` that shapes are equal by @coreylowman in #428
- Add boolean operations and choose. by @nkoppel in #415
- Add TensorFrom trait to create tensors from both vectors and arrays by @nkoppel in #414 (see the sketch at the end of this list)
- Adding nn builder structs, dtype generics, and remove device defaults. by @coreylowman in #433
- Upgrade to cudarc==0.7.0 and use alloc_async instead of alloc_zeros_async by @coreylowman in #440
- Add comparison tensor operations by @ViliamVadocz in #386
- Add synchronize method to Cuda device by @ViliamVadocz in #442
- f64 kernels by @coreylowman in #421
- Add stack tensors method by @coreylowman in #449
- cargo check cuda & run f64 tests in CI by @coreylowman in #447
- Fix bug in #451 by @nkoppel in #453
- Add more runtime shape checks by @coreylowman in #454
- Adding ReshapeTo::reshape_like by @coreylowman in #456
- Adding SampleTensor::sample_uniform_like and SampleTensor::sample_normal_like by @coreylowman in #457
- Improve examples (add Cuda) by @TimerErTim in #452
- Dataset iterators - adds batching, collating for iterators by @coreylowman in #462
- Fixing issue with to_device and broadcasted tensors by @coreylowman in #465
- Bump cudarc 0.7.2 by @coreylowman in #466
- Adding index out of bounds checks to select/gather kernels by @coreylowman in #467
- Rename to `add_dim` by @infalmo in #471
- impl BuildModule for ZeroSizedModule by @coreylowman in #470
- Adds TensorCollection by @coreylowman in #469
- Fixing cargo doc warnings by @coreylowman in #473
- Using `--gpu-architecture native` with nvcc by @coreylowman in #474
- Using TensorFromVec for OneHotEncode and Arange by @coreylowman in #477
- Small batchnorm optimizations by @coreylowman in #478
- nvcc: fixed type bug by @M1ngXU in #480
- Adds fast_alloc feature and binary kernel optimizations by @coreylowman in #481
- Adding some "benchmarking" scripts by @coreylowman in #483
- Add try_forward and try_forward_mut to Module and ModuleMut. by @nkoppel in #482
- Optimizing cpu kernels for reductions by @coreylowman in #484
- Using alloc_zeros_async and memset_zeros for cuda by @coreylowman in #489
- Making Conv2D unbiased by default, and adding Bias2D module by @coreylowman in #494
- Using image/filter stride in cuda kernel for conv by @coreylowman in #495
- bump cudarc version by @coreylowman in #498
- Adding attention_reshape (inference only) kernels. by @Narsil in #497
- Adding lifetime to gat in Exact...
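Since this release rewrites tensor creation around device values, here is a rough sketch of the resulting API (`TensorFrom` from #414, `SampleTensor` from #327, and device-generic matmul). Names follow the PRs above and may not match the 0.11.0 signatures exactly:

```rust
use dfdx::prelude::*;

fn main() {
    let dev: Cpu = Default::default();

    // TensorFrom (#414): build tensors from nested arrays or Vecs.
    let a = dev.tensor([[1.0f32, 2.0], [3.0, 4.0]]);

    // SampleTensor (#327) replaces RandTensor/RandnTensor.
    let b: Tensor<Rank2<2, 3>, f32, _> = dev.sample_normal();

    // Ops are generic over the device; with the `cuda` feature, Cuda works the same way.
    let c = a.matmul(b);
    println!("{:?}", c.array());
}
```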
v0.10.0
What's Changed
Breaking Changes
- Binary ops (`add`, `sub`, `div`, `mul`, `maximum`, `minimum`) take ownership of rhs by @coreylowman in #268
- backwards only allows 0d tensors now by @coreylowman in #206
- Clone now keeps same id, removing Tensor::duplicate by @coreylowman in #249
- Multi axis reductions
  - See docs
  - #189, #190, #194
  - Reduction functions now can reduce across any axis/axes: `mean`, `sum`, `max`, `min`, `stddev`, `var`, `softmax`, `log_softmax`, and `logsumexp`
  - Remove `-1` from valid axes, add `trait HasLastAxis` to use in generic functions instead
  - Adding `normalize` function that normalizes across any axis
  - Removing single axis reduction functions `fn *_axis()`: `mean_axis`, `sum_axis`, `max_axis`, `min_axis`, `normalize_axis`, `std_axis`, `var_axis`
  - Rename `HasAxis` to `HasAxes`
  - Add `trait BroadcastTo`
    - Remove `trait Broadcast1`, `trait Broadcast2`, `trait Broadcast3`, `trait Broadcast4`
  - Add `trait Reduce`/`trait ReduceTo`
    - Remove `trait Reduce1`
- Batched select & select consistency
  - See docs
  - Renaming SelectTo, using SelectTo for batched select by @coreylowman in #217
  - Add Batched Select for devices and tensor_ops by @coreylowman in #182
- Reduce things in prelude by @coreylowman in #209
- Renaming FlattenImage to Flatten2D by @coreylowman in #243
New features
- `Arc` in Tensors instead of Rc by @caelunshun in #236
- `powi()` and `powf()` functions by @coreylowman in #167
- `no_std` support
  - See feature flags docs
  - Remove num-traits, no default features on depends by @coreylowman in #200
  - Adding intel-mkl feature and removing the 4 `mkl-*` features by @coreylowman in #239
  - Adding module that has docs for feature flags by @coreylowman in #240
  - Adding "numpy" feature to make numpy & npz optional by @coreylowman in #241
  - Adding `#![no_std]` support via `no_std_compat` by @coreylowman in #244
  - Adding default-features = false to dependencies by @coreylowman in #257
- Adding Axis permutations via `trait PermuteTo`
- Adding `trait ModuleMut` (see the sketch after this list)
  - See docs
  - #225
  - Removing Module super traits by @coreylowman in #223
  - Rework Dropout/DropoutOneIn to use ModuleMut by @coreylowman in #226
- Adding decoupled/L2 weight decay in optimizers:
  - See docs
  - add HasArrayData to GradientProvider by @cBournhonesque in #261
  - Add weight decay to SGD by @cBournhonesque in #258
  - Adding weight_decay to Adam by @coreylowman in #275
  - Adding weight decay to RMSprop by @coreylowman in #276
- Adding `nn::Transformer` #175, #173, #180
  - See docs
- Adding `nn::MinPool2D`, `nn::MaxPool2D`, `nn::AvgPool2D` by @coreylowman in #214
  - See docs
- Adding `nn::MinPoolGlobal`, `nn::MaxPoolGlobal`, `nn::AvgPoolGlobal` by @coreylowman in #216
  - See docs
- Adding `nn::BatchNorm2D` by @coreylowman in #228
  - See docs
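As a rough illustration of the `ModuleMut` split, here is a sketch in the 0.10-era array-backed API (modules built via `Default`, `forward_mut` for layers with train-time state such as `Dropout`); the exact trait bounds are assumptions:

```rust
use dfdx::prelude::*;

fn main() {
    // Dropout has per-call randomness, so training goes through ModuleMut.
    let mut model: (Linear<4, 8>, ReLU, Dropout) = Default::default();

    // Training pass: forward_mut takes &mut self and applies dropout.
    let x = Tensor1D::<4>::zeros();
    let _train_y = model.forward_mut(x.trace());

    // Inference pass: forward leaves dropout as a no-op.
    let _eval_y = model.forward(Tensor1D::<4>::zeros());
}
```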
Misc changes
- Add tensor() function as a convenient way to make tensors from arrays by @coreylowman in #161
  - See docs
- Remove allocation in dropout implementation by @coreylowman in #164
- Removing Tensor::OwnedTape by @coreylowman in #197
- Revamping examples/ by @coreylowman in #205
- Conv cleanup
  - Moving conv into device and cleaning up a bit by @coreylowman in #212
  - Minifying conv impls by @coreylowman in #213
  - Changing conv2d and conv2d_batched to methods of tensors by @coreylowman in #221
  - Replacing conv2d implementation with matmuls by @coreylowman in #237
- Fix typos by @cBournhonesque in #235
- Combining multiple where clauses with const generics into a single one by @coreylowman in #264
- Checking for null ptr in AllocateZeros by @coreylowman in #271
- Reducing allocations in `map_df_uses_fx` by @coreylowman in #272
- Adding with_empty_tape and with_diff_tape by @coreylowman in #274
New Contributors
- @cBournhonesque made their first contribution in #235
- @caelunshun made their first contribution in #236
Full Changelog: v0.9.0...v0.10.0
v0.9.0
Breaking Changes
- Add broadcast functions, reductions on any axis, and selecting subtensors (#137, #114, #139) by @coreylowman in #138
- Added normalize axis and removed normalize by @vikigenius in #140
- #67 `Optimizer::update` now returns `Result<(), UnusedParamsError>` by @coreylowman in #107
New features
- #34 Add Transformers!!! by @jafioti in #120
- #1 Add Conv2d by @coreylowman in #124
- #55 Added reshape function by #90 #129 #120
- #133 Adding FlattenImage layer that uses reshape by @coreylowman in #133
- #142 Adding Module::forward_mut by @coreylowman in #148
- #80 Adding nn::Softmax by @coreylowman in #81
- #79 Adding smooth_l1_loss and huber_loss by @coreylowman in #82
- #131 matmul now supports batched & broadcasted inputs by @coreylowman in #132
- add macOS MKL support by @yerke in #73
- Adding maximum function by @coreylowman in #143
- Adding min_axis function by @coreylowman in #144
Additional changes
- Simplifying implementation of BCE loss using binary_map by @coreylowman in #75
- Miscellaneous updates by @coreylowman in #76
- Added custom model example by @jafioti in #83
- add Debug and Display support for `NpzError` by @XBagon in #85
- Added nightly feature by @jafioti in #89
- Added 2d broadcast_first functions and 3d linear forward by @jafioti in #94
- #55 reshape, and #87 additional work on nightly feature by @coreylowman in #90
- #69 adding map_df_uses_fx by @coreylowman in #105
- Fixed a misleading docstring. by @M1ngXU in #109
- Fix Issue #110 Fix (Dropout (test) for non-positive values) by @M1ngXU in #113
- Issue #96 by @M1ngXU in #118
New Contributors
- @jafioti made their first contribution in #83
- @XBagon made their first contribution in #85
- @yerke made their first contribution in #73
- @M1ngXU made their first contribution in #109
- @vikigenius made their first contribution in #140
Full Changelog: v0.8.0...v0.9.0