Releases · coreylowman/dfdx
v0.13.0 - `dtypes` module & adds `AMP<F>` dtype
What's Changed
- Make `storage_traits::TensorToArray` pub by @AndrejOrsula in #817
- accurate-gelu by @jcrist1 in #813
- Fixing examples/04-gradients.rs by @coreylowman in #824
- Moving optim kernels to tensor ops by @coreylowman in #828
- [Breaking] Adds `AMP<F>` dtype by @coreylowman in #811 (see the sketch below)
- Adding documentation to dtypes module and amp by @coreylowman in #834
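A minimal sketch of what the new dtype looks like in use, assuming the `f16` feature is enabled and that `AMP<f16>` can be sampled and used like any other `Dtype` (paths come from the `dtypes` module mentioned above; exact bounds may differ):

```rust
use dfdx::prelude::*;
use dfdx::dtypes::{f16, AMP};

fn main() {
    let dev = AutoDevice::default();

    // Elementwise ops run in half precision; AMP<F> is meant to upgrade
    // precision internally where plain f16 tends to overflow (e.g. reductions).
    let x: Tensor<Rank2<2, 3>, AMP<f16>, _> = dev.sample_normal();
    let y = x.clone() * x;
    println!("{:?}", y.array());
}
```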
New Contributors
- @AndrejOrsula made their first contribution in #817
- @jcrist1 made their first contribution in #813
Full Changelog: v0.12.1...v0.13.0
v0.12.1 - Re-export f16 dtype & making more APIs public
What's Changed
- Allow models to be backward compatible through #799 by @nkoppel in #808
- Various small fixes by @nkoppel in #814
- Making all shape traits public by @coreylowman in #816
Full Changelog: v0.12.0...v0.12.1
v0.12.0 - Adds f16 dtype
Breaking changes
- [Breaking] Adding Tensor::try_realize, and Tensor::realize no longer returns Result by @coreylowman in #758
- [Breaking] ReshapeTo::reshape_like and ReshapeTo::try_reshape_like now panic instead of returning option by @coreylowman in #766
- [Breaking] Adding dilation/groups to Conv2D. Adding dilation to Pool2D by @coreylowman in #767
- [Breaking] Use `gemm` for matmul. Removes support for matrixmultiply & MKL by @coreylowman in #776
- [Breaking] Moving storage GAT to trait level generic. Split DeviceStorage into multiple traits by @coreylowman in #782
- [Breaking] Adding dilation/groups to ConvTranspose2D by @coreylowman in #783
What's Changed
- Adding f16 as Dtype by @coreylowman in #696 (see the sketch below)
- Adding example by @sirandreww in #740
- Adds TryConcatAlong to support Concat along any axis by @coreylowman in #750
- Changed CUDA_ARCH in compatibility.cuh by @jafioti in #752
- Allow `broadcast_like` to accept tensors OR shapes by @VasanthakumarV in #751
- Removing rerun build.rs for output destination by @coreylowman in #754
- Fixing compatibility for compute cap 70-75 by @coreylowman in #757
- Adds TriangleTensor and CmpKernel traits to Device bound by @coreylowman in #760
- Using Bernoulli distribution in dropout - makes dropout reproducible across dtypes by @coreylowman in #761
- Fixes bug with f16 mean where number of elements reduced was f16::INF by @coreylowman in #763
- Placeholder f16 gemm speedups by @coreylowman in #765
- MultiHeadAttention 3d impl now broadcasts to 4d instead of duplicating logic by @coreylowman in #768
- Moving `cudarc?/f16` behind `f16` feature by @coreylowman in #774
- impl Clone for Adam, SGD, RMSprop by @coreylowman in #775
- Properly setting read_dst for gemm in forward/backward pass by @coreylowman in #777
- Adds rayon dependency. Using `gemm::Parallelism::Rayon(rayon::current_num_threads())` by @coreylowman in #778
- Add LogSoftmax by @kurnevsky in #769
- Moving some tests off nightly. Adding docs to conv2d op by @coreylowman in #779
- Adding better error messages if nvidia-smi/nvcc are not found by @coreylowman in #784
- Using for loop with gridDim.x * blockDim.x as increment by @coreylowman in #787
- Removing __hmax and __hmin compat functions by @coreylowman in #788
- Uses grid striding in fill_with by @coreylowman in #790
- Exposed NumpyDType publicly by @jafioti in #791
- Fixing weight shape for grouped Conv2D by @coreylowman in #797
- Bump half/cudarc versions by @coreylowman in #805
- Using Groups in conv weight init by @coreylowman in #806
- Add scalar support to TensorCollection by @nkoppel in #799
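To illustrate the two headline changes, the `f16` dtype and the reworked `realize`/`try_realize`, here is a rough sketch. It assumes the `f16` feature is enabled and `half` is a direct dependency (the re-export arrives in v0.12.1), and that `try_realize` hands back the original tensor on a shape mismatch; exact signatures may differ:

```rust
use dfdx::prelude::*;

fn main() {
    let dev = AutoDevice::default();

    // A statically shaped tensor stored as f16.
    let a: Tensor<Rank2<3, 4>, half::f16, _> = dev.sample_uniform();
    println!("{:?}", a.array());

    // A runtime-shaped tensor: try_realize returns a Result, while realize
    // now panics if the runtime dims don't match the requested shape.
    let b: Tensor<(usize, usize), f32, _> = dev.zeros_like(&(3usize, 4usize));
    assert!(b.try_realize::<Rank2<3, 4>>().is_ok());
}
```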
New Contributors
- @sirandreww made their first contribution in #740
- @kurnevsky made their first contribution in #769
Full Changelog: v0.11.2...v0.12.0
v0.11.2 - Tensor caching & other nice features
What's Changed
- Simplify upscale cuda kernels by @coreylowman in #680
- JIT compiling stack/concat cuda kernels by @coreylowman in #684
- Initial merging of nvidia-smi and nvcc checks by @quietlychris in #685
- feat: use `Cow` when appropriate by @Alexandcoats in #688
- Add const generic `NUM_THREADS` arg to launch_cfg by @VasanthakumarV in #691
- feat: add `Tensorlike` to clean up spooky ghosts by @Alexandcoats in #689
- Add `contiguous` and `try_contiguous` methods by @VasanthakumarV in #690
- (feat) add device access method by @ccaven in #692
- Add examples of runtime dimensions to `examples/02-ops.rs` by @VasanthakumarV in #698
- Prevent over-allocation for broadcasted outputs of sum_to by @nkoppel in #699
- Adds caching layer to tensor allocations by @coreylowman in #670
- Handle \r in build.rs by @ViliamVadocz in #702
- Disabling cache by default & adds enable_cache() by @coreylowman in #704
- Typos in feature_flags.rs by @mauvray in #710
- Adds `mat * vec` impl for matmul by @coreylowman in #716
- Adds better assertion macros for testing by @coreylowman in #714
- Combining multiple github workflows for reuse by @coreylowman in #717
- Changing nn ToDtype to use generic on method by @coreylowman in #719
- Flatten2D now accepts generic batch dim by @coreylowman in #720
- Uses `impl Into<E>` for scalar binary ops when possible by @coreylowman in #722
- Adds cudnn section to feature flags by @coreylowman in #723
- Impls for (T,) by @opfromthestart in #725
- Fixing dependencies for no-std by @coreylowman in #736
- Adds rust 1.65 as the minimum rust compiler version by @coreylowman in #737
- Moves scalar comparison to use the same method as tensor comparison. Deprecates `try_scalar_*`/`scalar_*` by @coreylowman in #738
- Run CI for all kinds of pushes - not only pull request related ones by @YannickFricke in #739
- Adds `Tensor::to_device` to support sending tensors of any shape to any device by @coreylowman in #741 (see the sketch below)
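A small sketch of two of the additions above, the opt-in allocation cache and `Tensor::to_device`. The CUDA block assumes your crate forwards a `cuda` feature to dfdx and that a working toolchain is present; method names follow the PR titles, so treat it as approximate:

```rust
use dfdx::prelude::*;

fn main() {
    let cpu = Cpu::default();
    // The tensor allocation cache is disabled by default (#704); opt in explicitly.
    cpu.enable_cache();

    let x: Tensor<Rank2<2, 3>, f32, _> = cpu.sample_normal();
    println!("{:?}", x.array());

    // Send a tensor of any shape to another device and back (#741).
    #[cfg(feature = "cuda")]
    {
        let gpu = Cuda::default();
        let x_gpu = x.clone().to_device(&gpu);
        let _x_cpu = x_gpu.to_device(&cpu);
    }
}
```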
New Contributors
- @VasanthakumarV made their first contribution in #691
- @ccaven made their first contribution in #692
- @mauvray made their first contribution in #710
- @YannickFricke made their first contribution in #739
Full Changelog: v0.11.1...v0.11.2
v0.11.1 - cudnn, optimizations, and new ops/nn layers
What's Changed
- Fix bug in gather cuda kernel by @nkoppel in #588
- feat(device): introduce AutoDevice type by @kakoc in #579
- Use Recursive Macros to Implement Shape Operation Traits. by @nkoppel in #583
- Add ToDtype tensor operation by @nkoppel in #582
- Using 128 threads by default for cuda kernels by @coreylowman in #599
- Add Slice tensor operation. by @nkoppel in #602
- Optimizing conv kernels a bit by @coreylowman in #605
- feat: add upper/lower triangles (tril and triu) allocations by @Alexandcoats in #568
- Adds Tensor::roll by @coreylowman in #608
- Using multiple streams for matmul with cuda by @coreylowman in #610
- Fix no-std support by @Alexandcoats in #615
- Adds matrixmultiply/std to std feature by @kstavro in #618
- Implement concat for usize arrays; add concat to Device. by @nkoppel in #621
- Allow conv2d and pool2d to use dynamic dimensions for width and height. by @nkoppel in #620
- Switch to using nvcc --list-gpu-code for build.rs compute_cap by @quietlychris in #619
- Fix bug in reshape on cuda by @nkoppel in #622
- Don't always do try_min in pool_global.rs by @nkoppel in #623
- Revert "Switch to using nvcc --list-gpu-code for build.rs compute_cap… by @coreylowman in #624
- Adds `restrided` in favor of `get_unstrided_index` -> `get_strided_index` by @coreylowman in #628
- Combines multiple calls to get_strided_index into a single loop by @coreylowman in #629
- Reducing number of buffers sent to cuda for some operations by @coreylowman in #611
- Optimizing conv2d more by @coreylowman in #631
- Add ability to include smaller last batch by @nkoppel in #632
- Upscale2D and ConvTrans2d by @opfromthestart in #603
- impl Dtype for all Unit types except bool by @coreylowman in #635
- Allow convtrans2d to use dynamic dimensions by @nkoppel in #639
- JIT compiling kernel for to_dtype & reshape by @coreylowman in #634
- Optimize conv transpose kernels to do same thing as conv by @coreylowman in #641
- Reworking crate level documentation by @coreylowman in #644
- Adds synchronize to DeviceStorage by @coreylowman in #645
- adding usize dtype to cuda_kernel by @zojeda in #648
- Add PReLU and LeakyReLU by @opfromthestart in #586
- Moving logsumexp normalization off of graph by @coreylowman in #652
- Adding CmpKernels to Device, more documentation by @coreylowman in #653
- Removing bounds checking from cpu conv kernel folding by @coreylowman in #650
- Allow upscale2d to use dynamic dimensions by @nkoppel in #654
- Adding integration test for resnet18 by @coreylowman in #655
- Removing some unnecessary blanket impls by @coreylowman in #656
- Fixes conv transpose stride bug, adds more docs to upscale2d by @coreylowman in #658
- Some QOL fixes by @opfromthestart in #659
- Optimizing softmax & log_softmax by @coreylowman in #660
- Reuse f(x) for unary operations when possible. by @coreylowman in #661
- Allocating gradients in backward op by @coreylowman in #663
- Adds `Tensor::recip` (`1 / x`) by @coreylowman in #665 (see the sketch after this list)
- Reshape layer by @opfromthestart in #666
- Re-using tensor storage when possible by @coreylowman in #664
- Adds cudnn feature flag. Removes "test-cuda" feature flag. Using cuDNN for convolutions by @coreylowman in #651
- Always attempting allocation reuse during inference by @coreylowman in #673
- Clarify reshape behavior in docs by @coreylowman in #674
- Have SplitInto keep tapes of each head separate by @nkoppel in #676
- Using arch option in nvrtc by @coreylowman in #675
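A short sketch tying a few of these together: `AutoDevice` from #579, the `ToDtype` op from #582, and `Tensor::recip` from #665. Written against the post-rewrite API described in these notes; names may differ slightly:

```rust
use dfdx::prelude::*;

fn main() {
    // AutoDevice resolves to Cuda when the `cuda` feature is on, Cpu otherwise.
    let dev = AutoDevice::default();

    let x = dev.tensor([1.0f32, 2.0, 4.0, 8.0]);

    // Elementwise reciprocal (1 / x) from #665.
    let r = x.clone().recip();
    // Cast the element type with the ToDtype op from #582.
    let r64 = r.to_dtype::<f64>();

    println!("{:?}", r64.array());
}
```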
New Contributors
- @kakoc made their first contribution in #579
- @quietlychris made their first contribution in #619
- @opfromthestart made their first contribution in #603
- @zojeda made their first contribution in #648
Full Changelog: v0.11.0...v0.11.1
v0.11.0 - Cuda support, mixed const/runtime tensors, and device rewrite
What's Changed
- AddInto by @Dimev in #256
- added 5d & 6d tensors by @M1ngXU in #283
- Remove phantom by @M1ngXU in #282
- remove tensor bound by @Dimev in #297
- Adding nightly to cargo-test by @JYudelson1 in #294
- Devices/Dyn dimensions refactor by @coreylowman in #304
- Add instructions for running the mnist example. by @infalmo in #310
- Removes Dyn. Use usize directly by @coreylowman in #315
- Making f32 default dtype for Tensor, updating examples/docstrings by @coreylowman in #316
- Only running gha on push by @coreylowman in #317
- Adding Unit and HasUnitType. Reducing bounds for Dtype by @coreylowman in #313
- Removing build_test_device. Using TestDevice everywhere by @coreylowman in #324
- Adding SampleTensor, Removing RandTensor/RandnTensor by @coreylowman in #327
- Removing usages of tensor aliases by @coreylowman in #328
- Moving intel-mkl stuff into sub module in build.rs by @coreylowman in #329
- Adding Cuda device and skeleton cuda kernel impls by @coreylowman in #322
- Implementing abs/exp/div/sum_to cuda kernels by @coreylowman in #331
- permute_to and broadcast_to cuda kernels by @coreylowman in #343
- Add cuda implementations for unary and binary tensor operations in #341 and #334 by @nkoppel in #346
- Using atomicAdd in binary op backwards to properly handle strides by @coreylowman in #350
- Resolve #352 and #347 by @nkoppel in #354
- Implement reshape cuda kernel (resolves #336) by @nkoppel in #356
- Add missing device generic in transformer test by @ViliamVadocz in #358
- Add select and gather cuda kernels. by @nkoppel in #359
- Upgrade to cudarc 0.6.0 by @coreylowman in #361
- Add tests for binary broadcasted add and fix bugs to allow them to pass. by @nkoppel in #357
- run GHA on pull_request by @coreylowman in #364
- matmul cuda kernels by @coreylowman in #342
- Adding dynamic example. by @Narsil in #368
- Add cuda kernels for min_to/max_to by @coreylowman in #370
- Adding dropout cuda kernel by @coreylowman in #372
- Adding ConstDim and ConstShape for tensor creation by @coreylowman in #373
- Fixing computation of lda/ldb/ldc with cblas by @coreylowman in #375
- Modify sum_to cuda kernel to not need atomic adds in backwards by @nkoppel in #367
- Simplifying `trait Conv2DKernel` and Cpu implementation by @coreylowman in #376
- (#344) Implement cuda kernels for optimizers by @nkoppel in #378
- Fix max_to and min_to edge case with negative zero by @ViliamVadocz in #380
- Add cuda kernels for conv2d by @coreylowman in #369
- Rework pool2d internals & add pool2d cuda kernels by @coreylowman in #384
- Implement Shape for arrays (#377) by @nkoppel in #385
- Efficient cuda kernels for reductions by @nkoppel in #382
- Improving compilation times of deeply nested const generic modules by @coreylowman in #391
- Fixing remainder of cuda tests & fixing cblas/cublas matmul with strides [1,1] by @coreylowman in #393
- Adding Cuda device usage to mnist example by @coreylowman in #396
- Adding GeLU operator (used in Gpt2) by @Narsil in #397
- Removing codecov from workflows/readme by @coreylowman in #403
- Reorganize tensor_ops, and add cuda_utils.cuh by @nkoppel in #398
- Some small optimizations for conv2d on cpu by @coreylowman in #404
- Removing Device generic from Gradients & optimizers by @coreylowman in #402
- Add ToDevice and OnDevice to simplify nn api (#388) by @nkoppel in #394
- Removes `ModuleBuilder`, Adds `BuildModule` & `BuildOnDevice` by @coreylowman in #405
- Enable multi-core matmul by @infalmo in #417
- Fix GELU CUDA kernel compilation by @ViliamVadocz in #409
- Adding nn.Embedding layer. by @Narsil in #406
- Removing defaults for Tensor Dtype & Device generic parameters by @coreylowman in #418
- Removing Default for optimizers & adding &M to constructors by @coreylowman in #422
- Adding runtime assertion in `try_binary_op` that shapes are equal by @coreylowman in #428
- Add boolean operations and choose. by @nkoppel in #415
- Add TensorFrom trait to create tensors from both vectors and arrays by @nkoppel in #414 (see the sketch at the end of this list)
- Adding nn builder structs, dtype generics, and remove device defaults. by @coreylowman in #433
- Upgrade to cudarc==0.7.0 and use alloc_async instead of alloc_zeros_async by @coreylowman in #440
- Add comparison tensor operations by @ViliamVadocz in #386
- Add synchronize method to Cuda device by @ViliamVadocz in #442
- f64 kernels by @coreylowman in #421
- Add stack tensors method by @coreylowman in #449
- cargo check cuda & run f64 tests in CI by @coreylowman in #447
- Fix bug in #451 by @nkoppel in #453
- Add more runtime shape checks by @coreylowman in #454
- Adding ReshapeTo::reshape_like by @coreylowman in #456
- Adding SampleTensor::sample_uniform_like and SampleTensor::sample_normal_like by @coreylowman in #457
- Improve examples (add Cuda) by @TimerErTim in #452
- Dataset iterators - adds batching, collating for iterators by @coreylowman in #462
- Fixing issue with to_device and broadcasted tensors by @coreylowman in #465
- Bump cudarc 0.7.2 by @coreylowman in #466
- Adding index out of bounds checks to select/gather kernels by @coreylowman in #467
- Rename to `add_dim` by @infalmo in #471
- impl BuildModule for ZeroSizedModule by @coreylowman in #470
- Adds TensorCollection by @coreylowman in #469
- Fixing cargo doc warnings by @coreylowman in #473
- Using `--gpu-architecture native` with nvcc by @coreylowman in #474
- Using TensorFromVec for OneHotEncode and Arange by @coreylowman in #477
- Small batchnorm optimizations by @coreylowman in #478
- nvcc: fixed type bug by @M1ngXU in #480
- Adds fast_alloc feature and binary kernel optimizations by @coreylowman in #481
- Adding some "benchmarking" scripts by @coreylowman in #483
- Add try_forward and try_forward_mut to Module and ModuleMut. by @nkoppel in #482
- Optimizing cpu kernels for reductions by @coreylowman in #484
- Using alloc_zeros_async and memset_zeros for cuda by @coreylowman in #489
- Making Conv2D unbiased by default, and adding Bias2D module by @coreylowman in #494
- Using image/filter stride in cuda kernel for conv by @coreylowman in #495
- bump cudarc version by @coreylowman in #498
- Adding attention_reshape (inference only) kernels. by @Narsil in #497
- Adding lifetime to gat in Exact...
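Since this release rewrites tensor creation around device values, here is a rough sketch of the resulting API (`TensorFrom` from #414, `SampleTensor` from #327, and device-generic matmul). Names follow the PRs above and may not match the 0.11.0 signatures exactly:

```rust
use dfdx::prelude::*;

fn main() {
    let dev: Cpu = Default::default();

    // TensorFrom (#414): build tensors from nested arrays or Vecs.
    let a = dev.tensor([[1.0f32, 2.0], [3.0, 4.0]]);

    // SampleTensor (#327) replaces RandTensor/RandnTensor.
    let b: Tensor<Rank2<2, 3>, f32, _> = dev.sample_normal();

    // Ops are generic over the device; with the `cuda` feature, Cuda works the same way.
    let c = a.matmul(b);
    println!("{:?}", c.array());
}
```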
v0.10.0
What's Changed
Breaking Changes
- Binary ops (`add`, `sub`, `div`, `mul`, `maximum`, `minimum`) take ownership of rhs by @coreylowman in #268
- backwards only allows 0d tensors now by @coreylowman in #206
- Clone now keeps same id, removing Tensor::duplicate by @coreylowman in #249
- Multi axis reductions
  - See docs
  - #189, #190, #194
  - Reduction functions now can reduce across any axis/axes: `mean`, `sum`, `max`, `min`, `stddev`, `var`, `softmax`, `log_softmax`, and `logsumexp`
  - Remove `-1` from valid axes, add `trait HasLastAxis` to use in generic functions instead
  - Adding `normalize` function that normalizes across any axis
  - Removing single axis reduction functions `fn *_axis()`: `mean_axis`, `sum_axis`, `max_axis`, `min_axis`, `normalize_axis`, `std_axis`, `var_axis`
  - Rename `HasAxis` to `HasAxes`
  - Add `trait BroadcastTo`
    - Remove `trait Broadcast1`, `trait Broadcast2`, `trait Broadcast3`, `trait Broadcast4`
  - Add `trait Reduce`/`trait ReduceTo`
    - Remove `trait Reduce1`
- Batched select & select consistency
  - See docs
  - Renaming SelectTo, using SelectTo for batched select by @coreylowman in #217
  - Add Batched Select for devices and tensor_ops by @coreylowman in #182
- Reduce things in prelude by @coreylowman in #209
- Renaming FlattenImage to Flatten2D by @coreylowman in #243
New features
- `Arc` in Tensors instead of Rc by @caelunshun in #236
- `powi()` and `powf()` functions by @coreylowman in #167
- `no_std` support
  - See feature flags docs
  - Remove num-traits, no default features on depends by @coreylowman in #200
  - Adding intel-mkl feature and removing the 4 `mkl-*` features by @coreylowman in #239
  - Adding module that has docs for feature flags by @coreylowman in #240
  - Adding "numpy" feature to make numpy & npz optional by @coreylowman in #241
  - Adding `#![no_std]` support via `no_std_compat` by @coreylowman in #244
  - Adding default-features = false to dependencies by @coreylowman in #257
- Adding Axis permutations via `trait PermuteTo`
- Adding `trait ModuleMut` (see the sketch after this list)
  - See docs
  - #225
  - Removing Module super traits by @coreylowman in #223
  - Rework Dropout/DropoutOneIn to use ModuleMut by @coreylowman in #226
- Adding decoupled/L2 weight decay in optimizers:
  - See docs
  - add HasArrayData to GradientProvider by @cBournhonesque in #261
  - Add weight decay to SGD by @cBournhonesque in #258
  - Adding weight_decay to Adam by @coreylowman in #275
  - Adding weight decay to RMSprop by @coreylowman in #276
- Adding `nn::Transformer` #175, #173, #180
  - See docs
- Adding `nn::MinPool2D`, `nn::MaxPool2D`, `nn::AvgPool2D` by @coreylowman in #214
  - See docs
- Adding `nn::MinPoolGlobal`, `nn::MaxPoolGlobal`, `nn::AvgPoolGlobal` by @coreylowman in #216
  - See docs
- Adding `nn::BatchNorm2D` by @coreylowman in #228
  - See docs
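As a rough illustration of the `ModuleMut` split, here is a sketch in the 0.10-era array-backed API (modules built via `Default`, `forward_mut` for layers with train-time state such as `Dropout`); the exact trait bounds are assumptions:

```rust
use dfdx::prelude::*;

fn main() {
    // Dropout has per-call randomness, so training goes through ModuleMut.
    let mut model: (Linear<4, 8>, ReLU, Dropout) = Default::default();

    // Training pass: forward_mut takes &mut self and applies dropout.
    let x = Tensor1D::<4>::zeros();
    let _train_y = model.forward_mut(x.trace());

    // Inference pass: forward leaves dropout as a no-op.
    let _eval_y = model.forward(Tensor1D::<4>::zeros());
}
```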
Misc changes
- Add tensor() function as a convenient way to make tensors from arrays by @coreylowman in #161
  - See docs
- Remove allocation in dropout implementation by @coreylowman in #164
- Removing Tensor::OwnedTape by @coreylowman in #197
- Revamping examples/ by @coreylowman in #205
- Conv cleanup
  - Moving conv into device and cleaning up a bit by @coreylowman in #212
  - Minifying conv impls by @coreylowman in #213
  - Changing conv2d and conv2d_batched to methods of tensors by @coreylowman in #221
  - Replacing conv2d implementation with matmuls by @coreylowman in #237
- Fix typos by @cBournhonesque in #235
- Combining multiple where clauses with const generics into a single one by @coreylowman in #264
- Checking for null ptr in AllocateZeros by @coreylowman in #271
- Reducing allocations in `map_df_uses_fx` by @coreylowman in #272
- Adding with_empty_tape and with_diff_tape by @coreylowman in #274
New Contributors
- @cBournhonesque made their first contribution in #235
- @caelunshun made their first contribution in #236
Full Changelog: v0.9.0...v0.10.0
v0.9.0
Breaking Changes
- Add broadcast functions, reductions on any axis, and selecting subtensors (#137, #114, #139) by @coreylowman in #138
- Added normalize axis and removed normalize by @vikigenius in #140
- #67 `Optimizer::update` now returns `Result<(), UnusedParamsError>` by @coreylowman in #107
New features
- #34 Add Transformers!!! by @jafioti in #120
- #1 Add Conv2d by @coreylowman in #124
- #55 Added reshape function by #90 #129 #120
- #133 Adding FlattenImage layer that uses reshape by @coreylowman in #133
- #142 Adding Module::forward_mut by @coreylowman in #148
- #80 Adding nn::Softmax by @coreylowman in #81
- #79 Adding smooth_l1_loss and huber_loss by @coreylowman in #82
- #131 matmul now supports batched & broadcasted inputs by @coreylowman in #132
- add macOS MKL support by @yerke in #73
- Adding maximum function by @coreylowman in #143
- Adding min_axis function by @coreylowman in #144
Additional changes
- Simplifying implementation of BCE loss using binary_map by @coreylowman in #75
- Miscellaneous updates by @coreylowman in #76
- Added custom model example by @jafioti in #83
- add Debug and Display support for `NpzError` by @XBagon in #85
- Added nightly feature by @jafioti in #89
- Added 2d broadcast_first functions and 3d linear forward by @jafioti in #94
- #55 reshape, and #87 additional work on nightly feature by @coreylowman in #90
- #69 adding map_df_uses_fx by @coreylowman in #105
- Fixed a misleading docstring. by @M1ngXU in #109
- Fix Issue #110 Fix (Dropout (test) for non-positive values) by @M1ngXU in #113
- Issue #96 by @M1ngXU in #118
New Contributors
- @jafioti made their first contribution in #83
- @XBagon made their first contribution in #85
- @yerke made their first contribution in #73
- @M1ngXU made their first contribution in #109
- @vikigenius made their first contribution in #140
Full Changelog: v0.8.0...v0.9.0