Tags · Aliang-CN/DeepSpeed

v0.15.1

Handle an edge case where `CUDA_HOME` is not defined on ROCm systems (microsoft#6488)

* Handles an edge case when building `gds` where `CUDA_HOME` is not
defined on ROCm systems

Sep 4, 2024
10ba3dd

v0.15.0

Fix torch check (microsoft#6402)

Aug 22, 2024
55b4cae

v0.14.5

Allow accelerator to instantiate the device (microsoft#5255)

When instantiating `torch.device` for HPU, the device string cannot carry an
index annotation such as `HPU:1`; only "HPU" is accepted.
Moving this logic into the accelerator resolves the issue with a
single-line change.
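
The idea of letting each accelerator decide how its device string is built can be sketched as follows. This is only an illustration under stated assumptions: the class names and the `device_string` method are hypothetical and are not taken from DeepSpeed or from the PR itself.

```python
# Hypothetical sketch: each accelerator owns the logic that builds its
# device string, so callers never need to special-case a backend.


class GenericAccelerator:
    """Hypothetical accelerator that accepts an indexed device string."""

    name = "cuda"

    def device_string(self, index=None):
        # Append ":<index>" only when an index was requested.
        return self.name if index is None else f"{self.name}:{index}"


class IndexlessAccelerator(GenericAccelerator):
    """Hypothetical accelerator that only accepts the bare device name."""

    name = "hpu"

    def device_string(self, index=None):
        # Ignore the index: this backend accepts only the bare name.
        return self.name


def build_device_string(accelerator, index=None):
    # Callers delegate to the accelerator instead of formatting the string.
    return accelerator.device_string(index)
```

With this split, a caller can pass the same index to either accelerator and each one returns a string its backend accepts.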

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Joe Mayer <114769929+jomayeri@users.noreply.github.com>

Aug 15, 2024
eb07d41

v0.14.4

[XPU] support op builder from intel_extension_for_pytorch kernel path (microsoft#5425)

# Motivation
Starting with our next release, the XPU DeepSpeed kernels will be placed in
intel_extension_for_pytorch. This PR adds new op builders and uses the
kernel path from intel_extension_for_pytorch. More ops, such as MOE and WOQ,
will be added.

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

Jun 20, 2024
d254d75

v0.14.3

Monitor was always enabled causing performance degradation (microsoft#5633)

The Boolean expression for the monitor to be enabled was incorrect, as
instead of using the `enabled` field, it used the comet configuration
object, making the expression always True.

This caused performance degradation (we've observed ~10% drop) as it
erroneously invoked the events logging flow along with the expensive
calculation of `loss.mean().item()`.
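
The class of bug described above can be illustrated with a minimal sketch. The config classes and function names here are hypothetical stand-ins, not DeepSpeed's actual monitor code; the point is only that a plain configuration object is always truthy in Python, whereas its `enabled` field carries the real setting.

```python
from dataclasses import dataclass, field


@dataclass
class CometConfig:
    """Hypothetical stand-in for a per-backend monitor configuration."""

    enabled: bool = False


@dataclass
class MonitorConfig:
    """Hypothetical stand-in for the top-level monitor configuration."""

    comet: CometConfig = field(default_factory=CometConfig)


def monitor_enabled_buggy(cfg: MonitorConfig) -> bool:
    # Bug pattern: the config object defines neither __bool__ nor __len__,
    # so it is truthy regardless of its `enabled` field.
    return bool(cfg.comet)


def monitor_enabled_fixed(cfg: MonitorConfig) -> bool:
    # Fix pattern: test the `enabled` field itself.
    return bool(cfg.comet.enabled)
```

With the default configuration (`enabled=False`), the buggy check still reports the monitor as enabled, while the fixed check does not.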

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>

Jun 12, 2024
54f98fd

v0.14.2

Update PyTest torch version to match PyTorch latest official (2.3.0) (microsoft#5454)

Apr 23, 2024
5f631ab

v0.14.1

Fix the FP6 kernels compilation problem on non-Ampere GPUs. (microsoft#5333)

Refine the guards of FP6 kernel compilation. Fix the `undefined symbol`
problem of FP6 kernels on non-Ampere architectures.

Related issue: microsoft/DeepSpeed-MII#443.

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

Apr 15, 2024
e3d873a

v0.14.0

Update version.txt

Mar 8, 2024
ce78a63

v0.13.5

fix fused_qkv model accuracy issue (microsoft#5217)

A fused_qkv model cannot correctly choose its fused_qkv type; the
module_name_matches needs to be updated.

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

Mar 5, 2024
bc0d246

v0.13.4

Add script to check for `--extra-index-url` (microsoft#5184)

Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

Feb 26, 2024
5115df3
