Insights: microsoft/DeepSpeed
Overview
3 Pull requests merged by 2 people
- Fix error caused by all_reduce call in domino
  #6880 merged Dec 26, 2024
- hpu_accelerator: use torch.use_deterministic_algorithms
  #6897 merged Dec 20, 2024
- Zero2: avoid graph breaks in torch.compile by using param_idx
  #6803 merged Dec 20, 2024
4 Pull requests opened by 4 people
- Tecorigin sdaa accelerator
  #6903 opened Dec 23, 2024
- Update Gaudi2 jobs to latest 1.19 build
  #6905 opened Dec 23, 2024
- [BUG FIX]:fix get torch.version.cuda error when cuda is None in rocm
  #6909 opened Dec 24, 2024
- Add fp8_gemm fallback for non-triton systems
  #6916 opened Dec 26, 2024
6 Issues closed by 2 people
- [QUESTIONS]:Some questions about running Domino
  #6851 closed Dec 26, 2024
- [BUG] offload optmizer states in zero3
  #6833 closed Dec 20, 2024
- Unable to Install DeepSpeed on Windows using pip
  #6865 closed Dec 19, 2024
- nv-nightly CI test failure
  #6883 closed Dec 19, 2024
- nv-ds-chat CI test failure
  #6887 closed Dec 19, 2024
- nv-torch-nightly-v100 CI test failure
  #6888 closed Dec 19, 2024
14 Issues opened by 12 people
- [REQUEST] Deepspeed Inference Supports VL (vision language) model
  #6917 opened Dec 26, 2024
- 初始化问题 (Initialization issue)
  #6914 opened Dec 25, 2024
- [BUG] Cannot access local variable 'locations' where it is not associated with a value
  #6913 opened Dec 25, 2024
- [BUG] FAILED: multi_tensor_adam.cuda.o with
  #6912 opened Dec 24, 2024
- [BUG]Convergence Issue: Training BERT for Embedding with Zero2 and 3 as compared to Torchrun
  #6911 opened Dec 24, 2024
- [REQUEST] is fp8 training supported?
  #6908 opened Dec 24, 2024
- nv-ds-chat CI test failure
  #6907 opened Dec 24, 2024
- nv-torch-nightly-v100 CI test failure
  #6904 opened Dec 23, 2024
- [BUG] triton kernel, loss 0, grar-norm nan
  #6902 opened Dec 22, 2024
- [REQUEST] Support for XLA/TPU
  #6901 opened Dec 21, 2024
- nv-nightly CI test failure
  #6900 opened Dec 20, 2024
18 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- Reduce the device bubble introduced by heavy loop synchronization in coalesced fetch/release(z3_leaf_module)
  #6694 commented on Dec 26, 2024 • 4 new comments
- Fix `checkpointable_layers` Logic
  #6881 commented on Dec 26, 2024 • 0 new comments
- Use ds-specific module id to avoid conflicts
  #6847 commented on Dec 26, 2024 • 0 new comments
- [inf] Add config var to enable keeping module on host
  #6846 commented on Dec 24, 2024 • 0 new comments
- Stage3: Use new torch grad accumulation hooks API
  #6773 commented on Dec 26, 2024 • 0 new comments
- Add the missing view operations from sequence parallel(async).
  #6750 commented on Dec 25, 2024 • 0 new comments
- Change compile for pipeline module torch.compile
  #6478 commented on Dec 20, 2024 • 0 new comments
- [Draft][Demo] auto tp training
  #5445 commented on Dec 25, 2024 • 0 new comments
- Question about Ulysses and loss agregation
  #6841 commented on Dec 26, 2024 • 0 new comments
- [BUG] Invalidate trace cache @ step 10: expected module 11, but got module 19
  #6870 commented on Dec 26, 2024 • 0 new comments
- [REQUEST] Some questions about deepspeed sequence parallel
  #6708 commented on Dec 24, 2024 • 0 new comments
- [BUG] inference ops unit tests are failing
  #6839 commented on Dec 24, 2024 • 0 new comments
- DeepSpeed with ZeRO3 strategy cannot build 'fused_adam'
  #6892 commented on Dec 24, 2024 • 0 new comments
- [BUG]
  #5241 commented on Dec 23, 2024 • 0 new comments
- [BUG] Non-Deterministic Model Responses when the Input Prompt Order Changes
  #6612 commented on Dec 20, 2024 • 0 new comments
- AssertionError: no sync context manager is incompatible with gradientpartitioning logic of ZeRo stage 3
  #6793 commented on Dec 20, 2024 • 0 new comments
- DeepSpeed with trl
  #6852 commented on Dec 20, 2024 • 0 new comments
- How to perform inference MoE model with expert parallel
  #6891 commented on Dec 20, 2024 • 0 new comments