Insights: NVIDIA/Megatron-LM
Overview
- 0 Merged pull requests
- 2 Open pull requests
- 3 Closed issues
- 0 New issues
2 Pull requests opened by 2 people
- KV-cache for T5 model (#1358, opened Jan 17, 2025)
- fix: name the token indices properly for training with tokens padding (#1360, opened Jan 18, 2025)
3 Issues closed by 3 people
- [QUESTION] Why is pre_mlp_layernorm an IdentityOp if num_experts is None? (#1362, closed Jan 21, 2025)
- [BUG] state[p]['master_weight'] becomes bf16 (#1359, closed Jan 18, 2025)
- [QUESTION] Training with expert-model-parallel-size=4 fails with an error (#1357, closed Jan 16, 2025)
12 Unresolved conversations
Sometimes conversations happen on old items that aren't yet closed. Here is a list of all the issues and pull requests with unresolved conversations.
- [BUG] Zarr checkpoint loses distributed optimizer states due to lack of synchronizers on ranks that create arrays (#1053, commented on Jan 14, 2025 • 0 new comments)
- [QUESTION] The dataset cannot be found in multi-node multi-GPU training (#1355, commented on Jan 15, 2025 • 0 new comments)
- [ENHANCEMENT] Is Megatron planning to adopt Flux, fusing communication and GEMM into one operator to improve overlap? (#1136, commented on Jan 16, 2025 • 0 new comments)
- [QUESTION] Typo in MoE README (#1346, commented on Jan 17, 2025 • 0 new comments)
- [BUG] MoE load-balancing loss is accumulated twice when using activation checkpointing (#1330, commented on Jan 17, 2025 • 0 new comments)
- [BUG] Encountering NaN gradients when using CUDA Graph (#1279, commented on Jan 17, 2025 • 0 new comments)
- [ENHANCEMENT] Is there any support for MoE models such as Qwen2MoeForCausalLM from the transformers library? (#856, commented on Jan 17, 2025 • 0 new comments)
- [QUESTION] Found NaN in local grad norm in backward pass before data-parallel communication collective (#780, commented on Jan 19, 2025 • 0 new comments)
- [QUESTION] DeepSeek-V2 compatibility? (#1295, commented on Jan 20, 2025 • 0 new comments)
- Enabling LR scaling for a specific layer (e.g. down-projection...) during pretraining (#1262, commented on Jan 21, 2025 • 0 new comments)
- Fix: Resolve multimodal model errors and update README usage instructions (#1286, commented on Jan 14, 2025 • 0 new comments)
- [Update] Print training log on rank 0 (#1296, commented on Jan 16, 2025 • 0 new comments)