Insights: NVIDIA/Megatron-LM
Overview
- 0 Merged pull requests
- 2 Open pull requests
- 3 Closed issues
- 0 New issues
2 Pull requests opened by 2 people
- KV-cache for T5 model (#1358, opened Jan 17, 2025)
- fix: name the token indices properly for training with tokens padding (#1360, opened Jan 18, 2025)
3 Issues closed by 3 people
- [QUESTION] Why is pre_mlp_layernorm an IdentityOp if num_experts is None? (#1362, closed Jan 21, 2025)
- [BUG] state[p]['master_weight'] becomes bf16 (#1359, closed Jan 18, 2025)
- [QUESTION] Training with expert-model-parallel-size=4 fails with an error (#1357, closed Jan 16, 2025)
12 Unresolved conversations
Sometimes conversations happen on old items that aren't yet closed. Here is a list of all the issues and pull requests with unresolved conversations.
- [BUG] Zarr checkpoint loses distributed optimizer states due to lack of synchronizers on ranks that create arrays (#1053, commented on Jan 14, 2025 • 0 new comments)
- [QUESTION] The dataset cannot be found in multi-node multi-GPU training (#1355, commented on Jan 15, 2025 • 0 new comments)
- [ENHANCEMENT] Is Megatron planning to adopt Flux, fusing communication and GEMM into one operator to improve overlap? (#1136, commented on Jan 16, 2025 • 0 new comments)
- [QUESTION] Typo in MoE README (#1346, commented on Jan 17, 2025 • 0 new comments)
- [BUG] MoE load-balancing loss is accumulated twice when using activation checkpointing (#1330, commented on Jan 17, 2025 • 0 new comments)
- [BUG] Encountering NaN gradients when using CUDA Graph (#1279, commented on Jan 17, 2025 • 0 new comments)
- [ENHANCEMENT] Is there any support for MoE models such as Qwen2MoeForCausalLM from the transformers library? (#856, commented on Jan 17, 2025 • 0 new comments)
- [QUESTION] Found NaN in local grad norm in backward pass before data-parallel communication collective (#780, commented on Jan 19, 2025 • 0 new comments)
- [QUESTION] DeepSeek-V2 compatibility? (#1295, commented on Jan 20, 2025 • 0 new comments)
- Enabling LR scaling for a specific layer (e.g. down-projection...) during pretraining (#1262, commented on Jan 21, 2025 • 0 new comments)
- Fix: Resolve multimodal model errors and update README usage instructions (#1286, commented on Jan 14, 2025 • 0 new comments)
- [Update] Print training log on rank 0 (#1296, commented on Jan 16, 2025 • 0 new comments)