What's Changed
- fix(timeout): larger timeout by @JiaoPL in #495
- feat(doc): add GPU memory info for 7B & 20B models by @li126com in #507
- feat(model): add rope_base interface by @00INDEX in #512 (see the RoPE sketch after this list)
- feat(QA): check loss when swapping micro_num and micro_bsz, and check grad norm by @li126com in #510
- fix(QA): fix wrong .py file name in main by @li126com in #514
- fix/feat: small fix and enhancement by @SolenoidWGT in #515
- test(workflow): add workflow for loss test and change trigger event by @kkscilife in #513
- fix(ci): fix test model ckpt ci test by @SolenoidWGT in #518
- test(workflow): add unit test case by @kkscilife in #524
- feat(storage): use multipart upload when using oss by @li126com in #520 (see the multipart upload sketch after this list)
- fix(QA): fix test_model_checkpoint singleton import by @li126com in #526
- fix(model): add IS_SEQUENCE_PARALLEL check for norm module by @yingtongxiong in #528
- feat(model): add output embedding tf32 option by @JiaoPL in #523 (see the TF32 sketch after this list)
- feat(grad_norm): vocab grad norm profiling by @JiaoPL in #519
- fix(data): fix the unpack for type_ids when use_flash_attn=False by @yingtongxiong in #516
- fix(storage): unify the name of AK and SK by @li126com in #527
- fix(test): fix type_ids unpack bug by @SolenoidWGT in #530
- feat(model): support llama model with checkpoint loading by @li126com in #532
- fix(metric): add metric dtype control by @Pryest in #533
- feat(ckpt): support auto resume in Volc and Ali by @li126com in #529
- fix(sequence_parallel): fix norm all-reduce in seq_parallel when not overlapping by @yingtongxiong in #534
- fix(pp): fix micro-batch loading error with non-packed datasets by @SolenoidWGT in #538
- fix(model): change model_type `LLAMA` to `LLAMA2` by @li126com in #539
- fix(moe): fix moe zero mode bug by @blankde in #548
- fix(grad_norm): token grad norm with tp by @JiaoPL in #547
- test(workflow): change into reserved by @kkscilife in #550
- fix(model): add ckpt_type constraint when loading ckpts by @li126com in #542
- feat(logger): add tensorboard key value buffer by @SolenoidWGT in #549
- fix(metrics): remove redundant cuda memory in metric calculations by @SolenoidWGT in #557
- fix(lr_scheduler): fix when resuming lr_scheduler without loading optimizer by @gaoyang07 in #565
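
For context on #512: RoPE derives its rotation frequencies from powers of a base constant, so exposing that base as a parameter lets a model trained with a non-default value (e.g. for longer context) be configured without code changes. The sketch below is illustrative only; the function names and shapes are assumptions, not InternLM's actual interface.

```python
import torch

def build_rope_cache(seq_len: int, head_dim: int, base: float = 10000.0):
    # Inverse frequencies are powers of `base`; making `base` an argument
    # is the essence of a rope_base-style interface.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    t = torch.arange(seq_len).float()
    freqs = torch.outer(t, inv_freq)  # (seq_len, head_dim // 2)
    return torch.cos(freqs), torch.sin(freqs)

def apply_rope(x, cos, sin):
    # x: (..., seq_len, head_dim); rotate each even/odd channel pair.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

if __name__ == "__main__":
    cos, sin = build_rope_cache(seq_len=8, head_dim=16, base=1_000_000.0)
    q = torch.randn(2, 8, 16)  # (batch, seq_len, head_dim)
    print(apply_rope(q, cos, sin).shape)  # torch.Size([2, 8, 16])
```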
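For context on #520: a multipart upload splits a large checkpoint into fixed-size parts, which avoids single-PUT size limits and lets a failed part be retried without re-sending the whole file. A minimal sketch against the S3-compatible boto3 client follows; the helper name, part size, and paths are assumptions, and InternLM's storage layer may differ.

```python
import boto3

def multipart_upload(client, bucket: str, key: str, path: str,
                     part_size: int = 64 * 1024 * 1024):
    """Upload a local file to an S3-compatible store (e.g. OSS) in parts."""
    mpu = client.create_multipart_upload(Bucket=bucket, Key=key)
    parts = []
    try:
        with open(path, "rb") as f:
            part_number = 1
            while chunk := f.read(part_size):
                resp = client.upload_part(
                    Bucket=bucket, Key=key, PartNumber=part_number,
                    UploadId=mpu["UploadId"], Body=chunk,
                )
                # Each part's ETag must be echoed back on completion.
                parts.append({"ETag": resp["ETag"], "PartNumber": part_number})
                part_number += 1
        client.complete_multipart_upload(
            Bucket=bucket, Key=key, UploadId=mpu["UploadId"],
            MultipartUpload={"Parts": parts},
        )
    except Exception:
        # Abort so incomplete parts are not left as orphaned storage.
        client.abort_multipart_upload(Bucket=bucket, Key=key,
                                      UploadId=mpu["UploadId"])
        raise

# Usage (endpoint and names are placeholders):
# client = boto3.client("s3", endpoint_url="https://oss-example-endpoint")
# multipart_upload(client, "my-bucket", "ckpt/model_tp0_pp0.pt", "/tmp/model.pt")
```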
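For context on #523: TF32 trades a few mantissa bits for much faster matmuls on Ampere and newer GPUs, a trade-off that is often acceptable for the output embedding (lm head) projection. The sketch below scopes that choice to a single matmul using PyTorch's real `allow_tf32` switch; the context manager and layer names are hypothetical, not the actual option added in the PR.

```python
import contextlib
import torch

@contextlib.contextmanager
def tf32_matmul(enabled: bool):
    # Temporarily toggle TF32 for CUDA matmuls; restore the previous setting.
    prev = torch.backends.cuda.matmul.allow_tf32
    torch.backends.cuda.matmul.allow_tf32 = enabled
    try:
        yield
    finally:
        torch.backends.cuda.matmul.allow_tf32 = prev

if __name__ == "__main__":
    hidden = torch.randn(4, 1024)
    lm_head = torch.nn.Linear(1024, 32000, bias=False)  # output embedding
    with tf32_matmul(True):  # matmul runs in TF32 on Ampere+ CUDA devices
        logits = lm_head(hidden)
    print(logits.shape)  # torch.Size([4, 32000])
```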
**Full Changelog**: v0.2.1dev20231121...v0.2.1dev20240102