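Reinforcement-learning training code for LLaMA, Qwen, DeepSeek, and similar model architectures, including PPO and DPO (still being refined), with pipeline-mode support for long-context-window training. To debug in VS Code, find the debug configuration with "name": "qwen1_5_0.5b_dpo_debug" and launch it.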
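
For readers unfamiliar with DPO, the following is a minimal sketch of the standard per-pair DPO loss. It is not taken from this repository: the function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative assumptions, and the inputs are assumed to be summed log-probabilities of the chosen and rejected responses under the policy and a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * margin)).

    Hypothetical helper for illustration only; names and the default
    beta are assumptions, not this repository's API.
    """
    # How much more the policy favors each response than the reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(z)) equals softplus(-z); branch for numerical stability.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


if __name__ == "__main__":
    # Policy identical to the reference: the loss is log(2).
    print(dpo_loss(0.0, 0.0, 0.0, 0.0))
    # Policy favors the chosen response more than the reference does: lower loss.
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

In a real training loop the same expression would be computed on batched tensors and averaged; the scalar form here only shows the shape of the objective.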
sunxiaowu/transpeeder_rlhf