Reinforcement-learning training code for LLaMA, Qwen, DeepSeek, and similar model architectures, including PPO and DPO.

sunxiaowu/transpeeder_rlhf

transpeeder_rlhf

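Reinforcement-learning training code for LLaMA, Qwen, DeepSeek, and similar model architectures, covering PPO and DPO (DPO support is still being completed). Long-context-window training is supported in pipeline mode.

To debug, open the VS Code debug configurations, find the entry with "name": "qwen1_5_0.5b_dpo_debug", and use it to run a debug session.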
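For readers new to DPO, the objective for a single preference pair can be sketched as below. This is a minimal illustration of the standard DPO loss, not code taken from this repository: the function name `dpo_loss`, its parameter names, and the default `beta=0.1` are assumptions chosen for the example, and the inputs are assumed to be summed per-sequence log-probabilities from the policy and a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    All four inputs are summed sequence log-probabilities.
    Hypothetical helper for illustration; not this repository's API.
    """
    # How much more the policy favors the chosen response over the
    # rejected one, measured relative to the reference model.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin
    # -log(sigmoid(z)) equals softplus(-z); this form avoids overflow
    # for large |z|.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


if __name__ == "__main__":
    # Policy identical to the reference: zero margin, loss is log(2).
    print(dpo_loss(-12.0, -15.0, -12.0, -15.0))
    # Policy prefers the chosen response more than the reference does:
    # the loss drops below log(2).
    print(dpo_loss(-10.0, -18.0, -12.0, -15.0))
```

A larger `beta` penalizes deviations from the reference model's preferences more sharply; the value used by any particular training run is a configuration choice and is not specified here.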
