Commit
Correct ppo_epochs usage (huggingface#1480)
* Correct ppo_epochs usage

The usage of ppo_epochs is incorrect here. In https://github.com/huggingface/trl/blob/8534f0edf8608ad6bcbea9beefae380fa60ded77/trl/trainer/ppo_config.py#L104C8-L104C58 the ppo_epochs was described as "Number of optimisation epochs per batch of samples". However, here it is used as the usual epoch number, in which you do one iteration over the training dataset.

* Update ppo_trainer.mdx

* Update docs/source/ppo_trainer.mdx

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
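The distinction this message draws can be illustrated with a small counting sketch. It is not TRL code: `run_training`, `num_dataset_epochs`, and the counter names are hypothetical, chosen only to show an outer loop over the dataset as distinct from the `ppo_epochs` optimisation passes applied to each batch.

```python
# Illustrative sketch only: run_training and num_dataset_epochs are
# hypothetical names, not part of the TRL API.

def run_training(dataset, batch_size, num_dataset_epochs, ppo_epochs):
    """Count how often each loop level runs in a PPO-style training loop."""
    counts = {"dataset_passes": 0, "batches": 0, "optimisation_passes": 0}
    for _ in range(num_dataset_epochs):
        # Outer loop: the "usual" epoch, one iteration over the training dataset.
        counts["dataset_passes"] += 1
        for start in range(0, len(dataset), batch_size):
            # One batch of samples is collected once...
            counts["batches"] += 1
            for _ in range(ppo_epochs):
                # ...and then optimised ppo_epochs times (inner loop).
                counts["optimisation_passes"] += 1
    return counts


counts = run_training(
    dataset=list(range(8)), batch_size=4, num_dataset_epochs=2, ppo_epochs=3
)
print(counts)
```

With 8 samples, a batch size of 4, 2 dataset passes, and `ppo_epochs=3`, the sketch sees 4 batches and runs 12 optimisation passes, so `ppo_epochs` multiplies the work done per batch rather than the number of passes over the dataset.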