Correct ppo_epochs usage (huggingface#1480)
* Correct ppo_epochs usage

The usage of ppo_epochs is incorrect here.

In https://github.com/huggingface/trl/blob/8534f0edf8608ad6bcbea9beefae380fa60ded77/trl/trainer/ppo_config.py#L104C8-L104C58

ppo_epochs is described as the "Number of optimisation epochs per batch of samples".

Here, however, it is used as a conventional epoch count, i.e. the number of full passes over the training dataset.

* Update ppo_trainer.mdx

* Update docs/source/ppo_trainer.mdx

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
muhammed-shihebi and kashif authored Apr 2, 2024
1 parent c674c66 commit ab0d11d
Showing 1 changed file with 4 additions and 1 deletion.
5 changes: 4 additions & 1 deletion docs/source/ppo_trainer.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -130,7 +130,10 @@ We can then loop over all examples in the dataset and generate a response for ea

```py
from tqdm import tqdm
for epoch in tqdm(range(ppo_trainer.config.ppo_epochs), "epoch: "):


epochs = 10
for epoch in tqdm(range(epochs), "epoch: "):
for batch in tqdm(ppo_trainer.dataloader):
query_tensors = batch["input_ids"]
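For context, here is a minimal sketch of the corrected loop with the surrounding generation and reward steps filled in, based on the PPOTrainer API at the time of this commit. `generation_kwargs`, `tokenizer`, and `reward_fn` are assumed to be defined elsewhere; the reward computation is a hypothetical placeholder. The key point is that `ppo_trainer.step` already performs `config.ppo_epochs` optimisation epochs internally, so the outer loop should use a plain epoch count.

```py
import torch
from tqdm import tqdm

# Outer loop: full passes over the training dataset
# (a plain epoch count, not ppo_epochs).
epochs = 10
for epoch in tqdm(range(epochs), "epoch: "):
    for batch in tqdm(ppo_trainer.dataloader):
        query_tensors = batch["input_ids"]

        # Generate a response for each query.
        response_tensors = ppo_trainer.generate(query_tensors, **generation_kwargs)
        batch["response"] = tokenizer.batch_decode(response_tensors)

        # Score the query/response pairs (reward_fn is a hypothetical placeholder
        # for a reward model or other scoring function).
        texts = [q + r for q, r in zip(batch["query"], batch["response"])]
        rewards = [torch.tensor(score) for score in reward_fn(texts)]

        # step() runs config.ppo_epochs optimisation epochs internally
        # over this single batch of samples.
        stats = ppo_trainer.step(query_tensors, response_tensors, rewards)
```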

