Correct ppo_epochs usage (huggingface#1480)
* Correct ppo_epochs usage

The usage of ppo_epochs is incorrect here.

In https://github.com/huggingface/trl/blob/8534f0edf8608ad6bcbea9beefae380fa60ded77/trl/trainer/ppo_config.py#L104C8-L104C58

ppo_epochs is described as the "Number of optimisation epochs per batch of samples".

Here, however, it is used as a conventional epoch count, i.e. the number of full passes over the training dataset.

* Update ppo_trainer.mdx

* Update docs/source/ppo_trainer.mdx

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
muhammed-shihebi and kashif authored Apr 2, 2024
1 parent c674c66 commit ab0d11d
Showing 1 changed file with 4 additions and 1 deletion.
5 changes: 4 additions & 1 deletion docs/source/ppo_trainer.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -130,7 +130,10 @@ We can then loop over all examples in the dataset and generate a response for ea

```py
from tqdm import tqdm
for epoch in tqdm(range(ppo_trainer.config.ppo_epochs), "epoch: "):


epochs = 10
for epoch in tqdm(range(epochs), "epoch: "):
for batch in tqdm(ppo_trainer.dataloader):
query_tensors = batch["input_ids"]
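For context, here is a minimal sketch of the corrected loop with the surrounding generation and reward steps filled in, based on the PPOTrainer API at the time of this commit. `generation_kwargs`, `tokenizer`, and `reward_fn` are assumed to be defined elsewhere; the reward computation is a hypothetical placeholder. The key point is that `ppo_trainer.step` already performs `config.ppo_epochs` optimisation epochs internally, so the outer loop should use a plain epoch count.

```py
import torch
from tqdm import tqdm

# Outer loop: full passes over the training dataset
# (a plain epoch count, not ppo_epochs).
epochs = 10
for epoch in tqdm(range(epochs), "epoch: "):
    for batch in tqdm(ppo_trainer.dataloader):
        query_tensors = batch["input_ids"]

        # Generate a response for each query.
        response_tensors = ppo_trainer.generate(query_tensors, **generation_kwargs)
        batch["response"] = tokenizer.batch_decode(response_tensors)

        # Score the query/response pairs (reward_fn is a hypothetical placeholder
        # for a reward model or other scoring function).
        texts = [q + r for q, r in zip(batch["query"], batch["response"])]
        rewards = [torch.tensor(score) for score in reward_fn(texts)]

        # step() runs config.ppo_epochs optimisation epochs internally
        # over this single batch of samples.
        stats = ppo_trainer.step(query_tensors, response_tensors, rewards)
```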

