Commit

feat: add training example to highlights
leandro committed Mar 30, 2020
1 parent df428af commit 2101f54
Showing 9 changed files with 5 additions and 74 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -8,6 +8,7 @@ With `trl` you can train transformer language models with Proximal Policy Optimi
 **Highlights:**
 - GPT2 model with a value head: A transformer model with an additional scalar output for each token which can be used as a value function in Reinforcement Learning.
 - PPOTrainer: A PPO trainer for language models that just needs (query, response, reward) triplets to optimise the language model.
+- Example: Train GPT2 to generate positive movie reviews with a BERT sentiment classifier.
 
 ## How it works
 Fine-tuning a language model via PPO consists of roughly three steps:
2 changes: 1 addition & 1 deletion docs/04-gpt2-sentiment-ppo-training.html
@@ -91,7 +91,7 @@ <h3 id="Import-dependencies">Import dependencies<a class="anchor-link" href="#Im
 <div class="output_area">
 
 <div class="output_subarea output_stream output_stderr output_text">
-<pre>/Users/leandro/git/lm_ppo/env/lib/python3.7/site-packages/tqdm/std.py:658: FutureWarning: The Panel class is removed from pandas. Accessing it from the top-level namespace will also be removed in the next version
+<pre>/Users/leandro/git/trl/env/lib/python3.7/site-packages/tqdm/std.py:658: FutureWarning: The Panel class is removed from pandas. Accessing it from the top-level namespace will also be removed in the next version
 from pandas import Panel
 </pre>
 </div>
1 change: 1 addition & 0 deletions docs/index.html
@@ -34,6 +34,7 @@ <h2 id="What-is-it?">What is it?<a class="anchor-link" href="#What-is-it?"> </a>
 <ul>
 <li>GPT2 model with a value head: A transformer model with an additional scalar output for each token which can be used as a value function in Reinforcement Learning.</li>
 <li>PPOTrainer: A PPO trainer for language models that just needs (query, response, reward) triplets to optimise the language model.</li>
+<li>Example: Train GPT2 to generate positive movie reviews with a BERT sentiment classifier.</li>
 </ul>
 
 </div>
12 changes: 0 additions & 12 deletions nbs/00-core.ipynb
@@ -200,18 +200,6 @@
 "display_name": "Python 3",
 "language": "python",
 "name": "python3"
-},
-"language_info": {
-"codemirror_mode": {
-"name": "ipython",
-"version": 3
-},
-"file_extension": ".py",
-"mimetype": "text/x-python",
-"name": "python",
-"nbconvert_exporter": "python",
-"pygments_lexer": "ipython3",
-"version": "3.7.7"
 }
 },
 "nbformat": 4,
12 changes: 0 additions & 12 deletions nbs/01-gpt2-with-value-head.ipynb
@@ -467,18 +467,6 @@
 "display_name": "Python 3",
 "language": "python",
 "name": "python3"
-},
-"language_info": {
-"codemirror_mode": {
-"name": "ipython",
-"version": 3
-},
-"file_extension": ".py",
-"mimetype": "text/x-python",
-"name": "python",
-"nbconvert_exporter": "python",
-"pygments_lexer": "ipython3",
-"version": "3.7.7"
 }
 },
 "nbformat": 4,
12 changes: 0 additions & 12 deletions nbs/02-ppo.ipynb
@@ -396,18 +396,6 @@
 "display_name": "Python 3",
 "language": "python",
 "name": "python3"
-},
-"language_info": {
-"codemirror_mode": {
-"name": "ipython",
-"version": 3
-},
-"file_extension": ".py",
-"mimetype": "text/x-python",
-"name": "python",
-"nbconvert_exporter": "python",
-"pygments_lexer": "ipython3",
-"version": "3.7.7"
 }
 },
 "nbformat": 4,
12 changes: 0 additions & 12 deletions nbs/03-bert-imdb-training.ipynb
@@ -490,18 +490,6 @@
 "display_name": "Python 3",
 "language": "python",
 "name": "python3"
-},
-"language_info": {
-"codemirror_mode": {
-"name": "ipython",
-"version": 3
-},
-"file_extension": ".py",
-"mimetype": "text/x-python",
-"name": "python",
-"nbconvert_exporter": "python",
-"pygments_lexer": "ipython3",
-"version": "3.7.7"
 }
 },
 "nbformat": 4,
12 changes: 0 additions & 12 deletions nbs/04-gpt2-sentiment-ppo-training.ipynb
@@ -968,18 +968,6 @@
 "display_name": "Python 3",
 "language": "python",
 "name": "python3"
-},
-"language_info": {
-"codemirror_mode": {
-"name": "ipython",
-"version": 3
-},
-"file_extension": ".py",
-"mimetype": "text/x-python",
-"name": "python",
-"nbconvert_exporter": "python",
-"pygments_lexer": "ipython3",
-"version": "3.7.7"
 }
 },
 "nbformat": 4,
15 changes: 2 additions & 13 deletions nbs/index.ipynb
@@ -18,7 +18,8 @@
 "\n",
 "**Highlights:**\n",
 "- GPT2 model with a value head: A transformer model with an additional scalar output for each token which can be used as a value function in Reinforcement Learning.\n",
-"- PPOTrainer: A PPO trainer for language models that just needs (query, response, reward) triplets to optimise the language model."
+"- PPOTrainer: A PPO trainer for language models that just needs (query, response, reward) triplets to optimise the language model.\n",
+"- Example: Train GPT2 to generate positive movie reviews with a BERT sentiment classifier."
 ]
 },
 {
@@ -183,18 +184,6 @@
 "display_name": "Python 3",
 "language": "python",
 "name": "python3"
-},
-"language_info": {
-"codemirror_mode": {
-"name": "ipython",
-"version": 3
-},
-"file_extension": ".py",
-"mimetype": "text/x-python",
-"name": "python",
-"nbconvert_exporter": "python",
-"pygments_lexer": "ipython3",
-"version": "3.7.7"
 }
 },
 "nbformat": 4,
