
Commit

docs: tweak descriptions
leandro committed Mar 30, 2020
1 parent 1460633 commit b5c1df4
Showing 10 changed files with 32 additions and 32 deletions.
12 changes: 6 additions & 6 deletions README.md
@@ -1,12 +1,12 @@
# Welcome to trl
> Train transformer language models with Reinforcement Learning.
# Welcome to Transformer Reinforcement Learning (trl)
> Train transformer language models with reinforcement learning.

## What is it?
With `trl` you can train transformer language models with Proximal Policy Optimization (PPO). The library is built with the `transformer` library by 🤗Huggingface. Therefore, pre-trained language models can be directly loaded via the transformer interface. At this point only GTP2 is implemented.
With `trl` you can train transformer language models with Proximal Policy Optimization (PPO). The library is built with the `transformer` library by 🤗 Hugging Face ([link](https://github.com/huggingface/transformers)). Therefore, pre-trained language models can be directly loaded via the transformer interface. At this point only GPT2 is implemented.

**Highlights:**
- GPT2 model with a value head: A transformer model with an additional scalar output for each token which can be used as a value function in Reinforcement Learning.
- GPT2 model with a value head: A transformer model with an additional scalar output for each token which can be used as a value function in reinforcement learning.
- PPOTrainer: A PPO trainer for language models that just needs (query, response, reward) triplets to optimise the language model.
- Example: Train GPT2 to generate positive movie reviews with a BERT sentiment classifier.
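Taken together, these pieces form a short training loop: load a GPT2 model with a value head plus a frozen reference copy, let it respond to a query, score the response, and pass the (query, response, reward) triplet to the PPO trainer. The sketch below follows the library's quick-start example; the helper names (`GPT2HeadWithValueModel`, `respond_to_batch`) and exact signatures are assumptions about this version of the code.

```python
# Minimal PPO fine-tuning step with trl (a sketch; names follow the quick-start).
import torch
from transformers import GPT2Tokenizer
from trl.gpt2 import GPT2HeadWithValueModel, respond_to_batch
from trl.ppo import PPOTrainer

# GPT2 with a value head, plus a frozen reference copy used for the KL penalty
model = GPT2HeadWithValueModel.from_pretrained('gpt2')
ref_model = GPT2HeadWithValueModel.from_pretrained('gpt2')
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

ppo_trainer = PPOTrainer(model, ref_model, batch_size=1)

# encode a query and let the model continue it
query_tensor = tokenizer.encode("This morning I went to the ", return_tensors="pt")
response_tensor = respond_to_batch(model, query_tensor)

# any scalar signal works as a reward, e.g. a sentiment classifier score
reward = torch.tensor([1.0])

# one optimisation step on the (query, response, reward) triplet
train_stats = ppo_trainer.step(query_tensor, response_tensor, reward)
```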

@@ -108,10 +108,10 @@ This library is built with `nbdev` and as such all the library code as well as e
- `04-gpt2-sentiment-ppo-training.ipynb`: Fine-tune GPT2 with the BERT sentiment classifier to produce positive movie reviews.


## Reference
## References

### Proximal Policy Optimisation
The PPO implementation largely follows the structure introduced in the paper **"Fine-Tuning Language Models from Human Preferences"** by D. Ziegler et al. \[[paper](https://arxiv.org/pdf/1909.08593.pdf), [code](https://github.com/openai/lm-human-preferences)].

### Language models
The language models utilize the `transformer` library by 🤗Huggingface.
The language models utilize the `transformer` library by 🤗Hugging Face.
4 changes: 2 additions & 2 deletions docs/01-gpt2-with-value-head.html
@@ -5,8 +5,8 @@
keywords: fastai
sidebar: home_sidebar

summary: "A GPT2 model with a value head built on the transformer library by huggingface."
description: "A GPT2 model with a value head built on the transformer library by huggingface."
summary: "A GPT2 model with a value head built on the `transformer` library by Hugging Face."
description: "A GPT2 model with a value head built on the `transformer` library by Hugging Face."
---
<!--
6 changes: 3 additions & 3 deletions docs/02-ppo.html
@@ -29,7 +29,7 @@

<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>This follows the language model approach proposed in paper <a href="https://arxiv.org/pdf/1909.08593.pdf">"Fine-Tuning Language Models from Human Preferences"</a> and is similar to the <a href="https://github.com/openai/lm-human-preferences">original implementation</a>. The two main differences are 1) the method is implemented in Pytorch and 2) works with the transformer library by Huggingface.</p>
<p>This follows the language model approach proposed in paper <a href="https://arxiv.org/pdf/1909.08593.pdf">"Fine-Tuning Language Models from Human Preferences"</a> and is similar to the <a href="https://github.com/openai/lm-human-preferences">original implementation</a>. The two main differences are 1) the method is implemented in PyTorch and 2) works with the <code>transformer</code> library by Hugging Face.</p>

</div>
</div>
@@ -187,8 +187,8 @@ <h2 id="FixedKLController" class="doc_header"><code>class</code> <code>FixedKLCo
<span class="sd"> Initialize PPOTrainer.</span>
<span class="sd"> </span>
<span class="sd"> Args:</span>
<span class="sd"> model (torch.model): Huggingface GPT2 model</span>
<span class="sd"> ref_model (torch.model): Huggingface GPT2 refrence model used for KL penalty</span>
<span class="sd"> model (torch.model): Hugging Face transformer GPT2 model with value head</span>
<span class="sd"> ref_model (torch.model): Hugging Face transformer GPT2 reference model used for KL penalty</span>
<span class="sd"> ppo_params (dict or None): PPO parameters for training. Can include following keys:</span>
<span class="sd"> &#39;lr&#39; (float): Adam learning rate, default: 1.41e-5</span>
<span class="sd"> &#39;batch_size&#39; (int): Number of samples per optimisation step, default: 256</span>
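For orientation, below is a minimal sketch of initialising `PPOTrainer` with an explicit `ppo_params` dict. It uses only the keys and defaults visible in the docstring above; the remaining keys are truncated in this view, and the model class name is an assumption taken from the library's quick-start.

```python
# Sketch: constructing PPOTrainer with the documented ppo_params keys.
from trl.gpt2 import GPT2HeadWithValueModel
from trl.ppo import PPOTrainer

model = GPT2HeadWithValueModel.from_pretrained('gpt2')      # policy with value head
ref_model = GPT2HeadWithValueModel.from_pretrained('gpt2')  # frozen copy for the KL penalty

ppo_params = {
    'lr': 1.41e-5,      # Adam learning rate (docstring default)
    'batch_size': 256,  # samples per optimisation step (docstring default)
}
ppo_trainer = PPOTrainer(model, ref_model, **ppo_params)
```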
2 changes: 1 addition & 1 deletion docs/04-gpt2-sentiment-ppo-training.html
@@ -31,7 +31,7 @@
<div class="text_cell_render border-box-sizing rendered_html">
<div style="text-align: center">
{% include image.html max-width="600" file="/trl/images/gpt2_bert_training.png" %}
<p style="text-align: center;"> <b>Figure:</b> Experiment setup to tune GPT2. The yellow arrows are outside the scope of this notebook, but the trained models are available through Huggingface. </p>
<p style="text-align: center;"> <b>Figure:</b> Experiment setup to tune GPT2. The yellow arrows are outside the scope of this notebook, but the trained models are available through Hugging Face. </p>
</div><p>In this notebook we fine-tune GPT2 (small) to generate positive movie reviews based on the IMDB dataset. The model gets 5 tokens from a real review and is tasked to produce positive continuations. To reward positive continuations we use a BERT classifier to analyse the sentiment of the produced sentences and use the classifier's outputs as reward signals for PPO training.</p>

</div>
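As a rough illustration of the reward described above, the snippet below maps a sentiment classifier's output to one scalar reward per continuation. The notebook uses its own BERT classifier fine-tuned on IMDB; the generic `sentiment-analysis` pipeline here is a stand-in assumption, not the notebook's exact setup.

```python
# Sketch: turning sentiment classifier outputs into PPO reward signals.
import torch
from transformers import pipeline

sentiment_pipe = pipeline("sentiment-analysis")  # stand-in for the notebook's BERT model

def sentiment_rewards(texts):
    """Return one scalar reward per generated continuation."""
    outputs = sentiment_pipe(texts)
    # reward positive sentiment, penalise negative sentiment
    return torch.tensor([o["score"] if o["label"] == "POSITIVE" else -o["score"]
                         for o in outputs])

rewards = sentiment_rewards(["This movie was surprisingly good!"])
```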
14 changes: 7 additions & 7 deletions docs/index.html
@@ -1,12 +1,12 @@
---

title: Welcome to trl
title: Welcome to Transformer Reinforcement Learning (trl)

keywords: fastai
sidebar: home_sidebar

summary: "Train transformer language models with Reinforcement Learning."
description: "Train transformer language models with Reinforcement Learning."
summary: "Train transformer language models with reinforcement learning."
description: "Train transformer language models with reinforcement learning."
---
<!--
@@ -29,10 +29,10 @@

<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="What-is-it?">What is it?<a class="anchor-link" href="#What-is-it?"> </a></h2><p>With <code>trl</code> you can train transformer language models with Proximal Policy Optimization (PPO). The library is built with the <code>transformer</code> library by 🤗Huggingface. Therefore, pre-trained language models can be directly loaded via the transformer interface. At this point only GTP2 is implemented.</p>
<h2 id="What-is-it?">What is it?<a class="anchor-link" href="#What-is-it?"> </a></h2><p>With <code>trl</code> you can train transformer language models with Proximal Policy Optimization (PPO). The library is built with the <code>transformer</code> library by 🤗 Hugging Face (<a href="https://github.com/huggingface/transformers">link</a>). Therefore, pre-trained language models can be directly loaded via the transformer interface. At this point only GPT2 is implemented.</p>
<p><strong>Highlights:</strong></p>
<ul>
<li>GPT2 model with a value head: A transformer model with an additional scalar output for each token which can be used as a value function in Reinforcement Learning.</li>
<li>GPT2 model with a value head: A transformer model with an additional scalar output for each token which can be used as a value function in reinforcement learning.</li>
<li>PPOTrainer: A PPO trainer for language models that just needs (query, response, reward) triplets to optimise the language model.</li>
<li>Example: Train GPT2 to generate positive movie reviews with a BERT sentiment classifier.</li>
</ul>
@@ -163,8 +163,8 @@ <h2 id="Notebooks">Notebooks<a class="anchor-link" href="#Notebooks"> </a></h2><
</div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="Reference">Reference<a class="anchor-link" href="#Reference"> </a></h2><h3 id="Proximal-Policy-Optimisation">Proximal Policy Optimisation<a class="anchor-link" href="#Proximal-Policy-Optimisation"> </a></h3><p>The PPO implementation largely follows the structure introduced in the paper <strong>"Fine-Tuning Language Models from Human Preferences"</strong> by D. Ziegler et al. [<a href="https://arxiv.org/pdf/1909.08593.pdf">paper</a>, <a href="https://github.com/openai/lm-human-preferences">code</a>].</p>
<h3 id="Language-models">Language models<a class="anchor-link" href="#Language-models"> </a></h3><p>The language models utilize the <code>transformer</code> library by 🤗Huggingface.</p>
<h2 id="References">References<a class="anchor-link" href="#References"> </a></h2><h3 id="Proximal-Policy-Optimisation">Proximal Policy Optimisation<a class="anchor-link" href="#Proximal-Policy-Optimisation"> </a></h3><p>The PPO implementation largely follows the structure introduced in the paper <strong>"Fine-Tuning Language Models from Human Preferences"</strong> by D. Ziegler et al. [<a href="https://arxiv.org/pdf/1909.08593.pdf">paper</a>, <a href="https://github.com/openai/lm-human-preferences">code</a>].</p>
<h3 id="Language-models">Language models<a class="anchor-link" href="#Language-models"> </a></h3><p>The language models utilize the <code>transformer</code> library by 🤗Hugging Face.</p>

</div>
</div>
2 changes: 1 addition & 1 deletion nbs/01-gpt2-with-value-head.ipynb
@@ -5,7 +5,7 @@
"metadata": {},
"source": [
"# GPT2 with value head\n",
"> A GPT2 model with a value head built on the transformer library by huggingface."
"> A GPT2 model with a value head built on the `transformer` library by Hugging Face."
]
},
{
6 changes: 3 additions & 3 deletions nbs/02-ppo.ipynb
@@ -13,7 +13,7 @@
"metadata": {},
"source": [
"This follows the language model approach proposed in paper [\"Fine-Tuning Language Models from Human Preferences\"](\n",
"https://arxiv.org/pdf/1909.08593.pdf) and is similar to the [original implementation](https://github.com/openai/lm-human-preferences). The two main differences are 1) the method is implemented in Pytorch and 2) works with the transformer library by Huggingface."
"https://arxiv.org/pdf/1909.08593.pdf) and is similar to the [original implementation](https://github.com/openai/lm-human-preferences). The two main differences are 1) the method is implemented in Pytorch and 2) works with the `transformer` library by Hugging Face."
]
},
{
@@ -137,8 +137,8 @@
" Initialize PPOTrainer.\n",
" \n",
" Args:\n",
" model (torch.model): Huggingface GPT2 model\n",
" ref_model (torch.model): Huggingface GPT2 refrence model used for KL penalty\n",
" model (torch.model): Hugging Face transformer GPT2 model with value head\n",
" ref_model (torch.model): Hugging Face transformer GPT2 refrence model used for KL penalty\n",
" ppo_params (dict or None): PPO parameters for training. Can include following keys:\n",
" 'lr' (float): Adam learning rate, default: 1.41e-5\n",
" 'batch_size' (int): Number of samples per optimisation step, default: 256\n",
2 changes: 1 addition & 1 deletion nbs/04-gpt2-sentiment-ppo-training.ipynb
@@ -14,7 +14,7 @@
"source": [
"<div style=\"text-align: center\">\n",
"<img src='images/gpt2_bert_training.png' width='600'>\n",
"<p style=\"text-align: center;\"> <b>Figure:</b> Experiment setup to tune GPT2. The yellow arrows are outside the scope of this notebook, but the trained models are available through Huggingface. </p>\n",
"<p style=\"text-align: center;\"> <b>Figure:</b> Experiment setup to tune GPT2. The yellow arrows are outside the scope of this notebook, but the trained models are available through Hugging Face. </p>\n",
"</div>\n",
"\n",
"\n",
12 changes: 6 additions & 6 deletions nbs/index.ipynb
@@ -4,20 +4,20 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Welcome to trl\n",
"# Welcome to Transformer Reinforcement Learning (trl)\n",
"\n",
"> Train transformer language models with Reinforcement Learning."
"> Train transformer language models with reinforcement learning."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What is it?\n",
"With `trl` you can train transformer language models with Proximal Policy Optimization (PPO). The library is built with the `transformer` library by 🤗Huggingface. Therefore, pre-trained language models can be directly loaded via the transformer interface. At this point only GTP2 is implemented.\n",
"With `trl` you can train transformer language models with Proximal Policy Optimization (PPO). The library is built with the `transformer` library by 🤗 Hugging Face ([link](https://github.com/huggingface/transformers)). Therefore, pre-trained language models can be directly loaded via the transformer interface. At this point only GTP2 is implemented.\n",
"\n",
"**Highlights:**\n",
"- GPT2 model with a value head: A transformer model with an additional scalar output for each token which can be used as a value function in Reinforcement Learning.\n",
"- GPT2 model with a value head: A transformer model with an additional scalar output for each token which can be used as a value function in reinforcement learning.\n",
"- PPOTrainer: A PPO trainer for language models that just needs (query, response, reward) triplets to optimise the language model.\n",
"- Example: Train GPT2 to generate positive movie reviews with a BERT sentiment classifier."
]
@@ -162,13 +162,13 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reference\n",
"## References\n",
"\n",
"### Proximal Policy Optimisation\n",
"The PPO implementation largely follows the structure introduced in the paper **\"Fine-Tuning Language Models from Human Preferences\"** by D. Ziegler et al. \\[[paper](https://arxiv.org/pdf/1909.08593.pdf), [code](https://github.com/openai/lm-human-preferences)].\n",
"\n",
"### Language models\n",
"The language models utilize the `transformer` library by 🤗Huggingface."
"The language models utilize the `transformer` library by 🤗Hugging Face."
]
},
{
4 changes: 2 additions & 2 deletions trl/ppo.py
@@ -77,8 +77,8 @@ def __init__(self, model, ref_model, **ppo_params):
Initialize PPOTrainer.
Args:
model (torch.model): Huggingface GPT2 model
ref_model (torch.model): Huggingface GPT2 refrence model used for KL penalty
model (torch.model): Hugging Face transformer GPT2 model with value head
ref_model (torch.model): Hugging Face transformer GPT2 reference model used for KL penalty
ppo_params (dict or None): PPO parameters for training. Can include following keys:
'lr' (float): Adam learning rate, default: 1.41e-5
'batch_size' (int): Number of samples per optimisation step, default: 256
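The `ref_model` argument above exists so the trainer can penalise divergence from the original model via a KL term, following Ziegler et al. A toy sketch of that reward shaping, written independently of the actual `PPOTrainer` internals (which may differ in detail), might look like this:

```python
# Sketch: per-token KL penalty added to a scalar reward (in the spirit of Ziegler et al.).
import torch

def kl_penalised_rewards(scores, logprobs, ref_logprobs, kl_coef=0.2):
    """Spread a KL penalty over the response tokens and add the scalar reward at the end."""
    kl = logprobs - ref_logprobs   # per-token approximate KL to the reference model
    rewards = -kl_coef * kl        # penalise drifting away from the reference
    rewards[:, -1] += scores       # the scalar reward lands on the final token
    return rewards

# toy example: batch of 1, response of 4 tokens, scalar reward 1.0
logprobs = torch.tensor([[-1.2, -0.8, -1.5, -0.9]])
ref_logprobs = torch.tensor([[-1.0, -0.9, -1.4, -1.1]])
rewards = kl_penalised_rewards(torch.tensor([1.0]), logprobs, ref_logprobs)
```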
