feat: add training example to highlights

shizhediao · Mar 30, 2020 · 2101f54
1 parent df428af
commit 2101f54
Showing 9 changed files with 5 additions and 74 deletions.
diff --git a/README.md b/README.md
@@ -8,6 +8,7 @@ With `trl` you can train transformer language models with Proximal Policy Optimi
 **Highlights:**
 - GPT2 model with a value head: A transformer model with an additional scalar output for each token which can be used as a value function in Reinforcement Learning.
 - PPOTrainer: A PPO trainer for language models that just needs (query, response, reward) triplets to optimise the language model.
+- Example: Train GPT2 to generate positive movie reviews with a BERT sentiment classifier.
 
 ## How it works
 Fine-tuning a language model via PPO consists of roughly three steps:

diff --git a/docs/04-gpt2-sentiment-ppo-training.html b/docs/04-gpt2-sentiment-ppo-training.html
@@ -91,7 +91,7 @@ <h3 id="Import-dependencies">Import dependencies<a class="anchor-link" href="#Im
 <div class="output_area">
 
 <div class="output_subarea output_stream output_stderr output_text">
-<pre>/Users/leandro/git/lm_ppo/env/lib/python3.7/site-packages/tqdm/std.py:658: FutureWarning: The Panel class is removed from pandas. Accessing it from the top-level namespace will also be removed in the next version
+<pre>/Users/leandro/git/trl/env/lib/python3.7/site-packages/tqdm/std.py:658: FutureWarning: The Panel class is removed from pandas. Accessing it from the top-level namespace will also be removed in the next version
   from pandas import Panel
 </pre>
 </div>

diff --git a/docs/index.html b/docs/index.html
@@ -34,6 +34,7 @@ <h2 id="What-is-it?">What is it?<a class="anchor-link" href="#What-is-it?"> </a>
 <ul>
 <li>GPT2 model with a value head: A transformer model with an additional scalar output for each token which can be used as a value function in Reinforcement Learning.</li>
 <li>PPOTrainer: A PPO trainer for language models that just needs (query, response, reward) triplets to optimise the language model.</li>
+<li>Example: Train GPT2 to generate positive movie reviews with a BERT sentiment classifier.</li>
 </ul>
 
 </div>

diff --git a/nbs/00-core.ipynb b/nbs/00-core.ipynb
@@ -200,18 +200,6 @@
    "display_name": "Python 3",
    "language": "python",
    "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.7.7"
   }
  },
  "nbformat": 4,

diff --git a/nbs/01-gpt2-with-value-head.ipynb b/nbs/01-gpt2-with-value-head.ipynb
@@ -467,18 +467,6 @@
    "display_name": "Python 3",
    "language": "python",
    "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.7.7"
   }
  },
  "nbformat": 4,

diff --git a/nbs/02-ppo.ipynb b/nbs/02-ppo.ipynb
@@ -396,18 +396,6 @@
    "display_name": "Python 3",
    "language": "python",
    "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.7.7"
   }
  },
  "nbformat": 4,

diff --git a/nbs/03-bert-imdb-training.ipynb b/nbs/03-bert-imdb-training.ipynb
@@ -490,18 +490,6 @@
    "display_name": "Python 3",
    "language": "python",
    "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.7.7"
   }
  },
  "nbformat": 4,

diff --git a/nbs/04-gpt2-sentiment-ppo-training.ipynb b/nbs/04-gpt2-sentiment-ppo-training.ipynb
@@ -968,18 +968,6 @@
    "display_name": "Python 3",
    "language": "python",
    "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.7.7"
   }
  },
  "nbformat": 4,

diff --git a/nbs/index.ipynb b/nbs/index.ipynb
@@ -18,7 +18,8 @@
     "\n",
     "**Highlights:**\n",
     "- GPT2 model with a value head: A transformer model with an additional scalar output for each token which can be used as a value function in Reinforcement Learning.\n",
-    "- PPOTrainer: A PPO trainer for language models that just needs (query, response, reward) triplets to optimise the language model."
+    "- PPOTrainer: A PPO trainer for language models that just needs (query, response, reward) triplets to optimise the language model.\n",
+    "- Example: Train GPT2 to generate positive movie reviews with a BERT sentiment classifier."
    ]
   },
   {
@@ -183,18 +184,6 @@
    "display_name": "Python 3",
    "language": "python",
    "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.7.7"
   }
  },
  "nbformat": 4,