Fix nits & missing things (huggingface#108)
* fix nits & missing things

* Update README.md
younesbelkada authored Jan 25, 2023
1 parent c1b328e commit d4aca61
Showing 2 changed files with 33 additions and 5 deletions.
7 changes: 3 additions & 4 deletions README.md
@@ -7,11 +7,11 @@

## What is it?
-With `trl` you can train transformer language models with Proximal Policy Optimization (PPO). The library is built on top of the [`transformers`](https://github.com/huggingface/transformers) library by 🤗 Hugging Face. Therefore, pre-trained language models can be directly loaded via `transformers`. At this point only decoder architectures such as GPT2 are implemented.
+With `trl` you can train transformer language models with Proximal Policy Optimization (PPO). The library is built on top of the [`transformers`](https://github.com/huggingface/transformers) library by 🤗 Hugging Face. Therefore, pre-trained language models can be directly loaded via `transformers`. At this point most decoder and encoder-decoder architectures are supported.

**Highlights:**
-- PPOTrainer: A PPO trainer for language models that just needs (query, response, reward) triplets to optimise the language model.
-- AutoModelForCausalLMWithValueHead: A transformer model with an additional scalar output for each token which can be used as a value function in reinforcement learning.
+- `PPOTrainer`: A PPO trainer for language models that just needs (query, response, reward) triplets to optimise the language model.
+- `AutoModelForCausalLMWithValueHead` & `AutoModelForSeq2SeqLMWithValueHead`: A transformer model with an additional scalar output for each token, which can be used as a value function in reinforcement learning (a loading sketch follows this list).
- Example: Train GPT2 to generate positive movie reviews with a BERT sentiment classifier.
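As a quick illustration of the two value-head classes named in the highlights, the sketch below simply loads one of each; the checkpoint names (`gpt2`, `t5-small`) are placeholders and are not part of this commit.

```python
# hedged sketch: loading the two value-head model classes from trl
from trl import AutoModelForCausalLMWithValueHead, AutoModelForSeq2SeqLMWithValueHead

# decoder-only LM (e.g. GPT-2) with an extra scalar value head per token
causal_model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")

# encoder-decoder LM (e.g. T5) with the same kind of value head
seq2seq_model = AutoModelForSeq2SeqLMWithValueHead.from_pretrained("t5-small")
```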

## How it works
@@ -78,7 +78,6 @@ response_tensor = respond_to_batch(model_ref, query_tensor)

# create a ppo trainer
ppo_trainer = PPOTrainer(ppo_config, model, model_ref, tokenizer)
-device = ppo_trainer.accelerator.device

# define a reward for response
# (this could be any reward such as human feedback or output from another model)
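# hedged sketch (not part of this diff): one way to complete the step with a
# placeholder reward; assumes `torch` was imported at the top of this snippet
reward = [torch.tensor(1.0)]

# run one PPO optimisation step on the (query, response, reward) triplet
train_stats = ppo_trainer.step([query_tensor[0]], [response_tensor[0]], reward)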
31 changes: 30 additions & 1 deletion examples/sentiment/scripts/gpt2-sentiment.py
@@ -59,7 +59,36 @@
# Below is an example function to build the dataset. In our case, we use the IMDB dataset
# from the `datasets` library. One should customize this function to train the model on
# its own dataset.

def build_dataset(config, dataset_name="imdb", input_min_text_length=2, input_max_text_length=8):
"""
Build dataset for training. This builds the dataset from `load_dataset`, one should
customize this function to train the model on its own dataset.
Args:
dataset_name (`str`):
The name of the dataset to be loaded.
Returns:
dataloader (`torch.utils.data.DataLoader`):
The dataloader for the dataset.
"""
tokenizer = AutoTokenizer.from_pretrained(config.model_name)
tokenizer.pad_token = tokenizer.eos_token
# load imdb with datasets
ds = load_dataset(dataset_name, split='train')
ds = ds.rename_columns({'text': 'review'})
ds = ds.filter(lambda x: len(x["review"])>200, batched=False)

input_size = LengthSampler(input_min_text_length, input_max_text_length)

def tokenize(sample):
sample["input_ids"] = tokenizer.encode(sample["review"])[:input_size()]
sample["query"] = tokenizer.decode(sample["input_ids"])
return sample

ds = ds.map(tokenize, batched=False)
ds.set_format(type='torch')
return ds

# We retrieve the dataset by calling the `build_dataset` function.
dataset = build_dataset(config)
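# hedged sketch (not part of this diff): quick sanity check of the fields that
# `build_dataset` produces, following the `tokenize` function above
sample = dataset[0]
print(sample["query"])      # the truncated review text used as a query
print(sample["input_ids"])  # the corresponding token ids as a torch tensor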
