From 690a58f91bc7942814b3839af20e8818a653a748 Mon Sep 17 00:00:00 2001
From: Enrico Shippole
Date: Mon, 8 May 2023 01:49:43 -0400
Subject: [PATCH] add all files

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 1bf6468..f9b7bb8 100644
--- a/README.md
+++ b/README.md
@@ -9,7 +9,7 @@
 ## FAQ
 
 Three different size PaLM models (150m, 410m, 1b) have been trained with 8k context length on all of C4. The models are compatible with Lucidrain's Toolformer-pytorch, PaLM-pytorch, and PaLM-rlhf-pytorch. A fourth 2b model is currently being trained. These are currently the baseline versions of the models and additional training will be done at a larger scale. All of the models will be further instruction-tuned on FLAN to provide flan-PaLM models.
 
-The models were trained with Flash Attention, Xpos Rotary Embeddings for better length extrapolation, and multi-query single-key-value attention for more efficient decoding. The models have been uploaded to Torch hub and the files are additionally stored on the Huggingface hub. You can find the model each of the PyTorch model files here: PaLM-150m, PaLM-410m, PaLM-1b. If the models are not downloading from Torch hub correctly be sure to clear out the checkpoint and model folders in `.cache/torch/hub/`. If that still does not resolve the issue then you can download the files from the Huggingface repositories.
+The models were trained with Flash Attention, Xpos Rotary Embeddings for better length extrapolation, and multi-query single-key-value attention for more efficient decoding. The models have been uploaded to Torch hub, and the files are additionally stored on the Huggingface hub. You can find each of the PyTorch model files here: PaLM-150m, PaLM-410m, PaLM-1b. If the models are not downloading from Torch hub correctly, be sure to clear out the checkpoint and model folders in `.cache/torch/hub/`. If that still does not resolve the issue, you can download the files from the Huggingface repositories. Huggingface integration is currently a work-in-progress.
 
 All of the training data has been pre-tokenized with the GPTNEOX tokenizer and blocked at sequence lengths of 8192. This will help to save the large cost of preprocessing data. The datasets are available on Huggingface in parquet format and chunks here: C4 Chunk 1, C4 Chunk 2, C4 Chunk 3, C4 Chunk 4, and C4 Chunk 5. There is also another option in the distributed training script to not used the provided pre-tokenized C4 dataset and instead load and process another dataset such as openwebtext.
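
The updated FAQ paragraph describes pulling checkpoints from Torch hub with the Huggingface repositories as a fallback. Below is a minimal sketch of that workflow; the GitHub repository path and entry-point name are assumptions (the README does not spell them out), so check the project's `hubconf.py` for the names it actually exposes.

```python
import torch

# Load one of the pretrained PaLM checkpoints via Torch hub.
# "conceptofmind/PaLM" and "palm_410m_8k" are placeholder names; substitute
# the actual "owner/repo" and entry point listed in the project's hubconf.py.
model = torch.hub.load("conceptofmind/PaLM", "palm_410m_8k")
model.eval()

# If downloads fail with stale or corrupted files, clear the cached checkpoint
# and model folders under ~/.cache/torch/hub/ and retry, as the FAQ suggests.
# If that still fails, download the weights directly from the Huggingface
# repositories (PaLM-150m, PaLM-410m, PaLM-1b) and load them with
# torch.load() / model.load_state_dict().
```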
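The last paragraph notes that the training data is already tokenized with the GPT-NeoX tokenizer and blocked at length 8192, and is published as parquet chunks on the Huggingface hub. The sketch below shows one way to stream such a chunk with the `datasets` library; the dataset repo id is a placeholder, and the exact column names depend on how the chunks were exported.

```python
from datasets import load_dataset

# Stream one of the pre-tokenized C4 chunks from the Huggingface hub.
# "user/c4-chunk-1-neox-8192" is a placeholder id; substitute the repository
# linked as "C4 Chunk 1" in the README.
dataset = load_dataset(
    "user/c4-chunk-1-neox-8192",
    split="train",
    streaming=True,  # avoid downloading the full chunk up front
)

# Each record should already contain token ids blocked at length 8192 by the
# GPT-NeoX tokenizer, so no further tokenization is needed before batching.
for example in dataset.take(1):
    print(example.keys())
```

For the alternative path mentioned in the FAQ (processing a raw dataset such as openwebtext in the distributed training script instead of the pre-tokenized C4 chunks), the text you tokenize yourself would presumably need to go through the same GPT-NeoX tokenizer and be blocked at 8192 tokens to match the published data.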