init
dong-river committed Feb 21, 2024
0 parents commit f580f9a
Showing 1,769 changed files with 957,508 additions and 0 deletions.
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
/.DS_Store
23 changes: 23 additions & 0 deletions README.md
@@ -0,0 +1,23 @@
# LLM_unlearning

## Install
```bash
conda create --name unlearning python=3.9.16
conda activate unlearning
pip install -r requirements.txt
```
## Run one experiment (125M FT)
```bash
python main.py --method DI --cd_num_token 1000 --model_name_or_path EleutherAI/gpt-neo-125m --train_batch_size 64 --eval_batch_size 256 --eval_num 5000 --num_epochs_di 10 --lr_di 1e-06 --di_strength 3 --output_folder outputs_new
```

## Run one experiment (1.3B LoRA)
```bash
python main.py --method DI --model_name_or_path EleutherAI/gpt-neo-1.3B --train_batch_size 32 --eval_batch_size 64 --eval_num 5000 --lr_di 5e-06 --di_strength 3 --num_epochs_di 100 --gradient_accu 2 --early_stop True --early_stop_criteria 1.03 --peft lora --rank 8 --lora_alpha 16 --warmup_steps 100 --output_folder outputs_new
```

## Run full experiments
The Python scripts under ./exp launch the full sets of experiments, each script covering a different purpose.
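
The scripts in ./exp are not shown here, so the following is only a minimal sketch of how a sweep over `main.py` runs could be generated. The flags are copied from the 125M example command above; the swept values for `--lr_di` and `--di_strength` and the helper name `build_commands` are hypothetical and not taken from the repository.

```python
import itertools
import shlex

# Fixed flags copied from the 125M example command in this README.
BASE_ARGS = {
    "--method": "DI",
    "--cd_num_token": "1000",
    "--model_name_or_path": "EleutherAI/gpt-neo-125m",
    "--train_batch_size": "64",
    "--eval_batch_size": "256",
    "--eval_num": "5000",
    "--num_epochs_di": "10",
    "--output_folder": "outputs_new",
}


def build_commands(lr_values, strength_values):
    """Return one argv list per (lr_di, di_strength) combination (hypothetical helper)."""
    commands = []
    for lr, strength in itertools.product(lr_values, strength_values):
        args = dict(BASE_ARGS)
        args["--lr_di"] = str(lr)
        args["--di_strength"] = str(strength)
        cmd = ["python", "main.py"]
        for flag, value in args.items():
            cmd += [flag, value]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # The swept values below are illustrative assumptions only.
    for cmd in build_commands(["1e-06", "5e-06"], [1, 3]):
        print(shlex.join(cmd))
```

Printing the commands rather than executing them makes it easy to inspect the sweep first; each list could then be passed to `subprocess.run` if desired.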

## Logs and Visualization
Running main.py produces a result file and a generation example file. Use parse_log.py to convert the result file to a CSV file. Our earlier results are in the ./output folder, and visualization.ipynb can be used to visualize them.
Binary file added data/.DS_Store
Binary file not shown.
Empty file.
Empty file.
Empty file.
Empty file.
Empty file.
19,621 changes: 19,621 additions & 0 deletions data/benchmark/lambada.csv

Large diffs are not rendered by default.

4,267 changes: 4,267 additions & 0 deletions data/benchmark/pubmed_qa.csv

Large diffs are not rendered by default.

Binary file not shown.
@@ -0,0 +1 @@
{"url": "https://huggingface.co/datasets/allenai/ai2_arc/resolve/210d026faf9955653af8916fad021475a3f00453/ARC-Challenge/train-00000-of-00001.parquet", "etag": null}
Empty file.
10,003 changes: 10,003 additions & 0 deletions data/downloads/30b6e49bd1e17dbfea4c75c30d8399bf3a92f898e9832ef6ca159f74eabd6754

Large diffs are not rendered by default.

@@ -0,0 +1 @@
{"url": "https://raw.githubusercontent.com/rowanz/hellaswag/master/data/hellaswag_test.jsonl", "etag": null}
Empty file.
Binary file not shown.
@@ -0,0 +1 @@
{"url": "https://dl.fbaipublicfiles.com/glue/superglue/data/v2/COPA.zip", "etag": null}
Empty file.
39,905 changes: 39,905 additions & 0 deletions data/downloads/630ed04bd62ee51d06d9ba13f00fe153c1951e84594dd9df8c4c1c9587516f77

Large diffs are not rendered by default.

@@ -0,0 +1 @@
{"url": "https://raw.githubusercontent.com/rowanz/hellaswag/master/data/hellaswag_train.jsonl", "etag": null}
Empty file.
Binary file not shown.
@@ -0,0 +1 @@
{"url": "https://huggingface.co/datasets/allenai/ai2_arc/resolve/210d026faf9955653af8916fad021475a3f00453/ARC-Easy/test-00000-of-00001.parquet", "etag": null}
Empty file.
Binary file not shown.
@@ -0,0 +1 @@
{"url": "https://huggingface.co/datasets/allenai/ai2_arc/resolve/210d026faf9955653af8916fad021475a3f00453/ARC-Easy/train-00000-of-00001.parquet", "etag": null}
Empty file.
10,042 changes: 10,042 additions & 0 deletions data/downloads/af9990f4ae181bbfbb2e33863f2dfa12b92eb453b0b0fc524106741d796a2d15

Large diffs are not rendered by default.

@@ -0,0 +1 @@
{"url": "https://raw.githubusercontent.com/rowanz/hellaswag/master/data/hellaswag_val.jsonl", "etag": null}
Empty file.
Binary file not shown.
@@ -0,0 +1 @@
{"url": "https://huggingface.co/datasets/allenai/ai2_arc/resolve/210d026faf9955653af8916fad021475a3f00453/ARC-Challenge/validation-00000-of-00001.parquet", "etag": null}
Empty file.
Binary file not shown.
@@ -0,0 +1 @@
{"url": "https://storage.googleapis.com/ai2-mosaic/public/physicaliqa/physicaliqa-train-dev.zip", "etag": null}
Empty file.
3,084 changes: 3,084 additions & 0 deletions data/downloads/e22289a2fc01bf5d112b3c9c699b8105bcb4d573ca3d8470b7f0c416771f76e1

Large diffs are not rendered by default.

@@ -0,0 +1 @@
{"url": "https://yonatanbisk.com/piqa/data/tests.jsonl", "etag": null}
Empty file.
Binary file not shown.
@@ -0,0 +1 @@
{"url": "https://storage.googleapis.com/ai2-mosaic/public/winogrande/winogrande_1.1.zip", "etag": null}
Empty file.
Binary file not shown.
@@ -0,0 +1 @@
{"url": "https://huggingface.co/datasets/allenai/ai2_arc/resolve/210d026faf9955653af8916fad021475a3f00453/ARC-Challenge/test-00000-of-00001.parquet", "etag": null}
Empty file.
Empty file.
Binary file not shown.
@@ -0,0 +1,70 @@
# WinoGrande

Version 1.1 (Sep 16th, 2020)

- - -

## Data

./data/
├── train_[xs,s,m,l,xl].jsonl # training sets of different sizes
├── train_[xs,s,m,l,xl]-labels.lst # answer labels for training sets
├── train_debiased.jsonl # debiased training set
├── train_debiased-labels.lst # answer labels for debiased training set
├── dev.jsonl # development set
├── dev-labels.lst # answer labels for development set
├── test.jsonl # test set
├── sample-submissions-labels.lst # example submission file for leaderboard
└── eval.py # evaluation script

You can use `train_*.jsonl` for training models and `dev` for validation.
Please note that labels are not included in `test.jsonl`. To evaluate your models on the `test` set, make a submission to our [leaderboard](https://winogrande.allenai.org).
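
As a minimal sketch of reading one split together with its labels, the helper below pairs each JSONL record with the corresponding line of a `-labels.lst` file. It assumes that the label file holds one label per line in the same order as the JSONL records; the function name `load_split` is hypothetical and not part of the dataset release.

```python
import json
from pathlib import Path


def load_split(jsonl_path, labels_path):
    """Pair each JSONL example with its label (hypothetical helper).

    Assumes one JSON object per line in the .jsonl file and one label per
    line, in the same order, in the .lst file.
    """
    with open(jsonl_path, encoding="utf-8") as f:
        examples = [json.loads(line) for line in f if line.strip()]
    labels = Path(labels_path).read_text(encoding="utf-8").split()
    if len(examples) != len(labels):
        raise ValueError(
            f"{len(examples)} examples but {len(labels)} labels; files do not align"
        )
    return list(zip(examples, labels))


# Example usage, assuming the files from the tree above are present:
# dev_pairs = load_split("./data/dev.jsonl", "./data/dev-labels.lst")
```

Labels are kept as strings here so that no assumption is made about how downstream code compares them.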


## Evaluation

You can use `eval.py` for evaluation on the dev split, which yields `metrics.json`.

e.g., python eval.py --preds_file ./YOUR_PREDICTIONS.lst --labels_file ./dev-labels.lst

In the prediction file, each line holds the predictions (1 or 2) for one evaluation set question, one per model trained on each of the 5 training sets (ordered `xs`, `s`, `m`, `l`, `xl` and separated by commas).

2,1,1,1,1
1,1,2,2,2
1,1,1,1,1
.........
.........

That is, the first column contains the predictions of a model trained/finetuned on `train_xs.jsonl`, the second column those of a model trained on `train_s.jsonl`, and so on; the last (fifth) column contains the predictions of a model trained on `train_xl.jsonl`.
Please check out a sample submission file (`sample-submission-labels.lst`) for reference.
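
The column layout described above can be illustrated with a short sketch that parses prediction lines and computes accuracy per training-set size. This is not the official `eval.py`, whose metrics are not shown here; the function names and the example labels are assumptions made for illustration.

```python
# Column order of the prediction file, as described above.
SIZES = ["xs", "s", "m", "l", "xl"]


def parse_predictions(lines):
    """Turn lines such as '2,1,1,1,1' into lists of five ints."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        values = line.split(",")
        if len(values) != len(SIZES):
            raise ValueError(f"expected {len(SIZES)} predictions, got: {line!r}")
        rows.append([int(v) for v in values])
    return rows


def accuracy_per_size(pred_rows, labels):
    """Return {size: accuracy}, comparing each column against the labels."""
    if not labels or len(pred_rows) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal in length")
    result = {}
    for col, size in enumerate(SIZES):
        correct = sum(1 for row, label in zip(pred_rows, labels) if row[col] == label)
        result[size] = correct / len(labels)
    return result


# The three prediction lines from the example above, with made-up labels.
preds = parse_predictions(["2,1,1,1,1", "1,1,2,2,2", "1,1,1,1,1"])
labels = [2, 1, 1]  # hypothetical gold labels, for illustration only
print(accuracy_per_size(preds, labels))
```

Rejecting rows that do not have exactly five values catches misaligned prediction files early rather than silently scoring the wrong column.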

## Submission to Leaderboard

You can submit your predictions on the `test` set to the [leaderboard](http://winogrande.allenai.org).
The submission file must be named `predictions.lst`. The format is the same as above.


## Reference
If you use this dataset, please cite the following paper:

@article{sakaguchi2019winogrande,
title={WinoGrande: An Adversarial Winograd Schema Challenge at Scale},
author={Sakaguchi, Keisuke and Bras, Ronan Le and Bhagavatula, Chandra and Choi, Yejin},
journal={arXiv preprint arXiv:1907.10641},
year={2019}
}


## License

The WinoGrande dataset is licensed under CC BY 2.0.


## Questions?

You may ask us questions at our [google group](https://groups.google.com/a/allenai.org/forum/#!forum/winogrande).


## Contact

Email: keisukes[at]allenai.org