init

dong-river · Feb 21, 2024 · f580f9a · f580f9a
commit f580f9a
Showing 1,769 changed files with 957,508 additions and 0 deletions.
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1 @@
+/.DS_Store
diff --git a/README.md b/README.md
@@ -0,0 +1,23 @@
+# LLM_unlearning
+
+## Install
+```bash
+conda create --name unlearning python=3.9.16
+conda activate unlearning
+pip install -r requirements.txt
+```
+## Run one experiment (125M FT)
+```bash
+python main.py --method DI --cd_num_token 1000 --model_name_or_path EleutherAI/gpt-neo-125m --train_batch_size 64 --eval_batch_size 256 --eval_num 5000 --num_epochs_di 10 --lr_di 1e-06 --di_strength 3 --output_folder outputs_new
+```
+
+## Run one experiment (1.3B LoRA)
+```bash
+python main.py --method DI --model_name_or_path EleutherAI/gpt-neo-1.3B --train_batch_size 32 --eval_batch_size 64 --eval_num 5000 --lr_di 5e-06 --di_strength 3 --num_epochs_di 100 --gradient_accu 2 --early_stop True --early_stop_criteria 1.03 --peft lora --rank 8 --lora_alpha 16 --warmup_steps 100 --output_folder outputs_new
+```
+
+## Run full experiments
+The Python files under ./exp set up the full experiment runs for different purposes.
+
+## Logs and Visualization
+Running main.py produces a result file and a generation-example file. Use parse_log.py to convert this output to a CSV file. Our earlier results are in the ./output folder, and visualization.ipynb can be used to visualize them.
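
Reviewer note: the single-experiment commands in this README could also be generated programmatically when sweeping one hyperparameter. The following is a minimal sketch, assuming only the flags that appear in the 125M FT command above; `build_command` and the swept `di_strength` values are hypothetical, and the files under ./exp may organize their runs differently.

```python
import shlex

# Flags and values copied from the 125M FT command in the README.
BASE_ARGS = {
    "method": "DI",
    "cd_num_token": 1000,
    "model_name_or_path": "EleutherAI/gpt-neo-125m",
    "train_batch_size": 64,
    "eval_batch_size": 256,
    "eval_num": 5000,
    "num_epochs_di": 10,
    "lr_di": "1e-06",
    "di_strength": 3,
    "output_folder": "outputs_new",
}


def build_command(overrides=None):
    """Return the argv list for one main.py run (hypothetical helper)."""
    args = {**BASE_ARGS, **(overrides or {})}
    argv = ["python", "main.py"]
    for flag, value in args.items():
        argv += [f"--{flag}", str(value)]
    return argv


if __name__ == "__main__":
    # Illustrative sweep values; they are not taken from the README.
    for strength in (1, 3, 5):
        print(shlex.join(build_command({"di_strength": strength})))
```

Each printed line can be pasted into a shell, or the argv list can be passed to `subprocess.run` directly.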
diff --git a/data/.DS_Store b/data/.DS_Store
diff --git a/...aswag_default_0.1.0_512a66dd8b1b1643ab4a48aa4f150d04c91680da6a4096498a5e5f799623d5ae.lock b/...aswag_default_0.1.0_512a66dd8b1b1643ab4a48aa4f150d04c91680da6a4096498a5e5f799623d5ae.lock
diff --git a/...9f3eb60e823d5_0.0.0_2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec.lock b/...9f3eb60e823d5_0.0.0_2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec.lock
diff --git a/...qa_plain_text_1.1.0_6c611c1a9bf220943c4174e117d3b660859665baf1d43156230116185312d011.lock b/...qa_plain_text_1.1.0_6c611c1a9bf220943c4174e117d3b660859665baf1d43156230116185312d011.lock
diff --git a/...per_glue_copa_1.0.3_bb9675f958ebfee0d5d6dc5476fafe38c79123727a7258d515c450873dbdbbed.lock b/...per_glue_copa_1.0.3_bb9675f958ebfee0d5d6dc5476fafe38c79123727a7258d515c450873dbdbbed.lock
diff --git a/..._winogrande_s_1.1.0_a826c3d3506aefe0e9e9390dcb53271070536586bab95849876b2c1743df56e2.lock b/..._winogrande_s_1.1.0_a826c3d3506aefe0e9e9390dcb53271070536586bab95849876b2c1743df56e2.lock
diff --git a/data/benchmark/lambada.csv b/data/benchmark/lambada.csv
diff --git a/data/benchmark/pubmed_qa.csv b/data/benchmark/pubmed_qa.csv
diff --git a/data/downloads/2b8d1b41bc3410e7183fc0ac9512242c0271f4e5556665158155ae695713d6e3 b/data/downloads/2b8d1b41bc3410e7183fc0ac9512242c0271f4e5556665158155ae695713d6e3
diff --git a/data/downloads/2b8d1b41bc3410e7183fc0ac9512242c0271f4e5556665158155ae695713d6e3.json b/data/downloads/2b8d1b41bc3410e7183fc0ac9512242c0271f4e5556665158155ae695713d6e3.json
@@ -0,0 +1 @@
+{"url": "https://huggingface.co/datasets/allenai/ai2_arc/resolve/210d026faf9955653af8916fad021475a3f00453/ARC-Challenge/train-00000-of-00001.parquet", "etag": null}
diff --git a/data/downloads/2b8d1b41bc3410e7183fc0ac9512242c0271f4e5556665158155ae695713d6e3.lock b/data/downloads/2b8d1b41bc3410e7183fc0ac9512242c0271f4e5556665158155ae695713d6e3.lock
diff --git a/data/downloads/30b6e49bd1e17dbfea4c75c30d8399bf3a92f898e9832ef6ca159f74eabd6754 b/data/downloads/30b6e49bd1e17dbfea4c75c30d8399bf3a92f898e9832ef6ca159f74eabd6754
diff --git a/data/downloads/30b6e49bd1e17dbfea4c75c30d8399bf3a92f898e9832ef6ca159f74eabd6754.json b/data/downloads/30b6e49bd1e17dbfea4c75c30d8399bf3a92f898e9832ef6ca159f74eabd6754.json
@@ -0,0 +1 @@
+{"url": "https://raw.githubusercontent.com/rowanz/hellaswag/master/data/hellaswag_test.jsonl", "etag": null}
diff --git a/data/downloads/30b6e49bd1e17dbfea4c75c30d8399bf3a92f898e9832ef6ca159f74eabd6754.lock b/data/downloads/30b6e49bd1e17dbfea4c75c30d8399bf3a92f898e9832ef6ca159f74eabd6754.lock
diff --git a/data/downloads/53d2f20b2636031aca97f6c04afef6cba49ef933449622025adfc8809de8b032 b/data/downloads/53d2f20b2636031aca97f6c04afef6cba49ef933449622025adfc8809de8b032
diff --git a/data/downloads/53d2f20b2636031aca97f6c04afef6cba49ef933449622025adfc8809de8b032.json b/data/downloads/53d2f20b2636031aca97f6c04afef6cba49ef933449622025adfc8809de8b032.json
@@ -0,0 +1 @@
+{"url": "https://dl.fbaipublicfiles.com/glue/superglue/data/v2/COPA.zip", "etag": null}
diff --git a/data/downloads/53d2f20b2636031aca97f6c04afef6cba49ef933449622025adfc8809de8b032.lock b/data/downloads/53d2f20b2636031aca97f6c04afef6cba49ef933449622025adfc8809de8b032.lock
diff --git a/data/downloads/630ed04bd62ee51d06d9ba13f00fe153c1951e84594dd9df8c4c1c9587516f77 b/data/downloads/630ed04bd62ee51d06d9ba13f00fe153c1951e84594dd9df8c4c1c9587516f77
diff --git a/data/downloads/630ed04bd62ee51d06d9ba13f00fe153c1951e84594dd9df8c4c1c9587516f77.json b/data/downloads/630ed04bd62ee51d06d9ba13f00fe153c1951e84594dd9df8c4c1c9587516f77.json
@@ -0,0 +1 @@
+{"url": "https://raw.githubusercontent.com/rowanz/hellaswag/master/data/hellaswag_train.jsonl", "etag": null}
diff --git a/data/downloads/630ed04bd62ee51d06d9ba13f00fe153c1951e84594dd9df8c4c1c9587516f77.lock b/data/downloads/630ed04bd62ee51d06d9ba13f00fe153c1951e84594dd9df8c4c1c9587516f77.lock
diff --git a/data/downloads/63c87df0329762fa4cf5a54b6d1a15173d51b1044fe330490daeafb0b54754a8 b/data/downloads/63c87df0329762fa4cf5a54b6d1a15173d51b1044fe330490daeafb0b54754a8
diff --git a/data/downloads/63c87df0329762fa4cf5a54b6d1a15173d51b1044fe330490daeafb0b54754a8.json b/data/downloads/63c87df0329762fa4cf5a54b6d1a15173d51b1044fe330490daeafb0b54754a8.json
@@ -0,0 +1 @@
+{"url": "https://huggingface.co/datasets/allenai/ai2_arc/resolve/210d026faf9955653af8916fad021475a3f00453/ARC-Easy/test-00000-of-00001.parquet", "etag": null}
diff --git a/data/downloads/63c87df0329762fa4cf5a54b6d1a15173d51b1044fe330490daeafb0b54754a8.lock b/data/downloads/63c87df0329762fa4cf5a54b6d1a15173d51b1044fe330490daeafb0b54754a8.lock
diff --git a/data/downloads/8c447af4bc8816f3aa2900a1d99f34bafeb1d6ad26dfcfba129dfdccb5120b87 b/data/downloads/8c447af4bc8816f3aa2900a1d99f34bafeb1d6ad26dfcfba129dfdccb5120b87
diff --git a/data/downloads/8c447af4bc8816f3aa2900a1d99f34bafeb1d6ad26dfcfba129dfdccb5120b87.json b/data/downloads/8c447af4bc8816f3aa2900a1d99f34bafeb1d6ad26dfcfba129dfdccb5120b87.json
@@ -0,0 +1 @@
+{"url": "https://huggingface.co/datasets/allenai/ai2_arc/resolve/210d026faf9955653af8916fad021475a3f00453/ARC-Easy/train-00000-of-00001.parquet", "etag": null}
diff --git a/data/downloads/8c447af4bc8816f3aa2900a1d99f34bafeb1d6ad26dfcfba129dfdccb5120b87.lock b/data/downloads/8c447af4bc8816f3aa2900a1d99f34bafeb1d6ad26dfcfba129dfdccb5120b87.lock
diff --git a/data/downloads/af9990f4ae181bbfbb2e33863f2dfa12b92eb453b0b0fc524106741d796a2d15 b/data/downloads/af9990f4ae181bbfbb2e33863f2dfa12b92eb453b0b0fc524106741d796a2d15
diff --git a/data/downloads/af9990f4ae181bbfbb2e33863f2dfa12b92eb453b0b0fc524106741d796a2d15.json b/data/downloads/af9990f4ae181bbfbb2e33863f2dfa12b92eb453b0b0fc524106741d796a2d15.json
@@ -0,0 +1 @@
+{"url": "https://raw.githubusercontent.com/rowanz/hellaswag/master/data/hellaswag_val.jsonl", "etag": null}
diff --git a/data/downloads/af9990f4ae181bbfbb2e33863f2dfa12b92eb453b0b0fc524106741d796a2d15.lock b/data/downloads/af9990f4ae181bbfbb2e33863f2dfa12b92eb453b0b0fc524106741d796a2d15.lock
diff --git a/data/downloads/ba208327ccb4a2f2b093cdd7eecee1c96cbbe3d92ec67e1be12bdbc972d4eea8 b/data/downloads/ba208327ccb4a2f2b093cdd7eecee1c96cbbe3d92ec67e1be12bdbc972d4eea8
diff --git a/data/downloads/ba208327ccb4a2f2b093cdd7eecee1c96cbbe3d92ec67e1be12bdbc972d4eea8.json b/data/downloads/ba208327ccb4a2f2b093cdd7eecee1c96cbbe3d92ec67e1be12bdbc972d4eea8.json
@@ -0,0 +1 @@
+{"url": "https://huggingface.co/datasets/allenai/ai2_arc/resolve/210d026faf9955653af8916fad021475a3f00453/ARC-Challenge/validation-00000-of-00001.parquet", "etag": null}
diff --git a/data/downloads/ba208327ccb4a2f2b093cdd7eecee1c96cbbe3d92ec67e1be12bdbc972d4eea8.lock b/data/downloads/ba208327ccb4a2f2b093cdd7eecee1c96cbbe3d92ec67e1be12bdbc972d4eea8.lock
diff --git a/data/downloads/d1b38d244e5da498143659669d640dab6fb81dfba94ede666ed9d3f3c3be694a b/data/downloads/d1b38d244e5da498143659669d640dab6fb81dfba94ede666ed9d3f3c3be694a
diff --git a/data/downloads/d1b38d244e5da498143659669d640dab6fb81dfba94ede666ed9d3f3c3be694a.json b/data/downloads/d1b38d244e5da498143659669d640dab6fb81dfba94ede666ed9d3f3c3be694a.json
@@ -0,0 +1 @@
+{"url": "https://storage.googleapis.com/ai2-mosaic/public/physicaliqa/physicaliqa-train-dev.zip", "etag": null}
diff --git a/data/downloads/d1b38d244e5da498143659669d640dab6fb81dfba94ede666ed9d3f3c3be694a.lock b/data/downloads/d1b38d244e5da498143659669d640dab6fb81dfba94ede666ed9d3f3c3be694a.lock
diff --git a/data/downloads/e22289a2fc01bf5d112b3c9c699b8105bcb4d573ca3d8470b7f0c416771f76e1 b/data/downloads/e22289a2fc01bf5d112b3c9c699b8105bcb4d573ca3d8470b7f0c416771f76e1
diff --git a/data/downloads/e22289a2fc01bf5d112b3c9c699b8105bcb4d573ca3d8470b7f0c416771f76e1.json b/data/downloads/e22289a2fc01bf5d112b3c9c699b8105bcb4d573ca3d8470b7f0c416771f76e1.json
@@ -0,0 +1 @@
+{"url": "https://yonatanbisk.com/piqa/data/tests.jsonl", "etag": null}
diff --git a/data/downloads/e22289a2fc01bf5d112b3c9c699b8105bcb4d573ca3d8470b7f0c416771f76e1.lock b/data/downloads/e22289a2fc01bf5d112b3c9c699b8105bcb4d573ca3d8470b7f0c416771f76e1.lock
diff --git a/data/downloads/e60860809f7c35bc30394c47748cf246674b6314e8450f4c6a6cf9065ff0ab18 b/data/downloads/e60860809f7c35bc30394c47748cf246674b6314e8450f4c6a6cf9065ff0ab18
diff --git a/data/downloads/e60860809f7c35bc30394c47748cf246674b6314e8450f4c6a6cf9065ff0ab18.json b/data/downloads/e60860809f7c35bc30394c47748cf246674b6314e8450f4c6a6cf9065ff0ab18.json
@@ -0,0 +1 @@
+{"url": "https://storage.googleapis.com/ai2-mosaic/public/winogrande/winogrande_1.1.zip", "etag": null}
diff --git a/data/downloads/e60860809f7c35bc30394c47748cf246674b6314e8450f4c6a6cf9065ff0ab18.lock b/data/downloads/e60860809f7c35bc30394c47748cf246674b6314e8450f4c6a6cf9065ff0ab18.lock
diff --git a/data/downloads/ec545f4634b4c60d7eba3ff158bce61c6c016554c7fa834b5be8ed09a721b8c3 b/data/downloads/ec545f4634b4c60d7eba3ff158bce61c6c016554c7fa834b5be8ed09a721b8c3
diff --git a/data/downloads/ec545f4634b4c60d7eba3ff158bce61c6c016554c7fa834b5be8ed09a721b8c3.json b/data/downloads/ec545f4634b4c60d7eba3ff158bce61c6c016554c7fa834b5be8ed09a721b8c3.json
@@ -0,0 +1 @@
+{"url": "https://huggingface.co/datasets/allenai/ai2_arc/resolve/210d026faf9955653af8916fad021475a3f00453/ARC-Challenge/test-00000-of-00001.parquet", "etag": null}
diff --git a/data/downloads/ec545f4634b4c60d7eba3ff158bce61c6c016554c7fa834b5be8ed09a721b8c3.lock b/data/downloads/ec545f4634b4c60d7eba3ff158bce61c6c016554c7fa834b5be8ed09a721b8c3.lock
diff --git a/...downloads/extracted/671ce19bc58f9c835a0af62a4eb9912b85d7b2a346a834088ccefac6435cc018.lock b/...downloads/extracted/671ce19bc58f9c835a0af62a4eb9912b85d7b2a346a834088ccefac6435cc018.lock
diff --git a/...f9c835a0af62a4eb9912b85d7b2a346a834088ccefac6435cc018/__MACOSX/winogrande_1.1/._README.md b/...f9c835a0af62a4eb9912b85d7b2a346a834088ccefac6435cc018/__MACOSX/winogrande_1.1/._README.md
diff --git a/...8f9c835a0af62a4eb9912b85d7b2a346a834088ccefac6435cc018/winogrande_1.1/README.md b/...8f9c835a0af62a4eb9912b85d7b2a346a834088ccefac6435cc018/winogrande_1.1/README.md
@@ -0,0 +1,70 @@
+# WinoGrande 
+
+Version 1.1 (Sep 16th, 2020)
+
+- - - 
+
+## Data
+
+    ./data/
+    ├── train_[xs,s,m,l,xl].jsonl          # training set with different sizes
+    ├── train_[xs,s,m,l,xl]-labels.lst     # answer labels for training sets
+    ├── train_debiased.jsonl               # debiased training set
+    ├── train_debiased-labels.lst          # answer labels for debiased training set
+    ├── dev.jsonl                          # development set
+    ├── dev-labels.lst                     # answer labels for development set
+    ├── test.jsonl                         # test set
+    ├── sample-submissions-labels.lst      # example submission file for leaderboard    
+    └── eval.py                            # evaluation script
+
+You can use `train_*.jsonl` for training models and `dev` for validation.
+Please note that labels are not included in `test.jsonl`. To evaluate your models on the `test` set, make a submission to our [leaderboard](https://winogrande.allenai.org).
+
+
+## Evaluation
+
+You can use `eval.py` for evaluation on the dev split, which yields `metrics.json`. 
+
+    e.g., python eval.py --preds_file ./YOUR_PREDICTIONS.lst --labels_file ./dev-labels.lst
+
+In the prediction file, each line holds the predictions (1 or 2) for one evaluation set question from the models trained on the 5 training sets (ordered by `xs`, `s`, `m`, `l`, `xl`), separated by commas.
+
+     2,1,1,1,1
+     1,1,2,2,2
+     1,1,1,1,1
+     .........
+     .........
+
+Namely, the first column holds the predictions of a model trained/finetuned on `train_xs.jsonl`, the second column those of a model trained on `train_s.jsonl`, ... , and the last (fifth) column those of a model trained on `train_xl.jsonl`.
+Please check out a sample submission file (`sample-submission-labels.lst`) for reference.
+
+## Submission to Leaderboard
+
+You can submit your predictions on the `test` set to the [leaderboard](http://winogrande.allenai.org).
+The submission file must be named `predictions.lst`. The format is the same as above.
+
+
+## Reference
+If you use this dataset, please cite the following paper:
+
+	@article{sakaguchi2019winogrande,
+	    title={WinoGrande: An Adversarial Winograd Schema Challenge at Scale},
+	    author={Sakaguchi, Keisuke and Bras, Ronan Le and Bhagavatula, Chandra and Choi, Yejin},
+	    journal={arXiv preprint arXiv:1907.10641},
+	    year={2019}
+	}
+
+
+## License 
+
+Winogrande dataset is licensed under CC BY 2.0.
+
+
+## Questions?
+
+You may ask us questions at our [google group](https://groups.google.com/a/allenai.org/forum/#!forum/winogrande).
+
+
+## Contact 
+
+Email: keisukes[at]allenai.org
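
Reviewer note: the prediction-file format described in the Evaluation section of this README can be scored with a short script. This is a minimal sketch, not the bundled `eval.py`. It assumes the labels file holds one gold label (1 or 2) per line, which the README does not state, and `score_predictions` is a hypothetical helper.

```python
SIZES = ("xs", "s", "m", "l", "xl")  # column order given in the README


def score_predictions(pred_lines, label_lines):
    """Return accuracy per training-set size.

    pred_lines: lines such as "2,1,1,1,1", one per evaluation question.
    label_lines: ASSUMED to be one gold label ("1" or "2") per line.
    """
    preds = [line.strip().split(",") for line in pred_lines if line.strip()]
    labels = [line.strip() for line in label_lines if line.strip()]
    if not labels:
        raise ValueError("no labels given")
    if len(preds) != len(labels):
        raise ValueError("predictions and labels differ in length")

    correct = dict.fromkeys(SIZES, 0)
    for row, gold in zip(preds, labels):
        if len(row) != len(SIZES):
            raise ValueError(f"expected {len(SIZES)} columns, got {len(row)}")
        for size, pred in zip(SIZES, row):
            # True counts as 1, so this tallies correct predictions per column.
            correct[size] += pred.strip() == gold
    return {size: correct[size] / len(labels) for size in SIZES}


if __name__ == "__main__":
    preds = ["2,1,1,1,1", "1,1,2,2,2", "1,1,1,1,1"]  # rows from the README example
    gold = ["1", "2", "1"]  # made-up labels, for illustration only
    print(score_predictions(preds, gold))
```

To score real files, pass open file handles for the predictions and for `dev-labels.lst` in place of the two lists.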