Update README.md
nlpxucan authored Nov 6, 2023
1 parent 20e0655 commit 2a902ae
Showing 1 changed file with 1 addition and 10 deletions.
WizardLM/README.md: 11 changes (1 addition, 10 deletions)
@@ -142,15 +142,6 @@ We will provide our latest models for you to try for as long as possible. If you



-## Training Data
-
-[`alpaca_evol_instruct_70k.json`](https://huggingface.co/datasets/victor123/evol_instruct_70k) contains 70K instruction-following data generated from Evol-Instruct. We used it for fine-tuning the WizardLM model.
-This JSON file is a list of dictionaries, each dictionary contains the following fields:
-
-- `instruction`: `str`, describes the task the model should perform. Each of the 70K instructions is unique.
-- `output`: `str`, the answer to the instruction as generated by `gpt-3.5-turbo`.
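
For a quick illustration of the data format described in the removed section above, here is a minimal loading sketch. It assumes the Hugging Face `datasets` library and the dataset ID from the link above; the field names follow the description, everything else is an assumption.

```python
# Minimal sketch (assumption): load the Evol-Instruct 70K data from the
# Hugging Face Hub and inspect one instruction/output pair.
from datasets import load_dataset

dataset = load_dataset("victor123/evol_instruct_70k", split="train")
print(len(dataset))            # roughly 70K examples

example = dataset[0]
print(example["instruction"])  # str: the task the model should perform
print(example["output"])       # str: the answer generated by gpt-3.5-turbo
```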



## WizardLM Weights
We release [WizardLM] weights as delta weights to comply with the LLaMA model license.
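
To make the delta-weight release concrete, the sketch below shows one common way to recover a full checkpoint: add the released delta to the base LLaMA weights parameter by parameter. The paths and model IDs are placeholders, and this is an illustrative recipe rather than the repository's official recovery script.

```python
# Illustrative sketch (paths/IDs are placeholders, fp16 assumed):
# full WizardLM weight = base LLaMA weight + released delta weight.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("path/to/llama-7b", torch_dtype=torch.float16)
delta = AutoModelForCausalLM.from_pretrained("path/to/wizardlm-7b-delta", torch_dtype=torch.float16)

base_state = base.state_dict()
for name, param in delta.state_dict().items():
    param.data += base_state[name]  # apply the delta in place

delta.save_pretrained("wizardlm-7b-recovered")
AutoTokenizer.from_pretrained("path/to/wizardlm-7b-delta").save_pretrained("wizardlm-7b-recovered")
```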
@@ -243,7 +234,7 @@ python src\inference_wizardlm.py

### Evaluation

-To evaluate Wizard, we conduct human evaluation on the inputs from our human instruct evaluation set [`WizardLM_testset.jsonl`](./data/WizardLM_testset.jsonl). This evaluation set was collected by the authors and covers a diverse list of user-oriented instructions including difficult Coding Generation & Debugging, Math, Reasoning, Complex Formats, Academic Writing, Extensive Disciplines, and so on. We performed a blind pairwise comparison between Wizard and baselines. Specifically, we recruit 10 well-educated annotators to rank the models from 1 to 5 on relevance, knowledgeable, reasoning, calculation and accuracy.
+To evaluate Wizard, we conduct human evaluation on the inputs from our human instruct evaluation set [`WizardLM_testset.jsonl`]. This evaluation set was collected by the authors and covers a diverse list of user-oriented instructions including difficult Coding Generation & Debugging, Math, Reasoning, Complex Formats, Academic Writing, Extensive Disciplines, and so on. We performed a blind pairwise comparison between Wizard and baselines. Specifically, we recruit 10 well-educated annotators to rank the models from 1 to 5 on relevance, knowledgeable, reasoning, calculation and accuracy.

WizardLM achieved significantly better results than Alpaca and Vicuna-7b.
