Update README.md
nlpxucan authored Nov 6, 2023
1 parent 20e0655 commit 2a902ae
Showing 1 changed file with 1 addition and 10 deletions.
WizardLM/README.md: 11 changes (1 addition, 10 deletions)
@@ -142,15 +142,6 @@ We will provide our latest models for you to try for as long as possible. If you



-## Training Data
-
-[`alpaca_evol_instruct_70k.json`](https://huggingface.co/datasets/victor123/evol_instruct_70k) contains 70K instruction-following data generated from Evol-Instruct. We used it for fine-tuning the WizardLM model.
-This JSON file is a list of dictionaries, each dictionary contains the following fields:
-
-- `instruction`: `str`, describes the task the model should perform. Each of the 70K instructions is unique.
-- `output`: `str`, the answer to the instruction as generated by `gpt-3.5-turbo`.
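
For a quick illustration of the data format described in the removed section above, here is a minimal loading sketch. It assumes the Hugging Face `datasets` library and the dataset ID from the link above; the field names follow the description, everything else is an assumption.

```python
# Minimal sketch (assumption): load the Evol-Instruct 70K data from the
# Hugging Face Hub and inspect one instruction/output pair.
from datasets import load_dataset

dataset = load_dataset("victor123/evol_instruct_70k", split="train")
print(len(dataset))            # roughly 70K examples

example = dataset[0]
print(example["instruction"])  # str: the task the model should perform
print(example["output"])       # str: the answer generated by gpt-3.5-turbo
```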



## WizardLM Weights
We release [WizardLM] weights as delta weights to comply with the LLaMA model license.
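
To make the delta-weight release concrete, the sketch below shows one common way to recover a full checkpoint: add the released delta to the base LLaMA weights parameter by parameter. The paths and model IDs are placeholders, and this is an illustrative recipe rather than the repository's official recovery script.

```python
# Illustrative sketch (paths/IDs are placeholders, fp16 assumed):
# full WizardLM weight = base LLaMA weight + released delta weight.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("path/to/llama-7b", torch_dtype=torch.float16)
delta = AutoModelForCausalLM.from_pretrained("path/to/wizardlm-7b-delta", torch_dtype=torch.float16)

base_state = base.state_dict()
for name, param in delta.state_dict().items():
    param.data += base_state[name]  # apply the delta in place

delta.save_pretrained("wizardlm-7b-recovered")
AutoTokenizer.from_pretrained("path/to/wizardlm-7b-delta").save_pretrained("wizardlm-7b-recovered")
```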
@@ -243,7 +234,7 @@ python src\inference_wizardlm.py

### Evaluation

-To evaluate Wizard, we conduct human evaluation on the inputs from our human instruct evaluation set [`WizardLM_testset.jsonl`](./data/WizardLM_testset.jsonl). This evaluation set was collected by the authors and covers a diverse list of user-oriented instructions including difficult Coding Generation & Debugging, Math, Reasoning, Complex Formats, Academic Writing, Extensive Disciplines, and so on. We performed a blind pairwise comparison between Wizard and baselines. Specifically, we recruit 10 well-educated annotators to rank the models from 1 to 5 on relevance, knowledgeable, reasoning, calculation and accuracy.
+To evaluate Wizard, we conduct human evaluation on the inputs from our human instruct evaluation set [`WizardLM_testset.jsonl`]. This evaluation set was collected by the authors and covers a diverse list of user-oriented instructions including difficult Coding Generation & Debugging, Math, Reasoning, Complex Formats, Academic Writing, Extensive Disciplines, and so on. We performed a blind pairwise comparison between Wizard and baselines. Specifically, we recruit 10 well-educated annotators to rank the models from 1 to 5 on relevance, knowledgeable, reasoning, calculation and accuracy.

WizardLM achieved significantly better results than Alpaca and Vicuna-7b.
