Skip to content

Commit

Permalink
Merge pull request nlpxucan#187 from ChiYeungLaw/main
Browse files Browse the repository at this point in the history
update step-by-step reproduce method
  • Loading branch information
nlpxucan authored Aug 30, 2023
2 parents 53999f1 + 5d475ec commit dec2f01
Show file tree
Hide file tree
Showing 2 changed files with 69 additions and 1 deletion.
68 changes: 68 additions & 0 deletions WizardCoder/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -255,6 +255,74 @@ python process_humaneval.py --path ${output_path} --out_path ${output_path}.json
evaluate_functional_correctness ${output_path}.jsonl
```

### How to Reproduce the 73.2 Pass@1 on HumanEval with Greedy Decoding?

❗❗❗**This performance is 100% reproducible!**

- Step 1: Setup the environment
```bash
conda create -n eval python=3.10

conda activate eval

conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.3 -c pytorch
```

- Step 2: Install the packages
```
transformers==4.31.0
numpy
fire
sentencepiece
deepspeed==0.10.0
accelerate
vllm==0.1.4
pandas
ray
pyarrow
```

- Step 3: Install Human-Eval from OpenAI
```bash
git clone https://github.com/openai/human-eval.git
pip install -e human-eval
```
uncomment the execution call in `human-eval/human_eval/execution.py`

- Step 4: Generate Answer

use the code `WizardLM/blob/main/WizardCoder/src/humaneval_gen_vllm.py` to generate answer.
```bash
model="WizardLM/WizardCoder-Python-34B-V1.0"
temp=0.0
max_len=2048
pred_num=1
num_seqs_per_iter=1

output_path=preds/T${temp}_N${pred_num}

mkdir -p ${output_path}
echo 'Output path: '$output_path
echo 'Model to eval: '$model

CUDA_VISIBLE_DEVICES=0,1,2,3 python humaneval_gen_vllm.py --model ${model} \
--start_index 0 --end_index 164 --temperature ${temp} \
--num_seqs_per_iter ${num_seqs_per_iter} --N ${pred_num} --max_len ${max_len} --output_path ${output_path} --num_gpus 4
```

- Step 5: Get the score

use the code `WizardLM/blob/main/WizardCoder/src/process_humaneval.py` to get the score.
```bash
output_path=preds/T0.0_N1

echo 'Output path: '$output_path
python process_humaneval.py --path ${output_path} --out_path ${output_path}.jsonl --add_prompt

evaluate_functional_correctness ${output_path}.jsonl
```


### How to Reproduce the 59.8 Pass@1 on HumanEval with Greedy Decoding?

❗❗❗**This performance is 100% reproducible!**
Expand Down
2 changes: 1 addition & 1 deletion WizardCoder/src/humaneval_gen_vllm.py
Original file line number Diff line number Diff line change
Expand Up @@ -62,7 +62,7 @@ def main():
num_samples = len(prompts)
print("Number of samples: {}".format(num_samples))

llm = LLM(base_model=args.model, tensor_parallel_size=args.num_gpus)
llm = LLM(model=args.model, tensor_parallel_size=args.num_gpus)
sampling_params = SamplingParams(temperature=args.temperature, top_p=1, max_tokens=args.max_len)

print(f"Loaded {args.model}.")
Expand Down

0 comments on commit dec2f01

Please sign in to comment.