Merge branch 'nlpxucan:main' into main

SeekPoint · Aug 12, 2023 · 0256a77
2 parents d0c1ccc + 41bdf67

Showing 2 changed files with 46 additions and 7 deletions.
diff --git a/README.md b/README.md
@@ -16,6 +16,19 @@
 [![Data License](https://img.shields.io/badge/Data%20License-CC%20By%20NC%204.0-red.svg)](https://github.com/tatsu-lab/stanford_alpaca/blob/main/DATA_LICENSE)
 [![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/release/python-390/)
 
+
+- 🔥🔥🔥 Our **WizardMath-70B-V1.0** model slightly outperforms some closed-source LLMs on the GSM8K, including **ChatGPT 3.5**, **Claude Instant 1** and **PaLM 2 540B**.
+- 🔥 Our **WizardMath-70B-V1.0** model achieves  **81.6 pass@1** on the [GSM8k Benchmarks](https://github.com/openai/grade-school-math), which is **24.8** points higher than the SOTA open-source LLM.
+- 🔥 Our **WizardMath-70B-V1.0** model achieves  **22.7 pass@1** on the [MATH Benchmarks](https://github.com/hendrycks/math), which is **9.2** points higher than the SOTA open-source LLM.
+
+| Model | Checkpoint | Paper  | GSM8k | MATH  | License|
+| ----- |------| ---- |------|-------| ----- |
+| WizardMath-70B-V1.0 | 🤗 <a href="https://huggingface.co/WizardLM/WizardMath-70B-V1.0" target="_blank">HF Link</a> |  📃Coming Soon| **81.6**  |  **22.7**	  | <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2 License </a> |
+| WizardMath-13B-V1.0 | 🤗 <a href="https://huggingface.co/WizardLM/WizardMath-13B-V1.0" target="_blank">HF Link</a> |  📃Coming Soon| **63.9**  |  **14.0**	  | <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2 License </a> |
+| WizardMath-7B-V1.0 | 🤗 <a href="https://huggingface.co/WizardLM/WizardMath-7B-V1.0" target="_blank">HF Link</a>  |  📃Coming Soon| 	 **54.9**  |  **10.7** |  <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" target="_blank">Llama 2 License </a>|
+
+
+
 <font size=0.5>
 
 
@@ -57,12 +70,7 @@ To obtain results **identical to our demo**, please strictly follow the prompts
 <b>For WizardLM</b>, the Prompt should be as following:
 
 ```
-A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
-USER: Hi
-ASSISTANT: Hello.
-USER: Who are you?
-ASSISTANT: I am WizardLM.
-......
+A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Hi ASSISTANT: Hello.</s>USER: Who are you? ASSISTANT: I am WizardLM.</s>......
 ```
 
 <b>For WizardCoder </b>, the Prompt should be as following:

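Reviewer note: the hunk above collapses the WizardLM prompt onto a single line, with each assistant reply closed by `</s>` and no newline between turns. A minimal sketch of assembling that string in Python follows. The function names are hypothetical, and the trailing `USER: ... ASSISTANT:` suffix for the next turn is an assumption, since the README elides everything after the second turn.

```python
# Sketch only: helper names are hypothetical, not part of the WizardLM repository.
SYSTEM_PROMPT = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def format_history(turns):
    """Join completed (user, assistant) turns in the single-line layout from the diff."""
    parts = [SYSTEM_PROMPT + " "]
    for user_msg, assistant_msg in turns:
        # Each finished assistant reply is terminated by the </s> end-of-sequence marker.
        parts.append(f"USER: {user_msg} ASSISTANT: {assistant_msg}</s>")
    return "".join(parts)


def build_prompt(turns, new_user_msg):
    """Append the next user message; the open 'ASSISTANT:' suffix is an assumption."""
    return format_history(turns) + f"USER: {new_user_msg} ASSISTANT:"


if __name__ == "__main__":
    history = [("Hi", "Hello."), ("Who are you?", "I am WizardLM.")]
    print(build_prompt(history, "What can you do?"))
```

Keeping the history formatter separate from the next-turn suffix makes it easy to compare the output against the README string character by character.
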
diff --git a/WizardMath/README.md b/WizardMath/README.md
@@ -9,7 +9,7 @@ To develop our WizardMath model, we begin with adapting the **Evol-Instruct** an
 
 ## News
 
-- 🔥 Our **WizardMath-70B-V1.0** model achieves slightly outperforms some closed-source LLMs on the GSM8K, including **ChatGPT 3.5**, **Claude Instant 1** and **PaLM 2 540B**.
+- 🔥 Our **WizardMath-70B-V1.0** model slightly outperforms some closed-source LLMs on the GSM8K, including **ChatGPT 3.5**, **Claude Instant 1** and **PaLM 2 540B**.
 - 🔥 Our **WizardMath-70B-V1.0** model achieves  **81.6 pass@1** on the [GSM8k Benchmarks](https://github.com/openai/grade-school-math), which is **24.8** points higher than the SOTA open-source LLM.
 - 🔥 Our **WizardMath-70B-V1.0** model achieves  **22.7 pass@1** on the [MATH Benchmarks](https://github.com/hendrycks/math), which is **9.2** points higher than the SOTA open-source LLM.
 
@@ -32,6 +32,37 @@ To develop our WizardMath model, we begin with adapting the **Evol-Instruct** an
 
 ❗❗❗**Note: The score of ChatGPT reported by [Model Selection](https://arxiv.org/pdf/2305.14333v1.pdf) is 80.8%.**
 
+❗<b>Note for model system prompts usage:</b>
+
+**Default version:**
+
+```
+Below is an instruction that describes a task. Write a response that appropriately completes the request.
+
+
+### Instruction:
+{instruction}
+
+
+### Response: 
+```
+
+
+**CoT Version:** （❗For the **simple** math questions, we do NOT recommend to use the CoT prompt.） 
+
+
+```
+Below is an instruction that describes a task. Write a response that appropriately completes the request.
+
+
+### Instruction:
+{instruction}
+
+
+### Response: Let's think step by step.
+```
+
+
 The following table clearly demonstrates that our **WizardMath** exhibits a substantial performance advantage over all the open-source models on the GSM8k and MATH benchmarks. 
 
 ❗ **If you are confused with the different scores of our 7B, 13B and 70B models (54.9, 63.9 and 81.6), please check the Notes.**
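
Reviewer note: the two WizardMath prompt variants added in this hunk differ only in the text after `### Response:`. A minimal sketch of filling either template is shown below. The function name and the `cot` flag are hypothetical, and the blank-line counts and the trailing space after `### Response:` are copied from the diff as rendered here, so they should be checked against the repository's inference scripts before use.

```python
# Sketch only: build_wizardmath_prompt and its cot flag are hypothetical names.
HEADER = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)

# Suffixes as they appear in the diff; the default one ends with a single space.
DEFAULT_RESPONSE = "### Response: "
COT_RESPONSE = "### Response: Let's think step by step."


def build_wizardmath_prompt(instruction, cot=False):
    """Fill the default or chain-of-thought (CoT) template with an instruction."""
    response = COT_RESPONSE if cot else DEFAULT_RESPONSE
    # Two blank lines separate the sections, matching the diff as rendered (assumption).
    return f"{HEADER}\n\n\n### Instruction:\n{instruction}\n\n\n{response}"


if __name__ == "__main__":
    question = "A pen costs 3 dollars. How much do 4 pens cost?"
    print(build_wizardmath_prompt(question))            # default template
    print(build_wizardmath_prompt(question, cot=True))  # CoT template
```

Per the note in the README, the default variant is the one to use for simple questions, so `cot` defaults to `False`.
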