Commit

first commit

openvino-dev-samples committed Apr 11, 2024
1 parent eafd603 commit 9bfe4c6
Showing 5 changed files with 440 additions and 1 deletion.
126 changes: 125 additions & 1 deletion README.md
English | [简体中文](README_zh.md)

# Qwen2.openvino Demo

This sample shows how to deploy Qwen2 using OpenVINO.

## 1. Environment configuration

We recommend that you create a new virtual environment and then install the dependencies as follows. The
recommended Python version is `3.10+`.

Linux

```
python3 -m venv openvino_env
source openvino_env/bin/activate
python3 -m pip install --upgrade pip
pip install wheel setuptools
pip install -r requirements.txt
```

Windows PowerShell

```
python -m venv openvino_env
.\openvino_env\Scripts\activate
python -m pip install --upgrade pip
pip install wheel setuptools
pip install -r requirements.txt
```

## 2. Convert model

Since the Hugging Face model needs to be converted to an OpenVINO IR model, you first need to download the model and convert it:

```
python3 convert.py --model_id Qwen/Qwen1.5-0.5B-Chat --precision int4 --output {your_path}/Qwen1.5-0.5B-Chat-ov
```

### Available parameters

* `--model_id` - the model ID from the Hugging Face Hub (https://huggingface.co/models) or the absolute path to the directory where the model is located.
* `--precision` - the model precision: fp16, int8 or int4.
* `--output` - the path where the converted model is saved.
* If you have difficulty accessing `huggingface`, you can try using the `hf-mirror` endpoint to download the model:

Linux
```
export HF_ENDPOINT=https://hf-mirror.com
```
Windows PowerShell
```
$env:HF_ENDPOINT = "https://hf-mirror.com"
```
Download the model
```
huggingface-cli download --resume-download --local-dir-use-symlinks False Qwen/Qwen1.5-0.5B-Chat --local-dir {your_path}/Qwen1.5-0.5B-Chat
```
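
For reference, the conversion step can also be reproduced directly with the `optimum-intel` Python API. The sketch below only illustrates that export path; it is not necessarily what `convert.py` does, and the int4 `OVWeightQuantizationConfig` settings shown here are an assumption:

```
# Hedged sketch: export a Hugging Face checkpoint to OpenVINO IR with
# int4 weight compression. convert.py may differ in its details.
from optimum.intel.openvino import OVModelForCausalLM, OVWeightQuantizationConfig
from transformers import AutoTokenizer

model_id = "Qwen/Qwen1.5-0.5B-Chat"
output_dir = "Qwen1.5-0.5B-Chat-ov"  # hypothetical output path

# export=True converts the PyTorch weights to OpenVINO IR on the fly;
# quantization_config compresses the linear-layer weights to int4.
model = OVModelForCausalLM.from_pretrained(
    model_id,
    export=True,
    quantization_config=OVWeightQuantizationConfig(bits=4),
)
model.save_pretrained(output_dir)
# Save the tokenizer next to the IR files so chat.py can load both.
AutoTokenizer.from_pretrained(model_id).save_pretrained(output_dir)
```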
## 3. Run the streaming chatbot
```
python3 chat.py --model_path {your_path}/Qwen1.5-0.5B-Chat-ov --max_sequence_length 4096 --device CPU
```
### Available parameters
* `--model_path` - the path to the directory where the OpenVINO IR model is located.
* `--max_sequence_length` - the maximum number of output tokens.
* `--device` - the device to run inference on, e.g. "CPU" or "GPU"; see the snippet below for how to list the available devices.
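
If you are unsure what to pass to `--device`, you can list the devices the OpenVINO runtime detects on your machine (a minimal sketch using the standard `openvino` Python API):

```
# Print the inference devices OpenVINO can see, e.g. ['CPU', 'GPU'].
import openvino as ov

core = ov.Core()
print(core.available_devices)
```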
## Example
```
====Starting conversation====
User: hello
Qwen2-OpenVINO: Hello! How can I assist you today?

User: who are you ?
Qwen2-OpenVINO: I am an AI language model created by Alibaba Cloud. My purpose is to help users with their questions and provide them with accurate information. Is there anything specific you would like to know about me?

User: could you tell me a story ?
Qwen2-OpenVINO: Sure, here's a short story for you:

Once upon a time, in a small village nestled in the mountains, there lived a young girl named Lily who loved nature. She spent most of her days exploring the forest and watching the birds singing.

One day, while she was wandering through the woods, she stumbled upon a hidden cave deep within the forest. Inside, she found a beautiful crystal that sparkled with light. She picked it up and held it close to her heart, feeling a sense of joy and wonder.

As she walked away from the cave, she felt a sense of peace wash over her. She realized that sometimes, the things we miss the most are the simple things in life, like the beauty of nature or the warmth of the sun on our skin.

From that day forward, Lily made a habit of spending time in nature whenever she could. She would spend hours walking through the forest, watching the birds sing, and taking in the beauty around her. She knew that these moments were precious and that they would stay with her forever.

And so, Lily continued to live her life with a sense of joy and wonder, always cherishing the simple things in life.

User: please give this story a title
Qwen2-OpenVINO: "Nature's Magic: A Journey Through the Forest Crystal"
```
## Common problems
1. Do I need to install the OpenVINO C++ inference engine?
   - No, it is not required.
2. Do I have to use Intel hardware?
   - We have only tried it on Intel devices, and we recommend using x86 Intel devices, including but not
     limited to:
   - Intel CPUs, including desktop and server CPUs.
   - Intel integrated GPUs, e.g. the Arc™ and Iris® series.
   - Intel discrete GPUs, e.g. the Arc™ A770.
3. Why can't OpenVINO find a GPU device on my system?
   - Ensure the OpenCL drivers are installed correctly.
   - Ensure you have the right permissions to access the GPU device.
   - More information can be found in [Install GPU drivers](https://github.com/openvinotoolkit/openvino_notebooks/wiki/Ubuntu#1-install-python-git-and-gpu-drivers-optional).
4. Is C++ supported?
   - Please refer to this C++ [example](https://github.com/openvinotoolkit/openvino.genai/tree/master/text_generation/causal_lm/cpp).
116 changes: 116 additions & 0 deletions README_zh.md
Simplified Chinese | [English](README.md)

# Qwen2.openvino Demo

This is an example of how to deploy Qwen2 using OpenVINO.

## 1. Environment configuration

We recommend that you create a new virtual environment and then install the dependencies as follows.
Python 3.10+ is recommended for running this sample.

Linux

```
python3 -m venv openvino_env
source openvino_env/bin/activate
python3 -m pip install --upgrade pip
pip install wheel setuptools
pip install -r requirements.txt
```

Windows PowerShell

```
python -m venv openvino_env
.\openvino_env\Scripts\activate
python -m pip install --upgrade pip
pip install wheel setuptools
pip install -r requirements.txt
```

## 2. Convert model

Since the Hugging Face model needs to be converted to an OpenVINO IR model, you first need to download the model and convert it:

```
python3 convert.py --model_id Qwen/Qwen1.5-0.5B-Chat --precision int4 --output {your_path}/Qwen1.5-0.5B-Chat-ov
```

### Available parameters

* `--model_id` - the model ID from the Hugging Face Hub (https://huggingface.co/models) or the absolute path to the directory where the model is located.
* `--precision` - the model precision: fp16, int8 or int4.
* `--output` - the path where the converted model is saved.
* If you have difficulty accessing `huggingface`, you can try using the `hf-mirror` endpoint to download the model:

Linux
```
export HF_ENDPOINT=https://hf-mirror.com
```
Windows PowerShell
```
$env:HF_ENDPOINT = "https://hf-mirror.com"
```
Download the model
```
huggingface-cli download --resume-download --local-dir-use-symlinks False Qwen/Qwen1.5-0.5B-Chat --local-dir {your_path}/Qwen1.5-0.5B-Chat
```
## 3. Run the streaming chatbot
```
python3 chat.py --model_path {your_path}/Qwen1.5-0.5B-Chat-ov --max_sequence_length 4096 --device CPU
```
### Available parameters
* `--model_path` - the path to the directory where the OpenVINO IR model is located.
* `--max_sequence_length` - the maximum number of output tokens.
* `--device` - the device to run inference on, e.g. "CPU" or "GPU".
## Example
```
====Starting conversation====
User: Hello
Qwen2-OpenVINO: Hello! How can I help you?

User: Who are you?
Qwen2-OpenVINO: I am a large language model from Alibaba Cloud, and my name is Tongyi Qianwen.

User: Please tell me a story
Qwen2-OpenVINO: Sure, here is a story about a little rabbit and its friends.

One day, the little rabbit and his friends decided to go exploring in the forest. They packed food, water and some tools, and set off on their journey. Along the way, they met all kinds of animals, including squirrels, foxes and birds. They played together, shared their food, and helped each other solve problems. Finally, deep in the forest, they found a mysterious cave filled with treasure. They brought all the treasure home and celebrated their happy adventure.

User: Please give this story a title
Qwen2-OpenVINO: "The Adventure of the Little Rabbit and Its Friends"
```
## Common problems
1. Do I need to install the OpenVINO C++ inference engine?
   - No, it is not required.
2. Do I have to use Intel hardware?
   - We have only tried it on Intel devices, and we recommend using x86 Intel devices, including but not limited to:
   - Intel CPUs, including desktop and server CPUs.
   - Intel integrated GPUs, e.g. the Arc™ and Iris® series.
   - Intel discrete GPUs, e.g. the Arc™ A770.
3. Why can't OpenVINO find a GPU device on my system?
   - Ensure the OpenCL drivers are installed correctly.
   - Ensure you have sufficient permissions to access the GPU device.
   - More information can be found in [Install GPU drivers](https://github.com/openvinotoolkit/openvino_notebooks/wiki/Ubuntu#1-install-python-git-and-gpu-drivers-optional).
4. Is C++ supported?
   - Please refer to this C++ [example](https://github.com/openvinotoolkit/openvino.genai/tree/master/text_generation/causal_lm/cpp).
125 changes: 125 additions & 0 deletions chat.py
import argparse
from typing import List, Tuple
from threading import Thread
import torch
from optimum.intel.openvino import OVModelForCausalLM
from transformers import (AutoTokenizer, AutoConfig,
                          TextIteratorStreamer, StoppingCriteriaList, StoppingCriteria)


class StopOnTokens(StoppingCriteria):
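    """Stop generation once the last generated token matches any stop ID."""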
def __init__(self, token_ids):
self.token_ids = token_ids

def __call__(
self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs
) -> bool:
for stop_id in self.token_ids:
if input_ids[0][-1] == stop_id:
return True
return False


if __name__ == "__main__":
parser = argparse.ArgumentParser(add_help=False)
parser.add_argument('-h',
'--help',
action='help',
help='Show this help message and exit.')
    parser.add_argument('-m',
                        '--model_path',
                        required=True,
                        type=str,
                        help='Required. Path to the OpenVINO IR model directory.')
    parser.add_argument('-l',
                        '--max_sequence_length',
                        default=256,
                        required=False,
                        type=int,
                        help='Optional. Maximum number of output tokens (default: 256).')
    parser.add_argument('-d',
                        '--device',
                        default='CPU',
                        required=False,
                        type=str,
                        help='Optional. Device for inference, e.g. CPU or GPU (default: CPU).')
args = parser.parse_args()
model_dir = args.model_path

    # Latency-oriented OpenVINO settings: one inference stream and an empty
    # CACHE_DIR, which leaves on-disk model caching disabled.
    ov_config = {"PERFORMANCE_HINT": "LATENCY",
                 "NUM_STREAMS": "1", "CACHE_DIR": ""}

tokenizer = AutoTokenizer.from_pretrained(
model_dir, trust_remote_code=True)
print("====Compiling model====")
ov_model = OVModelForCausalLM.from_pretrained(
model_dir,
device=args.device,
ov_config=ov_config,
config=AutoConfig.from_pretrained(model_dir, trust_remote_code=True),
trust_remote_code=True,
)

streamer = TextIteratorStreamer(
tokenizer, timeout=60.0, skip_prompt=True, skip_special_tokens=True
)
    # Qwen stop-token IDs: 151643 (<|endoftext|>) and 151645 (<|im_end|>)
    stop_tokens = [151643, 151645]
    stop_tokens = [StopOnTokens(stop_tokens)]

def convert_history_to_token(history: List[Tuple[str, str]]):
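        """Build model input IDs from the (user, assistant) message history.

        For Qwen chat models, apply_chat_template renders ChatML-style
        markup (<|im_start|>role ... <|im_end|>) and, because
        return_tensors="pt" is set, returns the token IDs as a PyTorch tensor.
        """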

messages = []
for idx, (user_msg, model_msg) in enumerate(history):
if idx == len(history) - 1 and not model_msg:
messages.append({"role": "user", "content": user_msg})
break
if user_msg:
messages.append({"role": "user", "content": user_msg})
if model_msg:
messages.append({"role": "assistant", "content": model_msg})

model_inputs = tokenizer.apply_chat_template(messages,
add_generation_prompt=True,
tokenize=True,
return_tensors="pt")
return model_inputs

history = []
print("====Starting conversation====")
while True:
        input_text = input("User: ")
if input_text.lower() == 'stop':
break

if input_text.lower() == 'clear':
history = []
print("AI助手: 对话历史已清空")
continue

print("Qwen2-OpenVINO:", end=" ")
history = history + [[input_text, ""]]
model_inputs = convert_history_to_token(history)
generate_kwargs = dict(
input_ids=model_inputs,
max_new_tokens=args.max_sequence_length,
temperature=0.1,
do_sample=True,
top_p=1.0,
top_k=50,
repetition_penalty=1.1,
streamer=streamer,
stopping_criteria=StoppingCriteriaList(stop_tokens),
            pad_token_id=151645,  # pad with <|im_end|>
)

        # Run generation on a background thread; TextIteratorStreamer yields
        # the decoded text back in the main thread as tokens are produced.
        t1 = Thread(target=ov_model.generate, kwargs=generate_kwargs)
        t1.start()

        partial_text = ""
        for new_text in streamer:
            print(new_text, end="", flush=True)
            partial_text += new_text
print("\n")
history[-1][1] = partial_text