Commit
Merge pull request baidubce#49 from baidubce/qa_cookbook_update
update question_answering.ipynb
stonekim authored Nov 2, 2023
2 parents e5ff01d + eff6225 commit ed67e58
Showing 4 changed files with 78 additions and 589 deletions.
21 changes: 16 additions & 5 deletions README.md
@@ -23,9 +23,23 @@ import qianfan

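## Quick Start

Before using the Qianfan SDK, users need to create an application on the Qianfan platform to obtain an API Key (**AK**) and a Secret Key (**SK**). The AK and SK are the credentials required when calling the Qianfan SDK. For the detailed procedure, see the platform's [application access guide](https://cloud.baidu.com/doc/WENXINWORKSHOP/s/Slkkydake)
Before using the Qianfan SDK, users need to create an application on the Qianfan platform to obtain an API Key (**AK**) and a Secret Key (**SK**). The AK and SK are the credentials required when calling the Qianfan SDK. For the detailed procedure, see the platform's [application access guide](https://cloud.baidu.com/doc/WENXINWORKSHOP/s/Slkkydake). Once the AK and SK have been obtained and configured, users can start using the SDK: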

```python
import os
import qianfan

os.environ["QIANFAN_AK"]="..."
os.environ["QIANFAN_SK"]="..."

chat_comp = qianfan.ChatCompletion()
```


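## SDK Configuration

The Qianfan SDK exposes a number of parameters that users can set. Three configuration methods are currently supported, listed from lowest to highest priority:

After obtaining the AK and SK, users also need to pass them in to initialize the Qianfan SDK. The Qianfan SDK supports the following three ways of passing them, listed from lowest to highest priority:
1. Read from a DotEnv file. A sample configuration file listing the parameters and their types is available [here](https://github.com/baidubce/bce-qianfan-sdk/blob/main/dotenv_config_sample.env). By default the SDK reads the `.env` file in the working directory; users can set the environment variable `QIANFAN_DOT_ENV_CONFIG_FILE` before the program runs to specify which configuration file to use.

2. Read from environment variables. The configurable parameters are the same as in method 1. For example, in code, users can set them like this: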
@@ -61,9 +75,6 @@ config.SK = "..."
chat_comp = qianfan.ChatCompletion()
```
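
The priority order described in this section can be illustrated with a small stand-alone sketch. It does not use the Qianfan SDK and is not the SDK's implementation: `resolve_setting`, the `DEMO_AK` / `DEMO_SK` names, and the two dictionaries are hypothetical, and the assumption that the highest-priority method is assignment in code is inferred only from the `config.SK = "..."` context line in this diff.

```python
import os


def resolve_setting(name, dotenv_values, code_overrides):
    """Return the effective value for `name` (hypothetical helper).

    Sources are checked from highest to lowest priority:
    values set in code, then environment variables, then DotEnv values.
    """
    # Highest priority (assumed): a value assigned directly in code.
    if name in code_overrides:
        return code_overrides[name]
    # Middle priority: a process environment variable.
    if name in os.environ:
        return os.environ[name]
    # Lowest priority: a value parsed from a DotEnv file, or None if absent.
    return dotenv_values.get(name)


# Hypothetical values standing in for a parsed .env file.
dotenv_values = {"DEMO_AK": "from-dotenv", "DEMO_SK": "from-dotenv"}

# Only DEMO_SK is also present in the environment.
os.environ.pop("DEMO_AK", None)
os.environ["DEMO_SK"] = "from-env"

ak = resolve_setting("DEMO_AK", dotenv_values, {})
sk = resolve_setting("DEMO_SK", dotenv_values, {})
sk_in_code = resolve_setting("DEMO_SK", dotenv_values, {"DEMO_SK": "from-code"})
print(ak, sk, sk_in_code)
```

Each source overrides the ones below it, so a value can live in a `.env` file during development and be overridden per process or per call site without editing that file.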



## Features

The Qianfan SDK currently supports the following features
Binary file not shown.
152 changes: 62 additions & 90 deletions cookbook/question_anwsering/question_answering.ipynb
@@ -43,26 +43,27 @@
"outputs": [],
"source": [
"!pip install langchain\n",
"!pip install chromadb\n",
"!pip install qianfan"
]
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"id": "981930ef-a0b4-46f9-b60b-a495117ea38e",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ['QIANFAN_AK'] = \"your_api_key\"\n",
"os.environ['QIANFAN_SK'] = \"your_secret_key\"\n",
"os.environ['QIANFAN_AK'] = \"your_ak\"\n",
"os.environ['QIANFAN_SK'] = \"your_sk\"\n",
"\n",
"# LangSmith feature switches. Uncomment these lines and set the variables only if you know what they do and want to use LangSmith.\n",
"# os.environ['LANGCHAIN_TRACING_V2'] = \"true\"\n",
"# os.environ['LANGCHAIN_ENDPOINT'] = \"https://api.smith.langchain.com\"\n",
"# os.environ['LANGCHAIN_API_KEY'] = \"your_langchian_api_key\"\n",
"# os.environ['LANGCHAIN_PROJECT'] = \"your_project_name\"\n",
"# os.environ['LANGCHAIN_API_KEY'] = \"LANGCHAIN_API_KEY\"\n",
"# os.environ['LANGCHAIN_PROJECT'] = \"LANGCHAIN_PROJECT\"\n",
"\n",
"\n",
"is_chinese = True\n",
@@ -107,12 +108,14 @@
"source": [
"## Step 1. Load\n",
"\n",
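"Specify a `DocumentLoader` to load your unstructured data into `Documents`. A `Document` is a piece of text (the `page_content`) together with its associated metadata."
"Specify a `DocumentLoader` to load your unstructured data into `Documents`. A `Document` is a piece of text (the `page_content`) together with its associated metadata.\n",
"\n",
"Here we use `WebBaseLoader` to load `Documents` from a web page."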
]
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": null,
"id": "cf4d5c72",
"metadata": {},
"outputs": [],
@@ -123,6 +126,31 @@
"data = loader.load()"
]
},
{
"cell_type": "markdown",
"id": "f3fa0645",
"metadata": {},
"source": [
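"LangChain provides a wide variety of loaders to help users read data from different sources. These loaders are all declared in the `langchain.document_loaders` package.\n",
"\n",
"For our Chinese example, we also provide a demonstration of reading `Document` objects from a PDF:"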
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9c079033",
"metadata": {},
"outputs": [],
"source": [
"! pip install pdfplumber\n",
"from langchain.document_loaders import PDFPlumberLoader\n",
"\n",
"if is_chinese:\n",
" loader = PDFPlumberLoader(\"example_data/中国古代史-明朝.pdf\")\n",
" data = loader.load()"
]
},
{
"cell_type": "markdown",
"id": "fd2cc9a7",
@@ -135,7 +163,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": null,
"id": "4b11c01d",
"metadata": {},
"outputs": [],
@@ -158,21 +186,10 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": null,
"id": "e9c302c8",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"[INFO] [09-15 20:50:50] logging.py:55 [t:8485264192]: trying to refresh access_token\n",
"[INFO] [09-15 20:50:50] logging.py:55 [t:8485264192]: sucessfully refresh access_token\n",
"[INFO] [09-15 20:50:50] logging.py:55 [t:8485264192]: requesting llm api endpoint: /embeddings/embedding-v1\n",
"[INFO] [09-15 20:50:51] logging.py:55 [t:8485264192]: requesting llm api endpoint: /embeddings/embedding-v1\n"
]
}
],
"outputs": [],
"source": [
"from langchain.embeddings import QianfanEmbeddingsEndpoint\n",
"from langchain.vectorstores import Chroma\n",
@@ -224,11 +241,14 @@
"source": [
"from langchain.chains import RetrievalQA\n",
"from langchain.chat_models import QianfanChatEndpoint\n",
"from langchain.prompts import PromptTemplate\n",
"\n",
"QA_CHAIN_PROMPT = PromptTemplate.from_template(CUSTOM_PROMPT_TEMPLATE)\n",
"\n",
"llm = QianfanChatEndpoint()\n",
"retriever=vectorstore.as_retriever(search_type=\"similarity_score_threshold\", search_kwargs={'score_threshold': 0.0})\n",
" \n",
"qa_chain = RetrievalQA.from_chain_type(llm, retriever=retriever)\n",
"qa_chain = RetrievalQA.from_chain_type(llm, retriever=retriever, chain_type_kwargs={\"prompt\": QA_CHAIN_PROMPT})\n",
"qa_chain({\"query\": QUESTION1})"
]
},
@@ -237,73 +257,48 @@
"id": "f7d52c84",
"metadata": {},
"source": [
"Note that `RetrievalQA` accepts not only a `ChatModel` here but also an `LLM` object."
]
},
{
"cell_type": "markdown",
"id": "fa82f437",
"metadata": {},
"source": [
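"#### Custom prompt\n",
"In the run above, we called a large model provided by the Qianfan platform and successfully answered a question.\n",
"\n",
"In addition, the `prompt` parameter used by the `RetrievalQA` chain can be customized. Because the default prompt that LangChain provides for the `RetrievalQA` chain is written in English, we have replaced it here with our own hand-written Chinese prompt, optimized for Chinese-language contexts.\n",
"\n",
"The `prompt` parameter used by the `RetrievalQA` chain can also be customized. Because the default prompt provided by the `RetrievalQA` chain is written in English, we can replace it here with our own hand-written Chinese prompt."
"#### Using a different large model\n",
"\n",
"\n",
"Besides the default model, `ERNIE-Bot-turbo`, users can set the `model` parameter of the `QianfanChatEndpoint` above to select a different large model. For example, to use the ERNIE-Bot-4 model, set it like this:"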
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e4fee704",
"id": "05f194c9",
"metadata": {},
"outputs": [],
"source": [
"from langchain.prompts import PromptTemplate\n",
"\n",
"QA_CHAIN_PROMPT = PromptTemplate.from_template(CUSTOM_PROMPT_TEMPLATE)\n",
"llm = QianfanChatEndpoint(model=\"ERNIE-Bot-4\")\n",
"\n",
"qa_chain = RetrievalQA.from_chain_type(llm, retriever=retriever, chain_type_kwargs={\"prompt\": QA_CHAIN_PROMPT})\n",
"result = qa_chain({\"query\": QUESTION1})\n",
"result[\"result\"]"
"qa_chain({\"query\": QUESTION1})"
]
},
{
"cell_type": "markdown",
"id": "c825e9bf-6a56-46e4-8bbb-05441f76cb96",
"id": "214cfd40",
"metadata": {},
"source": [
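"We can also choose to store the data in the Prompt Hub of Qianfan Smith. This requires the Qianfan Smith SDK.\n",
"\n",
"An example is shown below."
"Alternatively, if you have already purchased resources on the Qianfan platform and deployed your own large model service, the Qianfan LangChain component also provides an `endpoint` parameter that lets you call your own fine-tuned model from LangChain. Users who have such a service can uncomment and modify the code below to try it out."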
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a896060f-ebc4-4236-a4ad-32960601c6e8",
"id": "b8456042",
"metadata": {},
"outputs": [],
"source": [
"# !pip install qianfansmithhub"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "aef8e734-ba54-48ae-b959-1898618f2d90",
"metadata": {},
"outputs": [],
"source": [
"# RAG prompt\n",
"# from qianfansmithhub import hub\n",
"# QA_CHAIN_PROMPT_HUB = hub.pull(\"rlm/rag-prompt\")\n",
"\n",
"# qa_chain = RetrievalQA.from_chain_type(\n",
"# llm,\n",
"# retriever=vectorstore.as_retriever(),\n",
"# chain_type_kwargs={\"prompt\": QA_CHAIN_PROMPT_HUB}\n",
"# )\n",
"# result = qa_chain({\"query\": question})\n",
"# result[\"result\"]"
"# llm = QianfanChatEndpoint(endpoint=\"your_service_endpoint\")\n",
"\n",
"# qa_chain = RetrievalQA.from_chain_type(llm, retriever=retriever, chain_type_kwargs={\"prompt\": QA_CHAIN_PROMPT})\n",
"# qa_chain({\"query\": QUESTION1})"
]
},
{
@@ -325,35 +320,12 @@
"source": [
"from langchain.chains import RetrievalQA\n",
"\n",
"qa_chain = RetrievalQA.from_chain_type(llm, retriever=retriever, return_source_documents=True)\n",
"qa_chain = RetrievalQA.from_chain_type(llm, retriever=retriever, chain_type_kwargs={\"prompt\": QA_CHAIN_PROMPT}, return_source_documents=True)\n",
"result = qa_chain({\"query\": QUESTION1})\n",
"len(result['source_documents'])\n",
"result['source_documents']"
]
},
{
"cell_type": "markdown",
"id": "1b600236",
"metadata": {},
"source": [
"#### Returning citations\n",
"\n",
"Alternatively, construct the chain with `RetrievalQAWithSourcesChain` so that the returned result includes the sources it cites."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "948f6d19",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains import RetrievalQAWithSourcesChain\n",
"\n",
"qa_chain = RetrievalQAWithSourcesChain.from_chain_type(llm,retriever=retriever)\n",
"qa_chain({\"question\": QUESTION1})"
]
},
{
"cell_type": "markdown",
"id": "4380e478-e8ae-404b-9577-6b15475a6562",
@@ -375,7 +347,7 @@
"from langchain.chains import ConversationalRetrievalChain\n",
"\n",
"memory = ConversationSummaryMemory(llm=llm,memory_key=\"chat_history\",return_messages=True)\n",
"qa = ConversationalRetrievalChain.from_llm(llm, retriever=retriever, memory=memory)\n",
"qa = ConversationalRetrievalChain.from_llm(llm, retriever=retriever, memory=memory, combine_docs_chain_kwargs={\"prompt\": QA_CHAIN_PROMPT})\n",
"qa(QUESTION1)"
]
},
@@ -406,7 +378,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
"version": "3.10.13"
}
},
"nbformat": 4,