Add BabyAGI bootcamp (#334)
Signed-off-by: shiyu22 <shiyu.chen@zilliz.com>
shiyu22 authored May 11, 2023
1 parent d28f85d commit c9b2dc8
Showing 2 changed files with 362 additions and 0 deletions.
361 changes: 361 additions & 0 deletions docs/bootcamp/langchain/baby_agi.ipynb
@@ -0,0 +1,361 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "517a9fd4",
"metadata": {},
"source": [
"# BabyAGI User Guide\n",
"\n",
"This notebook demonstrates how to implement [BabyAGI](https://github.com/yoheinakajima/babyagi/tree/main) by [Yohei Nakajima](https://twitter.com/yoheinakajima). BabyAGI is an AI agent that can generate and pretend to execute tasks based on a given objective, and you can find the origin notebook in [LangChain example](https://github.com/hwchase17/langchain/blob/master/docs/use_cases/autonomous_agents/baby_agi.ipynb).\n",
"\n",
"This guide will help you understand the components to create your own recursive agents, And it will also show how to use GPTCache to cache the response. You can also try this example on [Google Colab](https://colab.research.google.com/drive/1WTvWIujioZtpwwVz7GGDzYhooul-rBeG?usp=sharing).\n",
"\n",
"Although BabyAGI uses specific vectorstores/model providers (Pinecone, OpenAI), one of the benefits of implementing it with LangChain is that you can easily swap those out for different options. In this implementation we use a Milvus vector datavase (because it runs locally and is free).\n",
"\n",
" "
]
},
{
"cell_type": "markdown",
"id": "47ef6fe7",
"metadata": {},
"source": [
"## Go into GPTCache\n",
"\n",
"Please [install gptcache](https://gptcache.readthedocs.io/en/latest/index.html#) first, then we can initialize the cache.There are two ways to initialize the cache, the first is to use the map cache (exact match cache) and the second is to use the DataBse cache (similar search cache), it is more recommended to use the second one, but you have to install the related requirements.\n",
"\n",
"Before running the example, make sure the `OPENAI_API_KEY` environment variable is set by executing `echo $OPENAI_API_KEY`. If it is not already set, it can be set by using `export OPENAI_API_KEY=YOUR_API_KEY` on Unix/Linux/MacOS systems or `set OPENAI_API_KEY=YOUR_API_KEY` on Windows systems. And there is `get_content_func` for the cache settings:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "9cbf1dbe",
"metadata": {},
"outputs": [],
"source": [
"# get the content(only question) form the prompt to cache\n",
"def get_content_func(data, **_):\n",
" return data.get(\"prompt\").split(\"Question\")[-1]"
]
},
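{
"cell_type": "markdown",
"id": "0f1e2d3c",
"metadata": {},
"source": [
"As a quick illustration (the sample prompt below is made up for this guide), the function keeps only the text after the last occurrence of `Question`, so prompts that differ in their surrounding instructions but ask the same question map to the same cache key:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4b5c6d7e",
"metadata": {},
"outputs": [],
"source": [
"# illustrative only: show which part of a prompt becomes the cache key\n",
"sample_prompt = \"Answer concisely.\\nQuestion: What is the weather in SF today?\"\n",
"print(get_content_func({\"prompt\": sample_prompt}))"
]
},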
{
"cell_type": "markdown",
"id": "93a8cb06",
"metadata": {},
"source": [
"### 1. Init for exact match cache"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "daebf19a",
"metadata": {},
"outputs": [],
"source": [
"# from gptcache import cache\n",
"# cache.init(pre_embedding_func=get_content_func)\n",
"# cache.set_openai_key()"
]
},
{
"cell_type": "markdown",
"id": "e1f328a8",
"metadata": {},
"source": [
"### 2. Init for similar match cache\n",
"\n",
"When initializing gptcahe, the following four parameters are configured:\n",
"\n",
"- `pre_embedding_func`: pre-processing before extracting feature vectors, it will use the `get_content_func` method\n",
"- `embedding_func`: the method to extract the text feature vector\n",
"- `data_manager`: DataManager for cache management\n",
"- `similarity_evaluation`: the evaluation method after the cache hit\n",
"\n",
"The `data_manager` is used to audio feature vector, response text in the example, it takes [Milvus](https://milvus.io/docs) (please make sure it is started), you can also configure other vector storage, refer to [VectorBase API](https://gptcache.readthedocs.io/en/latest/references/manager.html#module-gptcache.manager.vector_data)."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "23197b36",
"metadata": {},
"outputs": [],
"source": [
"from gptcache import cache\n",
"from gptcache.embedding import Onnx\n",
"from gptcache.manager import CacheBase, VectorBase, get_data_manager\n",
"from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation\n",
"\n",
"\n",
"onnx = Onnx()\n",
"cache_base = CacheBase('sqlite')\n",
"vector_base = VectorBase('milvus', host='127.0.0.1', port='19530', dimension=onnx.dimension)\n",
"data_manager = get_data_manager(cache_base, vector_base)\n",
"cache.init(\n",
" pre_embedding_func=get_content_func,\n",
" embedding_func=onnx.to_embeddings,\n",
" data_manager=data_manager,\n",
" similarity_evaluation=SearchDistanceEvaluation(),\n",
" )\n",
"cache.set_openai_key()"
]
},
{
"cell_type": "markdown",
"id": "2092f7e6",
"metadata": {},
"source": [
"After initializing the cache, you can use the LangChain LLMs with `gptcache.adapter.langchain_models`. At this point **gptcache** will cache the answer, the only difference from the original example is to change `llm = OpenAI(temperature=0)` to `llm = LangChainLLMs(llm=OpenAI(temperature=0))`, which will be commented in the code block. And you can also set the `session` to set the session settings with `llm = LangChainLLMs(llm=OpenAI(temperature=0), session=session)`, more details refer to [session example](https://github.com/zilliztech/GPTCache/tree/main/examples#how-to-run-with-session).\n",
"\n",
"Then you will find that it will be more fast when search the similar content, let's play with it."
]
},
{
"cell_type": "markdown",
"id": "556af556",
"metadata": {},
"source": [
"## Install and Import Required Modules"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "c8a354b6",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from collections import deque\n",
"from typing import Dict, List, Optional, Any\n",
"\n",
"from langchain import LLMChain, OpenAI, PromptTemplate\n",
"from langchain.embeddings import OpenAIEmbeddings\n",
"from langchain.llms import BaseLLM\n",
"from langchain.vectorstores.base import VectorStore\n",
"from pydantic import BaseModel, Field\n",
"from langchain.chains.base import Chain\n",
"from langchain.experimental import BabyAGI\n",
"\n",
"from gptcache.adapter.langchain_models import LangChainLLMs\n",
"from gptcache.session import Session\n",
"session = Session(name=\"baby_agi\") # set session for LangChainLLMs"
]
},
{
"cell_type": "markdown",
"id": "09f70772",
"metadata": {},
"source": [
"## Connect to the Vector Store\n",
"\n",
"Depending on what vectorstore you use, this step may look different."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "794045d4",
"metadata": {},
"outputs": [],
"source": [
"# from langchain.vectorstores import FAISS\n",
"from langchain.vectorstores import Milvus\n",
"from langchain.docstore import InMemoryDocstore"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "6e0305eb",
"metadata": {},
"outputs": [],
"source": [
"# Define your embedding model\n",
"embeddings_model = OpenAIEmbeddings()\n",
"embedding_size = 1536\n",
"vectorstore = Milvus(embeddings_model,\n",
" collection_name=\"baby_agi\",\n",
" connection_args={\"host\": \"127.0.0.1\", \"port\": \"19530\"},\n",
" )"
]
},
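{
"cell_type": "markdown",
"id": "7c8d9e0f",
"metadata": {},
"source": [
"Because BabyAGI only depends on the generic `VectorStore` interface, the Milvus store above can be swapped out. As a hedged sketch following the original LangChain example (which pairs FAISS with an in-memory docstore), the alternative would look like this:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2a3b4c5d",
"metadata": {},
"outputs": [],
"source": [
"# Alternative vector store, sketched from the original LangChain example;\n",
"# uncomment (and `pip install faiss-cpu`) to use FAISS instead of Milvus.\n",
"# Also uncomment the FAISS import in the earlier imports cell.\n",
"# import faiss\n",
"# index = faiss.IndexFlatL2(embedding_size)\n",
"# vectorstore = FAISS(embeddings_model.embed_query, index, InMemoryDocstore({}), {})"
]
},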
{
"cell_type": "markdown",
"id": "05ba762e",
"metadata": {},
"source": [
"### Run the BabyAGI\n",
"\n",
"Now it's time to create the BabyAGI controller and watch it try to accomplish your objective."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "3d220b69",
"metadata": {},
"outputs": [],
"source": [
"OBJECTIVE = \"Write a weather report for SF today\""
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "8a8e5543",
"metadata": {},
"outputs": [],
"source": [
"# llm = OpenAI(temperature=0)\n",
"llm = LangChainLLMs(llm=OpenAI(temperature=0), session=session)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "3d69899b",
"metadata": {},
"outputs": [],
"source": [
"# Logging of LLMChains\n",
"verbose = False\n",
"# If None, will keep on going forever\n",
"max_iterations: Optional[int] = 3\n",
"baby_agi = BabyAGI.from_llm(\n",
" llm=llm, vectorstore=vectorstore, verbose=verbose, max_iterations=max_iterations\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "f7957b51",
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[95m\u001b[1m\n",
"*****TASK LIST*****\n",
"\u001b[0m\u001b[0m\n",
"1: Make a todo list\n",
"\u001b[92m\u001b[1m\n",
"*****NEXT TASK*****\n",
"\u001b[0m\u001b[0m\n",
"1: Make a todo list\n",
"\u001b[93m\u001b[1m\n",
"*****TASK RESULT*****\n",
"\u001b[0m\u001b[0m\n",
"\n",
"\n",
"1. Check the weather forecast for San Francisco today\n",
"2. Make note of the temperature, humidity, wind speed, and other relevant weather conditions\n",
"3. Write a weather report summarizing the forecast\n",
"4. Check for any weather alerts or warnings\n",
"5. Share the report with the relevant stakeholders\n",
"\u001b[95m\u001b[1m\n",
"*****TASK LIST*****\n",
"\u001b[0m\u001b[0m\n",
"2: Check the current temperature in San Francisco\n",
"3: Check the current humidity in San Francisco\n",
"4: Check the current wind speed in San Francisco\n",
"5: Check for any weather alerts or warnings in San Francisco\n",
"6: Check the forecast for the next 24 hours in San Francisco\n",
"7: Check the forecast for the next 48 hours in San Francisco\n",
"8: Check the forecast for the next 72 hours in San Francisco\n",
"9: Check the forecast for the next week in San Francisco\n",
"10: Check the forecast for the next month in San Francisco\n",
"11: Check the forecast for the next 3 months in San Francisco\n",
"1: Write a weather report for SF today\n",
"\u001b[92m\u001b[1m\n",
"*****NEXT TASK*****\n",
"\u001b[0m\u001b[0m\n",
"2: Check the current temperature in San Francisco\n",
"\u001b[93m\u001b[1m\n",
"*****TASK RESULT*****\n",
"\u001b[0m\u001b[0m\n",
"\n",
"\n",
"I will check the current temperature in San Francisco. I will use an online weather service to get the most up-to-date information.\n",
"\u001b[95m\u001b[1m\n",
"*****TASK LIST*****\n",
"\u001b[0m\u001b[0m\n",
"3: Check the current UV index in San Francisco.\n",
"4: Check the current air quality in San Francisco.\n",
"5: Check the current precipitation levels in San Francisco.\n",
"6: Check the current cloud cover in San Francisco.\n",
"7: Check the current barometric pressure in San Francisco.\n",
"8: Check the current dew point in San Francisco.\n",
"9: Check the current wind direction in San Francisco.\n",
"10: Check the current humidity levels in San Francisco.\n",
"1: Check the current temperature in San Francisco to the average temperature for this time of year.\n",
"2: Check the current visibility in San Francisco.\n",
"11: Write a weather report for SF today.\n",
"\u001b[92m\u001b[1m\n",
"*****NEXT TASK*****\n",
"\u001b[0m\u001b[0m\n",
"3: Check the current UV index in San Francisco.\n",
"\u001b[93m\u001b[1m\n",
"*****TASK RESULT*****\n",
"\u001b[0m\u001b[0m\n",
"\n",
"\n",
"I have checked the current UV index in San Francisco and it is currently at a moderate level of 5. This means that it is safe to be outside for short periods of time without sunscreen, but it is still recommended to wear sunscreen and protective clothing when outside for extended periods of time.\n",
"\u001b[91m\u001b[1m\n",
"*****TASK ENDING*****\n",
"\u001b[0m\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"{'objective': 'Write a weather report for SF today'}"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"baby_agi({\"objective\": OBJECTIVE})"
]
},
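{
"cell_type": "markdown",
"id": "9d0e1f2a",
"metadata": {},
"source": [
"Since the LLM calls above are now cached, re-running the same objective should be served largely from GPTCache and finish noticeably faster. The timing below is an illustrative sketch, not part of the original example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6e7f8a9b",
"metadata": {},
"outputs": [],
"source": [
"import time\n",
"\n",
"# illustrative: a repeat run should mostly hit the cache and return faster\n",
"start = time.time()\n",
"baby_agi({\"objective\": OBJECTIVE})\n",
"print(f\"elapsed: {time.time() - start:.2f}s\")"
]
},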
{
"cell_type": "code",
"execution_count": null,
"id": "898a210b",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
1 change: 1 addition & 0 deletions docs/toc.bak
@@ -16,6 +16,7 @@
bootcamp/langchain/qa_generation
bootcamp/langchain/question_answering
bootcamp/langchain/sqlite
bootcamp/langchain/baby_agi
bootcamp/openai/chat
bootcamp/openai/language_translate
bootcamp/openai/sql_translate
