{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "id": "ijGzTHJJUCPY" }, "outputs": [], "source": [ "# Copyright 2023 Google LLC\n", "#\n", "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "VEqbX8OhE8y9" }, "source": [ "# Getting Started with the Vertex AI PaLM API & Python SDK\n", "\n", "\n", " \n", " \n", " \n", "
\n", " \n", " \"Colab Run in Colab\n", " \n", " \n", " \n", " \"GitHub\n", " View on GitHub\n", " \n", " \n", " \n", " \"Vertex\n", " Open in Vertex AI Workbench\n", " \n", "
" ] }, { "cell_type": "markdown", "metadata": { "id": "VK1Q5ZYdVL4Y" }, "source": [ "## Overview\n", "\n", "### What are LLMs?\n", "Large language models (LLMs) are deep learning models trained on massive datasets of text. LLMs can translate language, summarize text, generate creative writing, generate code, power chatbots and virtual assistants, and complement search engines and recommendation systems. \n", "\n", "### PaLM\n", "Following its predecessor, [PaLM](https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html), [PaLM 2](https://ai.google/discover/palm2) is Google's next generation large language model that builds on Google’s legacy of breakthrough research in machine learning and responsible AI. PaLM 2 excels at tasks like advanced reasoning, translation, and code generation because of how it was built. \n", "\n", "PaLM 2 [excels](https://ai.google/static/documents/palm2techreport.pdf) at advanced reasoning tasks, including code and math, classification and question answering, translation and multilingual proficiency, and natural language generation better than our previous state-of-the-art LLMs, including PaLM. It can accomplish these tasks because of the way it was built – bringing together compute-optimal scaling, an improved dataset mixture, and model architecture improvements.\n", "\n", "PaLM 2 is grounded in Google’s approach to building and deploying AI responsibly. It was evaluated rigorously for its potential harms and biases, capabilities and downstream uses in research and in-product applications. It’s being used in other state-of-the-art models, like Med-PaLM 2 and Sec-PaLM, and is powering generative AI features and tools at Google, like Bard and the PaLM API.\n", "\n", "PaLM is pre-trained on a wide range of text data using an unsupervised learning approach, without any specific task. During this pre-training process, PaLM learns to predict the next word in a sentence, given the preceding words. This enables the model to generate coherent, fluent text resembling human writing.\n", "This large size enables it to learn complex patterns and relationships in language and generate high-quality text for various applications. This is why models like PaLM are referred to as \"foundational models.\"\n", "\n", "Creating an LLM requires massive amounts of data, significant compute resources, and specialized skills. Because LLMs require a big investment to create, they target broad rather than specific use cases. On Vertex AI, you can customize a foundation model for more specific tasks or knowledge domains by using prompt design and model tuning.\n", "\n", "### Vertex AI PaLM API\n", "The Vertex AI PaLM API, [released on May 10, 2023](https://cloud.google.com/vertex-ai/docs/generative-ai/release-notes#may_10_2023), is powered by [PaLM 2](https://ai.google/discover/palm2).\n", "\n", "### Using Vertex AI PaLM API\n", "\n", "You can interact with the Vertex AI PaLM API using the following methods:\n", "\n", "* Use the [Generative AI Studio](https://cloud.google.com/generative-ai-studio) for quick testing and command generation.\n", "* Use cURL commands in Cloud Shell.\n", "* Use the Python SDK in a Jupyter notebook\n", "\n", "This notebook focuses on using the Python SDK to call the Vertex AI PaLM API. 
For more information on using Generative AI Studio without writing code, you can explore [Getting Started with the UI instructions](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/getting-started/getting_started_ui.md).\n", "\n", "\n", "For more information, check out the [documentation on generative AI support for Vertex AI](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/overview)." ] }, { "cell_type": "markdown", "metadata": { "id": "RQT500QqVPIb" }, "source": [ "### Objectives\n", "\n", "In this tutorial, you will learn how to use the PaLM API with the Python SDK and explore its various parameters.\n", "\n", "By the end of the notebook, you should be able to understand various nuances of generative model parameters like `temperature`, `top_k`, `top_p`, and how each parameter affects the results.\n", "\n", "The steps performed include:\n", "\n", "- Installing the Python SDK\n", "- Using the Vertex AI PaLM API\n", "    - Text generation model with `text-bison@001`\n", "    - Understanding model parameters (`temperature`, `max_output_tokens`, `top_k`, `top_p`)\n", "    - Chat model with `chat-bison@001`\n", "    - Embeddings model with `textembedding-gecko@001`\n", "    " ] }, { "cell_type": "markdown", "metadata": { "id": "1y6_3dTwV2fI" }, "source": [ "### Costs\n", "This tutorial uses billable components of Google Cloud:\n", "\n", "* Vertex AI Generative AI Studio\n", "\n", "Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing),\n", "and use the [Pricing Calculator](https://cloud.google.com/products/calculator/)\n", "to generate a cost estimate based on your projected usage." ] }, { "cell_type": "markdown", "metadata": { "id": "6ae098456471" }, "source": [ "### Data security\n", "**Q: Does Google use customer data to improve its foundation models?**  \n", "A: No, Google does not use customer data to improve foundation models. Customer data is only used to generate a response from the model.\n", "\n", "**Q: Do Google employees see data that I submit to the model?**  \n", "A: No, Google employees have no access to customer data and all data is encrypted in-transit, in-use, and at-rest.\n", "\n", "**Q: Does Google store any of the customer data that is sent to the model?**  \n", "A: No, Google does not store customer data. However, Google may temporarily cache customer data for the duration of the request, such as prompt tuning pipeline and batch prediction.\n", "\n", "**Q: Does Google log data?**  \n", "A: No, Google does not log customer data. System-level logs help Google ensure system health and availability." ] }, { "cell_type": "markdown", "metadata": { "id": "fc389a25bf64" }, "source": [ "### Responsible AI\n", "Large language models (LLMs) can translate language, summarize text, generate creative writing, generate code, power chatbots and virtual assistants, and complement search engines and recommendation systems. At the same time, as an early-stage technology, its evolving capabilities and uses create potential for misapplication, misuse, and unintended or unforeseen consequences. Large language models can generate output that you don't expect, including text that's offensive, insensitive, or factually incorrect.\n", "\n", "What's more, the incredible versatility of LLMs is also what makes it difficult to predict exactly what kinds of unintended or unforeseen outputs they might produce. Given these risks and complexities, the PaLM API is designed with [Google's AI Principles](https://ai.google/principles/) in mind. 
However, it is important for developers to understand and test their models to deploy safely and responsibly. To aid developers, the Generative AI Studio has built-in content filtering, and the PaLM API has safety attribute scoring to help customers test Google's safety filters and define confidence thresholds that are right for their use case and business. Please refer to the [Safety filters and attributes](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/responsible-ai#safety_filters_and_attributes) section to learn more.\n", "\n", "When the PaLM API is integrated into a customer's unique use case and context, additional responsible AI considerations and [PaLM limitations](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/responsible-ai#palm_limitations) may need to be considered. We encourage customers to leverage fairness, interpretability, privacy and security [recommended practices](https://ai.google/responsibilities/responsible-ai-practices/)." ] }, { "cell_type": "markdown", "metadata": { "id": "QDU0XJ1xRDlL" }, "source": [ "## Getting Started" ] }, { "cell_type": "markdown", "metadata": { "id": "N5afkyDMSBW5" }, "source": [ "### Install Vertex AI SDK" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "kc4WxYmLSBW5" }, "outputs": [], "source": [ "!pip install google-cloud-aiplatform --upgrade --user" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Colab only:** Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "YmY9HVVGSBW5" }, "outputs": [], "source": [ "# # Automatically restart kernel after installs so that your environment can access the new packages\n", "# import IPython\n", "\n", "# app = IPython.Application.instance()\n", "# app.kernel.do_shutdown(True)" ] }, { "cell_type": "markdown", "metadata": { "id": "6Fom0ZkMSBW6" }, "source": [ "### Authenticating your notebook environment\n", "* If you are using **Colab** to run this notebook, uncomment the cell below and continue.\n", "* If you are using **Vertex AI Workbench**, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "LCaCx6PLSBW6" }, "outputs": [], "source": [ "# from google.colab import auth\n", "# auth.authenticate_user()" ] }, { "cell_type": "markdown", "metadata": { "id": "GckO4EysV5BT" }, "source": [ "## Vertex AI PaLM API models" ] }, { "cell_type": "markdown", "metadata": { "id": "BDYqwDmTLgEy" }, "source": [ "The Vertex AI PaLM API enables you to test, customize, and deploy instances of Google’s large language models (LLMs) called PaLM, so that you can leverage the capabilities of PaLM in your applications.\n", "\n", "### Model naming scheme\n", "Foundation model names have three components: use case, model size, and version number. 
The naming convention is in the format: \n", "`<use case>-<model size>@<version number>`\n", "\n", "For example, text-bison@001 represents the Bison text model, version 001.\n", "\n", "The model sizes are as follows:\n", "- **Bison**: The best value in terms of capability and cost.\n", "- **Gecko**: The smallest and cheapest model for simple tasks.\n", "\n", "### Available models\n", "\n", "The Vertex AI PaLM API currently supports three models:\n", "\n", "* `text-bison@001` : Fine-tuned to follow natural language instructions and is suitable for a variety of language tasks.\n", "* `chat-bison@001` : Fine-tuned for multi-turn conversation use cases like building a chatbot.\n", "* `textembedding-gecko@001` : Returns model embeddings for text inputs.\n", "\n", "You can find more information about the properties of these [foundational models in the Generative AI Studio documentation](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/models#foundation_models).\n" ] }, { "cell_type": "markdown", "metadata": { "id": "BuQwwRiniVFG" }, "source": [ "### Import libraries" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Colab only:** Uncomment the following cell to initialize the Vertex AI SDK. For Vertex AI Workbench, you don't need to run this. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# import vertexai\n", "\n", "# PROJECT_ID = \"[your-project-id]\"  # @param {type:\"string\"}\n", "# vertexai.init(project=PROJECT_ID, location=\"us-central1\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "4zjV4alsiVql" }, "outputs": [], "source": [ "import pandas as pd\n", "import seaborn as sns\n", "from IPython.display import Markdown, display\n", "from sklearn.metrics.pairwise import cosine_similarity\n", "from vertexai.preview.language_models import (ChatModel, InputOutputTextPair,\n", "                                              TextEmbeddingModel,\n", "                                              TextGenerationModel)" ] }, { "cell_type": "markdown", "metadata": { "id": "_mU6EZEhakVu" }, "source": [ "## Text generation with `text-bison@001`\n", "\n", "The text generation model from the PaLM API that you will use in this notebook is `text-bison@001`.\n", "It is fine-tuned to follow natural language instructions and is suitable for a variety of language tasks, such as:\n", "\n", "- Classification\n", "- Sentiment analysis\n", "- Entity extraction\n", "- Extractive question-answering\n", "- Summarization\n", "- Re-writing text in a different style\n", "- Ad copy generation\n", "- Concept ideation\n", "- Concept simplification" ] }, { "cell_type": "markdown", "metadata": { "id": "4437b7608c8e" }, "source": [ "#### Load model" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "2998506fe6d1" }, "outputs": [], "source": [ "generation_model = TextGenerationModel.from_pretrained(\"text-bison@001\")" ] }, { "cell_type": "markdown", "metadata": { "id": "7a5d006f3813" }, "source": [ "#### Prompt design\n", "Prompt design is the process of creating prompts that elicit the desired response from a language model. Prompt design is an important part of using language models because it allows non-specialists to control the output of the model with minimal overhead. By carefully crafting the prompts, you can nudge the model to generate a desired result. Prompt design can be an efficient way to experiment with adapting an LLM for a specific use case. The iterative process of repeatedly updating prompts and assessing the model’s responses is sometimes called prompt engineering."
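 ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a quick illustration of prompt design, the next cell builds two prompts for the same sentiment-classification task: a zero-shot prompt that only states the instruction, and a few-shot prompt that adds a couple of worked examples. The reviews and labels here are made up purely for illustration; the sections that follow show how prompts like these are sent to the model with `generation_model.predict`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Zero-shot: state the task and the input only.\n", "review = \"The battery lasts all day and the screen is gorgeous.\"\n", "\n", "zero_shot_prompt = f\"\"\"Classify the sentiment of the following product review as positive, neutral, or negative.\n", "\n", "Review: {review}\n", "Sentiment:\"\"\"\n", "\n", "# Few-shot: add a couple of worked examples before the new input.\n", "few_shot_prompt = f\"\"\"Classify the sentiment of each product review as positive, neutral, or negative.\n", "\n", "Review: The delivery was late and the box was damaged.\n", "Sentiment: negative\n", "\n", "Review: It does what it says, nothing more.\n", "Sentiment: neutral\n", "\n", "Review: {review}\n", "Sentiment:\"\"\"\n", "\n", "print(zero_shot_prompt)\n", "print(\"---\")\n", "print(few_shot_prompt)"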
] }, { "cell_type": "markdown", "metadata": { "id": "kEAJ0ipmbndQ" }, "source": [ "#### Hello PaLM" ] }, { "cell_type": "markdown", "metadata": { "id": "tCgBDJvNRCF5" }, "source": [ "Create your first prompt and send it to the text generation model." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "cx_o455SRCF5" }, "outputs": [], "source": [ "prompt = \"What is a large language model?\"\n", "\n", "response = generation_model.predict(prompt=prompt)\n", "\n", "print(response.text)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Try out your own prompt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- What are the top 10 trends in the tech industry?\n", "- What are the biggest challenges facing the healthcare industry?\n", "- What are the latest developments in the automotive industry?\n", "- What are the biggest opportunities in the retail industry?\n", "- (Try your own prompts!)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "prompt = \"What are the top 10 trends in the tech industry?\" # try your own prompt\n", "\n", "response = generation_model.predict(prompt=prompt)\n", "\n", "print(response.text)" ] }, { "cell_type": "markdown", "metadata": { "id": "EsglQtgDRCF5" }, "source": [ "#### Prompt templates" ] }, { "cell_type": "markdown", "metadata": { "id": "9BP1BKWiRCF6" }, "source": [ "Prompt templates are useful if you have found a good way to structure your prompt that you can re-use. This can be also be helpful in limiting the open-endedness of freeform prompts. There are many ways to implement prompt templates, and below is just one example using f-strings." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "2USfPyOuFhlB" }, "outputs": [], "source": [ "my_industry = \"tech\" # try changing this to a different industry\n", "\n", "response = generation_model.predict(\n", " prompt=f\"What are the top 10 trends in the {my_industry} industry?\"\n", ")\n", "\n", "print(response.text)" ] }, { "cell_type": "markdown", "metadata": { "id": "m65AyLt8yvdB" }, "source": [ "### Model parameters for `text-bison@001`" ] }, { "cell_type": "markdown", "metadata": { "id": "vQuZh6GT0Yn4" }, "source": [ "You can customize how the PaLM API behaves in response to your prompt by using the following parameters for `text-bison@001`:\n", "\n", " - `temperature`: higher means more \"creative\" responses\n", " - `max_output_tokens`: sets the max number of tokens in the output\n", " - `top_p`: higher means it will pull from more possible next tokens, based on cumulative probability\n", " - `top_k`: higher means it will sample from more possible next tokens\n", " \n", "The section below covers each parameter and how to use them." ] }, { "cell_type": "markdown", "metadata": { "id": "JF76AKzaF2IP" }, "source": [ "#### The `temperature` parameter (range: 0.0 - 1.0, default 0)\n", "\n", "##### What is _temperature_?\n", "The temperature is used for sampling during the response generation, which occurs when top_p and top_k are applied. Temperature controls the degree of randomness in token selection.\n", "\n", "##### How does _temperature_ affect the response?\n", "Lower temperatures are good for prompts that require a more deterministic and less open-ended response. In comparison, higher temperatures can lead to more \"creative\" or diverse results. A temperature of `0` is deterministic: the highest probability response is always selected. 
For most use cases, try starting with a temperature of `0.2`.\n", "\n", "A higher temperature value will result in a more exploratory output, with a higher likelihood of generating rare or unusual words or phrases. Conversely, a lower temperature value will result in a more conservative output, with a higher likelihood of generating common or expected words or phrases.\n", "\n", "##### Example:\n", "\n", "`temperature = 0.0`:\n", "\n", "* _The cat sat on the couch, watching the birds outside._\n", "* _The cat sat on the windowsill, basking in the sun._\n", "\n", "`temperature = 0.9`:\n", "\n", "* _The cat sat on the moon, meowing at the stars._\n", "* _The cat sat on the cheeseburger, purring with delight._\n", "\n", "**Note**: While the temperature parameter can help generate more diverse and interesting text, it can also increase the likelihood of generating nonsensical or inappropriate text (i.e. hallucinations). Therefore, use it carefully and with consideration for the desired outcome.\n", "\n", "For more information on the `temperature` parameter for text models, please refer to the [documentation on model parameters](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/models#text_model_parameters)." ] }, { "cell_type": "markdown", "metadata": { "id": "aMEz2P18SBW-" }, "source": [ "If you run the following cell multiple times, it should always return the same response, as `temperature=0` is deterministic." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "cxPM1A8uR81l" }, "outputs": [], "source": [ "temp_val = 0.0\n", "prompt_temperature = \"Complete the sentence: As I prepared the picture frame, I reached into my toolkit to fetch my:\"\n", "\n", "response = generation_model.predict(\n", "    prompt=prompt_temperature,\n", "    temperature=temp_val,\n", ")\n", "\n", "print(f\"[temperature = {temp_val}]\")\n", "print(response.text)" ] }, { "cell_type": "markdown", "metadata": { "id": "5B91rIFiSBW_" }, "source": [ "If you run the following cell multiple times, it may return different responses, as higher temperature values can lead to more diverse results, even though the prompt is the same as the above cell." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "z2mKiDB5SBW_" }, "outputs": [], "source": [ "temp_val = 1.0\n", "\n", "response = generation_model.predict(\n", "    prompt=prompt_temperature,\n", "    temperature=temp_val,\n", ")\n", "\n", "print(f\"[temperature = {temp_val}]\")\n", "print(response.text)" ] }, { "cell_type": "markdown", "metadata": { "id": "JRYTOKQpGpfP" }, "source": [ "#### The `max_output_tokens` parameter (range: 1 - 1024, default 128)\n", "\n", "##### Tokens\n", "A single token may be smaller than a word. A token is approximately four characters, so 100 tokens correspond to roughly 60-80 words. It's essential to be aware of token sizes, as models have a limit on input and output tokens.\n", "\n", "##### What is _max_output_tokens_?\n", "`max_output_tokens` is the maximum number of tokens that can be generated in the response.\n", "\n", "##### How does _max_output_tokens_ affect the response?\n", "\n", "Specify a lower value for shorter responses and a higher value for longer responses.
\n", "\n", "For more information on the `max_output_tokens` parameter for text models, please refer to the [documentation on model parameters](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/models#text_model_parameters)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "ZUjANX_mNLuI" }, "outputs": [], "source": [ "max_output_tokens_val = 5\n", "\n", "response = generation_model.predict(\n", "    prompt=\"List ten ways that generative AI can help improve the online shopping experience for users\",\n", "    max_output_tokens=max_output_tokens_val,\n", ")\n", "\n", "print(f\"[max_output_tokens = {max_output_tokens_val}]\")\n", "print(response.text)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "9DiHeUgSNX1m" }, "outputs": [], "source": [ "max_output_tokens_val = 500\n", "\n", "response = generation_model.predict(\n", "    prompt=\"List ten ways that generative AI can help improve the online shopping experience for users\",\n", "    max_output_tokens=max_output_tokens_val,\n", ")\n", "\n", "print(f\"[max_output_tokens = {max_output_tokens_val}]\")\n", "print(response.text)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For easier reading, you can also render Markdown in Jupyter:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "display(Markdown(response.text))" ] }, { "cell_type": "markdown", "metadata": { "id": "kD3S2XsnHL50" }, "source": [ "#### The `top_p` parameter (range: 0.0 - 1.0, default 0.95)\n", "\n", "##### What is _top_p_?\n", "`top_p` controls how the model selects tokens for output based on a cumulative probability cutoff. Specifically, it selects the smallest set of highest-probability tokens whose cumulative probability reaches the given cutoff probability _p_, and then samples the next token from this set (with `temperature` applied).\n", "\n", "For example, suppose tokens A, B, and C have probabilities of 0.3, 0.2, and 0.1, and the `top_p` value is 0.5. In that case, the model will select either A or B as the next token (using temperature) and not consider C, because the cumulative probability of A and B (0.5) already reaches the cutoff. Specify a lower value for less random responses and a higher value for more random responses.\n", "\n", "##### How does _top_p_ affect the response?\n", "\n", "The `top_p` parameter is used to control the diversity of the generated text. A higher `top_p` value results in more \"diverse\" and \"interesting\" outputs, with the model being allowed to sample from a larger pool of possibilities. In contrast, a lower `top_p` value results in more predictable outputs, with the model constrained to a smaller set of possible tokens.\n", "\n", "##### Example:\n", "\n", "`top_p = 0.1`:\n", "\n", "- The cat sat on the mat.\n", "- The cat sat on the floor.\n", "\n", "`top_p = 0.9`:\n", "\n", "- The cat sat on the windowsill, soaking up the sun's rays.\n", "- The cat sat on the edge of the bed, watching the birds outside.\n", "\n", "For more information on the `top_p` parameter for text models, please refer to the [documentation on model parameters](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/models#text_model_parameters)."
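 ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To make the cutoff rule above concrete, here is a small, self-contained sketch in plain Python (an illustration only, not the actual serving implementation): it keeps tokens in descending probability order until their cumulative probability reaches `top_p`, and that set is the candidate pool the model then samples from (with `temperature` applied)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def top_p_candidates(token_probs, top_p):\n", "    \"\"\"Return the smallest set of highest-probability tokens whose cumulative probability reaches top_p.\"\"\"\n", "    candidates = []\n", "    cumulative = 0.0\n", "    for token, prob in sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True):\n", "        candidates.append(token)\n", "        cumulative += prob\n", "        if cumulative >= top_p:\n", "            break\n", "    return candidates\n", "\n", "\n", "# The example from the text above: A=0.3, B=0.2, C=0.1 (the rest of the probability mass is on other tokens).\n", "probs = {\"A\": 0.3, \"B\": 0.2, \"C\": 0.1}\n", "print(top_p_candidates(probs, top_p=0.5))  # ['A', 'B'] -- C is never considered\n", "print(top_p_candidates(probs, top_p=0.1))  # ['A'] -- a lower top_p gives a less random response"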
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "RAJiVYprNle1" }, "outputs": [], "source": [ "top_p_val = 0.0\n", "prompt_top_p_example = (\n", " \"Create a marketing campaign for jackets that involves blue elephants and avocados.\"\n", ")\n", "\n", "response = generation_model.predict(\n", " prompt=prompt_top_p_example, temperature=0.9, top_p=top_p_val\n", ")\n", "\n", "print(f\"[top_p = {top_p_val}]\")\n", "print(response.text)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "zm69kbcyN2gg" }, "outputs": [], "source": [ "top_p_val = 1.0\n", "\n", "response = generation_model.predict(\n", " prompt=prompt_top_p_example, temperature=0.9, top_p=top_p_val\n", ")\n", "\n", "print(f\"[top_p = {top_p_val}]\")\n", "print(response.text)" ] }, { "cell_type": "markdown", "metadata": { "id": "Krop7HXOIy8f" }, "source": [ "#### The `top_k` parameter (range: 0.0 - 40, default 40)\n", "\n", "##### What is _top_k_?\n", "`top_k` changes how the model selects tokens for output. A `top_k` of 1 means the selected token is the most probable among all tokens in the model's vocabulary (also called greedy decoding). In contrast, a `top_k` of 3 means that the next token is selected from the top 3 most probable tokens (using temperature). For each token selection step, the `top_k` tokens with the highest probabilities are sampled. Then tokens are further filtered based on `top_p` with the final token selected using temperature sampling.\n", "\n", "##### How does _top_k_ affect the response?\n", "\n", "Specify a lower value for less random responses and a higher value for more random responses.\n", "\n", "For more information on the `top_k` parameter for text models, please refer to the [documentation on model parameters](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/models#text_model_parameters)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "qK76o1hYO3ej" }, "outputs": [], "source": [ "prompt_top_k_example = \"Write a 2-day itinerary for France.\"\n", "top_k_val = 1\n", "\n", "response = generation_model.predict(\n", " prompt=prompt_top_k_example, max_output_tokens=300, temperature=0.9, top_k=top_k_val\n", ")\n", "\n", "print(f\"[top_k = {top_k_val}]\")\n", "print(response.text)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Hhu9d23vPGmK" }, "outputs": [], "source": [ "top_k_val = 40\n", "\n", "response = generation_model.predict(\n", " prompt=prompt_top_k_example,\n", " max_output_tokens=300,\n", " temperature=0.9,\n", " top_k=top_k_val,\n", ")\n", "\n", "print(f\"[top_k = {top_k_val}]\")\n", "print(response.text)" ] }, { "cell_type": "markdown", "metadata": { "id": "14ada40abecc" }, "source": [ "## Chat model with `chat-bison@001`" ] }, { "cell_type": "markdown", "metadata": { "id": "1923b5583a2d" }, "source": [ "The `chat-bison@001` model lets you have a freeform conversation across multiple turns. The application tracks what was previously said in the conversation. As such, if you expect to use conversations in your application, use the `chat-bison@001` model because it has been fine-tuned for multi-turn conversation use cases." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "1325438b9188" }, "outputs": [], "source": [ "chat_model = ChatModel.from_pretrained(\"chat-bison@001\")\n", "\n", "chat = chat_model.start_chat()\n", "\n", "print(\n", " chat.send_message(\n", " \"\"\"\n", "Hello! 
Can you write a 300 word abstract for a research paper I need to write about the impact of generative AI on society?\n", "\"\"\"\n", "    )\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "ef81f564bed1" }, "source": [ "As shown below, the model should respond based on what was previously said in the conversation:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "30ab5afac7dc" }, "outputs": [], "source": [ "print(\n", "    chat.send_message(\n", "        \"\"\"\n", "Could you give me a catchy title for the paper?\n", "\"\"\"\n", "    )\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "fe15a6b083e5" }, "source": [ "### Advanced Chat model with the SDK\n", "You can also provide a `context` and `examples` to the model. The model will then respond based on the provided context and examples. You can also use `temperature`, `max_output_tokens`, `top_p`, and `top_k`. These parameters should be set when you start your chat with `chat_model.start_chat()`.\n", "\n", "For more information on chat models, please refer to the [documentation on chat model parameters](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/models#chat_model_parameters)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "4047b525961e" }, "outputs": [], "source": [ "chat = chat_model.start_chat(\n", "    context=\"My name is Ned. You are my personal assistant. My favorite movies are Lord of the Rings and Hobbit.\",\n", "    examples=[\n", "        InputOutputTextPair(\n", "            input_text=\"Who do you work for?\",\n", "            output_text=\"I work for Ned.\",\n", "        ),\n", "        InputOutputTextPair(\n", "            input_text=\"What do I like?\",\n", "            output_text=\"Ned likes watching movies.\",\n", "        ),\n", "    ],\n", "    temperature=0.3,\n", "    max_output_tokens=200,\n", "    top_p=0.8,\n", "    top_k=40,\n", ")\n", "print(chat.send_message(\"Are my favorite movies based on a book series?\"))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "685184ee4159" }, "outputs": [], "source": [ "print(chat.send_message(\"When were these books published?\"))" ] }, { "cell_type": "markdown", "metadata": { "id": "67b6eef99f56" }, "source": [ "## Embedding model with `textembedding-gecko@001`" ] }, { "cell_type": "markdown", "metadata": { "id": "64a58515878c" }, "source": [ "Text embeddings are a dense, often low-dimensional, vector representation of a piece of content such that, if two pieces of content are semantically similar, their respective embeddings are located near each other in the embedding vector space. This representation can be used to solve common NLP tasks, such as:\n", "\n", "* **Semantic search**: Search text ranked by semantic similarity.\n", "* **Recommendation**: Return items with text attributes similar to the given text.\n", "* **Classification**: Return the class of items whose text attributes are similar to the given text.\n", "* **Clustering**: Cluster items whose text attributes are similar to the given text.\n", "* **Outlier Detection**: Return items where text attributes are least related to the given text.\n", "\n", "Please refer to the [text embedding model documentation](https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings) for more information."
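 ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before looking at raw embedding values below, here is a small sketch of the semantic-search use case from the list above: it embeds a handful of made-up documents and a query with `textembedding-gecko@001`, then ranks the documents by cosine similarity to the query. It reuses `TextEmbeddingModel` and `cosine_similarity`, which were imported at the top of this notebook." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A minimal semantic-search sketch: the documents and query are made up for illustration.\n", "embedding_model = TextEmbeddingModel.from_pretrained(\"textembedding-gecko@001\")\n", "\n", "documents = [\n", "    \"The weather in Paris is mild in the spring.\",\n", "    \"Vertex AI provides managed infrastructure for training and serving models.\",\n", "    \"Our return policy allows refunds within 30 days of purchase.\",\n", "]\n", "query = \"How do I get my money back for an order?\"\n", "\n", "doc_vectors = [emb.values for emb in embedding_model.get_embeddings(documents)]\n", "query_vector = embedding_model.get_embeddings([query])[0].values\n", "\n", "# Rank documents by cosine similarity to the query (higher = more semantically similar).\n", "scores = cosine_similarity([query_vector], doc_vectors)[0]\n", "for score, doc in sorted(zip(scores, documents), reverse=True):\n", "    print(f\"{score:.3f}  {doc}\")"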
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "f0f175d5b02a" }, "outputs": [], "source": [ "embedding_model = TextEmbeddingModel.from_pretrained(\"textembedding-gecko@001\")\n", "\n", "embeddings = embedding_model.get_embeddings([\"What is life?\"])\n", "\n", "for embedding in embeddings:\n", " vector = embedding.values\n", " print(f\"Length = {len(vector)}\")\n", " print(vector)" ] }, { "cell_type": "markdown", "metadata": { "id": "601bd7e7ef1d" }, "source": [ "#### Embeddings and Pandas DataFrames" ] }, { "cell_type": "markdown", "metadata": { "id": "9e69d2ba877f" }, "source": [ "If your text is stored in a column of a DataFrame, you can create a new column with the embeddings with the example below." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "d99f08bab254" }, "outputs": [], "source": [ "text = [\n", " \"i really enjoyed the movie last night\",\n", " \"so many amazing cinematic scenes yesterday\",\n", " \"had a great time writing my Python scripts a few days ago\",\n", " \"huge sense of relief when my .py script finally ran without error\",\n", " \"O Romeo, Romeo, wherefore art thou Romeo?\",\n", "]\n", "\n", "df = pd.DataFrame(text, columns=[\"text\"])\n", "df" ] }, { "cell_type": "markdown", "metadata": { "id": "fabd92d8ddb6" }, "source": [ "Create a new column, `embeddings`, using the [apply](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html) function in pandas with the embeddings model." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "096cbf3f2698" }, "outputs": [], "source": [ "df[\"embeddings\"] = [\n", " emb.values for emb in embedding_model.get_embeddings(df.text.values)\n", "]\n", "df" ] }, { "cell_type": "markdown", "metadata": { "id": "69ebe1a6514d" }, "source": [ "#### Comparing similarity of text examples using cosine similarity" ] }, { "cell_type": "markdown", "metadata": { "id": "04d0f13acedb" }, "source": [ "By converting text into embeddings, you can compute similarity scores. There are many ways to compute similarity scores, and one common technique is using [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity).\n", "\n", "In the example from above, two of the sentences in the `text` column relate to enjoying a _movie_, and the other two relates to enjoying _coding_. Cosine similarity scores should be higher (closer to 1.0) when doing pairwise comparisons between semantically-related sentences, and scores should be lower between semantically-different sentences. \n", "\n", "The DataFrame output below shows the resulting cosine similarity scores between the embeddings:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "dc499dae438a" }, "outputs": [], "source": [ "cos_sim_array = cosine_similarity(list(df.embeddings.values))\n", "\n", "# display as DataFrame\n", "df = pd.DataFrame(cos_sim_array, index=text, columns=text)\n", "df" ] }, { "cell_type": "markdown", "metadata": { "id": "97a5e2e32df5" }, "source": [ "To make this easier to understand, you can use a heatmap. Naturally, text is most similar when they are identical (score of 1.0). The next highest scores are when sentences are semantically similar. The lowest scores are when sentences are quite different in meaning." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "560ea2a11535" }, "outputs": [], "source": [ "ax = sns.heatmap(df, annot=True, cmap=\"crest\")\n", "ax.xaxis.tick_top()\n", "ax.set_xticklabels(text, rotation=90)" ] } ], "metadata": { "colab": { "name": "intro_palm_api.ipynb", "toc_visible": true }, "environment": { "kernel": "python3", "name": "common-cpu.m108", "type": "gcloud", "uri": "gcr.io/deeplearning-platform-release/base-cpu:m108" }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.12" } }, "nbformat": 4, "nbformat_minor": 4 }