-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing
1 changed file
with
358 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,358 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "DKE2bARNVhDs" | ||
}, | ||
"source": [ | ||
"[](https://colab.research.google.com/github/supabase/supabase/blob/master/examples/ai/vector_hello_world.ipynb)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "oQeCPgFKf3MQ" | ||
}, | ||
"source": [ | ||
"#Vector \"Hello, World\" Quickstart\n", | ||
"\n", | ||
"`vecs` is a python client for managing and querying vector stores in PostgreSQL with the [pgvector extension](https://github.com/pgvector/pgvector). This guide will help you get started with using vecs.\n", | ||
"\n", | ||
"If you don't have a Postgres database with the pgvector extension installed, see [hosting](https://supabase.github.io/vecs/hosting/) for easy options.\n", | ||
"\n", | ||
"\n", | ||
"##Installation\n", | ||
"\n", | ||
"Requires:\n", | ||
"\n", | ||
"- Python 3.7+\n", | ||
"\n", | ||
"You can install vecs using pip:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"id": "q46dFslpgVDC" | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"pip install vecs" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "AeGQKpZegfCv" | ||
}, | ||
"source": [ | ||
"##Usage\n", | ||
"###Connecting\n", | ||
"\n", | ||
"Before you can interact with vecs, create the client to communicate with Postgres.\n", | ||
"\n", | ||
"Note: In Supabase go to [Database Settings](https://app.supabase.com/project/_/settings/database) to get your Postgres connection string." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"id": "2snoGFPRgkoG" | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"import vecs\n", | ||
"\n", | ||
"DB_CONNECTION = \"postgresql://<user>:<password>@<host>:<port>/<db_name>\"\n", | ||
"\n", | ||
"# create vector store client\n", | ||
"vx = vecs.create_client(DB_CONNECTION)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "zbwuFOgQgog6" | ||
}, | ||
"source": [ | ||
"###Create collection\n", | ||
"\n", | ||
"You can create a collection to store vectors specifying the collections name and the number of dimensions in the vectors you intend to store." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"id": "V36NI1ZZgrJ9" | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"docs = vx.create_collection(name=\"docs\", dimension=3)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "djKCRhbbhAQ8" | ||
}, | ||
"source": [ | ||
"###Get an existing collection\n", | ||
"\n", | ||
"To access a previously created collection, use `get_collection` to retrieve it by name" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"id": "71IyRUbDhEGz" | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"docs = vx.get_collection(name=\"docs\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "a0_KSZRNhHPk" | ||
}, | ||
"source": [ | ||
"###Upserting vectors\n", | ||
"\n", | ||
"`vecs` combines the concepts of \"insert\" and \"update\" into \"upsert\". Upserting records adds them to the collection if the `id` is not present, or updates the existing record if the `id` does exist." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"id": "VikCHcKShOEJ" | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"# add records to the collection\n", | ||
"docs.upsert(\n", | ||
" vectors=[\n", | ||
" (\n", | ||
" \"vec0\", # the vector's identifier\n", | ||
" [0.1, 0.2, 0.3], # the vector. list or np.array\n", | ||
" {\"year\": 1973} # associated metadata\n", | ||
" ),\n", | ||
" (\n", | ||
" \"vec1\",\n", | ||
" [0.7, 0.8, 0.9],\n", | ||
" {\"year\": 2012}\n", | ||
" )\n", | ||
" ]\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "t1t9pcK4hQoK" | ||
}, | ||
"source": [ | ||
"###Create an index\n", | ||
"\n", | ||
"Collections can be queried immediately after being created. However, for good performance, the collection should be indexed after records have been upserted.\n", | ||
"\n", | ||
"Indexes should be created **after** the collection has been populated with records. Building an index on an empty collection will result in significantly reduced recall. Once the index has been created you can still upsert new documents into the collection but you should rebuild the index if the size of the collection more than doubles.\n", | ||
"\n", | ||
"Only one index may exist per-collection. By default, creating an index will replace any existing index.\n", | ||
"\n", | ||
"To create an index:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"id": "kOzVOcFAhUxx" | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"##\n", | ||
"# INSERT RECORDS HERE\n", | ||
"##\n", | ||
"\n", | ||
"# index the collection to be queried by cosine distance\n", | ||
"docs.create_index(measure=vecs.IndexMeasure.cosine_distance)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "jh7E_-9thW6L" | ||
}, | ||
"source": [ | ||
"Available options for query `measure` are:\n", | ||
"\n", | ||
"- `vecs.IndexMeasure.cosine_distance`\n", | ||
"- `vecs.IndexMeasure.l2_distance`\n", | ||
"- `vecs.IndexMeasure.max_inner_product`\n", | ||
"\n", | ||
"which correspond to different methods for comparing query vectors to the vectors in the database.\n", | ||
"\n", | ||
"If you aren't sure which to use, stick with the default (cosine_distance) by omitting the parameter i.e." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"id": "WMMEs0IMheSU" | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"docs.create_index()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "vvNwjXgYhhHE" | ||
}, | ||
"source": [ | ||
"Note: The time required to create an index grows with the number of records and size of vectors. For a few thousand records expect sub-minute a response in under a minute. It may take a few minutes for larger collections." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "OdJvFYdAhl11" | ||
}, | ||
"source": [ | ||
"###Query\n", | ||
"\n", | ||
"Given a collection `docs` with several records:\n", | ||
"\n", | ||
"####Basic\n", | ||
"\n", | ||
"The simplest form of search is to provide a query vector.\n", | ||
"\n", | ||
"Note: Indexes are essential for good performance. See [creating an index](https://supabase.github.io/vecs/api/#create-an-index) for more info.\n", | ||
"\n", | ||
"If you do not create an index, every query will return a warning\n", | ||
"\n", | ||
"`query does not have a covering index for cosine_similarity. See Collection.create_index`\n", | ||
"\n", | ||
"that incldues the `IndexMeasure` you should index." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"colab": { | ||
"base_uri": "https://localhost:8080/" | ||
}, | ||
"id": "lOCAT1Eqh2kH", | ||
"outputId": "1f9e7053-0136-4d06-aea4-dad18695e6fe" | ||
}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"['vec1', 'vec0']" | ||
] | ||
}, | ||
"execution_count": 8, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"docs.query(\n", | ||
" query_vector=[0.4,0.5,0.6], # required\n", | ||
" limit=5, # number of records to return\n", | ||
" filters={}, # metadata filters\n", | ||
" measure=\"cosine_distance\", # distance measure to use\n", | ||
" include_value=False, # should distance measure values be returned?\n", | ||
" include_metadata=False, # should record metadata be returned?\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "4aCYtmrbh4Qa" | ||
}, | ||
"source": [ | ||
"Which returns a list of vector record `ids`." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "t6-r7gMth7Kn" | ||
}, | ||
"source": [ | ||
"### Metadata Filtering\n", | ||
"\n", | ||
"The metadata that is associated with each record can also be filtered during a query.\n", | ||
"\n", | ||
"As an example, `{\"year\": {\"$eq\": 2005}}` filters a year metadata key to be equal to 2005\n", | ||
"\n", | ||
"In context:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"colab": { | ||
"base_uri": "https://localhost:8080/" | ||
}, | ||
"id": "ypj8DnOaiABC", | ||
"outputId": "0cd878c5-a1a6-4c53-d8cb-21356e599d77" | ||
}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"['vec1']" | ||
] | ||
}, | ||
"execution_count": 9, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"docs.query(\n", | ||
" query_vector=[0.4,0.5,0.6],\n", | ||
" filters={\"year\": {\"$eq\": 2012}}, # metadata filters\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "eZ85uF35iCp8" | ||
}, | ||
"source": [ | ||
"For a complete reference, see the [metadata guide](https://supabase.github.io/vecs/concepts_metadata/)." | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"colab": { | ||
"provenance": [] | ||
}, | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"name": "python" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 0 | ||
} |