Vespa sample applications

The Vespa sample applications are created to run both self-hosted and on Vespa Cloud. You can easily deploy the sample applications to Vespa Cloud without changing the files - just follow the same steps as for Managed Vector Search using Vespa Cloud, adding security credentials.

First-time users should go through the getting-started guides first.

See examples/operations for operational sample applications.

Getting started

Album Recommendations is the intro application to Vespa. Learn how to configure the schema for simple recommendation and search use cases.

Pyvespa: Hybrid Search - Quickstart and Pyvespa: Hybrid Search - Quickstart on Vespa Cloud create a hybrid text search application that combines traditional keyword matching with semantic vector search (dense retrieval). They also demonstrate the Vespa native embedder functionality. These are intro-level applications for Python users that use more advanced Vespa features. Use Pyvespa: Authenticating to Vespa Cloud for Vespa Cloud credentials.

Pyvespa: Querying Vespa is a good start for Python users, exploring how to query Vespa using the Vespa Query Language (YQL).

Pyvespa: Read and write operations documents ways to feed, get, update and delete data. It covers using the with statement as a context manager to manage resources efficiently, and feeding streams of data with feed_iterable, which uses generators to feed from streams, iterables, lists and files.
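The generator pattern can be sketched as follows. This is a minimal sketch that assumes pyvespa's feed_iterable method and its convention that each item is a dict with "id" and "fields" keys. The JSON-lines input, the field names title and text, the schema/namespace name doc and the file docs.jsonl are hypothetical. The feed call is commented out because it needs a running application.

```python
import json
from typing import Iterable, Iterator


def docs_from_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Lazily turn JSON lines into feed operations, so a large file is never fully loaded in memory."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Each operation has an "id" and a "fields" dict; the field names here are hypothetical.
        yield {"id": str(record["id"]), "fields": {"title": record["title"], "text": record["text"]}}


# Hypothetical usage against a running application (schema, namespace and file name are assumptions):
# from vespa.application import Vespa
# app = Vespa(url="http://localhost", port=8080)
# with open("docs.jsonl") as f:
#     app.feed_iterable(docs_from_lines(f), schema="doc", namespace="doc")

sample = ['{"id": 1, "title": "Hello", "text": "World"}', ""]
print(list(docs_from_lines(sample)))
```

Because the generator yields one operation at a time, the feeder pulls documents only as fast as it can send them.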

Pyvespa: Application packages is a good intro to the concept of application packages in Vespa. Try Pyvespa: Advanced Configuration for configuring Vespa services.

Pyvespa: Examples is a repository of small snippets and examples, e.g., very simple vector distance search applications.

The News and Recommendation Tutorial demonstrates basic search functionality and is a great place to start exploring Vespa features. It creates a recommendation system that uses approximate nearest neighbor search in a shared user/item embedding space to retrieve recommended content for a user. This app also demonstrates parent-child relationships.

The Text Search Tutorial demonstrates traditional text search using BM25/Vespa nativeRank and is a good starting point for using the MS Marco dataset.

Vector Search, Hybrid Search and Embeddings

There is a growing interest in AI-powered vector representations of unstructured multimodal data and searching efficiently over these representations. Managed Vector Search using Vespa Cloud describes how to unlock the full potential of multimodal AI-powered vector representations using Vespa Cloud.

Simple Semantic Search demonstrates indexed vector search using HNSW, creating embedding vectors from a transformer language model inside Vespa, and hybrid text and semantic ranking. This app also demonstrates using native Vespa embedders.

Vespa Multi-Vector Indexing with HNSW and Pyvespa: Multi-vector indexing with HNSW demonstrate how to index multiple vectors per document field for semantic search over longer documents.
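Conceptually, when a field holds several vectors, a document is as close to the query as its closest vector. The brute-force numpy sketch below illustrates only that scoring idea. It is not Vespa's HNSW-based implementation, and the document ids and 2-dimensional vectors are made up.

```python
import numpy as np


def best_chunk_score(query: np.ndarray, chunks: np.ndarray) -> float:
    """Cosine similarity between the query and the document's closest chunk vector."""
    q = query / np.linalg.norm(query)
    c = chunks / np.linalg.norm(chunks, axis=1, keepdims=True)
    return float(np.max(c @ q))


def rank_documents(query: np.ndarray, docs: dict) -> list:
    """Exhaustively score every document by its best chunk and sort by descending score."""
    scored = [(doc_id, best_chunk_score(query, chunks)) for doc_id, chunks in docs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


docs = {
    "long-doc": np.array([[1.0, 0.0], [0.0, 1.0]]),  # two chunks; the second matches the query
    "short-doc": np.array([[0.6, 0.8]]),             # one chunk, partially matching
}
print(rank_documents(np.array([0.0, 1.0]), docs))
```

The long document is not penalized for its irrelevant first chunk, which is the point of indexing chunks as separate vectors within one document.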

Vector Streaming Search uses vector streaming search for naturally partitioned data; see the blog post for details.

Multilingual Search with multilingual embeddings demonstrates multilingual semantic search with multilingual text embedding models.

Simple hybrid search with SPLADE uses the Vespa splade-embedder for semantic search using sparse vector representations, and is a good introduction to SPLADE and sparse learned weights for ranking.
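SPLADE represents both query and document as sparse maps from vocabulary terms to learned weights, and relevance is their dot product over shared terms. The plain-Python sketch below shows that computation with made-up weights. In Vespa, the sparse representation lives in a mapped tensor and the product is computed in a ranking expression, which this sketch does not reproduce.

```python
def sparse_dot(query_weights: dict, doc_weights: dict) -> float:
    """Dot product of two sparse term -> weight maps; only terms present in both contribute."""
    if len(query_weights) > len(doc_weights):
        # Iterate over the smaller map; the product is symmetric.
        query_weights, doc_weights = doc_weights, query_weights
    return sum(w * doc_weights[t] for t, w in query_weights.items() if t in doc_weights)


query = {"vespa": 1.2, "search": 0.8, "engine": 0.3}  # hypothetical learned weights
doc = {"vespa": 0.9, "search": 0.5, "ranking": 1.1}
print(sparse_dot(query, doc))  # "engine" and "ranking" contribute nothing
```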

Customizing Frozen Data Embeddings in Vespa demonstrates how to adapt frozen embeddings from foundational embedding models; see the blog post. Using frozen data embeddings from foundational models is an emerging industry practice for reducing the complexity of maintaining and versioning embeddings. The frozen data embeddings are reused for various tasks, such as classification, search, or recommendations.

Pyvespa: Using Cohere Binary Embeddings in Vespa demonstrates how to use the Cohere binary vectors with Vespa, including a re-ranking phase that uses the float query vector version for improved accuracy.

Pyvespa: Billion-scale vector search with Cohere binary embeddings in Vespa uses the Cohere int8 and binary embeddings with a coarse-to-fine search and re-ranking pipeline that reduces costs while offering the same retrieval (nDCG) accuracy. The packed binary vector representation is stored in memory, with an optional HNSW index using hamming distance. The int8 vector representation is stored on disk using Vespa’s paged option.
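The coarse-to-fine idea can be sketched in numpy. The sketch binarizes float vectors to one bit per dimension, shortlists candidates by hamming distance on the packed bits, then re-ranks the shortlist with a higher-precision dot product. It makes three assumptions: the data is random placeholder data, the shortlist is found by brute force instead of HNSW, and normalized float vectors stand in for the int8 representation used in the notebook.

```python
import numpy as np


def binarize(vectors: np.ndarray) -> np.ndarray:
    """One bit per dimension (1 if > 0), packed 8 bits per byte and viewed as int8."""
    return np.packbits(vectors > 0, axis=-1).view(np.int8)


def hamming(query_bits: np.ndarray, doc_bits: np.ndarray) -> np.ndarray:
    """Hamming distance from one packed query (shape (1, B)) to each packed document (shape (N, B))."""
    xor = np.bitwise_xor(query_bits.view(np.uint8), doc_bits.view(np.uint8))
    return np.unpackbits(xor, axis=-1).sum(axis=-1)


def coarse_to_fine(query: np.ndarray, docs: np.ndarray, k_coarse: int = 10, k_final: int = 3) -> np.ndarray:
    """Coarse: shortlist by hamming distance on packed bits. Fine: re-rank the shortlist by dot product."""
    distances = hamming(binarize(query[None, :]), binarize(docs))
    shortlist = np.argsort(distances, kind="stable")[:k_coarse]
    fine_scores = docs[shortlist] @ query  # float vectors stand in for the int8 ones here
    return shortlist[np.argsort(-fine_scores, kind="stable")][:k_final]


rng = np.random.default_rng(0)
docs = rng.normal(size=(1000, 64))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)
query = docs[42] + 0.01 * rng.normal(size=64)  # a slightly perturbed copy of document 42
print(coarse_to_fine(query, docs))
```

Only the packed bits (8 bytes per 64-dimensional vector here) are needed for the coarse pass. The larger representation is read only for the short list.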

Pyvespa: Multilingual Hybrid Search with Cohere binary embeddings and Vespa demonstrates:

Building a multilingual search application over a sample of the German split of Wikipedia using binarized Cohere embeddings.
Indexing multiple binary embeddings per document, without having to split the chunks across multiple retrievable units.
Hybrid search, combining the lexical matching capabilities of Vespa with Cohere binary embeddings.
Re-scoring the binarized vectors for improved accuracy.

Pyvespa: BGE-M3 - The Mother of all embedding models demonstrates how to use the BGE-M3 embeddings and represent all three embedding representations in Vespa. This code is inspired by the BAAI/bge-m3 README.

Pyvespa: Evaluating retrieval with Snowflake arctic embed shows how different rank profiles in Vespa can be set up and evaluated. For the rank profiles that use semantic search, we use the small version of Snowflake’s arctic embed model series to generate embeddings.
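Comparing rank profiles usually comes down to a ranking metric averaged over queries. As one example, here is a plain-Python sketch of nDCG@k in its linear-gain form. The query results and relevance labels are hypothetical, and the notebook itself may use a different metric or tooling.

```python
import math


def dcg(gains: list) -> float:
    """Discounted cumulative gain: the gain at 1-based rank r is divided by log2(r + 1)."""
    return sum(g / math.log2(r + 1) for r, g in enumerate(gains, start=1))


def ndcg_at_k(ranked_ids: list, relevance: dict, k: int = 10) -> float:
    """nDCG@k for one query; `relevance` maps doc id -> graded label, unlisted ids count as 0."""
    gains = [relevance.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    ideal_dcg = dcg(sorted(relevance.values(), reverse=True)[:k])
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical results from two rank profiles for the same query
relevance = {"d1": 1, "d3": 1}
print(ndcg_at_k(["d1", "d3", "d2"], relevance))  # both relevant docs on top
print(ndcg_at_k(["d1", "d2", "d3"], relevance))  # second relevant doc pushed down
```

Averaging this value over a query set gives one number per rank profile, which makes profiles directly comparable.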

Pyvespa: Exploring the potential of OpenAI Matryoshka 🪆 embeddings with Vespa demonstrates the effectiveness of using the recently released (as of January 2024) OpenAI text-embedding-3 embeddings with Vespa. Specifically, we are interested in the Matryoshka Representation Learning technique used in training, which lets us "shorten embeddings (i.e. remove some numbers from the end of the sequence) without the embedding losing its concept-representing properties". This allows us to trade off a small amount of accuracy in exchange for much smaller embedding sizes, so we can store more documents and search them faster.
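Shortening a Matryoshka embedding amounts to keeping a prefix of the vector and re-normalizing it, so that dot products are cosine similarities again. The numpy sketch below works under that assumption. The random vector is only a placeholder for a real 3072-dimensional text-embedding-3-large embedding, so it shows the size reduction, not the retained accuracy.

```python
import numpy as np


def shorten(embedding: np.ndarray, dims: int) -> np.ndarray:
    """Keep the first `dims` values and re-normalize to unit length so dot products stay cosine similarities."""
    head = embedding[..., :dims]
    return head / np.linalg.norm(head, axis=-1, keepdims=True)


rng = np.random.default_rng(0)
full = rng.normal(size=3072)  # placeholder for a 3072-dimensional embedding
full /= np.linalg.norm(full)
short = shorten(full, 256)
# Storage per vector as float32: full vs shortened
print(short.shape, full.astype(np.float32).nbytes, short.astype(np.float32).nbytes)
```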

Pyvespa: Using Mixedbread.ai embedding model with support for binary vectors shows how to use the mixedbread-ai/mxbai-embed-large-v1 model with support for binary vectors with Vespa. The notebook example also includes a re-ranking phase that uses the float query vector version for improved accuracy. The re-ranking step makes the model perform at 96.45% of the full float version, with a 32x decrease in storage footprint.
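The re-ranking step pairs the full-precision query vector with the binary document vectors. The numpy sketch below makes one assumption: the document bits are unpacked to 0.0/1.0 values, most significant bit first, and multiplied with the float query. This is our reading of how Vespa's unpack_bits behaves by default. The 8-dimensional vectors are made up.

```python
import numpy as np


def rescore(query_float: np.ndarray, docs_packed: np.ndarray) -> np.ndarray:
    """Dot product of a float query with documents stored as packed bits.

    Each byte is unpacked into 8 values of 0.0 or 1.0 (most significant bit first), so the
    query keeps full float precision while each document dimension costs 1 bit instead of 32.
    """
    doc_bits = np.unpackbits(docs_packed.view(np.uint8), axis=-1).astype(np.float32)
    return doc_bits @ query_float


query = np.array([0.5, -0.2, 0.1, 0.9, -0.7, 0.3, 0.0, 0.4], dtype=np.float32)
docs = np.array([[0b10010101], [0b01001010]], dtype=np.uint8).view(np.int8)  # two 8-dim docs
print(rescore(query, docs))
```

One bit per dimension instead of a 32-bit float is where the 32x storage reduction comes from.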

Retrieval Augmented Generation (RAG) and Generative AI

Retrieval Augmented Generation (RAG) in Vespa is an end-to-end RAG application where all the steps are run within Vespa. This application focuses on the generation part of RAG, with a simple text search using BM25. This application has three versions of an end-to-end RAG application:

Using an external LLM service to generate the final response.
Using local LLM inference to generate the final response.
Deploying to Vespa Cloud and using GPU accelerated LLM inference to generate the final response. This includes using Vespa Cloud's Secret Store to save the OpenAI API key.

Pyvespa: Visual PDF RAG with Vespa - ColPali demo application is an end-to-end demo application for visual retrieval of PDF pages, including a frontend web application - try vespa-engine-colpali-vespa-visual-retrieval.hf.space for a live demo. The main goal of the demo is to make it easy to create your own PDF Enterprise Search application using Vespa!

Pyvespa: Building cost-efficient retrieval-augmented personal AI assistants uses streaming mode for cost-efficient retrieval for applications that store and retrieve personal data. This notebook connects a custom LlamaIndex Retriever with a Vespa app using streaming mode to retrieve personal data.

Pyvespa: Turbocharge RAG with LangChain and Vespa Streaming Mode for Partitioned Data uses streaming mode to build cost-efficient RAG applications over naturally sharded data - also available as a blog post: Turbocharge RAG with LangChain and Vespa Streaming Mode for Sharded Data. Also try Pyvespa: Chat with your pdfs with ColBERT, LangChain, and Vespa - this demonstrates how you can now use ColBERT ranking natively in Vespa, which handles the ColBERT embedding process with no custom code.

Visual Search

Pyvespa: Vespa 🤝 ColPali: Efficient Document Retrieval with Vision Language Models demonstrates how to retrieve PDF pages using the embeddings generated by the ColPali model. ColPali is a powerful Vision Language Model (VLM) that can generate embeddings for images and text. This notebook uses ColPali to generate embeddings for images of PDF pages and store them in Vespa. We also store the base64-encoded image of the PDF page and some metadata, such as the title and URL.

Pyvespa: Scaling ColPALI (VLM) Retrieval demonstrates how to represent ColPali in Vespa and how to scale to large collections. Also see the Scaling ColPali to billions of PDFs with Vespa blog post.

Pyvespa: ColPali Ranking Experiments on DocVQA shows how to reproduce the ColPali results on DocVQA with Vespa. The dataset consists of PDF documents with questions and answers. We demonstrate how we can binarize the patch embeddings and replace the float MaxSim scoring with a hamming-based MaxSim without much loss in ranking accuracy but with a significant speedup (close to 4x) and reducing the memory (and storage) requirements by 32x.
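Both MaxSim variants can be sketched in numpy. Float MaxSim takes, for each query token, the best dot product against any patch, and sums these over tokens. The binary variant packs each embedding into bits and uses hamming distance instead. How the notebook turns a hamming distance into a similarity is an assumption here (1 / (1 + d)). The 128-dimensional random vectors are placeholders for ColPali embeddings.

```python
import numpy as np


def maxsim(query_tokens: np.ndarray, doc_patches: np.ndarray) -> float:
    """Float MaxSim: each query token takes its best dot product over all patches; sum over tokens."""
    return float((query_tokens @ doc_patches.T).max(axis=1).sum())


def hamming_maxsim(query_bits: np.ndarray, doc_bits: np.ndarray) -> float:
    """Binary MaxSim on packed uint8 bits, turning each hamming distance d into a similarity 1 / (1 + d)."""
    xor = np.bitwise_xor(query_bits[:, None, :], doc_bits[None, :, :])  # (tokens, patches, bytes)
    distances = np.unpackbits(xor, axis=-1).sum(axis=-1)                # (tokens, patches)
    return float((1.0 / (1.0 + distances)).max(axis=1).sum())


def pack(vectors: np.ndarray) -> np.ndarray:
    return np.packbits(vectors > 0, axis=-1)  # 1 bit per dimension: 32x smaller than float32


rng = np.random.default_rng(0)
query = rng.normal(size=(4, 128))  # 4 query-token embeddings
matching = np.vstack([query + 0.05 * rng.normal(size=(4, 128)), rng.normal(size=(16, 128))])
unrelated = rng.normal(size=(20, 128))  # 20 patch embeddings per page
for name, patches in [("matching", matching), ("unrelated", unrelated)]:
    print(name, maxsim(query, patches), hamming_maxsim(pack(query), pack(patches)))
```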

Pyvespa: PDF-Retrieval using ColQWen2 (ColPali) with Vespa is a continuation of the notebooks related to the ColPali models (above) for complex document retrieval, and demonstrates use of the ColQWen2 model checkpoint.

Ranking

With Vespa’s phased ranking capabilities, doing cross-encoder inference for a subset of documents at a later stage in the ranking pipeline can be a good trade-off between ranking performance and latency. Pyvespa: Using Mixedbread.ai cross-encoder for reranking in Vespa.ai shows how to use the Mixedbread.ai cross-encoder for global-phase reranking in Vespa.

Pyvespa: Standalone ColBERT with Vespa for end-to-end retrieval and ranking illustrates using the colbert-ai package to produce token vectors, instead of using the native Vespa ColBERT embedder. The guide illustrates how to feed and query using a single passage representation:

Compress token vectors using binarization compatible with Vespa's unpack_bits used in ranking. This implements the binarization of token-level vectors using numpy.
Use Vespa hex feed format for binary vectors.
Query examples.

As a bonus, this also demonstrates how to use ColBERT end-to-end with Vespa for both retrieval and ranking. The retrieval step searches the binary token-level representations using hamming distance. This uses 32 nearestNeighbor operators in the same query, each finding 100 nearest hits in hamming space. Then the results are re-ranked using the full-blown MaxSim calculation.
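The binarization and hex-encoding steps can be sketched in numpy. The sketch assumes a mixed tensor field such as tensor<int8>(token{}, v[16]) (128 bits per token), fed in the JSON "blocks" form with hex strings. The field name colbert and the document id are hypothetical, so check the Vespa document JSON reference before relying on the exact shape.

```python
import numpy as np


def binarize_tokens(token_vectors: np.ndarray) -> np.ndarray:
    """1 bit per dimension (positive -> 1), packed 8 per byte and viewed as int8 cells."""
    return np.packbits(token_vectors > 0, axis=1).view(np.int8)


def colbert_feed_value(token_vectors: np.ndarray) -> dict:
    """Hex-encode each token's packed bytes, keyed by token index.

    Assumes a mixed tensor such as tensor<int8>(token{}, v[16]) fed in the "blocks" JSON form
    with hex strings; verify against the Vespa document JSON reference for your version.
    """
    packed = binarize_tokens(token_vectors)
    return {"blocks": {str(i): row.view(np.uint8).tobytes().hex() for i, row in enumerate(packed)}}


# One token with 16 dimensions (real ColBERT token vectors have 128) -> 2 bytes -> 4 hex characters
token = np.array([[0.3, -0.1, 0.2, 0.0, -0.5, 0.9, 0.1, -0.2,
                   0.4, 0.4, -0.4, 0.4, -0.4, 0.4, -0.4, 0.4]])
doc = {"id": "passage-1", "fields": {"colbert": colbert_feed_value(token)}}  # hypothetical field name
print(doc)
```

Hex strings keep the feed compact: each int8 cell becomes exactly two characters, in two's complement for negative values.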

ColBERT token-level embeddings:

Simple hybrid search with ColBERT uses a single vector embedding model for retrieval and ColBERT (multi-token vector representation) for re-ranking. This semantic search application demonstrates the colbert-embedder and the tensor expressions for ColBERT MaxSim. It also features reciprocal rank fusion to fuse different rankings.
Long-Context ColBERT demonstrates Long-Context ColBERT (multi-token vector representation) with extended context windows for long-document retrieval, as announced in Vespa Long-Context ColBERT. The app demonstrates the colbert-embedder and the tensor expressions for performing two types of extended ColBERT late-interaction for long-context retrieval. This app uses trec-eval for evaluation using nDCG.
Pyvespa: Standalone ColBERT + Vespa for long-context ranking is a guide on how to use the ColBERT package to produce token-level vectors, as an alternative to using the native Vespa ColBERT embedder. It illustrates how to feed multiple passages per Vespa document (long-context):
- Compress token vectors using binarization compatible with Vespa's unpack_bits.
- Use Vespa hex feed format for binary vectors with mixed vespa tensors.
- How to query Vespa with the ColBERT query tensor representation.
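Reciprocal rank fusion, mentioned for the hybrid ColBERT app above, uses only each document's rank position in every list: score = sum over lists of 1 / (k + rank). This plain-Python sketch implements the formula only, not Vespa's implementation. k = 60 is a commonly used constant, and the document ids are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Fuse ranked lists: each doc scores sum(1 / (k + rank)) over the lists it appears in (rank is 1-based)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


lexical = ["a", "b", "c"]   # e.g. a BM25 ranking
semantic = ["b", "c", "d"]  # e.g. a vector-similarity ranking
print(reciprocal_rank_fusion([lexical, semantic]))
```

Because only rank positions are used, the lexical and vector scores never need to be on comparable scales.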

Pyvespa: LightGBM: Training the model with Vespa features deploys and uses a LightGBM model in a Vespa application. The tutorial runs through how to:

Train a LightGBM classification model with variable names supported by Vespa.
Create Vespa application package files and export them to an application folder.
Export the trained LightGBM model to the Vespa application folder.
Deploy the Vespa application using the application folder.
Feed data to the Vespa application.
Assert that the LightGBM predictions from the deployed model are correct.

Pyvespa: LightGBM: Mapping model features to Vespa features shows how to deploy a LightGBM model with feature names that do not match Vespa feature names. In addition to the steps in the app above, this tutorial:

Trains a LightGBM classification model with generic feature names that will not be available in the Vespa application.
Creates an application package and includes a mapping from Vespa feature names to LightGBM model feature names.

Performance

Pyvespa: Feeding performance sheds some light on the different modes of feeding documents to Vespa, comparing four methods:

Using VespaSync
Using VespaAsync
Using feed_iterable()
Using Vespa CLI

Use Feeding to Vespa Cloud to test feeding using Vespa Cloud.

More advanced sample applications

Billion-scale Image Search

Billion-Scale Image Search demonstrates billion-scale image search using a CLIP model exported in ONNX format for retrieval. It features separation of compute from storage and query-time vector similarity de-duping. It uses PCA to reduce the vectors from 768 to 128 dimensions.
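Reducing 768-dimensional vectors to 128 with PCA means fitting a projection on a sample of vectors, then applying the same projection to documents and queries. The numpy sketch below uses SVD with random placeholder data. How the app itself fits its PCA matrix is not described here.

```python
import numpy as np


def fit_pca(vectors: np.ndarray, out_dims: int):
    """Return the mean and the top `out_dims` principal directions (rows) of the sample vectors."""
    mean = vectors.mean(axis=0)
    # The right singular vectors of the centered data are the principal directions, largest variance first.
    _, _, vt = np.linalg.svd(vectors - mean, full_matrices=False)
    return mean, vt[:out_dims]


def project(vectors: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Apply the same fitted projection to documents and queries alike."""
    return (vectors - mean) @ components.T


rng = np.random.default_rng(0)
sample = rng.normal(size=(2000, 768))  # placeholder for a sample of 768-dim CLIP embeddings
mean, components = fit_pca(sample, 128)
reduced = project(sample, mean, components)
print(reduced.shape)
```

Fitting on a sample keeps the SVD cheap. The fitted mean and components must be stored and reused, because documents and queries have to land in the same 128-dimensional space.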

State-of-the-art Text Ranking

MS Marco Passage Ranking shows how to represent state-of-the-art text ranking using Transformer (BERT) models. It uses the MS Marco passage ranking datasets and features bi-encoders, cross-encoders, and late-interaction models (ColBERT).

Next generation E-Commerce Search

The e-commerce application is an end-to-end shopping engine, using the Amazon product data set. This use case bundles a frontend application. It demonstrates building next-generation e-commerce search using Vespa, and is a good introduction to using the Vespa Cloud CI/CD tests.

Also try Vespa Product Ranking for using learning-to-rank (LTR) techniques (using XGBoost and LightGBM) for improving product search ranking.

Search as you type and query suggestions

Incremental Search shows search-as-you-type functionality, retrieving matching documents for each keystroke. It also demonstrates search suggestions (query auto-completion).
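The per-keystroke lookup can be illustrated with a prefix search over a sorted list of suggestion terms. This plain-Python sketch only shows the concept. It is not how the app implements prefix matching in Vespa, and the terms are made up.

```python
import bisect


def suggest(sorted_terms: list, prefix: str, limit: int = 5) -> list:
    """Return up to `limit` terms starting with `prefix`, found by binary search in a sorted list."""
    results = []
    start = bisect.bisect_left(sorted_terms, prefix)  # first position where prefix could appear
    for term in sorted_terms[start:]:
        if len(results) == limit or not term.startswith(prefix):
            break  # terms sharing a prefix are contiguous in sorted order, so stop at the first miss
        results.append(term)
    return results


terms = sorted(["vespa", "vector search", "vespa cloud", "ranking", "vespa cli"])
for typed in ["v", "ve", "ves", "vespa cl"]:  # one lookup per keystroke
    print(typed, suggest(terms, typed))
```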

Vespa as ML inference server

Stateless model evaluation demonstrates using Vespa as a stateless ML model inference server where Vespa takes care of distributing ML models to multiple serving containers, offering horizontal scaling and safe deployment. It features model versioning and a feature processing pipeline, as well as using custom code in Searchers, Document Processors and Request Handlers.

Vespa Documentation Search

Vespa Documentation Search is the search application that powers search.vespa.ai - refer to this for GitHub Actions automation. This sample app is a good start for automated deployments, as it has system, staging and production test examples. It uses the Document API both for regular PUT operations and for UPDATE operations with create-if-nonexistent. It also has Vespa Components for custom code.

CORD-19 Search

cord19.vespa.ai is a full-featured application, based on the Covid-19 Open Research Dataset:

cord-19: frontend
cord-19-search: search backend

Note: Applications with pom.xml are Java/Maven projects and must be built before deployment. Refer to the Developer Guide for more information.

Contribute to the Vespa sample applications.
