- This project uses Ro-LLM, LangChain, and ChromaDB to build a Retrieval Augmented Generation (RAG) system.
- Retrieval Augmented Generation (RAG) is a technique that improves the responses generated by an LLM in two ways:
  - First, information is retrieved from a dataset stored in a vector database; the query is used to perform a similarity search over the documents stored there.
  - Second, by restricting the context provided to the LLM to content similar to the initial query, we can significantly reduce (or even eliminate) the LLM's hallucinations, since the answer is grounded in the stored documents.
- An important advantage of this approach is that we do not need to fine-tune the LLM with our custom data; instead, the data is ingested (cleaned, transformed, chunked, and indexed in the vector database).
- We start by prompting the LLM directly with questions, using a transformers pipeline (a minimal sketch follows this list).
- We test the pipeline with a few questions.
- We then ingest the text of the Letters of Ion Ghica into a vector database that serves as the retriever; the vector database used is ChromaDB (see the ingestion sketch after the section outline below).
- We assemble the retrieval-generation system. For this, we compose a prompt whose system message instructs the LLM how to use the initial query and the context retrieved from the vector database; in the user message we pass the query as input (a sketch appears after the glossary below).
- We test with the same questions we tried without the RAG system (directly prompting the LLM), as well as additional questions.
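A minimal sketch of the direct-prompting step described above. The model checkpoint shown is an assumption (the notebook only says it uses a quantized Romanian LLM); substitute whatever Ro-LLM checkpoint is actually loaded.

```python
# Direct prompting with a transformers pipeline (no retrieval).
# MODEL_ID is a hypothetical choice, not taken from the notebook.
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

MODEL_ID = "OpenLLM-Ro/RoLlama2-7b-Instruct"  # assumed Romanian LLM checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

text_generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=256,  # assumed generation budget
    do_sample=False,
)

def generate_answer(question: str) -> str:
    """Prompt the LLM directly, without any retrieved context."""
    # The pipeline echoes the prompt; generated_text includes it.
    return text_generator(question)[0]["generated_text"]

print(generate_answer("Cine a fost Ion Ghica?"))
```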
- Install Packages
- Prepare the Model Pipeline
- Test the Text Generator Function
- Run Tests with Text Generator
- Retrieval Preparation
- Ingest the Text
- Define Query
- Perform Tests with RAG
- Conclusion
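To make the "Retrieval Preparation" and "Ingest the Text" steps concrete, here is a hedged sketch of ingesting the Letters of Ion Ghica into ChromaDB via LangChain. The file path, embedding model, and chunking parameters are assumptions, not values taken from the notebook.

```python
# Chunk the source text and index it in ChromaDB.
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Hypothetical path to the Letters of Ion Ghica text file.
docs = TextLoader("scrisori_ion_ghica.txt", encoding="utf-8").load()

chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # assumed chunk size
    chunk_overlap=100,  # assumed overlap
).split_documents(docs)

# Assumed multilingual embedding model suitable for Romanian text.
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
)

vectordb = Chroma.from_documents(chunks, embeddings, persist_directory="./chroma_db")
retriever = vectordb.as_retriever(search_kwargs={"k": 3})  # top-3 similar chunks
```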
- LLM: Large Language Model
- LangChain: Framework designed to streamline the creation of applications utilizing LLMs
- Vector database: Database that organizes data using high-dimensional vectors
- ChromaDB: Vector database
- RAG: Retrieval Augmented Generation (see below for more details)
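And here is a sketch of the assembled retrieval-generation step, reusing the `retriever` and `text_generator` defined in the previous sketches. The Romanian system-message wording is an assumption; the notebook's actual prompt may differ.

```python
# Assemble the RAG step: retrieve context, compose the prompt, generate.
# SYSTEM_MSG is assumed wording, not the notebook's exact prompt.
SYSTEM_MSG = (
    "Răspunde la întrebare folosind DOAR contextul de mai jos. "
    "Dacă răspunsul nu se află în context, spune că nu știi."
)

def rag_answer(query: str) -> str:
    # Similarity search: fetch the chunks most similar to the query.
    context = "\n\n".join(d.page_content for d in retriever.invoke(query))
    # System instructions + retrieved context, then the user query.
    prompt = f"{SYSTEM_MSG}\n\nContext:\n{context}\n\nÎntrebare: {query}\nRăspuns:"
    output = text_generator(prompt)[0]["generated_text"]
    # Strip the echoed prompt, keeping only the newly generated answer.
    return output[len(prompt):].strip()

print(rag_answer("Cine a fost Ion Ghica?"))
```

Grounding the generation in the retrieved chunks this way is what keeps the answers "to the point" compared with direct prompting.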
- We implemented a RAG system for Romanian. The LLM used was a quantized version of the Ro-LLM Romanian model.
- Using RAG, we can focus the answers on the exact set of documents we are targeting. For some of the questions, the RAG system gives very "to the point" answers, whereas directly prompting the LLM also produces some hallucinations.
- Most answers are delivered in a relatively short time, around 2-3 seconds. The quality of the responses is rather good.