amr-sheriff/arxiv-assistant


A RAG Assistant Prototype. The live app can be accessed at arXiv Assistant.


This project is a demonstration of how to prototype a Retrieval-Augmented Generation (RAG) Assistant employing a suite of open-source technologies, frameworks and fine-tuned Large Language Models.

It can be adapted to many other business-specific use cases.

Demo 🎥

The arXiv Assistant is a simple demo designed to help researchers and practitioners stay up-to-date with the latest advancements in various fields by retrieving, summarizing, and answering queries about research papers from arXiv.

The assistant can retrieve and select relevant research papers based on user-specified criteria such as submission/revision date, domain/category, and topic. It can also answer questions about the papers and highlight their key points.
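As a rough illustration of the retrieval criteria above, the assistant's paper-search step could build a query against the public arXiv API. The sketch below uses only the standard library; the function name and defaults are hypothetical, not the project's actual tool:

```python
from urllib.parse import urlencode

def build_arxiv_query(topic: str, category: str, max_results: int = 5) -> str:
    """Build an arXiv API search URL for recent papers on a topic in a category.

    Hypothetical helper mirroring the kind of function-calling tool the
    assistant could expose; the real tool's interface may differ.
    """
    params = {
        "search_query": f"cat:{category} AND all:{topic}",
        "sortBy": "submittedDate",   # newest submissions first
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return "http://export.arxiv.org/api/query?" + urlencode(params)

print(build_arxiv_query("retrieval augmented generation", "cs.CL"))
```

The resulting URL returns an Atom feed that can be parsed into titles, abstracts, and links for summarization.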


Development Process 🛠️

The development process involved leveraging state-of-the-art techniques to implement a lightweight yet efficient LLM system. Key technologies used include:

  • Instruction Tuning Dataset: A domain-specific synthetic dataset of instruction and QA pairs, generated for fine-tuning.
  • QLoRA Quantization: For efficient memory usage.
  • Parameter-Efficient Fine-Tuning (PEFT): Utilizing LoRA adapters. PEFT model available here.
  • Function Calling and In-Context Learning: To enhance the assistant's capabilities.
  • Retrieval Augmented Generation (RAG): Grounds responses in retrieved arXiv documents to improve contextual relevance.
  • vLLM: As the serving and inference engine.
  • HuggingFace Text Embedding Inference: Serving the embedding model using HuggingFace to provide high-quality embeddings for document retrieval and processing.
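The QLoRA and LoRA pieces above can be sketched roughly as follows, assuming the `transformers`, `peft`, and `bitsandbytes` packages are installed; the model name and hyperparameters are illustrative, not the project's exact values:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# QLoRA: quantize the frozen base model to 4-bit NF4 to cut memory usage
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# PEFT: train only small low-rank LoRA adapters on top of the quantized base
lora_config = LoraConfig(
    r=16,                                 # adapter rank (illustrative)
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # attention projections
    task_type="CAUSAL_LM",
)

# Loading requires a GPU and model download, so it is left commented out:
# model = AutoModelForCausalLM.from_pretrained(
#     "base-model-name", quantization_config=bnb_config
# )
# model = get_peft_model(model, lora_config)  # only adapter weights train
```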

Key Features ✨

  • Retrieve and Summarize Papers: Quickly find and get summaries of the latest research papers.
  • Answer Queries: Get answers to specific questions about research papers.
  • Web-Based UI: Built with Chainlit for an interactive user experience.
  • Observability Functionality: Supported by Literal AI, providing insights and monitoring for the assistant's performance and operations.
  • In-Chat Memory: Allows the assistant to remember previous interactions within a session.
  • Resume Chat Capability: Enables users to continue previous chat sessions seamlessly.
  • Chain of Thought Visualization: Supported by Literal AI, providing an intuitive understanding of the assistant's reasoning process.
  • Data persistence & human feedback: Ensures that the data is retained and allows for continuous improvement through user feedback.
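In-chat memory of the kind listed above can be as simple as a rolling message buffer scoped to a session. The sketch below is a hypothetical illustration, not the app's actual mechanism:

```python
from collections import deque

class ChatMemory:
    """Rolling in-session message buffer (hypothetical sketch).

    Keeps the most recent messages so earlier turns in the same session
    can be included in the prompt on each new query.
    """

    def __init__(self, max_messages: int = 20):
        self._messages = deque(maxlen=max_messages)  # oldest dropped first

    def add(self, role: str, content: str) -> None:
        self._messages.append({"role": role, "content": content})

    def context(self) -> list:
        """Return the retained messages, oldest first."""
        return list(self._messages)
```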

Integrations & Frameworks 🔌

  • Weights & Biases: For monitoring and logging the fine-tuning process.
  • LangChain: For implementing retrieval and processing modules.
  • HuggingFace TEI: For text embedding inference, serving high-quality embeddings.
  • vLLM: As the serving and inference engine.
  • Chainlit: For building scalable conversational AI and agentic applications.
  • Literal AI: For LLM evaluation and observability.
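As an example of the TEI integration, HuggingFace TEI serves embeddings over a `POST /embed` route that accepts `{"inputs": [...]}`. A stdlib-only sketch of building such a request (the `embed_request` helper and localhost endpoint are illustrative; in the app the endpoint comes from `EMBED_ENDPOINT` in `.env`):

```python
import json
import urllib.request

def embed_request(endpoint: str, texts: list) -> urllib.request.Request:
    """Build a request for HuggingFace TEI's POST /embed route.

    Hypothetical helper; LangChain's TEI integration wraps this same API.
    """
    payload = json.dumps({"inputs": texts}).encode()
    return urllib.request.Request(
        endpoint.rstrip("/") + "/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Actually sending it requires a running TEI server, e.g.:
# with urllib.request.urlopen(embed_request(endpoint, ["hello"])) as resp:
#     vectors = json.load(resp)  # one embedding vector per input text
```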

Quickstart Guide ⚡

To get started with arXiv Assistant, open the terminal and follow these steps:

  1. Clone the Repository:

    $ git clone https://github.com/your-repo/arxiv-assistant.git
    $ cd arxiv-assistant
  2. Create a .env file in the root directory with the following environment variables:

    $ nano .env
    LITERAL_API_KEY=<your-literal-api-key>
    VLLM_API_KEY=<your-vllm-server-key>
    CHAINLIT_AUTH_SECRET=<your-chainlit-auth-secret>
    OAUTH_GOOGLE_CLIENT_ID=<your-oauth-google-client-id>
    OAUTH_GOOGLE_CLIENT_SECRET=<your-oauth-google-client-secret>
    HF=<your-huggingface-token>
    EMBED_ENDPOINT=<your-tei-endpoint>
    VLLM_ENDPOINT=<your-vllm-server-endpoint>
    MAX_ARXIV_CHAR=<max-character-to-load-from-each-arxiv-document>  # set to None to disable
  3. Build docker image:

    $ docker build -t arxiv-assistant:latest .
  4. Run the container:

    $ docker run -d --env-file .env -p 8080:8080 arxiv-assistant:latest

Once the container is running, navigate to localhost:8080/arxiv-assistant 🥂
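The `MAX_ARXIV_CHAR` setting from step 2 can be handled along these lines (a hypothetical sketch; the helper names are not from the project's code):

```python
import os
from typing import Optional

def max_arxiv_chars() -> Optional[int]:
    """Read MAX_ARXIV_CHAR from the environment.

    The literal string "None" (or an unset variable) disables truncation,
    matching the comment in the .env example above.
    """
    raw = os.environ.get("MAX_ARXIV_CHAR", "None")
    return None if raw == "None" else int(raw)

def truncate_document(text: str, limit: Optional[int]) -> str:
    """Truncate a loaded arXiv document to the configured character limit."""
    return text if limit is None else text[:limit]
```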

License 📜

This project is licensed under the Apache 2.0 license.
