Recommender System and Retrieval Augmented Generation (RAG)

Introduction

This project has two phases: building a recommender system for a set of headphone items, and designing an assistant based on Retrieval Augmented Generation (RAG).

Phase 1: Recommender System

In this phase, we use a JSON file named revised_data.json, which contains information about headphone items that we previously crawled from the Amazon website. (The implementation can be found in the repository.)

Data Preparation

The revised_data.json file provides details about the headphone items, including their names, descriptions, and other relevant information. To create the recommender system, we use Elasticsearch to store and retrieve the data efficiently. For simplicity, we merge the 'name' and 'description' fields into a new field called 'detail', which serves as the basis for similarity calculations.
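The merge step can be sketched as follows. This is a minimal illustration, assuming the file holds a list of objects with 'name' and 'description' keys; the helper name add_detail_field and the sample items are made up for the example and are not taken from the notebook.

```python
import json


def add_detail_field(items):
    """Return copies of the items with 'name' and 'description' merged into 'detail'."""
    merged = []
    for item in items:
        # Treat missing or null values as empty strings.
        name = (item.get("name") or "").strip()
        description = (item.get("description") or "").strip()
        # Join only the non-empty parts with a single space.
        detail = " ".join(part for part in (name, description) if part)
        merged.append({**item, "detail": detail})
    return merged


# Made-up sample items, only to show the shape of the result.
sample = [
    {"name": "Acme Wireless Headphones", "description": "Over-ear, noise cancelling."},
    {"name": "Acme Earbuds", "description": None},
]
for doc in add_detail_field(sample):
    print(json.dumps(doc))
```

Returning copies rather than modifying the items in place keeps the original crawl data untouched, which is convenient when the same list is reused for evaluation.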

Similarity Calculation

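Using Elasticsearch's default scoring function, we calculate the similarity between items based on their 'detail' field. (In Elasticsearch 5.0 and later the default is BM25, a refinement of TF-IDF; earlier versions used classic TF-IDF.) The recommender system generates a list of similar items for a given input item.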
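One way to ask Elasticsearch for items similar to a given one is its more_like_this query, which is scored with the index's default similarity. The sketch below only builds the request body, so it runs without a cluster. The choice of more_like_this, the index name "headphones", and the function name are assumptions for illustration; the notebook may issue a different query.

```python
def build_similar_items_query(item_id, index_name="headphones", size=20):
    """Build a search body that finds documents similar to an indexed item.

    Assumption: items are indexed with a 'detail' text field, and the
    index name 'headphones' is a placeholder.
    """
    return {
        "size": size,
        "query": {
            "more_like_this": {
                # Compare documents on the merged 'detail' field only.
                "fields": ["detail"],
                # Use an already indexed document as the example to match.
                "like": [{"_index": index_name, "_id": item_id}],
                # Lowered from the defaults so short product texts still match.
                "min_term_freq": 1,
                "min_doc_freq": 1,
            }
        },
    }


body = build_similar_items_query("42")
print(body["query"]["more_like_this"]["like"])
```

The resulting dictionary would then be sent through the Elasticsearch client's search call for the same index.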

Evaluation Metrics

To evaluate the performance of the recommender system, we define two matrices:

Predictions Matrix: This matrix contains the top 20 most similar items for each item in the dataset, based on the recommender system's output.
Ground Truth Matrix: This matrix is created using cosine similarity as the scoring function to represent the true similarity between items.
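A ground truth matrix of this kind can be sketched in plain Python. The README does not say which vectors the cosine similarity is computed over, so this example assumes simple term-count vectors built from each item's 'detail' text; the function names are illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def ground_truth_matrix(details):
    """Pairwise cosine similarity between all 'detail' strings."""
    vectors = [Counter(tokenize(d)) for d in details]
    n = len(vectors)
    return [[cosine_similarity(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]


matrix = ground_truth_matrix(["wireless headphones", "wired headphones", "usb cable"])
for row in matrix:
    print([round(value, 2) for value in row])
```

Sorting each row in descending order (and skipping the item itself) gives the reference ranking that the predictions are compared against.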

Evaluation metrics used:

Spearman Correlation Coefficient
Mean Average Precision (MAP)
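Both metrics can be computed without external libraries. The sketch below is one possible formulation, not necessarily the one used in the notebook: Spearman is computed as the Pearson correlation of average ranks, and average precision is normalized by the number of relevant items, which is one of several common conventions.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0


def average_precision(predicted, relevant):
    """Average precision of one ranked list against a set of relevant items."""
    relevant = set(relevant)
    hits, total = 0, 0.0
    for rank, item in enumerate(predicted, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(predictions, ground_truth):
    """Mean of the per-item average precision scores."""
    scores = [average_precision(p, g) for p, g in zip(predictions, ground_truth)]
    return sum(scores) / len(scores) if scores else 0.0


print(spearman([0.9, 0.5, 0.1], [0.8, 0.6, 0.2]))
print(mean_average_precision([["a", "b", "c"]], [{"a", "c"}]))
```

Here each row of the predictions matrix would be passed as a ranked list, with the top items from the matching ground truth row serving as the relevant set.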

Additional Approach

In an attempt to improve performance, we experimented with representing the data as embedding vectors before storing it in Elasticsearch. This approach involved splitting the data into chunks and then converting each chunk into a vector with an embedding technique. However, we observed a decrease in system performance and in the quality of results compared to the initial approach.
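The chunk-then-embed idea can be sketched as below. The README does not state the chunk size, the overlap, or the embedding model, so the word-based chunking, its default values, and the pluggable embed callable are all assumptions for illustration.

```python
def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into overlapping chunks of at most chunk_size words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


def embed_chunks(chunks, embed):
    """Pair each chunk with its vector; 'embed' is any text-to-vector callable."""
    return [(chunk, embed(chunk)) for chunk in chunks]


# Stand-in embedding (word count and character count), NOT a real model.
def toy_embed(text):
    return [float(len(text.split())), float(len(text))]


for chunk, vector in embed_chunks(chunk_words("a b c d e", chunk_size=3, overlap=1), toy_embed):
    print(chunk, vector)
```

In a real run, toy_embed would be replaced by a call to an actual embedding model, and each vector would be stored alongside its chunk in the index.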

Phase 2: Assistant based on RAG Systems

In this phase, we develop an assistant built on Retrieval Augmented Generation (RAG). RAG is a simple and popular way to use your own data with an LLM: you retrieve the relevant data and provide it in the prompt as augmented context for the query. Instead of relying solely on knowledge derived from the training data, a RAG workflow pulls in relevant information, connecting a static LLM with real-time data retrieval.

Workflow

The user inputs a question about the headphone items (in the revised_data.json file).
The recommender system retrieves the documents most similar to the query.
The user's question and the retrieved data serve as context for the Large Language Model (LLM), LLaMA.
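The steps above can be sketched as a small pipeline. The prompt wording, the max_docs limit, and the retrieve and generate callables are assumptions for illustration; in the project, retrieval would be the Elasticsearch search and generation would be the call to the LLaMA model.

```python
def build_prompt(question, retrieved_docs, max_docs=3):
    """Combine the user's question with retrieved 'detail' texts into one prompt."""
    context_lines = [
        f"[{i}] {doc['detail']}"
        for i, doc in enumerate(retrieved_docs[:max_docs], start=1)
    ]
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer(question, retrieve, generate):
    """Run retrieval, build the augmented prompt, and pass it to the LLM."""
    docs = retrieve(question)
    return generate(build_prompt(question, docs))


# Stand-ins for the real retriever and LLM, only to make the sketch runnable.
def fake_retrieve(question):
    return [{"detail": "Acme Wireless Headphones Over-ear, noise cancelling."}]


def fake_generate(prompt):
    return f"(model output for a prompt of {len(prompt)} characters)"


print(build_prompt("Which headphones are wireless?", fake_retrieve("")))
print(answer("Which headphones are wireless?", fake_retrieve, fake_generate))
```

Passing the retriever and the model in as callables keeps the prompt logic independent of any particular search backend or LLM client.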

Sample Output

example 1
example 2
example 3

Usage

To work with Elasticsearch, you need to supply your own API key and Cloud ID in Google Colab. You may also need to download the revised_data.json file.
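A minimal setup sketch is shown below. The environment variable names ES_CLOUD_ID and ES_API_KEY and the helper names are assumptions, not part of the notebook; the connection itself uses the official elasticsearch Python client, which accepts cloud_id and api_key arguments.

```python
import json
import os


def load_settings(env=os.environ):
    """Read the Cloud ID and API key from environment-style settings.

    The variable names ES_CLOUD_ID and ES_API_KEY are placeholders.
    """
    cloud_id = env.get("ES_CLOUD_ID")
    api_key = env.get("ES_API_KEY")
    if not cloud_id or not api_key:
        raise RuntimeError("Set ES_CLOUD_ID and ES_API_KEY before connecting.")
    return {"cloud_id": cloud_id, "api_key": api_key}


def connect(settings):
    """Create a client for an Elastic Cloud deployment."""
    # Imported here so the rest of the sketch runs without the package
    # installed (pip install elasticsearch).
    from elasticsearch import Elasticsearch
    return Elasticsearch(cloud_id=settings["cloud_id"], api_key=settings["api_key"])


def load_items(path="revised_data.json"):
    """Load the crawled headphone items from the JSON file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

In Colab, the two values could be set with os.environ before calling load_settings, which keeps the credentials out of the notebook cells that get shared.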

Contact Us

We're excited to hear from you! If you have any questions, suggestions, or need assistance, don't hesitate to reach out. Feel free to contact us via email at:

We're here to help and would love to hear about your experience using this project.

