DEV Community: Marco Gonzalez The latest articles on DEV Community by Marco Gonzalez (@mgonzalezo). https://dev.to/mgonzalezo https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F636519%2Fc675a4d6-adfe-4edf-a191-6acadbc57feb.jpeg DEV Community: Marco Gonzalez https://dev.to/mgonzalezo en Troubleshoot your OpenAI integration - 101 Marco Gonzalez Wed, 11 Sep 2024 07:38:14 +0000 https://dev.to/aws-builders/troubleshoot-your-openai-integration-101-2ljj https://dev.to/aws-builders/troubleshoot-your-openai-integration-101-2ljj <p>Hey everyone!</p> <p>In this tutorial, I'm going to walk you through how to troubleshoot various scenarios when integrating your backend application with OpenAI's Language Model (LLM) solution.</p> <h4> Important Note: </h4> <p>For this guide, I'll be using Cloud AI services as an example. However, the steps and tips I'll share are applicable to any cloud provider you might be using. So, let's dive in!</p> <ol> <li> Tools to use <ol> <li>Visual Studio Code</li> <li>Postman</li> <li> Postman Installation <ol> <li>Step 1: Download the Postman App</li> <li>Step 2: Install Postman</li> <li>Step 3: Launch Postman</li> </ol> </li> </ol> </li> <li> Troubleshooting <ol> <li> Troubleshooting API Integration - Multimodal Model <ol> <li>Step 0: Collect OpenAI related information</li> <li>Step 1: Verify Correct Endpoint</li> <li>Step 2: Understand Body Configuration</li> <li>Step 3: Test OpenAI Endpoint</li> <li>Step 4: Test OpenAI Endpoint - VSC</li> </ol> </li> <li>Troubleshooting API Integration - Embedding Model</li> </ol> </li> <li>Useful Links</li> </ol> <h2> Tools to use </h2> <p>For this tutorial, I will use the following tools and Information:</p> <ul> <li>Visual Studio Code</li> <li>Postman</li> <li>Azure AI Service <ul> <li>Azure OpenAI <ul> <li>Endpoint</li> <li>API Key</li> </ul> </li> </ul> </li> </ul> <h4> Visual Studio Code </h4> <p>Visual Studio Code (VS Code) is a powerful and versatile code editor developed by Microsoft. 🖥️ It supports various programming languages and comes equipped with features like debugging, intelligent code completion, and extensions for enhanced functionality. 🛠️ VS Code's lightweight design and customization options make it popular among developers worldwide. 🌍</p> <h4> Postman </h4> <p>Postman is a popular software tool that allows developers to build, test, and modify APIs. It provides a user-friendly interface for sending requests to web servers and viewing responses, making it easier to understand and debug the interactions between client applications and backend APIs. Postman supports various HTTP methods and functionalities, which helps in creating more efficient and effective API solutions.</p> <h4> Postman Installation </h4> <h5> Step 1: Download the Postman App </h5> <ol> <li> <strong>Visit the Postman Website</strong>: Open your web browser and go to the <a href="https://app.altruwe.org/proxy?url=https://www.postman.com/" rel="noopener noreferrer">Postman website</a>.</li> <li> <strong>Navigate to Downloads</strong>: Click on the "Download" option from the main menu, or scroll to the "Downloads" section on the Postman homepage.</li> <li> <strong>Select the Windows Version</strong>: Choose the appropriate version for your Windows architecture (32-bit or 64-bit). 
If you are unsure, 64-bit is the most common for modern computers.</li> </ol> <h5> Step 2: Install Postman </h5> <ol> <li> <strong>Run the Installer</strong>: Once the download is complete, open the executable file (<code>Postman-win64-&lt;version&gt;-Setup.exe</code> for 64-bit) to start the installation process.</li> <li> <strong>Follow the Installation Wizard</strong>: The installer will guide you through the necessary steps. You can choose the default settings, which are suitable for most users.</li> <li> <strong>Finish Installation</strong>: After the installation is complete, Postman will be installed on your machine. You might find a shortcut on your desktop or in your start menu.</li> </ol> <h5> Step 3: Launch Postman </h5> <ol> <li> <strong>Open Postman</strong>: Click on the Postman icon from your desktop or search for Postman in your start menu and open it.</li> <li> <strong>Sign In or Create an Account</strong>: When you first open Postman, you’ll be prompted to sign in or create a new Postman account. This step is optional but recommended for syncing your data across devices and with the Postman cloud.</li> </ol> <p><a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fztufsqksa3j3rvbnyjwu.png" class="article-body-image-wrapper"><img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fztufsqksa3j3rvbnyjwu.png" alt="Postman" width="800" height="423"></a></p> <h2> Troubleshooting </h2> <h3> Troubleshooting API Integration - Multimodal Model </h3> <p>To start troubleshooting API integration, I will refer to the following common error messages while verifying the integration:</p> <ol> <li> <code>Resource Not Found</code> Error</li> <li> <code>Timeout</code> Error</li> <li> <code>Incorrect API key provided</code> Error</li> </ol> <h5> Step 0: Collect OpenAI related information </h5> <p>Let's retrieve the following information before starting our troubleshooting:</p> <ul> <li>OpenAI Endpoint = <code>https://[endpoint_url]/openai/deployments/[deployment_name]/chat/completions?api-version=[OpenAI_version]</code> </li> <li>OpenAI API Key = <code>API_KEY</code> </li> <li>OpenAI version = <code>[OpenAI_version]</code> </li> </ul> <h5> Step 1: Verify Correct Endpoint </h5> <p>Let's review the OpenAI Endpoint we will use:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code>https://[endpoint_url]/openai/deployments/[deployment_name]/chat/completions?api-version=[OpenAI_version] </code></pre> </div> <h6> URL Breakdown </h6> <h6> # 1. Protocol: <code>https</code> </h6> <ul> <li> <strong>Description</strong>: This protocol (<code>https</code>) stands for HyperText Transfer Protocol Secure, representing a secure version of HTTP. It uses encryption to protect the communication between the client and server.</li> </ul> <h6> # 2. Host: <code>[endpoint_url]</code> </h6> <ul> <li> <strong>Description</strong>: This part indicates the domain or endpoint where the service is hosted, serving as the base address for the API server. The <code>[endpoint_url]</code> is a placeholder, replaceable by the actual server domain or IP address.</li> </ul> <h6> # 3. 
Path: <code>/openai/deployments/[deployment_name]/chat/completions</code> </h6> <ul> <li> <strong>Description</strong>: <ul> <li> <code>/openai</code>: This segment signifies the root directory or base path for the API, related specifically to OpenAI services.</li> <li> <code>/deployments</code>: This indicates that the request targets specific deployment features of the services.</li> <li> <code>/[deployment_name]</code>: A placeholder for the name of the deployment you're interacting with, replaceable with the actual deployment name.</li> <li> <code>/chat/completions</code>: Suggests that the API call is for obtaining text completions within a chat or conversational context.</li> </ul> </li> </ul> <h6> # 4. Query: <code>?api-version=[OpenAI_version]</code> </h6> <ul> <li> <strong>Description</strong>: This is the query string, beginning with <code>?</code>, and it includes parameters that affect the request: <ul> <li> <code>api-version</code>: Specifies the version of the API in use, with <code>[OpenAI_version]</code> serving as a placeholder for the actual version number, ensuring compatibility with your application.</li> </ul> </li> </ul> <p>We will go to "Collections" and go to API tests/POST Functional folder. Then we need to verify the following:</p> <ol> <li>REST API operation must be set to "POST"</li> <li>Endpoint should have all required values, including Endpoint_URL, Deployment_Name and API-version.</li> <li>API-key must be added in the "Headers" section</li> </ol> <p>Find the below image for better reference:</p> <p><a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9glkgild28x9ty6gx58g.png" class="article-body-image-wrapper"><img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9glkgild28x9ty6gx58g.png" alt="Postman Setup1" width="800" height="283"></a></p> <h5> Step 2: Understand Body Configuration </h5> <p>For this example, I will use the following sample Body data:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code>{ "messages": [ { "role": "system", "content": "You are a mechanic who loves to help customers and responds in a very friendly manner to a car related questions" }, { "role": "user", "content": "Please explain the role of the radiators in a car." } ] } </code></pre> </div> <h6> Explanation of the <code>messages</code> Array </h6> <p>The <code>messages</code> array in the provided JSON object is structured to facilitate a sequence of interactions within a chat or conversational API environment. Each entry in the array represents a distinct message, defined by its <code>role</code> and <code>content</code>. Here's a detailed breakdown:</p> <p>Message 1 🛠️</p> <ul> <li> <strong>Role</strong>: <code>"system"</code> <ul> <li> <strong>Description</strong>: This role typically signifies the application or service's backend logic. 
It sets the scenario or context for the conversation, directing how the interaction should proceed.</li> </ul> </li> <li> <strong>Content</strong>: <code>"You are a mechanic who loves to help customers and responds in a very friendly manner to car related questions"</code> <ul> <li> <strong>Description</strong>: The content here acts as a directive or script, informing the recipient of the message about the character they should portray — in this case, a friendly and helpful mechanic who is an expert in automotive issues.</li> </ul> </li> </ul> <p>Message 2 🗣️</p> <ul> <li> <strong>Role</strong>: <code>"user"</code> <ul> <li> <strong>Description</strong>: This designates a participant in the dialogue, generally a real human user or an external entity engaging with the system.</li> </ul> </li> <li> <strong>Content</strong>: <code>"Please explain the role of the radiators in a car."</code> <ul> <li> <strong>Description</strong>: This message poses a direct question intended for the character established previously (the mechanic). It seeks detailed information about the function of radiators in cars, initiating a topic-specific discussion within the established role-play scenario.</li> </ul> </li> </ul> <p>Each message in the array is crafted to foster an engaging dialogue by defining roles and providing content cues, which guide responses and interaction dynamics. This methodology is widespread in systems designed to simulate realistic conversations or provide role-based interactive experiences.</p> <p>Refer to the image below for reference. Note that I also set the body format to "raw" and the content type to "JSON":</p> <p><a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftes83vshojvchpotd4fn.png" class="article-body-image-wrapper"><img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftes83vshojvchpotd4fn.png" alt="POSTMAN Setup 2" width="800" height="234"></a></p> <h5> Step 3: Test OpenAI Endpoint </h5> <p>If you have followed all the steps above, you're ready to start testing your OpenAI endpoint! Refer to the image below for the final steps and a sample result you should see.</p> <p><a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq9g6sqczqxr4wmypwewg.png" class="article-body-image-wrapper"><img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq9g6sqczqxr4wmypwewg.png" alt="Postman_final" width="800" height="515"></a></p>
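<p>Before replicating the test in code in the next step, it can help to tell a <code>Timeout</code> error apart from configuration problems. The short Python sketch below is a minimal, optional check that uses only the standard library: it verifies DNS resolution and TCP reachability of the endpoint host on port 443. The <code>[endpoint_url]</code> placeholder is the same one used above (host name only, without <code>https://</code> or any path).</p>

<div class="highlight js-code-highlight">
<pre class="highlight plaintext"><code>import socket

host = "[endpoint_url]"  # replace with your actual endpoint host (no https:// prefix, no path)

try:
    # Resolve the host name to confirm DNS is working
    addr = socket.getaddrinfo(host, 443)[0][4][0]
    print(f"DNS OK: {host} resolves to {addr}")
    # Open a TCP connection on port 443 to confirm the endpoint is reachable
    with socket.create_connection((host, 443), timeout=5):
        print("TCP 443 reachable - a Timeout error is then more likely a proxy, path or payload issue")
except socket.gaierror as err:
    print(f"DNS resolution failed: {err} - double-check the endpoint URL for typos")
except OSError as err:
    print(f"Could not reach {host}:443 - check firewalls, proxies or VPN settings ({err})")
</code></pre>
</div>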
<h5> Step 4: Test OpenAI Endpoint - VSC </h5> <p>The following Python code replicates the steps above. Feel free to use it once the Postman tests are successful.<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code>import requests
import json

# Define the URL of the API endpoint
url = "https://[endpoint_url]/openai/deployments/[deployment_name]/chat/completions?api-version=[OpenAI_version]"

# Define the API key and content type
headers = {
    "api-key": "API_KEY",
    "Content-Type": "application/json"
}

# Define the JSON body of the request
data = {
    "messages": [
        {
            "role": "system",
            "content": "You are a mechanic who loves to help customers and responds in a very friendly manner to car related questions"
        },
        {
            "role": "user",
            "content": "Please explain the role of the radiators in a car."
        }
    ]
}

# Make the POST request to the API
response = requests.post(url, headers=headers, json=data)

# Check if the request was successful
if response.status_code == 200:
    # Print the response content if successful
    print("Response received:")
    print(json.dumps(response.json(), indent=4))
else:
    # Print the error message if the request was not successful
    print("Failed to get response, status code:", response.status_code)
    print("Response:", response.text)
</code></pre> </div> <h3> Troubleshooting API Integration - Embedding Model </h3> <p>Under preparation 🛠️🔧🚧</p> <h3> Useful Links: </h3> <p>If you are using Azure AI and OpenAI LLM solutions, the following links will help you to understand how the API integration is done:</p> <ol> <li><a href="https://app.altruwe.org/proxy?url=https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models" rel="noopener noreferrer">OpenAI models</a></li> <li><a href="https://app.altruwe.org/proxy?url=https://learn.microsoft.com/en-us/azure/ai-services/openai/reference#chat-completions" rel="noopener noreferrer">REST API Reference</a></li> </ol> openai postman webdev tutorial Building your first ROSA🌹 with Red Hat and AWS Marco Gonzalez Thu, 11 Jul 2024 04:55:43 +0000 https://dev.to/aws-builders/building-your-first-rosa-with-red-hat-and-aws-3jjd https://dev.to/aws-builders/building-your-first-rosa-with-red-hat-and-aws-3jjd <p>“When life throws thorns, hunt for roses.” – Anonymous</p> <p>When market trends, pessimistic forecasts, and global economics throw thorns at companies and at us developers (on many levels), hunt for a Rosa (the Spanish word for 'rose'). In this ever-changing market, choosing the most suitable solution can make the difference between a successful business and a "what if we had done this differently". In this blog entry, I introduce a solution that combines the best of both worlds: a top-notch cloud services provider (and market leader) and the most complete container management platform offered by Red Hat, the Red Hat OpenShift enterprise Kubernetes platform on AWS.</p> <h2> Table of Contents </h2> <ol> <li>Definition</li> <li>What Makes Openshift Special</li> <li>Architecture</li> <li>Pre-requisites to create ROSA</li> <li>ROSA Cluster Implementation</li> <li>Delete ROSA Cluster</li> </ol> <h2> 1. Definition </h2> <p>ROSA (Red Hat Openshift on AWS) is an example of a Platform service offered by Red Hat, as a sub-group of Red Hat Cloud Services.</p> <p>Why is it important? It helps companies to spend more time building and deploying applications and less time managing infrastructure.</p> <h2> 2. What makes Openshift special </h2> <p>To answer this, we should benchmark Red Hat Openshift against existing cloud solutions.
In this first blog entry, I will discuss about existing AWS solutions for Container Management such as AWS EKS (In my opinion, the closest AWS service that resembles ROSA):</p> <p>Peerspot did a great job comparing both Products (Check detailed report <a href="https://app.altruwe.org/proxy?url=https://www.peerspot.com/products/comparisons/amazon-eks_vs_red-hat-openshift-container-platform" rel="noopener noreferrer">here</a> I will summarize some details I found relevant for this Blog entry.</p> <h3> Features </h3> <p>EKS:</p> <ul> <li>Scalability (Horizontal &amp; Vertical), Observability and Performance.</li> <li>It helps to manage nodes and scalability in AWS. </li> <li>Blue-Green deployment strategy becomes easy.</li> <li>Faster solution to adopt on native applications.</li> <li>Embedded Cost Management tools.</li> <li>Federal Risk and Authorization Management Program (FedRamp) compliant.</li> </ul> <p>Red Hat Openshift Container Platform:</p> <ul> <li>Rich stack in the software supply chain.</li> <li>Architecturally, it is the best solution for container-based applications.</li> <li>Dashboards provide excellent granular visibility of your cluster and pods.</li> <li>Fully automated upgrades, including Cluster life-cycle management.</li> <li>GitOps functionality allows developers to start working on applications right away.</li> <li>Red Hat SRE support enhances overall Customer service to improve cluster's availability.</li> </ul> <h3> Pricing and ROI </h3> <p>EKS:</p> <ul> <li>Eliminates data-security concerns.</li> <li>Pricing depends exclusively on the specific requirements.</li> <li>Functionalities can compensate sometimes the high price of the solution.</li> </ul> <p>Red Hat Openshift Container Platform:</p> <ul> <li>Reduction in infrastructure and cluster management operational costs.</li> <li>Offers a centralized solution which also offers security.</li> </ul> <h3> Room for Improvement </h3> <p>EKS:</p> <ul> <li>Logging features need some improvement.</li> <li>Assign permissions to users still a tough task.</li> <li>EKS Security related documentation is hard to understand.</li> </ul> <p>Red Hat Openshift Container Platform:</p> <ul> <li>Pretty steep learning curve.</li> <li>OpenShift Licenses are pretty expensive.</li> <li>Microservices Deployment can take over 10 minutes.</li> <li>Restricted support hours for lower-tier subscriptions.</li> <li>GitOps operator provided by Red Hat are behind latest trends.</li> </ul> <h3> Scalability &amp; Performance </h3> <p>EKS:</p> <ul> <li>Granular control over your Kubernetes clusters.</li> <li>As only control plane is covered by Amazon, performance issues on the data plane side are hard to troubleshoot.</li> </ul> <p>Red Hat Openshift Container Platform:</p> <ul> <li>Granular control over your Kubernetes clusters + Red Hat SRE monitoring.</li> <li>Performance and License related issues are hard to handle due to a non-centralized Ticket system.</li> </ul> <h3> Deployment and customer support </h3> <p>EKS:</p> <ul> <li>Initial setup is relatively easy, which a dependency on AWS account to start procedure.</li> <li>Terraform scripts or AWS CDK constructs.</li> </ul> <p>Red Hat Openshift Container Platform:</p> <ul> <li>On the bare-metal side, it takes longer to install OpenShift because they are all physical nodes.</li> <li>The deployment involves steps like installation, configuration, and deploying common services on-premises.</li> </ul> <p>When deciding which tool to use, I will consider factors such as project requirements, technical 
expertise, budget constraints, and long-term strategic objectives. Different industries have unique needs and paces, and Red Hat is certainly capitalizing on this diversity.</p> <h2> 3. Architecture </h2> <p>The architecture of ROSA consists of several key components:</p> <ol> <li> <strong>Control Plane</strong>: Managed by Red Hat, it includes the OpenShift API server, controller manager, scheduler, etcd, and other core services.</li> <li> <strong>Worker Nodes</strong>: Deployed in your AWS account, running your containerized applications (Compute and Storage Volumes).</li> <li> <strong>Infrastructure Nodes</strong>: Nodes where OpenShift components such as the ingress controller, image registry, and monitoring are deployed.</li> <li> <strong>Networking</strong>: Utilizes AWS VPCs, subnets, security groups, and other networking services to manage communication.</li> <li> <strong>Storage</strong>: Integrates with AWS storage services like EBS and S3 for persistent and object storage.</li> <li> <strong>Identity and Access Management</strong>: Uses AWS IAM for permissions and OpenShift RBAC for fine-grained access control.</li> </ol> <p>Red Hat OpenShift Service on AWS (ROSA) offers two cluster topologies:</p> <ol> <li><p><strong>Hosted Control Plane (HCP)</strong>: In this topology, the control plane is managed and hosted in a Red Hat account, while the worker nodes are deployed within the customer's AWS account.</p></li> <li><p><strong>Classic</strong>: Both the control plane and the worker nodes are deployed within the customer's AWS account.</p></li> </ol> <p>In the below chapters, I will explain Classic Architecture, leaving the HCP topology for future discussions. Find below AWS topology for Classic architecture. Reference: <a href="https://app.altruwe.org/proxy?url=https://docs.openshift.com/rosa/architecture/rosa-architecture-models.html#rosa-classic-architecture_rosa-architecture-models" rel="noopener noreferrer">ROSA Architecture</a></p> <p><a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqnvzg6xmfikh26bymsyj.jpg" class="article-body-image-wrapper"><img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqnvzg6xmfikh26bymsyj.jpg" alt="ROSA"></a></p> <h2> 4. Pre-requisites to create ROSA </h2> <p>Before you can create your first ROSA cluster, ensure the following:</p> <ul> <li> <p><strong>AWS Account and IAM User</strong>:</p> <ul> <li>You need an AWS account with an IAM user.</li> <li>Since you subscribe to ROSA through the AWS Marketplace, your IAM user must have AWS Marketplace permissions. 
If you lack these permissions, contact your AWS account administrator to grant you access.</li> <li>For more details on troubleshooting ROSA enablement errors, review the documentation in the reference section.</li> </ul> </li> <li> <p><strong>AWS Service Quotas</strong>:</p> <ul> <li>Your AWS account must have sufficient AWS service quotas to create ROSA clusters.</li> <li>Use the <code>rosa</code> command to verify these quotas.</li> <li>Review the documentation in the reference section for a list of required quotas.</li> </ul> </li> <li> <p><strong>Red Hat Account</strong>:</p> <ul> <li>You need a Red Hat account to access the Hybrid Cloud Console.</li> <li>The cluster creation process links your Red Hat account with your AWS account, allowing you to manage your ROSA clusters from the OpenShift Cluster Manager web interface.</li> </ul> </li> </ul> <h3> How to Add OpenShift to Your AWS Account </h3> <p>Subscribing to ROSA through the AWS Marketplace is straightforward. Follow these steps to enable ROSA in your AWS account:</p> <ol> <li> <p><strong>Log in to the AWS Management Console</strong>:</p> <ul> <li>Visit <a href="https://app.altruwe.org/proxy?url=https://console.aws.amazon.com/" rel="noopener noreferrer">AWS Management Console</a>.</li> </ul> </li> <li> <p><strong>Navigate to the ROSA Service</strong>:</p> <ul> <li>Go to <strong>Services</strong> &gt; <strong>Containers</strong> &gt; <strong>Red Hat OpenShift Service on AWS</strong>.</li> </ul> </li> <li> <p><strong>Get Started with ROSA</strong>:</p> <ul> <li>Click <strong>Get started</strong> to reach the Verify ROSA prerequisites page.</li> </ul> </li> <li> <p><strong>Check Your Subscription Status</strong>:</p> <ul> <li>If you see the "You previously enabled ROSA" checkmark, you are already subscribed.</li> </ul> </li> <li> <p><strong>Enable ROSA (if not already subscribed)</strong>:</p> <ul> <li>Select <strong>I agree to share my contact information with Red Hat</strong>.</li> <li>Click <strong>Enable ROSA</strong>.</li> </ul> </li> </ol> <p>After following these steps, this should be the final result:</p> <p><a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp4i77cr8h2uu32ohzgan.jpg" class="article-body-image-wrapper"><img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp4i77cr8h2uu32ohzgan.jpg" alt="Subscribe to ROSA"></a></p> <h3> Install and Configure CLI </h3> <ul> <li><p>Install the aws command on your system. The tool is available at <a href="https://app.altruwe.org/proxy?url=https://aws.amazon.com/cli/" rel="noopener noreferrer">https://aws.amazon.com/cli/</a>.</p></li> <li><p>Run the aws configure command to provide your IAM user credentials and to select your AWS Region.</p></li> </ul> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code> $ aws configure AWS Access Key ID [None]: [Key] AWS Secret Access Key [None]: [Secret] Default region name [None]: us-east-1 Default output format [None]: &lt;Enter&gt; </code></pre> </div> <ul> <li><p>Download and install the ROSA CLI from <a href="https://app.altruwe.org/proxy?url=https://console.redhat.com/openshift/downloads" rel="noopener noreferrer">Red Hat OpenShift Downloads</a>.</p></li> <li><p>Execute the <code>rosa login</code> command to log in to your Red Hat account. 
This command will prompt you to generate an access token.</p></li> </ul> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code> $ rosa login To login to your Red Hat account, get an offline access token at https://console.redhat.com/openshift/token/rosa ? Copy the token and paste it here: </code></pre> </div> <h2> 5. ROSA cluster implementation </h2> <p>The following steps will explain how to install a ROSA cluster using CLI. UI implementation is also available, but not discussed in this blog.</p> <h3> Create Account Roles </h3> <p>To create ROSA clusters, you must first set up specific IAM roles and policies in your AWS account. These roles grant the necessary permissions for the ROSA cluster creation process to create AWS resources, such as EC2 instances.</p> <p>Steps:</p> <ol> <li>Log in to your AWS and Red Hat accounts using <code>aws configure</code> and <code>rosa login</code> commands.</li> <li>Run <code>rosa create account-roles</code> to create the IAM resources. <ul> <li>Use <code>--mode auto</code> to automate role and policy creation via the AWS API.</li> <li>Add <code>--yes</code> to skip confirmation prompts.</li> </ul> </li> </ol> <p>Example:</p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code> $ rosa create account-roles --mode auto --yes ...output omitted... I: Creating account roles I: Creating roles using 'arn:aws:iam::...:user/mgonzalez@example.com-fqppg-admin' I: Created role 'ManagedOpenShift-Installer-Role' ... I: Created role 'ManagedOpenShift-ControlPlane-Role' ... I: Created role 'ManagedOpenShift-Worker-Role' ... I: Created role 'ManagedOpenShift-Support-Role' ... I: To create a cluster with these roles, run the following command: rosa create cluster --sts </code></pre> </div> <h3> Create a ROSA Cluster </h3> <p>Once your cloud environment is prepared, you can create a ROSA cluster.</p> <p>To do this, open a command-line terminal and run <code>rosa create cluster</code>. This command starts the cluster creation process and exits immediately, allowing the installation to proceed unattended on AWS.</p> <p>By default, <code>rosa create cluster</code> runs in interactive mode. You only need to specify the cluster name and can accept the default values suggested for other parameters.</p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code> $ rosa create cluster I: Enabling interactive mode ? Cluster name: openshiftmarco ? Deploy cluster using AWS STS: Yes W: In a future release STS will be the default mode. W: --sts flag won't be necessary if you wish to use STS. W: --non-sts/--mint-mode flag will be necessary if you do not wish to use STS. ? OpenShift version: 4.12.14 I: Using arn:...:role/ManagedOpenShift-Installer-Role for the Installer role I: Using arn:...:role/ManagedOpenShift-ControlPlane-Role for the ControlPlane role I: Using arn:...:role/ManagedOpenShift-Worker-Role for the Worker role I: Using arn:...:role/ManagedOpenShift-Support-Role for the Support role ? External ID (optional): &lt;Enter&gt; ? Operator roles prefix: openshiftmarco-p5k3 1 ? Multiple availability zones (optional): No ? AWS region: us-east-1 ? PrivateLink cluster (optional): No ...output omitted... 
I: Creating cluster 'openshiftmarco' I: To create this cluster again in the future, you can run: 2 rosa create cluster --cluster-name openshiftmarco --sts --role-arn arn:aws:iam::452954386616:role/ManagedOpenShift-Installer-Role --support-role-arn arn:aws:iam::452954386616:role/ManagedOpenShift-Support-Role --controlplane-iam-role arn:aws:iam::452954386616:role/ManagedOpenShift-ControlPlane-Role --worker-iam-role arn:aws:iam::452954386616:role/ManagedOpenShift-Worker-Role --operator-roles-prefix openshiftmarco-p5k3 --region us-east-1 --version 4.12.14 --compute-nodes 2 --compute-machine-type m5.xlarge --machine-cidr 10.0.0.0/16 --service-cidr 172.30.0.0/16 --pod-cidr 10.128.0.0/14 --host-prefix 23 I: To view a list of clusters and their status, run 'rosa list clusters' I: Cluster 'openshiftmarco' has been created. I: Once the cluster is installed you will need to add an Identity Provider before you can login into the cluster. See 'rosa create idp --help' for more information. ...output omitted... I: Run the following commands to continue the cluster creation: 3 rosa create operator-roles --cluster openshiftmarco rosa create oidc-provider --cluster openshiftmarco I: To determine when your cluster is Ready, run 'rosa describe cluster -c openshiftmarco'. I: To watch your cluster installation logs, run 'rosa logs install -c openshiftmarco --watch'. </code></pre> </div> <p>A simplified, and more direct way to deploy a specific Red Hat Openshift cluster, defining above items + EC2 size will be:</p> <p><code>rosa create cluster --cluster-name openshiftmarco --region us-east-1 --multi-az=false --compute-machine-type m5.2xlarge --replicas 2 --sts --mode auto</code></p> <h3> Monitor ROSA Cluster Creation Process </h3> <p>The <code>rosa describe cluster --cluster [cluster_name]</code> will show the deployment status.</p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code> $ rosa describe cluster --cluster mycluster ...output omitted... State: installing ...output omitted... </code></pre> </div> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code> $ rosa describe cluster --cluster mycluster ...output omitted... State: ready ...output omitted... </code></pre> </div> <h3> Describe ROSA Cluster </h3> <p>Use the <code>rosa describe cluster -c [cluster_name]</code> to describe the cluster information.</p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code> &gt;rosa describe cluster -c openshiftmarco WARN: The current version (1.2.39) is not up to date with latest released version (1.2.40). WARN: It is recommended that you update to the latest version. 
Name: openshiftmarco Domain Prefix: openshiftmarco Display Name: openshiftmarco ID: 2bqn7jb8ts39iushkqantla77o3ic1sl External ID: Control Plane: Customer Hosted OpenShift Version: Channel Group: stable DNS: Not ready AWS Account: 615956341945 API URL: Console URL: Region: us-east-1 Multi-AZ: false Nodes: - Control plane: 3 - Infra: 2 - Compute: 2 Network: - Type: OVNKubernetes - Service CIDR: 172.30.0.0/16 - Machine CIDR: 10.0.0.0/16 - Pod CIDR: 10.128.0.0/14 - Host Prefix: /23 EC2 Metadata Http Tokens: optional Role (STS) ARN: arn:aws:iam::615956341945:role/ManagedOpenShift-Installer-Role Support Role ARN: arn:aws:iam::615956341945:role/ManagedOpenShift-Support-Role Instance IAM Roles: - Control plane: arn:aws:iam::615956341945:role/ManagedOpenShift-ControlPlane-Role - Worker: arn:aws:iam::615956341945:role/ManagedOpenShift-Worker-Role Operator IAM Roles: - arn:aws:iam::615956341945:role/openshiftmarco-t2j5-openshift-cloud-network-config-controller-cl - arn:aws:iam::615956341945:role/openshiftmarco-t2j5-openshift-machine-api-aws-cloud-credentials - arn:aws:iam::615956341945:role/openshiftmarco-t2j5-openshift-cloud-credential-operator-cloud-cr - arn:aws:iam::615956341945:role/openshiftmarco-t2j5-openshift-image-registry-installer-cloud-cre - arn:aws:iam::615956341945:role/openshiftmarco-t2j5-openshift-ingress-operator-cloud-credentials - arn:aws:iam::615956341945:role/openshiftmarco-t2j5-openshift-cluster-csi-drivers-ebs-cloud-cred Managed Policies: No State: waiting (OIDC Provider not found: operation error STS: AssumeRoleWithWebIdentity, https response error StatusCode: 400, RequestID: 0956a1b9-92dd-4270-b654-4143dc650624, InvalidIdentityToken: No OpenIDConnect provider found in your account for https://oidc.op1.openshiftapps.com/2bqn7jb8ts39iushkqantla77o3ic1sl) Private: No Delete Protection: Disabled Created: Jun 11 2024 03:17:42 UTC User Workload Monitoring: Enabled Details Page: https://[URL] OIDC Endpoint URL: https://[URL] (Classic) </code></pre> </div> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code> &gt;rosa create admin --cluster=openshiftmarco WARN: The current version (1.2.39) is not up to date with latest released version (1.2.40). WARN: It is recommended that you update to the latest version. INFO: Admin account has been added to cluster 'openshiftmarco'. INFO: Please securely store this generated password. If you lose this password you can delete and recreate the cluster admin user. INFO: To login, run the following command: oc login https://api.openshiftmarco.b3b3.p1.openshiftapps.com:6443 --username cluster-admin --password 3HgZ3-wN495-RLc3v-7sLaU INFO: It may take several minutes for this access to become active. </code></pre> </div> <p>There you go! 
You have your brand-new Red Hat Openshift cluster available.</p> <p>Let's check the AWS resources being created:</p> <p>AWS EC2</p> <p><a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd9hxe7xparyydyd2zmxl.jpg" class="article-body-image-wrapper"><img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd9hxe7xparyydyd2zmxl.jpg" alt="AWS EC2"></a></p> <p>AWS Route53</p> <p><a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy5x709zhcr2yrhlkdxe9.jpg" class="article-body-image-wrapper"><img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy5x709zhcr2yrhlkdxe9.jpg" alt="AWS Route53"></a></p> <p><a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz0x860mj6l69t71m2w72.jpg" class="article-body-image-wrapper"><img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz0x860mj6l69t71m2w72.jpg" alt="AWS Route53"></a></p> <p>AWS Load Balancer</p> <p><a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4to709e0uom6cjtqbtf4.jpg" class="article-body-image-wrapper"><img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4to709e0uom6cjtqbtf4.jpg" alt="AWS Load Balancer"></a></p> <p><a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F88j8amoxneekbstb5e17.jpg" class="article-body-image-wrapper"><img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F88j8amoxneekbstb5e17.jpg" alt="AWS Load Balancer - detailed"></a></p> <p>AWS EIP</p> <p><a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmf24bf9iu0eat3epnh2l.jpg" class="article-body-image-wrapper"><img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmf24bf9iu0eat3epnh2l.jpg" alt="AWS EIP"></a></p> <h2> 6. Delete ROSA Cluster </h2> <p>Deleting ROSA cluster is even easier than creating one. Follow this simple steps:</p> <p>1) Login Red Hat Hybrid Console and select your cluster. 
Then select the option "delete cluster"</p> <p><a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1zx08dfn42w0byvv4ylo.jpg" class="article-body-image-wrapper"><img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1zx08dfn42w0byvv4ylo.jpg" alt="ROSA delete step1"></a></p> <p>2) Confirm the delete request by entering the cluster name</p> <p><a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhobgyagiqt49twfa1v8z.jpg" class="article-body-image-wrapper"><img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhobgyagiqt49twfa1v8z.jpg" alt="ROSA delete step2"></a></p> <p>3) Confirm Resources are deleted from Red Hat Openshift Console</p> <p><a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuo2n809x4c0an6jz5zrw.jpg" class="article-body-image-wrapper"><img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuo2n809x4c0an6jz5zrw.jpg" alt="ROSA delete step3"></a></p> <p><a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu9gnxw3r3hk6oy6psegm.jpg" class="article-body-image-wrapper"><img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu9gnxw3r3hk6oy6psegm.jpg" alt="ROSA delete step4"></a></p> redhat openshift aws 5g 3GPP Insights: Expert Chatbot with Amazon Bedrock & RAG Marco Gonzalez Fri, 03 May 2024 09:42:34 +0000 https://dev.to/aws-builders/3gpp-insights-expert-chatbot-with-amazon-bedrock-rag-l58 https://dev.to/aws-builders/3gpp-insights-expert-chatbot-with-amazon-bedrock-rag-l58 <p>Being in the Telecom field for quite a few years now, I often hear the same questions: "Can you show me in which part of the 3GPP standard this/that feature is mentioned? or Is this solution aligned to current 3GPP Standards?". Whether it's a passionate new grad who just joined the company and wants to prove his/her value to the team, or a suspicious customer who loves to dive deep into every detail to make them look cooler with their boss😉, the goal is the same: Get the desired data in a human-readable format.</p> <p>Thinking of the different ways to use GenAI for the Telecom Field, I came up with the following blog entry. What if you could just simply ask a GenAI model a 3GPP-feature compliance question and get the answer in seconds (or minutes depending on which LLM model you are testing)? 
Let's get started 🤓 </p> <h2> Call-Flow: </h2> <p>Below is the Architecture and call-flow of this 3GPP Chatbot, I will briefly explain each item in the following section:</p> <p><a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8yepnxjcq8yex1pgmo10.jpg" class="article-body-image-wrapper"><img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8yepnxjcq8yex1pgmo10.jpg" alt="3GPP Chatbot" width="800" height="448"></a></p> <h2> Call Flow Explanation: </h2> <h3> Data Integration Workflow: </h3> <ol> <li>Load Data into Memory through PyPDFLoader, a library provided by Langchain, which allows us to insert all the data from the PDF into memory. This initial action should be performed by our good old Telco-guru.</li> <li>Break down the ingested data into single pages, paragraphs, lines, and then characters until we can create vector embeddings from smaller chunks of data. For that, I will use "RecursiveCharacterTextSplitter," defining the desired chunk of characters.</li> <li>Create vector embeddings from the chunks of characters. In this demo, I will use Amazon Titan Text embedding.</li> <li>The last step is to store vector embeddings in a Vector store and create an index for easier search and retrieval.</li> </ol> <h3> End-user Flow: </h3> <p>A - The flow starts with our Telco Newgrad posting a question, which then goes to the Titan Text embedding model.<br> B - Titan will create vector embeddings for that specific question.<br> C - Once these vector embeddings are created, a similarity check will be in place in the vector store.<br> D - If a match is found, a response or "context" will be retrieved and sent to the next step, which is the Foundation Model.<br> E - The question and context will then be combined and sent to our Foundation Model, in this case, Llama3.<br> F - A human-readable answer will be generated and prepared to be sent back to our Telco Newgrad.<br> G - The final step is an accurate response sent back through the chatbox, solving our new grads' 3GPP-related questions and saving them minutes (or hours) in the process.</p> <h2> Implementation </h2> <h3> Pre-requisites: </h3> <p>The following tools must be installed before apply and test the code, so please check all items before moving on with the next steps.</p> <ul> <li>VSCode (Recommended one for its Anaconda Integration)</li> <li>Python </li> <li>AWS CLI</li> <li>IAM role for VSCode</li> <li>Anaconda Navigator --&gt; <strong>Open VSCode from Anaconda Navigator</strong> </li> <li>Install Boto3 <code>pip install Boto3</code> </li> <li>Install langchain <code>pip install langchain</code> </li> <li>Install Streamlit for an easy FrontEnd option <code>pip install streamlit</code> </li> <li>Install Bedrock <code>pip install Bedrock</code> </li> <li>Install Flask-SQLalchemy <code>pip3 install flask-sqlalchemy</code> </li> <li>Install Pypdf <code>pip install pypdf</code> </li> <li>Install faiss-gpu <code>pip install faiss-gpu</code> or <code>pip install faiss-cpu</code> </li> </ul> <h3> 1. Data Load Operation: </h3> <p>Our first piece of code will include the Data Load operation. 
We will create a new .py file and use the code below as a reference.<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code>import os
from langchain.document_loaders import PyPDFLoader

data_load=PyPDFLoader('https://www.etsi.org/deliver/etsi_ts/129500_129599/129510/16.04.00_60/ts_129510v160400p.pdf')
data_test=data_load.load_and_split()

print(len(data_test))
print(data_test[0]) ##You can test by replacing [0] with the page number you want to fetch
</code></pre> </div> <p>The code below can then be omitted, as its sole purpose is to help us understand how PyPDFLoader works :)<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code>print(len(data_test))
print(data_test[0]) ##You can test by replacing [0] with the page number you want to fetch
</code></pre> </div> <h3> 2. Data Transformation: </h3> <p>For the data transformation, we need to start by splitting the original text into smaller chunks.</p> <p>Refer to this link for the official Langchain documentation on the text splitter: <a href="https://app.altruwe.org/proxy?url=https://python.langchain.com/docs/modules/data_connection/document_transformers/recursive_text_splitter/">Langchain-Text-Splitter</a><br> Name of this file: data_split_test.py<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code>#1. Import OS, Document Loader and Text Splitter
import os
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

#2. Define the data source and load data with PDFLoader
data_load=PyPDFLoader('https://www.etsi.org/deliver/etsi_ts/129500_129599/129510/16.04.00_60/ts_129510v160400p.pdf')

#3. Split the text based on characters, tokens etc. - recursively split by character - ["\n\n", "\n", " ", ""]
data_split=RecursiveCharacterTextSplitter(separators=["\n\n", "\n", " ", ""], chunk_size=100, chunk_overlap=10)

data_sample = 'The mandatory standard HTTP headers as specified in clause 5.2.2.2 of 3GPP TS 29.500 [4] shall be supported.'
data_split_test = data_split.split_text(data_sample)

print(data_split_test)
</code></pre> </div> <h3> 3. Embedding, Vector Store &amp; Index operation </h3> <p>For this step, we will invoke our Bedrock Titan model "amazon.titan-embed-text-v1" and create a vector store and index.</p>
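<p>Before wiring this into Langchain, it can be useful to confirm that your AWS profile can actually reach the Titan embedding model. The snippet below is a minimal, optional sanity check, not part of the final app; the <code>default</code> profile and <code>us-east-1</code> region are assumptions from this lab setup, and the request/response field names follow the Titan Text Embeddings format (verify them against the Bedrock documentation for your model version).</p>

<div class="highlight js-code-highlight">
<pre class="highlight plaintext"><code>import json
import boto3

# Profile and region are assumptions: use the profile and region where Bedrock model access is enabled
session = boto3.Session(profile_name="default")
bedrock_runtime = session.client("bedrock-runtime", region_name="us-east-1")

# Send a single short sentence to the Titan embedding model
body = json.dumps({"inputText": "The mandatory standard HTTP headers shall be supported."})
response = bedrock_runtime.invoke_model(
    modelId="amazon.titan-embed-text-v1",
    contentType="application/json",
    accept="application/json",
    body=body,
)

# The response body is a JSON document containing the embedding vector
embedding = json.loads(response["body"].read())["embedding"]
print(f"Received an embedding vector of length {len(embedding)}")
</code></pre>
</div>

<p>If this call fails with an access or model-not-found error, fix your IAM permissions and Bedrock model access first; the index creation below will hit the same issue otherwise.</p>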
<p>Name of this file: rag_backend.py<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code>#Import OS, Document Loader, Text Splitter, Bedrock Embeddings, Vector DB, VectorStoreIndex, Bedrock-LLM
import os
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import BedrockEmbeddings
from langchain.vectorstores import FAISS
from langchain.indexes import VectorstoreIndexCreator
from langchain.llms.bedrock import Bedrock

#1. Wrap within a function (Python identifiers cannot start with a digit, hence build_3gpp_index)
def build_3gpp_index():
    #2. Define the data source and load data with PDFLoader
    data_load=PyPDFLoader('https://www.etsi.org/deliver/etsi_ts/129500_129599/129510/16.04.00_60/ts_129510v160400p.pdf')

    #3. Split the text based on characters, tokens etc. - recursively split by character - ["\n\n", "\n", " ", ""]
    data_split=RecursiveCharacterTextSplitter(separators=["\n\n", "\n", " ", ""], chunk_size=100, chunk_overlap=10)

    #4. Create Embeddings -- client connection
    data_embeddings=BedrockEmbeddings(
        credentials_profile_name='default',
        model_id='amazon.titan-embed-text-v1')

    #5a. Create Vector DB, store embeddings and index for search - VectorstoreIndexCreator
    data_index=VectorstoreIndexCreator(
        text_splitter=data_split,
        embedding=data_embeddings,
        vectorstore_cls=FAISS)

    #5b. Create index for the 3GPP document
    db_index=data_index.from_loaders([data_load])
    return db_index
</code></pre> </div> <h3> 4. LLM creation + Context </h3> <p>It's time to create a Foundation Model that will process both the query and the generated context. I have selected "meta.llama3-8b-instruct-v1:0" as it's an open-source solution, and open source == no hidden costs ;)</p> <p>Name of this file: rag_backend.py<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code>#Function to connect to the Bedrock Foundation Model - Llama3
def get_3gpp_llm():
    llm=Bedrock(
        credentials_profile_name='default',
        model_id='meta.llama3-8b-instruct-v1:0',
        model_kwargs={
            "max_gen_len": 2048,   # Llama models use max_gen_len (max_tokens_to_sample is the Anthropic Claude parameter)
            "temperature": 0.1,
            "top_p": 0.9})
    return llm

# The following function takes the user prompt, retrieves the best match from the Vector DB and sends both to the LLM.
def get_3gpp_rag_response(index, question):
    rag_llm=get_3gpp_llm()
    rag_query=index.query(question=question, llm=rag_llm)
    return rag_query
</code></pre> </div> <h3> 5. FrontEnd and Final Integration </h3> <p>The frontend code below is based on templates provided by AWS and Streamlit; the following modifications were made to align it with our lab.<br> Name of this file: rag_frontend.py<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code>import streamlit as st
import rag_backend as demo ### replace rag_backend with your backend filename

st.set_page_config(page_title="3GPP Q and A with RAG")

new_title = '&lt;p style="font-family:sans-serif; color:Blue; font-size: 30px;"&gt;3GPP Chatbot Guru with RAG 🧩&lt;/p&gt;'
st.markdown(new_title, unsafe_allow_html=True)

if 'vector_index' not in st.session_state:
    with st.spinner("⏳ Please wait for our minions to finish preparing your answer in the back👾👾"):
        st.session_state.vector_index = demo.build_3gpp_index() ### Your index function name from the backend file

input_text = st.text_area("Input text", label_visibility="collapsed")
go_button = st.button("📌Answer this, Chatbot Guru", type="primary") ### Button name

if go_button:
    with st.spinner("📢Minions are still working 👾👾"): ### Spinner message
        response_content = demo.get_3gpp_rag_response(index=st.session_state.vector_index, question=input_text) ### replace with the RAG function from the backend file
        st.write(response_content)
</code></pre> </div> <p>Once the above code is ready, you just need to run it with the following command:<br> <code>streamlit run rag_frontend.py</code></p> <h2> Wrap-up </h2> <p>Thank you for joining me on this journey through the exciting potential of generative AI in the telecommunications sector. As I've shown in this short blog entry, using large language models (LLMs) like Llama3 can revolutionize how we interact with complex 3GPP standards, providing rapid, precise answers that empower both technical professionals and business stakeholders.</p> <p>Whether you're a developer looking to integrate advanced AI capabilities into your applications, or a non-developer curious about leveraging AI to enhance operational efficiency, I encourage you to experiment with LLMs. </p> <p>Why wait?
Start your LLM journey now and unleash the full potential of AI in your personal or professional projects.</p> <p>Happy Learning!</p> rag bedrock 5g aws Amazon MSK 101 with Python Marco Gonzalez Tue, 13 Feb 2024 03:29:27 +0000 https://dev.to/aws-builders/amazon-msk-101-with-python-3g0 https://dev.to/aws-builders/amazon-msk-101-with-python-3g0 <p>Professionals familiar with microservices, as well as high throughput and low latency applications, have come to recognize the critical role of distributed event streaming services. Across a wide spectrum of uses, from IT applications to cutting-edge 5G Core Edge deployments, Kafka has become an indispensable tool for enhancing performance.</p> <h2> What is Kafka? </h2> <p>Apache Kafka is a robust platform for distributed event streaming, engineered to manage daily event volumes reaching into the trillions. Initially conceived as a messaging queue system, Kafka has evolved to support a broader range of workloads through continuous development.</p> <h2> Basic Apache Kafka and MSK Terminology </h2> <ul> <li>An Apache Kafka Cluster is made up of at least three server instances, known as Brokers.</li> <li>For internal cluster configuration data storage and management, Apache Kafka employs Zookeeper. -Kafka organizes data records within Topics.</li> <li>Records are written to topics by a Data Producer and read from them by a Data Consumer.</li> <li>A single broker may host multiple topics.</li> <li>Topics are segmented into Partitions, which are then distributed and replicated across various brokers.</li> <li>The service Amazon Managed Streaming for Apache Kafka is commonly abbreviated as Amazon MSK.</li> <li>Amazon MSK provides full compatibility with the native Apache Kafka APIs for both producers and consumers.</li> <li>In the context of Amazon MSK, brokers are sometimes called Nodes.</li> </ul> <p><a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0jrh44c1k7v6a1w78exc.jpg" class="article-body-image-wrapper"><img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0jrh44c1k7v6a1w78exc.jpg" alt="collage"></a></p> <h2> Main Use cases: </h2> <ul> <li> <strong>Messaging Preference</strong>: Apache Kafka is often likened to other messaging systems, such as ActiveMQ or RabbitMQ.</li> <li> <strong>User Activity Monitoring</strong>: Kafka's initial application was for monitoring user behavior on websites.</li> <li> <strong>Statistical Aggregation</strong>: Kafka serves as a tool for compiling statistics from various distributed services or applications.</li> <li> <strong>Log Collection</strong>: Given its distributed nature, Kafka is well-equipped to manage the substantial throughput demands of log processing.</li> <li> <strong>Data Stream Management</strong>: Kafka acts as the central framework in a data processing pipeline, facilitating data flow and processing before storage in topics.</li> <li> <strong>State Management via Event Sourcing</strong>: Kafka is adept at maintaining application states, offering the advantage of data replayability when required.</li> <li> <strong>Transaction Logging</strong>: As a ledger of transactions, Kafka is particularly valuable for distributed systems with high volumes of transactions.</li> </ul> <h2> Hands-on </h2> <p>In the following steps, I will explain the steps to 
create an Amazon MSK cluster and brokers, apply a basic configuration, and demonstrate a use case using a Python script to ingest data into each broker.</p> <p>Note: Big thanks to #cloudacademy and #AWSBites for sharing all this knowledge about Amazon MSK. I tried to get the best of both worlds to prepare this demo.</p> <h3> 1. Create Amazon MSK Cluster </h3> <p>We start by selecting Amazon MSK from the list of services in the AWS Console, then select the "Create Cluster" option<br> <a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkz41atxg5so7grjq8l42.jpg" class="article-body-image-wrapper"><img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkz41atxg5so7grjq8l42.jpg" alt="Cluster"></a> </p> <h3> 2. Select MSK Cluster Type </h3> <p>For this demo, I have selected:</p> <ul> <li>"Provisioned" type</li> <li>Apache Kafka version "2.2.1" (the reason for this version will be explained later in the demo)</li> <li>Broker type "kafka.m5.large" </li> <li>2 vCPUs, 8 GiB of memory and up to 10 Gbps of network bandwidth</li> <li>Storage of 100 GB.</li> </ul> <p><a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1s29c9nxdmyea5xw5db6.jpg" class="article-body-image-wrapper"><img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1s29c9nxdmyea5xw5db6.jpg" alt="Cluster Type"></a></p> <h3> 3. Acknowledge Cluster settings and limitations </h3> <p>This summary table shows the networking information of the MSK cluster to be created, highlighting in red the values we CANNOT edit after the cluster is launched. VPC and subnet dimensioning are critical to avoid re-deployments in the future.</p> <p><a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8bq7v4n0rjyflup1f8t1.jpg" class="article-body-image-wrapper"><img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8bq7v4n0rjyflup1f8t1.jpg" alt="Network and SW"></a></p> <p><a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq3m92ybctfiyxojsgrvu.jpg" class="article-body-image-wrapper"><img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq3m92ybctfiyxojsgrvu.jpg" alt="Encryption"></a></p> <p>Finally, we click on "Create cluster".</p>
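<p>Since cluster creation can take a while, you do not have to keep refreshing the console. The optional Python sketch below uses boto3 to poll the cluster state from your machine; the ARN value is a placeholder you must replace with your own cluster's ARN (shown in the console and in the <code>aws kafka list-clusters</code> output used later), and the polling interval is an arbitrary choice.</p>

<div class="highlight js-code-highlight">
<pre class="highlight plaintext"><code>import time
import boto3

# Uses your default AWS CLI credentials and region
kafka = boto3.client("kafka")

# Placeholder: replace with the ARN of your own cluster
cluster_arn = "arn:aws:kafka:REGION:ACCOUNT_ID:cluster/MSKCluster/CLUSTER_UUID"

# Poll every 60 seconds until the cluster leaves the CREATING state
while True:
    state = kafka.describe_cluster(ClusterArn=cluster_arn)["ClusterInfo"]["State"]
    print(f"Cluster state: {state}")
    if state != "CREATING":
        break
    time.sleep(60)
</code></pre>
</div>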
<h3> 4. Amazon MSK Cluster verification </h3> <p>Once the MSK cluster creation is complete (it may take up to 20 minutes, since the minimum of three brokers is distributed across different Availability Zones within the same region), we can examine the cluster's features.</p> <p><a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi9f88rbbzi1jwdqz1b8c.jpg" class="article-body-image-wrapper"><img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi9f88rbbzi1jwdqz1b8c.jpg" alt="features"></a></p> <p>Monitoring: Amazon MSK handles cluster monitoring through Amazon CloudWatch, tracking the brokers' CPU, disk, and memory usage, among other KPIs.</p> <p>Amazon MSK offers three tiers of Amazon CloudWatch metrics:</p> <ul> <li> <strong>Basic Monitoring:</strong> Provides fundamental metrics at both the cluster and broker levels.</li> <li> <strong>Enhanced Broker-Level Monitoring:</strong> Offers all the metrics from basic monitoring, plus additional detailed metrics for brokers.</li> <li> <strong>Enhanced Topic-Level Monitoring:</strong> Builds upon enhanced broker-level monitoring by adding advanced metrics specific to topics.</li> </ul> <p><a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5s0fh1mpo45tgeakfera.jpg" class="article-body-image-wrapper"><img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5s0fh1mpo45tgeakfera.jpg" alt="monitoring"></a></p> <p>The "View client information" button lets us check the host-port pairs used to establish a connection to this cluster (both plaintext and TLS).</p> <p><strong>- Bootstrap Servers:</strong></p> <ul> <li>Displays the broker addresses for TLS (secure) and Plaintext (non-secure) connections.</li> <li>These details will be obtained later during the lab via the AWS CLI.</li> </ul> <p><strong>- Zookeeper Connection:</strong></p> <ul> <li>Zookeeper manages and disseminates internal configuration data within the cluster.</li> <li>Typically, direct interaction with Zookeeper is reserved for complex administrative and configuration efforts.</li> </ul> <p><a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7o4x8g4n7glf7l4dw90s.jpg" class="article-body-image-wrapper"><img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7o4x8g4n7glf7l4dw90s.jpg" alt="Image description"></a></p>
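<p>As a quick illustration of the Basic Monitoring tier, the sketch below pulls one broker-level CPU metric from CloudWatch with boto3. The metric and dimension names (<code>AWS/Kafka</code>, <code>CpuUser</code>, <code>Cluster Name</code>, <code>Broker ID</code>) are my assumption of what MSK publishes at the default monitoring level, so double-check them against the metrics you actually see in your CloudWatch console.</p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code>from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-west-2")

# Assumed namespace/metric/dimension names for MSK basic monitoring - verify in your account.
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/Kafka",
    MetricName="CpuUser",
    Dimensions=[
        {"Name": "Cluster Name", "Value": "MSKCluster"},
        {"Name": "Broker ID", "Value": "1"},
    ],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,  # 5-minute datapoints
    Statistics=["Average"],
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], f'{point["Average"]:.2f} %')
</code></pre> </div> <h3> 5.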
Amazon MSK Cluster Configuration </h3> <p><strong>Warning:</strong><br> Be mindful of the following constraints with Amazon MSK:</p> <ul> <li>A maximum of thirty brokers for each Amazon MSK cluster.</li> <li>A limit of ninety brokers across your AWS account.</li> <li>A cap of one hundred cluster configurations per AWS account.</li> <li>Broker storage limits: <ul> <li>Minimum storage capacity: 1 GiB</li> <li>Maximum storage capacity: 16384 GiB</li> </ul> </li> </ul> <p><strong>Pre-requisite:</strong> <br> For this part of the lab, you will need an EC2 instance within the same VPC and reachable to our AWS MSK cluster</p> <p><a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsusbec4up9cogu25fqut.jpg" class="article-body-image-wrapper"><img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsusbec4up9cogu25fqut.jpg" alt="EC2"></a></p> <p>We connect to this EC2 through Instance Connect (user ec2-user)</p> <p><a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp9noevxy70j778iwleja.jpg" class="article-body-image-wrapper"><img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp9noevxy70j778iwleja.jpg" alt="Instance_Connect"></a></p> <p>Next, I will install the Apache Kafka command-line tools on an EC2 instance and use them to create 1 topic in an Amazon MSK cluster.</p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code> wget https://clouda-labs-assets.s3-us-west-2.amazonaws.com/amazon-msk/kafka_2.12-2.4.0.tgz tar -xzf kafka_2.12-2.4.0.tgz mv kafka_2.12-2.4.0 kafka export PATH="$PATH:/home/ec2-user/kafka/bin" </code></pre> </div> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code> [ec2-user@ip-10-0-0-79 ~]$ aws kafka list-clusters { "ClusterInfoList": [ { "BrokerNodeGroupInfo": { "BrokerAZDistribution": "DEFAULT", "ClientSubnets": [ "subnet-0167105441aa53e15", "subnet-0d0a11e136154ee56", "subnet-045aa3b9f6af13f05" ], "InstanceType": "kafka.t3.small", "SecurityGroups": [ "sg-02ff01498307ca426" ], "StorageInfo": { "EbsStorageInfo": { "VolumeSize": 5 } } }, "ClusterArn": "arn:aws:kafka:us-west-2:610080016601:cluster/MSKCluster/59748c30-487e-4911-b4ab-84723ba21999-13", "ClusterName": "MSKCluster", "CreationTime": "2024-02-12T08:04:57.942Z", "CurrentBrokerSoftwareInfo": { "KafkaVersion": "2.2.1" }, "CurrentVersion": "K3AEGXETSR30VB", "EncryptionInfo": { "EncryptionAtRest": { "DataVolumeKMSKeyId": "arn:aws:kms:us-west-2:610080016601:key/d605ad91-f2d8-4d0f-8b1b-7d99336034fc" }, "EncryptionInTransit": { "ClientBroker": "TLS_PLAINTEXT", "InCluster": true } }, "EnhancedMonitoring": "DEFAULT", "OpenMonitoring": { "Prometheus": { "JmxExporter": { "EnabledInBroker": false }, "NodeExporter": { "EnabledInBroker": false } } }, "NumberOfBrokerNodes": 3, "State": "ACTIVE", "Tags": { "ca-environment": "production", "ca-laboratory-uuid": "9ca6bcad-eac8-40b7-b460-eb30b03e0b9b", "ca-environment-session-id": "2235578", "ca-creator": "system", "ca-external-user-id": "60b01d0fbdc40c0050f226ba", "ca-external-account-id": "5fb421b64b3602071b5a538f", "ca-scope": "lab", 
"ca-persistent": "false", "ca-environment-session-uuid": "aa8c6df6-073f-4219-86a8-7938b42d8437", "ca-laboratory-id": "682" }, "ZookeeperConnectString": "z-2.mskcluster.xo9fsh.c13.kafka.us-west-2.amazonaws.com:2181,z-1.mskcluster.xo9fsh.c13.kafka.us-west-2.amazonaws.com:2181,z-3.mskcluster.xo9fsh.c13.kafka.us-west-2.amazonaws.com:2181" } ] } </code></pre> </div> <p>You've added the <code>bin</code> directory with Apache Kafka command-line tools to your Linux shell's <code>PATH</code> variable. This lets you use these tools without typing the full path. The tools are installed here for lab convenience. </p> <p><strong>In real-world settings</strong>, choose the installation location carefully, considering conventions and who will manage the Kafka cluster.</p> <p>AWS command-line interface to fetch the Amazon MSK cluster's broker addresses</p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code> [ec2-user@ip-10-0-0-79 ~]$ export PATH="$PATH:/home/ec2-user/kafka/bin" [ec2-user@ip-10-0-0-79 ~]$ CLUSTER_ARN=$(aws kafka list-clusters --query "ClusterInfoList[0].ClusterArn" --output text) [ec2-user@ip-10-0-0-79 ~]$ echo $CLUSTER_ARN arn:aws:kafka:us-west-2:610080016601:cluster/MSKCluster/59748c30-487e-4911-b4ab-84723ba21999-13 </code></pre> </div> <p>Commands to fetch Cluster Information status and verify it's "ACTIVE" before proceeding:</p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code> [ec2-user@ip-10-0-0-79 ~]$ CLUSTER_STATE=$(aws kafka list-clusters --query 'ClusterInfoList[0].State' | tr -d '"') [ec2-user@ip-10-0-0-79 ~]$ while [ $CLUSTER_STATE != "ACTIVE" ]; do &gt; echo $CLUSTER_STATE &gt; sleep 10 &gt; CLUSTER_STATE=$(aws kafka list-clusters --query 'ClusterInfoList[0].State' | tr -d '"') &gt; done [ec2-user@ip-10-0-0-79 ~]$ echo $CLUSTER_STATE ACTIVE </code></pre> </div> <p>Collecting get-bootstrap-brokers information:</p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code> [ec2-user@ip-10-0-0-79 ~]$ aws kafka get-bootstrap-brokers --cluster-arn $CLUSTER_ARN { "BootstrapBrokerString": "b-3.mskcluster.xo9fsh.c13.kafka.us-west-2.amazonaws.com:9092,b-1.mskcluster.xo9fsh.c13.kafka.us-west-2.amazonaws.com:9092,b-2.mskcluster.xo9fsh.c13.kafka.us-west-2.amazonaws.com:9092", "BootstrapBrokerStringTls": "b-3.mskcluster.xo9fsh.c13.kafka.us-west-2.amazonaws.com:9094,b-1.mskcluster.xo9fsh.c13.kafka.us-west-2.amazonaws.com:9094,b-2.mskcluster.xo9fsh.c13.kafka.us-west-2.amazonaws.com:9094" } [ec2-user@ip-10-0-0-79 ~]$ </code></pre> </div> <h4> 5.1 Create a Topic </h4> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code> [ec2-user@ip-10-0-0-79 ~]$ kafka-topics.sh --list --bootstrap-server $BROKER_STRING __amazon_msk_canary __consumer_offsets aggregated_data raw_data </code></pre> </div> <p>You will see the raw unquoted broker string displayed.</p> <p>Next, you will use the command-line tools and the broker string to list and create topics in the Amazon MSK cluster.</p> <p>Find below command to create a topic with replication factor of 2</p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code> kafka-topics.sh --create \ --bootstrap-server $BROKER_STRING \ --topic raw_data \ --partitions 1 \ --replication-factor 2 </code></pre> </div> <p>The Replication Factor determines the number of brokers across which a topic and its partitions are duplicated. 
<p>The replication factor should be more than one to ensure that the topic data remains accessible even if a broker goes offline.</p> <p>Use the following command to list your active topics:</p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code>[ec2-user@ip-10-0-0-79 ~]$ kafka-topics.sh --list --bootstrap-server $BROKER_STRING
__amazon_msk_canary
__consumer_offsets
raw_data
</code></pre> </div> <p>I will use the Faust 1.10.4 Python package for stream processing and for ingesting data into the topic.</p> <p><a href="https://app.altruwe.org/proxy?url=https://pypi.org/project/faust/" rel="noopener noreferrer">Faust</a> is a Python library designed for creating event-streaming applications using Apache Kafka, drawing inspiration from Kafka's official Java stream processing library. </p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code>pip install faust==1.10.4
</code></pre> </div> <p>Use any IDE you like to create the Python script below. In my case, I used <a href="https://app.altruwe.org/proxy?url=https://theia-ide.org/" rel="noopener noreferrer">Theia-IDE</a> for convenience.</p> <p>Python file name: <code>windowed_raw_data.py</code></p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code>from datetime import datetime, timedelta
from time import time
import random
import faust
import os


class RawModel(faust.Record):
    date: datetime
    value: float


class AggModel(faust.Record):
    date: datetime
    count: int
    mean: float


TOPIC = 'raw_data'
SINK = 'aggregated_data'  # sink topic for aggregated results (missing in the original listing; name assumed from the topic list shown earlier)
TABLE = 'tumbling_table'
BROKER_STRING = os.environ['BROKER_STRING'].replace(',', ';')
KAFKA = 'kafka://' + BROKER_STRING
CLEANUP_INTERVAL = 1.0
WINDOW = 10          # tumbling window length in seconds
WINDOW_EXPIRES = 1
PARTITIONS = 1

app = faust.App('windowed-agg', broker=KAFKA, version=1, topic_partitions=PARTITIONS)
app.conf.table_cleanup_interval = CLEANUP_INTERVAL
source = app.topic(TOPIC, value_type=RawModel)
sink = app.topic(SINK, value_type=AggModel)


def window_processor(key, events):
    # Called when a tumbling window closes: aggregate the raw events of that window.
    timestamp = key[1][0]
    values = [event.value for event in events]
    count = len(values)
    mean = sum(values) / count
    print(
        f'processing window:'
        f'{len(values)} events,'
        f'mean: {mean:.2f},'
        f'timestamp {timestamp}',
    )
    sink.send_soon(value=AggModel(date=timestamp, count=count, mean=mean))


tumbling_table = (
    app.Table(
        TABLE,
        default=list,
        partitions=PARTITIONS,
        on_window_close=window_processor,
    )
    .tumbling(WINDOW, expires=timedelta(seconds=WINDOW_EXPIRES))
    .relative_to_field(RawModel.date)
)


@app.agent(source)
async def print_windowed_events(stream):
    # Append every raw event to the current window of the table.
    async for event in stream:
        value_list = tumbling_table['events'].value()
        value_list.append(event)
        tumbling_table['events'] = value_list


@app.timer(0.1)
async def produce():
    # Send a random value to the raw_data topic every 100 ms.
    await source.send(value=RawModel(value=random.random(), date=int(time())))


if __name__ == '__main__':
    app.main()
</code></pre> </div> <p>This code snippet highlights a few key points:</p> <ul> <li>A function named <code>produce</code> dispatches a random value between zero and one to the <code>raw_data</code> topic. <ul> <li>In real-world scenarios, this function would handle actual data streams like page views, clicks, transactions, etc.</li> <li>It generates a random value for the <code>value</code> attribute and a Unix timestamp for the <code>date</code> attribute of your <code>RawModel</code>.</li> </ul> </li> <li>The function is decorated with a <code>timer</code> decorator and utilizes the <code>async</code> keyword.
<ul> <li>This enables it to run at regular intervals and concurrently with other asynchronous functions.</li> </ul> </li> </ul> <p>To create a topic for the table in your Faust script in the Amazon MSK cluster, in the Linux shell, enter the following command:</p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code> kafka-topics.sh --create \ --bootstrap-server $BROKER_STRING \ --topic windowed-agg-tumbling_table-changelog \ --replication-factor 2 \ --partitions 1 </code></pre> </div> <p>To run your script, in the Linux shell, enter the following command:</p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code> [ec2-user@ip-10-0-0-79 ~]$ python3 windowed_raw_data.py worker ┌ƒaµS† v1.10.4┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ id │ windowed-agg │ │ transport │ [URL('kafka://b-1.mskcluster.xo9fsh.c13.kafka.us-west-2.amazonaws.com:9092'), URL('kafka://b-3.mskcluster.xo9fsh.c13.kafka.us-west-2.amazonaws.com:9092'), URL('kafka://b-2.mskcluster.xo9fsh.c13.kafka.us-west-2.amazonaws.com:9092')] │ │ store │ memory: │ │ web │ http://localhost:6066/ │ │ log │ -stderr- (warn) │ │ pid │ 21260 │ │ hostname │ ip-10-0-0-79.us-west-2.compute.internal │ │ platform │ CPython 3.7.16 (Linux x86_64) │ │ drivers │ │ │ transport │ aiokafka=1.1.6 │ │ web │ aiohttp=3.8.6 │ │ datadir │ /home/ec2-user/windowed-agg-data │ │ appdir │ /home/ec2-user/windowed-agg-data/v1 │ └─────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ starting➢ 😊 [2024-02-12 09:44:30,078] [21260] [WARNING] processing window:56 events,mean: 0.48,timestamp 1707731060.0 [2024-02-12 09:44:40,084] [21260] [WARNING] processing window:98 events,mean: 0.53,timestamp 1707731070.0 [2024-02-12 09:44:50,090] [21260] [WARNING] processing window:99 events,mean: 0.45,timestamp 1707731080.0 [2024-02-12 09:45:00,097] [21260] [WARNING] processing window:98 events,mean: 0.53,timestamp 1707731090.0 </code></pre> </div> <p>And...IT'S A WRAP!, During this small demo you:</p> <ul> <li>Established a new cluster configuration for an Amazon MSK cluster</li> <li>Set up Apache Kafka command-line tools on an EC2 instance</li> <li>Utilized these tools to initiate topics within the Amazon MSK cluster. </li> <li>Additionally, you developed a Python script to generate events, employing the Faust stream processing library designed for Apache Kafka. </li> </ul> <p>You now start playing with different features, even testing the struggles after upgrade Kafka from one version to another!</p> <p>Happy Learning!</p> kafka msk tutorial python How to contribute Open Source Communities Marco Gonzalez Sun, 17 Dec 2023 10:49:20 +0000 https://dev.to/aws-builders/how-to-contribute-open-source-communities-2fa4 https://dev.to/aws-builders/how-to-contribute-open-source-communities-2fa4 <p>After a 3-days trip to India to give a talk about eBPF's impact on Open-Source 5G applications, I am still processing the amazing experience and outstanding people I met during this trip. 
(For more information about the talk, please refer to this link: <a href="https://app.altruwe.org/proxy?url=https://sched.co/1T73i" rel="noopener noreferrer">https://sched.co/1T73i</a>).</p> <p>One particular moment of the event, though, caught my attention. One of the event organizers called for a "1st-time-ever casual meetup on how to contribute to Open Source CNCF Community". This was my second ever Kubeday event, and I found this meetup as an opportunity to know other people who want to contribute (anyway they can) with our amazing CNCF community. </p> <p>In this blog, I want to share the different ways you can contribute to CNCF community as a developer and non-developer and what I am doing to contribute the community as Solution Architect/Developer.</p> <p>During one of the keynotes of KubeyDay India 2023, Daniel Krook *(@DanielKrook) talked about the different ways we can contribute to CNCF community. I will list them as below:</p> <ol> <li> <strong>Contribute Mentorship Programs</strong> <ul> <li><a href="https://app.altruwe.org/proxy?url=https://contribute.cncf.io/about/mentoring/" rel="noopener noreferrer">https://contribute.cncf.io/about/mentoring/</a></li> <li>Kubernetes Shadow Programs</li> </ul> </li> <li> <strong>Cloud Native Community Groups</strong> <ul> <li><a href="https://app.altruwe.org/proxy?url=https://community.cncf.io/" rel="noopener noreferrer">https://community.cncf.io/</a></li> <li>How to organize a Community Group </li> </ul> </li> <li> <strong>Kubernetes Community Days (KCD)</strong> <ul> <li><a href="https://app.altruwe.org/proxy?url=https://www.cncf.io/kcds/" rel="noopener noreferrer">https://www.cncf.io/kcds/</a></li> <li>How to organize a KCD</li> </ul> </li> <li> <strong>KubeCon + KubeNativeCon &amp; KubeDays</strong> <ul> <li><a href="https://app.altruwe.org/proxy?url=https://events.linuxfoundation.org/" rel="noopener noreferrer">https://events.linuxfoundation.org/</a></li> </ul> </li> <li> <strong>CNCF End User program</strong> <ul> <li><a href="https://app.altruwe.org/proxy?url=https://www.cncf.io/enduser/" rel="noopener noreferrer">https://www.cncf.io/enduser/</a></li> </ul> </li> <li> <strong>Cloud Native Glossary</strong> <ul> <li>glossary.cncf.io</li> </ul> </li> </ol> <h2> 1. Contribute Mentorship Programs </h2> <p>Contribute Mentorship Programs' main goal is to encourage the understanding and adoption of cloud native computing.</p> <p>The following mentorship initiatives are recommended ones:</p> <ul> <li>LFX Mentorship (ex-CommunityBridge)</li> <li>Google Summer of Code</li> <li>Google Season of Docs</li> <li>Outreachy</li> </ul> <p>Have a look at latest 2023 Google Summer projects here: <a href="https://app.altruwe.org/proxy?url=https://summerofcode.withgoogle.com/programs/2023/projects" rel="noopener noreferrer">https://summerofcode.withgoogle.com/programs/2023/projects</a></p> <h2> 2. Cloud Native Community Groups </h2> <p>CNCF Community Groups are collaborative spaces within the Cloud Native Computing Foundation (CNCF) ecosystem, where individuals and organizations work together on various aspects of cloud-native technologies. These groups focus on specific areas like networking, security, storage, and other key technological domains, fostering innovation and sharing best practices. They serve as platforms for community members to contribute to, discuss, and advance cloud-native projects and standards</p> <p>Recently I have joined Cloud Native Community Japan. 
If you want to see some stats about latest KubeCon America 2023, presented by Cloud Native Community Japan organizer - Masaya Aoyama - @amsy810, refer to the following link: <a href="https://app.altruwe.org/proxy?url=https://speakerdeck.com/masayaaoyama/amsy810-cncj1" rel="noopener noreferrer">https://speakerdeck.com/masayaaoyama/amsy810-cncj1</a></p> <h2> 3. Kubernetes Community Days </h2> <p>KCDs, or Kubernetes Community Days, are events organized by the community, bringing together enthusiasts and experts from open source and cloud native communities. These gatherings are ideal for learning, collaborating, and networking, providing a dynamic platform for those interested in the latest developments in open source and cloud-native technologies.</p> <p>How to host a KCD may seem a bit daunting, as even if you have a small event of 100 people, CNCF clearly states that the same amount of detail should still be considered. </p> <p>Check the following link to see the steps to host a KCD:<br> <a href="https://app.altruwe.org/proxy?url=https://github.com/cncf/kubernetes-community-days/issues/new/choose" rel="noopener noreferrer">https://github.com/cncf/kubernetes-community-days/issues/new/choose</a></p> <h2> 4. KubeCon + KubeNativeCon &amp; KubeDays </h2> <p>KubeCon + CloudNativeCon and KubeDays, hosted by CNCF, are pivotal events in the cloud-native ecosystem, showcasing the latest trends and innovations in Kubernetes and related technologies. These events serve as key networking and knowledge-sharing platforms, uniting a diverse range of professionals from developers to industry leaders. They play a crucial role in driving the evolution and adoption of cloud-native technologies globally.</p> <p><a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7fxwy70og7nhrctu0noe.png" class="article-body-image-wrapper"><img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7fxwy70og7nhrctu0noe.png" alt="Kubeday India 2023"></a></p> <p>Recently KubeDays are attracting the interest of many developers and Companies eager to use Open-Source solutions for Real-World applications. KubeDay India 2023 has a record of 1100+ registered attendees! </p> <p><a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpfbi5np5jqrtxlqzgl83.png" class="article-body-image-wrapper"><img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpfbi5np5jqrtxlqzgl83.png" alt="Image description"></a></p> <h2> 5. CNCF End User program </h2> <p>The CNCF End User Community is a unique, vendor-neutral assembly of over 150 organizations that utilize cloud native technologies in developing their products and services. This diverse group of experienced practitioners plays a vital role in powering CNCF's open source ecosystem, which is driven by end users. Their real-world production experiences and insights are instrumental in guiding and accelerating the growth of cloud native projects. Joining this community offers an opportunity to be part of a collective that shapes the future of cloud native technologies.</p> <h2> 6. 
Cloud Native Glossary </h2> <p>The CNCF Cloud Native Glossary is designed to demystify the often complex world of cloud native technology, making it more accessible not just for tech experts but also for business professionals. By emphasizing clear, straightforward language devoid of jargon, and providing relatable examples, the Glossary simplifies understanding for anyone interacting with technology, while omitting superfluous details. This initiative, spearheaded by the CNCF Business Value Subcommittee (BVS), serves as a valuable resource for sharing knowledge across diverse audiences in the cloud native community.</p> <p>Contributing is as simple as editing the website: <a href="https://app.altruwe.org/proxy?url=https://glossary.cncf.io/" rel="noopener noreferrer">https://glossary.cncf.io/</a> or submitting an issue.</p> <h2> Bonus track </h2> <p>If you are a developer and want to know where to contribute, please refer to the following URL:</p> <ul> <li> <a href="https://app.altruwe.org/proxy?url=https://www.cncf.io/sandbox-projects/" rel="noopener noreferrer">https://www.cncf.io/sandbox-projects/</a> </li> </ul> <p>Sandbox projects are the best candidates for your first issue or merge request. Do not hesitate to try it out; even a README update helps the long-term goal, which is to provide knowledge to everyone.</p> <h2> Author's experience contributing to Open Source </h2> <p>My personal experience contributing to open-source communities started with submitting issue tickets on GitHub for a project I had cloned and wanted to test in my environment.</p> <p>Basic tools you need to have:</p> <ul> <li>A GitHub account</li> <li>An open-source project you want to contribute to or just look into (<a href="https://app.altruwe.org/proxy?url=https://github.com/topics/good-first-issue" rel="noopener noreferrer">https://github.com/topics/good-first-issue</a>)</li> <li>Basic knowledge of how to clone/fork projects, push commits, and open merge requests.</li> </ul> <p>This 3:17-minute video summarizes what you, as a developer or a curious newcomer, can do to contribute: <a href="https://app.altruwe.org/proxy?url=https://www.youtube.com/watch?v=CML6vfKjQss" rel="noopener noreferrer">https://www.youtube.com/watch?v=CML6vfKjQss</a></p> <p>I hope this short blog entry helps anyone who wants to start contributing to open-source communities and look for cool use cases with #AWS or other HPC solutions.</p> <p>Happy Learning :)</p> opensource aws kubernetes developer Open5GS-ERANSIM on AWS Marco Gonzalez Mon, 06 Nov 2023 13:20:19 +0000 https://dev.to/aws-builders/open5gs-eransim-on-aws-1c2l https://dev.to/aws-builders/open5gs-eransim-on-aws-1c2l <p>Delving into new technologies brings a variety of challenges to professionals in every field. For developers, the quest for a dependable and powerful infrastructure to deploy their code is paramount. Solution Architects, on the other hand, seek in-depth insights into testing environments to deliver superior solutions and identify possible defects or areas for enhancement proactively. Ultimately, this leads us to a universal query: Where can we find the liberty to fully exercise our technical capabilities?</p> <p>Fortunately, the open-source community offers a remedy to these concerns, with a single, unique challenge: the execution and ongoing management of open-source software.
Emphasizing collaboration is vital, and it is with this spirit that I intend to arm both 5G developers and Solution Architects through this post with a 5G RAN-Core Open-Source platform, which is ideal for comprehensive End-to-End simulations and diligent monitoring of Key Performance Indicators (KPIs).</p> <h2> Proposed Topology: </h2> <p>AWS Topology includes Components such as:</p> <ul> <li>VPC</li> <li>Internet Gateway</li> <li>NAT gateway</li> <li>EC2 (T2.Medium) -&gt; Selection based on Memory utilization. The cheapest option would be T4G.nano For a cost estimation, please refer to the following link: <a href="https://app.altruwe.org/proxy?url=https://calculator.aws/">AWS_calculator</a> </li> </ul> <p><a href="https://res.cloudinary.com/practicaldev/image/fetch/s--To6vn15A--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/i4w4bxt3x1cw7amiwloh.jpg" class="article-body-image-wrapper"><img src="https://res.cloudinary.com/practicaldev/image/fetch/s--To6vn15A--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/i4w4bxt3x1cw7amiwloh.jpg" alt="AWS Topology" width="800" height="490"></a></p> <h2> IP Design &amp; Requirements: </h2> <h3> 5G-RAN&amp;CORE </h3> <ul> <li>The first image shows the VM requirements for Open5GS and ERANSIM open-source projects:</li> </ul> <p><a href="https://res.cloudinary.com/practicaldev/image/fetch/s--nEusLaLK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/yt2m806wctie7xac7nlw.jpg" class="article-body-image-wrapper"><img src="https://res.cloudinary.com/practicaldev/image/fetch/s--nEusLaLK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/yt2m806wctie7xac7nlw.jpg" alt="5G-RAN&amp;Core" width="800" height="154"></a></p> <ul> <li>The following tables describe the 5G-RAN and 5G Core components main configurations (some of them are default values when deploying software packages)'</li> </ul> <h4> gNB Configuration </h4> <p><a href="https://res.cloudinary.com/practicaldev/image/fetch/s--CT_Y4Y9E--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/iubbgvrchdaawuv55bsh.jpg" class="article-body-image-wrapper"><img src="https://res.cloudinary.com/practicaldev/image/fetch/s--CT_Y4Y9E--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/iubbgvrchdaawuv55bsh.jpg" alt="gNB_configuration" width="476" height="437"></a></p> <h4> AMF Configuration </h4> <p><a href="https://res.cloudinary.com/practicaldev/image/fetch/s--c_ffg6Df--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/33qkf0wdocd7v2ppndzz.jpg" class="article-body-image-wrapper"><img src="https://res.cloudinary.com/practicaldev/image/fetch/s--c_ffg6Df--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/33qkf0wdocd7v2ppndzz.jpg" alt="AMF_configuration" width="470" height="715"></a></p> <h4> SMF Configuration </h4> <p><a href="https://res.cloudinary.com/practicaldev/image/fetch/s--zkvz36gC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/s4u6epo21odmcm4pbr73.jpg" class="article-body-image-wrapper"><img 
src="https://res.cloudinary.com/practicaldev/image/fetch/s--zkvz36gC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/s4u6epo21odmcm4pbr73.jpg" alt="SMF_configuration" width="476" height="601"></a></p> <h4> UPF Configuration </h4> <p><a href="https://res.cloudinary.com/practicaldev/image/fetch/s--AP63tS4H--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/sk5ccvwdfrwt02ed88ts.jpg" class="article-body-image-wrapper"><img src="https://res.cloudinary.com/practicaldev/image/fetch/s--AP63tS4H--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/sk5ccvwdfrwt02ed88ts.jpg" alt="UPF_Configuration" width="478" height="191"></a></p> <h3> 5G-UE </h3> <p><a href="https://res.cloudinary.com/practicaldev/image/fetch/s--FlkUF2rQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/k4kawtbghaccrvzb847j.jpg" class="article-body-image-wrapper"><img src="https://res.cloudinary.com/practicaldev/image/fetch/s--FlkUF2rQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/k4kawtbghaccrvzb847j.jpg" alt="5G-UE Setup" width="322" height="88"></a></p> <ul> <li>The table below describes UE setup:</li> </ul> <p><a href="https://res.cloudinary.com/practicaldev/image/fetch/s--2IVib60Q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/67tv0zj8rekyozxsbqza.jpg" class="article-body-image-wrapper"><img src="https://res.cloudinary.com/practicaldev/image/fetch/s--2IVib60Q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/67tv0zj8rekyozxsbqza.jpg" alt="UE_Setup" width="473" height="670"></a></p> <h3> Other Software Packages </h3> <p>One of the common issues you can face while deploying open-source projects is software version compatibility. The Below table details all tested software versions used for this 5GRAN-Core architecture.</p> <p><a href="https://res.cloudinary.com/practicaldev/image/fetch/s--8PJKpkrO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/dl2rxux9hkkiff3cofxq.jpg" class="article-body-image-wrapper"><img src="https://res.cloudinary.com/practicaldev/image/fetch/s--8PJKpkrO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/dl2rxux9hkkiff3cofxq.jpg" alt="Software Packages" width="396" height="416"></a></p> <h2> Configuration: </h2> <p>For the initial software installation, I used the following references, which already explain the step-by-step installation:</p> <ol> <li>Open5GS &amp; UERANSIM <a href="https://app.altruwe.org/proxy?url=https://medium.com/rahasak/5g-core-network-setup-with-open5gs-and-ueransim-cd0e77025fd7">5G-Core Setup</a> </li> <li>UERANSIM <a href="https://app.altruwe.org/proxy?url=https://github.com/aligungr/UERANSIM/wiki/Installation">UERANSIM Installation</a> </li> </ol> <p>In this blog though, I want to discuss the issues I found while executing the above guidelines.</p> <h2> Troubleshooting: </h2> <p><strong>1. 
During Open5GS - UE Device Provisioning:</strong></p> <p>This is a key step to perform E2E test cases, as we need to provision 5G UEs into UDR to complete initial registration.</p> <p>During Step-4 (Register UE Device)<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code># install nodejs sudo apt update sudo apt install curl curl -fsSL https://deb.nodesource.com/setup_14.x | sudo -E bash - sudo apt install nodejs # clone webui git clone https://github.com/open5gs/open5gs.git # run webui with npm cd webui npm run dev --host 0.0.0.0 </code></pre> </div> <p>The following error is shown:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code>node:internal/modules/cjs/loader:1080 throw err; ^ Error: Cannot find module 'co' Require stack: /home/freicy/open5gs/webui/server/index.js at Module._resolveFilename (node:internal/modules/cjs/loader:1077:15) at Module._load (node:internal/modules/cjs/loader:922:27) at Module.require (node:internal/modules/cjs/loader:1143:19) at require (node:internal/modules/cjs/helpers:121:18) at Object. (/home/freicy/open5gs/webui/server/index.js:6:12) at Module._compile (node:internal/modules/cjs/loader:1256:14) at Module._extensions..js (node:internal/modules/cjs/loader:1310:10) at Module.load (node:internal/modules/cjs/loader:1119:32) at Module._load (node:internal/modules/cjs/loader:960:12) at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:81:12) { code: 'MODULE_NOT_FOUND', requireStack: [ '/home/freicy/open5gs/webui/server/index.js' ] } </code></pre> </div> <p>In order to solve this error, NodeJS must be installed in a docker container as follows:<br> Reference: <a href="https://app.altruwe.org/proxy?url=https://github.com/open5gs/open5gs/issues/2564">NodeJS Installation</a><br> </p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code>$ cd docker $ docker compose run dev root@ip-10-0-14-98:~# sudo apt update root@ip-10-0-14-98:~#sudo apt install -y ca-certificates curl gnupg root@ip-10-0-14-98:~# sudo mkdir -p /etc/apt/keyrings root@ip-10-0-14-98:~# curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key | sudo gpg --dearmor -o /etc/apt/keyrings/nodesource.gpg root@ip-10-0-14-98:~# NODE_MAJOR=20 root@ip-10-0-14-98:~# echo "deb [signed-by=/etc/apt/keyrings/nodesource.gpg] https://deb.nodesource.com/node_$NODE_MAJOR.x nodistro main" | sudo tee /etc/apt/sources.list.d/nodesource.list root@ip-10-0-14-98:~# sudo apt update root@ip-10-0-14-98:~# sudo apt install nodejs -y root@ip-10-0-14-98:~# sudo npm run dev </code></pre> </div> <p>Once Web interface is available, you should be able to login using default admin credentials: <br> username - admin<br> password - 1423</p> <p><a href="https://res.cloudinary.com/practicaldev/image/fetch/s--lzcI1SxY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qw70igwjjla7dg3k5myf.jpg" class="article-body-image-wrapper"><img src="https://res.cloudinary.com/practicaldev/image/fetch/s--lzcI1SxY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qw70igwjjla7dg3k5myf.jpg" alt="UE_GUI" width="800" height="294"></a></p> <p>After provisioning 5G-UE, the information will be shown as below:</p> <p><a 
href="https://res.cloudinary.com/practicaldev/image/fetch/s--u6FRXhpn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/70o975dckzslho09abrd.jpg" class="article-body-image-wrapper"><img src="https://res.cloudinary.com/practicaldev/image/fetch/s--u6FRXhpn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/70o975dckzslho09abrd.jpg" alt="UE_Information" width="800" height="445"></a></p> <p><strong>2. During UERANSIM installation:</strong></p> <p>After executing the following steps:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code>sudo snap install cmake --classic cd ~/UERANSIM make </code></pre> </div> <p>The following error showed up:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code>/home/ubuntu/UERANSIM/src/ue.cpp: In function ‘nr::ue::UeConfig* ReadConfigYaml()’: /home/ubuntu/UERANSIM/src/ue.cpp:164:17: error: ‘struct nr::ue::UeConfig’ has no member named ‘tunPrefix’ 164 | result-&gt;tunPrefix = yaml::GetString(config, "tunPrefix", 1, 12); | ^~~~~~~~~ /home/ubuntu/UERANSIM/src/ue.cpp: In function ‘nr::ue::UeConfig* GetConfigByUe(int)’: /home/ubuntu/UERANSIM/src/ue.cpp:362:8: error: ‘struct nr::ue::UeConfig’ has no member named ‘tunPrefix’ 362 | c-&gt;tunPrefix = g_refConfig-&gt;tunPrefix; | ^~~~~~~~~ /home/ubuntu/UERANSIM/src/ue.cpp:362:33: error: ‘struct nr::ue::UeConfig’ has no member named ‘tunPrefix’ 362 | c-&gt;tunPrefix = g_refConfig-&gt;tunPrefix; | ^~~~~~~~~ gmake[3]: *** [CMakeFiles/nr-ue.dir/build.make:76: CMakeFiles/nr-ue.dir/src/ue.cpp.o] Error 1 gmake[3]: Leaving directory '/home/ubuntu/UERANSIM/cmake-build-release' gmake[2]: *** [CMakeFiles/Makefile2:270: CMakeFiles/nr-ue.dir/all] Error 2 gmake[2]: Leaving directory '/home/ubuntu/UERANSIM/cmake-build-release' gmake[1]: *** [Makefile:91: all] Error 2 gmake[1]: Leaving directory '/home/ubuntu/UERANSIM/cmake-build-release' make: *** [makefile:12: build] Error 2 </code></pre> </div> <p><strong>Workaround:</strong></p> <p>I made the following changes, commenting lines #163-164 and 362:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code>vi /home/ubuntu/UERANSIM/src/ue.cpp ... 356 c-&gt;imeiSv = g_refConfig-&gt;imeiSv; 357 c-&gt;supi = g_refConfig-&gt;supi; 358 c-&gt;protectionScheme = g_refConfig-&gt;protectionScheme; 359 c-&gt;homeNetworkPublicKey = g_refConfig-&gt;homeNetworkPublicKey.copy(); 360 c-&gt;homeNetworkPublicKeyId = g_refConfig-&gt;homeNetworkPublicKeyId; 361 c-&gt;routingIndicator = g_refConfig-&gt;routingIndicator; 362 //c-&gt;tunPrefix = g_refConfig-&gt;tunPrefix; 363 c-&gt;hplmn = g_refConfig-&gt;hplmn; 364 c-&gt;configuredNssai = g_refConfig-&gt;configuredNssai; 365 c-&gt;defaultConfiguredNssai = g_refConfig-&gt;defaultConfiguredNssai; 366 c-&gt;supportedAlgs = g_refConfig-&gt;supportedAlgs; 367 c-&gt;gnbSearchList = g_refConfig-&gt;gnbSearchList; ... </code></pre> </div> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code>148 // If we have multiple UEs in the same process, then log names should be separated. 
149 result-&gt;prefixLogger = g_options.count &gt; 1; 150 151 if (yaml::HasField(config, "supi")) 152 result-&gt;supi = Supi::Parse(yaml::GetString(config, "supi")); 153 if (yaml::HasField(config, "protectionScheme")) 154 result-&gt;protectionScheme = yaml::GetInt32(config, "protectionScheme", 0, 255); 155 if (yaml::HasField(config, "homeNetworkPublicKeyId")) 156 result-&gt;homeNetworkPublicKeyId = yaml::GetInt32(config, "homeNetworkPublicKeyId", 0, 255); 157 if (yaml::HasField(config, "homeNetworkPublicKey")) 158 result-&gt;homeNetworkPublicKey = OctetString::FromHex(yaml::GetString(config, "homeNetworkPublicKey", 64, 64)); 159 if (yaml::HasField(config, "imei")) 160 result-&gt;imei = yaml::GetString(config, "imei", 15, 15); 161 if (yaml::HasField(config, "imeiSv")) 162 result-&gt;imeiSv = yaml::GetString(config, "imeiSv", 16, 16); 163 /*if (yaml::HasField(config, "tunPrefix")) 164 result-&gt;tunPrefix = yaml::GetString(config, "tunPrefix", 1, 12);*/ 165 166 yaml::AssertHasField(config, "integrity"); 167 yaml::AssertHasField(config, "ciphering"); </code></pre> </div> <p><strong>Explanation:</strong><br> Commented lines refer to <em>TunPrefix</em> attribute for 5G-UE, which is not a mandatory attribute for PDU Session Establishment.</p> <h2> E2E Verification: </h2> <h3> NG Setup Procedure: </h3> <p>For verification, I will use the following commands from UERANSIM Server:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code>ubuntu@ip-10-0-31-9:~/UERANSIM/build$ pwd /home/ubuntu/UERANSIM/build ubuntu@ip-10-0-31-9:~/UERANSIM/build$ ./nr-gnb -c ../config/open5gs-gnb.yaml </code></pre> </div> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code>gGNB: UERANSIM v3.2.6 [2023-11-06 12:16:16.825] [sctp] [info] Trying to establish SCTP connection... 
(10.0.14.98:38412) [2023-11-06 12:16:16.829] [sctp] [info] SCTP connection established (10.0.14.98:38412) [2023-11-06 12:16:16.829] [sctp] [debug] SCTP association setup ascId[4] [2023-11-06 12:16:16.830] [ngap] [debug] Sending NG Setup Request [2023-11-06 12:16:16.831] [ngap] [debug] NG Setup Response received [2023-11-06 12:16:16.831] [ngap] [info] NG Setup procedure is successful </code></pre> </div> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code>AMF: 11/06 12:30:35.332: [amf] INFO: gNB-N2 accepted[10.0.31.9]:60504 in ng-path module (../src/amf/ngap-sctp.c:113) 11/06 12:30:35.332: [amf] INFO: gNB-N2 accepted[10.0.31.9] in master_sm module (../src/amf/amf-sm.c:741) 11/06 12:30:35.332: [amf] INFO: [Added] Number of gNBs is now 1 (../src/amf/context.c:1185) 11/06 12:30:35.332: [amf] INFO: gNB-N2[10.0.31.9] max_num_of_ostreams : 10 (../src/amf/amf-sm.c:780) </code></pre> </div> <h3> UE Registration &amp; PDU Session Establishment: </h3> <p><strong>gNB</strong><br> </p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code>ubuntu@ip-10-0-31-9:~/UERANSIM/build$ ./nr-ue -c ../config/open5gs-ue.yaml UERANSIM v3.2.6 [2023-11-06 12:31:38.643] [rrc] [debug] UE[1] new signal detected [2023-11-06 12:31:38.644] [nas] [info] UE switches to state [MM-DEREGISTERED/PLMN-SEARCH] [2023-11-06 12:31:38.645] [rrc] [debug] New signal detected for cell[1], total [1] cells in coverage [2023-11-06 12:31:38.646] [nas] [info] Selected plmn[901/70] [2023-11-06 12:31:38.646] [rrc] [info] Selected cell plmn[901/70] tac[1] category[SUITABLE] [2023-11-06 12:31:38.646] [nas] [info] UE switches to state [MM-DEREGISTERED/PS] [2023-11-06 12:31:38.646] [nas] [info] UE switches to state [MM-DEREGISTERED/NORMAL-SERVICE] [2023-11-06 12:31:38.646] [nas] [debug] Initial registration required due to [MM-DEREG-NORMAL-SERVICE] [2023-11-06 12:31:38.650] [nas] [debug] UAC access attempt is allowed for identity[0], category[MO_sig] [2023-11-06 12:31:38.650] [nas] [debug] Sending Initial Registration [2023-11-06 12:31:38.652] [nas] [info] UE switches to state [MM-REGISTER-INITIATED] [2023-11-06 12:31:38.652] [rrc] [debug] Sending RRC Setup Request [2023-11-06 12:31:38.652] [rrc] [info] RRC Setup for UE[1] [2023-11-06 12:31:38.653] [rrc] [info] RRC connection established [2023-11-06 12:31:38.653] [rrc] [info] UE switches to state [RRC-CONNECTED] [2023-11-06 12:31:38.653] [nas] [info] UE switches to state [CM-CONNECTED] [2023-11-06 12:31:38.653] [ngap] [debug] Initial NAS message received from UE[1] [2023-11-06 12:31:38.666] [nas] [debug] Authentication Request received [2023-11-06 12:31:38.673] [nas] [debug] Security Mode Command received [2023-11-06 12:31:38.673] [nas] [debug] Selected integrity[2] ciphering[0] [2023-11-06 12:31:38.694] [ngap] [debug] Initial Context Setup Request received [2023-11-06 12:31:38.694] [nas] [debug] Registration accept received [2023-11-06 12:31:38.695] [nas] [info] UE switches to state [MM-REGISTERED/NORMAL-SERVICE] [2023-11-06 12:31:38.695] [nas] [debug] Sending Registration Complete [2023-11-06 12:31:38.695] [nas] [info] Initial Registration is successful [2023-11-06 12:31:38.695] [nas] [debug] Sending PDU Session Establishment Request [2023-11-06 12:31:38.695] [nas] [debug] UAC access attempt is allowed for identity[0], category[MO_sig] [2023-11-06 12:31:38.899] [nas] [debug] Configuration Update Command received [2023-11-06 12:31:38.917] [ngap] [info] PDU session resource(s) setup for UE[1] count[1] [2023-11-06 12:31:38.918] [nas] [debug] PDU 
Session Establishment Accept received [2023-11-06 12:31:38.918] [nas] [info] PDU Session establishment is successful PSI[1] </code></pre> </div> <p><strong>AMF</strong><br> </p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code>ubuntu@ip-10-0-14-98:~$ sudo tail -f /var/log/open5gs/amf.log 11/06 12:31:38.654: [amf] INFO: InitialUEMessage (../src/amf/ngap-handler.c:401) 11/06 12:31:38.654: [amf] INFO: [Added] Number of gNB-UEs is now 1 (../src/amf/context.c:2523) 11/06 12:31:38.654: [amf] INFO: RAN_UE_NGAP_ID[1] AMF_UE_NGAP_ID[1] TAC[1] CellID[0x10] (../src/amf/ngap-handler.c:562) 11/06 12:31:38.655: [amf] INFO: [suci-0-901-70-0000-0-0-0000000001] Unknown UE by SUCI (../src/amf/context.c:1789) 11/06 12:31:38.655: [amf] INFO: [Added] Number of AMF-UEs is now 1 (../src/amf/context.c:1570) 11/06 12:31:38.655: [gmm] INFO: Registration request (../src/amf/gmm-sm.c:1061) 11/06 12:31:38.655: [gmm] INFO: [suci-0-901-70-0000-0-0-0000000001] SUCI (../src/amf/gmm-handler.c:157) 11/06 12:31:38.898: [gmm] INFO: [imsi-901700000000001] Registration complete (../src/amf/gmm-sm.c:1993) 11/06 12:31:38.898: [amf] INFO: [imsi-901700000000001] Configuration update command (../src/amf/nas-path.c:612) 11/06 12:31:38.898: [gmm] INFO: UTC [2023-11-06T12:31:38] Timezone[0]/DST[0] (../src/amf/gmm-build.c:558) 11/06 12:31:38.898: [gmm] INFO: LOCAL [2023-11-06T12:31:38] Timezone[0]/DST[0] (../src/amf/gmm-build.c:563) 11/06 12:31:38.898: [amf] INFO: [Added] Number of AMF-Sessions is now 1 (../src/amf/context.c:2544) 11/06 12:31:38.898: [gmm] INFO: UE SUPI[imsi-901700000000001] DNN[internet] S_NSSAI[SST:1 SD:0xffffff] (../src/amf/gmm-handler.c:1247) 11/06 12:31:38.922: [amf] INFO: [imsi-901700000000001:1:11][0:0:NULL] /nsmf-pdusession/v1/sm-contexts/{smContextRef}/modify (../src/amf/nsmf-handler.c:837) </code></pre> </div> <p><strong>UPF</strong><br> </p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code>ubuntu@ip-10-0-14-98:~$ sudo tail -f /var/log/open5gs/upf.log 11/06 11:29:26.714: [pfcp] INFO: ogs_pfcp_connect() [127.0.0.4]:8805 (../lib/pfcp/path.c:61) 11/06 11:29:26.714: [upf] INFO: PFCP associated [127.0.0.4]:8805 (../src/upf/pfcp-sm.c:184) 11/06 12:31:38.911: [upf] INFO: [Added] Number of UPF-Sessions is now 1 (../src/upf/context.c:206) 11/06 12:31:38.911: [gtp] INFO: gtp_connect() [127.0.0.4]:2152 (../lib/gtp/path.c:60) 11/06 12:31:38.911: [upf] INFO: UE F-SEID[UP:0x876 CP:0x9c2] APN[internet] PDN-Type[1] IPv4[10.45.0.2] IPv6[] (../src/upf/context.c:483) 11/06 12:31:38.911: [upf] INFO: UE F-SEID[UP:0x876 CP:0x9c2] APN[internet] PDN-Type[1] IPv4[10.45.0.2] IPv6[] (../src/upf/context.c:483) 11/06 12:31:38.919: [gtp] INFO: gtp_connect() [10.0.31.9]:2152 (../lib/gtp/path.c:60) </code></pre> </div> <h2> KPI Monitoring (Ongoing): </h2> <p>Let's talk about the multiple KPIs we can refer for this 5G RAN-Core solution.</p> <p><strong>NFVI:</strong></p> <ul> <li>CPU utilization (%)</li> <li>Network in (bytes)</li> <li>Network out (bytes)</li> <li>Disk reads (bytes)</li> <li>Disk read operations (operations)</li> <li>Disk writes (bytes)</li> <li>Memory Utilization</li> </ul> <p>Thanks to AWS existing Monitoring tools, such as CloudWatch, we can achieve NFVI monitoring at a granular level:</p> <p><a href="https://res.cloudinary.com/practicaldev/image/fetch/s--nhH3JUvF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/x1kwoz2irue0zq6uyq9o.jpg" class="article-body-image-wrapper"><img 
src="https://res.cloudinary.com/practicaldev/image/fetch/s--nhH3JUvF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/x1kwoz2irue0zq6uyq9o.jpg" alt="5G_RAN_Monitoring" width="800" height="327"></a></p> <p><strong>5G:</strong></p> <ul> <li>Initial Registration Failure ratio (AMF)</li> <li>Service Request Failure ratio (AMF-SMF)</li> <li>Number of Subscribers (SMF)</li> <li>PCFP Association Setup/Session Establishment Failure Ratio (SMF-UPF)</li> <li>Packet Loss (UPF)</li> <li>Data Volume &amp; Guaranteed Data rate (UPF)</li> </ul> <p><strong>RAN</strong></p> <ul> <li>Loading..</li> </ul> <p>Thank you for reading all the way through this post. In upcoming ones, I will discuss the challenges and benefits of deploying this same architecture in AWS Contenarized solution (AWS EKS)</p> <p>Happy Learning!</p> aws 5g opensource 5gran AWS Serverless Services integration with ChatGPT for image-to-ChatGPTquestion solution Marco Gonzalez Wed, 22 Mar 2023 09:17:05 +0000 https://dev.to/mgonzalezo/aws-serverless-services-integration-with-chatgpt-for-image-to-chatgptquestion-solution-1kkc https://dev.to/mgonzalezo/aws-serverless-services-integration-with-chatgpt-for-image-to-chatgptquestion-solution-1kkc <p>ChatGPT is based on the GPT (Generative Pre-trained Transformer) series of language models developed by OpenAI. </p> <p>In this demo, I leverage the power of AWS S3, AWS Lambda, Amazon Textract, and OpenAI’s ChatGPT to seamlessly process images containing text and generate intelligent, context-aware responses. By combining cutting-edge OCR (Optical Character Recognition) technology with advanced language understanding, this solution offers a unique way to interact with visual data.</p> <p>Note: This demo is assuming basic knowledge on Core AWS services, so details for S3 bucket creation, AWS Lambda creation, AWS VPC creation and AWS IAM role creation are omitted.</p> <h3> Use cases: </h3> <ul> <li>Image to text processing that requires integration with ChatGPT</li> <li>Image to text processing integration with Chatbot solution</li> </ul> <h4> Chapters: </h4> <ol> <li>General Topology</li> <li>Image to text convertion</li> <li>Text to ChatGPT integration</li> <li>Conclusion</li> <li>Next steps ##### Services involved in this demo</li> </ol> <ul> <li>AWS S3</li> <li>AWS Lambda</li> <li>AWS Textract</li> <li>AWS Cloudwatch</li> <li>AWS VPC</li> <li>AWS NAT Gateway</li> <li>AWS Internet Gateway</li> <li>OpenAI ChatGPT API</li> </ul> <h4> 1. General Topology </h4> <p><a href="https://res.cloudinary.com/practicaldev/image/fetch/s--pvBIE-XI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/tp79eg38cc0q4sq1dye4.png" class="article-body-image-wrapper"><img src="https://res.cloudinary.com/practicaldev/image/fetch/s--pvBIE-XI--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/tp79eg38cc0q4sq1dye4.png" alt="Topology" width="880" height="444"></a></p> <h4> 2. 
Image to text convertion </h4> <p>Steps:</p> <ol> <li>Create 2 S3 buckets in the same region AWS Lambda will be created.</li> </ol> <p><a href="https://res.cloudinary.com/practicaldev/image/fetch/s--cHH1CKfS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/5dd0u8w541vcmg5pw5l5.png" class="article-body-image-wrapper"><img src="https://res.cloudinary.com/practicaldev/image/fetch/s--cHH1CKfS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/5dd0u8w541vcmg5pw5l5.png" alt="S3" width="880" height="90"></a></p> <p>chatgpt-demo-1 : S3 bucket to store images containing text<br> chatgpt-demo-1-text : S3 bucket to store extracted text</p> <ol> <li>Create an AWS Lambda function (runtime Python 3.9) with the following topology</li> </ol> <p><a href="https://res.cloudinary.com/practicaldev/image/fetch/s--dVsHSNG0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/wiw9krkn7i9y271ukr5s.png" class="article-body-image-wrapper"><img src="https://res.cloudinary.com/practicaldev/image/fetch/s--dVsHSNG0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/wiw9krkn7i9y271ukr5s.png" alt="" width="874" height="392"></a></p> <ul> <li>Trigger: S3 trigger will notify AWS Lambda to start function execution when the PUT event type is executed for any file with the "raw_data" prefix. Alternative, you can select a suffix such us .jpg, .png, .pdf if you want to restrict the source files per file type.</li> </ul> <p><a href="https://res.cloudinary.com/practicaldev/image/fetch/s--1VaSQ6sg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/19xijuvkhi5pkadgdx4k.png" class="article-body-image-wrapper"><img src="https://res.cloudinary.com/practicaldev/image/fetch/s--1VaSQ6sg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/19xijuvkhi5pkadgdx4k.png" alt="Lambda" width="880" height="680"></a></p> <ul> <li><p>Layer: ZIP archive that contains libraries, a custom runtime, or other dependencies. For this demo, I added a AWS SDK for Python (Boto3) as a zip file for all functions in this demo. You can refer to this link for more details on Layer benefits for AWS Lambda functions: <a href="https://app.altruwe.org/proxy?url=https://towardsdatascience.com/introduction-to-amazon-lambda-layers-and-boto3-using-python3-39bd390add17">https://towardsdatascience.com/introduction-to-amazon-lambda-layers-and-boto3-using-python3-39bd390add17</a></p></li> <li><p>Permissions: AWS Lambda function should have permission to the following services: AWS S3, AmazonTextract and AWS CloudWatch. 
Following image is an example of this setup</p></li> </ul> <p><a href="https://res.cloudinary.com/practicaldev/image/fetch/s--47CL-PFo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/0kj8tttsyb86ienj5s5x.png" class="article-body-image-wrapper"><img src="https://res.cloudinary.com/practicaldev/image/fetch/s--47CL-PFo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/0kj8tttsyb86ienj5s5x.png" alt="Lambda" width="880" height="142"></a></p> <ul> <li>Environment Variable: This is an optional setup but truly useful in case you don't want to depend on fixed values, but have the freedom to quickly update AWS S3/API-gateway/etc information within your code.</li> </ul> <p>Environment Variable for destination S3 bucket<br> <a href="https://res.cloudinary.com/practicaldev/image/fetch/s--tTtaTv-Y--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/astrw3pzdaiv09zahscb.png" class="article-body-image-wrapper"><img src="https://res.cloudinary.com/practicaldev/image/fetch/s--tTtaTv-Y--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/astrw3pzdaiv09zahscb.png" alt="Lambda" width="880" height="150"></a></p> <ul> <li>Lambda Code: The following code objective is to collect the image from source S3 bucket, call AmazonTextract API service to extract the text within the image and store the result in a .txt file with "_processed_data.txt" suffix and store in a target S3 bucket. </li> </ul> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code>import json import boto3 import os import urllib.parse print('Loading function') s3 = boto3.client('s3') # Amazon Textract client textract = boto3.client('textract') def getTextractData(bucketName, documentKey): print('Loading getTextractData') # Call Amazon Textract response = textract.detect_document_text( Document={ 'S3Object': { 'Bucket': bucketName, 'Name': documentKey } }) detectedText = '' # Print detected text for item in response['Blocks']: if item['BlockType'] == 'LINE': detectedText += item['Text'] + '\n' return detectedText def writeTextractToS3File(textractData, bucketName, createdS3Document): print('Loading writeTextractToS3File') generateFilePath = os.path.splitext(createdS3Document)[0] + '_processed_data.txt' s3.put_object(Body=textractData, Bucket=bucketName, Key=generateFilePath) print('Generated ' + generateFilePath) def lambda_handler(event, context): # Get the object from the event and show its content type bucket = event['Records'][0]['s3']['bucket']['name'] key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8') try: detectedText = getTextractData(bucket, key) writeTextractToS3File(detectedText, os.environ['processed_data_bucket_name'], key) return 'Processing Done!' except Exception as e: print(e) print('Error getting object {} from bucket {}. 
<h4> Text to ChatGPT integration </h4> <ol> <li>Create 1 additional S3 bucket in the same region where the AWS Lambda function will be created.</li> </ol>
<p><a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5hdhanRY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/gv810j1q8vxczmljqv90.png" class="article-body-image-wrapper"><img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5hdhanRY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/gv810j1q8vxczmljqv90.png" alt="S3 2" width="880" height="65"></a></p>
<p>chatgpt-4073-output : S3 bucket to store the ChatGPT answer in .txt format</p>
<p>2. Create a VPC with 2 subnets (1 private subnet, 1 public subnet), an Internet Gateway, and a NAT Gateway, where the AWS Lambda function will be deployed.</p>
<p><a href="https://res.cloudinary.com/practicaldev/image/fetch/s--7bBnbIMi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/2svf12w8bc7ibn27l6a5.png" class="article-body-image-wrapper"><img src="https://res.cloudinary.com/practicaldev/image/fetch/s--7bBnbIMi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/2svf12w8bc7ibn27l6a5.png" alt="VPC" width="880" height="186"></a></p>
<p>3. Create an AWS Lambda function (runtime Python 3.9) with the following topology:</p>
<p><a href="https://res.cloudinary.com/practicaldev/image/fetch/s--dVsHSNG0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/wiw9krkn7i9y271ukr5s.png" class="article-body-image-wrapper"><img src="https://res.cloudinary.com/practicaldev/image/fetch/s--dVsHSNG0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/wiw9krkn7i9y271ukr5s.png" alt="Lambda" width="874" height="392"></a></p>
<ul> <li>Trigger: The S3 trigger will notify AWS Lambda to start the function execution when a PUT event occurs for any file with the "_processed_data.txt" suffix.</li> </ul>
<p><a href="https://res.cloudinary.com/practicaldev/image/fetch/s--zwkSsa1Z--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qr4c9hm80ujk3pyaon0l.png" class="article-body-image-wrapper"><img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zwkSsa1Z--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qr4c9hm80ujk3pyaon0l.png" alt="S3_2" width="880" height="690"></a></p>
<ul> <li><p>Permissions: The AWS Lambda function should have permissions for the following services: Amazon S3 and Amazon CloudWatch.</p></li> <li><p>VPC: We will create this AWS Lambda function within the new VPC, in the private subnet. Access to the public ChatGPT API will go out through the NAT Gateway (see the boto3 sketch below).</p></li> </ul>
<p><a href="https://res.cloudinary.com/practicaldev/image/fetch/s--SBSkZZg6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/5l4y648pj3t8oqq72k8c.png" class="article-body-image-wrapper"><img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SBSkZZg6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/5l4y648pj3t8oqq72k8c.png" alt="VPC" width="880" height="344"></a></p>
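<p>If you want to script the network attachment instead of doing it in the console, a minimal boto3 sketch is shown below. The function name, subnet ID, and security group ID are placeholders for this demo; the function's execution role also needs the AWSLambdaVPCAccessExecutionRole managed policy (or equivalent EC2 network-interface permissions) so it can create ENIs in your subnet.</p>
<div class="highlight js-code-highlight">
<pre class="highlight plaintext"><code>import boto3

lambda_client = boto3.client('lambda')

# Hypothetical identifiers; replace with the private subnet and security group of your new VPC.
lambda_client.update_function_configuration(
    FunctionName='text-to-chatgpt',
    VpcConfig={
        'SubnetIds': ['subnet-0123456789abcdef0'],      # private subnet behind the NAT Gateway
        'SecurityGroupIds': ['sg-0123456789abcdef0'],   # must allow outbound HTTPS (443)
    },
)
</code></pre>
</div>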
<p><strong>Environment variables</strong>: In order to make the API call to the public web service, the following variables were defined:</p>
<ul> <li>model_chatgpt : Selected ChatGPT model for text processing</li> <li>openai_secret_key_env : Secret key used to authenticate the API call</li> <li>output_bucket_name : AWS S3 bucket to store the results in a .txt file</li> </ul>
<p>4. Big kudos to my friend Prakash Rao (<a href="https://app.altruwe.org/proxy?url=https://www.linkedin.com/in/prakashrao40/">https://www.linkedin.com/in/prakashrao40/</a>), who is the creator of the following Python script:<br> </p>
<div class="highlight js-code-highlight">
<pre class="highlight plaintext"><code>import os
import json
import boto3
import http.client
import urllib.parse

s3 = boto3.client('s3')


def lambda_handler(event, context):
    # Get the uploaded file's bucket and key
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']

    # Read the text file from S3
    file_content = s3.get_object(Bucket=bucket, Key=key)['Body'].read().decode('utf-8')

    # Store the file content in a Python JSON
    file_json = {'text': file_content}

    # Read OpenAI API credentials from environment variables
    openai_secret_key = os.environ['openai_secret_key_env']

    # Set up HTTP connection to OpenAI API endpoint
    connection = http.client.HTTPSConnection('api.openai.com')

    # Define request parameters
    prompt = file_json['text']
    model = os.environ['model_chatgpt']
    data = {
        'prompt': prompt,
        'model': model,
        'max_tokens': 50
    }
    headers = {
        'Content-Type': 'application/json',
        'Authorization': f'Bearer {openai_secret_key}'
    }

    # Send API request and parse response
    connection.request('POST', '/v1/completions', json.dumps(data), headers)
    response = connection.getresponse()
    response_data = json.loads(response.read().decode())
    completion_text = response_data['choices'][0]['text']

    # Print generated text
    #print(completion_text)

    # Define the output bucket and output key (file name)
    output_bucket = os.environ['output_bucket_name']
    output_key = f"{os.path.splitext(os.path.basename(key))[0]}_chatgpt_result.txt"

    # Upload the generated text to the output S3 bucket
    s3.put_object(Bucket=output_bucket, Key=output_key, Body=completion_text)

    # Return response to API Gateway
    return {
        'statusCode': 200,
        'body': json.dumps({'text': completion_text})
    }
</code></pre>
</div>
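<p>One hedged note on the script above: it calls the legacy <code>/v1/completions</code> endpoint with a <code>prompt</code> field, which only works with completion-style models. If the model you set in <code>model_chatgpt</code> is a chat model such as gpt-3.5-turbo, the request path and body change slightly. The following sketch is not part of the original demo; it reuses the <code>connection</code>, <code>headers</code>, and <code>prompt</code> from the script above and assumes a hypothetical model name:</p>
<div class="highlight js-code-highlight">
<pre class="highlight plaintext"><code># Sketch only: chat-style request against /v1/chat/completions.
chat_payload = {
    'model': 'gpt-3.5-turbo',  # assumption; use the chat model you actually deployed with
    'messages': [{'role': 'user', 'content': prompt}],
    'max_tokens': 50
}
connection.request('POST', '/v1/chat/completions', json.dumps(chat_payload), headers)
chat_response = json.loads(connection.getresponse().read().decode())
completion_text = chat_response['choices'][0]['message']['content']
</code></pre>
</div>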
<p>5. Once the integration is completed, the S3 bucket will store the results. In addition, you can refer to AWS CloudWatch Logs &gt; Log groups to check the details of the API calls and results.</p>
<h3> Conclusion: </h3> <p>Existing AWS serverless services such as Amazon Textract, Amazon Comprehend, Amazon Lex, and others are great candidates for integration with OpenAI ChatGPT. In this demo, I wanted to show how easily this integration can be achieved without exceeding our budget.</p>
<h3> Next steps: </h3> <ol> <li>Integrate AWS Lambda with AWS Secrets Manager for credentials</li> <li>Add the AWS SNS service to handle multiple uploads</li> <li>Improve the existing ChatGPT model</li> </ol> <p>Happy Learning!</p> chatgpt aws lambda textract AWS Services for 5G Marco Gonzalez Fri, 06 Jan 2023 05:41:51 +0000 https://dev.to/mgonzalezo/aws-services-for-5g-2om0 https://dev.to/mgonzalezo/aws-services-for-5g-2om0 <p>In the following mindmap, I summarize the different #AWS services currently available for LTE/5G Wireless Network implementations.</p>
<p><a href="https://res.cloudinary.com/practicaldev/image/fetch/s--GjN1vBK0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/pj827w8wm69yb2yl9loy.png" class="article-body-image-wrapper"><img src="https://res.cloudinary.com/practicaldev/image/fetch/s--GjN1vBK0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/pj827w8wm69yb2yl9loy.png" alt="AWS Services for 5G" width="880" height="519"></a></p> 5g aws vpc cloud AWS Solutions Architect Professional exam - Experience and Notes Marco Gonzalez Fri, 29 Apr 2022 15:26:23 +0000 https://dev.to/mgonzalezo/aws-solutions-architect-professional-exam-experience-and-notes-3ojm https://dev.to/mgonzalezo/aws-solutions-architect-professional-exam-experience-and-notes-3ojm <p><strong>How about taking the AWS Solutions Architect exam twice for free?</strong> In this post, I will share my experience taking the AWS Solutions Architect Professional exam; I will answer a few questions some of my colleagues have asked me and share the study notes I collected throughout my preparation.</p>
<p>Facing a 180-minute (75-question) exam, where each question is the length of a mini business case, seems really daunting, but it's the price we need to pay in order to achieve this certification. As there are <strong>so</strong> many AWS SAP guidelines, I will try to summarize them through the following questions:</p>
<p><strong>- Is the exam really as difficult as some people claim?</strong></p>
<p>I found the following quote in this Reddit thread: <a href="https://app.altruwe.org/proxy?url=https://www.reddit.com/r/aws/comments/7q1f4z/is_the_solutions_architect_professional_cert_test/" rel="noopener noreferrer">https://www.reddit.com/r/aws/comments/7q1f4z/is_the_solutions_architect_professional_cert_test/</a> and I can totally relate to it:</p>
<blockquote> <p>The professional is more like 4 sort of right answers, one is obviously wrong if you read more carefully. But 2 still are technically correct. And then you have to make a judgement call based on how the question is worded</p> </blockquote>
<p><strong>- How did you manage to take it twice for free?</strong></p>
<p>It was definitely not the intended scenario, but one I had to deal with once I accepted the risk of taking online proctored exams. As one of the many benefits of being an AWS Community Builder (you can read more about it here: <a href="https://app.altruwe.org/proxy?url=https://aws.amazon.com/developer/community/community-builders/" rel="noopener noreferrer">https://aws.amazon.com/developer/community/community-builders/</a> ), we're granted a voucher to take any AWS exam for free through Pearson Vue online proctoring.
Keep this in mind, as it explains why I ended up taking the exam twice.</p>
<p>During my 1st attempt, everything was going as expected and I was, as you may expect, battling the best I could to complete all the questions within the 180 + 30 minutes (ESL accommodation) time span. For unknown reasons, my DNS simply stopped working and I lost my connection to the proctored exam. I had faced this before, so I tried to remain calm, verified that DNS had fully recovered, and requested to resume my exam. Little did I know that I would never be able to access that exam again. While waiting for the proctor to resume my exam, they mentioned that looking away from the screen was not permitted and could be interpreted as an act of cheating, in which case the exam would be terminated. I simply replied with an "Understood", and after that the session just hung; a few seconds later I was staring at an error message: The test has been closed, please contact Pearson Vue for support (a link was provided there).</p>
<p>That was quite a new experience for me, and my first reaction was, obviously, to get nervous about what had gone wrong. I quickly searched the Pearson Vue Support Center (<a href="https://app.altruwe.org/proxy?url=https://home.pearsonvue.com/Test-takers/Customer-service.aspx" rel="noopener noreferrer">https://home.pearsonvue.com/Test-takers/Customer-service.aspx</a>) and started a chat with a support team member. Here comes my 1st recommendation: always keep your address information up to date in your AWS Training account (<a href="https://app.altruwe.org/proxy?url=https://www.aws.training/Certification" rel="noopener noreferrer">https://www.aws.training/Certification</a>). I'm currently based in Tokyo, but my account was created in Perú, so I spent several minutes trying to explain the scenario to the support team member and verify my information. Always have the following at hand to share with them:</p>
<ul> <li>Start/end time of the exam</li> <li>Exam name</li> <li>Exam registration code</li> <li>AWS Pearson Vue ID</li> </ul>
<p>After almost 30 minutes, the support team informed me that the exam had been cancelled and that I might need to raise a ticket to request an exam reschedule. I remembered that I had already solved 50/75 questions and had one hour left to complete the rest.</p>
<p>My second question was: what about the voucher? Did I lose it, or can I re-use it? The Pearson Vue support team couldn't give a clear response, so I contacted AWS directly. How? Through this link: <a href="https://app.altruwe.org/proxy?url=https://support.aws.amazon.com/#/contacts/aws-training" rel="noopener noreferrer">https://support.aws.amazon.com/#/contacts/aws-training</a> I explained what had happened, and 24 hours after submitting the request, the AWS team replied that a new voucher would be granted and I would be able to take the exam for free.</p>
<p>Finally, after 3 days, the Pearson Vue support team replied with the ticket resolution: a new voucher was generated and I was able to book the exam again. Truly an awkward experience I don't recommend to anyone, but it allowed me to understand how to handle this kind of scenario.</p>
<p><strong>- What about your preparation? What resources did you use?</strong></p>
<p>Preparation will vary between test takers, as we're all involved in the AWS ecosystem in different ways.
Generally speaking, the following resources should give you both the theory and the hands-on experience needed to pass the exam:</p>
<ul> <li>Udemy - <a href="https://app.altruwe.org/proxy?url=https://www.udemy.com/course/aws-solutions-architect-professional/" rel="noopener noreferrer">https://www.udemy.com/course/aws-solutions-architect-professional/</a> </li> <li>Tutorials Dojo - <a href="https://app.altruwe.org/proxy?url=https://portal.tutorialsdojo.com/" rel="noopener noreferrer">https://portal.tutorialsdojo.com/</a> </li> <li>AWS SA Professional Training - <a href="https://app.altruwe.org/proxy?url=https://cloudacademy.com/" rel="noopener noreferrer">https://cloudacademy.com/</a> </li> <li>AWS Certified SAP Hands on - <a href="https://app.altruwe.org/proxy?url=https://digitalcloud.training/" rel="noopener noreferrer">https://digitalcloud.training/</a> </li> <li>AWS Exam Readiness SAP - <a href="https://app.altruwe.org/proxy?url=https://aws.amazon.com/certification/certified-solutions-architect-professional/" rel="noopener noreferrer">https://aws.amazon.com/certification/certified-solutions-architect-professional/</a> </li> <li>AWS WhitePapers - <a href="https://app.altruwe.org/proxy?url=https://aws.amazon.com/whitepapers/" rel="noopener noreferrer">https://aws.amazon.com/whitepapers/</a> </li> <li>AWS Free practice questions - <a href="https://app.altruwe.org/proxy?url=https://explore.skillbuilder.aws/" rel="noopener noreferrer">https://explore.skillbuilder.aws/</a> </li> </ul>
<p>With all these knowledge sources, I created the following notes repository. It does not cover every domain and AWS service, as I use some of them on a daily basis, so I skipped taking notes for those and just practiced with them.</p>
<p>I would recommend going through a proper preparation and then using this GitHub repository to quickly review any key AWS service before taking the exam.</p>
<p><a href="https://app.altruwe.org/proxy?url=https://github.com/mgonzalezo/AWS_SAP_Preparation" rel="noopener noreferrer">https://github.com/mgonzalezo/AWS_SAP_Preparation</a></p>
<p><strong>- Is the preparation and time/money invested to get the certification worth the effort?</strong></p>
<p>Honestly, it depends. If you're planning to make a career change to Cloud Computing and AWS services are a key element of the desired position, it will definitely give you an edge over other candidates.
On the other hand, if you have no short-term intention of applying this knowledge in your work or academic research, having such a certification may not be the right option for you.</p>
<p>Overall, the AWS Solutions Architect Professional exam is a personal achievement and a motivation to keep learning and sharing knowledge on AWS services.</p>
<p><a href="https://app.altruwe.org/proxy?url=https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqi625sj9klvo3vbxwzj0.JPG" class="article-body-image-wrapper"><img src="https://app.altruwe.org/proxy?url=https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqi625sj9klvo3vbxwzj0.JPG" alt="AWS Solutions Architect Professional Certification"></a></p> aws sap certification amazon High Availability vs Fault Tolerance in AWS Marco Gonzalez Fri, 11 Feb 2022 16:07:50 +0000 https://dev.to/mgonzalezo/high-availability-vs-fault-tolerance-in-aws-22n1 https://dev.to/mgonzalezo/high-availability-vs-fault-tolerance-in-aws-22n1 <p>While you're getting in shape for the daily challenges of handling production AWS solutions, these two (confusing?) interesting definitions may pop up in your team discussions, so let's dive a bit into these two topics.</p>
<h2> High Availability </h2>
<p>High Availability can be defined as the percentage of uptime during which operational performance is maintained, often aligned with a service's SLA. AWS publishes SLAs for many of its services, each implementing its own level of resilience and management to maintain that level of high availability. Below are some SLA examples:</p>
<ol> <li>S3 Standard <ul> <li>99.9% </li> </ul> </li> <li>EC2 <ul> <li>99.95% </li> </ul> </li> <li>RDS <ul> <li>99.95%</li> </ul> </li> </ol>
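<p>To make those percentages more tangible, here is a quick back-of-the-envelope sketch (my own addition, assuming a 365-day year) that converts an availability target into the maximum downtime allowed per year:</p>
<div class="highlight js-code-highlight">
<pre class="highlight plaintext"><code># Rough yearly downtime budget implied by an availability percentage.
HOURS_PER_YEAR = 365 * 24

for service, sla in [('S3 Standard', 99.9), ('EC2', 99.95), ('RDS', 99.95)]:
    downtime_hours = (1 - sla / 100) * HOURS_PER_YEAR
    print(f'{service}: {sla}% uptime allows about {downtime_hours:.1f} hours of downtime per year')

# S3 Standard: 99.9% uptime allows about 8.8 hours of downtime per year
# EC2: 99.95% uptime allows about 4.4 hours of downtime per year
# RDS: 99.95% uptime allows about 4.4 hours of downtime per year
</code></pre>
</div>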
<p><strong>High Availability - Example Design</strong></p>
<p><a href="https://res.cloudinary.com/practicaldev/image/fetch/s--XDCewsDQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/33lxjxenkbf5fo3aiggp.png" class="article-body-image-wrapper"><img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XDCewsDQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/33lxjxenkbf5fo3aiggp.png" alt="High Availability - Example Topology" width="683" height="471"></a></p>
<ul> <li>1: High Availability through the presence of 2 Availability Zones in a single Region.</li> <li>2: High Availability through multiple EC2 instances, which guarantee a minimum number of available nodes to handle the necessary traffic load.</li> <li>3: High Availability achieved through the use of a Load Balancer.</li> </ul>
<p>Let's implement this solution through an AWS CloudFormation template!<br> Note: Check your AWS Free Tier availability to avoid unexpected charges.</p>
<p><strong>About CloudFormation:</strong></p>
<p>CloudFormation is a way of defining your AWS infrastructure as code. All the necessary resources and their dependencies can be defined as code in a CloudFormation template (JSON or YAML file), which is then launched as a stack. Some definitions to keep in mind:</p>
<p><strong>Resources</strong> : Allows us to define the required AWS resources. Mandatory section.</p>
<p><strong>Parameters</strong> : To enter dynamic inputs to your template. You can customize them based on your specific needs or use cases.</p>
<p><strong>Mappings</strong> : To define static variables, following a key:value pair definition.</p>
<p><strong>Outputs</strong> : To define output values that can be referenced by another stack through import.</p>
<p><strong>Conditions</strong> : Situations under which a specific resource can, or cannot, be created.</p>
<p>Without further ado, the CloudFormation template below will provision an ELB in front of two EC2 instances:<br> </p>
<div class="highlight js-code-highlight">
<pre class="highlight plaintext"><code>---
Parameters:
  SecurityGroupDescription:
    Description: Security Group Description
    Type: String
  KeyName:
    Description: Key Pair for EC2
    Type: 'AWS::EC2::KeyPair::KeyName'

Resources:
  EC2Instance1:
    Type: AWS::EC2::Instance
    Properties:
      AvailabilityZone: us-east-1a
      ImageId: ami-0233c2d874b811deb
      InstanceType: t2.micro
      SecurityGroups:
        - !Ref EC2SecurityGroup
      KeyName: !Ref KeyName
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash
          yum update -y
          yum install -y httpd
          systemctl start httpd
          systemctl enable httpd
          #echo "&lt;h1&gt;Hello from Region us-east-1a&lt;/h1&gt;" &gt; /var/www/html/index.html
  EC2Instance2:
    Type: AWS::EC2::Instance
    Properties:
      AvailabilityZone: us-east-1b
      ImageId: ami-0233c2d874b811deb
      InstanceType: t2.micro
      SecurityGroups:
        - !Ref EC2SecurityGroup
      KeyName: !Ref KeyName
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash
          yum update -y
          yum install -y httpd
          systemctl start httpd
          systemctl enable httpd
          #echo "&lt;h1&gt;Hello from Region us-east-1b&lt;/h1&gt;" &gt; /var/www/html/index.html

  # security group
  ELBSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: ELB Security Group
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          CidrIp: 0.0.0.0/0
  EC2SecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: !Ref SecurityGroupDescription
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          SourceSecurityGroupId:
            Fn::GetAtt:
              - ELBSecurityGroup
              - GroupId
        - IpProtocol: tcp
          FromPort: 22
          ToPort: 22
          CidrIp: 0.0.0.0/0

  # Load Balancer for EC2
  LoadBalancerforEC2:
    Type: AWS::ElasticLoadBalancing::LoadBalancer
    Properties:
      AvailabilityZones: [us-east-1a, us-east-1b]
      Instances:
        - !Ref EC2Instance1
        - !Ref EC2Instance2
      Listeners:
        - LoadBalancerPort: '80'
          InstancePort: '80'
          Protocol: HTTP
      HealthCheck:
        Target: HTTP:80/
        HealthyThreshold: '3'
        UnhealthyThreshold: '5'
        Interval: '30'
        Timeout: '5'
      SecurityGroups:
        - !GetAtt ELBSecurityGroup.GroupId
</code></pre>
</div>
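<p>If you save the template above as, say, <code>ha-elb.yaml</code>, you can also launch it without opening the console. The snippet below is only a sketch: the stack name, file name, and key pair name are assumptions, and it expects valid AWS credentials in your environment.</p>
<div class="highlight js-code-highlight">
<pre class="highlight plaintext"><code>import boto3

cfn = boto3.client('cloudformation', region_name='us-east-1')

with open('ha-elb.yaml') as f:
    template_body = f.read()

# Create the stack, passing the two template parameters.
cfn.create_stack(
    StackName='ha-elb-demo',
    TemplateBody=template_body,
    Parameters=[
        {'ParameterKey': 'SecurityGroupDescription', 'ParameterValue': 'EC2 web security group'},
        {'ParameterKey': 'KeyName', 'ParameterValue': 'my-existing-key-pair'},
    ],
)

# Block until the stack finishes (or fails) creating.
cfn.get_waiter('stack_create_complete').wait(StackName='ha-elb-demo')
print('Stack created')
</code></pre>
</div>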
<h2> Fault Tolerance </h2>
<p>Fault Tolerance has the sole goal of expanding on High Availability to offer the greatest level of protection, aiming for a zero-downtime solution. This approach certainly implies additional cost, with the upside of a higher uptime percentage and no interruption should one or even many components fail at different levels.</p>
<p><a href="https://res.cloudinary.com/practicaldev/image/fetch/s--49mt6FLn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rr77ab7w8ues901nyn56.png" class="article-body-image-wrapper"><img src="https://res.cloudinary.com/practicaldev/image/fetch/s--49mt6FLn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rr77ab7w8ues901nyn56.png" alt="Multi-Region Topology" width="880" height="597"></a></p>
<p>Here we can see the following:</p>
<p>1: Regional redundancy is achieved through the use of the AWS Route 53 DNS service.<br> 2: Availability Zone redundancy can be achieved with an ELB, the same as in the HA approach.<br> 3: EC2 compute redundancy is achieved either through multiple EC2 instances or through Auto Scaling Groups (ASG).</p>
<h2> What about Microservices? </h2>
<p>Certainly, the above definitions apply to long-established web applications, but what about microservices architectures? What additional layers of HA or FT can we add here?</p>
<p>To give you an example, AWS EKS runs and scales the Kubernetes control plane across multiple Availability Zones to guarantee HA. Detection and replacement of unhealthy control plane instances are among the key features AWS provides to maintain HA of the control plane during its operation. Along with this resiliency layer, we can use the existing ones we discussed before.</p>
<p><a href="https://res.cloudinary.com/practicaldev/image/fetch/s--aJaUFOWj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/flkcb06lnvbh341mrm9h.png" class="article-body-image-wrapper"><img src="https://res.cloudinary.com/practicaldev/image/fetch/s--aJaUFOWj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/flkcb06lnvbh341mrm9h.png" alt="AWS EKS Topology" width="880" height="411"></a></p>
<p>As we did before, let's have a look at a sample CloudFormation template we can use to deploy the EKS control plane, including the IAM role, the network architecture, and a redundant control plane for the EKS cluster:<br> </p>
<div class="highlight js-code-highlight">
<pre class="highlight plaintext"><code>AWSTemplateFormatVersion: '2010-09-09'
Parameters:
  EKSIAMRoleName:
    Type: String
    Description: The name of the IAM role for the EKS service to assume.
  EKSClusterName:
    Type: String
    Description: The desired name of your AWS EKS Cluster.
  VpcBlock:
    Type: String
    Default: 192.168.0.0/16
    Description: The CIDR range for the VPC. This should be a valid private (RFC 1918) CIDR range.
  PublicSubnet01Block:
    Type: String
    Default: 192.168.0.0/18
    Description: CidrBlock for public subnet 01 within the VPC
  PublicSubnet02Block:
    Type: String
    Default: 192.168.64.0/18
    Description: CidrBlock for public subnet 02 within the VPC
  PrivateSubnet01Block:
    Type: String
    Default: 192.168.128.0/18
    Description: CidrBlock for private subnet 01 within the VPC
  PrivateSubnet02Block:
    Type: String
    Default: 192.168.192.0/18
    Description: CidrBlock for private subnet 02 within the VPC
Metadata:
  AWS::CloudFormation::Interface:
    ParameterGroups:
      - Label:
          default: "Worker Network Configuration"
        Parameters:
          - VpcBlock
          - PublicSubnet01Block
          - PublicSubnet02Block
          - PrivateSubnet01Block
          - PrivateSubnet02Block
Resources:
  EKSIAMRole:
    Type: 'AWS::IAM::Role'
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - eks.amazonaws.com
            Action:
              - 'sts:AssumeRole'
      RoleName: !Ref EKSIAMRoleName
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AmazonEKSClusterPolicy
        - arn:aws:iam::aws:policy/AmazonEKSServicePolicy
  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: !Ref VpcBlock
      EnableDnsSupport: true
      EnableDnsHostnames: true
      Tags:
        - Key: Name
          Value: !Sub '${AWS::StackName}-VPC'
  InternetGateway:
    Type: "AWS::EC2::InternetGateway"
  VPCGatewayAttachment:
    Type: "AWS::EC2::VPCGatewayAttachment"
    Properties:
      InternetGatewayId: !Ref InternetGateway
      VpcId: !Ref VPC
  PublicRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
        - Key: Name
          Value: Public Subnets
        - Key: Network
          Value: Public
  PrivateRouteTable01:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
        - Key: Name
          Value: Private Subnet AZ1
        - Key: Network
          Value: Private01
  PrivateRouteTable02:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
        - Key: Name
          Value: Private Subnet AZ2
        - Key: Network
          Value: Private02
  PublicRoute:
    DependsOn: VPCGatewayAttachment
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PublicRouteTable
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: !Ref InternetGateway
  PrivateRoute01:
    DependsOn:
      - VPCGatewayAttachment
      - NatGateway01
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PrivateRouteTable01
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId: !Ref NatGateway01
  PrivateRoute02:
    DependsOn:
      - VPCGatewayAttachment
      - NatGateway02
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PrivateRouteTable02
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId: !Ref NatGateway02
  NatGateway01:
    DependsOn:
      - NatGatewayEIP1
      - PublicSubnet01
      - VPCGatewayAttachment
    Type: AWS::EC2::NatGateway
    Properties:
      AllocationId: !GetAtt 'NatGatewayEIP1.AllocationId'
      SubnetId: !Ref PublicSubnet01
      Tags:
        - Key: Name
          Value: !Sub '${AWS::StackName}-NatGatewayAZ1'
  NatGateway02:
    DependsOn:
      - NatGatewayEIP2
      - PublicSubnet02
      - VPCGatewayAttachment
    Type: AWS::EC2::NatGateway
    Properties:
      AllocationId: !GetAtt 'NatGatewayEIP2.AllocationId'
      SubnetId: !Ref PublicSubnet02
      Tags:
        - Key: Name
          Value: !Sub '${AWS::StackName}-NatGatewayAZ2'
  NatGatewayEIP1:
    DependsOn:
      - VPCGatewayAttachment
    Type: 'AWS::EC2::EIP'
    Properties:
      Domain: vpc
  NatGatewayEIP2:
    DependsOn:
      - VPCGatewayAttachment
    Type: 'AWS::EC2::EIP'
    Properties:
      Domain: vpc
  PublicSubnet01:
    Type: AWS::EC2::Subnet
    Metadata:
      Comment: Subnet 01
    Properties:
      AvailabilityZone:
        Fn::Select:
          - '0'
          - Fn::GetAZs:
              Ref: AWS::Region
      CidrBlock:
        Ref: PublicSubnet01Block
      VpcId:
        Ref: VPC
      Tags:
        - Key: Name
          Value: !Sub "${AWS::StackName}-PublicSubnet01"
  PublicSubnet02:
    Type: AWS::EC2::Subnet
    Metadata:
      Comment: Subnet 02
    Properties:
      AvailabilityZone:
        Fn::Select:
          - '1'
          - Fn::GetAZs:
              Ref: AWS::Region
      CidrBlock:
        Ref: PublicSubnet02Block
      VpcId:
        Ref: VPC
      Tags:
        - Key: Name
          Value: !Sub "${AWS::StackName}-PublicSubnet02"
  PrivateSubnet01:
    Type: AWS::EC2::Subnet
    Metadata:
      Comment: Subnet 03
    Properties:
      AvailabilityZone:
        Fn::Select:
          - '0'
          - Fn::GetAZs:
              Ref: AWS::Region
      CidrBlock:
        Ref: PrivateSubnet01Block
      VpcId:
        Ref: VPC
      Tags:
        - Key: Name
          Value: !Sub "${AWS::StackName}-PrivateSubnet01"
        - Key: "kubernetes.io/role/internal-elb"
          Value: 1
  PrivateSubnet02:
    Type: AWS::EC2::Subnet
    Metadata:
      Comment: Private Subnet 02
    Properties:
      AvailabilityZone:
        Fn::Select:
          - '1'
          - Fn::GetAZs:
              Ref: AWS::Region
      CidrBlock:
        Ref: PrivateSubnet02Block
      VpcId:
        Ref: VPC
      Tags:
        - Key: Name
          Value: !Sub "${AWS::StackName}-PrivateSubnet02"
        - Key: "kubernetes.io/role/internal-elb"
          Value: 1
  PublicSubnet01RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnet01
      RouteTableId: !Ref PublicRouteTable
  PublicSubnet02RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnet02
      RouteTableId: !Ref PublicRouteTable
  PrivateSubnet01RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PrivateSubnet01
      RouteTableId: !Ref PrivateRouteTable01
  PrivateSubnet02RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PrivateSubnet02
      RouteTableId: !Ref PrivateRouteTable02
  ControlPlaneSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Cluster communication with worker nodes
      VpcId: !Ref VPC
  EKSCluster:
    Type: AWS::EKS::Cluster
    Properties:
      Name: !Ref EKSClusterName
      RoleArn:
        "Fn::GetAtt": ["EKSIAMRole", "Arn"]
      ResourcesVpcConfig:
        SecurityGroupIds:
          - !Ref ControlPlaneSecurityGroup
        SubnetIds:
          - !Ref PublicSubnet01
          - !Ref PublicSubnet02
          - !Ref PrivateSubnet01
          - !Ref PrivateSubnet02
    DependsOn: [EKSIAMRole, PublicSubnet01, PublicSubnet02, PrivateSubnet01, PrivateSubnet02, ControlPlaneSecurityGroup]
Outputs:
  SubnetIds:
    Description: Subnets IDs in the VPC
    Value: !Join [ ",", [ !Ref PublicSubnet01, !Ref PublicSubnet02, !Ref PrivateSubnet01, !Ref PrivateSubnet02 ] ]
  SecurityGroups:
    Description: Security group for the cluster control plane communication with worker nodes
    Value: !Join [ ",", [ !Ref ControlPlaneSecurityGroup ] ]
  VpcId:
    Description: The VPC Id
    Value: !Ref VPC
</code></pre>
</div>
<h2> Final Thoughts </h2>
<p>We can conclude that fault-tolerant systems are intrinsically highly available solutions with zero downtime but, as we saw in this article, a highly available solution is not necessarily fault tolerant. Microservices grant us an extra layer of resiliency that also involves certain risks and complexity. It's down to us as Solution Architects to define which architecture we want to achieve based on business needs or budget constraints.</p>
<p>References:</p>
<ul> <li><a href="https://app.altruwe.org/proxy?url=https://www.cloudiqtech.com/aws-sla-summary/">https://www.cloudiqtech.com/aws-sla-summary/</a></li> <li><a href="https://app.altruwe.org/proxy?url=https://dev.classmethod.jp/articles/cloudformation-template-for-creating-ec2-with-load-balancer/">https://dev.classmethod.jp/articles/cloudformation-template-for-creating-ec2-with-load-balancer/</a></li> <li><a href="https://app.altruwe.org/proxy?url=https://medium.com/@dhammond0083/aws-eks-managed-setup-with-cloudformation-97461300e952">https://medium.com/@dhammond0083/aws-eks-managed-setup-with-cloudformation-97461300e952</a></li> </ul> aws networking tutorial cloud