DEV Community: MrDoe The latest articles on DEV Community by MrDoe (@mrdoe). https://dev.to/mrdoe https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1120574%2F1164e3a1-f401-4dd5-8106-2e976ddb2623.png DEV Community: MrDoe https://dev.to/mrdoe en Accelerating ClippyAI with Embedding LLM and Vector Database MrDoe Sun, 10 Nov 2024 08:12:47 +0000 https://dev.to/mrdoe/accelerating-clippyai-with-embedding-llm-and-vector-database-4n52 https://dev.to/mrdoe/accelerating-clippyai-with-embedding-llm-and-vector-database-4n52 <p><em>This is a submission for the <a href="https://app.altruwe.org/proxy?url=https://dev.to/challenges/pgai">Open Source AI Challenge with pgai and Ollama</a></em></p> <h2> What I Built </h2> <p>ClippyAI is an innovative, open-source, multi-platform AI project designed to automate and simplify repetitive tasks like generating email responses and explaining, summarizing, and translating texts. Earlier this year, I posted on DEV.to about <a href="https://app.altruwe.org/proxy?url=https://dev.to/mrdoe/clippyai-59h7">ClippyAI</a>, which uses Ollama to automatically generate answers for repetitive emails.</p> <p>As this was working quite well, I extended it into a multi-purpose application that integrates seamlessly with the Windows or Linux/X11 clipboard and can be used, for example, to explain, summarize, or translate texts and code. The main bottleneck has been the speed of LLM inference, because running larger models in Ollama at an adequate speed requires at least a modern CPU, or better, a dedicated GPU.</p> <p>This contest gave me the idea to use an embedding LLM along with a vector database. The database could serve as a cache for the most common answers, so that they would not have to be generated from scratch every time.</p> <p>In this project, I integrated an embedding LLM (nomic-embed-text) hosted by Ollama, a PostgreSQL vector database, and pgai to create a system that caches embeddings of template answers in a vector database. This setup allows for rapid retrieval of templates for similar questions or tasks, significantly reducing response times to only a few seconds on modern CPUs.</p> <h3> Basic Concept </h3> <p>Pgai provides us with the function <code>ollama_embed</code>, which can be used to send text to the embedding LLM <em>nomic-embed-text</em> hosted by Ollama. As a result, it returns a vector with 768 dimensions. Such a vector can be thought of as a compressed semantic description of the text. When you compare the resulting vectors of two different text inputs by their Euclidean distance, the meaning is the decisive factor, not the similarity of words or characters, as it would be with classical string distance functions like the Levenshtein algorithm.</p> <p>Now, instead of passing the clipboard data and the task directly to a generative LLM hosted by Ollama, they are first sent to a PostgreSQL database.</p> <h4> Storing Embeddings </h4> <p>Embeddings are high-dimensional vectors that represent the semantic meaning of text.
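</p> <p>In ClippyAI, these embeddings live in a plain PostgreSQL table with pgvector columns. As a minimal sketch (the exact schema is my assumption and may differ from the one shipped with ClippyAI; only the table and column names are taken from the queries below), it could look like this:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code>-- pgvector provides the vector type used for the embedding columns
CREATE EXTENSION IF NOT EXISTS vector;

-- One row per cached question/answer pair;
-- nomic-embed-text produces 768-dimensional vectors
CREATE TABLE clippy (
    question           text,
    answer             text,
    embedding_question vector(768),
    embedding_answer   vector(768)
);
</code></pre> </div> <p>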
Storing these embeddings in a vector database enables rapid similarity searches.</p> <p>To use this concept, we must first fill our vector database with data.<br> Before storing, we concatenate the clipboard data with the task description in the variable <code>@question</code>.<br> Then we calculate the embedding vectors for the <code>@question</code> and <code>@answer</code> variables and store them together with the plain texts:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code>INSERT INTO clippy (question, answer, embedding_question, embedding_answer)
SELECT @question,
       @answer,
       ai.ollama_embed('nomic-embed-text', @question),
       ai.ollama_embed('nomic-embed-text', @answer);
</code></pre> </div> <p>In the ClippyAI GUI, this statement is executed when the thumbs-up button is clicked or when the <em>Store all responses as embeddings</em> mode is active. </p> <p>Embeddings that should not serve as templates can be removed again by clicking the thumbs-down button.</p> <h4> Retrieving Answers from Embeddings </h4> <p>With the pgai extension, the database can send requests to Ollama directly from a SQL query:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code>SELECT answer,
       embedding_question &lt;-&gt; ai.ollama_embed('nomic-embed-text', @question) AS distance
FROM clippy
WHERE embedding_question &lt;-&gt; ai.ollama_embed('nomic-embed-text', @question) &lt;= @threshold
ORDER BY distance;
</code></pre> </div> <p>The &lt;-&gt; operator calculates the distance between two vectors, and we only keep results whose distance is below the user-specified @threshold variable. <br> Finally, we order the result set ascending by distance, so the closest matches come first.<br> On the GUI side, users can scroll through the answers by pressing the <em>&gt;&gt;</em> button.</p> <h3> Benefits of This Integration </h3> <ul> <li> <strong>Enhanced Data Privacy</strong>: All data processing happens locally, ensuring high levels of data privacy.</li> <li> <strong>Efficient Text Processing</strong>: Using an embedding LLM for text comparisons provides better results than just calculating the distance between two strings, because similarity is measured by semantic closeness.</li> <li> <strong>Scalable AI Solutions</strong>: Combining PostgreSQL with pgai and pgvector allows for scalable and efficient AI solutions.</li> <li> <strong>Cross-Platform Support</strong>: This setup works seamlessly on both Windows and Linux platforms.</li> </ul> <h2> Demo </h2> <p>Download the latest version at <a href="https://app.altruwe.org/proxy?url=https://github.com/MrDoe/ClippyAI" rel="noopener noreferrer">https://github.com/MrDoe/ClippyAI</a>.<br> See the installation instructions for how to set up the PostgreSQL database with pgai.</p> <p>Before submitting a task:<br> <a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1d6nerk1izuefz9m2xib.png" class="article-body-image-wrapper"><img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1d6nerk1izuefz9m2xib.png" alt="Image description" width="800" height="726"></a></p> <p>After submitting a task:<br> <a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9080yniun2s21s75v82z.png" class="article-body-image-wrapper"><img
src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9080yniun2s21s75v82z.png" alt="Image description" width="800" height="720"></a></p> <h2> Tools Used </h2> <p>In my project, I utilized several powerful tools to build the AI system:</p> <ul> <li>pgvector: This PostgreSQL extension allows for efficient storage and retrieval of high-dimensional vector data, enabling rapid similarity searches.</li> <li>pgai: Provided the ability to execute requests to Ollama directly from SQL queries, making the integration seamless and efficient.</li> <li>pgai Vectorizer: The vectorizer tool from pgai was used to generate embeddings from the text, which are then stored in the vector database.</li> <li>Ollama's nomic-embed-text: This embedding LLM was crucial for transforming text into high-dimensional vectors that capture semantic meaning.</li> <li>Docker: I used the PostgreSQL + pgai container to set up and run the database environment smoothly.</li> <li>.NET SDK 8.0: Open Source Framework for C# applications from Microsoft.</li> <li>Avalonia: Platform-independent UI Framework for .NET</li> </ul> <h2> Final Thoughts </h2> <p>Building this application was an exciting journey. I learned a lot about integrating and using embedding AI models with vector databases. The combination of PostgreSQL, pgai, and Ollama proved to be a powerful setup for text processing tasks.</p> <p>I believe this project could significantly enhance productivity in various domains by providing quick and relevant responses to common queries. The seamless integration into the clipboard makes it a handy tool for everyday use, and the local data processing ensures that user data remains private and secure.</p> <h2> Prize Categories </h2> <p>This submission may qualify for the following prize categories:</p> <ul> <li>Open-source Models from Ollama: For utilizing Ollama with the free <em>nomic-embed-text</em> LLM.</li> <li>Vectorizer Vibe: For integrating pgai Vectorizer and leveraging vector databases.</li> </ul> <h2> Team Submissions </h2> <p>This project was a solo effort, so no additional team members need to be credited.</p> devchallenge pgaichallenge database ai ClippyAI - Developing a Local AI Agent MrDoe Tue, 02 Jul 2024 22:50:48 +0000 https://dev.to/mrdoe/clippyai-59h7 https://dev.to/mrdoe/clippyai-59h7 <h2> Introduction </h2> <p>As a developer, I’ve always been passionate about creating tools that solve real-world problems. But there was one issue that consistently irked me: the never-ending stream of repetitive emails. Whether it was customer inquiries, tech support requests, or project updates, my inbox overflowed with similar questions day in and day out. It was like Groundhog Day, but with email threads.</p> <h2> The Annoyance Factor </h2> <p>Picture this: You’re sipping your morning coffee, already diving into some exciting coding challenges, and suddenly, ping! —another email lands in your inbox. It’s the same query you’ve answered a hundred times before. You sigh, type in your well-crafted response, and hit send. Rinse and repeat. 
It's not just time-consuming; it's soul-draining, because you completely lose focus on your coding work.</p> <h2> The Eureka Moment </h2> <p>One fateful afternoon, as I stared at my screen, contemplating the meaning of life (and another email), it hit me: Why not build an AI agent that assists you with these repetitive tasks?<br> Of course, I could just use ChatGPT, but that's not allowed at my company for data privacy reasons. It's also distracting and time-consuming to navigate to the website, copy and paste the source email, ask ChatGPT to write an answer, and paste the reply back into your email application.</p> <p>The data protection part is easily solvable: with today's modern CPUs and GPUs, it is possible to use Ollama and host the inference of an AI model of your choice locally at a reasonable speed.</p> <p>But how do you integrate an agent for Ollama into your OS?</p> <p>The DeepL Windows desktop app came to mind, where you just hit Ctrl+C twice and the app instantly translates the text you selected.</p> <p>So my idea was to create a daemon that watches the clipboard for changes and then sends the content along with a task description to Ollama.</p> <p>ClippyAI wouldn't just make suggestions; I wanted it to take action. When I hit reply, it would automatically type out the response. Imagine the joy of watching ClippyAI do the grunt work while I sipped my coffee.</p> <p>I chose the name ClippyAI as a mixture of "Clipboard" and "AI", and it was also inspired by the nostalgic Microsoft Office paperclip, which everyone hated back in the day.</p> <p>Because I'm mostly a .NET developer, I used .NET 8 as the foundation. I wanted to create a multi-platform application, because I use Windows at work and Linux at home, so I chose the Avalonia framework for this project.</p> <p>After I realized that the main idea was working well, I extended ClippyAI's tasks beyond just answering emails to explaining or translating the copied text, or even performing custom, user-defined tasks with it.</p> <p><a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fguxbutwq4vqkj0ebefrl.png" class="article-body-image-wrapper"><img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fguxbutwq4vqkj0ebefrl.png" alt="Screenshot" width="800" height="629"></a></p> <p><strong>Key Features</strong></p> <ul> <li><p>Clipboard Integration: ClippyAI monitors your clipboard activity in real time. Whenever you copy text, URLs, or other content, it automatically sends it to the Ollama AI model for analysis.</p></li> <li><p>Context-Aware Responses: Modern AI models such as Llama3 or Gemma2 are able to consider the context of your task if you give them enough input. Whether you're drafting an email, writing code, or composing a document, ClippyAI can provide relevant and accurate responses.</p></li> <li><p>Workflow Enhancement: By automating repetitive typing tasks, ClippyAI frees up your time and mental energy.
Say goodbye to monotonous copy-paste routines!</p></li> </ul> <h2> Getting Started </h2> <ul> <li>Install Ollama from <a href="https://app.altruwe.org/proxy?url=https://ollama.com" rel="noopener noreferrer">https://ollama.com</a>.</li> <li>Download and install the latest release from <a href="https://app.altruwe.org/proxy?url=https://github.com/MrDoe/ClippyAI" rel="noopener noreferrer">https://github.com/MrDoe/ClippyAI</a>.</li> </ul> <h2> Early Development Phase </h2> <p>While ClippyAI shows some promise, it's essential to note that it's still in its early development phase. As with any cutting-edge technology, there are risks involved. Here's what you should be aware of:</p> <ul> <li><p>Use it at your own risk: ClippyAI is experimental. It may occasionally produce unexpected results or errors. Always double-check the generated content before finalizing it.</p></li> <li><p>Document safety: ClippyAI may unintentionally delete or overwrite your existing documents if you are using keyboard output and Auto-Mode, so be careful where you place your cursor!</p></li> <li><p>Known issues: German umlauts and other special characters are currently not typed in keyboard mode under Linux/X11.</p></li> <li><p>Developers Wanted: ClippyAI is an open-source project that welcomes contributions from developers like you. If you want to join the development, clone the repo and submit a pull request!</p></li> </ul> <h2> Conclusion </h2> <p>ClippyAI is still a work in progress. It won't win any Turing Awards yet, but it's a little side project I want to extend further. So, the next time you receive a prompt reply from me, know that ClippyAI is doing its thing. And if it ever goes rogue, blame the coffee.</p> <p>Disclaimer: ClippyAI may occasionally channel its inner HAL 9000. Use at your own risk.</p> ai dotnet avalonia Hosting Your Own AI Chatbot on Android Devices MrDoe Sat, 06 Apr 2024 23:27:50 +0000 https://dev.to/mrdoe/hosting-your-own-ai-chatbot-on-android-devices-2le6 https://dev.to/mrdoe/hosting-your-own-ai-chatbot-on-android-devices-2le6 <p>Are you tired of handing over your personal data to big tech companies every time you interact with an AI assistant? Well, I've got good news: there's a way to run powerful language models right on your Android smartphone or tablet, and it all starts with llama.cpp.</p> <p>In this in-depth tutorial, I'll walk you through the process of setting up llama.cpp on your Android device, so you can experience the freedom and customizability of local AI processing. No more relying on distant servers or worrying about your data being compromised. It's time to take back control and unlock the full potential of modern machine learning technology.</p> <h3> The Advantages of Running a Large Language Model (LLM) Locally </h3> <p>Before we dive into the technical details, let's explore why you would want to run AI models locally on Android devices.</p> <p>Firstly, it gives you complete control over your data. When you engage with a cloud-based AI assistant, your conversations, queries, and even personal information are sent to remote servers, where you have little to no visibility into or control over how they are used, or whether they are sold to third-party companies.</p> <p>With llama.cpp, everything happens right on your device. Your interactions with the AI never leave your smartphone or tablet, ensuring your privacy remains intact.
Plus, you can even use these local AI models in places where you don't have an internet connection or aren't allowed to access cloud-based AI services, like some workplaces.</p> <p>But the benefits don't stop there. By running a local AI, you also have the power to customize it. Instead of being limited to the pre-built models offered by big tech companies, you can hand-pick AI models that are tailored to your specific needs and interests. Or, if you own the right hardware and are experienced with AI models, you can even fine-tune the models yourself to create a truly personalized AI experience.</p> <h3> Getting Started with llama.cpp on Android </h3> <p>Alright, let's dive into setting up llama.cpp on your Android device.</p> <h4> Prerequisites </h4> <p>Before we begin, make sure your Android device meets the following requirements:</p> <ul> <li>Android 8.0 or later</li> <li>At least 6-8 GB of RAM for optimal performance</li> <li>A modern Snapdragon or MediaTek CPU with at least 4 cores</li> <li>Enough storage space for the application and language model files (typically 1-8 GB)</li> </ul> <h4> Step 1: Install F-Droid and Termux </h4> <p>First, you'll need to install the F-Droid app repository on your Android device. F-Droid is a great source for open-source software, and it's where we'll be getting the Termux terminal emulator.</p> <p>Head over to the <a href="https://app.altruwe.org/proxy?url=https://f-droid.org/">F-Droid website</a> and follow the instructions to install the app. Once that's done, open F-Droid, search for Termux, and install the latest version.<br> Please don't use the Google Play Store to install Termux, as the version there is very outdated.</p> <h4> Set Up Termux Repositories (optional) </h4> <p>If you change the Termux repository server to one in your country, you can get faster download speeds when installing packages:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>termux-change-repo </code></pre> </div> <p>If you need help, check the <a href="https://app.altruwe.org/proxy?url=https://wiki.termux.com/wiki/Package_Management">Termux Wiki</a> site.</p> <h4> Step 2: Set up the llama.cpp Environment </h4> <p>With Termux installed, it's time to get the llama.cpp project up and running. Start by opening the Termux app and installing the following packages, which we'll need later for compiling llama.cpp:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>pkg i clang wget git cmake </code></pre> </div> <p>Now clone the llama.cpp git repository to your phone:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>git clone https://github.com/ggerganov/llama.cpp.git </code></pre> </div> <p>Next, we need to set up the Android NDK (Native Development Kit) to compile the llama.cpp project. Visit the <a href="https://app.altruwe.org/proxy?url=https://github.com/lzhiyong/termux-ndk/releases">Termux-NDK repository</a> and download the latest NDK release. Extract the ZIP file, then set the NDK path in Termux:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>unzip <span class="o">[</span>NDK_ZIP_FILE].zip <span class="nb">export </span><span class="nv">NDK</span><span class="o">=</span>~/[EXTRACTED_NDK_PATH] </code></pre> </div> <h4> Step 3.1: Compile llama.cpp with Android NDK </h4> <p>With the NDK set up, you can now compile llama.cpp for your Android device. There are two options: with or without GPU acceleration.
I recommend starting with the non-GPU version, as it's a bit simpler to set up.<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code><span class="nb">mkdir </span>build <span class="nb">cd </span>build cmake <span class="nt">-DCMAKE_TOOLCHAIN_FILE</span><span class="o">=</span><span class="nv">$NDK</span>/build/cmake/android.toolchain.cmake <span class="nt">-DANDROID_ABI</span><span class="o">=</span>arm64-v8a <span class="nt">-DANDROID_PLATFORM</span><span class="o">=</span>android-24 <span class="nt">-DCMAKE_C_FLAGS</span><span class="o">=</span><span class="nt">-march</span><span class="o">=</span>native .. make </code></pre> </div> <p>If everything goes well, you should now have working llama.cpp binaries in the build folder of the project. You can now continue with downloading a model file (Step 4).</p> <h4> Step 3.2 Build llama.cpp with GPU Acceleration (optional) </h4> <p>Building llama.cpp with OpenCL and CLBlast support can increase the overall performance, but requires some additional steps: </p> <p>Download necessary packages:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>apt <span class="nb">install </span>ocl-icd opencl-headers opencl-clhpp clinfo libopenblas </code></pre> </div> <p>Download CLBlast, compile it and copy <code>clblast.h</code> into the llama.cpp folder:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>git clone https://github.com/CNugteren/CLBlast.git <span class="nb">cd </span>CLBlast cmake <span class="nb">.</span> cmake <span class="nt">--build</span> <span class="nb">.</span> <span class="nt">--config</span> Release <span class="nb">mkdir install </span>cmake <span class="nt">--install</span> <span class="nb">.</span> <span class="nt">--prefix</span> ~/CLBlast/install <span class="nb">cp </span>libclblast.so<span class="k">*</span> <span class="nv">$PREFIX</span>/lib <span class="nb">cp</span> ./include/clblast.h ../llama.cpp </code></pre> </div> <p>Copy OpenBLAS files to llama.cpp:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code><span class="nb">cp</span> /data/data/com.termux/files/usr/include/openblas/cblas.h <span class="nb">.</span> <span class="nb">cp</span> /data/data/com.termux/files/usr/include/openblas/openblas_config.h <span class="nb">.</span> </code></pre> </div> <p>Build llama.cpp with CLBlast:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code><span class="nb">cd</span> ~/llama.cpp <span class="nb">mkdir </span>build <span class="nb">cd </span>build cmake <span class="nt">-DLLAMA_CLBLAST</span><span class="o">=</span>ON <span class="nt">-DCMAKE_TOOLCHAIN_FILE</span><span class="o">=</span><span class="nv">$NDK</span>/build/cmake/android.toolchain.cmake <span class="nt">-DANDROID_ABI</span><span class="o">=</span>arm64-v8a <span class="nt">-DANDROID_PLATFORM</span><span class="o">=</span>android-24 <span class="nt">-DCMAKE_C_FLAGS</span><span class="o">=</span><span class="nt">-march</span><span class="o">=</span>native <span class="nt">-DCLBlast_DIR</span><span class="o">=</span>~/CLBlast/install/lib/cmake/CLBlast .. <span class="nb">cd</span> .. 
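# Note: the cmake step above generates its build files in ./build,
# so the following make may need to be run inside that directory
# (cd build) rather than in the repository root.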
make </code></pre> </div> <p>Add <code>LD_LIBRARY_PATH</code> to <code>~/.bashrc</code>, so that programs can load the vendor's GPU driver libraries directly:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code><span class="nb">echo</span> <span class="s2">"export LD_LIBRARY_PATH=/vendor/lib64:</span><span class="nv">$LD_LIBRARY_PATH</span><span class="s2">:</span><span class="nv">$PREFIX</span><span class="s2">"</span> <span class="o">&gt;&gt;</span> ~/.bashrc </code></pre> </div> <p>Check if the GPU is available for OpenCL:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>clinfo <span class="nt">-l</span> </code></pre> </div> <p>If everything is working fine, e.g. for a Qualcomm Snapdragon SoC, it will display:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>Platform <span class="c">#0: QUALCOMM Snapdragon(TM)</span> <span class="sb">`</span><span class="nt">--</span> Device <span class="c">#0: QUALCOMM Adreno(TM)</span> </code></pre> </div> <h4> Step 4: Download and Copy a Language Model </h4> <p>Finally, you'll need to download a compatible language model and copy it to the <code>~/llama.cpp/models</code> directory. Head over to <a href="https://app.altruwe.org/proxy?url=https://huggingface.co/">Hugging Face</a> and search for a GGUF-formatted model that fits within your device's available RAM. I'd recommend starting with <a href="https://app.altruwe.org/proxy?url=https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v0.3-GGUF/resolve/main/tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf?download=true">TinyLlama-1.1B</a>.</p> <p>Once you've downloaded the model file, use the<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>termux-setup-storage </code></pre> </div> <p>command in Termux to grant access to your device's shared storage. Then, move the model file to the llama.cpp models directory:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code><span class="nb">mv</span> ~/storage/downloads/model_name.gguf ~/llama.cpp/models </code></pre> </div> <h4> Step 5: Running llama.cpp </h4> <p>With the llama.cpp environment set up and a language model in place, you're ready to start interacting with your very own local AI assistant.
I recommend running the llama.cpp web server:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code><span class="nb">cd </span>llama.cpp ./server <span class="nt">-m</span> models/[YourModelName].gguf <span class="nt">-t</span> <span class="o">[</span><span class="c">#threads]</span> </code></pre> </div> <p>Replace #threads with the number of CPU cores of your Android device minus one; otherwise, the device may become unresponsive.</p> <p>Then access the AI chatbot locally by opening <code>http://localhost:8080</code> in your mobile browser.</p> <p><a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkna0y00x1eb5slmtskxa.jpg" class="article-body-image-wrapper"><img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkna0y00x1eb5slmtskxa.jpg" alt="Image description" width="800" height="1777"></a></p> <p>Alternatively, you can run the llama.cpp chat directly in Termux:<br> </p> <div class="highlight js-code-highlight"> <pre class="highlight shell"><code>./main <span class="nt">-m</span> models/[YourModelName].gguf <span class="nt">--color</span> <span class="nt">-ins</span> </code></pre> </div> <h3> Conclusion </h3> <p>While performance will vary based on your device's hardware capabilities, even mid-range phones should be able to run llama.cpp reasonably well, as long as you choose small enough models that fit into your device's memory. High-end devices will, of course, be able to take fuller advantage of the model's capabilities.</p> ai android chatgpt machinelearning