Hello GitHub wizards! Thanks for stopping by~ 🤖
So, here's the story: I used to sling code back in the day (when Perl was still cool), but a year ago AI came along and struck me with awe. I became very interested in human-AI interaction and how it can be applied to our daily lives. However, most online projects focus on only a few specific tasks, so I spent a few months developing EVA myself.
EVA is an experimental voice assistant that explores the human-AI experience through proactive engagement and autonomous behavior. Built with a modular architecture, it aims to provide a more natural and dynamic interaction for users, and includes an extensive tool framework that allows continuous enhancement of its capabilities.
EVA is built on the LangGraph framework, with some customized modules and tools. Importantly, you can run it purely locally at no cost (if you have a computer with a GPU).
- Configurable model selection for LLM, TTS, STT, etc.
- Integrated with OpenAI, Anthropic, Groq, Google, and Ollama.
- Easy modification of prompts and tools.
- Supports both desktop and mobile apps.
- Voice ID and vision ID for personalized interaction.
- Proactive-style communication (varies between models).
- Multi-modal outputs with asynchronous action.
- Web search through DuckDuckGo/Tavily
- YouTube video search
- Discord Midjourney AI image generation
- Suno music generation
- Screenshot and analysis
- Compatible with all Langchain tools
- Easy implementation of new tools with a single file.
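To give a feel for the single-file tool idea, here is a minimal sketch of how such a tool could be structured. This is a hypothetical registry pattern for illustration only; EVA's actual base class and registration mechanism may differ, and the `echo` tool shown here is not part of the project.

```python
# Hypothetical sketch of a single-file tool registry (not EVA's actual API).
from dataclasses import dataclass
from typing import Callable, Dict

TOOL_REGISTRY: Dict[str, "Tool"] = {}

@dataclass
class Tool:
    name: str
    description: str          # shown to the LLM so it can pick the right tool
    func: Callable[[str], str]

def register_tool(name: str, description: str):
    """Decorator that adds a function to the global tool registry."""
    def wrap(func: Callable[[str], str]) -> Callable[[str], str]:
        TOOL_REGISTRY[name] = Tool(name, description, func)
        return func
    return wrap

@register_tool("echo", "Repeat the user's text back (demo only).")
def echo(text: str) -> str:
    return f"EVA heard: {text}"
```

With a pattern like this, dropping a new file into `app/tools/` that calls the decorator is all it takes to make the tool discoverable.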
```
EVA/
├── app/
│   ├── client/     # Client-side implementation
│   ├── config/     # Configuration files and logs
│   ├── core/       # Core process
│   ├── data/       # Data storage
│   ├── tools/      # Tool implementations
│   ├── utils/      # Utility functions
│   ├── agent/      # LLM agent classes and functions
│   ├── memory/     # Memory module classes
│   ├── prompt/     # Utility prompts
│   ├── stt/        # Speech recognition models and classes
│   ├── tts/        # Text-to-speech models and classes
│   └── vision/     # Vision models and functions
├── tests/          # Test cases
└── docs/           # Documentation
```
- Python 3.10+
- CUDA-compatible GPU (if you want to run the models locally)
Clone the repository:

```bash
git clone https://github.com/Genesis1231/EVA.git
cd EVA
```

Create a virtual environment:

```bash
python3 -m venv eva_env
source eva_env/bin/activate
```

Install system dependencies (in case you don't have them already):

```bash
sudo apt-get update
sudo apt-get install -y cmake build-essential ffmpeg chromium mpv
```

Install Python dependencies:

```bash
pip install -r requirements.txt
pip install git+https://github.com/wenet-e2e/wespeaker.git
```

Configure `.env` with your API keys:

```bash
cp .env.example .env
```

Run EVA:

```bash
python app/main.py
```
Similarly, you can run EVA with Docker.
```dockerfile
# Use the official Python image with FastAPI
FROM tiangolo/uvicorn-gunicorn-fastapi

# Set the working directory
WORKDIR /app

# Copy requirements first for better layer caching
COPY requirements.txt .

# Install system dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    cmake \
    libsndfile1 \
    ffmpeg \
    chromium

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the application
COPY . .

# Run the application
CMD ["python", "app/main.py"]
```
Configure the EVA settings in `app/config/config.py`:
```python
eva_configuration = {
    "DEVICE": "desktop",        # Currently "desktop" or "mobile" (testing)
    "LANGUAGE": "en",           # English or multilingual (much slower)
    "BASE_URL": "http://localhost:11434",  # URL of the local Ollama server; leave as-is if you don't plan to use local models
    "CHAT_MODEL": "anthropic",  # Supports Anthropic Claude 3.5, Groq Llama 3.1 70B, OpenAI GPT-4o, Mistral Large, Google Gemini 1.5 Pro, and Ollama models. Recommended: Claude or GPT-4o
    "IMAGE_MODEL": "openai",    # Supports GPT-4o-mini and Ollama llava-phi3/llava:13b (local). Recommended: GPT-4o-mini, but llava-phi3 is very small and free
    "STT_MODEL": "faster-whisper",  # Supports OpenAI Whisper, Groq (free), and Faster-Whisper (local)
    "TTS_MODEL": "elevenlabs",  # Supports ElevenLabs, OpenAI, and Coqui TTS (local)
    "SUMMARIZE_MODEL": "llama", # Supports Groq Llama 3.1 8B, Anthropic Claude 3.5 Sonnet, and Ollama Llama 3.1 (local)
}
```
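Since every setting is a plain string, a typo only surfaces at model-load time. A small sanity check like the following (a hypothetical helper, not part of EVA; the allowed-value sets below are assumptions based on the comments above) can catch misspelled values before anything is loaded:

```python
# Hypothetical startup check for eva_configuration (not part of EVA).
# The allowed-value sets are illustrative assumptions.
SUPPORTED = {
    "DEVICE": {"desktop", "mobile"},
    "CHAT_MODEL": {"anthropic", "openai", "groq", "mistral", "google", "ollama"},
    "STT_MODEL": {"whisper", "groq", "faster-whisper"},
    "TTS_MODEL": {"elevenlabs", "openai", "coqui"},
}

def validate(config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    for key, allowed in SUPPORTED.items():
        value = config.get(key)
        if value is not None and value not in allowed:
            errors.append(f"{key}={value!r} is not one of {sorted(allowed)}")
    return errors
```

For example, `validate({"DEVICE": "tablet"})` returns a one-item error list, while a well-formed config returns `[]`.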
The best combination (my preference):
- Claude 3.5 / GPT-4o as the chat model: the responses are more coherent with larger amounts of input information.
- GPT-4o-mini as the image model, because of its accuracy and low cost.
- Faster-Whisper as the STT model: this local approach is actually about 2x faster than the online models.
- ElevenLabs as the TTS model, for the best quality.
EVA also works with a completely free combination:
- Groq Llama 3.2 as the chat model (if you have a good GPU, you can also use Ollama Llama 3.1 70B).
- Ollama llava-phi3 as the image model.
- Faster-Whisper as the speech recognition model.
- Coqui TTS as the TTS model.

Performance is also good if you have a decent GPU. Groq is free too, but it limits token usage per minute, so you might run out of tokens quickly.
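If you hit that per-minute quota, a common workaround is to retry with exponential backoff. Here is a generic sketch of the pattern; `RateLimitError` is a stand-in for whichever exception your provider's client actually raises, and none of this code is part of EVA:

```python
# Generic retry-with-backoff sketch for per-minute rate limits (not EVA code).
import time
from typing import Callable, TypeVar

T = TypeVar("T")

class RateLimitError(Exception):
    """Stand-in for the provider's real rate-limit exception."""

def call_with_backoff(fn: Callable[[], T],
                      retries: int = 3,
                      base_delay: float = 1.0) -> T:
    """Call fn, sleeping base_delay * 2**attempt between rate-limited tries."""
    for attempt in range(retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == retries - 1:
                raise  # quota still exhausted after the last retry
            time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("unreachable")
```

Wrapping the chat-model call this way trades latency for reliability, which is usually the right trade for a voice assistant that would otherwise fail mid-conversation.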
- The music generation tool requires a Suno-API Docker container running on the base URL. Install it from https://github.com/gcui-art/suno-api
- The image generation tool requires a Midjourney account and a private Discord server. You need to include the Discord channel information in the `.env` file.
If you want to disable some tools, just change the client setting in the related `.py` file:

```python
client: str = "none"
```

But I like to leave them all on, since it is very interesting to observe how the AI selects tools.
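As an illustration of how such a class attribute can gate tool selection (a hypothetical sketch; the class names and filtering helper below are invented for this example, not EVA's actual code):

```python
# Hypothetical illustration of the client-gating pattern (not EVA's code).
class SunoMusicTool:
    client: str = "none"      # "none" disables the tool

class WebSearchTool:
    client: str = "desktop"   # enabled for the desktop client

def active_tools(tools: list, client: str) -> list:
    """Keep only the tools whose client attribute matches the running client."""
    return [t for t in tools if t.client == client]
```

Under this scheme, a tool set to `"none"` never matches any running client, so it is silently skipped when the agent assembles its toolbox.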
EVA will shut down if you say "exit" or "bye".
Due to my limited time, the code is far from perfect. I would be very grateful if anyone is willing to contribute!
This project is licensed under the MIT License.
This project wouldn't be possible without these amazing open-source projects:
- LangChain - Amazing AI Dev Framework
- Groq - Free LLM access and really fast
- Ollama - Best local model deployment
- NumPy - The NumPy
- FastAPI - Excellent API framework
- Tqdm - Great progress bar
- OpenCV - Legendary Vision Library
- Faster-Whisper - Fastest Speech transcription
- Coqui TTS - Admirable text-to-speech synthesis
- Face Recognition - Face detection
- Speech Recognition - Easy-to-use Speech detection
- PyAudio - Powerful Audio I/O
- Wespeaker - Speaker verification
- NLTK - Natural Language Toolkit
- Chromium - Best open-source web browser
- DuckDuckGo - Free Web search
- Youtube_search - YouTube search
- Suno-API - Music generation API for Suno
- PyAutoGUI - Cross-platform GUI automation