E.V.A. - Enhanced Voice Assistant 🎙️

EVA Logo

Multimodal, Multilingual, Cross Platform, Modular Architecture


🎯 Overview of EVA

Hello GitHub wizards! Thanks for stopping by~ 🤗

So, here's the story: I used to sling code back in the day (like, when Perl was still cool), but a year ago AI came along and struck me with awe. I became very interested in human-AI interaction and how it could be applied in our daily lives. However, most online projects focused on only a few specific tasks, so I spent a few months developing EVA myself.

EVA is an experimental voice assistant that explores the human-AI experience through proactive engagement and autonomous behavior. Built with a modular architecture, it aims to provide more natural and dynamic interactions, and it includes an extensive tool framework that allows continuous enhancement of its capabilities.

EVA Demo

✨ Key Features

EVA is built on the LangGraph framework, with some customized modules and tools. Importantly, you can run it purely locally at no cost (if you have a GPU-equipped computer).

πŸŽ™οΈ Cross platform modular design

  • Configurable model selection for LLM, TTS, STT, etc.
  • Integrated with OpenAI, Anthropic, Groq, Google, and Ollama.
  • Easy modification of prompts and tools.
  • Supports both desktop and mobile apps.

πŸ–ΌοΈ Interactive experience

  • Voice ID and vision ID for personalized interaction.
  • Proactive-style communication (varies between models).
  • Multimodal outputs with asynchronous actions (see the sketch below).
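
To make "asynchronous actions" concrete: a minimal, hypothetical sketch (not EVA's actual code; speak_response and run_tool are invented names) of speaking a reply while a slower tool call runs in the background:

import asyncio

async def speak_response(text: str) -> None:
    # Stand-in for a TTS call; the sleep mimics audio playback time.
    await asyncio.sleep(1)
    print(f"EVA says: {text}")

async def run_tool(query: str) -> str:
    # Stand-in for a slower tool call, such as a web search.
    await asyncio.sleep(2)
    return f"results for {query!r}"

async def respond() -> None:
    # Speech playback and tool execution overlap instead of blocking each other.
    speech = asyncio.create_task(speak_response("Let me look that up."))
    results = await run_tool("latest news")
    await speech
    print(results)

asyncio.run(respond())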

🔌 Dynamic tool system

  • Web search through DuckDuckGo/Tavily
  • YouTube video search
  • Discord Midjourney AI image generation
  • Suno music generation
  • Screenshot and analysis
  • Compatible with all LangChain tools
  • Easy implementation of new tools with a single file (see the sketch below).
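
To give a sense of the single-file pattern, here is a minimal sketch of a LangChain-style tool (the function and its docstring are invented for illustration; EVA's actual tools live under app/tools/ and may use a different interface):

from langchain_core.tools import tool

@tool
def word_count(text: str) -> int:
    """Count the number of words in a piece of text."""
    return len(text.split())

# A LangChain-compatible agent discovers the tool through its
# name, docstring, and typed signature.
print(word_count.invoke({"text": "hello brave new world"}))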

πŸ“ Project Structure

EVA/
├── app/
│   ├── client/          # Client-side implementation
│   ├── config/          # Configuration files and logs
│   ├── core/            # Core process
│   ├── data/            # Data storage
│   ├── tools/           # Tool implementations
│   └── utils/           # Utility functions
│       ├── agent/       # LLM agent classes and functions
│       ├── memory/      # Memory module classes
│       ├── prompt/      # Utility prompts
│       ├── stt/         # Speech recognition models and classes
│       ├── tts/         # Text-to-speech models and classes
│       └── vision/      # Vision models and functions
├── tests/               # Test cases (😒)
└── docs/                # Documentation (😩)

🚀 Setup Guide

💻 System Requirements

  • Python 3.10+
  • CUDA-compatible GPU (if you want to run models locally)

📥 Quick Start

Clone the repository

git clone https://github.com/Genesis1231/EVA.git
cd EVA

Create a virtual environment

python3 -m venv eva_env
source eva_env/bin/activate  

Install system dependencies (in case you don't already have them)

sudo apt-get update
sudo apt-get install -y cmake build-essential ffmpeg chromium mpv

Install Python dependencies

pip install -r requirements.txt
pip install git+https://github.com/wenet-e2e/wespeaker.git

Configure .env with your API keys

cp .env.example .env
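
Which keys you actually need depends on the models you enable. A hypothetical .env might look like this (the variable names are illustrative; use the names defined in .env.example):

# Hypothetical key names - verify against .env.example
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GROQ_API_KEY=gsk_...
ELEVENLABS_API_KEY=...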

Run EVA

python app/main.py

Alternatively, you can run EVA with Docker.

# Use the official Python image with FastAPI
FROM tiangolo/uvicorn-gunicorn-fastapi

# Set working directory
WORKDIR /app

# Copy requirements first for better caching
COPY requirements.txt .

# Install system dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    cmake \
    libsndfile1 \
    ffmpeg \
    chromium

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt
RUN pip install --no-cache-dir git+https://github.com/wenet-e2e/wespeaker.git

# Copy the rest of the application
COPY . .

# Run the application
CMD ["python", "app/main.py"]
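
To build and run the image (the eva tag and --env-file usage are just one reasonable choice):

docker build -t eva .
docker run --env-file .env eva

Note that microphone and speaker access from inside a container may require extra device flags depending on your platform.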

πŸ› οΈ Configuration

Configure EVA settings in app/config/config.py:

eva_configuration = {
    "DEVICE": "desktop", # Currently "desktop" or "mobile" (testing)
    "LANGUAGE": "en", # English or multilingual (much slower)
    "BASE_URL": "http://localhost:11434", # URL of the local Ollama server; leave it as-is if you don't plan to use local models
    "CHAT_MODEL": "anthropic", # Supports Anthropic Claude 3.5, Groq llama3.1-70b, OpenAI ChatGPT-4o, Mistral Large, Google Gemini 1.5 Pro, and Ollama models. Recommended: Claude or ChatGPT
    "IMAGE_MODEL": "openai", # Supports ChatGPT-4o-mini and Ollama llava-phi3/llava13b (local). Recommended: 4o-mini, but llava-phi3 is very small and free
    "STT_MODEL": "faster-whisper", # Supports OpenAI Whisper, Groq (free), and Faster-whisper (local)
    "TTS_MODEL": "elevenlabs", # Supports ElevenLabs, OpenAI, and Coqui TTS (local)
    "SUMMARIZE_MODEL": "llama" # Supports Groq llama3.1-8b, Anthropic Claude Sonnet 3.5, and Ollama llama3.1 (local)
}

The best combination (my preference):

  • Claude 3.5 / ChatGPT-4o as the chat model. The responses are more coherent with larger amounts of input information.
  • ChatGPT-4o-mini as the image model, for its accuracy and low cost.
  • Faster-whisper as the STT model, since this local approach is actually about 2x faster than the online models.
  • ElevenLabs as the TTS model, for the best quality.

EVA also works with a completely free combination:

  • Groq llama-3.2 as the chat model (if you have a good GPU, you can also use Ollama llama3.1-70b).
  • Ollama-llava-phi3 as the image model.
  • Faster-whisper as the speech recognition model.
  • Coqui TTS as the TTS model.

The performance is still good if you have a decent GPU. Groq is free as well, but it has a per-minute token limit, so you might run out of tokens quickly.
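
Expressed as config.py settings, that free combination would look roughly like this (the exact value strings are assumptions; check the comments in app/config/config.py for the accepted values):

eva_configuration = {
    "DEVICE": "desktop",
    "LANGUAGE": "en",
    "BASE_URL": "http://localhost:11434",  # local Ollama server
    "CHAT_MODEL": "groq",             # assumed value string for Groq llama
    "IMAGE_MODEL": "llava-phi3",      # assumed value string for Ollama llava-phi3
    "STT_MODEL": "faster-whisper",
    "TTS_MODEL": "coqui",             # assumed value string for Coqui TTS
    "SUMMARIZE_MODEL": "llama"
}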

🔧 Tool Setup

  • The music generation tool requires a Suno-API Docker container running on the base URL. Install it from https://github.com/gcui-art/suno-api

  • The image generation tool requires a Midjourney account and a private Discord server. You need to include the Discord channel information in the .env file (a hypothetical example follows below).
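
For instance, the Discord entries might look like this (the variable names are hypothetical; use whatever names .env.example actually defines):

# Hypothetical names - verify against .env.example
DISCORD_TOKEN=...
DISCORD_SERVER_ID=...
DISCORD_CHANNEL_ID=...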

If you want to disable some tools, just change the client setting in the related .py file:

    client: str = "none"

But I like to leave them all on, since it is very interesting to observe how the AI selects tools.

🧰 Exit & Shutdown

EVA will shut down if you say "exit" or "bye".

🤝 Contribution

Due to my limited time, the code is far from perfect. I would be very grateful if anyone is willing to contribute 🍝

📜 License

This project is licensed under the MIT License.

📊 Credits & Acknowledgments

This project wouldn't be possible without these amazing open-source projects:

Core & Language Models

  • LangChain - Amazing AI dev framework
  • Groq - Free and really fast LLM access
  • Ollama - Best local model deployment
  • NumPy - The NumPy
  • FastAPI - Excellent API framework
  • tqdm - Great progress bars

Utility modules

Tools development

Built with ❤️ by Adam
