Hello GitHub wizards! Thanks for stopping by~ 🤖
So, here's the story: I used to sling code back in the day (when Perl was still cool), but a year ago AI came along and struck me with awe. I became very interested in human-AI interaction and how it can be applied to our daily lives. However, most online projects focus on only a few specific tasks, so I spent a few months developing EVA myself.
EVA is an experimental voice assistant that explores the human-AI experience through proactive engagement and autonomous behavior. Built with a modular architecture, it aims to provide a more natural and dynamic interaction for users, and includes an extensive tool framework that allows continuous enhancement of its capabilities.
EVA is built on the LangGraph framework, with some customized modules and tools. Importantly, you can run it purely locally at no cost (if you have a computer with a GPU).
- Configurable model selection for LLM, TTS, STT, etc.
- Integrated with OpenAI, Anthropic, Groq, Google, and Ollama.
- Easy modification of prompts and tools.
- Supports both desktop and mobile apps.
- Voice ID and vision ID for personalized interaction.
- Proactive-style communication (varies between models).
- Multi-modal outputs with asynchronous action.
- Web search through DuckDuckGo/Tavily
- YouTube video search
- Discord Midjourney AI image generation
- Suno music generation
- Screenshot and analysis
- Compatible with all Langchain tools
- Easy implementation of new tools with a single file.
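To give a feel for the single-file tool idea, here is a minimal sketch of how such a tool could be structured. This is a hypothetical registry pattern for illustration only; EVA's actual base class and registration mechanism may differ, and the `echo` tool shown here is not part of the project.

```python
# Hypothetical sketch of a single-file tool registry (not EVA's actual API).
from dataclasses import dataclass
from typing import Callable, Dict

TOOL_REGISTRY: Dict[str, "Tool"] = {}

@dataclass
class Tool:
    name: str
    description: str          # shown to the LLM so it can pick the right tool
    func: Callable[[str], str]

def register_tool(name: str, description: str):
    """Decorator that adds a function to the global tool registry."""
    def wrap(func: Callable[[str], str]) -> Callable[[str], str]:
        TOOL_REGISTRY[name] = Tool(name, description, func)
        return func
    return wrap

@register_tool("echo", "Repeat the user's text back (demo only).")
def echo(text: str) -> str:
    return f"EVA heard: {text}"
```

With a pattern like this, dropping a new file into `app/tools/` that calls the decorator is all it takes to make the tool discoverable.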
```
EVA/
├── app/
│   ├── client/     # Client-side implementation
│   ├── config/     # Configuration files and logs
│   ├── core/       # Core process
│   ├── data/       # Data storage
│   ├── tools/      # Tool implementations
│   ├── utils/      # Utility functions
│   ├── agent/      # LLM agent classes and functions
│   ├── memory/     # Memory module classes
│   ├── prompt/     # Utility prompts
│   ├── stt/        # Speech recognition models and classes
│   ├── tts/        # Text-to-speech models and classes
│   └── vision/     # Vision models and functions
├── tests/          # Test cases
└── docs/           # Documentation
```
- Python 3.10+
- CUDA-compatible GPU (if you want to run the models locally)
Clone the repository:

```bash
git clone https://github.com/Genesis1231/EVA.git
cd EVA
```

Create a virtual environment:

```bash
python3 -m venv eva_env
source eva_env/bin/activate
```

Install system dependencies (in case you don't have them already):

```bash
sudo apt-get update
sudo apt-get install -y cmake build-essential ffmpeg chromium mpv
```

Install Python dependencies:

```bash
pip install -r requirements.txt
pip install git+https://github.com/wenet-e2e/wespeaker.git
```

Configure `.env` with your API keys:

```bash
cp .env.example .env
```

Run EVA:

```bash
python app/main.py
```
Similarly, you can run EVA with Docker.
```dockerfile
# Use the official Python image with FastAPI
FROM tiangolo/uvicorn-gunicorn-fastapi

# Set the working directory
WORKDIR /app

# Copy requirements first for better layer caching
COPY requirements.txt .

# Install system dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    cmake \
    libsndfile1 \
    ffmpeg \
    chromium

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the application
COPY . .

# Run the application
CMD ["python", "app/main.py"]
```
Configure the EVA settings in `app/config/config.py`:
```python
eva_configuration = {
    "DEVICE": "desktop",        # Currently "desktop" or "mobile" (testing)
    "LANGUAGE": "en",           # English or multilingual (much slower)
    "BASE_URL": "http://localhost:11434",  # URL of the local Ollama server; leave as-is if you don't plan to use local models
    "CHAT_MODEL": "anthropic",  # Supports Anthropic Claude 3.5, Groq Llama 3.1 70B, OpenAI GPT-4o, Mistral Large, Google Gemini 1.5 Pro, and Ollama models. Recommended: Claude or GPT-4o
    "IMAGE_MODEL": "openai",    # Supports GPT-4o-mini and Ollama llava-phi3/llava:13b (local). Recommended: GPT-4o-mini, but llava-phi3 is very small and free
    "STT_MODEL": "faster-whisper",  # Supports OpenAI Whisper, Groq (free), and Faster-Whisper (local)
    "TTS_MODEL": "elevenlabs",  # Supports ElevenLabs, OpenAI, and Coqui TTS (local)
    "SUMMARIZE_MODEL": "llama", # Supports Groq Llama 3.1 8B, Anthropic Claude 3.5 Sonnet, and Ollama Llama 3.1 (local)
}
```
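Since every setting is a plain string, a typo only surfaces at model-load time. A small sanity check like the following (a hypothetical helper, not part of EVA; the allowed-value sets below are assumptions based on the comments above) can catch misspelled values before anything is loaded:

```python
# Hypothetical startup check for eva_configuration (not part of EVA).
# The allowed-value sets are illustrative assumptions.
SUPPORTED = {
    "DEVICE": {"desktop", "mobile"},
    "CHAT_MODEL": {"anthropic", "openai", "groq", "mistral", "google", "ollama"},
    "STT_MODEL": {"whisper", "groq", "faster-whisper"},
    "TTS_MODEL": {"elevenlabs", "openai", "coqui"},
}

def validate(config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    for key, allowed in SUPPORTED.items():
        value = config.get(key)
        if value is not None and value not in allowed:
            errors.append(f"{key}={value!r} is not one of {sorted(allowed)}")
    return errors
```

For example, `validate({"DEVICE": "tablet"})` returns a one-item error list, while a well-formed config returns `[]`.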
The best combination (my preference):
- Claude 3.5 / GPT-4o as the chat model: the responses are more coherent with larger amounts of input information.
- GPT-4o-mini as the image model, because of its accuracy and low cost.
- Faster-Whisper as the STT model: this local approach is actually about 2x faster than the online models.
- ElevenLabs as the TTS model, for the best quality.
EVA also works with a completely free combination:
- Groq Llama 3.2 as the chat model (if you have a good GPU, you can also use Ollama Llama 3.1 70B).
- Ollama llava-phi3 as the image model.
- Faster-Whisper as the speech recognition model.
- Coqui TTS as the TTS model.

Performance is also good if you have a decent GPU. Groq is free too, but it limits token usage per minute, so you might run out of tokens quickly.
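If you hit that per-minute quota, a common workaround is to retry with exponential backoff. Here is a generic sketch of the pattern; `RateLimitError` is a stand-in for whichever exception your provider's client actually raises, and none of this code is part of EVA:

```python
# Generic retry-with-backoff sketch for per-minute rate limits (not EVA code).
import time
from typing import Callable, TypeVar

T = TypeVar("T")

class RateLimitError(Exception):
    """Stand-in for the provider's real rate-limit exception."""

def call_with_backoff(fn: Callable[[], T],
                      retries: int = 3,
                      base_delay: float = 1.0) -> T:
    """Call fn, sleeping base_delay * 2**attempt between rate-limited tries."""
    for attempt in range(retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == retries - 1:
                raise  # quota still exhausted after the last retry
            time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("unreachable")
```

Wrapping the chat-model call this way trades latency for reliability, which is usually the right trade for a voice assistant that would otherwise fail mid-conversation.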
- The music generation tool requires a Suno-API Docker container running on the base URL. Install it from https://github.com/gcui-art/suno-api
- The image generation tool requires a Midjourney account and a private Discord server. You need to include the Discord channel information in the `.env` file.
If you want to disable some tools, just change the client setting in the related `.py` file:

```python
client: str = "none"
```

But I like to leave them all on, since it is very interesting to observe how the AI selects tools.
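As an illustration of how such a class attribute can gate tool selection (a hypothetical sketch; the class names and filtering helper below are invented for this example, not EVA's actual code):

```python
# Hypothetical illustration of the client-gating pattern (not EVA's code).
class SunoMusicTool:
    client: str = "none"      # "none" disables the tool

class WebSearchTool:
    client: str = "desktop"   # enabled for the desktop client

def active_tools(tools: list, client: str) -> list:
    """Keep only the tools whose client attribute matches the running client."""
    return [t for t in tools if t.client == client]
```

Under this scheme, a tool set to `"none"` never matches any running client, so it is silently skipped when the agent assembles its toolbox.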
EVA will shut down if you say "exit" or "bye".
Due to my limited time, the code is far from perfect. I would be very grateful if anyone is willing to contribute!
This project is licensed under the MIT License.
This project wouldn't be possible without these amazing open-source projects:
- LangChain - Amazing AI Dev Framework
- Groq - Free LLM access and really fast
- Ollama - Best local model deployment
- NumPy - The NumPy
- FastAPI - Excellent API framework
- Tqdm - Great progress bar
- OpenCV - Legendary Vision Library
- Faster-Whisper - Fastest Speech transcription
- Coqui TTS - Admirable text-to-speech synthesis
- Face Recognition - Face detection
- Speech Recognition - Easy-to-use Speech detection
- PyAudio - Powerful Audio I/O
- Wespeaker - Speaker verification
- NLTK - Natural Language Toolkit
- Chromium - Best open-source web browser
- DuckDuckGo - Free Web search
- Youtube_search - YouTube search
- Suno-API - Music generation API for Suno
- PyAutoGUI - Cross-platform GUI automation