End-to-end open-source voice agents platform: Quickly build LLM-based, voice-driven conversational applications
Bolna is an end-to-end, open-source, production-ready framework for quickly building LLM-based, voice-driven conversational applications.
Demo video: demo-create-agent-and-make-calls.mp4
Bolna helps you create AI Voice Agents which can be instructed to do tasks beginning with:
- Initiating a phone call using telephony providers like Twilio, etc.
- Transcribing the conversations using Deepgram, etc.
- Using LLMs like OpenAI, etc. to handle conversations
- Synthesizing LLM responses back to telephony using AWS Polly, XTTS, etc.
- Instructing the agent to perform tasks like sending emails, text messages, or booking calendar events after the conversation has ended
Refer to the docs for a deep dive into all supported providers.
This repo contains the following types of agents in the `agents/agent_types` directory which can be used to create conversational applications:

- `contextual_conversational_agent`: LLM-based free-flow agent
- `graph_based_conversational_agent`: LLM-based agent with classification
- `extraction_agent`: currently WIP. Feel free to contribute and open a PR
A basic local setup uses Twilio for telephony. We have dockerized the setup in `local_setup/`. One will need to populate an environment `.env` file from `.env.sample`.
The setup consists of four containers:

- Twilio web server: for initiating calls (one will need to set up a Twilio account)
- Bolna server: for creating and handling agents
- `ngrok`: for tunneling (one will need to add the `authtoken` to `ngrok-config.yml`)
- `redis`: for persisting agents' and users' contextual data
Running `docker-compose up --build` will use `.env` as the environment file and the `agent_data` directory to start all containers.
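Before bringing the containers up, it can help to sanity-check that `.env` is actually populated. Below is a minimal sketch; the key names are assumptions made for illustration, and the authoritative list lives in `.env.sample`:

```python
import sys
from pathlib import Path

# Hypothetical key names for illustration only; the authoritative list is in .env.sample.
EXPECTED_KEYS = ["TWILIO_ACCOUNT_SID", "TWILIO_AUTH_TOKEN", "DEEPGRAM_AUTH_TOKEN", "OPENAI_API_KEY"]

def check_env(path: str = ".env") -> None:
    # Parse simple KEY=VALUE lines, ignoring comments and blanks.
    values = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()

    missing = [key for key in EXPECTED_KEYS if not values.get(key)]
    if missing:
        sys.exit(f"Missing or empty keys in {path}: {', '.join(missing)}")
    print(f"All expected keys are set in {path}")

if __name__ == "__main__":
    check_env()
```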
Once the docker containers are up, you can start creating your agents and instructing them to initiate calls.
The repo contains examples in the `agent_data` directory as a reference for creating application agents:

- `airbnb_job`: a `streaming` conversation agent where the agent screens potential candidates for a job at Airbnb
- `sorting_hat`: a `preprocessed` conversation agent which acts as a Sorting Hat for Hogwarts
- `yc_screening`: a `streaming` conversation agent which acts as a Y Combinator partner asking questions about the idea/startup
- `indian_elections_vernacular`: a `streaming` conversation agent which asks people for their outlook towards the Indian elections, in Hindi
- `sample_agent`: a boilerplate sample agent to start building your own agent!
All agents are read from the `agent_data` directory. We have provided some samples to get started. A dashboard is coming up (still WIP) which will make creating agents much easier.
General structure of the agents:
your-awesome-agent-name
├── conversation_details.json # Compiled prompt
└── users.json # List of users that the call would be made to
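If useful, that layout can be scaffolded with a few lines of Python. The prompt text and user fields below are placeholders for illustration, not a required schema; see the provided sample agents for the actual file contents:

```python
import json
from pathlib import Path

# Placeholder agent name and contents; adapt to your own prompt and user list.
agent_dir = Path("agent_data/your-awesome-agent-name")
agent_dir.mkdir(parents=True, exist_ok=True)

conversation_details = {
    "prompt": "You are a helpful voice agent. Greet {first_name} and ask how you can help."
}
users = [
    {"first_name": "Ada", "last_name": "Lovelace", "honorific": "Ms."},
]

(agent_dir / "conversation_details.json").write_text(json.dumps(conversation_details, indent=2))
(agent_dir / "users.json").write_text(json.dumps(users, indent=2))
```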
| Agent type | `streaming` agent | `preprocessed` agent |
| --- | --- | --- |
| Introduction | A streaming agent will work like a free-flow conversation following the prompt | Apart from following the prompt, a preprocessed agent has all of its responses preprocessed as audio, which is streamed according to the classification of the human's response |
| Prompt | Required (defined in `conversation_details.json`) | Required (defined in `conversation_details.json`) |
| Preprocessing | Not required | Required (using `scripts/preprocess.py`) |
Note

Currently, `users.json` has the following user attributes, which get substituted into the prompt to customize it for each call. More to be added soon!

- `first_name`
- `last_name`
- `honorific`

For instance, in the case of a preprocessed agent, the initial intro could be customized to include the user's name. Even the prompt could be customized to fill in contextual details about the user from `users.json`. For example, `{first_name}` can be defined in the prompt and the prompt intro.
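As a rough illustration of how these attributes could be filled into a prompt, here is a sketch using Python's `str.format`; the actual substitution mechanism inside Bolna may differ:

```python
# Illustrative substitution of user attributes into a prompt template.
prompt_intro = "Hello {honorific} {last_name}! Am I speaking with {first_name}?"

user = {"first_name": "Ada", "last_name": "Lovelace", "honorific": "Ms."}

print(prompt_intro.format(**user))
# Hello Ms. Lovelace! Am I speaking with Ada?
```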
- Create a directory under the `agent_data` directory with the name for your agent
- Create your prompt and save it in a file called `conversation_details.json` using the example provided
- Optional: in case you are creating a `preprocessed` agent, generate the audio data using the script `scripts/preprocess.py`
- At this point, the docker containers should be up and running
- Your agent prompt should be defined in the `agent_data/` directory, with `conversation_details.json` and the user list in `users.json`
- Create your agent using the Bolna Create Agent API; an agent will get created with an `agent_id` (see the sketch after this list)
- Instruct the agent to initiate calls to users via `scripts/initiate_agent_call.py <agent_name> <agent_id>`
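A rough end-to-end sketch of those last two steps using Python's `requests`. The server URL, endpoint path, payload, and response field below are assumptions made for illustration; refer to the Bolna docs for the actual Create Agent API contract:

```python
import subprocess
import requests

# Assumed local server URL, endpoint, payload, and response shape (illustration only).
BOLNA_SERVER = "http://localhost:5001"
AGENT_NAME = "your-awesome-agent-name"

response = requests.post(f"{BOLNA_SERVER}/agent", json={"agent_name": AGENT_NAME})
response.raise_for_status()
agent_id = response.json()["agent_id"]  # assumed response field

# Ask the agent to call everyone listed in its users.json.
subprocess.run(
    ["python", "scripts/initiate_agent_call.py", AGENT_NAME, agent_id],
    check=True,
)
```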
Though the repository is completely open source, you can connect with us if interested in managed offerings or more customized solutions.
We love contributions of all types: big or small, they help improve this community resource.
- There are a number of open issues which can be good ones to start with
- If you have suggestions for enhancements, wish to contribute a simple fix such as correcting a typo, or want to address an apparent bug, please feel free to initiate a new issue or submit a pull request
- If you're contemplating a larger change or addition to this repository, be it in terms of its structure or features, kindly begin by opening a new issue and outlining your proposed changes. This will allow us to engage in a discussion before you dedicate a significant amount of time or effort. Your cooperation and understanding are appreciated