DEV Community: Tonya Sims

Identify Sales Insights from Meeting Audio

Tonya Sims — Tue, 27 Dec 2022 17:52:41 +0000

You just started your first day as a Python developer at Dunder Mifflin Paper Company, Inc. The President of Sales has an urgent request for you: transcribe a sales meeting with the regional manager, Michael Scott, from speech to text.

This is not just any sales meeting. Landing this client could determine the health and future of the company. You see, Michael Scott is kind of a goofball with a habit of joking around too much during important sales calls, so Jan, the VP of Sales, was sent to keep an eye on him.

The President of Sales could not figure out why this client didn’t sign the contract.

Was there even a deal made?

Did Michael not close the sale?

Or did Michael scare the client away by telling his lame jokes?

He needed sales insights ASAP, and the only way he could get them without having been there was by using AI speech recognition and Python.

You’ve probably guessed by now, but if you haven’t, this is a classic scene from the hit sitcom, The Office.

If you want the full code sample for identifying sales insights from meeting audio, skip to the bottom. If you want to know what happens next with the foolery, keep reading.

In this sales call scene from The Office, Michael Scott moves the meeting to a restaurant, Chili's, without anyone's permission. Since this episode was released in the mid-2000s, we're going to fast-forward to 2022. Let's say this meeting didn't happen in a restaurant; it took place over everyone's favorite, a video call.

You explain to the President of Sales that the meeting can be recorded and then uploaded for transcription using Python and speech-to-text. You add that certain features can be used to gather sales insights from the meeting audio.

You ask the President what type of insights they need. They want a quick summary of the transcript so they don't have to read the whole thing. They also want to search the transcript to find out whether Michael Scott mentioned business, deals, or jokes.

Conversation Intelligence and Sales Insights from Meeting Audio

You have the perfect solution: a speech recognition provider called Deepgram. You get to coding with their Python SDK.

The first thing you do is grab a Deepgram API key from the Deepgram Console.

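Then you create a directory with a Python file inside it and install the Deepgram Python SDK with pip: pip install "deepgram-sdk<3". The version pin matters because the code below uses the Deepgram class from the 2.x SDK, and version 3 replaced that class.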

The SDK was easy to use, and the whole job came down to this code:

from deepgram import Deepgram
import json

DEEPGRAM_API_KEY = 'YOUR_API_KEY_GOES_HERE'
PATH_TO_FILE = 'audio/the-office-meeting.mp3'

def main():
    # Initializes the Deepgram SDK
    deepgram = Deepgram(DEEPGRAM_API_KEY)
    # Open the audio file
    with open(PATH_TO_FILE, 'rb') as audio:
        # ...or replace mimetype as appropriate
        source = {'buffer': audio, 'mimetype': 'audio/mp3'}
        # Turn on summarization and search for three terms
        options = {
            'summarize': True,
            'search': ['business', 'deal', 'joke']
        }

        response = deepgram.transcription.sync_prerecorded(source, options)
        print(json.dumps(response, indent=4))

if __name__ == '__main__':
    main()

You’re importing the libraries at the top:

from deepgram import Deepgram
import json

Next, you paste your Deepgram API key into the code and add the path to the file you want to transcribe:

DEEPGRAM_API_KEY = 'YOUR_API_KEY_GOES_HERE'
PATH_TO_FILE = 'audio/the-office-meeting.mp3'

Inside the main function, you’re initializing the Deepgram SDK. Then you open the audio file with open(PATH_TO_FILE, 'rb') as audio:. Since the file being transcribed is an MP3, that’s what you set as the mimetype, while passing the audio into the Python dictionary as well: source = {'buffer': audio, 'mimetype': 'audio/mp3'}.

You tap into Deepgram's Summarization and Search features by creating an options dictionary with those parameters. The summarize option turns on summaries, and search takes the list of terms to look for.

options = {
    'summarize': True,
    'search': ['business', 'deal', 'joke']
}

Lastly, the line response = deepgram.transcription.sync_prerecorded(source, options) sends the audio and the options to Deepgram and returns the transcription. The results are then printed with print(json.dumps(response, indent=4)).

You’ll receive a JSON response with the transcript, the summary, and the search findings. It looked something like this:

The Summary

"summaries": [
                            {
                                "summary": "Lack of one county has not been immune to the slow economic growth over the past five years. So for us, the name of the game is budget reduction.",
                                "start_word": 0,
                                "end_word": 597
                            }
]

The Search

"search": [
    {
        "query": "business",
        "hits": [
            {
                "confidence": 1.0,
                "start": 231.305,
                "end": 231.705,
                "snippet": "business"
            }
        ]
    },
    {
        "query": "deal",
        "hits": [
            {
                "confidence": 0.7395834,
                "start": 86.13901,
                "end": 86.298805,
                "snippet": "i'll"
            }
        ]
    },
    {
        "query": "joke",
        "hits": [
            {
                "confidence": 1.0,
                "start": 82.125,
                "end": 82.284996,
                "snippet": "one joke"
            }
        ]
    }
]

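These are your insights from the sales meeting. From the summary, it seems the customer wants to reduce costs. The search results suggest Michael Scott talked about business and told at least one joke, since both hits have a confidence of 1.0. He may not have discussed a deal at all, though: the only "deal" hit is a lower-confidence (0.74) match on the word "i'll".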
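To pull these insights out of the full JSON response instead of reading it by eye, here is a minimal sketch. It assumes the search results sit under results.channels[0].search and the summaries under results.channels[0].alternatives[0].summaries. The excerpts above don't show those outer paths, so check them against your own output. The function name extract_insights and the 0.9 confidence cutoff are illustrative choices, not part of the Deepgram SDK.

```python
# Assumed layout: results.channels[0].search and
# results.channels[0].alternatives[0].summaries (check against your own output).


def extract_insights(response, min_confidence=0.9):
    """Return (summaries, hits_by_query) from a prerecorded transcription response."""
    channel = response["results"]["channels"][0]
    alternative = channel["alternatives"][0]

    # Text of each summarized section of the transcript.
    summaries = [s["summary"] for s in alternative.get("summaries", [])]

    # Keep only the hits at or above the cutoff, grouped by search term.
    hits_by_query = {}
    for result in channel.get("search", []):
        hits_by_query[result["query"]] = [
            hit for hit in result["hits"] if hit["confidence"] >= min_confidence
        ]
    return summaries, hits_by_query


# Values copied from the excerpts above; the nesting around them is assumed.
SAMPLE_RESPONSE = {
    "results": {
        "channels": [
            {
                "search": [
                    {"query": "business", "hits": [
                        {"confidence": 1.0, "start": 231.305, "end": 231.705, "snippet": "business"}]},
                    {"query": "deal", "hits": [
                        {"confidence": 0.7395834, "start": 86.13901, "end": 86.298805, "snippet": "i'll"}]},
                    {"query": "joke", "hits": [
                        {"confidence": 1.0, "start": 82.125, "end": 82.284996, "snippet": "one joke"}]},
                ],
                "alternatives": [
                    {"summaries": [{
                        "summary": "Lack of one county has not been immune to the slow economic growth over the past five years. So for us, the name of the game is budget reduction.",
                        "start_word": 0,
                        "end_word": 597,
                    }]}
                ],
            }
        ]
    }
}

if __name__ == "__main__":
    summaries, hits = extract_insights(SAMPLE_RESPONSE)
    print("Summary:", summaries[0])
    for query, query_hits in hits.items():
        print(f"{query}: {len(query_hits)} confident hit(s)")
```

With the default cutoff, the low-confidence "deal" hit drops out, which matches the reading above. Lower min_confidence if you'd rather review borderline matches by hand.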

You share this with the President of Sales. They now have a better understanding of what happened in the sales call, how to coach Michael Scott on closing future sales deals, and how to follow up with the customer.

From then on, all of Dunder Mifflin's sales meetings were recorded and transcribed, and Deepgram was used to derive insights that improved performance and maximized revenue. Corny jokes were allowed only if they helped build relationships with the customer.

The end.

Here’s the whole code sample:

from deepgram import Deepgram
import json

DEEPGRAM_API_KEY = 'YOUR_API_KEY_GOES_HERE'
PATH_TO_FILE = 'audio/the-office-meeting.mp3'

def main():
    # Initializes the Deepgram SDK
    deepgram = Deepgram(DEEPGRAM_API_KEY)
    # Open the audio file
    with open(PATH_TO_FILE, 'rb') as audio:
        # ...or replace mimetype as appropriate
        source = {'buffer': audio, 'mimetype': 'audio/mp3'}
        # Turn on summarization and search for three terms
        options = {
            'summarize': True,
            'search': ['business', 'deal', 'joke']
        }

        response = deepgram.transcription.sync_prerecorded(source, options)
        print(json.dumps(response, indent=4))

if __name__ == '__main__':
    main()

If you have any feedback about this post, or anything else around Deepgram, we'd love to hear from you. Please let us know in our GitHub discussions.

Build an Agent Assist Bot with Python

Tonya Sims — Thu, 15 Dec 2022 21:36:58 +0000

Icing my swollen, disfigured hand, I was sitting on the couch, unable to drive to the store to grab some bandages and medication for the intense pain. I pulled up the website for the nearest store and started typing in the items I was looking for, all with one hand. It was my non-dominant hand at that.

I gave up. I was simply in too much pain and it was taking me forever to order these items online for delivery.

You may have spotted the problem. I couldn’t type fast enough and got impatient. My hand and fingers ballooned in size, and the pharmacy was also losing business because I couldn’t order what I needed.

You might be wondering how I broke my hand and what this has to do with building an agent-assist bot in Python. To make a long story short, someone accidentally slammed the car door shut on my hand. It seemed fine until a few hours later, when it started turning blue and the pain became immense.

I didn't go to the ER quickly enough, and no one was around to take me. So I did what some people would do: I put an ice pack on my hand, hoping the swelling would go down.

Nope, didn’t work!

That’s when I started to panic. At that moment I picked up my phone, barely, and that’s when I tried placing an order for emergency items with my “good” hand.

Super frustrated, I gave up.

That would have been a wonderful opportunity to use a speech-to-text chatbot. An agent could have helped me more quickly, instead of me ordering every item separately and adding each one to an online checkout cart.

Enter Python.

Using a Speech-to-Text Provider With a Chatbot in Python for Agent-Assist

The situation with my now very hideous hand inspired the idea for this blog post tutorial. I thought to myself, how could my life have been made easier…and my hand prettier, in the simplest way possible?

I would have loved to just push a button and chat with customer service so my items could be ordered. By chat, I don't mean type, but rather talk, and have them send me a response based on what I say. That is pretty much an agent-assist chatbot using AI speech-to-text technology.

In this tutorial, I built a command-line implementation of what that could have looked like. It uses Deepgram, a speech recognition provider; ChatterBot, a machine-learning-based chatbot library; and Python.

If you’d like to see the full code, skip to the end of the blog post. Before jumping into the code explanation, let’s take a look at why we might need speech-to-text and chatbots.

Why We Need AI Speech-to-Text With Customer Assist Using Python

There are many reasons why you might need automated speech recognition (ASR) for your next project, including:

Increased Accessibility - speech-to-text makes technology more accessible for people in various situations.
Faster than Typing - think of all the time that could be saved if you could just speak instead of typing.
Increased Productivity and Profitability - speaking of time, it's a great productivity and profitability booster for all involved.

These are just a few, but there are a bunch more use cases.

Why We Need Chatbots for Customer Assist Using Python

Many companies need chat along with phone support and use chatbots for interactions with customers. A few advantages of chatbots are:

24/7 Availability - chatbots are available at all hours of the day for customers to get their questions answered.
Data Collection and Analysis - data from chatbot sessions can be collected and analyzed quickly, which helps improve the customer experience.

Now we know why both speech-to-text and chatbots are important, so let’s dive into the tech and discover which tools to use to build our agent-assist chatbot with Python.

Speech-to-Text Chatbot with Python

There are a few things I needed to set up before I started coding.

Step 1 - Make sure to use Python 3.9 or lower so it works with our selected chatbot library, ChatterBot.
Step 2 - Grab a Deepgram API Key from our Console. Deepgram is a speech recognition provider that transcribes prerecorded or live-streaming audio from speech to text.
Step 3 - Create a directory called python-agent-bot on your computer and open it in a code editor, like VS Code.
Step 4 - Inside the directory create a new Python file. I called mine chatbot.py.
Step 5 - It's recommended, though not required, to create a virtual environment and install all the Python libraries inside it. From the project directory, python3 -m venv venv creates one and source venv/bin/activate activates it.
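Step 6 - Install the following Python libraries inside the virtual environment with pip. The websockets version is pinned below 14 because the code passes extra_headers to websockets.connect, and newer releases of websockets replaced that argument with additional_headers:

pip install chatterbot==1.0.2
pip install pytz
pip install pyaudio
pip install "websockets<14"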

Wonderful! Now that everything is set up, let's walk through the Python code section by section. Make sure to add it to the file chatbot.py.

Here we are importing the necessary Python packages and libraries we need for our speech-to-text chatbot with ChatterBot.

from chatterbot import ChatBot
from chatterbot.trainers import ListTrainer
import pyaudio
import asyncio
import websockets
import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.ERROR)

Copy and paste the Deepgram API Key you created in the console and add it here:

DEEPGRAM_API_KEY = 'YOUR_DEEPGRAM_API_KEY_GOES_HERE'

Below are the settings PyAudio needs to grab audio from your computer's mic: 16-bit samples, one channel, a 16,000 Hz sample rate, and 8,000 frames per buffer.

FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
CHUNK = 8000
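A quick sanity check of these numbers: paInt16 means 16-bit (2-byte) samples. Each CHUNK of 8000 frames at 16,000 frames per second therefore covers half a second of mono audio. The sketch below does that arithmetic in plain Python. It hard-codes the 2-byte sample width as an assumption instead of importing PyAudio, where pyaudio.get_sample_size(pyaudio.paInt16) would report it.

```python
SAMPLE_WIDTH = 2  # bytes per sample for 16-bit audio (paInt16); assumed, not queried
CHANNELS = 1
RATE = 16000      # frames per second
CHUNK = 8000      # frames per buffer


def chunk_stats(chunk=CHUNK, rate=RATE, channels=CHANNELS, sample_width=SAMPLE_WIDTH):
    """Return (seconds of audio per chunk, bytes per chunk)."""
    seconds = chunk / rate
    num_bytes = chunk * channels * sample_width
    return seconds, num_bytes


if __name__ == "__main__":
    seconds, num_bytes = chunk_stats()
    print(f"{seconds} s of audio, {num_bytes} bytes per chunk")  # 0.5 s of audio, 16000 bytes per chunk
```

These values also have to agree with the encoding=linear16&sample_rate=16000&channels=1 parameters in the Deepgram WebSocket URL further down.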

Create a new instance of ChatBot and start training the chatbot to respond to you.

bot = ChatBot('Bot')
trainer = ListTrainer(bot)

trainer.train([
   'Hi',
   'Hello',
   'I need to buy medication.',
   'Sorry you are not feeling well. How much medication do you need?',
   'Just one, please',
   'Medication added. Would you like anything else?',
   'No Thanks',
   'Your order is complete! Your delivery will arrive soon.'
])
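ChatterBot's ListTrainer treats each statement in the list as a response to the statement before it. The sketch below only shows which statement/response pairs that produces for the training list above. It is an illustration with a hypothetical helper, statement_response_pairs, and not ChatterBot's internal code.

```python
CONVERSATION = [
    'Hi',
    'Hello',
    'I need to buy medication.',
    'Sorry you are not feeling well. How much medication do you need?',
    'Just one, please',
    'Medication added. Would you like anything else?',
    'No Thanks',
    'Your order is complete! Your delivery will arrive soon.',
]


def statement_response_pairs(conversation):
    """Pair each statement with the statement that follows it."""
    return list(zip(conversation, conversation[1:]))


if __name__ == "__main__":
    for statement, response in statement_response_pairs(CONVERSATION):
        print(f"{statement!r} -> {response!r}")
```

Every adjacent pair is included. That means agent lines such as 'Hello' also become prompts the bot has a learned reply for, not just the customer's lines.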

PyAudio needs this callback, which puts each chunk of audio into the queue without blocking.

audio_queue = asyncio.Queue()

def callback(input_data, frame_count, time_info, status_flag):
   audio_queue.put_nowait(input_data)
   return (input_data, pyaudio.paContinue)
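One caveat worth knowing: PyAudio invokes this callback on its own thread, but asyncio.Queue is not thread-safe. A common pattern is to hand each chunk to the event loop with loop.call_soon_threadsafe. The sketch below simulates the PyAudio thread with a plain threading.Thread and fake byte chunks, so it runs without a microphone. It is an alternative pattern, not the code this tutorial uses, and the fake_audio_thread and collect names are hypothetical.

```python
import asyncio
import threading


def fake_audio_thread(loop, queue, chunks):
    """Stand-in for PyAudio's callback thread: pushes chunks into an asyncio.Queue."""
    for chunk in chunks:
        # Schedule put_nowait on the event loop's thread instead of calling it here.
        loop.call_soon_threadsafe(queue.put_nowait, chunk)


async def collect(chunks):
    """Receive every chunk produced by the background thread, in order."""
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue()
    producer = threading.Thread(target=fake_audio_thread, args=(loop, queue, chunks))
    producer.start()

    received = []
    for _ in chunks:
        # Runs on the loop thread and wakes up as each chunk is scheduled.
        received.append(await queue.get())
    producer.join()
    return received


if __name__ == "__main__":
    fake_chunks = [bytes([i]) * 4 for i in range(3)]
    print(asyncio.run(collect(fake_chunks)))
```

In the chatbot itself, you would capture the loop once and call loop.call_soon_threadsafe(audio_queue.put_nowait, input_data) inside callback.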

Next, we access the mic on our machine with PyAudio.

async def microphone():
   audio = pyaudio.PyAudio()
   stream = audio.open(
       format = FORMAT,
       channels = CHANNELS,
       rate = RATE,
       input = True,
       frames_per_buffer = CHUNK,
       stream_callback = callback
   )

   stream.start_stream()

   while stream.is_active():
       await asyncio.sleep(0.1)

   stream.stop_stream()
   stream.close()

Here we open a WebSocket connection to the Deepgram API endpoint. The nested sender function streams the audio from the queue to Deepgram. The nested receiver function gets the transcript (what the customer says) and prints the agent's response.

async def process():
   extra_headers = {
       'Authorization': 'token ' + DEEPGRAM_API_KEY
   }

   async with websockets.connect('wss://api.deepgram.com/v1/listen?encoding=linear16&sample_rate=16000&channels=1', extra_headers = extra_headers) as ws:
       async def sender(ws):
           try:
               while True:
                   data = await audio_queue.get()
                   # Stream each chunk of mic audio to Deepgram
                   await ws.send(data)
           except Exception as e:
               print('Error while sending: ', str(e))
               raise

       async def receiver(ws):
           async for msg in ws:
               msg = json.loads(msg)
               # Skip messages that carry no transcript (e.g. metadata)
               if 'channel' not in msg:
                   continue
               transcript = msg['channel']['alternatives'][0]['transcript']

               if transcript:
                   print('Customer(you):', transcript)
                   if transcript.lower() == "okay":
                       print('Agent: bye')
                       break
                   else:
                       response = bot.get_response(transcript)
                       print('Agent:', response)

       # Finish as soon as one task ends (e.g. the receiver after "okay")
       await asyncio.wait([
           asyncio.ensure_future(microphone()),
           asyncio.ensure_future(sender(ws)),
           asyncio.ensure_future(receiver(ws))
       ], return_when=asyncio.FIRST_COMPLETED)
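The receiver's decision logic can be pulled out into a plain function and tested without a microphone or network connection. In this sketch, the hypothetical handle_message function takes one raw WebSocket message and a respond callable (in the chatbot, that would be bot.get_response). It skips messages that have no channel field, on the assumption that the streaming API can also send metadata messages. It also mirrors the "okay" exit rule used above.

```python
import json


def handle_message(raw_message, respond):
    """Return (customer_text, agent_text, done) for one streaming message.

    Returns None when the message has no transcript to act on.
    """
    msg = json.loads(raw_message)
    if "channel" not in msg:
        return None  # e.g. a metadata message; nothing to transcribe

    transcript = msg["channel"]["alternatives"][0]["transcript"]
    if not transcript:
        return None  # silence or an empty result

    if transcript.lower() == "okay":
        return transcript, "bye", True
    return transcript, str(respond(transcript)), False


if __name__ == "__main__":
    def canned(text):
        # Hypothetical stand-in for bot.get_response
        return {"hi": "Hello"}.get(text.lower(), "Sorry, could you repeat that?")

    message = json.dumps({"channel": {"alternatives": [{"transcript": "Hi"}]}})
    print(handle_message(message, canned))  # ('Hi', 'Hello', False)
```

Keeping the parsing separate from the WebSocket loop makes it easy to try new exit phrases or response rules before wiring them into receiver.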

Finally, we call the main function to execute our code.

def main():
   asyncio.get_event_loop().run_until_complete(process())

if __name__ == '__main__':
   main()

To run the program and give it a try, type python3 chatbot.py in your terminal. Start by saying "Hi", and the agent will respond "Hello" in a typed message, and so on. Say "okay" when you want to end the conversation.


I hope you enjoyed this tutorial and all the possibilities that come with speech-to-text and chatbots in Python. The full code is below.

Full Code of Speech-to-Text Chatbot with Python

from chatterbot import ChatBot
from chatterbot.trainers import ListTrainer
import pyaudio
import asyncio
import websockets
import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.ERROR)

DEEPGRAM_API_KEY = "YOUR-DEEPGRAM-API-KEY"

FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
CHUNK = 8000

bot = ChatBot('Bot')
trainer = ListTrainer(bot)

trainer.train([
   'Hi',
   'Hello',
   'I need to buy medication.',
   'Sorry you are not feeling well. How much medication do you need?',
   'Just one, please',
   'Medication added. Would you like anything else?',
   'No Thanks',
   'Your order is complete! Your delivery will arrive soon.'
])

audio_queue = asyncio.Queue()

def callback(input_data, frame_count, time_info, status_flag):
   audio_queue.put_nowait(input_data)
   return (input_data, pyaudio.paContinue)


async def microphone():
   audio = pyaudio.PyAudio()
   stream = audio.open(
       format = FORMAT,
       channels = CHANNELS,
       rate = RATE,
       input = True,
       frames_per_buffer = CHUNK,
       stream_callback = callback
   )

   stream.start_stream()

   while stream.is_active():
       await asyncio.sleep(0.1)

   stream.stop_stream()
   stream.close()

async def process():
   extra_headers = {
       'Authorization': 'token ' + DEEPGRAM_API_KEY
   }

   async with websockets.connect('wss://api.deepgram.com/v1/listen?encoding=linear16&sample_rate=16000&channels=1', extra_headers = extra_headers) as ws:
       async def sender(ws):
           try:
               while True:
                   data = await audio_queue.get()
                   # Stream each chunk of mic audio to Deepgram
                   await ws.send(data)
           except Exception as e:
               print('Error while sending: ', str(e))
               raise

       async def receiver(ws):
           async for msg in ws:
               msg = json.loads(msg)
               # Skip messages that carry no transcript (e.g. metadata)
               if 'channel' not in msg:
                   continue
               transcript = msg['channel']['alternatives'][0]['transcript']

               if transcript:
                   print('Customer(you):', transcript)

                   if transcript.lower() == "okay":
                       print('Agent: bye')
                       break
                   else:
                       response = bot.get_response(transcript)
                       print('Agent:', response)

       # Finish as soon as one task ends (e.g. the receiver after "okay")
       await asyncio.wait([
           asyncio.ensure_future(microphone()),
           asyncio.ensure_future(sender(ws)),
           asyncio.ensure_future(receiver(ws))
       ], return_when=asyncio.FIRST_COMPLETED)


def main():
   asyncio.get_event_loop().run_until_complete(process())

if __name__ == '__main__':
   main()


Taking Notes with Voice in Python

Tonya Sims — Thu, 01 Dec 2022 18:59:35 +0000

In this blog post tutorial, we’ll learn how to take notes in Python using our voice. This means we can take an audio file and use AI speech-to-text to transcribe it. One could imagine dozens of scenarios where this could be helpful: from capturing the content of voice memos to providing a tidy written recap of a meeting to folks who couldn't attend.

Getting transcriptions out of these recordings is a pretty straightforward process. This project builds on Deepgram's speech-to-text APIs, which deliver high-quality AI-generated transcripts from both real-time streaming and batch processing pre-recorded audio sources. The project we'll do in this tutorial works with pre-recorded audio files.

Let's walk step by step through taking notes with our voice in Python.

A Learn-by-Doing Speech AI Project in Python

Here’s a list of what we’ll cover in this project:

Step 1 - Getting Started with Deepgram Speech-to-Text Python SDK
Step 2 - Useful Speech-to-Text Features for Taking Voice Notes in Python
Step 3 - Set Up Your Python Project
Step 4 - Install Your Python Libraries and Packages using pip
Step 5 - How to Transcribe the Audio File in Python with Voice
Step 6 - Using Speech-to-Text Features to Enhance Notetaking with Voice in Python
Final Step - Run the Python Voice Note-Taking Project and Export the Results

Step 1 - Getting Started with Deepgram Speech-to-Text Python SDK

Deepgram has a Python SDK, available on GitHub, that we can tap into. We'll also need an API key, which we can grab in Console. Console is a game-like hub in Deepgram for trying the different types of transcription in many programming languages, including Python. When you first sign up, you'll get $150 in API credits to try out Deepgram's speech AI capabilities.

Step 2 - Useful Speech-to-Text Features for Taking Voice Notes in Python

Our project, taking notes with voice in Python, will use the Deepgram speech-to-text transcription API and some of its more advanced capabilities to enhance our voice notes. Along with transcribing audio, we'll use the following features:

Diarization - Recognizes multiple people speaking and assigns a speaker to each word in the transcript.
Summarization - Summarizes sections of the transcript so that you can quickly scan it.

We’ll see in a few sections how to easily implement these features in our Python project.

Step 3 - Set Up Your Python Project

There are a few items we need to set up before we begin coding. I'm using Python 3.10 for this project, but any version from Python 3.7 up will work. Create a directory anywhere on your computer; let's call it voice-notes-with-python.

Then, open that same directory in a code editor like Visual Studio.

Next, create a virtual environment. This ensures our Python libraries get installed in this project and not system-wide. Make sure we're in the correct project directory, then run these commands from the terminal to create the Python virtual environment and activate it:

python3 -m venv venv
source venv/bin/activate

Finally, let’s create a Python file inside our directory called take_voice_notes.py.

Step 4 - Install Your Python Libraries and Packages using `pip`

Now we are ready to install Deepgram using pip. Make sure your virtual environment is activated and run the following command. The version pin keeps us on the 2.x SDK, whose Deepgram class the code in this post uses:

pip install "deepgram-sdk<3"

This lets us use the Deepgram speech-to-text Python SDK for transcription and tap into the features we mentioned earlier.

To verify that Deepgram was installed correctly, from the terminal type:

pip freeze

We should see deepgram-sdk listed with a 2.x version, installed and ready for use.

Step 5 - How to Transcribe the Audio File in Python with Voice

We'll use Deepgram's prerecorded transcription for this voice note-taking project. This type of transcription works on an audio file that is either stored locally on your drive or hosted online. In this tutorial, we'll transcribe a local file, but with this AI speech recognition provider it's very simple to do both. Let's see how to transcribe an audio file either from a local download or from an online URL.

Transcribe a Local Audio File with Python

from deepgram import Deepgram
import json

DEEPGRAM_API_KEY = 'YOUR_API_KEY_GOES_HERE'
PATH_TO_FILE = 'some/file.wav'

def main():
    # Initializes the Deepgram SDK
    deepgram = Deepgram(DEEPGRAM_API_KEY)

    # Open the audio file
    with open(PATH_TO_FILE, 'rb') as audio:
        # ...or replace mimetype as appropriate
        source = {'buffer': audio, 'mimetype': 'audio/wav'}
        response = deepgram.transcription.sync_prerecorded(source, {'punctuate': True})
        print(json.dumps(response, indent=4))

main()

Transcribe a Hosted Online Audio File with Python

from deepgram import Deepgram
import json

# The API key we grabbed in step 1
DEEPGRAM_API_KEY = 'YOUR_API_KEY_GOES_HERE'

# Hosted sample file
AUDIO_URL = "{YOUR_URL_TO_HOSTED_ONLINE_AUDIO_GOES_HERE}"

def main():
    # Initializes the Deepgram SDK
    dg_client = Deepgram(DEEPGRAM_API_KEY)
    source = {'url': AUDIO_URL}
    options = { "punctuate": True, "model": "general", "language": "en-US", "tier": "enhanced" }
    response = dg_client.transcription.sync_prerecorded(source, options)
    print(json.dumps(response, indent=4))

main()
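The two examples differ only in the source dictionary they pass to sync_prerecorded: {'buffer': ..., 'mimetype': ...} for a local file and {'url': ...} for a hosted one. As a sketch, a hypothetical helper could choose between them and guess the mimetype from the file extension with Python's mimetypes module. The guessed type may be spelled differently from the 'audio/wav' used above (audio/x-wav, for example), so treat it as an assumption and check it against the formats Deepgram accepts.

```python
import io
import mimetypes
from urllib.parse import urlparse


def build_source(location, audio_file=None):
    """Hypothetical helper returning a source dict for sync_prerecorded.

    URLs become {'url': ...}; local paths need the opened file object and
    become {'buffer': ..., 'mimetype': ...}, as in the two examples above.
    """
    if urlparse(location).scheme in ("http", "https"):
        return {"url": location}
    if audio_file is None:
        raise ValueError("open the local file in 'rb' mode and pass it as audio_file")
    mimetype, _ = mimetypes.guess_type(location)
    if mimetype is None:
        raise ValueError(f"could not guess a mimetype for {location!r}")
    return {"buffer": audio_file, "mimetype": mimetype}


if __name__ == "__main__":
    print(build_source("https://example.com/meeting.wav"))
    # An in-memory stand-in for open('some/file.mp3', 'rb')
    print(build_source("some/file.mp3", io.BytesIO(b"not real audio"))["mimetype"])
```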

Step 6 - Using Speech-to-Text Features to Enhance Notetaking with Voice in Python

Now that we have an idea of what our Python code looks like, let's see an example with the diarization and summarization features. In the same function as above, we just pass those features into the options dictionary as keys and set their values to True, like so:

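    with open(PATH_TO_FILE, 'rb') as audio:
        # ...or replace mimetype as appropriate
        source = {'buffer': audio, 'mimetype': 'audio/mp3'}
        # Turn on speaker diarization and summarization
        response = deepgram.transcription.sync_prerecorded(
            source, {'diarize': True, 'summarize': True}
        )
        # Print the JSON so it can be redirected into notes.txt
        print(json.dumps(response, indent=4))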

Final Step - Run the Python Voice Note-Taking Project and Export the Results

We’ve reached the final step! In this step, we need to run the Python project so we can see our JSON response with the transcript split into multiple speakers and summaries.

From our terminal type:

python3 take_voice_notes.py > notes.txt

This runs our project and writes its output to a file called notes.txt in our directory.

Open the file and we see a JSON response that looks like the following, depending on which audio file was transcribed:

"alternatives": [
    {
        "transcript": "Hello, and thank you for being in this meeting...",
        "confidence": 0.9916992,
        "words": [
            {
                "word": "hello",
                "start": 15.259043,
                "end": 15.338787,
                "confidence": 0.95751953,
                "speaker": 0,
                "speaker_confidence": 0.76544046,
                "punctuated_word": "Hello,"
            },
            {
                "word": "and",
                "start": 15.418532,
                "end": 15.617893,
                "confidence": 0.99853516,
                "speaker": 1,
                "speaker_confidence": 0.76544046,
                "punctuated_word": "and"
            },
            {
                "word": "thank",
                "start": 15.617893,
                "end": 15.777383,
                "confidence": 0.9975586,
                "speaker": 1,
                "speaker_confidence": 0.76544046,
                "punctuated_word": "thank"
            },
            {
                "word": "you",
                "start": 15.777383,
                "end": 15.9368725,
                "confidence": 0.9975586,
                "speaker": 0,
                "speaker_confidence": 0.76544046,
                "punctuated_word": "you"
            }
        ],
        "summaries": [
            {
                "summary": "Hello, and thank you for calling premier phone service. Please be aware that this call may be recorded for quality and training purposes. How may I help you today? I'm having some serious problem with my phone. Can you describe in detail for me? What kind of issues you're having with your device? Well, it isn't working.",
                "start_word": 0,
                "end_word": 649
            },
            {
                "summary": "My phone won't turn on. I don't know what's wrong. My dad said I should get a new phone, but I didn't listen to him. I also never backed up my photos on the cloud like I know I should.",
                "start_word": 649,
                "end_word": 1288
            }
        ]
    }
]

We received the transcript with a speaker assigned to each word, plus the summaries of the transcript at the end of the response.
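Raw JSON isn't very note-like. Here is a minimal sketch that turns the diarized words list into one line per speaker turn. It assumes the list comes from the alternatives entry shown above (results.channels[0].alternatives[0].words in the full response, which is an assumption about the outer path). The words_to_notes name is hypothetical.

```python
def words_to_notes(words):
    """Merge consecutive words from the same speaker into 'Speaker N: ...' lines."""
    lines = []
    current_speaker = None
    current_words = []
    for word in words:
        speaker = word.get("speaker")
        # Prefer the punctuated form when punctuation was requested.
        text = word.get("punctuated_word", word["word"])
        if current_words and speaker != current_speaker:
            lines.append(f"Speaker {current_speaker}: {' '.join(current_words)}")
            current_words = []
        current_speaker = speaker
        current_words.append(text)
    if current_words:
        lines.append(f"Speaker {current_speaker}: {' '.join(current_words)}")
    return lines


# The four words from the excerpt above (timing fields omitted).
SAMPLE_WORDS = [
    {"word": "hello", "speaker": 0, "punctuated_word": "Hello,"},
    {"word": "and", "speaker": 1, "punctuated_word": "and"},
    {"word": "thank", "speaker": 1, "punctuated_word": "thank"},
    {"word": "you", "speaker": 0, "punctuated_word": "you"},
]

if __name__ == "__main__":
    for line in words_to_notes(SAMPLE_WORDS):
        print(line)
```

Since notes.txt holds the JSON printed by the script, you could load it with json.load and pass the words list from it to words_to_notes.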

Conclusion of the Python Voice Note-taking Project with Speech Recognition

We've learned how to transcribe audio and take notes with our voice using Python and an AI speech-to-text provider.

There are many ways to extend this project with some of Deepgram's other features. Redaction, for example, hides sensitive information such as credit card numbers or Social Security numbers. Search looks through a transcript for terms and phrases. For a full list of features, see Deepgram's documentation.


Compliance Monitoring for Call Centers

Tonya Sims — Thu, 01 Dec 2022 17:20:59 +0000

Ensuring legal and policy compliance is a critical issue for the folks managing and leading a call center operation. In the following post, we'll dig into how Deepgram's speech AI platform can integrate into monitoring and compliance workflows.

Whenever an agent speaks with a customer, it can be helpful to get a call transcript in real time and detect whether the agent is complying with standards. For example, a common phrase that everyone has likely heard when calling customer service is "this call may be recorded for quality assurance purposes". Often, the customer service agent is legally required to inform the customer that the call is being recorded.
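To make the idea concrete, here is a rough local check for the disclosure phrase. It assumes you already have the agent's side of the transcript as a string. This is a plain normalized substring match, a swapped-in stand-in for illustration only. The approach in this post relies on Deepgram's search feature instead. The two accepted phrasings are the ones quoted in this post; disclosure_given and normalize are hypothetical names.

```python
import re

# Disclosure phrasings quoted in this post; extend the list for your own policy.
REQUIRED_PHRASES = [
    "this call may be recorded for quality assurance purposes",
    "this call may be recorded for quality and training purposes",
]


def normalize(text):
    """Lowercase and collapse punctuation and whitespace into single spaces."""
    return " ".join(re.sub(r"[^a-z0-9']+", " ", text.lower()).split())


def disclosure_given(transcript, phrases=REQUIRED_PHRASES):
    """Return True if any accepted disclosure phrase appears in the transcript."""
    cleaned = normalize(transcript)
    return any(normalize(phrase) in cleaned for phrase in phrases)


if __name__ == "__main__":
    print(disclosure_given("Hello! This call may be recorded, for quality and training purposes."))
    print(disclosure_given("How can I help you today?"))
```

An exact text match misses paraphrases and mis-transcribed words. That limitation is one reason to lean on a search feature that works on the audio and reports confidence scores.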

We’ll use Python and Deepgram's speech-to-text API to see how simple it is to receive a transcript with live streaming in real time. We’ll also tap into some features that will recognize each speaker in the conversation, quickly search through the transcript for a phrase and recognize words that the model hasn’t been trained on or hasn’t encountered frequently.

Before You Start with Compliance Monitoring in Python

In this post, I'm using Python 3.10, so if you want to follow along, make sure you have that version installed. You will also need a Deepgram API key, which you can grab from the Deepgram Console.

Next: Create a directory. I called mine monitor_compliance.

Then: Go to that directory and create a virtual environment inside it, so all of the Python libraries are installed there instead of globally on your computer. To create the virtual environment, run the following command inside your directory in the terminal: python3 -m venv venv. Then activate it with: source venv/bin/activate.

Installing Python Packages for Compliance Monitoring with Speech to Text

You’ll need to install some Python packages inside your virtual environment for the project to work properly. You can use Python’s pip command to install these packages. Make sure your virtual environment is active. Then, from your terminal, install the following:

pip install PyAudio
pip install "websockets<14"

You'll only need two Python libraries, PyAudio and websockets. The PyAudio library lets you capture sound from your computer's microphone. The websockets library handles the WebSocket connection we need for live streaming. It's pinned below version 14 because the code passes extra_headers, which newer releases of websockets replaced with additional_headers. Deepgram also has a Python SDK, but in this post we'll hit the API endpoint directly.

Python Code Dependencies and File Setup

Create an empty Python file called monitor.py and add the following import statements:

import pyaudio
import asyncio
import websockets
import os
import json
from pprint import pprint

Next, add your Deepgram API Key:

DEEPGRAM_API_KEY = 'REPLACE_WITH_YOUR_DEEPGRAM_API_KEY'

Define the Python Variables

Below the DEEPGRAM_API_KEY, you'll need to define some Python variables. The constants are PyAudio-related, and audio_queue is an asynchronous queue that we'll use throughout our code.

FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
CHUNK = 8000

audio_queue = asyncio.Queue()
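These constants also decide how much audio each callback delivers. As a quick check of the arithmetic, assuming paInt16 means 2 bytes per sample (which is what pyaudio.get_sample_size(pyaudio.paInt16) reports), each buffer holds CHUNK / RATE seconds of audio. The same values have to match the encoding=linear16, sample_rate=16000, and channels=1 query parameters sent to Deepgram. The helper functions below are hypothetical and exist only to show the math.

```python
# Audio settings from the tutorial: 16-bit mono PCM at 16 kHz.
RATE = 16000       # samples per second
CHUNK = 8000       # frames per PyAudio buffer
CHANNELS = 1
SAMPLE_WIDTH = 2   # assumption: bytes per sample for pyaudio.paInt16

def chunk_seconds(chunk=CHUNK, rate=RATE):
    """Seconds of audio captured in one buffer."""
    return chunk / rate

def chunk_bytes(chunk=CHUNK, channels=CHANNELS, sample_width=SAMPLE_WIDTH):
    """Size in bytes of one buffer handed to the callback."""
    return chunk * channels * sample_width

print(chunk_seconds())  # 0.5
print(chunk_bytes())    # 16000
```

So with these settings the callback fires about twice per second, each time with 16,000 bytes of raw audio. A smaller CHUNK sends audio to Deepgram more often, at the cost of more callbacks.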

The Python Callback Code for Compliance Monitoring with Speech to Text

PyAudio calls this callback each time a new buffer of microphone audio is ready. We pass it as the stream_callback argument when we open our PyAudio stream.

def callback(input_data, frame_count, time_info, status_flags):
  # Put an item into the queue without blocking.
   audio_queue.put_nowait(input_data)

   return (input_data, pyaudio.paContinue)
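PyAudio invokes stream_callback from its own audio thread rather than from the asyncio event loop, and the Python documentation notes that asyncio.Queue is not thread-safe. A more robust handoff is to schedule put_nowait on the loop with loop.call_soon_threadsafe. The sketch below simulates the PyAudio thread with a plain threading.Thread (fake_audio_thread is hypothetical) so it runs without a microphone; in the tutorial's code you would grab the loop with asyncio.get_running_loop() inside microphone() and have the callback use it.

```python
import asyncio
import threading

async def main():
    loop = asyncio.get_running_loop()
    audio_queue = asyncio.Queue()

    def callback(input_data, frame_count, time_info, status_flags):
        # Runs on a non-event-loop thread (as PyAudio's callback does),
        # so ask the loop to do the put instead of touching the queue here.
        loop.call_soon_threadsafe(audio_queue.put_nowait, input_data)
        return None  # a real PyAudio callback returns (None, pyaudio.paContinue)

    def fake_audio_thread():
        # Hypothetical stand-in for PyAudio delivering three chunks.
        for i in range(3):
            callback(bytes([i]) * 4, 4, None, 0)

    t = threading.Thread(target=fake_audio_thread)
    t.start()
    received = [await audio_queue.get() for _ in range(3)]
    t.join()
    return received

chunks = asyncio.run(main())
print(chunks)
```

call_soon_threadsafe also wakes the event loop, so a coroutine blocked in audio_queue.get() resumes as soon as audio arrives instead of waiting for the loop's next unrelated wakeup.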

Getting the Microphone Audio in Python

In this asynchronous function, we create our PyAudio object, open an input stream on the microphone, and keep the stream running while it’s active.

async def microphone(): 
   audio = pyaudio.PyAudio()
   stream = audio.open(
       format = FORMAT,
       channels = CHANNELS,
       rate = RATE,
       input = True,
       frames_per_buffer = CHUNK,
       stream_callback = callback
   )

   stream.start_stream()
   while stream.is_active():
       await asyncio.sleep(0.1)

   stream.stop_stream()
   stream.close()

Open the Websocket and Connect to Deepgram Real Time Speech to Text

This code authenticates with Deepgram using your API key and opens the WebSocket for real-time audio streaming. We pass several Deepgram features as query parameters in the API call:

punctuate - adds punctuation to the transcript.

diarize - detects each speaker in the transcript and gives them an ID.

search - searches the audio for the phrase "this call may be recorded for quality and training purposes".

keywords - boosts recognition of the participant's last name and other specialized terminology (here, Warrens; the number after the colon sets how strongly it is boosted).

async def process():
   extra_headers = {
       'Authorization': 'token ' + DEEPGRAM_API_KEY
   }

   async with websockets.connect('wss://api.deepgram.com/v1/listen?encoding=linear16&sample_rate=16000&channels=1' \
                                   '&punctuate=true' \
                                   '&diarize=true' \
                                   '&search=this+call+may+be+recorded+for+quality+and+training+purposes' \
                                   '&keywords=Warrens:2' \
                                   '&keyword_boost=standard',
                                   extra_headers = extra_headers) as ws:

       async def sender(ws): 
           try:
               while True:
                   data = await audio_queue.get() 
                   await ws.send(data)
           except Exception as e:
               print('Error while sending: ' + str(e))
               raise

       async def receiver(ws): # receives the transcript
           async for msg in ws:
               msg = json.loads(msg)
               pprint(msg)
               transcript = msg['channel']['alternatives'][0]['transcript']
               words = msg['channel']['alternatives'][0]['words']

               for speaker in words:
                   print(f"Speaker {speaker['speaker']}: {transcript} ")

                   break


       await asyncio.gather(sender(ws), receiver(ws))
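Assembling the query string by hand makes it easy to end up with a stray or doubled &. As an alternative sketch, the standard library's urllib.parse.urlencode can build the same URL from a dict; passing safe=':' keeps the keywords=Warrens:2 value readable instead of percent-encoding the colon. The parameter names and values are the ones already used in this post.

```python
from urllib.parse import urlencode

# Deepgram streaming options used in this post, one entry per query parameter.
params = {
    "encoding": "linear16",
    "sample_rate": 16000,
    "channels": 1,
    "punctuate": "true",
    "diarize": "true",
    "search": "this call may be recorded for quality and training purposes",
    "keywords": "Warrens:2",
    "keyword_boost": "standard",
}

# urlencode joins pairs with single '&' and turns spaces into '+'.
url = "wss://api.deepgram.com/v1/listen?" + urlencode(params, safe=":")
print(url)
```

You could then pass url to websockets.connect(url, extra_headers=extra_headers) in place of the concatenated string.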

Run the Python Code for Compliance Monitoring

Finally, we get to run the code for the project. Add the lines below to monitor.py, then run python3 monitor.py from your terminal:

async def run():
   await asyncio.gather(microphone(),process())

if __name__ == '__main__':
   asyncio.run(run())

Depending on the streaming audio used, you can expect to get a response like the following:

Diarization

Speaker 0: Hello. 
Speaker 0: Can you hear me? 
Speaker 0: Hello, and thank you for calling Premier phone service. 
Speaker 0: Be aware that this call may be recorded for quality and training purposes. My name is Beth and will be assisting you today. 
Speaker 0: How are you doing? 
Speaker 1: Not too bad. 
Speaker 1: How are you today? 
Speaker 0: I'm doing well. Thank you. May I please have your name? 
Speaker 1: My name is Blake Warren.
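In the receiver, the loop over words breaks after the first word, so each message's whole transcript is labeled with the speaker of its first word. If a single message ever spans a speaker change, a helper like the hypothetical speaker_turns below can group consecutive words by speaker ID instead. It assumes each word dict carries 'speaker' and 'word' keys and prefers 'punctuated_word' when present; check that against the pprint output of your own messages before relying on it.

```python
from itertools import groupby

def speaker_turns(words):
    """Group consecutive words by diarization speaker ID.

    Assumes each word dict has 'speaker' and 'word' keys, and uses
    'punctuated_word' instead of 'word' when it is present.
    """
    turns = []
    for speaker, group in groupby(words, key=lambda w: w["speaker"]):
        text = " ".join(w.get("punctuated_word", w["word"]) for w in group)
        turns.append((speaker, text))
    return turns

# Hypothetical words list shaped like msg['channel']['alternatives'][0]['words'].
sample_words = [
    {"word": "how", "speaker": 0},
    {"word": "are", "speaker": 0},
    {"word": "you", "speaker": 0},
    {"word": "not", "speaker": 1},
    {"word": "too", "speaker": 1},
    {"word": "bad", "speaker": 1},
]

for speaker, text in speaker_turns(sample_words):
    print(f"Speaker {speaker}: {text}")
```

Inside receiver(), you would call speaker_turns(words) in place of the for/break loop.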

Search

 'search': [{'hits': [{'confidence': 0.8900703,
                                   'end': 15.27,
                                   'snippet': 'this call may be recorded for '
                                              'quality and training purposes '
                                              'my name is',
                                   'start': 11.962303},
                                  {'confidence': 0.3164375,
                                   'end': 17.060001,
                                   'snippet': 'and training purposes my name '
                                              'is beth and i will be assisting '
                                              'you today',
                                   'start': 13.546514}],
                         'query': 'this call may be recorded for quality and '
                                  'training purposes'}]},
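The second hit above has a much lower confidence than the first and only partially overlaps the phrase. A small sketch for keeping only strong matches, assuming you pass in the dict that holds the 'search' key (in streaming messages that is typically msg['channel']); the helper name and the 0.5 threshold are illustrative choices, and the sample data is copied from the output above.

```python
def confident_hits(channel, min_confidence=0.5):
    """Return (query, start, end, snippet) for search hits at or above a threshold."""
    results = []
    for search in channel.get("search", []):
        for hit in search["hits"]:
            if hit["confidence"] >= min_confidence:
                results.append((search["query"], hit["start"], hit["end"], hit["snippet"]))
    return results

# Sample taken from the search output shown above.
channel = {
    "search": [
        {
            "query": "this call may be recorded for quality and training purposes",
            "hits": [
                {"confidence": 0.8900703, "start": 11.962303, "end": 15.27,
                 "snippet": "this call may be recorded for quality and training purposes my name is"},
                {"confidence": 0.3164375, "start": 13.546514, "end": 17.060001,
                 "snippet": "and training purposes my name is beth and i will be assisting you today"},
            ],
        }
    ]
}

for query, start, end, snippet in confident_hits(channel):
    print(f"Found '{query}' between {start:.1f}s and {end:.1f}s")
```

For compliance checks, an empty result over a whole call is the interesting case: it suggests the agent never read the disclosure.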

Extending the Compliance Monitoring Project with Speech to Text

Hopefully, you had fun working on this project. Monitoring compliance in call centers with Python and Deepgram can be simple and straightforward. You can extend the project further by using some of Deepgram’s other features for streaming.

The final code for this project is as follows:

import pyaudio
import asyncio
import websockets
import os
import json
from pprint import pprint

DEEPGRAM_API_KEY = "YOUR_DEEPGRAM_API_KEY"

FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
CHUNK = 8000

audio_queue = asyncio.Queue()

def callback(input_data, frame_count, time_info, status_flags):
   audio_queue.put_nowait(input_data) 
   return (input_data, pyaudio.paContinue)


async def microphone(): 
   audio = pyaudio.PyAudio()
   stream = audio.open(
       format = FORMAT,
       channels = CHANNELS,
       rate = RATE,
       input = True,
       frames_per_buffer = CHUNK,
       stream_callback = callback
   )

   stream.start_stream()

   while stream.is_active():
       await asyncio.sleep(0.1)


   stream.stop_stream()
   stream.close()

async def process():
   extra_headers = {
       'Authorization': 'token ' + DEEPGRAM_API_KEY
   }

   async with websockets.connect('wss://api.deepgram.com/v1/listen?encoding=linear16&sample_rate=16000&channels=1' \
                                   '&punctuate=true' \
                                   '&diarize=true' \
                                   '&search=this+call+may+be+recorded+for+quality+and+training+purposes' \
                                   '&keywords=Warrens:2' \
                                   '&keyword_boost=standard',
                                   extra_headers = extra_headers) as ws:

       async def sender(ws): 
           try:
               while True:
                   data = await audio_queue.get() 
                   await ws.send(data)
           except Exception as e:
               print('Error while sending: ' + str(e))
               raise

       async def receiver(ws): 
           async for msg in ws:
               msg = json.loads(msg)
               pprint(msg)
               transcript = msg['channel']['alternatives'][0]['transcript']
               words = msg['channel']['alternatives'][0]['words']

               for speaker in words:
                   print(f"Speaker {speaker['speaker']}: {transcript} ")

                   break


       await asyncio.gather(sender(ws), receiver(ws))



async def run():
   await asyncio.gather(microphone(),process())

if __name__ == '__main__':
   asyncio.run(run())

If you have any feedback about this post, or anything else around Deepgram, we'd love to hear from you. Please let us know in our GitHub discussions.

How to Loop Through a Podcast Episode List using Async IO with Python

Tonya Sims — Fri, 18 Nov 2022 22:17:03 +0000

After reading this brief tutorial, you’ll have a better understanding of how to loop through a podcast episode list with Python and transcribe each episode using Async IO and a speech-to-text provider. To see the full code sample, scroll down to the bottom of this post. Otherwise, let’s walk step-by-step through what you’ll accomplish.

Working with the asyncio library in Python can be tricky, but with some guidance, it’s less painful than one would imagine. With the help of Deepgram’s speech-to-text Python SDK, we can loop through all the podcast episode files and transcribe them using our prerecorded transcription.

Before doing any AI speech recognition, you might wonder why we need Asynchronous IO and those pesky async/await Python keywords.

In the next section, we’ll discover the use of Python’s Async IO and the difference between asynchronous and synchronous code.

High-Level Overview of Asynchronous and Synchronous Code in Python

Synchronous and asynchronous execution are two different programming concepts that are important to understand, especially when it comes to Async IO. Whether code is synchronous or asynchronous depends on how and when its tasks are executed. To understand each approach, let’s look at them at a high level.

We can think of synchronous programming as running discrete tasks sequentially: step-by-step, one after another, with no overlap between them. Imagine we are baking a cake and following the recipe instructions. The following steps would be executed in order, without skipping a step or jumping ahead:

Pre-heat the oven to 350 degrees
Mix flour, baking powder and salt in a large bowl
Beat butter and sugar in a bowl
Add in the eggs
Add in the vanilla extract
Mix all the ingredients
Pour cake batter into a sheet pan or spring mold
Bake the cake for 30 minutes until golden brown

With asynchronous programming, we can imagine we’re multitasking or doing more than one task at the same time, instead of doing things sequentially.

Following the same example above, here's what asynchronous cake baking could look like, stepwise:

Pre-heat the oven to 350 degrees
While the oven pre-heats, mix flour, baking powder, and salt in a large bowl AND Beat butter and sugar in a bowl
Add in the eggs AND Add in the vanilla extract
Mix all the ingredients
Pour cake batter into a sheet pan or spring mold
Bake the cake for 30 minutes until golden brown

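As we can see, in steps 2 and 3 you are doing multiple tasks at once. You may have heard the term “concurrency” in programming. Concurrency is the basis for asynchronous programming in Python: tasks run in an overlapping manner, so one task can make progress while another is waiting. Note that asyncio runs its tasks on a single thread, so this is concurrency rather than true parallelism; the overlap comes from switching to another task whenever the current one is waiting, much like mixing ingredients while the oven pre-heats.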

You probably also noticed that there are fewer steps in the asynchronous recipe than in the synchronous one. Because tasks can overlap while they wait, asynchronous code can often finish I/O-bound work, such as network requests to a transcription API, faster than its synchronous counterpart.

This is where Async IO splashes into the picture. The asyncio Python library lets us write this kind of concurrent code using the async/await syntax.
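To see the difference in code, here is a small sketch that uses asyncio.sleep as a stand-in for waiting on I/O (like waiting for the oven, or for a transcription API to respond). Awaiting two tasks one after another takes roughly the sum of their waits, while asyncio.gather runs them concurrently on the same thread, so the total is roughly the longest single wait. The task names and timings are made up for illustration.

```python
import asyncio
import time

async def task(name, seconds):
    # asyncio.sleep stands in for waiting on I/O; other tasks can run meanwhile.
    await asyncio.sleep(seconds)
    return name

async def one_after_another():
    start = time.perf_counter()
    results = [await task("mix dry", 0.2), await task("beat butter", 0.2)]
    return results, time.perf_counter() - start

async def at_the_same_time():
    start = time.perf_counter()
    # gather schedules both coroutines at once and returns results in order.
    results = await asyncio.gather(task("mix dry", 0.2), task("beat butter", 0.2))
    return results, time.perf_counter() - start

seq_results, seq_elapsed = asyncio.run(one_after_another())
con_results, con_elapsed = asyncio.run(at_the_same_time())
print(f"sequential: {seq_elapsed:.2f}s, concurrent: {con_elapsed:.2f}s")
```

Expect roughly 0.4 seconds for the sequential version and roughly 0.2 seconds for the concurrent one; exact numbers vary by machine.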

In the next section, let's dive into the code for looping through a podcast episode list using the asyncio library with Python. You’ll see how to transcribe each of the episodes using a speech-to-text AI provider and have a clearer understanding of the async/await Python keywords.

Transcribing Podcast Audio with Python and Speech-to-Text Using AI

Here's how to use Deepgram to transcribe our prerecorded audio files. Deepgram is a speech recognition provider that can transcribe audio from real-time streaming sources or by batch-processing one or more pre-recorded files. Podcasts are generally distributed as pre-recorded audio files, so that's how we're going to proceed.

First, we’ll need to grab a Deepgram API key here to use our Python SDK. It’s quick to sign up and create one; you can log in with Google, GitHub, or your email.

Once we have our API key let’s open up one of our favorite code editors. That could be something like Visual Studio Code, PyCharm, or something else.

Next, make a directory called transcribe-audio-files. We’ll transcribe sports speeches from a podcast, so create a Python file inside it called transcribe_speeches.py.

Let’s also create a folder inside the project called speeches, which is where we’ll put our audio MP3 files. (Note: MP3s are the traditional audio format for podcasts. Deepgram works with over 100 different audio codecs and file formats.)

It’s also recommended that we create a virtual environment with the project so our Python dependencies are installed just for that environment, rather than globally. (Don't worry though: this is more of a "best practice" than a requirement. You do you.)

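We’ll need to install the Deepgram speech-to-text Python package with pip. The code in this post uses the SDK’s 2.x interface (from deepgram import Deepgram), so if a newer major version gets installed, you may need to pin the version:

pip install "deepgram-sdk<3"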

Let’s take a look at the code now.

The Python Code with Async IO Keywords (async/await)

Put the following code in your transcribe_speeches.py file:

from deepgram import Deepgram
import asyncio, json
import os

DEEPGRAM_API_KEY="YOUR_DEEPGRAM_API_KEY"

async def get_audio_files():
   path_of_the_speeches = 'speeches'
   for filename in os.listdir(path_of_the_speeches):
       audio_file = os.path.join(path_of_the_speeches,filename)
       if os.path.isfile(audio_file):
           await main(audio_file)




async def main(file):
   print(f"Speech Name: {file}")
   # Initializes the Deepgram SDK
   deepgram = Deepgram(DEEPGRAM_API_KEY)

   # Open the audio file
   with open(file, 'rb') as audio:
       # ...or replace mimetype as appropriate
       source = {'buffer': audio, 'mimetype': 'audio/mp3'}
       response = await deepgram.transcription.prerecorded(source, {'punctuate': True})
       print(json.dumps(response, indent=4))

asyncio.run(get_audio_files())

The Python Code and Explanation with Async IO Keywords (async/await)

Let’s walk through the code step-by-step to understand what’s happening.

Here we import Deepgram so we can use its Python SDK. We also import asyncio to tap into Async IO, json because later in the code we’ll convert a Python object into JSON using json.dumps, and os to list the files in the speeches folder.

from deepgram import Deepgram
import asyncio, json
import os

Next, take the Deepgram API key we created earlier and replace the placeholder text, YOUR_DEEPGRAM_API_KEY, with your key.

DEEPGRAM_API_KEY="YOUR_DEEPGRAM_API_KEY"

For example, if your API key is abcdefg1234, then your code should look like this:

DEEPGRAM_API_KEY="abcdefg1234"

In the below Python code snippet, we are just looping through the audio files in the speeches folder and passing them to the main function so they can be transcribed. Notice the use of the async/await keywords here.

async def get_audio_files():
   path_of_the_speeches = 'speeches'
   for filename in os.listdir(path_of_the_speeches):
       audio_file = os.path.join(path_of_the_speeches,filename)
       if os.path.isfile(audio_file):
           await main(audio_file)


To make a function asynchronous in Python we need to add async in the function definition. So instead of def get_audio_files() which is synchronous, we use async def get_audio_files().

Inside an async function, we use await when calling another coroutine, such as our async main function. In the line await main(audio_file), we call main and pass in the audio file. The await pauses get_audio_files until main finishes; while it is paused, the event loop is free to run other tasks. Note that await can only be used inside a function defined with async def.

async def main(file):
   print(f"Speech Name: {file}")

   # Initializes the Deepgram SDK
   deepgram = Deepgram(DEEPGRAM_API_KEY)

   # Open the audio file
   with open(file, 'rb') as audio:
       # ...or replace mimetype as appropriate
       source = {'buffer': audio, 'mimetype': 'audio/mp3'}
       response = await deepgram.transcription.prerecorded(source, {'punctuate': True})
       print(json.dumps(response, indent=4))

asyncio.run(get_audio_files())

Now to the speech-to-text Python transcription. We initialize Deepgram and pass in the API key.

Then we open each file in binary mode and read in its bytes on this line: with open(file, 'rb') as audio. We create a Python dictionary called source that holds the open file as the buffer and sets the mimetype to audio/mp3.

Next, we do the actual prerecorded transcription on this line: response = await deepgram.transcription.prerecorded(source, {'punctuate': True}). We pass the source and the {'punctuate': True} option, which adds punctuation to the transcript.

Then we print the full response, which contains the transcript, as indented JSON: print(json.dumps(response, indent=4)).
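If you only want the text rather than the full JSON, you can index into the response. Assuming the SDK returns a plain dict (the json.dumps call implies it does), the transcript sits under results, then channels, then alternatives; the IVR post further down reads the same path. The helper name and the sample response below are hypothetical.

```python
def top_transcript(response):
    """Return the first alternative's transcript for the first channel."""
    channels = response["results"]["channels"]
    return channels[0]["alternatives"][0]["transcript"]

# Hypothetical response trimmed to the fields this helper reads.
sample_response = {
    "results": {
        "channels": [
            {"alternatives": [{"transcript": "Welcome back to the show.", "confidence": 0.97}]}
        ]
    }
}

print(top_transcript(sample_response))
```

In main, print(top_transcript(response)) would then replace the json.dumps call.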

Lastly, we run our program using asyncio.run(get_audio_files()).
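One thing to notice: the for loop in get_audio_files awaits main for each file before moving on, so the episodes are still transcribed one at a time. To overlap the requests, you could create one coroutine per file and pass them all to asyncio.gather. The sketch below does that with a hypothetical fake_transcribe coroutine standing in for main (so it runs without an API key) and an asyncio.Semaphore to cap how many requests are in flight; both the stand-in and the limit of 3 are assumptions, not part of the original code.

```python
import asyncio
import os
import tempfile

async def fake_transcribe(path):
    # Hypothetical stand-in for main(file); swap in the Deepgram call.
    await asyncio.sleep(0.1)
    return os.path.basename(path)

async def transcribe_all(folder, limit=3):
    # Allow at most `limit` transcriptions to run at the same time.
    semaphore = asyncio.Semaphore(limit)

    async def worker(path):
        async with semaphore:
            return await fake_transcribe(path)

    paths = [
        os.path.join(folder, name)
        for name in sorted(os.listdir(folder))
        if os.path.isfile(os.path.join(folder, name))
    ]
    # gather runs the workers concurrently; an empty folder just yields [].
    return await asyncio.gather(*(worker(p) for p in paths))

# Demo with two empty placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    for name in ("episode1.mp3", "episode2.mp3"):
        open(os.path.join(folder, name), "wb").close()
    results = asyncio.run(transcribe_all(folder))

print(results)
```

asyncio.gather returns results in the same order as its inputs, even if they finish in a different order. To use this with Deepgram, main would need to return the response instead of only printing it.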

Conclusion

Hopefully, you now have a better understanding of transcribing audio with speech-to-text and looping through a podcast episode list using Async IO with Python. If you have any questions or need some help, please feel free to reach out to us on our GitHub Discussions page.

Full Python Code Sample of Looping Through a Podcast Episode List

from deepgram import Deepgram
import asyncio, json
import os

DEEPGRAM_API_KEY="YOUR_DEEPGRAM_API_KEY"

async def get_audio_files():
   path_of_the_speeches = 'speeches'
   for filename in os.listdir(path_of_the_speeches):
       audio_file = os.path.join(path_of_the_speeches,filename)
       if os.path.isfile(audio_file):
           await main(audio_file)



async def main(file):
   print(f"Speech Name: {file}")
   # Initializes the Deepgram SDK

   deepgram = Deepgram(DEEPGRAM_API_KEY)

   # Open the audio file
   with open(file, 'rb') as audio:
       # ...or replace mimetype as appropriate
       source = {'buffer': audio, 'mimetype': 'audio/mp3'}
       response = await deepgram.transcription.prerecorded(source, {'punctuate': True})

       print(json.dumps(response, indent=4))

asyncio.run(get_audio_files())

How to Transcribe Only What You Need with Python: Listening Before Connected

Tonya Sims — Tue, 01 Nov 2022 22:20:24 +0000

Imagine a fast food restaurant taking orders in real-time using a speech-to-text API. The challenge is the customer will start speaking and sending audio data before the WebSocket connection opens. We need a way to capture that audio along with transcribing whatever the customers say after the WebSocket has been opened until they are finished speaking their order.

One solution is to use a buffer, or queue, to store the audio data until the WebSocket is connected. In Python, a plain list can serve as a simple buffer; the code below uses an asyncio.Queue, which also lets the sender wait for new audio. We add the audio data, as bytes, to the queue before the WebSocket connection is made and keep using the same queue during the speech-to-text transcription after the connection is made.
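To see why the queue solves the speaking-before-connected problem, here is a small sketch with no microphone or network: a hypothetical microphone() coroutine starts queuing chunks right away, while a sender() that simulates a slow WebSocket handshake with asyncio.sleep only starts draining the queue afterwards. Because asyncio.Queue is first-in, first-out, the early chunks are sent first and nothing is lost.

```python
import asyncio

async def main():
    audio_queue = asyncio.Queue()
    sent = []

    async def microphone():
        # Hypothetical producer: five chunks, 10 ms apart, starting immediately.
        for i in range(5):
            audio_queue.put_nowait(f"chunk-{i}".encode())
            await asyncio.sleep(0.01)

    async def sender():
        # Simulated handshake delay: nothing is sent for 30 ms,
        # but chunks captured in the meantime wait in the queue.
        await asyncio.sleep(0.03)
        print(f"{audio_queue.qsize()} chunks were waiting when the connection opened")
        for _ in range(5):
            sent.append(await audio_queue.get())

    await asyncio.gather(microphone(), sender())
    return sent

sent = asyncio.run(main())
print(sent)
```

An unbounded queue grows for as long as the connection is down. asyncio.Queue(maxsize=...) caps it, but put_nowait then raises asyncio.QueueFull when the queue is full, so you would need to decide whether to drop old or new audio.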

In the next section, we will see how to implement this solution using Python and the Deepgram speech-to-text API.

Using a Buffer in Python to Store Audio Data for Speech-to-Text Transcription

To run this code you’ll need a few things.

Grab a Deepgram API key from Deepgram
Install the following packages using pip:

pip install PyAudio
pip install websockets

The following is the solution implemented in Python with a quick explanation of the code:

import pyaudio
import asyncio
import websockets
import os
import json

DEEPGRAM_API_KEY = "YOUR_DEEPGRAM_API_KEY"

FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
CHUNK = 8000

audio_queue = asyncio.Queue()

def callback(input_data, frame_count, time_info, status_flags):
   audio_queue.put_nowait(input_data)

   return (input_data, pyaudio.paContinue)


async def microphone(): 
   audio = pyaudio.PyAudio()
   stream = audio.open(
       format = FORMAT,
       channels = CHANNELS,
       rate = RATE,
       input = True,
       frames_per_buffer = CHUNK,
       stream_callback = callback
   )

   stream.start_stream()

   while stream.is_active():
       await asyncio.sleep(0.1)


   stream.stop_stream()
   stream.close()

async def process():
   extra_headers = {
       'Authorization': 'token ' + DEEPGRAM_API_KEY
   }

   async with websockets.connect('wss://api.deepgram.com/v1/listen?encoding=linear16&sample_rate=16000&channels=1', extra_headers = extra_headers) as ws:
       async def sender(ws): # sends audio to websocket
           try:
               while True:
                   data = await audio_queue.get()
                   await ws.send(data)
           except Exception as e:
               print('Error while sending: ' + str(e))
               raise

       async def receiver(ws): 
           async for msg in ws:
               msg = json.loads(msg)
               transcript = msg['channel']['alternatives'][0]['transcript']

               if transcript:
                   print(f'Transcript = {transcript}')

       await asyncio.gather(sender(ws), receiver(ws))



async def run():
   await asyncio.gather(microphone(),process())

if __name__ == '__main__':
   asyncio.run(run())

Python Code Explanation for Using a Buffer with Speech-to-Text Transcription

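When we pass PyAudio a stream_callback, it runs in callback (non-blocking) mode and calls our function each time a new chunk of audio is ready. This callback puts each chunk into the queue without blocking, so audio keeps accumulating there even before the WebSocket connection opens.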

def callback(input_data, frame_count, time_info, status_flags):
   audio_queue.put_nowait(input_data)

   return (input_data, pyaudio.paContinue)

We define a microphone() function, create a PyAudio stream, and pass in our callback as stream_callback. We then start the stream and keep the coroutine alive, sleeping briefly in a loop, while the stream is active.

async def microphone(): 
   audio = pyaudio.PyAudio()
   stream = audio.open(
       format = FORMAT,
       channels = CHANNELS,
       rate = RATE,
       input = True,
       frames_per_buffer = CHUNK,
       stream_callback = callback
   )

   stream.start_stream()

   while stream.is_active():
       await asyncio.sleep(0.1)


   stream.stop_stream()
   stream.close()

Next, we define an outer function called process() that sends our Deepgram API key in the Authorization header. We use async with websockets.connect(...) as an asynchronous context manager to connect to the Deepgram WebSocket server; the connection closes automatically when the block exits.

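The sender() function sends audio to the WebSocket. The call audio_queue.get() removes and returns an item from the queue; if the queue is empty, it waits until an item is available. Any audio captured before the connection opened is already waiting in the queue, so it is sent first.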

The receiver() function receives each message, parses the JSON response, and prints any non-empty transcript to the console.

Lastly, we run the program with asyncio.run(run()) inside the if __name__ == '__main__': block.

async def process():
   extra_headers = {
       'Authorization': 'token ' + DEEPGRAM_API_KEY
   }

   async with websockets.connect('wss://api.deepgram.com/v1/listen?encoding=linear16&sample_rate=16000&channels=1', extra_headers = extra_headers) as ws:
       async def sender(ws):
           try:
               while True:
                   data = await audio_queue.get()
                   await ws.send(data)
           except Exception as e:
               print('Error while sending: ' + str(e))
               raise

       async def receiver(ws): # receives the transcript
           async for msg in ws:
               msg = json.loads(msg)
               transcript = msg['channel']['alternatives'][0]['transcript']

               if transcript:
                   print(f'Transcript = {transcript}')

       await asyncio.gather(sender(ws), receiver(ws))



async def run():
   await asyncio.gather(microphone(),process())

if __name__ == '__main__':
   asyncio.run(run())

Conclusion

We hope you enjoyed this short project. If you need help with the tutorial or running the code please don’t hesitate to reach out to us. The best place to start is in our GitHub Discussions.

Identifying the Best Agent to Respond in Your IVR System

Tonya Sims — Wed, 28 Sep 2022 15:45:03 +0000

What would you say if I told you that you could detect spoken conversational language using AI in a speech-to-text transcript with Python?

Would you spit your beer out?

OK, maybe your water, but the point is that I built a cool conversational AI project: an Interactive Voice Response (IVR) system using Twilio, a speech recognition provider, and Python. The best part is that it was reasonably easy to build using Flask 2.0. The purpose was to identify the best virtual customer support agent to respond to a call.

I would love to walk you through the project, but if you want to skip ahead to the code, scroll to the bottom of this blog post.

Create Voice Recognition Phone IVR With Speech Recognition Using Twilio and Python

This project was my first attempt at building an IVR with AI in Python, so I researched how these interactive voice response systems work. Simply put, you can think of them as a tree with many branches. They allow you to interact with a system, like an automated phone customer support agent, before being connected or transferred to a representative.

For example, you may be prompted to press “2” on your phone to connect to a department and then “1” to speak to a live customer support agent. I’m sure we’ve all been in that situation.

I chose Twilio for building the IVR because of its easy-to-navigate dashboard and simplicity. Also, since I’m using Python, Twilio has tons of tutorials on implementing IVR systems, like the Flask one I’m using for this tutorial.

I also needed a speech-to-text API and leveraged Deepgram. We have a Python SDK I tapped into that made it super quick and easy to get up and running with the voice recognition transcription.

Deepgram also offers language detection for prerecorded audio, which can detect over 30 supported languages such as Hindi, Spanish, and Ukrainian, to name a few.

Let’s get to the meat of the project: the code.

Code Breakdown for Creating IVR Speech-to-Text With Language Detection Using Python

Imagine you had to build a Python application that detects different spoken languages. It needs to route customers’ phone calls, through an IVR system, to the virtual customer agent who speaks their language.

The following Python code breakdown demonstrates how to do so. There are just a few things I had to set up before the coding started. It’s painless, I promise.

Grab a Deepgram API Key. I needed this to tap into the speech-to-text Python SDK.
Create a Twilio account and voice phone number here. This allowed me to make an outgoing call and navigate the IVR with dial prompts.
Install ngrok to test my webhooks locally.

Next, I made a new directory to hold all my Python files and activated a virtual environment to pip install all of my Python packages.

These are the packages I installed:

pip install Flask
pip install 'flask[async]'
pip install Twilio
pip install deepgram-sdk
pip install python-dotenv

After creating my directory, I downloaded three audio files with different spoken languages from this website and added them to my project in a folder called languages.

I created a file called views.py that contains most of my Flask 2.0 Python code. You’ll see the entirety of this code at the bottom of this post, but I’ll walk through the most critical parts of it.

This code is where the Deepgram Python speech-to-text transcription magic happens. I’m transcribing the audio MP3 file and returning the transcript and detected language. The API detects the spoken language and returns a language code, such as es for Spanish.

async def deepgram_transcribe(PATH_TO_FILE):

  # Initializes the Deepgram SDK
   deepgram = Deepgram(os.getenv("DEEPGRAM_API_KEY"))

  # Open the audio file
   with open(PATH_TO_FILE, 'rb') as audio:

      # ...or replace mimetype as appropriate
       source = {'buffer': audio, 'mimetype': 'audio/mp3'}
       response = await deepgram.transcription.prerecorded(source, {"detect_language": True})


       if 'transcript' in response['results']['channels'][0]['alternatives'][0]:
           transcript = response['results']['channels'][0]['alternatives'][0]['transcript']


       if 'detected_language' in response['results']['channels'][0]:
           detected_language = response['results']['channels'][0]['detected_language']

  
   return transcript, detected_language

At the top of the file, I created a Python dictionary that acts as a lookup. This dictionary contains the language code as a key and the name of the customer support agent that speaks that language as the value.

customer_service_reps = {
                           "fr": "Sally",
                           "es": "Pete",
                           "de": "Ann"
                       }

I created a POST route that prompts the caller to press 1, 2, or 3, each for a different language. For example, if a customer presses 2 when they call in, they’ll get routed to the agent who speaks French.

Whichever option is selected will invoke a private function, as noted in the menu function. When option 2 is pressed, the function _french_recording is called.

@app.route('/ivr/welcome', methods=['POST'])
def welcome():
   response = VoiceResponse()
   with response.gather(
       num_digits=1, action=url_for('menu'), method="POST"
   ) as g:

       g.say(message="Thanks for calling the Deepgram Speech-to-Text Python SDK. " +
             "Please press 1 for Spanish. " +
             "Press 2 for French. " +
             "Press 3 for German.", loop=3)


   return twiml(response)


@app.route('/ivr/menu', methods=['POST'])
async def menu():
   selected_option = request.form['Digits']
   option_actions = {'1': _spanish_recording,
                     '2': _french_recording,
                     '3': _german_recording}


   if selected_option in option_actions:
       response = VoiceResponse()
       await option_actions[selected_option](response)

       return twiml(response)


   return _redirect_welcome()

I created a private function for each spoken language, and when they’re selected, that method will get called, and a phone response will say the message. For French, the automated IVR response will be ”This is the French response and Sally will help you.”

async def _spanish_recording(response):
   recording = "languages/spanish-recording.mp3"
   spanish_transcript = await deepgram_transcribe(recording)

   representative = customer_service_reps[spanish_transcript[1]]


   response.say(f"This is the Spanish response and {representative} will help you.",
                voice="alice", language="en-US")

   response.hangup()

   return response



async def _french_recording(response):

   recording = "languages/french-recording.mp3"

   french_transcript = await deepgram_transcribe(recording)



   representative = customer_service_reps[french_transcript[1]]

   response.say(f"This is the French response and {representative} will help you.",
                voice="alice", language="en-US")


   response.hangup()

   return response


async def _german_recording(response):
   recording = "languages/german-recording.mp3"

   german_transcript = await deepgram_transcribe(recording)



   representative = customer_service_reps[german_transcript[1]]



   response.say(f"This is the German response and {representative} will help you.",
                voice="alice", language="en-US")


   response.hangup()

   return response
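Each private function looks up the agent with customer_service_reps[...], which raises a KeyError if Deepgram detects a language code that isn't in the dictionary. A small variation, sketched below with a hypothetical pick_representative helper and fallback text, uses dict.get to return a default instead, so the IVR can still say something sensible.

```python
customer_service_reps = {"fr": "Sally", "es": "Pete", "de": "Ann"}

def pick_representative(detected_language, default="a support agent"):
    """Look up the agent for a detected language code, falling back to a default."""
    return customer_service_reps.get(detected_language, default)

print(pick_representative("fr"))  # Sally
print(pick_representative("it"))  # a support agent
```

Inside _french_recording, for example, the lookup would become representative = pick_representative(french_transcript[1]).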

I also created a templates folder in the main Python Flask project directory with a blank index.html file. We don’t need anything in this file but feel free to add any HTML or Jinja.

To run the application, I fired up two terminals simultaneously in Visual Studio Code, one to run my Flask application and another for ngrok. Both are important, and you’ll need the ngrok URL to add to your Twilio dashboard.

To run the Flask application, I used this command from the terminal:

FLASK_APP=views.py FLASK_DEBUG=1 flask run

Setting FLASK_DEBUG=1 runs the application in debug mode, so the server reloads automatically when I change my code and I don’t have to keep stopping and restarting it.

In the other terminal window, I ran this command:

ngrok http 5000

Make sure to grab the ngrok URL, which is different from the one in the Flask terminal. It looks something like this: https://3afb-104-6-9-133.ngrok.io.

In the Twilio dashboard, click on Manage -> Active Numbers, then click on the purchased number. In the voice webhook, enter the following endpoint: https://3afb-104-6-9-133.ngrok.io/ivr/welcome, which is the unique ngrok URL followed by the Flask route in the Python application, /ivr/welcome.

Now, dial the Twilio number and follow the prompts, and you’ll get routed to the best customer service agent to handle your call based on speech-to-text language detection!

Conclusion

If you followed this tutorial or built your own project using Python with Deepgram’s language detection, please let me know! Hop over to our Deepgram GitHub Discussions and send us a message.

The Python Flask Code for the IVR Speech-To-Text Application

My project structure:

views.py

from deepgram import Deepgram
from flask import (
   Flask,
   render_template,
   request,
   url_for,
)

from twilio.twiml.voice_response import VoiceResponse
from view_helpers import twiml
from dotenv import load_dotenv
import asyncio, json, os

load_dotenv()  # load DEEPGRAM_API_KEY from the .env file

app = Flask(__name__)



# Maps Deepgram's detected language codes to customer service representatives
customer_service_reps = {
   "fr": "Sally",
   "es": "Pete",
   "de": "Ann",
}



async def deepgram_transcribe(PATH_TO_FILE):
   # Initializes the Deepgram SDK
   deepgram = Deepgram(os.getenv("DEEPGRAM_API_KEY"))

   # Defaults in case the response lacks either field
   transcript, detected_language = "", None

   # Open the audio file
   with open(PATH_TO_FILE, 'rb') as audio:
       # ...or replace mimetype as appropriate
       source = {'buffer': audio, 'mimetype': 'audio/mp3'}
       response = await deepgram.transcription.prerecorded(source, {"detect_language": True})

   channel = response['results']['channels'][0]

   if 'transcript' in channel['alternatives'][0]:
       transcript = channel['alternatives'][0]['transcript']

   if 'detected_language' in channel:
       detected_language = channel['detected_language']

   return transcript, detected_language



@app.route('/')
@app.route('/ivr')
def home():
   return render_template('index.html')


@app.route('/ivr/welcome', methods=['POST'])
def welcome():
   response = VoiceResponse()
   with response.gather(
       num_digits=1, action=url_for('menu'), method="POST"
   ) as g:
       g.say(message="Thanks for calling the Deepgram Speech-to-Text Python SDK. " +
             "Please press 1 for Spanish. " +
             "Press 2 for French. " +
             "Press 3 for German.", loop=3)


   return twiml(response)


@app.route('/ivr/menu', methods=['POST'])
async def menu():
   selected_option = request.form['Digits']
   option_actions = {'1': _spanish_recording,
                     '2': _french_recording,
                     '3': _german_recording}


   if selected_option in option_actions:
       response = VoiceResponse()
       await option_actions[selected_option](response)

       return twiml(response)


   return _redirect_welcome()



async def _spanish_recording(response):
   recording = "languages/spanish-recording.mp3"
   spanish_transcript = await deepgram_transcribe(recording)

   representative = customer_service_reps[spanish_transcript[1]]

   response.say(f"This is the Spanish response and {representative} will help you.",
                voice="alice", language="en-US")

   response.hangup()

   return response


async def _french_recording(response):
   recording = "languages/french-recording.mp3"
   french_transcript = await deepgram_transcribe(recording)

   representative = customer_service_reps[french_transcript[1]]

   response.say(f"This is the French response and {representative} will help you.",
                voice="alice", language="en-US")


   response.hangup()

   return response



async def _german_recording(response):
   recording = "languages/german-recording.mp3"
   german_transcript = await deepgram_transcribe(recording)



   representative = customer_service_reps[german_transcript[1]]



   response.say(f"This is the German response and {representative} will help you.",
                voice="alice", language="en-US")


   response.hangup()

   return response


def _redirect_welcome():
   response = VoiceResponse()
   response.say("Returning to the main menu", voice="alice", language="en-US")
   response.redirect(url_for('welcome'))

   return twiml(response)

view_helpers.py

import flask

def twiml(resp):
   resp = flask.Response(str(resp))
   resp.headers['Content-Type'] = 'text/xml'

   return resp
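One fragile spot in the listing above: customer_service_reps[...] raises a KeyError whenever Deepgram detects a language other than fr, es, or de (a caller speaking English, for example), and a response missing a field also needs a sensible default. Here is a minimal sketch of a more defensive lookup, written as plain functions over a response dictionary so you can try it without calling the API. The response shape is assumed from the keys the listing indexes, and DEFAULT_REP is a hypothetical fallback.

```python
# A minimal sketch, not the tutorial's code: defensive helpers for the IVR's
# language routing. The response shape mirrors the keys deepgram_transcribe()
# reads; DEFAULT_REP is a hypothetical fallback name.
from typing import Optional, Tuple

customer_service_reps = {"fr": "Sally", "es": "Pete", "de": "Ann"}
DEFAULT_REP = "the next available agent"  # hypothetical fallback


def parse_detection(response: dict) -> Tuple[str, Optional[str]]:
    """Return (transcript, detected_language), using defaults for missing keys."""
    channels = response.get("results", {}).get("channels") or [{}]
    channel = channels[0]
    alternatives = channel.get("alternatives") or [{}]
    transcript = alternatives[0].get("transcript", "")
    return transcript, channel.get("detected_language")


def pick_representative(language: Optional[str]) -> str:
    """Look up a rep by language code, falling back instead of raising KeyError."""
    return customer_service_reps.get(language, DEFAULT_REP)


# Example: a French caller is routed to Sally; an unexpected code falls back.
sample = {"results": {"channels": [{"detected_language": "fr",
                                     "alternatives": [{"transcript": "bonjour"}]}]}}
transcript, language = parse_detection(sample)
print(transcript, language, pick_representative(language))  # bonjour fr Sally
print(pick_representative("en"))  # the next available agent
```

In _spanish_recording() and its siblings, pick_representative(spanish_transcript[1]) could stand in for the direct dictionary lookup.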

Build a Web Scraper With Your Voice Using Python

Tonya Sims — Mon, 19 Sep 2022 21:47:57 +0000

Voice commands are intriguing, especially with a speech recognition API. After getting exposure to Deepgram’s real-time transcription and speech-to-text Python SDK, I thought it’d be cool to scrape a website with my voice.

The way the project works is simple:

Speak the command “scrape” into my computer’s microphone.
That will kick off the Python scraper, which extracts links from a webpage.

Let’s take a closer look at how I built this project using Python, FastAPI, and Deepgram speech-to-text.

Python Code Web Scraper Using a Voice Command With Speech-to-Text

For this voice command scraper, I used one of Python’s newest web frameworks, FastAPI. I’ve already written a blog post about how to get up and running with FastAPI and Deepgram’s live transcription using the Python SDK.

Since there’s already a tutorial about FastAPI written on Deepgram’s blog, I won’t go into tremendous detail as my original post covers most of the Python code.

Let’s start with the installation.

I installed two additional Python libraries from my terminal inside of a virtual environment:

pip install beautifulsoup4 
pip install requests

Then, I added the import statements to the main.py file:

from bs4 import BeautifulSoup
import requests
import re

BeautifulSoup parses the HTML so we can scrape it.
The requests library fetches the page source.
The re module matches the links against a pattern, here absolute URLs that start with https://.

The only new function in this file is scrape_links. I also defined a new list called hold_links, which holds all the links extracted from the webpage. I pass the URL to scrape to requests.get, parse the response text into a BeautifulSoup object, and loop over the matching links. Each time through the loop, a link from the webpage gets appended to the list.

hold_links = []

def scrape_links():
   url = "https://xkcd.com/"
   r = requests.get(url)

   soup = BeautifulSoup(r.text, "html.parser")

   for link in soup.find_all("a", attrs={'href': re.compile("^https://")}):
       hold_links.append(link.get('href'))

   return hold_links
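To see the link filter in isolation, without a network call or BeautifulSoup, here is a minimal sketch that applies the same rule (keep the href of every a tag that starts with https://) using the standard library’s html.parser instead. This is a swapped-in parser for illustration, not the code the project runs.

```python
# A dependency-free sketch of the filter scrape_links() applies: collect the
# href of every <a> tag that starts with "https://". It swaps BeautifulSoup
# for the standard library's html.parser, so it is an illustration only.
import re
from html.parser import HTMLParser
from typing import List

HTTPS_LINK = re.compile(r"^https://")


class LinkCollector(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and HTTPS_LINK.match(href):
            self.links.append(href)


def extract_https_links(html: str) -> List[str]:
    """Return absolute https:// links in document order, as a fresh list."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


page = ('<a href="https://xkcd.com/archive">Archive</a>'
        '<a href="/about">About</a>'
        '<a href="http://example.com">Plain HTTP</a>')
print(extract_https_links(page))  # ['https://xkcd.com/archive']
```

Returning a fresh list, rather than appending to the module-level hold_links, also avoids accumulating duplicate links each time the command is spoken.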

Next is the get_transcript inner function:

async def process_audio(fast_socket: WebSocket):
   async def get_transcript(data: Dict) -> None:
       if 'channel' in data:
           transcript = data['channel']['alternatives'][0]['transcript']

           if transcript and transcript == 'scrape':
               scrape_links()
               await fast_socket.send_text(transcript)

   deepgram_socket = await connect_to_deepgram(get_transcript)

   return deepgram_socket

The only changes here are these lines, which check whether there’s a transcript and whether the transcript, or voice command, is “scrape”. If so, they call the scrape_links function:

if transcript and transcript == 'scrape':
    scrape_links()
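The exact comparison works when the transcript is precisely “scrape”. If your Deepgram options change the casing or add punctuation (an assumption; it depends on the options you send), a small normalization step keeps the command matching. Here is a minimal sketch with a hypothetical helper, is_voice_command:

```python
# A minimal sketch: normalize a transcript before comparing it with a voice
# command. Assumption: depending on the options sent to Deepgram, a transcript
# may arrive as "Scrape." instead of "scrape". is_voice_command is hypothetical.
import string


def is_voice_command(transcript: str, command: str = "scrape") -> bool:
    """True when the transcript is exactly the command, ignoring case,
    surrounding whitespace, and leading/trailing punctuation."""
    cleaned = transcript.strip().lower().strip(string.punctuation + " ")
    return cleaned == command.lower()


print(is_voice_command("Scrape."))        # True
print(is_voice_command("please scrape"))  # False
```

Inside get_transcript, if is_voice_command(transcript): could replace the equality check, and an empty transcript still evaluates to False.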

Last but not least, when rendering the template, I passed in the hold_links list as a context object so the HTML page could display the links using Jinja.

@app.get("/", response_class=HTMLResponse)
def get(request: Request):
   return templates.TemplateResponse("index.html", {"request": request, "hold_links": hold_links})

In the index.html file, I added the following line to the <head></head> section to refresh the page every five seconds:

<meta http-equiv="refresh" content="5" />

The page needs to be refreshed after speaking the voice command “scrape” to display the extracted links.

Lastly, in the <body></body>, add these lines which loop over the extracted links from the webpage and render them to the HTML page, index.html:

<body>
       <p>
           {% for link in hold_links %}
               {{ link }}<br>
           {% endfor %}
       </p>
</body>

Finally, to run the FastAPI Python voice-to-text web scraper, type uvicorn main:app --reload from the terminal and navigate to http://127.0.0.1:8000/.

After speaking the word “scrape” into my computer’s microphone, a list of extracted links for the specified URL appeared on the webpage.

If you found my project exciting or have questions, please feel free to Tweet me! I’m happy to help!

How To Monitor Media Mentions in Podcasts with Python

Tonya Sims — Wed, 31 Aug 2022 19:20:08 +0000

Over the last ten years, the number of people who listen to podcasts has doubled. With this increase comes more ad spending. More than ever, companies need to monitor media mentions in podcast ads, using AI and Python, to identify which companies are mentioned: their own or their competitors’.

For example, the podcasts I listen to occasionally include ads from multiple sponsors. What if you’re a company that needs to monitor media mentions of your competitors in podcasts? You need to distinguish what was said about these companies from what they paid to have said.

There are a few ways to monitor media mentions in podcasts using AI speech-to-text and Python. Let’s look at a method using diarization (FYI, there is a better way further down in this post).

Method 1: Monitor Media Mentions in Podcasts Using Diarization with AI Speech Recognition

This method is interesting but not as effective as the one I’ll show later in this post. As a quick review, Deepgram’s diarization feature recognizes speaker changes in a transcript. For example, if there are multiple speakers and diarization is set to True, each word in the transcript is assigned to a speaker.

A readable formatted transcript with the speech-to-text diarize feature may look something like this:

[Speaker:0] All alright, guys, before we start, we got a special message from our sponsor.
[Speaker:1] If you wanna rank higher on Google, you gotta look at your page speed time. 
[Speaker:1] The faster website loads, the better off you are.
[Speaker:1] With Google's core vital update that makes it super important to optimize your site or load time.
[Speaker:1] And one easy way to do it is use the host that Eric and I use, Dream Host.

In a podcast, speaking time is usually split fairly evenly between the speakers or hosts. Diarization helps monitor media mentions in podcasts by revealing when one person speaks for noticeably longer than the others. In the transcript example above, you’ll notice that Speaker 1 talks the longest during that segment, which could indicate that’s where the ad is read on behalf of the sponsor.
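The heuristic is easy to compute from diarized word timings. Here is a minimal sketch, assuming each word dictionary carries speaker, start, and end keys (times in seconds), which is the shape Deepgram’s diarized word list uses. In practice you would probably run it over short windows of the episode rather than the whole thing, because over a full episode the hosts’ shares tend to even out.

```python
# A minimal sketch of the diarization heuristic: add up each speaker's talk
# time and report who spoke longest. Assumes each word dict carries "speaker",
# "start", and "end" (seconds), as in Deepgram's diarized word list.
from collections import defaultdict
from typing import Dict, List, Optional


def speaking_time(words: List[dict]) -> Dict[int, float]:
    totals: Dict[int, float] = defaultdict(float)
    for word in words:
        totals[word["speaker"]] += word["end"] - word["start"]
    return dict(totals)


def dominant_speaker(words: List[dict]) -> Optional[int]:
    totals = speaking_time(words)
    return max(totals, key=totals.get) if totals else None


words = [
    {"speaker": 0, "start": 0.0, "end": 0.5},
    {"speaker": 1, "start": 0.5, "end": 1.5},
    {"speaker": 1, "start": 1.5, "end": 3.0},
]
print(speaking_time(words))     # {0: 0.5, 1: 2.5}
print(dominant_speaker(words))  # 1
```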

I promised you a better way to monitor mentions in a podcast. Let’s look at how that would work with Python, Deepgram’s AI speech-to-text Search feature, and entity detection with spaCy.

Method 2: Monitor Media Mentions in Podcasts Using Search and Entity Detection

I wanted a way to monitor media mentions in podcasts that would do the following:

Search for terms in the podcast transcript like “sponsor” or “paid” that indicate an ad segment
Identify the organizations that are talked about in the ad to determine the company sponsoring that segment
And overall, not cause a bigger headache for me

I needed an AI voice recognition API to transcribe the podcast audio. That part was easy to figure out: I used the Deepgram Python SDK, with the prerecorded option to transcribe the already recorded audio. I also grabbed a Deepgram API key from our console, which has gamified missions you can try to get up to speed quicker.

Deepgram is nice because it has high accuracy, and the transcript gets returned quickly. Both are important in this case. I needed accuracy to correctly flag the organizations (I’ll show you in the code), and speed is an advantage, so I didn’t have to wait long for the transcribed audio.

The Search feature from Deepgram was a lifesaver when working on this project. It searches for terms or phrases by matching acoustic patterns in audio, then returns the result as a JSON object.

I added the Search feature as a parameter in the Python code like this:

'search': 'sponsor'

Since I wanted to find where the podcast hosts mentioned sponsorships, searching for the word sponsor made sense. Imagine them saying something like, “Now a word from our sponsor”.

After printing the results, I received a response similar to this:

[{'confidence': 1.0, 'end': 23.57, 'snippet': 'our sponsor', 'start': 23.09},
 {'confidence': 0.7023809, 'end': 79.82909, 'snippet': 'spotify', 'start': 79.38954},
 {'confidence': 0.6279762, 'end': 120.18001, 'snippet': 'stocks','start': 119.740005},
 {'confidence': 0.5535714, 'end': 241.19926,'snippet': 'focus on','start': 240.92029}]

The response is a list of dictionaries, one per potential match, each with a confidence score. The higher the confidence, the more likely the hit matches the search term. This feature helped tremendously: all I had to do was pass a search term to the speech-to-text Python SDK, and it returned the matches.
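The code later in this post only accepts a hit whose confidence is exactly 1.0. A slightly more forgiving option is to keep every hit above a threshold and take the earliest one. Here is a minimal sketch using the sample hits above. The 0.9 default threshold is a hypothetical choice you would tune for your own audio.

```python
# A minimal sketch: filter Search hits by a confidence threshold instead of
# requiring exactly 1.0, then take the earliest match. Hit fields mirror the
# sample output above; the 0.9 default threshold is a hypothetical choice.
from typing import List, Optional

sample_hits = [
    {"confidence": 1.0, "end": 23.57, "snippet": "our sponsor", "start": 23.09},
    {"confidence": 0.7023809, "end": 79.82909, "snippet": "spotify", "start": 79.38954},
    {"confidence": 0.6279762, "end": 120.18001, "snippet": "stocks", "start": 119.740005},
    {"confidence": 0.5535714, "end": 241.19926, "snippet": "focus on", "start": 240.92029},
]


def confident_hits(hits: List[dict], threshold: float = 0.9) -> List[dict]:
    return [hit for hit in hits if hit["confidence"] >= threshold]


def first_confident_start(hits: List[dict], threshold: float = 0.9) -> Optional[float]:
    matches = confident_hits(hits, threshold)
    return min(hit["start"] for hit in matches) if matches else None


print(first_confident_start(sample_hits))  # 23.09
print([h["snippet"] for h in confident_hits(sample_hits, 0.6)])
# ['our sponsor', 'spotify', 'stocks']
```

Returning None when nothing clears the threshold lets the caller decide what to do, instead of silently falling back to a start time of 0.0.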

Next, I used spaCy to handle the entity detection. spaCy is a Python library for Machine Learning and Natural Language Processing. I was looking for a way to tag the entities in the transcribed audio that are organizations.

spaCy labels recognized company entities as ORG, but I also used an EntityRuler to identify lesser-known organizations. You’ll see how that works in the next section when I break down the code.

Python Code Breakdown With AI Deepgram Speech-to-Text and SpaCy

The first thing I did was pip install the following Python libraries:

pip install deepgram-sdk
pip install python-dotenv
pip install -U pip setuptools wheel
pip install spacy
python3 -m spacy download en_core_web_md

If you want to see the Python code that I wrote for this podcast media mentions project, please look below:

from deepgram import Deepgram
from dotenv import load_dotenv
from spacy.pipeline import EntityRuler
import spacy
import asyncio
import json
import os

load_dotenv()

PATH_TO_FILE = 'podcast-audio-file.mp3'

async def transcribe_with_deepgram():
   # Initializes the Deepgram SDK
   deepgram = Deepgram(os.getenv("DEEPGRAM_API_KEY"))

   options = {
       'punctuate': True,
       'search': 'sponsor'
   }   

   get_start_time = 0.0
   transcript = []  # word list; stays empty if no transcript is returned


   # Open the audio file
   with open(PATH_TO_FILE, 'rb') as audio:
       # ...or replace mimetype as appropriate
       source = {'buffer': audio, 'mimetype': 'audio/mp3'}
       response = await deepgram.transcription.prerecorded(source, options)

       if 'transcript' in response['results']['channels'][0]['alternatives'][0]:
           # search for query word in transcript
           search_term = response['results']['channels'][0]['search'][0]['hits']

           # get search_term with confidence of 1.0
           if search_term[0]['confidence'] == 1.0:
               get_start_time = search_term[0]['start']

           transcript = response['results']['channels'][0]['alternatives'][0]['words']

   get_end_start_time = get_start_time + 30

   start_list = []

   for word in transcript:
       if word['start'] >= get_start_time and word['start'] < get_end_start_time:
           start_list.append(word['punctuated_word'])

   new_transcript = " ".join(start_list)

   return new_transcript


async def get_media_mentions():

   media_transcript = await transcribe_with_deepgram()

   # Build upon the spaCy Medium Model
   nlp = spacy.load("en_core_web_md")

   # Create the EntityRuler (your competition or whichever ORG)
   ruler = nlp.add_pipe("entity_ruler")

   # List of Entities and Patterns
   patterns = [
                   {"label": "ORG", "pattern": "Dream Host"}
              ]

   ruler.add_patterns(patterns)


   doc = nlp(media_transcript)

   #extract entities
   for ent in doc.ents:
       if ent.label_ == "ORG":
           print(ent.text, ent.label_)



asyncio.run(get_media_mentions())

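In the transcribe_with_deepgram function, you initialize the Deepgram SDK and open the .mp3 podcast file to read it as audio. Then you use the prerecorded transcription option to transcribe the recorded file to text. If the top Search hit for “sponsor” has a confidence of 1.0, its start time marks the beginning of the ad segment. The function then keeps only the punctuated words that start within the following 30 seconds and joins them into a shorter transcript.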
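The windowing step is worth isolating, because it decides which words spaCy sees. Here is a minimal sketch of the same filter as a standalone function, using made-up word timings shaped like the start and punctuated_word fields the code reads.

```python
# A minimal sketch of the 30-second window used in transcribe_with_deepgram():
# keep punctuated words whose start time falls in [start, start + window).
# Word dicts mirror the "start" and "punctuated_word" keys the code reads.
from typing import List


def words_in_window(words: List[dict], start: float, window: float = 30.0) -> str:
    end = start + window
    return " ".join(w["punctuated_word"] for w in words if start <= w["start"] < end)


words = [
    {"start": 9.5, "punctuated_word": "Hi."},
    {"start": 10.0, "punctuated_word": "Our"},
    {"start": 25.0, "punctuated_word": "sponsor."},
    {"start": 40.0, "punctuated_word": "Bye."},
]
print(words_in_window(words, start=10.0))  # Our sponsor.
```

Keeping the window length as a parameter makes it easy to widen if an ad read runs longer than 30 seconds.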

In the get_media_mentions function, I load the spaCy medium model and create an EntityRuler. The EntityRuler allowed me to create a pattern, Dream Host, with the corresponding label ORG. In this example, the pretrained model doesn’t recognize Dream Host as an organization. Still, it is mentioned in the transcript, so I wanted to ensure the code picked it up as I monitored the media mentions in the podcast.

Finally, I looped over the extracted entities and printed the text and label of every entity labeled ORG. This gives the names of the companies mentioned in the sponsored segment of the podcast.

Here’s what it looked like in my terminal:

Google ORG
Google ORG
Dream Host ORG

As you can see, the podcast hosts mentioned the companies Google and Dream Host.

Conclusion

That wraps up this blog post on how to monitor media mentions in podcasts with Python. I hope you found this tutorial helpful. If you did or have any questions, please feel free to tweet me at @DeepgramAI.

Topic Detection in Podcast Episodes with Python

Tonya Sims — Wed, 24 Aug 2022 20:55:13 +0000

Imagine you’re a Python Machine Learning Engineer. Your work day is about to start with the dreaded stand-up meeting, but what you’re most looking forward to is deep diving into topic detection algorithms.

If you want to see the whole Python code snippet for topic detection, please scroll to the bottom of this post.

You step out to get coffee down the street. A black SUV pulls up next to you, the door opens, and someone tells you to get in the truck.

They explain that your Machine Learning Python prowess is needed badly.

Why?

They need you to transcribe a podcast from speech to text urgently. But not just any podcast. It’s Team Coco’s podcast, hosted by the legendary Conan O’Brien. Not only do they need it transcribed using AI speech recognition, but they also require topic analysis to quickly discover what the podcast is about.

They can’t say too much about the underground Operation Machine Learning Topic Detection, other than if you can’t deliver the topic modeling results or tell anyone, something terrible may happen.

Weird. Ironic but weird. Yesterday, you learned about the TF-IDF (Term Frequency - Inverse Document Frequency) topic detection algorithm.

You should feel confident in your Python and Machine Learning abilities, but you have some reservations.

You think about telling your manager but remember what they said about something terrible that may happen.

You’re going through self-doubt, and most importantly, you’re not even sure where to start with transcribing audio speech-to-text in Python.

What if something bad does happen if you don’t complete the topic detection request?

You decide to put on your superhero cape and take on the challenge because your life could depend on it.

Discovery of Deepgram AI Speech-to-Text

You’re back at your home office and not sure where to start with finding a Python speech-to-text audio transcription provider.

You try using Company A’s transcription with Python, but it takes a long time to get back a transcript. Besides, the file you need to transcribe is over an hour long, and you don’t have time to waste.

You try Company B’s transcription again with Python. This time, the transcription comes back faster, but one big problem is accuracy. The words in the speech-to-text audio transcript you’re getting back are inaccurate.

You want to give up because you don’t think you’ll be able to find a superior company with an API that provides transcription.

Then you discover Deepgram, and everything changes.

Deepgram is an AI automated speech recognition voice-to-text company that allows us to build applications that transcribe speech-to-text.

You love how effortless it is to sign up for Deepgram and quickly grab a Deepgram API Key from our website. You also get hands-on experience immediately after signing up by trying out the console missions for transcribing prerecorded audio in a matter of minutes.

There’s even better news!

Deepgram has much higher transcription accuracy than other providers, and you receive a transcript back super fast. You also discover they have a Python SDK that you can use.

It’s do-or-(maybe)-die time.

You hear a tornado warning siren, but disregard it and start coding.

You won’t let anything get in your way, not even a twister.

Python Code for AI Machine Learning Topic Detection

You first create a virtual environment to install your Python packages inside.

Next, from the command line, you pip install the following Python packages inside of the virtual environment:

pip install deepgram-sdk
pip install python-dotenv
pip install -U scikit-learn
pip install -U nltk

Then you create a .env file inside your project directory to hold your Deepgram API Key, so it’s not exposed to the whole world. Inside your .env file, you assign your API Key from Deepgram to a variable called DEEPGRAM_API_KEY, like so:

DEEPGRAM_API_KEY="abc123"

Next, you create a new file called python_topic_detection.py. You write the following code, which imports the Python libraries and handles the Deepgram prerecorded audio speech-to-text transcription:

from deepgram import Deepgram
from dotenv import load_dotenv
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from nltk.corpus import stopwords
import asyncio
import json
import os
import nltk


load_dotenv()

PATH_TO_FILE = 'conan_podcast.mp3'

async def transcribe_with_deepgram():
  # Initializes the Deepgram SDK
  deepgram = Deepgram(os.getenv("DEEPGRAM_API_KEY"))
  # Open the audio file
  with open(PATH_TO_FILE, 'rb') as audio:
      # ...or replace mimetype as appropriate
      source = {'buffer': audio, 'mimetype': 'audio/mp3'}
      response = await deepgram.transcription.prerecorded(source)

      if 'transcript' in response['results']['channels'][0]['alternatives'][0]:
          transcript = response['results']['channels'][0]['alternatives'][0]['transcript']

          return transcript

The transcribe_with_deepgram() function is based on our Deepgram Python SDK, located here on GitHub.

In this function, you initialize the Deepgram SDK and open the .mp3 podcast file to read it as audio. Then you use the prerecorded transcription option to transcribe the recorded file to text.

You’re on a roll!

Next, you start writing the code for the TF-IDF Machine Learning algorithm to handle the topic detection. The tornado knocks out your power, and you realize you only have 20% laptop battery life.

You need to hurry and continue writing the following code in the same file:

async def remove_stop_words():
  transcript_text = await transcribe_with_deepgram()
  words = transcript_text.split()
  final = []

  nltk.download('stopwords')
  stops = stopwords.words('english')

  for word in words:
      if word not in stops:
          final.append(word)

  final = " ".join(final)

  return final


async def cleaned_docs_to_vectorize():
  final_list = []
  transcript_final = await remove_stop_words()

  split_transcript = transcript_final.split()

  vectorizer = TfidfVectorizer(
                              lowercase=True,
                              max_features=100,
                              max_df=0.8,
                              min_df=4,
                              ngram_range=(1,3),
                              stop_words='english'

                          )


  vectors = vectorizer.fit_transform(split_transcript)

  feature_names = vectorizer.get_feature_names_out()

  dense = vectors.todense()
  denselist = dense.tolist()

  all_keywords = []

  for description in denselist:
      x = 0
      keywords = []
      for word in description:
          if word > 0:
              keywords.append(feature_names[x])
          x=x+1

      for term in keywords:
          if term not in all_keywords:
              all_keywords.append(term)
  topic = "\n".join(all_keywords)
  print(topic)

  k = 10

  model = KMeans(n_clusters=k, init="k-means++", max_iter=100, n_init=1)

  model.fit(vectors)

  centroids = model.cluster_centers_.argsort()[:, ::-1]

  terms = vectorizer.get_feature_names_out()
  with open("results.txt", "w", encoding="utf-8") as f:
      for i in range(k):
          f.write(f"Cluster {i}")
          f.write("\n")
          for ind in centroids[i, :10]:
              f.write(' %s' % terms[ind],)
              f.write("\n")
          f.write("\n")
          f.write("\n")

asyncio.run(cleaned_docs_to_vectorize())

In this code, remove_stop_words() takes the transcript from transcribe_with_deepgram() and removes any stop words, and the new function cleaned_docs_to_vectorize() works on that cleaned text. Stop words are common words that carry little meaning on their own, like a, the, and, and this.

The algorithm will then perform the TF-IDF vectorization using these lines of code:

vectorizer = TfidfVectorizer(
                              lowercase=True,
                              max_features=100,
                              max_df=0.8,
                              min_df=4,
                              ngram_range=(1,3),
                              stop_words='english'

                          )
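To make the vectorizer less of a black box, here is a minimal pure-Python sketch of the weighting TfidfVectorizer applies with its default settings (smooth_idf=True, norm="l2"). It is an illustration, not a replacement: tokenization here is a plain split(), which is simpler than scikit-learn’s default token pattern.

```python
# A minimal pure-Python sketch of TfidfVectorizer's default weighting
# (smooth_idf=True, norm="l2"): raw counts times idf(t) = ln((1 + n) / (1 + df(t))) + 1,
# then each row scaled to unit length. Tokenization here is a plain split(),
# which is simpler than scikit-learn's default token pattern.
import math
from collections import Counter
from typing import List, Tuple


def tfidf(docs: List[str]) -> Tuple[List[str], List[List[float]]]:
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n = len(tokenized)
    # document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}

    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row))
        rows.append([v / norm if norm else 0.0 for v in row])
    return vocab, rows


vocab, rows = tfidf(["space nasa", "space joke", "space"])
print(vocab)    # ['joke', 'nasa', 'space']
print(rows[2])  # [0.0, 0.0, 1.0]
```

A term that appears in every document, like space here, gets the smallest possible idf of 1.0, so rarer terms such as nasa outweigh it within the same row.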

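You quickly read about the options passed into the vectorizer in the scikit-learn documentation: lowercase=True lowercases the text, max_features=100 keeps only the 100 most frequent terms, max_df=0.8 ignores terms that appear in more than 80% of the documents, min_df=4 ignores terms that appear in fewer than 4 documents, ngram_range=(1,3) considers single words up to three-word phrases, and stop_words='english' drops scikit-learn’s built-in English stop words.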
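The two document-frequency cut-offs interact with how the transcript is fed to the vectorizer. Here is a minimal sketch of the filtering rule, where an integer min_df is an absolute document count and a float max_df is a fraction of all documents. Because the code vectorizes split_transcript, every word is its own one-word document, so a word’s document frequency is simply how many times it occurs.

```python
# A minimal sketch of the min_df / max_df cut-offs: an integer min_df is an
# absolute document count and a float max_df is a fraction of all documents,
# so a term survives when min_df <= df <= max_df * n_docs.
from collections import Counter
from typing import List, Set


def df_filter(docs: List[List[str]], min_df: int = 4, max_df: float = 0.8) -> Set[str]:
    df = Counter(term for doc in docs for term in set(doc))
    n_docs = len(docs)
    return {term for term, count in df.items() if min_df <= count <= max_df * n_docs}


# Like split_transcript above, every word becomes its own one-word document,
# so df is simply how many times the word occurs.
words = "joke york joke space joke york joke nasa".split()
print(df_filter([[w] for w in words]))  # {'joke'}
```

One consequence worth knowing: with one-word documents, ngram_range=(1,3) cannot produce any bigrams or trigrams, because no document contains two tokens.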

You have a little bit of time left with 15% battery life, so you decide to use K-Means to create 10 clusters of topics. This way, they can get a more meaningful sense of the structure of the data from the podcast. You write the K-Means clusters to a file called results.txt.
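The line centroids = model.cluster_centers_.argsort()[:, ::-1] is the densest step in the clustering code. Here is a minimal pure-Python sketch of what it and the terms[ind] lookup do together: rank each centroid’s weights from highest to lowest and map the indices back to vocabulary terms. The centroid values below are made up for illustration.

```python
# A minimal sketch of the "top terms per cluster" step: rank each centroid's
# weights from highest to lowest and map indices back to vocabulary terms,
# which is what argsort()[:, ::-1] and terms[ind] do with NumPy above.
# Ties may come out in a different order than NumPy's argsort.
from typing import List


def top_terms(centroids: List[List[float]], terms: List[str], n: int = 10) -> List[List[str]]:
    result = []
    for centroid in centroids:
        order = sorted(range(len(centroid)), key=lambda i: centroid[i], reverse=True)
        result.append([terms[i] for i in order[:n]])
    return result


centroids = [[0.1, 0.7, 0.2], [0.5, 0.0, 0.3]]
print(top_terms(centroids, ["joke", "york", "space"], n=2))
# [['york', 'space'], ['joke', 'space']]
```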

To run the program, type python3 python_topic_detection.py from the terminal.

When you print the topics, you see a list like the following:

sort
little
sitting
went
new
knew
comedy
remember
guys
funny
jerry
club
point
gilbert
york
chris
rock
famous
later
getting
long
love
night
year
bob
norm
car
news
space
astronauts
nasa

Bingo!

You can now make inferences about the AI Topic Detection to determine the subject matter of the podcast episode.

Then, peek at your results.txt file to verify that you received 10 clusters. Here’s an example of four of the ten groups of words using KMeans clustering:

Cluster 0
yeah
think
ve
roast
got
space
cat
say
joke
oh


Cluster 1
person
york
joke
gonna
good
got
great
guy
guys
heard


Cluster 2
know
york
jokes
gonna
good
got
great
guy
guys
heard


Cluster 3
right
york
joke
gonna
good
got
great
guy
guys
heard

Just before your laptop battery dies, you show them the topics for Team Coco. They are very happy with your results and drive off.

You’re feeling more confident than ever.

You’ll never know why they needed the Machine Learning topic detection or why they chose you, but you’re on top of the world right now.

Conclusion

Congratulations on building the Topic Detection AI Python project with Deepgram. Now that you made it to the end of this blog post, Tweet us at @DeepgramAI if you have any questions or to let us know how you enjoyed this post.

Full Python Code for the AI Machine Learning Podcast Topic Detection Project

from deepgram import Deepgram
from dotenv import load_dotenv
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from nltk.corpus import stopwords
import asyncio
import json
import os
import nltk


load_dotenv()

PATH_TO_FILE = 'conan_podcast.mp3'

async def transcribe_with_deepgram():
  # Initializes the Deepgram SDK
  deepgram = Deepgram(os.getenv("DEEPGRAM_API_KEY"))
  # Open the audio file
  with open(PATH_TO_FILE, 'rb') as audio:
      # ...or replace mimetype as appropriate
      source = {'buffer': audio, 'mimetype': 'audio/mp3'}
      response = await deepgram.transcription.prerecorded(source)

      if 'transcript' in response['results']['channels'][0]['alternatives'][0]:
          transcript = response['results']['channels'][0]['alternatives'][0]['transcript']

          return transcript

async def remove_stop_words():
  transcript_text = await transcribe_with_deepgram()
  words = transcript_text.split()
  final = []

  nltk.download('stopwords')
  stops = stopwords.words('english')

  for word in words:
      if word not in stops:
          final.append(word)

  final = " ".join(final)

  return final


async def cleaned_docs_to_vectorize():
  final_list = []
  transcript_final = await remove_stop_words()

  split_transcript = transcript_final.split()

  vectorizer = TfidfVectorizer(
                              lowercase=True,
                              max_features=100,
                              max_df=0.8,
                              min_df=4,
                              ngram_range=(1,3),
                              stop_words='english'

                          )


  vectors = vectorizer.fit_transform(split_transcript)

  feature_names = vectorizer.get_feature_names_out()

  dense = vectors.todense()
  denselist = dense.tolist()

  all_keywords = []

  for description in denselist:
      x = 0
      keywords = []
      for word in description:
          if word > 0:
              keywords.append(feature_names[x])
          x=x+1

      for term in keywords:
          if term not in all_keywords:
              all_keywords.append(term)
  topic = "\n".join(all_keywords)
  print(topic)

  k = 10

  model = KMeans(n_clusters=k, init="k-means++", max_iter=100, n_init=1)

  model.fit(vectors)

  centroids = model.cluster_centers_.argsort()[:, ::-1]

  terms = vectorizer.get_feature_names_out()
  with open("results.txt", "w", encoding="utf-8") as f:
      for i in range(k):
          f.write(f"Cluster {i}")
          f.write("\n")
          for ind in centroids[i, :10]:
              f.write(' %s' % terms[ind],)
              f.write("\n")
          f.write("\n")
          f.write("\n")

asyncio.run(cleaned_docs_to_vectorize())

How to Use Voice to Control Music with Python and Deepgram

Tonya Sims — Fri, 19 Aug 2022 19:29:59 +0000

Move over, Beethoven. This tutorial will use Python and the Deepgram API’s speech-to-text audio transcription to play a piano with your voice. The song we’ll play is the first few phrases of Lady Gaga’s Bad Romance. It’s a simple piece in C Major, meaning no sharps or flats! We’ll only use the pitches C, D, E, F, G, A, and B, and no black keys. What a beautiful chance for someone learning how to play the piano without a keyboard to tap into the power of voice to play music!

When we run the project as a PyGame application, a window will appear and the piano will play the song. We’ll hear the notes, which also light up on the keyboard.

Let’s get started!

What We’ll Need to Play Voice-Controlled Music Using AI

This tutorial uses macOS, but the project is also possible on a Windows or Linux machine. We’ll also use Python 3.10 and other tools like FluidSynth and the Deepgram Python SDK for speech-to-text audio transcription.

FluidSynth

We need to install FluidSynth, a free, open-source MIDI software synthesizer that creates sound in digital format, usually for music. MIDI, or Musical Instrument Digital Interface, is a protocol that allows musical gear like computers, software, and instruments to communicate with one another. FluidSynth uses SoundFont files to generate audio. These files contain samples of musical instruments, like a piano, that are used to play MIDI files.

There are various options to install FluidSynth on a Mac. In this tutorial, we’ll use Homebrew for the installation. After installing Homebrew, run this command anywhere in the terminal:

brew install fluidsynth

Now that FluidSynth is installed, let’s get our Deepgram API Key.

Deepgram API Key

We need to grab a Deepgram API Key from the console. It’s effortless to sign up and create an API Key here. Deepgram is an AI automated speech recognition voice-to-text company that allows us to build applications that transcribe speech-to-text. We’ll use Deepgram’s Python SDK and the Numerals feature, which converts a number from written format to numerical format. For example, if we say the number “three”, it would appear in our transcript as “3”.
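To illustrate why the Numerals feature is handy here: once spoken numbers arrive as digits, turning them into notes is simple parsing. Here is a minimal sketch with a hypothetical mapping from scale degrees 1 to 7 onto the C major pitches listed above, plus their MIDI note numbers, counting middle C as 60. This is not the tutorial’s own mapping, only an assumption for illustration.

```python
# A minimal sketch, not the tutorial's mapping: read digits from a
# transcript (the Numerals feature turns "three" into "3") and map scale
# degrees 1-7 to the C major pitches, with MIDI numbers counting middle C as 60.
import string
from typing import List, Tuple

C_MAJOR = ["C", "D", "E", "F", "G", "A", "B"]
MIDI_NUMBER = {"C": 60, "D": 62, "E": 64, "F": 65, "G": 67, "A": 69, "B": 71}


def transcript_to_notes(transcript: str) -> List[Tuple[str, int]]:
    notes = []
    for raw in transcript.split():
        token = raw.strip(string.punctuation)
        # keep only digits that name a degree of the seven-note scale
        if token.isdigit() and 1 <= int(token) <= len(C_MAJOR):
            name = C_MAJOR[int(token) - 1]
            notes.append((name, MIDI_NUMBER[name]))
    return notes


print(transcript_to_notes("1, 2, 3 and 8."))  # [('C', 60), ('D', 62), ('E', 64)]
```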

One of the many reasons to choose Deepgram over other providers is that it lets us build better voice applications with faster, more accurate transcription through AI speech recognition. We offer both real-time transcription and pre-recorded speech-to-text; the latter lets us upload a file containing audio for transcription.

Now that we have our Deepgram API Key, let’s set up our Python AI piano project so we can start making music!

Create a Python Virtual Environment

Make a directory called play-piano to hold our project. Inside it, create a new file called piano-with-deepgram.py, which will hold the main code for the project.

We need to create a virtual environment and activate it so we can pip install our Python packages. We have a more in-depth article about virtual environments on our Deepgram Developer blog.

Activate the virtual environment after it’s created and install the following Python packages from the terminal.

pip install deepgram-sdk
pip install python-dotenv
pip install mingus
pip install pygame
pip install sounddevice
pip install scipy

Let’s go through each of the Python packages.

deepgram-sdk is the Deepgram Python SDK, which allows us to transcribe speech audio, or voice, to a text transcript.
python-dotenv helps us work with environment variables, such as our Deepgram API Key, which we’ll pull from the .env file.
mingus is a Python package used by programmers and musicians to make and play music.
pygame is an open-source Python engine that helps us make games and other multimedia applications.
sounddevice captures audio from our device’s microphone and records it as a NumPy array.
scipy writes the NumPy array to a WAV file.

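We need to download two files. The first is keys.png, the image of the piano GUI. The second is the Yamaha-Grand-ios-v1.2 SoundFont from this site. A SoundFont contains samples of musical instruments; in our case, we need a piano sound. The full code loads the image from keys.png and the SoundFont from a file named soundfont.sf2, so save both under those names in the directory we run the script from.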

The Code to Play Voice-Controlled Music with Python and AI

We’ll only cover the Deepgram code in this section, but the entire code for the project is at the end of this post.

import asyncio
import os

import sounddevice as sd
from deepgram import Deepgram
from dotenv import load_dotenv
from scipy.io.wavfile import write

load_dotenv()  # loads DEEPGRAM_API_KEY from the .env file

file_name = input("Name the output WAV file: ")

AUDIO_FILE = file_name

fs = 44100  # sampling frequency in Hz
duration = 30.0  # recording length in seconds


def record_song_with_voice():
    print("Recording.....")
    # sd.rec() records in the background; sd.wait() blocks until it finishes
    record_voice = sd.rec(int(duration * fs), samplerate=fs, channels=1)
    sd.wait()
    write(AUDIO_FILE, fs, record_voice)
    print("Finished.....Please check your output file")


async def get_deepgram_transcript():
    deepgram = Deepgram(os.getenv("DEEPGRAM_API_KEY"))

    record_song_with_voice()
    with open(AUDIO_FILE, "rb") as audio:
        source = {"buffer": audio, "mimetype": "audio/wav"}
        response = await deepgram.transcription.prerecorded(source, {"punctuate": True, "numerals": True})

    return response


async def get_note_data():
    note_dictonary = {
        '1': 'C',
        '2': 'D',
        '3': 'E',
        '4': 'F',
        '5': 'G',
        '6': 'A',
        '7': 'B'
    }

    get_numbers = await get_deepgram_transcript()
    data = []
    if 'results' in get_numbers:
        data = get_numbers['results']['channels'][0]['alternatives'][0]['words']

    # Skip any word that isn't a numeral from 1 to 7 instead of raising KeyError
    return [note_dictonary[x['word']] for x in data if x['word'] in note_dictonary]


data = asyncio.run(get_note_data())

Deepgram Python Code Explanation

This line of code prompts the user to name the audio file that the recording will be saved to in WAV format (for example, song.wav):

file_name = input("Name the output WAV file: ")

Once the file is named, the get_deepgram_transcript function calls the record_song_with_voice function.

def record_song_with_voice():
    print("Recording.....")
    record_voice = sd.rec(int(duration * fs), samplerate=fs, channels=1)
    sd.wait()
    write(AUDIO_FILE, fs, record_voice)
    print("Finished.....Please check your output file")

Inside the record_song_with_voice function, this line records the audio:

record_voice = sd.rec(int(duration * fs), samplerate=fs, channels=1)

Here, duration is the length of the recording in seconds, and fs is the sampling frequency in hertz (samples per second). We set both as constants near the top of the code: 30.0 seconds at 44100 Hz. Because sd.rec() returns immediately and records in the background, the next line, sd.wait(), blocks until the recording is finished.

Then we write the voice recording to a WAV file using SciPy's write() function from scipy.io.wavfile. That line of code looks like this:

write(AUDIO_FILE, fs, record_voice)

Once the file is written, the message "Finished.....Please check your output file" prints to the terminal, which means the recording is complete.

The function get_deepgram_transcript is where most of the magic happens. Let’s walk through the code.

async def get_deepgram_transcript():
   deepgram = Deepgram(os.getenv("DEEPGRAM_API_KEY"))

   record_song_with_voice() 

   with open(AUDIO_FILE, "rb") as audio:
       source = {"buffer": audio, "mimetype": "audio/wav"}
       response = await deepgram.transcription.prerecorded(source, {"punctuate": True, "numerals": True})

   return response

Here we initialize the Deepgram Python SDK. That’s why it’s essential to grab a Deepgram API Key from the console.

deepgram = Deepgram(os.getenv("DEEPGRAM_API_KEY"))

We store our Deepgram API Key in a .env file like so:

DEEPGRAM_API_KEY="abc123"

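Here, abc123 stands in for the API Key Deepgram assigns us. The load_dotenv() call at the top of the full code reads this file, so os.getenv("DEEPGRAM_API_KEY") can find the key.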
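os.getenv() returns None when a variable isn't set, so a missing or misspelled entry in the .env file may only surface once Deepgram is called. Here is a minimal sketch of a hypothetical require_env helper (not part of the original project or of python-dotenv) that fails fast with a readable message instead.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or exit with a clear message."""
    value = os.getenv(name)
    if not value:
        # Fail early instead of handing an empty key to the Deepgram client.
        raise SystemExit(f"{name} is not set; add it to your .env file")
    return value
```

After load_dotenv() runs, it could be used as deepgram = Deepgram(require_env("DEEPGRAM_API_KEY")).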

Next, we call record_song_with_voice(), which records our voice and creates the .wav file that we pass to Deepgram as pre-recorded audio.

Finally, we open the newly created audio file for reading in binary mode. Using a Python dictionary, we provide key/value pairs for buffer and mimetype. The buffer’s value is audio, the file object created by the statement with open(AUDIO_FILE, "rb") as audio:. The mimetype value is audio/wav, the format of our file and one of the 40+ file formats that Deepgram supports. We then call Deepgram and perform a pre-recorded transcription in this line: response = await deepgram.transcription.prerecorded(source, {"punctuate": True, "numerals": True}). The punctuate option adds punctuation to the transcript, and we pass the numerals option so that when we say a number, it appears in numeric form.

with open(AUDIO_FILE, "rb") as audio:
    source = {"buffer": audio, "mimetype": "audio/wav"}
    response = await deepgram.transcription.prerecorded(source, {"punctuate": True, "numerals": True})

return response

The last bit of code to review is the get_note_data function, which does precisely what its name says: it gets the note data.

async def get_note_data():
    note_dictonary = {
        '1': 'C',
        '2': 'D',
        '3': 'E',
        '4': 'F',
        '5': 'G',
        '6': 'A',
        '7': 'B'
    }

    get_numbers = await get_deepgram_transcript()
    data = []
    if 'results' in get_numbers:
        data = get_numbers['results']['channels'][0]['alternatives'][0]['words']

    # Skip any word that isn't a numeral from 1 to 7 instead of raising KeyError
    return [note_dictonary[x['word']] for x in data if x['word'] in note_dictonary]

data = asyncio.run(get_note_data())

We have a Python dictionary with keys from ‘1’ to ‘7’ corresponding to every note in the C Major scale. For example, saying the number 1 plays the note C, saying 2 plays D, and so on:

note_dictonary = {
    '1': 'C',
    '2': 'D',
    '3': 'E',
    '4': 'F',
    '5': 'G',
    '6': 'A',
    '7': 'B'
}

Here’s how that looks on a piano: each note in C Major is labeled, with its corresponding number above it. Each number from 1 to 7 represents a single note in our melody.

Next, we get the numerals from the Deepgram pre-recorded transcript with get_numbers = await get_deepgram_transcript().

We then create an empty list called data and check whether the parsed response we get back from Deepgram contains any results. If it does, we store the list of transcribed words in data:

data = []
if 'results' in get_numbers:
    data = get_numbers['results']['channels'][0]['alternatives'][0]['words']

Depending on the song we record, the contents of data may look like this:

[
{'word': '1', 'start': 2.0552316, 'end': 2.4942129, 'confidence': 0.99902344, 'punctuated_word': '1'}, 
{'word': '4', 'start': 2.8533795, 'end': 3.172639, 'confidence': 0.9980469, 'punctuated_word': '4'}, 
{'word': '3', 'start': 3.6116204, 'end': 4.1116204, 'confidence': 0.9975586, 'punctuated_word': '3'}
]

The word key in each entry holds a numeral we spoke into the microphone when recording the song.
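The indexing path used for data assumes every level of the response is present once results exists. As a defensive variation, here is a minimal sketch of a hypothetical helper, extract_words, that walks the same results, channels, alternatives, words path and returns an empty list if any level is missing. The sample dictionary wraps the example output shown above in that path.

```python
SAMPLE_RESPONSE = {"results": {"channels": [{"alternatives": [{"words": [
    {"word": "1", "start": 2.0552316, "end": 2.4942129, "confidence": 0.99902344, "punctuated_word": "1"},
    {"word": "4", "start": 2.8533795, "end": 3.172639, "confidence": 0.9980469, "punctuated_word": "4"},
    {"word": "3", "start": 3.6116204, "end": 4.1116204, "confidence": 0.9975586, "punctuated_word": "3"},
]}]}]}}


def extract_words(response) -> list:
    """Return the word list from a pre-recorded transcription response, or [] if absent."""
    try:
        return response["results"]["channels"][0]["alternatives"][0]["words"]
    except (KeyError, IndexError, TypeError):
        # A missing key, an empty list, or a non-dict payload: treat it as no words.
        return []


if __name__ == "__main__":
    print([w["word"] for w in extract_words(SAMPLE_RESPONSE)])  # ['1', '4', '3']
    print(extract_words({}))  # []
```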

We can now create a new list that maps each numeral to a note on the piano, using a list comprehension: return [note_dictonary[x['word']] for x in data if x['word'] in note_dictonary]. The if clause skips any word that isn't a numeral from 1 to 7.
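A plain dictionary lookup raises KeyError when a transcribed word falls outside '1' to '7' (a filler word, for instance). The sketch below is a hypothetical variation, numerals_to_notes, that maps digit by digit. It assumes, without confirmation from the source, that a single word could hold several spoken digits (such as "43"); digits outside 1 to 7 and non-digit characters are skipped.

```python
NOTE_BY_NUMERAL = {"1": "C", "2": "D", "3": "E", "4": "F", "5": "G", "6": "A", "7": "B"}


def numerals_to_notes(words: list) -> list:
    """Map transcript words to C Major note names, skipping anything that isn't 1-7."""
    notes = []
    for w in words:
        # Assumption: one word may contain several digits, so map each digit separately.
        for digit in w["word"]:
            if digit in NOTE_BY_NUMERAL:
                notes.append(NOTE_BY_NUMERAL[digit])
    return notes


if __name__ == "__main__":
    print(numerals_to_notes([{"word": "1"}, {"word": "um"}, {"word": "43"}]))  # ['C', 'F', 'E']
```

Mapping per digit trades strictness for tolerance: it never crashes on unexpected words, but it will also quietly drop anything it can't map.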

To run the project, we need all the code, which is at the end of this post.

Then in our terminal, we can run the project by typing:

python3 piano-with-deepgram.py

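Now we use our voice to say the following numerals, which correspond to piano notes, to play the first few phrases from Lady Gaga’s song Bad Romance. After we enter a file name, the program records for 30 seconds (the duration constant), so we say all the numerals within that window: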

12314 3333211 12314 3333211
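To check which notes this sequence produces before recording, here is a small sketch. melody_to_notes is a hypothetical helper that applies the same 1-to-7 mapping as note_dictonary and skips the spaces between phrases.

```python
NOTE_BY_NUMERAL = {"1": "C", "2": "D", "3": "E", "4": "F", "5": "G", "6": "A", "7": "B"}


def melody_to_notes(melody: str) -> list:
    """Translate a string of numerals into C Major note names, ignoring spaces."""
    return [NOTE_BY_NUMERAL[d] for d in melody if d in NOTE_BY_NUMERAL]


if __name__ == "__main__":
    phrase = "12314 3333211"
    print(" ".join(melody_to_notes(phrase)))  # C D E C F E E E E D C C
```

The full sequence above is this phrase played twice, 24 notes in total.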

Next Steps to Extend the Voice-Controlled Python AI Music Example

Congratulations on getting to the end of the tutorial! We encourage you to try and extend the project to do the following:

Play around with the code to play songs in different octaves
Play voice-controlled music that has flats and sharps
Tweak the code to play voice-controlled music using whole notes and half notes

When you have your new masterpiece, please send us a Tweet at @DeepgramAI and showcase your work!

The Entire Python Code for the Voice-Controlled Music Example

# -*- coding: utf-8 -*-

import asyncio
import os
import sys
import time

import pygame
import sounddevice as sd
from pygame.locals import *
from mingus.core import notes, chords
from mingus.containers import *
from mingus.midi import fluidsynth
from scipy.io.wavfile import write
from deepgram import Deepgram
from dotenv import load_dotenv


load_dotenv()
file_name = input("Name the output WAV file: ")

# Audio File with song
AUDIO_FILE = file_name
SF2 = "soundfont.sf2"
OCTAVES = 5 # number of octaves to show
LOWEST = 2 # lowest octave to show
FADEOUT = 0.25 # 1.0 # coloration fadeout time (1 tick = 0.001)
WHITE_KEY = 0
BLACK_KEY = 1

WHITE_KEYS = [
  "C",
  "D",
  "E",
  "F",
  "G",
  "A",
  "B",
]

BLACK_KEYS = ["C#", "D#", "F#", "G#", "A#"]

fs = 44100
duration = 30.0

def record_song_with_voice():
   print("Recording.....")
   record_voice = sd.rec(int(duration * fs) , samplerate = fs , channels = 1)
   sd.wait()
   write(AUDIO_FILE, fs,record_voice)
   print("Finished.....Please check your output file")

async def get_deepgram_transcript():
   # Initializes the Deepgram SDK
   deepgram = Deepgram(os.getenv("DEEPGRAM_API_KEY"))

   # call the external function
   record_song_with_voice()
   # Open the audio file
   with open(AUDIO_FILE, "rb") as audio:
       # ...or replace mimetype as appropriate
       source = {"buffer": audio, "mimetype": "audio/wav"}
       response = await deepgram.transcription.prerecorded(source, {"punctuate": True, "numerals": True})

   return response

def load_img(name):
  """Load image and return an image object"""
  fullname = name
  try:
      image = pygame.image.load(fullname)
      if image.get_alpha() is None:
          image = image.convert()
      else:
          image = image.convert_alpha()
  except pygame.error as message:
      print("Error: couldn't load image: ", fullname)
      raise SystemExit(message)
  return (image, image.get_rect())


if not fluidsynth.init(SF2):
  print("Couldn't load soundfont", SF2)
  sys.exit(1)

pygame.init()
pygame.font.init()
font = pygame.font.SysFont("monospace", 12)
screen = pygame.display.set_mode((640, 480))
(key_graphic, kgrect) = load_img("keys.png")
(width, height) = (kgrect.width, kgrect.height)
white_key_width = width / 7

# Reset display to wrap around the keyboard image
pygame.display.set_mode((OCTAVES * width, height + 20))
pygame.display.set_caption("mingus piano")
octave = 4
channel = 8

# pressed is a surface that is used to show where a key has been pressed
pressed = pygame.Surface((white_key_width, height))
pressed.fill((0, 230, 0))

# text is the surface displaying the determined chord
text = pygame.Surface((width * OCTAVES, 20))
text.fill((255, 255, 255))

playing_w = [] # white keys being played right now
playing_b = [] # black keys being played right now
quit = False
tick = 0.0

def play_note(note):
  """play_note determines the coordinates of a note on the keyboard image
  and sends a request to play the note to the fluidsynth server"""
  global text
  octave_offset = (note.octave - LOWEST) * width
  if note.name in WHITE_KEYS:
      # Getting the x coordinate of a white key can be done automatically
      w = WHITE_KEYS.index(note.name) * white_key_width
      w = w + octave_offset
      # Add a list containing the x coordinate, the tick at the current time
      # and of course the note itself to playing_w
      playing_w.append([w, tick, note])
  else:
      # For black keys I hard coded the x coordinates. It's ugly.
      i = BLACK_KEYS.index(note.name)
      if i == 0:
          w = 18
      elif i == 1:
          w = 58
      elif i == 2:
          w = 115
      elif i == 3:
          w = 151
      else:
          w = 187
      w = w + octave_offset
      playing_b.append([w, tick, note])
  # To find out what sort of chord is being played we have to look at both the
  # white and black keys, obviously:
  notes = playing_w + playing_b
  notes.sort()
  notenames = []
  for n in notes:
      notenames.append(n[2].name)
  # Determine the chord
  det = chords.determine(notenames)
  if det != []:
      det = det[0]
  else:
      det = ""
  # And render it onto the text surface
  t = font.render(det, 2, (0, 0, 0))
  text.fill((255, 255, 255))
  text.blit(t, (0, 0))
  # Play the note
  fluidsynth.play_Note(note, channel, 100)
  time.sleep(0.50)

async def get_note_data():
   note_dictonary = {
          '1': 'C',
          '2': 'D',
          '3': 'E',
          '4': 'F',
          '5': 'G',
          '6': 'A',
          '7': 'B'
  }

   get_numbers = await get_deepgram_transcript()
   data = []
   if 'results' in get_numbers:
       data = get_numbers['results']['channels'][0]['alternatives'][0]['words']


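   # Skip any word that isn't a numeral from 1 to 7 instead of raising KeyError
   return [note_dictonary[x['word']] for x in data if x['word'] in note_dictonary]


data = asyncio.run(get_note_data())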
i = 0

while i < len(data):
   # Process pending window events so the Pygame window stays responsive
   pygame.event.pump()
   # Blit the picture of one octave OCTAVES times.
   for x in range(OCTAVES):
       screen.blit(key_graphic, (x * width, 0))
   # Blit the text surface
   screen.blit(text, (0, height))
   # Check all the white keys; iterate over a copy because notes are removed inside the loop
   for note in playing_w[:]:
       diff = tick - note[1]
       # If a note is past its prime, remove it; otherwise blit the pressed surface
       # with a 'cool' fading effect.
       if diff > FADEOUT:
           fluidsynth.stop_Note(note[2], channel)
           playing_w.remove(note)
       else:
           pressed.fill((0, int(((FADEOUT - diff) / FADEOUT) * 255), 124))
           screen.blit(pressed, (note[0], 0), None, pygame.BLEND_SUB)
   # Play the next note once tick passes i / 4
   if tick > i / 4:
       play_note(Note(data[i], octave))
       i += 1
       # if i == len(data):
       #     i = 0

   pygame.display.update()
   tick += 0.005  # or 0.001 or 0.0001

Starting Out with Python and Deepgram Live Streaming Audio

Tonya Sims — Fri, 24 Jun 2022 15:38:30 +0000

Python Web Frameworks for Live Audio Transcription

This blog post summarizes how to transcribe live streaming audio to text in real time using Deepgram with four different Python web frameworks. At Deepgram, we have a Python SDK that handles both pre-recorded and live streaming speech recognition transcription, which you can use with your framework of choice.

FastAPI Live Streaming Audio

FastAPI is a new, innovative Python web framework gaining popularity because of its modern features, such as concurrency and asynchronous code support.

Working with WebSockets in FastAPI is a breeze because WebSocket support is built in, which makes it easy to establish two-way communication between the browser and server. There’s a section about working with WebSockets in the FastAPI documentation.

FastAPI is very easy to use because of its thorough documentation, so even beginners can get started. Keep in mind that, as a newer Python web framework, it may not have community resources as robust as other options. It didn’t take long to get FastAPI up and running with Deepgram’s live streaming audio speech-to-text transcription in Python. We wrote a step-by-step tutorial on using FastAPI with Deepgram for real-time audio transcription in Python.

Flask 2.0 Live Streaming Audio

Flask 2.0 is a familiar, lightweight, and very flexible micro web framework. It doesn't make decisions for you, so you are free to choose your database, templating engine, and other components without giving up functionality. Check out the tutorial we wrote on using Flask to get a live-streamed audio speech-to-text transcript up and running in Python.

Flask does not have built-in WebSocket support, but there is a workaround: use aiohttp, an asynchronous HTTP client/server for asyncio and Python, which supports both server and client WebSockets out of the box.

Once you get aiohttp configured for WebSockets, getting Flask 2.0 working with Deepgram is pretty straightforward. If you'd like to work with a Python framework similar to Flask with WebSocket support built-in, you can use Quart.

Quart Live Streaming Audio

Quart is an asynchronous Python web microframework, which makes it easier to serve WebSockets. Quart is an asyncio reimplementation of Flask, so if you're familiar with Flask, you'll ramp up on Quart quickly. We have a tutorial on using Quart with Deepgram live streaming audio speech-to-text.

Getting started with Quart was very simple. Their website has a short tutorial on WebSockets that covers the basics. Since Quart is very similar to Flask, there wasn’t much ramp-up time, which is nice. Because Quart supports WebSockets natively, no extra configuration was needed, and it worked perfectly with Deepgram’s live streaming audio.

Django Live Streaming Audio

Django is a familiar Python web framework for rapid development. It provides much of what you need out of the box, following a “batteries included” philosophy.

Django uses Channels to handle WebSockets, which allows real-time communication between a browser and a server. The Django Channels setup was different from the other three Python web frameworks but was easy to follow thanks to its documentation. Some prior experience with Django helps. If you want to use it with Deepgram, check out the blog post we wrote on using Django to handle real-time speech-to-text transcription.

Final Words

Hopefully, you can see that regardless of which Python web framework your application uses, you can use Deepgram speech-to-text live streaming transcription. As a next step, you can go to the Deepgram console and grab an API Key. You'll need this key to do speech-to-text transcription with Deepgram and Python. We also have missions to try in the console to get up and running quickly with real-time or pre-recorded audio-to-text transcription.

Please feel free to Tweet us at @deepgramdevs. We would love to hear from you!
