Icing my swollen, disfigured hand, I was sitting on the couch, unable to drive to the store to grab some bandages and medication for the intense pain. I pulled up the website for the nearest store and started typing in the items I was looking for, all with one hand. It was my non-dominant hand at that. 

I gave up. I was simply in too much pain and it was taking me forever to order these items online for delivery. 

You may have spotted the problem. I couldn’t type fast enough and got impatient. My hand and fingers ballooned in size, and the pharmacy was also losing business because I couldn’t order what I needed.

You might be wondering how I broke my hand and what this has to do with building an agent-assist bot in Python. To keep a long story short, someone accidentally slammed the car door shut on my hand. It seemed fine, until a few hours later when it started turning blue and the pain became immense. 

I didn’t go to the ER quickly enough and no one was around to take me. So I did what some people would do: I put an ice pack on my hand, hoping the swelling would go down.

Nope, didn’t work!

That’s when I started to panic. I picked up my phone, barely, and tried placing an order for emergency items with my “good” hand.

Super frustrated, I gave up.

That would have been a wonderful opportunity to use a speech-to-text chatbot. An agent could have helped me more quickly, and I wouldn’t have had to search for every item separately and add each one to an online checkout cart.

Enter Python.

Using a Speech-to-Text Provider With a Chatbot in Python for Agent-Assist

The situation with my now very hideous hand inspired the idea for this blog post tutorial. I thought to myself, how could my life have been made easier…and hand prettier, in the simplest way possible?

I would have loved to have just pushed a button and chatted with customer service, so my items could be ordered. By chat, I don’t mean typing but talking: I speak, and they send me a response based on what I say. That is pretty much an agent-assist chatbot using AI speech-to-text technology.

In this tutorial, I built a command line implementation of what that could have looked like using Deepgram, a speech recognition provider; ChatterBot, a chatbot based on machine learning; and Python.

If you’d like to see the full code, skip to the end of the blog post. Before jumping into the code explanation, let’s take a look at why we might need speech-to-text and chatbots. 

Why We Need AI Speech-to-Text With Customer Assist Using Python

There are many reasons why you might need automated speech recognition (ASR) for your next project, including:

  • Increases Accessibility - speech-to-text makes technology more accessible for people who can’t easily type, like someone with an injured hand.

  • It’s Faster than Typing - think of all the time that could be saved if you could just speak and not have to type anything.

  • Increases Productivity and Profitability - speaking of time, it’s a great productivity and profitability booster for all involved. 

These are just a few; there are plenty more use cases.

Why We Need Chatbots for Customer Assist Using Python

Many companies need chat along with phone support and use chatbots for interactions with customers. A few advantages of chatbots are:

  • They have 24/7 Availability - they are available all hours of the day for customers to get their questions answered. 

  • They Collect and Analyze Data - data from chatbot sessions can be collected and analyzed more quickly, which can improve the customer experience.

Now we know why both speech-to-text and chatbots are important, so let’s dive into the tech and discover which tools to use to build our agent-assist chatbot with Python.

Prerequisites for Building the Speech-to-Text Chatbot with Python

There are a few things I needed to get set up first before I started coding.

  • Step 1 - Make sure to use a version of Python that is at or below 3.9, to work with our selected chatbot Python library, ChatterBot.

  • Step 2 - Grab a Deepgram API Key from our Console. Deepgram is a speech recognition provider that transcribes prerecorded or live-streaming audio from speech to text.

  • Step 3 - Create a directory called python-agent-bot on your computer and open it with a code editor, like VS Code.

  • Step 4 - Inside the directory create a new Python file. I called mine chatbot.py

  • Step 5 - It’s recommended to create a virtual environment and install all the Python libraries inside, but not required. For more on creating a virtual environment, check out this blog post

  • Step 6 - Install the following Python libraries inside the virtual environment with pip like this:

pip install chatterbot==1.0.2
pip install pytz
pip install pyaudio
pip install websockets
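Since Step 1 limits the Python version, a quick check at the top of a script can catch a mismatched interpreter early. This is only a minimal sketch: the helper name is_supported_python and the constant MAX_SUPPORTED are made up for illustration, and 3.9 simply mirrors the limit given in Step 1.

```python
import sys

# Highest (major, minor) version this tutorial targets, per Step 1.
# MAX_SUPPORTED and is_supported_python are illustrative names, not part of any library.
MAX_SUPPORTED = (3, 9)


def is_supported_python(version_info=None):
    """Return True if the interpreter is Python 3 at or below MAX_SUPPORTED."""
    if version_info is None:
        version_info = sys.version_info
    major, minor = version_info[0], version_info[1]
    return major == 3 and (major, minor) <= MAX_SUPPORTED


if __name__ == "__main__":
    if not is_supported_python():
        print("Warning: this tutorial was written for Python 3.9 or below.")
```

Running the file directly prints a warning on a newer interpreter and stays silent otherwise.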

Wonderful! Now that everything is set up let’s walk through the Python code section by section. Make sure to add it to the file chatbot.py.

Here we import the Python packages and libraries we need for our speech-to-text chatbot with ChatterBot.

from chatterbot import ChatBot
from chatterbot.trainers import ListTrainer
import pyaudio
import asyncio
import websockets
import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.ERROR)

Copy and paste the Deepgram API Key you created in the console and add it here:

DEEPGRAM_API_KEY = 'YOUR_DEEPGRAM_API_KEY_GOES_HERE'

Below are the settings we need for PyAudio, to grab the audio from your computer’s mic:
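If you would rather not paste the key into the file, one option is to read it from an environment variable. The sketch below is an assumption on my part, not something Deepgram requires: the variable name DEEPGRAM_API_KEY and the helper load_api_key are both hypothetical.

```python
import os


def load_api_key(env_var="DEEPGRAM_API_KEY"):
    """Read the API key from an environment variable (the name is an assumption)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a readable message instead of a failed connection later.
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

With this in place, the assignment could become DEEPGRAM_API_KEY = load_api_key(), after exporting the variable in your terminal.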

FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
CHUNK = 8000
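To get a feel for what these numbers mean, here is a small sketch of the arithmetic. It uses plain constants so it runs without PyAudio; the helper names are made up for illustration, and the 2-byte sample width follows from paInt16 being 16-bit audio.

```python
# Plain-number stand-ins for the PyAudio settings, so this runs without PyAudio.
SAMPLE_WIDTH_BYTES = 2  # pyaudio.paInt16 is 16-bit audio: 2 bytes per sample
CHANNELS = 1
RATE = 16000   # samples per second
CHUNK = 8000   # frames handed to the callback at a time


def chunk_duration_seconds(chunk=CHUNK, rate=RATE):
    """How much audio, in seconds, one chunk holds."""
    return chunk / rate


def chunk_size_bytes(chunk=CHUNK, channels=CHANNELS, width=SAMPLE_WIDTH_BYTES):
    """How many bytes one chunk occupies."""
    return chunk * channels * width


if __name__ == "__main__":
    print(chunk_duration_seconds(), "seconds per chunk")
    print(chunk_size_bytes(), "bytes per chunk")
```

So with these settings, each chunk carries half a second of mono audio in 16,000 bytes.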

Create a new instance of ChatBot and start training the chatbot to respond to you.

bot = ChatBot('Bot')
trainer = ListTrainer(bot)

trainer.train([
   'Hi',
   'Hello',
   'I need to buy medication.',
   'Sorry you are not feeling well. How much medication do you need?',
   'Just one, please',
   'Medication added. Would you like anything else?',
   'No Thanks',
   'Your order is complete! Your delivery will arrive soon.'
])
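ListTrainer treats each item in the list as a possible response to the item before it. The sketch below does not call ChatterBot at all; it only shows how the training list lines up as (statement, response) pairs, using a hypothetical helper named conversation_pairs.

```python
def conversation_pairs(conversation):
    """Pair each statement with the one that follows it in the list."""
    return list(zip(conversation, conversation[1:]))


conversation = [
    'Hi',
    'Hello',
    'I need to buy medication.',
    'Sorry you are not feeling well. How much medication do you need?',
    'Just one, please',
    'Medication added. Would you like anything else?',
    'No Thanks',
    'Your order is complete! Your delivery will arrive soon.'
]

if __name__ == "__main__":
    for statement, response in conversation_pairs(conversation):
        print(repr(statement), '->', repr(response))
```

Notice that the agent’s lines also appear as statements, which is why the order of the list matters when you write your own training data.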

PyAudio needs this callback, which puts each chunk of audio into the queue without blocking.

audio_queue = asyncio.Queue()

def callback(input_data, frame_count, time_info, status_flag):
   audio_queue.put_nowait(input_data)
   return (input_data, pyaudio.paContinue)
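One thing to be aware of: PyAudio runs the callback in its own thread, and asyncio.Queue is not thread-safe. The tutorial code works for this small demo, but a more cautious variant could hand the data to the event loop with call_soon_threadsafe. This is a sketch of that alternative, not the approach used in this post; make_callback and PA_CONTINUE are illustrative names, and a plain thread stands in for PyAudio so the example runs on its own.

```python
import asyncio
import threading

# Stand-in for pyaudio.paContinue (which is 0) so this runs without PyAudio.
PA_CONTINUE = 0


def make_callback(loop, queue):
    """Build a PyAudio-style callback that hands audio to the loop thread-safely."""
    def callback(input_data, frame_count, time_info, status_flag):
        # Schedule the put on the event loop's own thread instead of calling it here.
        loop.call_soon_threadsafe(queue.put_nowait, input_data)
        return (input_data, PA_CONTINUE)
    return callback


async def demo():
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue()
    callback = make_callback(loop, queue)

    # Simulate PyAudio invoking the callback from a separate thread.
    worker = threading.Thread(target=callback, args=(b"\x00\x01", 1, None, 0))
    worker.start()
    worker.join()

    return await queue.get()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In a real program, the callback returned by make_callback would be passed as stream_callback when opening the PyAudio stream.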

Next, we access the mic on our machine with PyAudio. 

async def microphone():
   audio = pyaudio.PyAudio()
   stream = audio.open(
       format = FORMAT,
       channels = CHANNELS,
       rate = RATE,
       input = True,
       frames_per_buffer = CHUNK,
       stream_callback = callback
   )

   stream.start_stream()

   while stream.is_active():
       await asyncio.sleep(0.1)

   stream.stop_stream()
   stream.close()

Here we handle the WebSocket and connect to the Deepgram API endpoint. The nested receiver function is where we get the transcript of what the customer says and print the agent’s response.

async def process():
   extra_headers = {
       'Authorization': 'token ' + DEEPGRAM_API_KEY
   }

   async with websockets.connect('wss://api.deepgram.com/v1/listen?encoding=linear16&sample_rate=16000&channels=1', extra_headers = extra_headers) as ws:
       async def sender(ws):
           try:
               while True:
                   data = await audio_queue.get()
                   # Forward each chunk of microphone audio to Deepgram
                   await ws.send(data)
           except Exception as e:
               print('Error while sending: ', str(e))
               raise

       async def receiver(ws):
           async for msg in ws:
               msg = json.loads(msg)
               transcript = msg['channel']['alternatives'][0]['transcript']

               if transcript:
                   print('Customer(you):', transcript)
                   if transcript.lower() == "okay":
                       print('Agent: bye')
                       break
                   else:
                       response=bot.get_response(transcript)
                       print('Agent:', response)

       await asyncio.wait([
           asyncio.ensure_future(microphone()),
           asyncio.ensure_future(sender(ws)),
           asyncio.ensure_future(receiver(ws))
       ])
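The logic inside receiver can be tried out without a microphone or a network connection. The sketch below pulls it into two small functions, extract_transcript and agent_turn (both names are mine, for illustration). It assumes the message shape used above, msg['channel']['alternatives'][0]['transcript'], and returns an empty string for any message that lacks those keys.

```python
import json


def extract_transcript(raw_message):
    """Return the transcript text from a raw JSON message, or '' if it has none."""
    msg = json.loads(raw_message)
    alternatives = msg.get("channel", {}).get("alternatives", [])
    if not alternatives:
        return ""
    return alternatives[0].get("transcript", "")


def agent_turn(transcript, respond):
    """Return (agent_line, done) for one customer utterance.

    `respond` is any callable that maps text to a reply, for example
    bot.get_response from the tutorial code.
    """
    if transcript.lower() == "okay":
        return "bye", True
    return str(respond(transcript)), False


if __name__ == "__main__":
    # A hand-written message in the assumed shape, with a stand-in for the bot.
    raw = json.dumps({"channel": {"alternatives": [{"transcript": "Hi"}]}})
    line, done = agent_turn(extract_transcript(raw), lambda text: "Hello")
    print("Agent:", line, "| done:", done)
```

Keeping the parsing separate from the WebSocket loop like this makes it easier to test the conversation flow on its own.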

Finally, we call the main function to execute our code.

def main():
   asyncio.get_event_loop().run_until_complete(process())

if __name__ == '__main__':
   main()

To run the program and give it a try, type python3 chatbot.py from your terminal. Start by saying Hi, then the agent will respond Hello in a typed message, and so on.

Here’s an example of what the conversation would look like:


I hope you enjoyed this tutorial and all the possibilities that come with speech-to-text and chatbots in Python. The full code is below. 

Full Code of Speech-to-Text Chatbot with Python 

from chatterbot import ChatBot
from chatterbot.trainers import ListTrainer
import pyaudio
import asyncio
import websockets
import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.ERROR)

DEEPGRAM_API_KEY = "YOUR-DEEPGRAM-API-KEY"

FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
CHUNK = 8000

bot = ChatBot('Bot')
trainer = ListTrainer(bot)

trainer.train([
   'Hi',
   'Hello',
   'I need to buy medication.',
   'Sorry you are not feeling well. How much medication do you need?',
   'Just one, please',
   'Medication added. Would you like anything else?',
   'No Thanks',
   'Your order is complete! Your delivery will arrive soon.'
])

audio_queue = asyncio.Queue()

def callback(input_data, frame_count, time_info, status_flag):
   audio_queue.put_nowait(input_data)
   return (input_data, pyaudio.paContinue)


async def microphone():
   audio = pyaudio.PyAudio()
   stream = audio.open(
       format = FORMAT,
       channels = CHANNELS,
       rate = RATE,
       input = True,
       frames_per_buffer = CHUNK,
       stream_callback = callback
   )

   stream.start_stream()

   while stream.is_active():
       await asyncio.sleep(0.1)

   stream.stop_stream()
   stream.close()

async def process():
   extra_headers = {
       'Authorization': 'token ' + DEEPGRAM_API_KEY
   }

   async with websockets.connect('wss://api.deepgram.com/v1/listen?encoding=linear16&sample_rate=16000&channels=1', extra_headers = extra_headers) as ws:
       async def sender(ws):
           try:
               while True:
                   data = await audio_queue.get()
                   # Forward each chunk of microphone audio to Deepgram
                   await ws.send(data)
           except Exception as e:
               print('Error while sending: ', str(e))
               raise

       async def receiver(ws):
           async for msg in ws:
               msg = json.loads(msg)
               transcript = msg['channel']['alternatives'][0]['transcript']

               if transcript:
                   print('Customer(you):', transcript)

                   if transcript.lower() == "okay":
                       print('Agent: bye')
                       break
                   else:
                       response=bot.get_response(transcript)
                       print('Agent:', response)

       await asyncio.wait([
           asyncio.ensure_future(microphone()),
           asyncio.ensure_future(sender(ws)),
           asyncio.ensure_future(receiver(ws))
       ])


def main():
   asyncio.get_event_loop().run_until_complete(process())

if __name__ == '__main__':
   main()

If you have any feedback about this post, or anything else around Deepgram, we'd love to hear from you. Please let us know in our GitHub discussions.
