We have been lucky to have two interesting and informative panel discussions on the evolution, current state, and future predictions of Conversational AI voicebots.
Our panelists included some of the pioneers of voicebots, like Al Lindsay, the former VP who led the technical creation of Amazon Alexa, and companies that are creating a more human-like voicebot experience: Bitext, Elerian AI, OneReach AI, Uniphore, and Valyant AI. Our discussions were not about command-and-respond smart speakers and assistants (Amazon Alexa, Siri, and Google) but about applications that sound, feel, and respond like a human, where you feel you can have a more natural conversation with the machine. In one case study, a customer continually said "Thank you" to the AI voicebot because the interaction was so human-like. Some interesting takeaways from the discussions:
Implementing a human-like voicebot is difficult, and you have to be committed.
Currently, there is no unified voicebot that can speak on any topic, in all dialects, and in all languages, so each voicebot must be built for a specific use case and domain.
The foundation of human-like voicebots is having a great speech-to-text application.
Conversational AI voicebots will not replace humans; they will enhance human jobs.
Commitment to Voicebots
Plug-and-play voicebots don't exist yet, as each voicebot must be trained for the domain and use case of the organization. For example, you can't use a voicebot trained for banking in England for food ordering in the U.S. The voicebot will not understand the customer's accent, dialect, and terminology, nor will it respond correctly. And you may want your text-to-speech engine to have a British accent instead of an American one. Remember, these conversational AI voicebots are not simple apps giving directions that only have to respond with 20 different sentences. Kevin Fredrick, the Managing Partner of OneReach.ai, expressed this best when he said, "Building a Conversational AI voicebot is like planning to summit a mountain. Those who are looking for an 'easy button' get frustrated and quit. The ones who think it will be too hard, don't ever start. It is the ones who know the challenge is worth it and have the right partners and use the right tools who make the summit." Although it is not easy, the rewards and return on investment when done correctly are worth it. Better customer satisfaction with less waiting on hold, faster answers to customers' questions, more opportunities for add-on sales, and freeing your human agents for more difficult tasks are some of the outsized benefits.
A Unified Voicebot
Unlike in the movies, there is no Conversational AI voicebot that knows everything. If you have seen the movie "Her," Samantha may be the ultimate personal voicebot. Unfortunately, even with cloud computing, big data, and ultrafast connections, we cannot put all knowledge into one voicebot. For some voicebot companies, that may be the ultimate goal. Think of a voicebot that you can talk with about both finance and soccer, and that remembers your previous conversations and refers back to them. It can combine what it learned from current events and from previous conversations with others to provide you with recommendations and responses. Currently, you need one voicebot trained for one domain and use case. But, as Antonio Valderrabanos, CEO of Bitext, indicated, you may be able to combine a multitude of voicebots to cover a larger range of knowledge and conversations. So, how do we get there? Our experts think that creating the unified voicebot will take both a large breakthrough innovation and a number of smaller innovations along the entire voicebot workflow, from speech recognition to text-to-speech. So, when you find yourself getting upset with Alexa or Siri, remember they are still a long way off from Samantha. They're only capable of so much.
Great Speech to Text is the Foundation
Dion Millson, CEO of Elerian AI, summed it up best when he said, "For Conversational AI voicebots, it all starts off with speech recognition, if you don't understand what the person said and transcribe it to text accurately, you are not in the game. Unfortunately, the general ASR models standardize around 70% accuracy, and it is just not good enough to respond to a caller with real-time accuracy and relevance. Our partnership with Deepgram and their models in conjunction with our internal models that are trained on case-specific data get well over 90% accuracy." He further said that some words are much more important than others in a specific use case. For his banking customers, the account numbers, phone numbers, and government ID numbers are vitally important for the voicebot to provide the right response. You cannot be 70% accurate on these keywords; you need to be closer to 100% correct or the whole system fails. Inaccurate transcriptions sent to the artificial intelligence knowledge base will lead to incorrect responses or a "please repeat that" request. The foundation must be a highly accurate speech recognition solution that can be trained on the keywords for that use case.
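To make the point about keywords concrete, here is a minimal sketch in Python of how you might score a transcript two ways: overall word accuracy (one minus the word error rate) and accuracy on only the critical tokens. The function names, the sample sentences, and the choice to treat all-digit tokens as "critical" are assumptions made for illustration; they are not taken from the panel or from any vendor's tooling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / max(len(ref), 1)


def critical_token_accuracy(reference: str, hypothesis: str,
                            is_critical=str.isdigit) -> float:
    """Share of critical reference tokens that appear in the hypothesis.

    Simplification (assumption): a token counts as correct if it appears
    anywhere in the hypothesis; word position is ignored.
    """
    hyp_tokens = set(hypothesis.lower().split())
    critical = [w for w in reference.lower().split() if is_critical(w)]
    if not critical:
        return 1.0
    hits = sum(1 for w in critical if w in hyp_tokens)
    return hits / len(critical)


if __name__ == "__main__":
    # Made-up banking utterance; only the last account number is misheard.
    said = "please transfer fifty dollars from account 447192 to account 880315"
    heard = "please transfer fifty dollars from account 447192 to account 880350"
    print("word accuracy:", 1 - word_error_rate(said, heard))
    print("critical token accuracy:", critical_token_accuracy(said, heard))
```

In this made-up example, only one word in ten is wrong, so the overall word accuracy looks respectable, yet one of the two account numbers is lost, and that single error is enough to send the request to the wrong account or trigger a "please repeat that" prompt.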
Voicebots Will Not Replace Humans
Yes, robots and machines have replaced humans in some roles, mostly in manufacturing, where they perform a single task or a small set of repetitive tasks. We asked our panel whether humans will be replaced by Conversational AI voicebots. The answer was a resounding NO. Jason Curran, Head of Engineering at Valyant AI, a voicebot company focused on food orders, sees their voicebot as enhancing the drive-thru experience for the human clerk. Now, the clerk can focus on making sure the order is correct and providing great customer service instead of having to type in orders. Jason believes that his voicebots improve a clerk's job satisfaction by eliminating the more mundane and repetitive tasks. This holds true for Elerian AI's and Uniphore's voicebots as well. They help contact centers increase the productivity and satisfaction of their support and sales staff by freeing them from answering the same questions over and over again, like "What is my balance?", "How do you turn on the router?", "When will my order be sent?", or "How much is the warranty plan?". Now, the staff can work on solving more challenging problems or on selling higher-value products; that is, they can use their minds instead of repeating things by rote. Current voicebots are not going to replace humans on the phone or drive-thru speakers anytime soon, nor will they be an all-knowing entity. There is still a ways to go before we have a Samantha. For the full conversations, you can view the on-demand recordings below: