A funny thing happened when Deepgram first decided to use end-to-end deep learning (E2EDL) to design our next-generation speech-to-text (STT) solution. We found that this approach was hugely flexible and easier to optimize than traditional STT. We didn’t have to reconnect and re-optimize multiple models (acoustic, pronunciation, and language) every time we wanted to make a change. And we could retrain and enhance our speech models without starting from scratch: with transfer learning, we could build new speech models faster. This trait of our technology has allowed us to build different base speech models for different use cases and needs. It also allows us to tailor models when a customer needs something specific that we don’t currently offer. Let’s take a look at the kinds of models we offer here at Deepgram and what each is good for.
1. Language-by-Use Case Models
All of our use case-specific models are available in various English dialects, and we are expanding into more language-by-use case combinations as we continue to train and optimize our speech models for specific circumstances, such as call centers or meeting transcription, while also adding new spoken languages. Our customers have found that combining a spoken language and a use case to create a speech model built specifically for their needs is more accurate than Big Tech’s out-of-the-box, one-size-fits-none models. These targeted models deliver the fastest speeds and are optimized for scalability: they can transcribe one hour of pre-recorded audio in 30 seconds. That makes them great for all applications, especially ones that need very high speeds or cost savings for on-prem use. You also don’t need to trade speed or scalability for accuracy, and because we offer multiple models for different use cases (unlike Big Tech), our models tend to be more accurate as well.
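As a minimal sketch of how you might select one of these models through Deepgram’s pre-recorded transcription endpoint: the snippet below only constructs the request URL (it doesn’t send audio), and the specific `model` and `language` values shown are illustrative examples, not a guarantee of what’s available for your account.

```python
from urllib.parse import urlencode

# Deepgram's pre-recorded transcription endpoint.
BASE_URL = "https://api.deepgram.com/v1/listen"

# Illustrative query parameters: a use-case model ("meeting",
# "phonecall", etc.) combined with a spoken-language dialect.
params = {
    "model": "meeting",    # use-case-specific model
    "language": "en-GB",   # English dialect selection
    "punctuate": "true",   # optional formatting feature
}

request_url = f"{BASE_URL}?{urlencode(params)}"
print(request_url)
```

You would then POST your audio (or a URL pointing to it) to `request_url` with your API key in the `Authorization` header; swapping the `model` and `language` values is all it takes to move between use-case and dialect combinations.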
2. Higher Accuracy Enhanced Models
We also built our next-generation architecture for the highest English-language accuracy on long-tail words, that is, words that are not common in everyday conversation. This architecture was rebuilt from our current one to optimize accuracy across a larger vocabulary. The enhanced speech model architecture is best suited for cases where you have keywords and terms that you must get right but that rarely appear in normal conversation, such as “fiduciary,” “biodiversity,” or “formulae.” Example use cases include conversational AI for B2B, technical support contact centers, and technical meetings or seminars.
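As a sketch of how this might look in an API call: at the time of writing, Deepgram exposes the higher-accuracy architecture via a `tier` query option and lets you boost rare terms with `keywords`. The snippet again only builds the request URL, and the boost values are illustrative assumptions.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.deepgram.com/v1/listen"

# "tier=enhanced" requests the higher-accuracy architecture;
# repeated "keywords" entries boost rare, domain-specific terms
# (the ":2" suffix is an illustrative boost intensifier).
params = [
    ("tier", "enhanced"),
    ("model", "general"),
    ("keywords", "fiduciary:2"),
    ("keywords", "biodiversity:2"),
]

# safe=":" keeps the boost separator readable in the URL
request_url = f"{BASE_URL}?{urlencode(params, safe=':')}"
print(request_url)
```

Keyword boosting complements, rather than replaces, the enhanced model’s broader vocabulary: it nudges the model toward terms you already know matter for your audio.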
3. Models Tailored for Your Business
But what if we don’t have a use case model specifically for your needs? Maybe your audio has a lot of background noise, accents, jargon, or product and company names; all of these can create problems for off-the-shelf models. If that’s the case for you, here at Deepgram we can customize a model for your specific use case. These tailored models can be trained and deployed within weeks, and they specifically target the characteristics of your audio that might trip up an off-the-shelf model. To make sure the tailored model really does address your specific issues, training it requires audio from your business, and the more “real world” that audio is, the better the accuracy: having an employee read from a script or list of terms produces far weaker training data than recording an employee and a customer having an actual conversation. And although we like to say that the more real-world audio you can provide, the better, we’ve seen good accuracy improvements with less than 10 hours of audio.
Deciding Which ASR Platform is Best for You
There are obviously a lot of factors that go into deciding which ASR system will work best for you, beyond the ability to tailor models. If you’d like to read more about the factors you should consider when shopping for an ASR platform, check out How to Evaluate an ASR Platform, or fill out our free Speech-to-Text Self Assessment. Still have questions? Contact us to talk through your use case and see which of our models is best for you.