The excitement around speech recognition is real: it has the potential to power the next wave of modern applications and give businesses and vendors a competitive advantage. But with excitement come misaligned expectations. Speech recognition is a messy, tough and persistent problem for enterprises, one that has languished under existing technology providers for decades. At Deepgram, we have been working to change that by rebuilding speech recognition from the ground up. Today, we celebrate a key milestone on our path with a $12 million Series A round led by Wing VC, with participation from NVIDIA, Y Combinator, Compound and SAP.iO.
Why Speech Recognition?
Today, getting actionable information from recorded phone conversations and meetings is time- and resource-intensive, costly and cumbersome. Audio recordings don’t play by the same rules as text or data. They’re messy and idiosyncratic and go far beyond the short pre-programmed phrases that Siri and Alexa rely on. There’s no silver bullet for speech recognition, especially when it comes to speed, scale, accuracy and reliability.
“Enterprise communication amounts to trillions of minutes annually, but making sense of what’s being said is technically challenging and cost-prohibitive,” said Zach DeWitt, Partner at Wing VC.
“Deepgram is building mission-critical infrastructure to empower enterprises to accurately transcribe and analyze their calls and meetings in real-time, across multiple accents and languages. Deepgram is helping companies better serve their customers and employees by unlocking unutilized voice datasets.”
You can read Zach’s blog post about the need for Deepgram and why it is important for companies to use speech recognition to better understand customer needs and serve their employees.
Rebuilding Speech From the Ground Up
The idea for Deepgram began while I was a PhD student at the University of Michigan. My cofounder and I were researching the detection of dark matter two miles underground, and in the hours not devoted to research, we life-logged (we made devices that recorded backup copies of the audio surrounding us, 24/7). When we tried to go back and find key conversations and specific moments in those audio files, we felt the very real pain of not having a good tool to help process the recordings and pinpoint valuable timestamps. That was the spark that created Deepgram.
Deepgram has taken an entirely new approach to speech recognition, replacing what hasn’t worked (heuristics-based speech processing) with fully end-to-end deep learning. Audio recordings are complex and infinitely varied, meaning there is no single quick fix for speech recognition. That’s why we train speech models to learn and adapt under complex, real-world conditions with customers’ unique vocabularies, accents, product names and acoustic environments.
Companies dealing with challenging audio from conference calls or call centers previously struggled to make speech scalable, precise and fast enough. With Deepgram, they can transform their speech data into an enterprise asset. Our speech recognition reliably acts as a foundational layer within the next generation of business applications, allowing companies to build something with speech that actually works.
Making Speech Work in the Enterprise
Since going to market, we’ve amassed customers across the call center, retail and tech industries, and partnered with some of the leading large-scale communication and conferencing providers. Developers, data scientists, product managers and CIOs at these companies all trust Deepgram because our unique approach delivers a high level of accuracy quickly and at scale.
Our customers and partners work with us because of our vision, our team and our commitment to continually refining and innovating our product. As part of that product innovation, along with our Series A, we’re also announcing two new features of our platform:
- Real-Time Streaming: an industry-first advancement in speech recognition that lets our customers analyze and transcribe speech as words are being spoken. A simple use case is "command and control" interactions like dictating doctor’s notes or ordering takeout from your favorite restaurant chain. More complex use cases include long-running real-time transcription for meeting platforms and real-time agent assist that helps call center agents deliver more effective customer service.
- On-Premises Deployment: Deepgram On-Premises Deployment provides a private, deployable instance of the Deepgram platform for speech recognition use cases involving confidential, regulated, or otherwise sensitive audio data in the enterprise. It delivers the same scalable, high-performance, high-accuracy speech recognition capability as the Deepgram cloud, while allowing enterprises to manage the solution on-premises.
“One of the next big frontiers in the AI revolution is conversational intelligence,” said Jeff Herbst, Vice President of Business Development at NVIDIA.
“Deepgram is doing groundbreaking work in this field, and we are delighted to be working closely with them. Their world-class GPU-accelerated speech recognition enables faster, more accurate natural language processing that will make an important impact on a range of industries.”
“As SAP drives toward combining experience data with operational data, Deepgram’s unique ability to automate transcription and intent recognition from voice conversations with high accuracy enables companies to provide a high-quality customer experience,” said Ram Jambunathan, Managing Director of SAP.iO. “We’re excited about Deepgram’s potential to enable rich, voice-based insights for SAP customers.”
We’re so excited about what’s next. The speech recognition opportunity is huge, and the endorsement from these amazing investors validates that we have the team, technology and vision to crack it. We strive to become the de facto speech company by unlocking valuable voice data for our customers, giving them a competitive advantage in their industry. This round is going to help us do just that.