How Closed Captioning is Enabled by ASR

There are many different ways that speech-to-text and automatic speech recognition (ASR) enable closed captioning. In this blog post, we'll discuss some of the most common use cases, including captioning live events, live television, podcasts, and pre-recorded videos. We'll also explore the benefits that state-of-the-art, AI-powered speech-to-text solutions like Deepgram can provide to closed captioning companies. But before we start, let's define what exactly closed captioning is.

What is Closed Captioning?

Closed captioning is a means of adding a transcript of what is said to video files. It's similar to subtitles, but while subtitles are usually intended for someone who doesn't speak or understand the language used in the video, closed captions are intended for those who might be deaf or hard of hearing. However, Verizon Media found that 80% of closed caption users aren't hearing impaired, so this is a feature that's expanding in use. And, in case you're curious: "closed" here means that the captions aren't visible to the viewer until they turn them on.

Use Cases for Speech-to-Text in Closed Captioning

It might seem like there's just one use case for ASR in closed captioning (namely, providing a transcript of what was said), but there are several different domains where captions can be generated with speech-to-text solutions, with real advantages over using human transcriptionists.

Live Events

Speech-to-text for live captioning is one of the prime use cases for ASR solutions. These captions are especially important for live events because they allow people who are deaf or hard of hearing to follow along with what is happening and be included in the event. These captions can also help others: those too far away from the speaker to hear clearly, for example, can also benefit. This type of captioning can be done with or without human intervention, but it's important to have someone who is familiar with ASR monitoring the captioning process to ensure accuracy.

Live Television

Similar to live events, live television is another place where closed captions powered by AI speech-to-text can have a big impact. If you’ve ever tried to watch something live with closed captions turned on, you know that they’re often delayed several seconds while humans transcribe what was said. But by using speech-to-text for captioning, transcriptions can be generated in real time, removing delays and lag.

Education and Training

Captioning can also be used to transcribe pre-recorded videos or podcasts. This is often done for educational or training videos, but it can also be used for other types of video content. ASR can be used to create a transcript of the video, which can then be used to create captions. This type of captioning is important for making sure that all viewers can access the information in the video, regardless of whether they are able to hear the audio.


Podcasts

Although you might mostly associate captions with video content, they're also a critical component of accessibility for podcasts. Podcast content has exploded in recent years and has become a major type of media. But it's one that can be difficult or impossible for people who are deaf or hard of hearing to access without captions. These captions can help other people, too: non-native speakers, people listening with background noise, and those who'd rather read content than listen to it, to name a few. You can read more about the importance of captioning for podcasts at Podcast Accessibility.

Benefits of AI-Powered ASR for Closed Captioning

It should be clear from the above that closed captioning powered by AI speech recognition has a lot of potential use cases. But not all speech-to-text solutions can be used for real-time closed captioning. Older systems built on legacy, multi-stage pipelines are typically too slow for live use cases. However, end-to-end deep-learning ASR solutions like Deepgram can turn around transcriptions in fractions of a second, creating a truly real-time experience. So what are some of the specific benefits that AI-powered speech-to-text tools can provide when compared to a human? Let's take a look.


Reduced Delay

As noted above, if you've ever tried to watch a sporting event in a bar, for example, it's very obvious that the captions are delayed, oftentimes so delayed that it's hard to match them up to what's happening. With a speech-to-text system that runs in real time, these delays can be reduced to fractions of a second so that transcriptions more closely match what's being said in time.


Improved Accuracy

Another issue you might have seen with live events is that the accuracy of the captions can suffer. This can be anything from small typos to misheard words to complete gibberish. AI-powered speech recognition systems can have accuracies of over 90%. And, with custom model training, you can use audio from your particular domain to further improve a model thanks to transfer learning, something that's not possible with older speech-to-text systems.

Automatically Align Audio and Captions

If you're working with human-created transcripts on pre-recorded audio, someone (usually the transcriptionist) has to manually align each caption to the right part of a video, which can be a tedious and time-consuming process. Because ASR transcriptions include start and end times for each word, it's much easier to correctly align captions with the audio.
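To illustrate, here's a minimal sketch (not specific to any particular ASR provider) of how per-word timestamps can be turned directly into time-aligned SRT caption blocks. The `Word` type and the seven-words-per-caption grouping are assumptions made for this example:

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    start: float  # start time in seconds
    end: float    # end time in seconds

def fmt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def words_to_srt(words: list[Word], max_words: int = 7) -> str:
    """Group timestamped words into numbered SRT caption blocks."""
    blocks = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        idx = len(blocks) + 1
        timing = f"{fmt_time(chunk[0].start)} --> {fmt_time(chunk[-1].end)}"
        text = " ".join(w.text for w in chunk)
        blocks.append(f"{idx}\n{timing}\n{text}")
    return "\n\n".join(blocks)
```

Because the timing comes straight from the ASR output, no manual alignment pass is needed; a human reviewer only has to spot-check the result.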

Cost Savings

Paying to have a video transcribed can be quite expensive. But with ASR, the cost savings over human transcriptionists can be substantial. And, with AI-powered solutions, you can run multiple audio streams at the same time without losing speed or accuracy, allowing more content to be transcribed for less.

One Problem with Using ASR for Closed Captioning

Before we wrap up, it’s worth noting that there’s one issue that you can run into if you’re trying to use only ASR for your transcriptions of things like TV shows or movies: even the most sophisticated system won’t be able to tell which [door creaks] or [spooky whispering] should be included in the captions to help those who are deaf or hard-of-hearing understand what’s happening on-screen. In these cases, you’d still want a human in the loop to make sure that any important, non-speech audio is included in the captions. But the ASR transcript can still be used as the base, providing many of the features above, like speed and time syncing, even if a human needs to be included.
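As a rough sketch of that human-in-the-loop workflow, human-authored sound cues with timestamps could simply be merged into the ASR-generated captions by start time. The `(start, text)` tuple format and the `merge_cues` helper here are hypothetical, chosen for illustration:

```python
def merge_cues(
    asr_captions: list[tuple[float, str]],
    sound_cues: list[tuple[float, str]],
) -> list[tuple[float, str]]:
    """Merge human-authored non-speech cues (e.g. "[door creaks]")
    into a list of ASR captions, ordered by start time in seconds."""
    return sorted(asr_captions + sound_cues, key=lambda cue: cue[0])
```

This keeps the ASR output as the base transcript while letting a human contribute only the non-speech events the model can't identify.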

Wrapping Up

And there you have it: some of the main ways that ASR tools can deliver strong benefits for anyone who needs to generate closed captions. If you're curious how Deepgram's speech-to-text API can help your captioning use case, give us a try and get $150 in free credits. Have questions? Reach out and we'll be happy to talk through your use case with you and see how we can help.
