Sentiment analysis and emotion recognition are two of the hottest topics in speech understanding today. But they're often confused for one another-so much so that people often say "sentiment analysis" when they're referring to emotion recognition. In this post, we'll explain what both sentiment analysis and emotional recognition are, how they are used in business, and some of the limitations and challenges of each.
1. What is Sentiment Analysis?
Sentiment analysis is a typically text-based machine learning classification task. It might operate on single sentences, paragraphs, or even entire articles. The typical goal of sentiment analysis is to determine whether the author of a text has a positive or a negative opinion about whatever the topic of the text is. To this end, the typical training sets for sentiment analysis models are things like IMDb reviews of movies and Amazon product reviews, where it's easy to tell how someone felt about a topic (that is, their star ratings can be used as part of the training data). Sentiment analysis has a variety of uses, including analyzing customer feedback, monitoring social media conversations, tracking brand reputation, gauging public opinion on a topic or issue, and evaluating customer satisfaction levels.
Limitations and Challenges of Sentiment Analysis
There are, of course, limitations to systems like this. Sarcasm, for example, can be hard for sentiment analysis to detect (which isn't surprising since humans also struggle to correctly identify sarcasm in written language). That might be less of a problem when you're training and have the groundtruth of someone's rating, but in the real world, the goal of sentiment analysis is to determine how someone felt in the absence of a rating.
2. What is Emotion Recognition?
The second way that people use the term sentiment analysis is to refer to what is more appropriately known as emotion recognition (sometimes called emotion detection, or incorrectly called emotional recognition). Unlike sentiment analysis, emotion recognition typically relies on audio data, rather than text, and uses things like intonation, volume, and speed to determine what emotion a speaker is feeling, usually coded as one of several categories like, happy, sad, angry, etc. It's important to note that this doesn't automatically correlate with how someone feels about a topic. Someone can be happy talking about how they don't like something (who doesn't love to vent?), so ultimately, emotion recognition is trying to get at different information than sentiment analysis. Uses of emotion recognition include helping call center agents understand how a caller is feeling, monitoring hospital patients for stress and pain, and even tracking responses to advertisements.
Limitations and Challenges of Emotion Recognition
If you've ever communicated with another human being-and we hope you have-you know that even with all of our experience with social interactions, it can still be tricky to determine someone's emotional state just from talking to them. This is doubly problematic for attempts at emotion recognition. Not only are you trying to make a system to do something that's tricky for humans, you're going to do so using a dataset that humans have labeled based on the emotion that they think is present, even though they might not agree, and even though their labels might not accurately match the emotion the speaker was actually feeling.
This is further complicated by the fact that the audio used to train the model might be acted data, and not people actually expressing the emotion that they're experiencing. Plus, most emotion recognition systems only look at audio data, and don't include other things that could help make a determination, such as body language or facial expressions. It's also the case that we do more with our voice than express emotion-for example, sarcasm in English carries a particular kind of intonation that's recognizable, but sarcasm isn't an emotion. This creates an added complication for emotion recognition systems.
3. A Combination of Approaches
Because of the challenges of sentiment analysis and emotion recognition, some people have tried to combine the systems to try and better understand how people feel about a topic. If you have audio of someone speaking, in addition to conducting emotion recognition on that audio, you can also use a product like Deepgram to transcribe the audio into text, and then apply a text-based sentiment analysis model. This approach obviously only works when you have audio, and so isn't appropriate for all use cases, but it can provide additional insight when working from spoken data.
Ultimately, sentiment analysis, emotion recognition, or some combination of the two systems can help drive improvements in customer service and retention. By harnessing audio and text data to determine how customers (and employees) are feeling and communicating, you can recommend early steps to help customer service agents, improve retention, and extract valuable insights from unstructured data. If you want to learn more about what the future of voice tech looks like for customer experience, check out our recent webinar Importance of Voice Technology for Customer Experiences, which highlights some of the ways that voice tech tools like sentiment analysis and emotion recognition are being used today to power incredible customer experiences.
If you have any feedback about this post, or anything else around Deepgram, we'd love to hear from you. Please let us know in our GitHub discussions.
More with these tags:
Share your feedback
Was this article useful or interesting to you?
We appreciate your response.
Why Deepgram Built its Platform in Rust
- Adam Sypniewski
Exploring OpenAI Whisper Speech Recognition
- Julia Strout
Try Whisper: OpenAI's Speech Recognition Model in 1 Minute
- Michael Jolley
Use OpenAI Whisper Speech Recognition with the Deepgram API
- Scott Stephenson