There's a long-standing joke that a language is just a dialect with an army and a navy. Part of the reason this phrase has stuck around is that it's pithy, but it's also true in some cases. The language variety treated as the standard is typically the one spoken by the leaders, or by people who live in the capital (whether the actual political capital or a cultural capital). In the case of "the Queen's English", that's even part of the name of the particular variety.
Although this might seem to address the issue of language versus dialect, it's actually quite a bit trickier to define these two terms and to group different language varieties accordingly. In this post, I'm going to walk you through how linguists think about this issue, the usual definition used in linguistics, and some of the problems with that definition. But before we dive into that, let's take a moment to define our terms.
Some Quick Definitions
Before we get started, it's worth thinking about how linguists define a language and a dialect in the abstract.
A language is a spoken or signed system of communication used by humans. Human languages have certain features that set them apart from the communication systems used by other animals, including recursion, displacement, and the ability of language to talk about itself (as we're doing in this post). Although the communication systems of other animals have some features in common with human language, no system (that we know of, anyway) exhibits all of these features. If you want my bet on which comes closest, it wouldn't be chimps, or dolphins, or vervet monkeys—but prairie dogs.
A dialect is a particular form of language, usually defined geographically or socially, that has unique features of grammar and pronunciation. In the US, we often talk casually of Southern, New York, or Boston dialects, among others. A similar situation exists in the UK for English, and, in fact, most countries have a number of dialects in addition to the standard form you might learn in the classroom.
The term accent often gets mixed into conversations about what a language or a dialect is, but it actually means something more specific. At a high level, we can think of an accent as meaning one of two things. In the first, which is most germane to our conversation here, accent refers to the pronunciation features of a particular language or dialect. For example, dropping the /r/ at the ends of some words is a feature of some dialects of English.
That is to say, an accent is one aspect of a dialect, the one related to how things are pronounced. But as noted above, dialects also have unique grammatical features that set them apart, so accent does not equal dialect, although it can be one of the most salient aspects of one. The second meaning of accent refers to a non-native accent, such as a French or Polish accent when someone is speaking English. In this case, the term is referring to a kind of interference from the pronunciation of the native language that's showing up in the non-native language.
Finally, the term variety is a neutral term that linguists use when they're talking about a particular form of language without making any claims about whether it's a language or a dialect. You'll see this term used throughout this post.
The linguistic standard: Mutual intelligibility
With those definitions out of the way, let's turn to the question at hand. How do linguists decide if two different varieties constitute separate languages or dialects of the same language? The usual test is referred to as mutual intelligibility: can speakers of the two varieties in question understand each other? If they can, we say they speak the same language, or different dialects of the same language. If they can't, then they speak two different languages. Sounds easy, right? If only that were true...
Challenges for mutual intelligibility
Mutual intelligibility works great for some cases, such as comparing speakers of English and Mandarin. It's unlikely they'd understand anything that the other person says, except in the rare case of a borrowed word. But English and Mandarin are obviously different languages. What about the cases we're actually interested in, where the varieties are more similar?
How intelligible is intelligible?
For example, what if you're comparing Swedish, Norwegian*, and Danish, which are all very similar to one another? Below is the same passage from a Wikipedia article in each of these three languages, in that order, followed by an English translation, so you can see just how similar they are.
Swedish: År 1877 lämnade Brandes Köpenhamn och bosatte sig i Berlin. Hans politiska åsikter gjorde emellertid det obehagligt för honom att uppehålla sig i Preussen och år 1883 återvände han till Köpenhamn, där han mötte en helt ny grupp av författare och tänkare, som var ivriga att anta honom som sin ledare.
Norwegian: I 1877 forlet Brandes København og busette seg i Berlin. Dei politiske synspunkta hans gjorde likevel at det vart ubehageleg for han å opphalde seg i Preussen, og i 1883 vende han tilbake til København, der han vart møtt av ei heil ny gruppe forfattarar og tenkjarar, som var ivrige etter å få han som leiaren sin.
Danish: I 1877 forlod Brandes København og bosatte sig i Berlin. Hans politiske synspunkter gjorde dog, at Preussen blev ubehagelig for ham at opholde sig i, og han vendte i 1883 tilbage til København, hvor han blev mødt af en helt ny gruppe af forfattere og tænkere, der var ivrige efter at modtage ham som deres leder.
English: In 1877 Brandes left Copenhagen and took up residence in Berlin. However, his political views made Prussia an uncomfortable place to live, and in 1883 he returned to Copenhagen, where he was met by a completely new group of writers and thinkers, who were eager to accept him as their leader.
As should be obvious, these languages are extremely similar, but not identical. So what do we do with mutual intelligibility if someone understands 70% of what someone else says? Are they speaking the same language? What if they understand 50%? 30%? Where is the cut-off for deciding that two varieties are dialects of the same language rather than two different languages?
I once worked with a linguistic consultant from Bhutan. I was trying to get a sense of how similar his variety was to others spoken in the country. (This points to another problem with mutual intelligibility: I didn't have speakers of all of the varieties, so I couldn't run experiments to see who could understand whom.) I asked him if he could understand the people who lived in the next valley over from where he grew up. He thought for a moment and then described it as a "three-day dialect", meaning that he needed about three days of living around people who spoke the variety before he could understand what they were saying. Are these two varieties of the same language, or two different languages?
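To make the cut-off problem concrete, here is a minimal Python sketch. It assumes that intelligibility can be boiled down to a single score between 0 and 1, which is itself a big simplification, and the function name `classify` and the score used are hypothetical, chosen only for illustration.

```python
def classify(intelligibility: float, cutoff: float) -> str:
    """Label a pair of varieties from a comprehension score and a cut-off.

    Both values are fractions between 0.0 and 1.0. The cut-off is an
    arbitrary choice, which is exactly the problem being illustrated.
    """
    if not 0.0 <= intelligibility <= 1.0 or not 0.0 <= cutoff <= 1.0:
        raise ValueError("intelligibility and cutoff must be between 0.0 and 1.0")
    if intelligibility >= cutoff:
        return "dialects of one language"
    return "separate languages"


# Hypothetical score: a listener understands half of what they hear.
score = 0.5
for cutoff in (0.3, 0.5, 0.7):
    print(f"cut-off {cutoff:.0%}: {classify(score, cutoff)}")
```

The same pair of varieties comes out as "dialects of one language" at the 30% and 50% cut-offs and as "separate languages" at 70%. Nothing about the varieties changed between those runs, only the number we picked.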
There are also cases of asymmetrical intelligibility, where it's easier for speakers of one language to understand speakers of another than vice versa. For example, speakers of Portuguese often report being able to understand quite a bit of Spanish, but Spanish speakers typically can't understand much Portuguese at all. A similar situation exists in Scandinavia, where Danish speakers typically have an easier time understanding Swedish and Norwegian than the other way around. (If you're curious, in both cases, this is due to some changes in the pronunciation of words that affected Portuguese but not Spanish, and Danish but not Swedish or Norwegian.)
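Asymmetry means a single score per pair of varieties isn't enough, because you need one score per direction. The sketch below stores hypothetical listener-to-speaker scores (the numbers are invented for illustration, not measurements) and checks whether a given cut-off is passed in one direction only. The names `scores` and `is_asymmetric` are assumptions of this example.

```python
from typing import Dict, Tuple

# (listener, speaker) -> share of speech the listener understands.
# Invented numbers for illustration only, not real measurements.
scores: Dict[Tuple[str, str], float] = {
    ("Portuguese", "Spanish"): 0.6,
    ("Spanish", "Portuguese"): 0.3,
}


def is_asymmetric(a: str, b: str,
                  scores: Dict[Tuple[str, str], float],
                  cutoff: float) -> bool:
    """Return True if exactly one direction meets the cut-off."""
    a_understands_b = scores[(a, b)] >= cutoff
    b_understands_a = scores[(b, a)] >= cutoff
    return a_understands_b != b_understands_a


print(is_asymmetric("Portuguese", "Spanish", scores, cutoff=0.5))
```

With these made-up scores and a 50% cut-off, one direction passes and the other doesn't, so a test that asks "can they understand each other?" has no single yes-or-no answer for the pair.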
And, to further complicate the issue of mutual intelligibility, who are we asking if they can understand something? I'd guess that I, as someone who has studied linguistics and lived abroad several times, would be better at understanding different English dialects than someone who doesn't have that kind of experience. Or consider trying to compare, say, Polish and Russian. If you ask someone older, who grew up during the Soviet era and was exposed to Russian extensively, they're likely to view Polish and Russian as more similar than a younger person with less regular exposure to Russian would.
So, how do you know what's a language and what's a dialect?
As we've seen, linguists do have a clear definition of what constitutes a language versus a dialect, but that definition, on the ground, is a lot more complicated than it might appear at first glance. Unfortunately, that means I can't leave you with a clear, surefire way of determining what's a language and what's a dialect. As mentioned at the outset, much of what we refer to today as languages are groups of dialects, often drawn along geopolitical lines, with one of those dialects seen as the standard version.
All of this is to say that, if language versus dialect is important to what you're working on (say, a speech-to-text model), it's important to ask questions of your provider to make sure that you understand how they're referring to languages and dialects and how their models will work for what you need.
* Norwegian has two official written standards, Bokmål (literally, 'book language') and Nynorsk (literally, 'new Norwegian'). I chose Nynorsk here because it's the more different of the two from Swedish and Danish, but you can still see how similar it is.
If you have any feedback about this post, or anything else around Deepgram, we'd love to hear from you. Please let us know in our GitHub discussions.
What's the Difference Between a Language and a Dialect?
- Chris Doty