You don't know this, but you don't speak English. You actually speak Fronglish, a mix of Anglo-Saxon, Norse, and Old French. You eat in French: pork, beef, and poultry are French words. However, you hunt (and slaughter) in English: boar, cow, goose. Surprisingly, while you are born in English, the end of life comes in Viking. Death is taboo in many cultures, and so the words used to refer to "the end" are often euphemisms or foreign words. The verb "to die" is a borrowing from Norse.
In addition to speaking fluent Fronglish, I can also speak Spanglish. I was raised in a more-or-less bilingual home: my mother is Chilean, my father a New Yorker. As a result, I have an easy time in California, where Spanglish is spoken as commonly as English and Spanish. I can read the New Yorker as well as Revueltas, a Chilean magazine. By contrast, being a Fronglish speaker is not sufficient training to understand French or Norwegian. This is different from being a Spanglish speaker. How do we explain this?
In this article, we will explore some of the core concepts of what language is and attempt to structure some decidedly complex human social phenomena. While we will be unable to draw hard and fast lines (because they don’t exist), this exercise is crucial for researchers who hope to use AI techniques to build language models. Because AI relies on well-structured, curated data, the definitions (and assumptions) that guide the structuring and curation make the difference between success, failure, or “bias.”
What is Code-Switching?
Spanglish, a portmanteau of “Spanish” + “English”, is generally considered a form of code-switching. There are other types, of course, such as Hinglish (Hindi and English) and Taglish (Tagalog and English), along with many other mixes. These differ from one another in many ways, linguistic and cultural. But they do have a few things in common.
On its face, code-switching is a straightforward communicative behavior: two or more people speak and, when they speak, they use two or more languages. This would be enough of a definition to disentangle Fronglish from Spanglish, if there existed a reliable definition for the concept “a language.” Sadly, the concept of language is not well defined either.
For example, according to 19th- and 20th-century nationalists, Norwegian and Danish are separate languages, yet I am told that the language of Madrid is the same language as the language of Santiago de Chile. Likewise, the Turkish of the hills of Rize is somehow Turkish, but the language spoken in the capital of Azerbaijan is not Turkish. I think more than one Istanbulite would agree that it is often easier to understand the Azerice of Azerbaijan than the “Rizece” of Rize!
The simple definition of code-switching suggests that an Azerbaijani person who speaks their native tongue in Istanbul is somehow engaged in code-switching rather than speaking in an odd or cool accent. If you consider Azerice and Türkçe to be dialects of one another, then the Azerbaijani living in Istanbul now just speaks Turkish, albeit with a different accent. Here we see that, because we don’t have clear lines between “language” and “dialect”, we have a hard time telling how many languages are being spoken at one time.
In his analysis of the state of Yiddish in the world, Max Weinreich is famously quoted as saying: "A language is a dialect with an army and navy." This pithy quote helps explain why the Danes and Norwegians don't speak the same language and yet the hill-folk of Rize and the pampered urbanites of Bebek in Istanbul somehow do. In the 19th century, nationalists worked hard to differentiate the ex-Viking dialects and drew a hard line where there wasn’t one before.
We need to put boundaries somewhere. Let's imagine that two languages are different so long as they are not mutually intelligible.
“Code-switching is a communicative behavior wherein two or more speakers are able to converse while mixing two languages.”
Great, we solved it. No need to think any further.
But wait: if you go to a taco truck and say, "Can I have three birria tacos and a tamarind Jarrito?" does this count as code-switching? The easy answer is no, but a better answer is, it depends. To understand why it depends, we have to ask ourselves what borrowed words are and why people do the code-switching thing.
This second definition hides a few fascinating assumptions:
This definition does not account for word borrowings.
For it to work, we must assume that the two or more speakers have reasonably comparable understandings of the entirety of the code-switched languages.
Let's unpack these concepts.
Loanwords Aren’t Code-Switching
Using words from a different culture in your speech does not necessarily entail that you are switching back and forth between languages. When you order a taco, refer to the zeitgeist, or even say the word Saturday, you are not engaging in code-switching despite the fact that none of those words is of English origin. No language is pure in any sense of the word. Words, grammar, ideas, technology, art, just about every element in any one culture has been inherited over the millennia through contact with “different” people.
People come into contact with each other for different reasons and for different durations of time. As a result, cultural contact affects language and culture in unpredictable ways. Trade, war, migration, and invasion are four of the myriad reasons why large groups of people leave their homes, travel to distant lands, and interact with each other.
In the Middle Ages, European traders traveled along long difficult roads and plied treacherous waters looking for things to trade. Deals were made, and traders returned with useful and beautiful things.
The European traders who did business with Turks, Indians, Persians, Moroccans, etc. often learned local tongues. When they returned home and spoke of the exotic places and things they had seen, they used their own vernaculars (or Latin), inserting foreign words where needed. While the traders came by these ideas and terms through long-term or iterative contact with “foreign” peoples, the majority of their countrymen never even left their village when they learned the exotic words.
For example, sometime in the Middle Ages Germans became acquainted with a fruit they called "Apfelsine" (Apfel = apple, sine = Chinese), literally "Chinese apple". You call these fruits “oranges”. Many languages got their name for the orange through imitation of pronunciation. English got the word from “norange”, a French interpretation of an Italian word, which came by way of Persians or Arabs. Trace it all the way back and you learn that the fruit and the name came from Tamil lands in southern India, where நரந்தம் originally meant “fragrant fruit”.
In this way, ordering a birria taco, or even a coffee (from Arabic قهوة “kahwa”) in the US does not (usually) represent a code-switching event. We must accept the fact that no culture is “pure” and that most of the goods and ideas we have today were developed by foreigners (from other lands and other times).
So Where Does Code-Switching Come From?
Trade and war can bring very different people into contact and, as a result, new words are often borrowed. However, trade and war don’t always result in prolonged, multigenerational contact between cultures. In scenarios where people do stay, make livelihoods for themselves and, most importantly, reproduce, multigenerational contact leads to very complex cultural power dynamics.
For example, the Anglo-Saxons learned the words beef, pork, and poultry as part of their slavery to the Norman kings. These were the terms for the expensive protein, the food of the kings. Cow, pig, and goose, the names for the animals hunted and raised for the benefit of the elites, remained English. In this scenario, (Norman) French was the language of the court and the nobility, and Latin the language of the church and international diplomacy. The language of the peasant classes remained predominantly Anglo-Saxon and its dialects (which were complex in and of themselves, in part due to the Viking invasions). The farther you were from the court and the clergy, the less likely it was that you spoke French.
In the movie Braveheart, the plucky rebel William Wallace impresses the princess and audiences with his ability to speak French and Latin, a sign that he was educated and a worthy general, not just a peasant. Presumably, the historical William Wallace spoke Scots, some version of Middle English, and, according to Mr. Gibson, Latin and French. At that time, it would be doubtful that the princess would have known much (Middle) English (or Scots), unless of course her wet-nurse had taught her.
When William Wallace was alive, London-based nobility would speak French natively and were schooled in Latin. The servant staff would likely have been familiar with French, and the higher the servant’s post, the better their French would be. At some point in the social hierarchy, there were people who spoke good French and good local Anglo-language, too. Which language is the right one for any task is a matter of complex, situation-specific social dynamics.
A significant portion of the American workforce was born and raised in a Spanish-speaking home. For example, according to the State of California, ~38% of the labor force is “Hispanic” (including me). However, a large fraction of these Spanish speakers who live and work in the US immigrated to the US as adults. According to the US government, in 2018 nearly 11,000,000 Latin Americans (most, but not all Spanish speakers) moved to the US (not all stay). What happens when more than 30% of your labor force speaks Spanish? This is a classic population-level language-contact situation.
Now ask yourself, what would life be like in California if you grew up here but never learned more than 100 words in English (presumably for basic trade/survival purposes)? Instead you chose (or were forced to choose) to speak Spanish almost exclusively. I’d say it could become a rather hard situation. By contrast, a very large number of Californians never learn Spanish! Some only speak English, while others do learn additional tongues at home, such as Hindi, Gujarati, Farsi, Mandarin, Vietnamese, etc. Some, of course, don’t speak English or Spanish as they come from other regions of the world.*
Today, people who grow up in California all learn some dialect of English. English is the prestige language of the state and the US government. Which dialect of English you speak and what other non-English languages you learn depends on who your parents are and where in the state you grew up. If you sit outside the right coffee shop in the Bay Area, or you visit the right bar at the right time, you will hear people speaking in English and Spanish at the same time, switching back and forth. Actually, I also get to hear people code-switching between Spanish and Nahuatl, but that is a whole ‘nother topic!
Sitting by our code-switching coffee sippers, you may hear full sentences in Spanish with the occasional English word and then a full sentence in English with a Spanish word thrown in. You may hear a sentence start in Spanish but end in an English saying. Unless you too are familiar with both languages, especially the sub-dialects being used, you’d be lost in the conversation. The people who talk like this can get by in each of these languages alone.
One person may be a Californio who grew up in ol’ San José. She, let’s call her Azeli, could do business in Spanish, but not super comfortably. Linguists call this kind of speaker “a heritage speaker.” Azeli probably speaks Spanish at home, and in this conversation she will use the sayings, catch phrases, verbs, nouns and quips that all Californios use. Seated with her is Miguel, who came to the States when he was 18 and who speaks really good English. He reads novels in English and Spanish, and so he represents the “ideal” foreign language learner. Our third interlocutor is David, who has been in California for 15 years. His English is not perfect, far from it. He can get along and do business with people who speak English well, but that’s because he works as a day laborer and does not need a large vocabulary or complex grammar.
These folks’ lingua franca is best described as Spanglish. The issues they face and care about require knowledge of both languages since they live in a world where English is the prestige language and Spanish is a widely spoken language and the language of their families, friends and ancestors. Their lives are not lived “wholly” in one language or another.
The choice as to which words, grammar and sayings are uttered in one language or another is made no differently from how monolinguals make their choices. The difference, as stated before, is that people who are acculturated into more than one linguistic tradition can, when in the presence of similarly acculturated people, make use of all the culture, all the symbolism, and all the language held in common.
This same phenomenon is why English is so full of sailing terminology today. English speakers were part of a culture where sailing was a prominent aspect of their lives. Great empires and great fortunes were made by means of sailing. “Try a new tack”, “feeling under the weather”, “give leeway”, “hand over fist” are all examples of sailing-specific language used metaphorically in English today. Instead of a difference in “language” the source of “switching” is related to a way of life with its own vocabulary and grammar.
When people come together and share ideas, they just use the language, phrases, and metaphors they have in common. The fact that some metaphors come from a linguistic tradition that is so radically different from yours that you call it “a foreign language” is purely an accident of history like everything else is. Consider that the term “give some leeway” would be meaningless to you if you had not inherited an English that had been washed in sailing culture for 400 years.
How Code-Switching Helped Create English
By the late 14th century, the Norman nobility in London had been intermarrying with local pre-conquest nobility and as a result had begun to speak some English (Middle English), the language of Chaucer. As the Anglo-Normans gained power, French remained the prestige language. But as the years wore on, the value of Anglo-Saxon increased and they code-switched with the fancy French and Latin, but used the structure of their “native” tongue.
The plebeians probably spoke a less Frenchy form of Middle English but whatever they spoke was not fit to print. Only the language of the fancy people was immortalized on sheep’s skins—parchment. That simplified Anglo-Saxon, full of French terms, was the prestige language of the capital. The farther you got from the throne, the less Frenchy was the Anglo-Saxon you heard. The power of this particular code-switched language became cemented when John Wycliffe used it to translate the Bible.
Do realize that to the average peasant, the fancy code-switched language in Wycliffe’s Bible was as odd as the English-filled Hinglish of fancy Prithviraj Road sounds to farmers in Chhattisgarh. Wycliffe could have chosen a “purer” form of Anglo-Saxon still spoken all over the island, or Gaelic, but instead he chose the vernacular of the (rich) people who commissioned the work. For example, in Wycliffe’s Bible the old Anglo-Saxon word costnung was replaced with the Frenchy “temptation.”
This form of “English” became the prestige language in the British Isles whether or not you spoke Gaelic, Scots, French or anything else. In the 19th and 20th centuries, when the central government ponied up for public education, the language taught was this odd but highly codified mix of languages, not the ancient (and long-forgotten) Anglo-Saxon or foreign French (still a prestige language taught to the rich). What may have begun as code-switching in the market and the English court ended up as a new, highly codified language of a nation state.
Code-Switching: Normal for Humans, Hard for AI
Code-switching is a communicative behavior in which two or more speakers are able to communicate more fully by using words, set phrases, grammar, and other cultural elements from different languages. In many code-switching situations, speakers have at their disposal something like twice the synonyms and grammatical structures available to speakers of any one of the languages involved, which means that there is choice in what gets said and how. Depending on unpredictable sociological factors, code-switching can lead to language change, even to new languages. Most of the time, code-switching is simply a more complete way to share ideas with others when you have more than one language in common.
The notion that all participants need to be well versed in the languages/cultures involved in code-switching is important as it has serious implications for AI. Back in 1100, you could not have understood fancy code-switched Anglo-Saxon/Old French unless you knew both languages. Likewise, today, to create automatic speech recognition for Spanglish or Hinglish, it is not enough to “mix” two “pure” datasets and train a model. This suggests that somehow code-switched speech is more than the sum of the component languages.
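One way to see why “mixing” two “pure” datasets falls short is to count how often the language changes inside a single utterance. The sketch below is only an illustration, not a description of any real ASR pipeline: the per-token language tags, the helper names, and the toy sentences are all assumptions made up for this example, and tagging “coffee” as English is itself a simplification, given how blurry the line between loanword and switch is.

```python
from typing import List, Tuple

# Each utterance is a list of (token, language_tag) pairs.
# The tags ("en", "es") are hand-assigned for illustration only;
# real data would need human or automatic language annotation.
Utterance = List[Tuple[str, str]]


def count_switch_points(utterance: Utterance) -> int:
    """Count positions where the language tag changes between adjacent tokens."""
    tags = [tag for _, tag in utterance]
    return sum(1 for prev, curr in zip(tags, tags[1:]) if prev != curr)


def total_switch_points(corpus: List[Utterance]) -> int:
    """Sum the intra-utterance switch points over a whole corpus."""
    return sum(count_switch_points(utterance) for utterance in corpus)


# Two hypothetical monolingual corpora, one utterance each.
english_only: List[Utterance] = [
    [("can", "en"), ("I", "en"), ("have", "en"), ("a", "en"), ("coffee", "en")],
]
spanish_only: List[Utterance] = [
    [("quiero", "es"), ("un", "es"), ("café", "es")],
]

# "Mixing" the pure datasets just places monolingual utterances side by side.
mixed_pure = english_only + spanish_only

# A hypothetical code-switched utterance.
spanglish: List[Utterance] = [
    [("quiero", "es"), ("un", "es"), ("coffee", "en"), ("to", "en"),
     ("go", "en"), ("por", "es"), ("favor", "es")],
]

print(total_switch_points(mixed_pure))  # 0: no utterance changes language
print(total_switch_points(spanglish))   # 2: es -> en, then en -> es
```

However large the two monolingual corpora get, their union contains zero switch points, so a model trained on it never sees what happens at the switch itself.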
As a result, you have to treat code-switched speech like its own language or dialect. If you have read this essay, you won’t find this assertion very surprising. In our next article we will look at some data science-based approaches to drawing a line between code-switching and other forms of language switching.
* It is important to note that of the dozens of languages that were spoken in what today is California in 1700, only a few have survived to the present day as spoken tongues. According to Wikipedia, the native California tongue with the most speakers is Yurok, with around 400 native speakers alive today. The military invasions of the 18th and 19th centuries brought new diseases and a new political order to the western part of North America. The social value of knowing Spanish, and then English, increased greatly at the expense of the languages that had been spoken in the region for hundreds of years prior to the invasions.
What is Code-Switching? And How Did it Make English?
- Morris Gevirtz