London, 1971. A Hungarian man enters a tobacco shop, looking to buy cigarettes and matches. The man doesn’t speak English, so he uses an English–Hungarian phrasebook. Unfortunately, for some reason, the English translations in the phrasebook are wrong. The customer tries to ask for matches, but instead says nonsensical phrases like “My hovercraft is full of eels” and “My nipples explode with delight.” The frustrated salesperson tries to understand the customer, but things go south when he looks up the Hungarian translation for the price in English. When he voices what is translated into an offensive Hungarian sentence, this earns him a punch in the face from the Hungarian customer.
This is the plot of the Dirty Hungarian Phrasebook sketch from Monty Python, and it ends with the customer’s arrest by the police. Although the sketch is grossly exaggerated – and even though physical dictionaries may be outdated – this scene remains my favorite example of the risks of communicating in a foreign language using an unreliable source of translation.
Fast forward to today, when automatic translation services such as Google Translate and DeepL have led to physical dictionaries becoming, for the most part, obsolete. Yet while such services do a pretty good job in translating between languages, it is generally considered a bad idea to rely on them unconditionally for writing a text in a foreign language you don’t understand.
True, it is unlikely that innocuous sentences will be translated to completely unrelated ones, such as Monty Python’s “Drop your panties, Sir William” (which is another phrase in the dirty Hungarian phrasebook). Nevertheless, evidence suggests that there are countless scenarios of automatic translation gone wrong, even some that similarly end with an arrest. In 2017, a Palestinian construction worker who was working in Israel posted to Facebook a photo of himself leaning against a bulldozer along with the text “يصبحهم.” This phrase, pronounced “ysabechhum,” literally means something like “may God bless them” and was meant as a good morning greeting. Unfortunately, Facebook’s automatic translation translated it as “attack them” in Hebrew and “hurt them” in English, leading to the man’s arrest by Israeli police. The source of confusion could have been the similarity to the phrase “يذبحهم,” pronounced “ydbachhum,” which translates to “slaughter them.” While writing this in 2024, I turned to the website Reverso to see how the phrase “ysabechhum” is used in Arabic, and it was incorrectly flagged as inappropriate content, likely for the same reason.
Most translation mishaps are less disturbing – and can be funny. A Romanian relative who doesn’t speak English once wished us a “happy birthday” on December 31 (coincidentally, his own birthday), because the generic Romanian greeting “La multi ani” (literally “many years”) is used for wishing both “happy birthday” and “happy new year.”
There are many examples on the web where automatic translation of restaurant menus results in hilarious descriptions of dishes. With a quick search, I came across a Chinese menu with an item whose description had been translated to “Fuck the duck until exploded” (please don’t) and a viral Twitter post from an American visitor in a hotel in Saudi Arabia asking for help in deciphering a menu with cryptic English translations, including “She is suspicious of cheese; Not a problem; A period of cream.”
Automatic translation systems have improved immensely in the last several years, but they don’t perform perfectly on every pair of languages and for every type of text. Specifically, they are likely to translate full sentences more accurately than short descriptions such as menu items. While such translation fails should be reason enough to be cautious with automatic translation, let me explain how these systems work – and what their limitations are.
Machine Translation: Is It Rendering Language Learning Obsolete?
Automatic translation, also known as machine translation, started in the 1950s. During the Cold War, IBM developed a system that could translate Russian texts to English. In the early days of machine translation, each pair of languages required human translators to develop lexicons and grammar rules and programmers to code these into the software.
The next generation of automatic translation, in the early 2000s, eliminated the need to rely on human translators to develop the software. Instead, these systems relied on parallel texts in the source and target languages, such as book translations – leveraging the existing labor of human translators. Parallel text sources became the only requirement for developing a translation system for a new pair of languages, so hiring someone proficient in both languages was no longer necessary. For example, translating from Hungarian to English required what linguists call a corpus – a language resource consisting of a large and unstructured set of texts – in Hungarian and its translations in English.
A basic algorithm, which can be applied to any pair of languages, would go through Hungarian sentences and their human-translated English equivalents. For a given pair of sentences, the algorithm then aligns Hungarian phrases with their English counterparts. It counts how many times each phrase in Hungarian is translated into each English phrase throughout the entire text. A phrase in Hungarian might have several translation options to English, and each option is scored according to the number of times the phrases appear in parallel.
Differences in translation of certain phrases can be the result of ambiguity in the source language. For example, the Hungarian word kormány means both “government” and “steering wheel” depending on the context. They can also happen due to lexical variability in the target language; for instance, the Hungarian word különböző might be translated in English to any of the synonyms of “different,” such as “diverse” or “distinct.”
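The counting step can be sketched in a few lines of Python. This is only an illustration, assuming that phrase alignment has already been done: the `aligned_pairs` list and the `build_phrase_table` function are hypothetical names, and the counts are invented toy data rather than figures from a real corpus.

```python
from collections import Counter, defaultdict

# Toy aligned phrase pairs (Hungarian, English). Hypothetical data standing in
# for the output of a phrase-alignment step over a parallel corpus.
aligned_pairs = [
    ("kormány", "government"),
    ("kormány", "government"),
    ("kormány", "steering wheel"),
    ("különböző", "different"),
    ("különböző", "distinct"),
    ("különböző", "different"),
]

def build_phrase_table(pairs):
    """Score each translation option by how often it appears in parallel."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1
    # Convert raw counts into relative frequencies per source phrase.
    table = {}
    for source, targets in counts.items():
        total = sum(targets.values())
        table[source] = {target: n / total for target, n in targets.items()}
    return table

phrase_table = build_phrase_table(aligned_pairs)
for source, options in phrase_table.items():
    print(source, options)
```

In this toy table, “government” ends up with twice the score of “steering wheel” for kormány, simply because it was seen twice as often.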
With each sentence in Hungarian, the system would go through the phrase translation table and come up with multiple English sentence options based on the various English translations of each Hungarian phrase. It would then choose the best translation according to two criteria: faithfulness and fluency.
First, the translation should be as faithful as possible to the original Hungarian sentence. This may be achieved by translating each Hungarian phrase to what is designated as the most frequently used corresponding English phrase in the training corpus. However, this may result in an ungrammatical or nonsensical English sentence. For example, if “government” is a more frequent translation of kormány than “steering wheel,” the system might incorrectly translate the Hungarian equivalent of “Where is the steering wheel in this car?” to “Where is the government in this car?” in English.
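A rough sketch of this failure, assuming a hand-made phrase table: the Hungarian segmentation, the scores, and the function name `most_faithful_translation` are all illustrative assumptions, not the output of a real system.

```python
# Hypothetical phrase table: each Hungarian phrase maps to its English options
# and their relative frequencies in an imagined training corpus.
phrase_table = {
    "hol van": {"where is": 1.0},
    "a kormány": {"the government": 0.8, "the steering wheel": 0.2},
    "ebben az autóban": {"in this car": 1.0},
}

def most_faithful_translation(source_phrases, table):
    """Translate each phrase independently to its most frequent option."""
    return " ".join(max(table[p], key=table[p].get) for p in source_phrases)

source = ["hol van", "a kormány", "ebben az autóban"]
translation = most_faithful_translation(source, phrase_table)
print(translation)  # prints: where is the government in this car
```

Because each phrase is translated in isolation, nothing in this procedure can notice that governments are rarely found in cars.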
To balance this criterion, the system also optimizes fluency in the target language. Fluency is measured with an English language model that estimates the probability of producing a given sentence in English. A simple and familiar illustration of a language model is auto-complete on your phone. You type a sentence in English and the phone suggests the most likely next word. A language model may be used to compute the probability of a sentence in English by multiplying the probabilities of each word given the words that precede it. Language models capture interesting language phenomena. At the very basic level, a grammatically correct sentence such as “he eats pizza” would yield a higher score than the grammatically incorrect sentence “he eat pizza.” Language models even capture some logic, such as scoring “it’s raining outside and the ground is wet” higher than “it’s raining outside and the ground is dry,” and cultural norms, such as scoring “good Italian food” higher than “good British food.”
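To make the idea of scoring fluency concrete, here is a minimal sketch of a language model. It makes several simplifying assumptions that are mine rather than a description of any real system: each word is conditioned only on the single previous word (a “bigram” model), unseen word pairs are handled with add-one smoothing, and the four-sentence corpus is invented.

```python
import math
from collections import Counter

# Tiny made-up English corpus. Real language models are trained on vastly more text.
corpus = [
    "he eats pizza",
    "she eats pasta",
    "he eats pasta",
    "they eat pizza",
]

def train_bigram_model(sentences):
    """Count single words (as contexts) and adjacent word pairs."""
    contexts, pairs = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        contexts.update(tokens[:-1])
        pairs.update(zip(tokens, tokens[1:]))
    vocabulary = {word for s in sentences for word in s.split()} | {"</s>"}
    return contexts, pairs, len(vocabulary)

def sentence_log_prob(sentence, model):
    """Sum the log-probabilities of each word given the previous word."""
    contexts, pairs, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    log_prob = 0.0
    for previous, word in zip(tokens, tokens[1:]):
        # Add-one smoothing keeps unseen pairs from getting zero probability.
        p = (pairs[(previous, word)] + 1) / (contexts[previous] + vocab_size)
        log_prob += math.log(p)
    return log_prob

model = train_bigram_model(corpus)
good = sentence_log_prob("he eats pizza", model)
bad = sentence_log_prob("he eat pizza", model)
print(good > bad)  # prints: True
```

Summing log-probabilities is equivalent to multiplying the probabilities themselves, and avoids the very small numbers that long sentences would otherwise produce.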
Another major advancement in automatic translation happened in 2016 with the switch to neural network-based methods. An artificial neural network, which we will refer to throughout the rest of the book simply as a “neural network,” is a method in artificial intelligence that learns from examples to recognize patterns in the data and make predictions accordingly. This approach is inspired by real neural networks in the human brain, which consist of connected neurons.
An artificial neural network gets an input, which is represented as an array of numbers. The input goes through layers of interconnected (artificial) neurons that transform these numbers by multiplying them with the “weight,” a number associated with each neuron. If the output of any individual node is above the specified “bias,” another number which serves as the threshold value for each neuron, the node sends data to the next layer of the network. The last layer is the output layer, containing the output or the prediction of the network.
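The flow of numbers through the layers can be sketched as follows. This is a deliberately simplified picture: it assumes one weight per incoming connection and a neuron that either fires (1.0) or stays silent (0.0) depending on its threshold. The function names and all the numbers are hypothetical and hand-picked, not learned.

```python
def neuron_output(inputs, weights, threshold):
    """Return 1.0 if the weighted sum of the inputs exceeds the threshold, else 0.0."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1.0 if weighted_sum > threshold else 0.0

def layer_output(inputs, layer):
    """Apply every neuron in a layer to the same inputs."""
    return [neuron_output(inputs, weights, threshold) for weights, threshold in layer]

def network_output(inputs, layers):
    """Pass the inputs through each layer in turn; the last layer is the output."""
    values = inputs
    for layer in layers:
        values = layer_output(values, layer)
    return values

# Each neuron is a (weights, threshold) pair.
hidden_layer = [([0.5, 0.5], 1.0), ([1.0, -1.0], 0.0)]
output_layer = [([1.0, 1.0], 0.5)]

prediction = network_output([1.0, 2.0], [hidden_layer, output_layer])
print(prediction)  # prints: [1.0]
```

Modern networks typically replace the hard on/off threshold with smoother functions, but the layer-by-layer structure is the same.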
To give a concrete example, one can train a neural network to recognize whether a certain email is spam or not. The input in this case would be an array of numbers representing the email. For example, we can count how many times each word in the English language appeared in the email. You can guess that certain words such as “win” and “free” would tend to appear more in spam emails, so predicting whether an email is spam or not based on the words inside the email is a reasonable thing to do. The output of this network would be a single number indicating how likely the email is to be spam.
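As a rough sketch of this input and output, assume a vocabulary of only four words and a single neuron with hand-picked weights. `VOCABULARY`, `WEIGHTS`, and `BIAS` are invented for illustration; a real system would count many thousands of words and learn its weights from data.

```python
import math
import re
from collections import Counter

# Hypothetical four-word vocabulary with hand-picked (not learned) weights.
VOCABULARY = ["win", "free", "meeting", "report"]
WEIGHTS = [1.5, 1.2, -1.0, -0.8]
BIAS = -0.5

def email_to_counts(text):
    """Represent an email as an array of word counts, one per vocabulary word."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [counts[word] for word in VOCABULARY]

def spam_score(text):
    """Squash the weighted sum of word counts into a number between 0 and 1."""
    features = email_to_counts(text)
    total = sum(x * w for x, w in zip(features, WEIGHTS)) + BIAS
    return 1 / (1 + math.exp(-total))

print(email_to_counts("Win a FREE phone! Win now."))  # prints: [2, 1, 0, 0]
print(spam_score("Win a FREE phone! Win now.") > 0.5)  # prints: True
print(spam_score("Meeting report attached") > 0.5)     # prints: False
```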
The appealing property of neural networks is that they can learn various functions from examples – predicting if an email is spam or not, recognizing objects in an image, predicting whether a patient has a certain illness, and so on. All they require is data, in the form of inputs and their corresponding outputs. To be able to use a neural network for performing a specific task, it first needs to be trained. During training, the network observes training inputs and expected outputs and calibrates its weights and biases to correctly predict the expected outputs. Once trained to perform a specific task, the network can be applied to new inputs to predict an output.
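Training can be illustrated with one of the oldest and simplest learning rules, the perceptron update, applied to a single neuron. This is a sketch under my own assumptions rather than the procedure used by modern systems: the four training examples are invented, each input holds the counts of the words “win” and “free,” and the label is 1 for spam and 0 otherwise.

```python
# Invented training data: ([count of "win", count of "free"], label).
examples = [
    ([2, 1], 1),
    ([1, 1], 1),
    ([0, 0], 0),
    ([0, 1], 0),
]

def predict(inputs, weights, bias):
    """Output 1 if the weighted sum plus the bias is positive, else 0."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0

def train(data, epochs=10, learning_rate=1.0):
    """Nudge the weights and bias whenever a prediction is wrong."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, expected in data:
            error = expected - predict(inputs, weights, bias)
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

weights, bias = train(examples)

# Apply the trained neuron to inputs it has not seen before.
print(predict([3, 0], weights, bias))  # prints: 1
print(predict([0, 2], weights, bias))  # prints: 0
```

On this toy data the neuron learns that “win” is the telling word, since “free” appears in both spam and non-spam examples.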
To go back to translation, neural translation systems, like the previously described statistical translation systems, also rely on parallel text resources. However, instead of using the same algorithm for all pairs of languages, the system learns from the data a custom translation function for each pair of languages. The system’s architecture is based on two neural networks. The first network, the encoder, gets a Hungarian sentence and encodes it into a vector – an array of numbers which captures the sentence’s meaning but is indecipherable by people. The second network, the decoder, receives this vector that conveys the meaning of the Hungarian sentence and turns it into English, word by word. To add a new pair of languages, all the programmers need to do is train the network on a parallel body of text. With enough parallel texts to train the network, it can become an optimal translator with the ability to perform well on unseen sentences.
The release of Google Translate’s neural models in 2016 led to significant performance improvements: a “60 percent reduction in translation errors on several popular language pairs” [1]. The language pairs were English to Spanish, English to French, English to Chinese, Spanish to English, French to English, and Chinese to English. All these languages are considered high-resource languages, or in simple terms, languages for which there are massive volumes of texts available, for example, from book translations and Wikipedias.
Much more challenging are low-resource languages, that is, languages for which there is not enough text available on the web. Neural networks are data-hungry and training them with a small amount of data isn’t likely to result in an optimal solution.
In 2018, popular media expressed worries about Google Translate spitting out some religious nonsense, completely unrelated to the source text. At the time, I demonstrated this phenomenon for Igbo, a low-resource African language spoken primarily in southeastern Nigeria, as the source language. I am not an Igbo speaker, and I wrote what is clearly not an Igbo sentence: “i i i i i i i i […]” – seventy-six i’s separated by spaces. A human translator presented with such an input would respond along the lines of “I’m sorry, this is not a valid Igbo sentence.” Ideally, a translation system should do the same. However, Google Translate instead presented me with the following English text: “As it is written in the book of the law of Moses, which was in the wilderness, which was before the man who did the work of the kingdom of Israel.” A slightly different gibberish input in Igbo was translated into the question: “Who has been using these technologies for a long time?” Hopefully not the Igbo speakers looking to translate their words into English.
It is not a coincidence that automatic translation systems often translate phrases from low-resource languages into unrelated religious texts in English. After all, they are trained on pairs of sentences, such as a source sentence in Igbo and a target sentence in English. What they are not trained to do is recognize inputs that are not valid Igbo sentences. It would be much more useful if the translator could spot such cases – and respond with something like “I honestly have no idea what you want from me.” Instead, the translator always assumes the input is valid. Even when it is given unrecognizable and nonsensical inputs, such as “i i i i i i i i […],” it still tries to provide a fluent translation – and ends up “hallucinating” sentences.
Why religious texts? Since religious texts like the Bible and the Qur’an exist in many languages, they probably make up a large portion of the available training data for translations to or from low-resource languages.
One solution for improving translation accuracy for low-resource language pairs is to go through a third language. For example, the training data between Hungarian and Igbo may be too scarce to result in a reasonably performing translation model. However, there is plenty of accurate training data for Hungarian–English translations, and just enough for English–Igbo training. So instead of aiming for a direct Hungarian–Igbo translation, the translator would first translate from Hungarian to English and then from English to Igbo.
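Going through a third language amounts to chaining two translators. The sketch below assumes word-by-word dictionary lookup as a stand-in for real translation models; the dictionaries hold just two toy entries each, and the helper names `make_word_translator` and `pivot_translate` are hypothetical.

```python
# Toy dictionaries standing in for trained translation models.
HUNGARIAN_TO_ENGLISH = {"víz": "water", "ház": "house"}
ENGLISH_TO_IGBO = {"water": "mmiri", "house": "ụlọ"}

def make_word_translator(dictionary):
    """Build a translator that looks up each word and keeps unknown words as-is."""
    def translate(sentence):
        return " ".join(dictionary.get(word, word) for word in sentence.split())
    return translate

def pivot_translate(sentence, source_to_pivot, pivot_to_target):
    """Translate into the pivot language first, then into the target language."""
    return pivot_to_target(source_to_pivot(sentence))

hungarian_to_english = make_word_translator(HUNGARIAN_TO_ENGLISH)
english_to_igbo = make_word_translator(ENGLISH_TO_IGBO)

result = pivot_translate("víz", hungarian_to_english, english_to_igbo)
print(result)  # prints: mmiri
```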
While this is a reasonable solution, it increases the risk of meanings getting lost in translation. As a student, I had an assignment in machine translation class in which I implemented a “bad translator.” The bad translator receives an English text and translates it back to English through a chain of random languages, such as English to Czech to Swahili to Arabic to Hindi to English. Due to propagating errors, the output is sometimes nonsensical or completely different from the input. This is what I got by inputting some of the Ten Commandments:
“Thou shalt not kill” was translated to “You must remove.”
“Thou shalt not make unto thee any graven image” to “You can move the portrait.”
“Thou shalt not commit adultery” to “Because you’re here, try three.”
And “Thou shalt not steal” to “Woman.” [Footnote 1]
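The way errors propagate through a chain can be seen even in a two-step round trip. The sketch below is not the original assignment; it assumes single-word dictionary lookup and relies on the fact that Hebrew uses one word, יד, for both “hand” and “arm.” The function name `translate_chain` is hypothetical.

```python
from functools import reduce

# Toy lookup tables. Hebrew has a single word for both "hand" and "arm",
# so the distinction cannot survive the trip back to English.
ENGLISH_TO_HEBREW = {"arm": "יד", "hand": "יד"}
HEBREW_TO_ENGLISH = {"יד": "hand"}

def translate_chain(word, steps):
    """Pass a word through each lookup table in order, keeping unknown words as-is."""
    return reduce(lambda current, table: table.get(current, current), steps, word)

round_trip = translate_chain("arm", [ENGLISH_TO_HEBREW, HEBREW_TO_ENGLISH])
print(round_trip)  # prints: hand
```

Each additional language in the chain adds more opportunities for this kind of loss, and the losses compound.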
In sum, automatic translation tools like Google Translate have improved immensely in recent years. Although they are very useful, they don’t work equally well for every pair of languages and every genre and topic. Blindly relying on automatic translation can cause embarrassment and misunderstanding. For this reason, automatic translation doesn’t yet make second language acquisition obsolete.
Thinking in Your Native Language Makes You Sound Foreign
Mastering a second language means being able to think in that language rather than translating your thoughts from your native language. The language of our thoughts affects our word choice and grammatical constructions, so going through another language might result in incorrect or unnatural sentences.
Let me give you an example from my native language: Hebrew. I read in an online article the imperative phrase “Do sports and eat balanced.” While this is understandable, it doesn’t sound right in English. Given that the author had an Israeli name, I could easily reverse-engineer the English sentence and reconstruct their Hebrew thoughts.
First, the sentence was missing a noun. It should have read “eat a balanced diet.” In Hebrew, omitting the noun is common practice, and the word “diet” is implied. In English, an adjective such as “balanced” can only modify a noun. If “balanced” was meant to modify the verb “eat,” it should have been an adverb, but I don’t think that “balancedly” is a word. Second, the word choice is odd because in English the word “sports” typically refers to competitive sports. The author likely meant “fitness” or “exercise,” which also translates to “sport” in Hebrew. Finally, starting a sentence with the word “do” might prompt the reader to look for a question, as in “Do sports and eat balanced …?” This would be less confusing in speech, when the speaker can emphasize different words in the sentence to convey the intended grammatical role of “do.” An emphasis on “do” implies an imperative, whereas an emphasis on “sports” implies a question. Either way, a literal translation of this weird sentence back to Hebrew reads perfectly normally.
While I could be smug about noticing other people’s errors, my English is not unaffected by Hebrew. I once used the phrase “private case” instead of “special case” in an academic paper, literally translating the corresponding Hebrew phrase. It went unnoticed by my coauthors, one of whom was a native English speaker, but was later pointed out by another Israeli researcher.
Not Every Concept Is Translatable
English is a very rich language. The number of English words currently in use, based on the number of entries in the Merriam-Webster dictionary, is close to half a million. This is an order of magnitude higher than the number of words in Hebrew. In 2010, the Academy of the Hebrew Language estimated that there are 45,000 words in Hebrew. It is therefore no wonder I often find myself amazed by the specificity of English words.
Before I demonstrate this point, let us take a detour and discuss the issue of counting how many words exist in a certain language. This is not a trivial exercise for several reasons. First, words can be borrowed from other languages. English borrowed many words from French. If someone tried to estimate the number of English words in the early seventeenth century, they wouldn’t consider the French word “fatigue” as one of them. But at a certain point in time, this word became part of the English language.
Second, the dictionary typically only lists the base form of a word, which is in singular form for nouns (e.g. dog but not dogs) and in the root form for verbs (e.g. run but not running). Not only is it hard to count exactly how many words a certain language has, but we might underestimate the number of words in some languages more than in others. This is because some languages are morphologically richer than others. Morphology is an area in linguistics that studies how words are formed from smaller units such as stems, suffixes, and prefixes. Morphologically rich languages add suffixes and prefixes to words to mark plurality (plural or singular), grammatical gender (masculine or feminine), tenses in verbs (past, present, future), and more. Hebrew is morphologically rich, and conversely, English is morphologically poor in comparison. For example, however you conjugate the verb walk in English you will end up with one of the following four variants: walk, walks, walked, and walking. In comparison, the equivalent Hebrew verb has twenty-one variants. It’s likely, therefore, that the number of Hebrew words is substantially higher than the number of dictionary entries, slightly reducing the gap from English.
Third, there are other borderline cases. Should we count noun compounds like “avocado oil”? What about “ice cream”? While one can understand the meaning of “avocado oil” as “oil made of avocado” based on familiarity with the meanings of “avocado” and “oil,” the meaning of “ice cream” goes beyond the meaning of “ice” and “cream” and merits its own dictionary entry.
Finally, should we double count dictionary entries with multiple senses, such as “band”? Counting entries in the dictionary is therefore nothing but a reasonable proxy for the actual number of words in the language.
Interestingly, the 1989 edition of the Oxford English Dictionary put the number of words in English at a significantly smaller 171,476. Equating the number of dictionary entries with the number of words in a language comes with caveats, and nobody really thinks that the number of English words has tripled since the 1980s, but the gap also reflects the evolution of language. Living languages keep evolving and new words are formed all the time.
The American Dialect Society (ADS) holds a “word of the year” contest every year. Some interesting words that came top in my lifetime include web (1995), Y2K (1999), 9–11 (2001), weapons of mass destruction (2002), metrosexual (2003), red/blue/purple state (2004), subprime (2007), tweet (2009), app (2010), hashtag (2012), fake news (2017), pronouns in the context of gender identity (2019), and covid (2020). It’s interesting to note that many of these are not even “words” in the traditional sense but rather acronyms, dates, and social concepts.
It is therefore expected that languages with a larger number of speakers will evolve and add new words more frequently than languages with fewer speakers. English has around 1.35 billion speakers worldwide, fewer than a third of them native speakers. It is followed by Mandarin Chinese (1.12 billion), Hindi (600 million), Spanish (543 million), and Arabic (274 million) [2].
Personally, after decades of learning English, I’m still constantly amazed by the richness of the English vocabulary. The more I master English, the more domains I discover that have many distinct words describing nuances of the same Hebrew word. I remember a sign at the entrance to the US embassy in Tel Aviv that said that you may walk in with a wallet but you have to store your purse. I didn’t even know that “wallet” and “purse” referred to different things and was using these words interchangeably up to that point.
Indeed, taking different words for bags as an example demonstrates how specific English can get. Searching online, I found forty-six such words in English (e.g. backpack, luggage, suitcase) excluding multiword expressions like “vanity case.” Translating these words to Hebrew yielded forty-three unique terms. Many of the Hebrew translations were incorrect. For example, carry-on was translated as תמשיך הלאה = the imperative “carry on.” Some others translated to a multiword expression, for example תיק יד = hand bag. After removing incorrect translations and multiword expressions, I was left with twelve valid Hebrew words for bags, around a quarter of the number of English words.
Occasionally, the fact that your native language translates various different concepts to the same word may feel embarrassing. A Hebrew-speaking relative once told the nurse he wanted to get his flu shot in his left hand. “Hand” and “arm” are both translated in Hebrew to יד (“yad”), and the interpretation of “yad” as “hand” or “arm” depends on the context. Similarly, “leg” and “foot” are both translated to רגל (“regel”).
I sometimes find myself using a general English term in place of a more specific one that I don’t know. Using a general word instead of a specific one, which might be more suitable in the context, is typical for low-proficiency English learners of various native languages [3]. They make frequent use of general-purpose verbs, such as “do” and “make” [4]. When they need to refer to nouns, it’s easiest to default to a general noun such as “thing” or “stuff.” My Croatian friend Ana once said that when she lacks an English noun or can’t quickly retrieve it, she defaults to “shit.” So she might sit at the dinner table with her friends and ask someone to pass her “the shit.”
I assume that this is not a universal EFL experience as many languages have a richer vocabulary than English. Using the number of dictionary entries as a proxy for the number of words in a given language, we find that English surprisingly comes up only seventh after Korean (1,100,373 words), Portuguese (818,000), Finnish (800,000), Kurdish (735,320), Swedish (600,000), and Icelandic (560,000) [5]. But even for native languages with a poorer vocabulary than English, such as Hebrew, equivalent words may be missing. This typically happens with cultural concepts not relevant for most English speakers. For translating such words, I use a similar strategy of replacing them with a more general term – and potentially elaborating. Sometimes, I simply talk around it. Hebrew has a word for “enjoy your new thing” that you’d say to someone who wears a new shirt or just got a haircut. In English, I’m either specific about the new thing (“nice haircut!”) or withholding my firgun (a Hebrew word for making someone feel good without any ulterior motives) altogether.
Is This What You Meant to Say? Malapropism, Mispronunciation, and Mondegreens
A different kind of problem is when you think you know a word but you confuse it with another word or pronounce it incorrectly. In the real world, outside Monty Python sketches, it’s less common to replace a word like “matches” with an unrelated word like “eels.” It’s far more common to replace it with a similarly sounding word or to mispronounce it so it sounds like a different word. For example, instead of “matches,” the Hungarian customer could have said “watches,” but that would have resulted in a less hilarious exchange, with the seller simply directing the customer to the nearby watch shop.
One type of error, called malapropism, is the use of an incorrect word in place of a word with a similar sound. A common example is “should of” instead of “should have.” In many cases, malapropisms result in nonsensical or humorous utterances, which are funnier than replacing “matches” with “watches.” In fact, there is an entire subreddit dedicated to funny malapropisms.
Commonly, relatively long or complicated words are replaced by similarly long or complicated words, as in “syntactic wigs” instead of “synthetic wigs.” Or take the famous “shoplifters will be prostituted” signs. Is it just me or is this a rather harsh punishment?
In many examples, people use words they have heard but have never seen written. As a result, they spell them like other words, as in “tattoo diabetes” instead of “type two diabetes,” “I can still smell his colon” instead of “cologne,” or “lemonade a paper” instead of “laminate a paper.”
I have also seen some examples in which a word has been replaced by a less commonly used one. One user wrote that she made “synonym rolls,” posting a mouthwatering picture of cinnamon rolls. Another user was selling a “shuffle” rather than a “shovel.”
Donald Trump has been quoted saying, “I hope they now go and take a look at the oranges. The oranges of the, uh, uh, investigation,” rather than “the origins.” This inspired me to add “oranges of the investigation” to my list of potential names for my imaginary future rock band.
Sometimes malapropisms can get you in trouble, such as when you confuse the words “condemn” and “condone.” This happened when the English Football League accidentally condoned instead of condemned a Birmingham City fan’s assault on Aston Villa captain Jack Grealish. It also happened to the fictional character Dev in the TV show Master of None, played by Aziz Ansari. Asked about allegations of sexual harassment perpetrated by his boss, he accidentally condoned instead of condemned such behavior.
Interestingly, many of the examples I found online came from native English speakers. Does that mean foreigners who utter an occasional wrong word will not be judged too harshly? Can we feel safer as long as we try not to speak too correctly? [Footnote 2]
Even when you choose the right word, you might still mispronounce it. English is not a phonetic language. Similarly spelled words can have completely different pronunciations, as in “though” (pronounced “thoe”) and “tough” (pronounced “tuff”). Conversely, different spellings may have the same pronunciation in some English dialects. For example, many English speakers pronounce “aunt” like “ant.” I made the opposite mistake when I rented a car in California and noticed there were ants in the car after leaving the car rental agency. I called to complain about the “aunts in my car.” I think the only reason the agent understood me was that I wasn’t the first client to complain about ants in the car.
Finally, even if you choose the right word and you know how it should be pronounced, a foreign accent may make it sound different. This may even result in uttering an undesirable word. Imagine that the Hungarian phrasebook contained valid translations, with the misunderstanding coming from the mispronunciation of the English terms. For example, he could’ve said the price, “six and six,” as “sex and sex,” resulting in a different kind of humor. One notorious vowel in English is the short i, as in the word “sit.” Hebrew, Italian, most dialects of French, and many other languages don’t have this vowel. For years, I had no idea that “sit” is pronounced differently from “seat.” For the same reason, I thought it might be better if I avoided using the words “beach” and “sheet.”
Just as you can mispronounce a word, you can also mishear a word as another – or as a completely nonexistent word. Mondegreens are such new words created by mishearing, often of song lyrics. The term was coined by the American writer Sylvia Wright when, as a girl, she misheard the phrase “laid him on the green” in the Scottish ballad “The Bonny Earl of Murray” as “Lady Mondegreen” [6]. Only a couple of years ago, I realized that the Nirvana song was called “On a Plain” rather than “On a Plane,” and it’s still playing in my head sometimes when I’m on a plane about to take off. Old habits die hard. I should feel no shame since mishearing lyrics is a common phenomenon. The website Kiss This Guy, named after a mishearing of the line “kiss the sky” in Jimi Hendrix’s “Purple Haze,” collects such funny anecdotes. As of September 2024, more than 35,000 submissions have been received. My favorite example is a mishearing of the line “making love with his ego” from David Bowie’s “Ziggy Stardust” as “making love with his eagle.”
Mishearing a word as another can happen to anyone, but how about perceiving it as a completely nonsensical expression? Kiss This Guy lists a mishearing of “these five words I swear to you” from Bon Jovi’s “I’ll Be There for You” as “Fee Fi Fo I swear to you.” There are many people who grew up outside of America but who were influenced by American culture. They listened to American pop or rock music but didn’t quite understand the lyrics due to their limited English vocabulary. They may have made up some nonwords that sounded like the lyrics – and remembered them. After improving their English skills, they may have come across these songs again and realized what the lyrics actually are. I’m guessing this is less of a problem now, when lyrics are much more accessible online, but this is surely something that ’80s and ’90s kids could identify with.
As a new learner, when you hear a sentence with multiple unfamiliar words, you might not even be able to correctly separate the words. In grade 7, it took me months to correctly split “I beg your pardon” into words. I was sure that my English teacher Anne, with her British accent, was saying “abega pardon.” Not that I had any idea what she meant.
Mondegreens are the opposite of knowing a word from books but not knowing how to pronounce it. The latter used to happen to me far more often before I switched to audiobooks. Now, I often discover additional words that have various pronunciations depending on the specific English dialect or region of the narrator.
The Right Word May Be Used in the Wrong Context
It’s tempting to think that English words have many synonyms when, in truth, most should be considered “near-synonyms.” They don’t mean exactly the same thing, and they are often used in different contexts. Take the difference between “hot” and “warm,” for example. One time, an American friend complained it was cold, so I asked him whether he had brought “hotter” clothes. The correct word in this context would have been “warmer.” He chuckled because it sounded like I was saying that his clothes were not chic or sexy enough. An alternative version of a “hot shirt” would be a shirt that’s been heated in the oven, as Cosmo Kramer did in “The Calzone” Seinfeld episode.
Many English synonyms exist because each word is common in a certain English dialect, such as “elevator” in American English versus “lift” in British English. Dialects are variations of the same language spoken in different regions or by people from different social backgrounds. Even within the same dialect, speakers may choose to use a word over its synonym based on the word’s social and historical associations [7]. Take for example Republicans versus Democrats in the US. Discussing the same topic, Democrats may use the term “undocumented workers” while Republicans may prefer “illegal aliens.” Each of these terms carries a social implication about what the policy toward these individuals should be, and the choice of one term over the other signals that the speaker belongs to a certain social group.
Language is not merely a communication system where words straightforwardly map to objects and concepts in the world. When words are listed as synonyms in a thesaurus, we still need to choose the proper word to use in each context – depending on the region, situation, time, participants in the conversation, our own identities, and more.
Using a word in an unsuitable context doesn’t happen to me that often, at least to my knowledge, since I’ve always preferred to err on the side of caution. Depending on my confidence level, I might admit, “I don’t know how to say it in English.” If I’m not entirely certain I’m using the correct word, I would raise the pitch to signal, “I am not certain of what I’m saying.” Now, don’t get me wrong. I’m not encouraging you to be less confident when speaking a foreign language. As The Smiths sing: “Shyness can stop you from doing all the things in life you’d like to.” I admire people who are not afraid of making mistakes when they speak a language they’re not fluent in. This is the best way to learn and improve.
In my work, I’ve been reading a lot of reports and drafts written in English by other EFL speakers. In these writings, I often come across words used inappropriately in their context. I feel that this is not necessarily because the writers didn’t know how to express their thoughts in English. Rather, it is the result of their desire to sound smart. Instead of using a “simple” word they know, writers often search the web for more academic-sounding synonyms. However, since English synonyms are almost never true synonyms, these authors often end up using a word that may have a similar meaning but doesn’t fit the intended context.
I’ve found that the best way to minimize such errors is to conduct a Google search for a short sentence or a phrase and compare a number of results. For example, “wear warm clothes in the winter” versus “wear hot clothes in the winter.” This method is not 100 percent error-proof, because other people on the web make mistakes that you might inadvertently echo. Alternatively, some websites like Reverso offer a search of word usages and translation in natural contexts. If you are trying to sound smart by replacing “simple” words with their synonyms, take into account that you might achieve the opposite result if you use them incorrectly.
My PhD thesis was on English lexical semantics, a field concerning the meaning of words and their relationships with other words. My PhD was in computer science, so I was mostly interested in modeling meaning computationally. In simpler words, as I used to present my research to nonexperts, I was teaching computers that “cat is an animal.”
My research field, natural language processing, was immensely influenced by the distributional hypothesis, attributed to the British linguist John Rupert Firth and the American linguist Zellig Harris. According to the distributional hypothesis, the meaning of a word is not just its definition in the dictionary but rather the way it is commonly used – that is, it is characterized by the (distribution of) words that typically surround it. Or as Firth stated in a 1957 paper: “You shall know a word by the company it keeps” [8].
Thanks to the distributional hypothesis, English learners can understand new English words from the contexts in which they are written or spoken. For this reason, when I started reading English books, I didn’t have to reach for the dictionary for every second word. This is easily demonstrated with made-up words. Researchers at the University of Trento coined the word “wampimuk” to demonstrate this phenomenon. People who heard the sentence “we found a cute, hairy wampimuk sleeping behind the tree” typically guessed that a wampimuk is an animal, and even that it is a mammal [9].
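For readers who think in code, here is a minimal sketch of the same idea. It is not the method used in the Trento study; it is a toy illustration in which the corpus, the window size, and the function names are all my own assumptions. Each word is represented by counts of its neighboring words, and two words are compared by the cosine similarity of those counts.

```python
from collections import Counter
from math import sqrt

# Toy corpus, invented for illustration only.
CORPUS = [
    "we found a cute hairy wampimuk sleeping behind the tree",
    "we found a cute hairy squirrel sleeping behind the tree",
    "a cute squirrel was sleeping in the tree",
    "the hairy dog was sleeping behind the sofa",
    "we bought a wooden table for the kitchen",
    "the wooden table stands in the kitchen",
]


def context_vector(word, sentences, window=2):
    """Count the words appearing within `window` positions of `word`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            if token == word:
                start = max(0, i - window)
                neighbors = tokens[start:i] + tokens[i + 1:i + 1 + window]
                counts.update(neighbors)
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Compare the made-up word with a few known words by the company they keep.
wampimuk = context_vector("wampimuk", CORPUS)
for candidate in ("squirrel", "dog", "table"):
    score = cosine(wampimuk, context_vector(candidate, CORPUS))
    print(f"{candidate}: {score:.2f}")
```

In this tiny corpus, “wampimuk” comes out closest to “squirrel,” somewhat close to “dog,” and shares no context at all with “table.” That is roughly the guess a human reader makes from the surrounding words.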
Children do this too when they acquire their first language. They see things and actions and hear their parents describing these things and actions in words. Over time, they learn this mapping between concepts in the world and the words that describe them. This is called “grounding” and will be discussed in Chapter 6. A child encountering a brown dog and being told that it is a “brown dog” will ground the concept of “brown dog” with the brown dog in front of them. However, the child might still not have a clear idea of what a “dog” is and what “brown” is. If they see a different-colored dog and are told it’s a “black dog,” or when they encounter a different brown animal like a “brown cat,” they gather more evidence related to dogs and brown things [10]. They also learn to compose new phrases describing the color of an animal. Upon seeing a white cat for the first time, they will be able to describe it correctly. While this may sound very basic, it applies to how EFL speakers can approach unfamiliar compositional phrases, especially in the earlier learning stages.
I listen to audiobooks on my commute and, although this happens less frequently these days, I sometimes hear a new word. In most cases, I won’t bother looking it up. I will simply assume I understand the word based on its context. Due to my lack of familiarity with the word, I will not rush to use it on the first plausible occasion. And again, when I do, it will be spoken in a high-pitched manner to signal I’m not an idiot – only an EFL with limited confidence. Nevertheless, I’ve noticed that I have a certain subconscious threshold for the number of times I need to hear a word before I feel confident enough to use it. On occasion, I find myself uttering a word and I’m retrospectively surprised that I knew this word. Typically, I would then try to observe the listener’s reaction to determine whether I correctly cracked the code of that word’s meaning. Then again, North Americans might be too polite to correct me.
“Work English” versus Small Talk: Our English Proficiency Is Domain-Specific
We all have certain topics that are easier for us to discuss in English. For me, these are work-related topics. In tech companies and in academia, it is common to use English in written communication even in various non-English-speaking countries. In my alma mater, classes were given in Hebrew but the accompanying slides were in English. The result was that my professional English improved significantly. So much so that, at some point, giving a presentation in Hebrew became more difficult than giving the same presentation in English. Talking about my work in Hebrew meant I had to translate the technical terms in my head. I wasn’t the only one facing this challenge. Most of my Israeli colleagues who were living abroad politely asked to give presentations in English when they were invited to speak in Israel. Indeed, research in educational linguistics backs up this finding. If your only exposure to work-related terms is in a foreign language, you might find it difficult to think and talk about these terms in your native language [11].
At the same time, obtaining a decent level of proficiency in your work vocabulary might give you the illusion of a general proficiency. This illusion is bound to shatter as soon as you need to use the foreign language for anything other than work. For me, the first shock came at an international conference, when I realized there were many mundane English words missing from my vocabulary. Despite that challenge, I’ve made many good friends at international conferences, where engaging in small talk as well as deeper conversation about life helped to improve my day-to-day English.
When I moved to the US, I had to extend my vocabulary across multiple domains. I joined a gym and started taking fitness classes, which over time extended my knowledge about the words describing body parts. This was also useful when I visited a clinic and could more accurately describe where I was experiencing pain – although in retrospect, I realize that my pronunciation made it sound as if I had pain in my “heap.”
When I had to buy products whose names I didn’t know, I searched their descriptions online. Many search engines today implement a semantic search. That is, they retrieve not only search results containing the verbatim search query but also those with similar terms. Thanks to the semantic search in many shopping websites, I managed to find “stool under the desk” (footrest), “curtains that block the sun” (blackout curtains), and “shower holder for shampoo and soap” (shower caddy). Even something as mundane as getting a haircut had me searching ahead of time for the term for cutting just a little bit of hair (trim). Other immigrants I talked with shared horror stories about getting the wrong haircut because they lacked the vocabulary to explain what they wanted. Luckily, I like to be prepared.
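To make the contrast with verbatim matching concrete, here is a rough sketch. Real shopping sites typically rely on learned text embeddings, which I am not reproducing here; as a stand-in, this toy version ranks products by word overlap (cosine similarity over word counts) between the query and a product description. The catalog, the descriptions, the stopword list, and the function names are all hypothetical.

```python
from collections import Counter
from math import sqrt

# Hypothetical catalog: product name -> description (invented for illustration).
CATALOG = {
    "footrest": "a low stool placed under the desk to rest your feet",
    "blackout curtains": "thick curtains that block the sun and keep the room dark",
    "shower caddy": "a holder that hangs in the shower for shampoo and soap",
}

# Very common words that carry little meaning on their own.
STOPWORDS = {"a", "the", "that", "for", "and", "to", "in", "your"}


def bag_of_words(text):
    """Lowercase the text and count its non-stopword tokens."""
    return Counter(w for w in text.lower().split() if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def verbatim_search(query):
    """Return products whose name contains the exact query string."""
    return [name for name in CATALOG if query.lower() in name]


def similarity_search(query):
    """Return the product whose description best overlaps with the query."""
    q = bag_of_words(query)
    return max(CATALOG, key=lambda name: cosine(q, bag_of_words(CATALOG[name])))


for query in (
    "stool under the desk",
    "curtains that block the sun",
    "shower holder for shampoo and soap",
):
    print(query, "|", verbatim_search(query), "|", similarity_search(query))
```

The verbatim search comes back empty for all three of my queries because none of them contains the product’s actual name, while the overlap-based search lands on the footrest, the blackout curtains, and the shower caddy. An embedding-based system goes further and can match words that are merely related rather than identical.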
After two years in the US, I had to enlist the assistance of search engines for shopping expeditions less frequently, and then I moved to Canada. The vocabulary I learned in the US largely transferred to my daily life in Canada, although some terms were still different … what on earth is a toque?