AI Hallucinations: Bad or Misunderstood?

Ben Luks

Published on 05/08/23
Updated on 10/11/23

Table of Contents

What is Hallucination in AI?
Hallucination: Bug or Feature?
Anthropomorphizing Model (Mis-)Behavior
Hallucinations: a Double-Edged Sword?
Select Bibliography
Measuring and mitigating hallucinations in LLMs
Applications of hallucinations
Hallucinations are scary, sometimes

There has been much excitement surrounding OpenAI’s new AI-powered chatbot, ChatGPT. Its release continues a years-long trend of increasingly large and capable Large Language Models (LLMs). These models’ outputs have grown increasingly responsive, nuanced, and arguably eerily human-like.

The apparent lucidity of these models, for all the praise it has garnered, has raised concerns with regard to safety and ethics. Among these concerns is LLMs’ tendency to output false, yet plausible and coherent, information. This phenomenon, dubbed hallucination, has been brought up with regard to the trustworthiness of LLMs, and how prepared we are, as a society, to invite their use into our day-to-day lives.

What is Hallucination in AI?

Strictly speaking, in the field of Natural Language Generation (NLG), the term hallucination denotes the relaying of untruthful information on a so-called “closed-domain” task, that is, a task whose information is of limited and pre-determined scope. Think of a summary, or an answer to a question, that contains falsehoods. The task could be relaying current events or summarizing a movie. In either case, the model is expected to draw only from a limited scope of source material when making factual claims. Here, the “closed domain” is the background information from which the model is supposed to derive its summary or response.
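To make the closed-domain idea concrete, here is a minimal sketch of a crude faithfulness check: it flags numbers and capitalized words in a summary that never occur in the source text. The function name `unsupported_terms` and the example texts are made up for illustration, and real hallucination detection is far more involved than string matching.

```python
import re

def unsupported_terms(source: str, summary: str) -> list[str]:
    """Return numbers and capitalized words in the summary that are absent from the source."""
    # Every word and number in the source, lowercased so sentence-initial capitals don't matter.
    source_words = set(re.findall(r"\d+(?:\.\d+)?|\w+", source.lower()))
    # Candidate "facts" in the summary: numbers and capitalized words (a rough proxy for names).
    candidates = re.findall(r"\b(?:\d+(?:\.\d+)?|[A-Z]\w+)\b", summary)
    return sorted({c for c in candidates if c.lower() not in source_words})

# Made-up source document and a summary that gets one figure wrong.
source = "Acme reported revenue of 120 million in 2022. The chief executive is Dana Reyes."
summary = "Acme reported revenue of 150 million in 2022 under Dana Reyes."

print(unsupported_terms(source, summary))  # ['150']
```

A check like this only catches terms that are missing outright. A summary can reuse nothing but words from the source and still state something false.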

The term "hallucination," used more broadly in machine learning, doesn’t appear to be so universally ominous. As far as I can tell, the term saw earlier use in the field of computer vision, specifically robotics. This 2010 article on multi-modal object recognition is the earliest use of the term that I can find. The term is applied to the practice of using unlabeled modality pairs to, essentially, “invent” labeled data.

It works as follows: You’re given a bunch of labeled data in one modality, for example black-and-white webcam images containing an object, say a chair or a printer, and a corresponding object label, in this case, “CHAIR” or “PRINTER”. You’re also given unlabeled data that pairs this modality with another one. For this example, we’ll say each unlabeled black-and-white image comes with a high-resolution color image of the same scene. Suffice it to say that these high-resolution images could certainly be of use to an object classifier, but without labels, they can’t just be plugged into the model. That’s where hallucination comes into play: the model uses probabilistic methods to “hallucinate” a high-resolution color image corresponding to each labeled black-and-white image, based on the black-and-white-to-color relationship learned from the unlabeled pairs. The function that generates the made-up hi-res color image is what does the hallucinating. The term is used in computer vision to denote similar techniques; we’ll call them “data-extension” techniques.
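The data flow described above can be sketched in a few lines. Note that this sketch swaps the probabilistic method for a plain nearest-neighbor lookup, and the tiny feature vectors stand in for real images. All names and numbers here are illustrative assumptions, not the cited article’s implementation.

```python
import math

# Unlabeled pairs: (black-and-white features, color features) for the same scene.
unlabeled_pairs = [
    ([0.1, 0.2], [0.9, 0.1, 0.1]),
    ([0.8, 0.9], [0.1, 0.2, 0.9]),
    ([0.5, 0.4], [0.4, 0.8, 0.3]),
]

# Labeled examples exist only in the black-and-white modality.
labeled_gray = [
    ([0.15, 0.25], "CHAIR"),
    ([0.75, 0.85], "PRINTER"),
]

def hallucinate_color(gray):
    """Borrow the color features paired with the closest unlabeled black-and-white vector."""
    _, color = min(unlabeled_pairs, key=lambda pair: math.dist(gray, pair[0]))
    return color

# Each labeled example gains a hallucinated second modality and keeps its label.
augmented = [(gray + hallucinate_color(gray), label) for gray, label in labeled_gray]
for features, label in augmented:
    print(label, features)
```

The classifier would then train on the `augmented` features. The color half of each vector was never observed for that example; it was invented from the relationship seen in the unlabeled pairs.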

In automatic speech recognition and related transcription tasks, hallucinations can sometimes result in humorous misinterpretations of ground-truth data. While usually harmless, they, like all hallucination events, have the potential to cause errors and even, sometimes, offend.

It should be noted that, in real-world applications where accuracy is important (or in some cases, mission-critical), hallucinations can come with serious consequences. However, outside of certain high-stakes production environments, are hallucinations all that bad? Might this phenomenon point toward AI that’s not merely generative, but legitimately creative?

Hallucination: Bug or Feature?

Machine learning is built around generalization. It’s a prime example of “the simplest solution is probably the right one.” Output that is not quite true but still completely plausible is more useful, and frankly more acceptable, in some domains than in others. In text summarization, we want the model to generalize about text structure, syntax, and delivery, but not about falsifiable content, and that is simply too fine a distinction to ask of non-human models.

These LLMs have been trained to predict new text based on lots and lots of text. The base models, trained on no specific task, and applications based on them (such as ChatGPT), trained on seemingly every task, have a pretty open-ended learning regimen by design. It’s what makes them so versatile. We trust that between the breadth of training data and the nuance afforded by the models’ so-many billion parameters, grasping the restriction that summarization outputs be factual is a given. And, for the most part, it works. Until it doesn’t.
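A toy example shows how pure next-word prediction yields fluent falsehoods. The sketch below is a bigram model, a drastic simplification that is not how ChatGPT or any modern LLM is built. It is “trained” on two true sentences, and the code enumerates everything it can generate. The corpus and the function name `generate` are assumptions made for illustration.

```python
from collections import defaultdict

corpus = [
    "paris is the capital of france",
    "rome is the capital of italy",
]

# "Training": record which word may follow which, with sentence boundary markers.
followers = defaultdict(set)
for sentence in corpus:
    words = ["<s>"] + sentence.split() + ["</s>"]
    for prev, nxt in zip(words, words[1:]):
        followers[prev].add(nxt)

def generate(prefix, max_len=10):
    """Yield every sentence the bigram model can produce from this prefix."""
    last = prefix[-1]
    if last == "</s>":
        yield " ".join(prefix[1:-1])
        return
    if len(prefix) > max_len:  # guard against unbounded output on larger corpora
        return
    for nxt in sorted(followers[last]):
        yield from generate(prefix + [nxt], max_len)

outputs = list(generate(["<s>"]))
hallucinated = [s for s in outputs if s not in corpus]
print(hallucinated)
# ['paris is the capital of italy', 'rome is the capital of france']
```

Every word-to-word transition in the two new sentences was seen in training, which is why they read as perfectly plausible. Nothing in the objective distinguishes them from the true ones.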

The authors of InstructGPT, a sibling model to ChatGPT, belabor the point of avoiding hallucinations, which is one of many goals in striving for so-called "alignment," or the model’s ability to honor the user’s prompt. For all the work they put into pursuing this goal, implementing specific safeguards to filter out every possible untruthful output would greatly reduce the diversity of outputs and the versatility of the model in general.

Anthropomorphizing Model (Mis-)Behavior

I can’t help but call attention to the intensity of the choice of metaphor. Hallucination is barely a stone’s throw away from psychosis. It conjures the idea of involuntary, uncontrollable, and possibly violent behavior. It makes me wonder if we’re getting in the habit of using the term so loosely that it loses any specific meaning. Part of me thinks it’ll quickly become the catch-all for fearmongering about unpredictable AI behavior, alienated from its industry-specific meaning. Think of how “GMO” is used in discourse about food sources and agriculture.

Machine learning is, at its core, an optimization problem. That’s frightening to an audience that's looking for a human-to-AI workflow, unmediated by human intervention. But AI isn’t ready for that. The role of AI, as anyone who's been cutting corners on their articles can tell you, is to help us converge towards a solution. For all that a hallucinated output lacks in veracity, it’s immensely informative about the form and delivery.

Have you considered asking yourself why these fallacies are so plausible? Sure, it got the figures wrong in a generated quarterly report about Tesla's earnings. Those are easily verifiable facts. On the other hand, try looking for equally succinct and personalized instructions on writing a quarterly report, demanding only that you do a bare-minimum fact-check. The GPT-4 System Card lists overreliance as a safety concern; part of me wonders if they’re talking down to us.

Hallucinations: a Double-Edged Sword?

We’ve learned that, specifically in the context of large language models, there’s no firewall that separates prompt-faithful content outputs from open-ended wildcards. We’ve also learned that expecting hands-off fact-checking may be a naive misuse of these models altogether. Rather than framing this topic from the perspective of things you can’t do with machine learning, I thought I’d leave you with some examples of strides that are being made because of hallucinations.

Lung tumor segmentation: Segmenting lung tumors in CT scans can be considerably trickier in situations where the tumor is close to soft tissue. MRI scans, on the other hand, contain soft-tissue contrast information that could help mark these distinctions. Hallucinations are used to generate MRIs to supplement CT scans, with a significant performance increase compared to models trained on CT scans alone.
Learning Optimal Paths in Autonomous Navigation: Autonomous navigation (think robotics, self-driving) classically requires loads of labeled data and expert intervention. This method has a navigation system hallucinate obstacles, using these imaginary obstacles as supplementary data which the system can learn to avoid, making it more attuned to recognizing and mitigating real-life obstacles. The authors claim that this approach reduces risk and adapts well to novel environments.