Glossary

Hallucination (in LLMs)

Discover what causes hallucinations in Large Language Models (LLMs) and explore effective strategies to mitigate inaccuracies in AI-generated content.


Hallucination refers to a phenomenon where a Large Language Model (LLM) generates text that is nonsensical, factually incorrect, or unrelated to the provided input context, despite appearing confident and coherent. These outputs are not grounded in the model's training data or external reality; they are artifacts of the model's attempt to predict the most probable next word or token. Understanding hallucinations is crucial for responsibly developing and deploying Artificial Intelligence (AI) systems, particularly those used for information retrieval or decision-making.

Why Hallucinations Occur

LLMs, often built on architectures like the Transformer, are fundamentally probabilistic models. They learn patterns and relationships from vast amounts of text data during training. However, they lack true understanding or consciousness. Hallucinations can arise from several factors:

  • Training Data Limitations: The model might have been trained on noisy, biased, or incomplete data, leading it to generate plausible-sounding but false statements. The quality of training data significantly impacts output reliability.
  • Model Architecture: The inherent nature of sequence prediction can lead models to prioritize fluency over factual accuracy, sometimes "inventing" details to complete a pattern.
  • Decoding Strategy: The method used to select the next token during generation (e.g., greedy search, beam search, or temperature-based sampling) can influence the likelihood of hallucinations.
  • Lack of Grounding: Models often lack direct access to real-time, verifiable information or a mechanism for grounding their statements in external knowledge bases unless specifically designed with systems like Retrieval-Augmented Generation (RAG).
  • Prompt Ambiguity: Vague or poorly constructed prompts can lead the model down unintended generation paths. Effective prompt engineering is key.
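The decoding-strategy point above can be made concrete with a toy next-token distribution. The vocabulary and probabilities below are invented for illustration: greedy decoding always picks the most probable token, while temperature sampling flattens the distribution and raises the chance of selecting a low-probability (here, fictional) continuation.

```python
import math
import random

# Toy next-token distribution (invented for illustration).
probs = {"Paris": 0.70, "Lyon": 0.20, "Atlantis": 0.10}


def greedy(probs):
    """Greedy decoding: always return the single most probable token."""
    return max(probs, key=probs.get)


def rescale(probs, temperature=1.0):
    """Apply temperature to log-probabilities and renormalize.

    Temperatures above 1.0 flatten the distribution, so unlikely
    tokens (e.g. the fictional 'Atlantis') become more probable.
    """
    logits = {tok: math.log(p) / temperature for tok, p in probs.items()}
    z = sum(math.exp(v) for v in logits.values())
    return {tok: math.exp(v) / z for tok, v in logits.items()}


def sample(probs, temperature=1.0, rng=random):
    """Temperature sampling: draw a token from the rescaled distribution."""
    rescaled = rescale(probs, temperature)
    r = rng.random()
    cumulative = 0.0
    for tok, p in rescaled.items():
        cumulative += p
        if r < cumulative:
            return tok
    return tok  # guard against floating-point round-off


print(greedy(probs))  # always "Paris"
print(rescale(probs, temperature=2.0))  # "Atlantis" probability rises
```

This is one reason production systems expose a temperature setting: lower values keep generation close to the model's highest-confidence predictions, at the cost of less varied output.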

Real-World Examples and Impact

Hallucinations can manifest in various ways, posing risks such as spreading misinformation or eroding user trust.

  1. Fictional Legal Citations: In a widely reported incident, lawyers used a chatbot for legal research, which generated entirely fabricated case citations that were submitted in a court filing. This highlights the danger of relying on LLMs for critical information without verification.
  2. Invented Biographies: An LLM asked to provide a biography for a lesser-known individual might invent details about their life, education, or accomplishments, mixing real facts with plausible but untrue statements. This can be particularly problematic in fields like journalism or academic research.

The impact extends beyond simple errors; it challenges the reliability of AI systems, especially as they become integrated into search engines, virtual assistants, and content creation tools. Addressing this is a core challenge in AI ethics and safety.

Distinguishing Hallucinations

It's important to differentiate hallucinations from other types of errors:

  • Bias: Hallucinations are distinct from bias in AI, which reflects systematic skews learned from the training data (e.g., perpetuating stereotypes). Hallucinations are often more random and nonsensical fabrications.
  • Simple Errors: A model might make a factual error based on outdated information in its training set. A hallucination, however, involves generating information that likely never existed in the training data.
  • Overfitting: While overfitting involves a model learning training data too well and failing to generalize, hallucinations are more about generating novel, incorrect content.

Mitigation Strategies

Researchers and engineers are actively developing methods to reduce LLM hallucinations:

  • Improving Training Data: Curating higher-quality, diverse, and factually accurate datasets.
  • Retrieval-Augmented Generation (RAG): Integrating external knowledge sources to ground responses in verifiable facts. See how RAG works in practice with tools like LangChain.
  • Fine-Tuning: Adapting pre-trained models on specific, high-quality datasets using techniques like fine-tuning or parameter-efficient fine-tuning (PEFT).
  • Prompt Engineering: Designing prompts that guide the model towards factual and contextually relevant answers.
  • Fact-Checking Layers: Implementing post-processing steps to verify generated claims against trusted sources.
  • Confidence Scoring: Training models to output a confidence level for their statements, although this remains challenging. Techniques related to Explainable AI (XAI) can help understand model certainty.
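The RAG strategy listed above can be sketched in a few lines. The corpus, question, and `build_prompt` helper here are all hypothetical, and naive word overlap stands in for the embedding-based vector search a real system would use; the point is only the shape of the pipeline: retrieve a relevant passage, then instruct the model to answer from that passage alone.

```python
# Minimal retrieval-augmented generation (RAG) sketch.
# The corpus, scoring function, and prompt template are illustrative
# stand-ins for a real vector store, embedding model, and LLM call.

CORPUS = [
    "The Eiffel Tower was completed in 1889 for the Paris World's Fair.",
    "The Transformer architecture was introduced in 2017.",
    "YOLO is a family of real-time object detection models.",
]


def retrieve(question: str, corpus: list[str]) -> str:
    """Return the passage sharing the most words with the question
    (a toy stand-in for embedding-based similarity search)."""
    q_words = set(question.lower().split())
    return max(corpus, key=lambda doc: len(q_words & set(doc.lower().split())))


def build_prompt(question: str, context: str) -> str:
    """Ground the model: tell it to answer only from the retrieved text."""
    return (
        "Answer using ONLY the context below. If the answer is not "
        "in the context, say you don't know.\n\n"
        f"Context: {context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


question = "When was the Eiffel Tower completed?"
context = retrieve(question, CORPUS)
print(build_prompt(question, context))
```

Because the prompt carries the supporting passage, a downstream fact-checking layer can also verify the generated answer against that same retrieved text rather than against the model's opaque internal knowledge.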

Understanding and mitigating hallucinations is vital for building trustworthy AI systems that can be safely integrated into various applications, from simple chatbots to complex tools used in machine learning workflows and natural language processing (NLP) tasks. Continuous research and development, including platforms like Ultralytics HUB which facilitate model management and evaluation, are essential in this ongoing effort.
