If you have ever asked a model a simple factual question and gotten a confident, wrong answer, you already know the pain. Ask for a researcher’s birthday and you might get three different dates. Ask how many Ds are in DEEPSEEK and you might get two, then three, then six. The obvious question is simple: why do LLMs hallucinate? The honest answer is not magic. It is statistics, training objectives, and the way we grade models. That is the core finding of the new OpenAI hallucination paper, and it lands with the clarity of a good code review. Fix the incentives, then the behavior changes.
1. The One-Sentence Model Of The Problem

Here is the short version of why LLMs hallucinate. During pretraining, a model learns to imitate fluent language, not to separate true from false. During post-training, we grade it with binary right-or-wrong tests that punish “I don’t know.” So when the model is unsure, the score-maximizing move is to guess. That is not a bug. That is an optimization target doing what we told it to do. The OpenAI hallucination paper shows both parts clearly, then argues for a socio-technical fix: change the way we score mainstream evaluations so that honesty wins over bluffing.
2. Pretraining, How Fluent Guessers Are Born
Pretraining teaches a model to fit the distribution of text. It is density estimation, not truth detection. That alone explains a big slice of LLM hallucination. Some patterns, like spelling and parentheses, are abundant and regular. The model nails them. Other patterns, like one-off facts, are sparse and patternless. Think birthdays, obscure titles, or idiosyncratic numbers. You cannot generalize reliably from one sighting of a birthday to all future mentions of that person’s birthday. When the data has no signal, the model has to guess. That is what causes AI hallucinations at their root during pretraining.
2.1 Arbitrary Facts And The “Singleton” Trap
The paper formalizes a case many practitioners intuit. If a fact appears exactly once in the pretraining corpus, a fluent generator will still produce an answer, yet it lacks the statistical footing to be right consistently. The authors connect generation to a simpler binary task: “Is this output valid?” If you cannot reliably classify validity, you cannot reliably generate only valid answers. This reduction ties LLM hallucination examples like birthdays directly to learnability. If your data shows a long tail of singletons, your base model will hallucinate at least that often on those items. This is not a defect. It is a bound.
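To make the singleton idea concrete, here is a minimal sketch, assuming a toy corpus where each string stands for one mention of a fact; the `singleton_rate` helper and the example data are illustrative, not the paper’s code. The fraction of facts seen exactly once is the kind of quantity the paper’s lower bound is built from.

```python
from collections import Counter

def singleton_rate(fact_mentions):
    """Fraction of distinct facts that appear exactly once in the corpus.

    If a fact shows up only once, there is no statistical signal to
    generalize from, which is the regime where the bound bites.
    """
    counts = Counter(fact_mentions)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(counts) if counts else 0.0

# Toy corpus: each string stands for one mention of a (person, fact) pair.
mentions = [
    "kalai_birthday", "kalai_birthday",   # repeated fact, learnable
    "obscure_author_birthday",            # singleton, no signal
    "one_off_dissertation_title",         # singleton, no signal
]
print(f"singleton rate: {singleton_rate(mentions):.2f}")  # 0.67
```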
2.2 Poor Models And Tokenization Friction
Not every error is an arbitrary-fact error. Some are model-mismatch errors. Letter counting is a good example. If the tokenizer splits “DEEPSEEK” as D, EEP, SEEK, a non-reasoning model might miscount Ds. Reasoning-oriented systems that step through characters do better. This is a classic representation issue, not a mystery about why LLMs hallucinate. Change the tool, and the error drops.
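A quick sketch of the mechanical fix, assuming you can run plain Python on the raw string before or instead of asking the model; counting characters directly never sees tokens, so the representation problem disappears.

```python
def count_letter(text: str, letter: str) -> int:
    """Deterministic character-level count, immune to tokenization."""
    return sum(1 for ch in text.upper() if ch == letter.upper())

# A tokenizer might expose "DEEPSEEK" to the model as pieces like
# ["D", "EEP", "SEEK"], hiding the individual characters. Counting over
# the raw string sidesteps the representation issue entirely.
print(count_letter("DEEPSEEK", "d"))  # 1
```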
3. Post-Training, Why Tests Quietly Reward Guessing

Now we get to the uncomfortable part. Even if you fix pretraining with better retrieval, better tools, and cleaner data, post-training can put hallucinations back in play. The field’s favorite exams use binary scoring, accuracy or pass rate, where saying “IDK” earns zero, the same as a wrong answer. Under that system, a model that guesses when uncertain will beat a calibrated model that abstains when appropriate. This is the incentive at the heart of AI model overconfidence. We built leaderboards that reward overconfident bluffing. The models noticed.
3.1 The Multiple-Choice Logic Everyone Forgets
Imagine a multiple-choice question. If your chance of being right is 25 percent, guessing yields an expected score of 0.25. If abstention always scores 0, guessing dominates. That is the whole story behind one stubborn piece of why LLMs hallucinate. We made the scoreboard, then we trained to it. The paper calls it an epidemic of penalizing uncertainty. The cure is to stop rewarding lucky guesses over honest uncertainty.
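Here is that arithmetic in a few lines, a sketch assuming the usual setup of 1 point for a correct answer, 0 for IDK, and an optional deduction for a wrong answer.

```python
def expected_score(p_correct: float, wrong_penalty: float = 0.0) -> float:
    """Expected score of answering when you are right with probability p.

    Binary accuracy is wrong_penalty = 0, where guessing always beats
    abstaining, since abstaining scores exactly 0.
    """
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty

p = 0.25  # pure guess on a four-option multiple-choice question
print(expected_score(p))                      # 0.25, so guessing wins
print(expected_score(p, wrong_penalty=1.0))   # -0.50, so abstaining wins
```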
3.2 What Benchmarks Actually Do
Most widely used evaluations are binary. Many do not credit IDK. A few rubric-graded sets give partial credit, yet even there bluffing can slip through. If you have ever tuned for leaderboard accuracy, you have felt this pressure. It bleeds into prompts, policy, and system messages. That is why the question of why LLMs hallucinate keeps returning as a theme in production, even when offline metrics look solid. The incentives are off by a few degrees. The outcomes drift.
4. Table, Guessing Beats IDK Under Binary Grading
The scoring math is simple, and it explains why LLMs hallucinate under today’s exams. The table below shows when guessing beats abstaining, given a confidence threshold.
Confidence Threshold t | Penalty For A Wrong Answer | Expected Score Of Guessing At Confidence p | Better Strategy When p ≤ t |
---|---|---|---|
0.00, binary accuracy | 0 points | p, versus 0 for IDK | Always guess |
0.50 | 1 point | p − (1 − p), versus 0 for IDK | Abstain unless p > 0.50 |
0.75 | 3 points | p − 3(1 − p), versus 0 for IDK | Abstain unless p > 0.75 |
0.90 | 9 points | p − 9(1 − p), versus 0 for IDK | Abstain unless p > 0.90 |
When t sits at zero, which is standard accuracy, guessing is always rational. The penalty that makes t the break-even confidence is t / (1 − t), which is where the 1, 3, and 9 point deductions come from. Set a clear threshold in the instructions, then bluffing stops being the dominant strategy. That small switch changes the gradient that models chase during alignment. This is a direct lever for how to reduce LLM hallucinations.
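A small sketch of that decision rule, assuming the scheme above of +1 for a correct answer, 0 for IDK, and a t / (1 − t) deduction for a wrong answer; the function names are mine, not an official API.

```python
def penalty_for_threshold(t: float) -> float:
    """Wrong-answer penalty that makes t the break-even confidence.

    With +1 for a correct answer, 0 for IDK, and -t/(1-t) for a wrong
    answer, guessing has positive expected value exactly when p > t.
    """
    return t / (1.0 - t)

def should_answer(p: float, t: float) -> bool:
    """Answer only if the expected score of guessing beats abstaining (0)."""
    return p - (1.0 - p) * penalty_for_threshold(t) > 0

for t in (0.5, 0.75, 0.9):
    print(t, round(penalty_for_threshold(t), 1), should_answer(0.6, t))
# 0.5 1.0 True | 0.75 3.0 False | 0.9 9.0 False
```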
5. Table, Do Benchmarks Credit Honesty
Here is a compact view of mainstream tests, paraphrased from the OpenAI hallucination paper. It shows that LLM hallucination is partly a scoreboard issue.
Benchmark | Primary Scoring | Binary Accuracy | IDK Credit |
---|---|---|---|
GPQA | Multiple-choice accuracy | Yes | None |
MMLU-Pro | Multiple-choice accuracy | Yes | None |
BBH | Multiple-choice or exact match | Yes | None |
SWE-bench | Patch passes unit tests | Yes | None |
WildBench | LM-graded rubric | No | Partial, rubric dependent |
As long as binary accuracy dominates the field, abstention is punished and guessing is rewarded. If you care about trustworthy AI, you need to care about how we grade.
6. What Causes AI Hallucinations, A Practical Map
Let’s put the causes in one place and keep it crisp.
6.1 Sparse Facts And Missing Signal

Some facts are essentially random at language scale, like low-frequency birthdays or one-off titles. When the signal is absent, a fluent model can only produce a plausible candidate. That creates LLM hallucination even if the model is calibrated. This is a core ingredient in why LLMs hallucinate.
6.2 Model Mismatch And Representation
Tokenization, context limits, and weak intermediate reasoning create errors that look like hallucinations. They are not metaphysical. They are engineering details. Change the representation or the reasoning path, and that subclass drops.
6.3 Garbage In, Garbage Out
Large corpora contain errors. Fluent imitation faithfully reproduces some of them. That yields another stream of LLM hallucination examples. Cleaning helps, but it does not remove the incentives that post-training adds.
6.4 Distribution Shift
Prompts drift from training. Edge cases appear. Under pressure, a binary-graded model guesses. That amplifies AI model overconfidence when the context is unfamiliar.
7. How To Reduce LLM Hallucinations, An Actionable Playbook
You can act today, without waiting for new architectures.
7.1 Put Confidence Targets Into Prompts
Tell the model the evaluation rule. “Answer only if you’re above 75 percent confident, wrong answers cost three points, IDK gets zero.” This flips the payoff. It aligns the conversational policy with the metric. It is the most direct change suggested by the OpenAI hallucination paper. It belongs in the system prompt of critical workflows. It is also trivial to audit. Track accuracy and abstention at multiple thresholds. That gives you a picture of behavioral calibration. It also gives your users a safer interaction by default.
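Here is a minimal sketch of that workflow, assuming the t / (1 − t) penalty rule; the prompt wording, the record fields, and the `audit` helper are illustrative assumptions, not a real evaluation harness.

```python
def confidence_target_prompt(t: float) -> str:
    """Evaluation rule to place in the system prompt, spelled out for the model."""
    penalty = t / (1 - t)
    return (
        f"Answer only if you are more than {t:.0%} confident. "
        f"A correct answer earns 1 point, a wrong answer costs {penalty:.1f} "
        f"points, and 'I don't know' earns 0 points."
    )

def audit(records):
    """records: dicts with 'answered' (bool) and 'correct' (bool) per item."""
    attempted = [r for r in records if r["answered"]]
    accuracy = (
        sum(r["correct"] for r in attempted) / len(attempted) if attempted else 0.0
    )
    return {
        "abstention_rate": 1 - len(attempted) / len(records),
        "accuracy_on_attempted": accuracy,
    }

print(confidence_target_prompt(0.75))
print(audit([
    {"answered": True, "correct": True},
    {"answered": True, "correct": False},
    {"answered": False, "correct": False},
]))
```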
7.2 Penalize Confident Errors More Than Uncertainty
In your internal evals, score wrong answers beneath IDK. Do not hide this rule. State it. Models adapt to clear constraints faster than you think. This alone removes a visible share of the hallucinations you see in production.
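A sketch of such a grading rule; the exact point values are an assumption, and the ordering correct > IDK > wrong is the part that matters.

```python
from typing import Optional

def grade(answer: Optional[str], gold: str) -> float:
    """Internal grading rule: correct > IDK > confident error."""
    if answer is None:                       # explicit abstention ("I don't know")
        return 0.0
    if answer.strip().lower() == gold.strip().lower():
        return 1.0
    return -2.0                              # wrong answer scores below abstention

print(grade(None, "Paris"))      # 0.0
print(grade("Paris", "Paris"))   # 1.0
print(grade("Lyon", "Paris"))    # -2.0
```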
7.3 Separate Metrics, Accuracy, Error Rate, Abstention
Stop publishing a single accuracy number. Publish three. Accuracy on attempted items. Error rate, which is the hallucination rate. Abstention rate. This prevents a model from gaming a leaderboard with reckless attempts. It also helps your buyers judge trustworthy AI claims with more context.
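A sketch of the three-number report, assuming each item is labeled correct, wrong, or abstain; the labels and field names are illustrative.

```python
def eval_report(results):
    """results: list of 'correct', 'wrong', or 'abstain' outcomes, one per item."""
    n = len(results)
    attempted = [r for r in results if r != "abstain"]
    return {
        # Accuracy over attempted items only, so abstentions do not dilute it.
        "accuracy_on_attempted": (
            attempted.count("correct") / len(attempted) if attempted else 0.0
        ),
        # Errors over all items: this is the hallucination-style number.
        "error_rate": results.count("wrong") / n,
        # How often the model declined to answer.
        "abstention_rate": results.count("abstain") / n,
    }

print(eval_report(["correct", "wrong", "abstain", "correct", "abstain"]))
# accuracy_on_attempted ≈ 0.67, error_rate 0.2, abstention_rate 0.4
```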
7.4 Use Simple Tools, Then Layer Reasoning
For letters, counts, and transforms, use functions that operate at the character level or use verified tools. For multi-step problems, require chain-of-thought internally, then return a short answer to the user. The point is not mysticism. It is mechanical sympathy. Design a path that makes wrong answers less likely. This trims the chunk of hallucinations that is really model mismatch.
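A minimal routing sketch along those lines, with a hypothetical `call_model` stand-in for whatever LLM client you use; character-level questions get answered mechanically, everything else falls through to the model.

```python
import re

def count_letter(text: str, letter: str) -> int:
    return text.upper().count(letter.upper())

def call_model(question: str) -> str:
    # Hypothetical stand-in for an LLM call; swap in your client of choice.
    return "model answer goes here"

def answer(question: str) -> str:
    m = re.match(r"how many (\w)s are in (\w+)\??", question, re.IGNORECASE)
    if m:
        # Character-level task: answer mechanically, no sampling involved.
        return str(count_letter(m.group(2), m.group(1)))
    return call_model(question)

print(answer("How many Ds are in DEEPSEEK?"))      # "1"
print(answer("Summarize the paper in one line."))  # falls through to the model
```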
7.5 Retrieval And Guardrails With Teeth
Ground answers with retrieval when the question is fact-heavy. Add a checker that rejects outputs without citations for high-risk domains. And do not let the checker grade with pure right or wrong if abstention is available. Give partial credit for “cannot verify,” then return a graceful fallback. This is how to reduce LLM hallucinations while staying friendly to the user experience.
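A hedged sketch of such a checker, assuming a made-up `[source: ...]` citation convention and a three-way verdict instead of pass or fail; none of this is a standard API.

```python
import re

CITATION = re.compile(r"\[source:\s*[^\]]+\]")

def check(answer: str, high_risk: bool) -> str:
    """Return 'pass', 'cannot_verify', or 'reject' rather than a binary verdict."""
    if not high_risk or CITATION.search(answer):
        return "pass"
    if "cannot verify" in answer.lower() or "i don't know" in answer.lower():
        return "cannot_verify"   # honest uncertainty earns partial credit
    return "reject"              # confident, uncited claim in a high-risk domain

def respond(answer: str, high_risk: bool) -> str:
    if check(answer, high_risk) == "reject":
        return "I couldn't verify that. Here is what I can confirm, with sources."
    return answer

print(respond("The dosage is 50mg.", high_risk=True))                         # fallback
print(respond("The dosage is 50mg. [source: product label]", high_risk=True))  # passes
```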
8. Why Do LLMs Hallucinate, The Cultural Fix We Keep Avoiding
The authors make a case that will likely age well. Do not invent more boutique hallucination leaderboards while the big leaderboards keep rewarding bluffing. Modify the heavy hitters first. Add confidence targets to them. Give models a clear instruction to abstain when confidence is below a threshold. Then grade accordingly. Once the mainstream metrics stop penalizing uncertainty, alignment teams will have air cover to reduce overconfident answers without losing rank. That is the quiet answer to why LLMs hallucinate at scale. It is not only a model problem. It is a culture problem about how we measure progress.
9. A Grounded Way To Talk About Hallucinations
There is a lot of mythology around LLM hallucination. This paper pulls it back to the ground. Hallucinations are not a mystical glitch. They are predictable statistical errors under the objectives and scoreboards we chose. Change the scoreboard, then the optimizer behaves differently. Keep the scoreboard as is, and hallucinations will keep returning no matter how big the model gets. That clarity lets teams stop wasting time on folk remedies and focus on leverage points that move metrics and outcomes together.
10. Closing, Let’s Reward Models For Knowing Their Limits
If you are a developer, add confidence targets to your prompts today. If you run evaluations, publish accuracy, error, and abstention together. If you run a leaderboard, add an IDK credit right now. That is how trustworthy AI becomes more than a tagline. It also answers the question of why LLMs hallucinate with a plan, not a shrug.
If this helped, share it with the person who writes your evals and the person who signs off on your model’s goals. Then send me your before and after charts. Let’s make honesty the winning strategy, not the losing one.
Citation:
Kalai, Adam Tauman, Ofir Nachum, Santosh S. Vempala, and Edwin Zhang. 2025. Why Language Models Hallucinate. San Francisco: OpenAI, September 5.
What is an example of an LLM hallucination?
A well-known example comes from the paper itself. Ask a model for author Adam Tauman Kalai’s birthday and it may confidently return several different dates, all wrong. Ask for his PhD dissertation title and it may invent plausible, incorrect titles. These are classic LLM hallucination examples.
What is the main reason LLMs hallucinate, according to OpenAI’s research?
OpenAI’s study answers the question of why LLMs hallucinate in two parts. Pretraining creates statistical errors because the model learns to imitate fluent text, not truth. Post-training and leaderboards score accuracy without rewarding uncertainty, so models guess instead of saying IDK.
What is the difference between a hallucination and a simple mistake?
A simple mistake looks like a typo or a small slip. A hallucination is a confident, fluent, but false statement. In other words, a plausible falsehood. The paper and recent surveys define LLM hallucination as nonfactual output that sounds correct, which makes it harder to spot.
How do training and evaluation methods cause AI hallucinations?
Pretraining is density estimation, so the model learns to produce fluent text rather than to separate true statements from false ones, and sparse, one-off facts leave it guessing. Post-training then grades with binary accuracy that gives “I don’t know” zero credit, so a model that bluffs outscores a calibrated model that abstains. Together, the two stages make confident guessing the rational policy.
Can AI hallucinations be completely stopped or prevented?
Not completely. Some questions are inherently unanswerable from training data, and the paper proves lower bounds tied to “singleton” facts. You can lower rates by changing evaluations to reward honest uncertainty, adding confidence targets, and using detection tools, but zero is unrealistic.