What this is
Legal AI hallucinations are among the most-studied domain-specific failure modes of large language models. Stanford published the foundational results in 2024 and follow-up work in 2025; as of 2026, purpose-built legal AI tools hallucinate less than general-purpose models but still produce wrong answers at meaningful rates. This page is a reference snapshot as of 2026-05-15.
Hallucination Rates by Tool
| Tool / Model | Hallucination rate | Source |
|---|---|---|
| Lexis+ AI | 17% | Stanford 2025 (Magesh et al.) |
| Westlaw AI-Assisted Research | 33% | Stanford 2025 |
| GPT-4 (general purpose) | 43% | Stanford 2025 (legal-RAG eval) |
| GPT-4 (general purpose, no RAG) | 58% | Stanford 2024 ("Large Legal Fictions") |
| GPT-3.5 | 69% | Stanford 2024 |
| Llama 2 | 88% | Stanford 2024 |
| Harvey (reported community estimate) | ~17% (1-in-6 queries) | Tao An / Medium aggregation |
Failure Mode Categories
| Category | What goes wrong |
|---|---|
| Fabricated case citations | Plausible-sounding case names and citations that do not exist |
| Misattributed holdings | Real case cited but the proposition attached is wrong |
| Wrong jurisdiction | Citation belongs to a different state/circuit than claimed |
| Outdated authority | Cited case has been overruled or superseded |
| Misquoted statutory text | Statute cited correctly, but its language is paraphrased or altered |
Court-Filing Incidents
| Metric | Value |
|---|---|
| Cumulative documented filings with AI hallucinations | Approaching 1,000 |
| Practitioner vs self-represented mix | Both, with practitioner share rising |
| Typical sanction | Monetary sanctions + bar referrals |
| Jurisdictional response | Standing orders requiring AI-use disclosure (multiple US districts) |
Six Things the Data Tells You
- Purpose-built legal AI cuts hallucinations by more than half at the top end (17% for Lexis+ AI vs 43% for GPT-4 on the same eval), though Westlaw's 33% shows the gain varies by tool.
- 17% is still high enough to require lawyer review. No legal AI tool in 2026 is reliable enough to submit unchecked.
- Citations are the highest-risk surface. Fabricated case names are the most common failure and the easiest to detect with a Westlaw / Lexis lookup (a minimal verification sketch follows this list).
- Court sanctions are accelerating. The cumulative documented case count is approaching 1,000, with monetary sanctions and bar referrals now routine.
- Standing-order disclosure rules are spreading. Multiple US districts now require AI-use disclosure on filings.
- Harvey-class tools are under-evaluated. Stanford's study did not cover Harvey directly; community estimates suggest ~1-in-6, consistent with Lexis+ AI.
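As a concrete illustration of the citation-lookup point above, here is a minimal Python sketch of flagging citations that cannot be verified. The regex, the `flag_unverified_citations` helper, and the verified-citation set are all hypothetical stand-ins; a real workflow would check each citation against Westlaw or Lexis rather than a local list.

```python
import re

# Hypothetical sketch: flag reporter-style citations in an AI answer that cannot
# be verified. The "verified" set below is a stand-in for a real Westlaw / Lexis check.
CITATION_RE = re.compile(
    r"\b\d{1,4}\s+(?:U\.S\.|S\. ?Ct\.|F\.(?:2d|3d|4th)?|F\. ?Supp\.(?: ?[23]d)?)\s+\d{1,4}\b"
)

def flag_unverified_citations(ai_answer: str, verified: set[str]) -> list[str]:
    """Return every citation string in the answer that is not in the verified set."""
    return [c for c in CITATION_RE.findall(ai_answer) if c not in verified]

if __name__ == "__main__":
    answer = ("The point is settled, see Smith v. Jones, 123 F.3d 456 (9th Cir. 1998), "
              "and Roe v. Wade, 410 U.S. 113 (1973).")
    verified = {"410 U.S. 113"}  # pretend only this citation checked out in a citator
    print(flag_unverified_citations(answer, verified))  # -> ['123 F.3d 456']
```

Anything a check like this cannot confirm still needs a human review before it goes into a filing.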
What This Means for AI Visibility
Legal AI hallucinations don't just affect end users; they also shape how brands and companies are referenced in legal AI answers. A mis-citation of a brand in a Westlaw AI response (e.g., wrong product, wrong jurisdiction) can have a material business impact if that response feeds a court filing, and brands that are mis-described inside Lexis+ AI or Harvey have a direct interest in correcting the upstream sources.
Methodology
Hallucination-rate figures are sourced from Stanford HAI's legal-models-hallucinate study, the Magesh et al. 2025 paper in the Journal of Empirical Legal Studies, and LegalAIWorld's plain-English summary. Court-filing incident data are drawn from the AI Law Librarians 2026 review and LLRX's 2026 hallucination-in-legal-research roundup.
How Presenc AI Helps
Brands monitor how they are described in legal AI tools (Westlaw AI, Lexis+ AI, Harvey, CoCounsel) using Presenc AI's vertical-aware citation monitoring. Misrepresentations are flagged with the prompt that triggered them, so legal-content and PR teams can correct the source content before the wrong answer lands in a filing.
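As a rough illustration of the flag-with-prompt idea, the sketch below pairs each monitoring prompt with any expected brand fact missing from the tool's answer. This is not Presenc AI's actual pipeline: the `query_legal_ai` callable and the substring check are placeholder assumptions, and a production system would use the tool's own interface and a far more robust comparison.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Rough sketch of flagging misrepresentations together with the prompt that
# triggered them. Every name here is a placeholder, not a Presenc AI API.

@dataclass
class Flag:
    prompt: str        # the monitoring prompt that triggered the bad answer
    response: str      # the legal AI tool's answer
    missing_fact: str  # the expected brand fact that was absent

def audit_brand_mentions(
    prompts: Iterable[str],
    expected_facts: Iterable[str],
    query_legal_ai: Callable[[str], str],  # e.g. wraps a Westlaw AI / Lexis+ AI session
) -> list[Flag]:
    """Run each prompt and flag answers that omit an expected brand fact."""
    flags = []
    for prompt in prompts:
        response = query_legal_ai(prompt)
        for fact in expected_facts:
            # Crude substring check; real monitoring would use a stronger comparison.
            if fact.lower() not in response.lower():
                flags.append(Flag(prompt, response, fact))
    return flags
```

Keeping the triggering prompt attached to each flag is what lets content teams reproduce the wrong answer and confirm a fix once the upstream source is corrected.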