
Legal AI Hallucination Rates 2026

Legal AI hallucinations 2026: Lexis+ AI 17%, Westlaw AI 33%, GPT-4 43% (Stanford). General-purpose models hallucinate on 58-88% of legal queries. Nearly 1,000 documented court-filing incidents. Snapshot for 2026-05-15.

By Ramanath, CTO & Co-Founder at Presenc AI · Last updated: May 2026

What this is

Legal AI hallucinations are the most-studied domain-specific failure mode of large language models. Stanford published the foundational results in 2024 and follow-up work in 2025; the picture in 2026 is that purpose-built legal AI tools hallucinate less than general models but still produce wrong answers at meaningful rates. This page is a 2026-05-15 reference snapshot.

Hallucination Rates by Tool

Tool / Model | Hallucination rate | Source
Lexis+ AI | 17% | Stanford 2025 (Magesh et al.)
Westlaw AI-Assisted Research | 33% | Stanford 2025
GPT-4 (general purpose) | 43% | Stanford 2025 (legal-RAG eval)
GPT-4 (general purpose, no RAG) | 58% | Stanford 2024 ("Large Legal Fictions")
GPT-3.5 | 69% | Stanford 2024
Llama 2 | 88% | Stanford 2024
Harvey (reported community estimate) | ~17% (1-in-6 queries) | Tao An / Medium aggregation
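
To make the gaps concrete, here is a quick arithmetic check of the relative reductions implied by the table. This is a minimal Python sketch that uses only the Stanford rates above; it introduces no new data.

```python
# Relative reduction in hallucination rate vs GPT-4 (legal-RAG eval),
# computed from the Stanford 2025 figures in the table above.
gpt4, lexis, westlaw = 0.43, 0.17, 0.33

print(f"Lexis+ AI:  {(gpt4 - lexis) / gpt4:.0%} lower than GPT-4")   # 60%
print(f"Westlaw AI: {(gpt4 - westlaw) / gpt4:.0%} lower than GPT-4") # 23%
```

These two figures (a 60% and a 23% reduction) are what claim 1 in the list below refers to.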

Failure Mode Categories

Category | What goes wrong
Fabricated case citations | Plausible-sounding case names and citations that do not exist
Misattributed holdings | Real case cited, but the proposition attached to it is wrong
Wrong jurisdiction | Citation belongs to a different state/circuit than claimed
Outdated authority | Cited case has been overruled or superseded
Misquoted statutory text | Statute reference correct, but the language is paraphrased and changed

Court-Filing Incidents

Metric | Value
Cumulative documented filings with AI hallucinations | Approaching 1,000
Practitioner vs self-represented mix | Both, with the practitioner share rising
Typical sanction | Monetary sanctions + bar referrals
Jurisdictional response | Standing orders requiring AI-use disclosure (multiple US districts)

Six Things the Data Tells You

  1. Purpose-built legal AI hallucinates far less than general-purpose LLMs, but the gap varies by tool: Lexis+ AI's 17% is about a 60% reduction from GPT-4's 43% on the same eval, while Westlaw AI's 33% is only about a quarter lower.
  2. 17% is still high enough to require lawyer review. No legal AI tool in 2026 is reliable enough to submit unchecked.
  3. Citations are the highest-risk surface. Fabricated case names are the most common failure and the easiest to detect with a Westlaw / Lexis lookup; a sketch of that check follows this list.
  4. Court sanctions are accelerating. The cumulative documented case count is approaching 1,000, with monetary sanctions and bar referrals now routine.
  5. Standing-order disclosure rules are spreading. Multiple US districts now require AI-use disclosure on filings.
  6. Harvey-class tools are under-evaluated. Stanford's study did not cover Harvey directly; community estimates suggest ~1-in-6, consistent with Lexis+ AI.
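
As promised in item 3, here is a minimal sketch of a pre-filing citation check: extract every volume/reporter/page citation string from a draft so a human can look each one up in Westlaw or Lexis. The regex, the reporter list, and the `extract_citations` helper are illustrative assumptions, not a real verifier, and nothing here queries an actual citation database.

```python
import re

# Minimal pre-filing check: pull volume/reporter/page citations out of
# AI-generated text so each can be verified manually in Westlaw or Lexis.
# The reporter list is illustrative, not exhaustive.
REPORTERS = [
    r"U\.S\.", r"S\. Ct\.", r"F\.4th", r"F\.3d", r"F\.2d",
    r"F\. Supp\. 3d", r"F\. Supp\. 2d", r"F\. Supp\.",
]
CITATION_RE = re.compile(
    r"\b(\d{1,4})\s+(" + "|".join(REPORTERS) + r")\s+(\d{1,5})\b"
)

def extract_citations(text: str) -> list[str]:
    """Return each volume/reporter/page citation string found in the text."""
    return [" ".join(m.groups()) for m in CITATION_RE.finditer(text)]

# Example draft mixing a real case (Mata v. Avianca) with the fabricated
# citation that made that case famous.
draft = (
    "See Mata v. Avianca, Inc., 678 F. Supp. 3d 443 (S.D.N.Y. 2023); "
    "cf. Varghese v. China S. Airlines Co., 925 F.3d 1339 (11th Cir. 2019)."
)
for cite in extract_citations(draft):
    print(f"VERIFY MANUALLY: {cite}")  # fabricated cites won't resolve
```

Extraction is the easy half; the lookup itself still has to be done by a person (or a citator), which is exactly why fabricated case names are the most detectable failure mode.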

What This Means for AI Visibility

Legal AI hallucinations don't just affect end users; they also shape how brands and companies are referenced in legal AI answers. A brand that is mis-cited in a Westlaw AI response (e.g., wrong product, wrong jurisdiction) can suffer material business impact if that response feeds a court filing. Brands that are mis-described inside Lexis+ AI or Harvey have a direct interest in correcting the upstream sources those tools draw on.

Methodology

Hallucination-rate figures are sourced from Stanford HAI's legal-models-hallucinate study, the Magesh et al. 2025 paper in the Journal of Empirical Legal Studies, and LegalAIWorld's plain-English summary. Court-filing incident data are drawn from the AI Law Librarians 2026 review and LLRX's 2026 roundup on hallucinations in legal research.

How Presenc AI Helps

Brands monitor how they are described in legal AI tools (Westlaw AI, Lexis+ AI, Harvey, CoCounsel) using Presenc AI's vertical-aware citation monitoring. Misrepresentations are flagged with the prompt that triggered them, so legal-content and PR teams can correct the source content before the wrong answer lands in a filing.

Frequently Asked Questions

How often do legal AI tools hallucinate?
Purpose-built tools hallucinate at 17% (Lexis+ AI) to 33% (Westlaw AI-Assisted Research) per Stanford's 2025 study. General-purpose models hallucinate at 43% (GPT-4 with legal RAG) to 88% (Llama 2 with no RAG) on the same evaluations.

Have AI hallucinations actually reached court filings?
Yes. There are approaching 1,000 documented cases in which practitioners or self-represented litigants submitted filings containing AI-generated hallucinations. Monetary sanctions and bar referrals are now routine, and multiple US districts require AI-use disclosure.

How often does Harvey hallucinate?
Community estimates suggest about 1-in-6 queries (~17%), consistent with Lexis+ AI on Stanford's evaluation. Harvey was not directly evaluated in Stanford's 2025 study because access was restricted.

How can lawyers reduce hallucination risk?
Use purpose-built legal AI rather than general LLMs (Stanford's figures imply a 23-60% reduction in hallucination rate), always verify citations against Westlaw or Lexis, and treat AI output as a research starting point rather than a finished memo. Several US districts now require disclosure of AI use on filings.

Track Your AI Visibility

See how your brand appears across ChatGPT, Claude, Perplexity, and other AI platforms. Start monitoring today.