What this is
Legal AI hallucinations are among the most-studied domain-specific failure modes of large language models. Stanford published the foundational results in 2024 and follow-up work in 2025; as of 2026, purpose-built legal AI tools hallucinate less than general-purpose models but still produce wrong answers at meaningful rates. This page is a reference snapshot as of 2026-05-15.
Hallucination Rates by Tool
| Tool / Model | Hallucination rate | Source |
|---|---|---|
| Lexis+ AI | 17% | Stanford 2025 (Magesh et al.) |
| Westlaw AI-Assisted Research | 33% | Stanford 2025 |
| GPT-4 (general purpose) | 43% | Stanford 2025 (legal-RAG eval) |
| GPT-4 (general purpose, no RAG) | 58% | Stanford 2024 ("Large Legal Fictions") |
| GPT-3.5 | 69% | Stanford 2024 |
| Llama 2 | 88% | Stanford 2024 |
| Harvey (reported community estimate) | ~17% (1-in-6 queries) | Tao An / Medium aggregation |
Failure Mode Categories
| Category | What goes wrong |
|---|---|
| Fabricated case citations | Plausible-sounding case names and citations that do not exist |
| Misattributed holdings | Real case cited but the proposition attached is wrong |
| Wrong jurisdiction | Citation belongs to a different state/circuit than claimed |
| Outdated authority | Cited case has been overruled or superseded |
| Misquoted statutory text | Statute cited correctly, but its language is paraphrased or altered |
Court-Filing Incidents
| Metric | Value |
|---|---|
| Cumulative documented filings with AI hallucinations | Approaching 1,000 |
| Practitioner vs self-represented mix | Both, with practitioner share rising |
| Typical sanction | Monetary sanctions + bar referrals |
| Jurisdictional response | Standing orders requiring AI-use disclosure (multiple US districts) |
Six Things the Data Tells You
- Purpose-built legal AI cuts hallucinations by more than half at the top end (17% for Lexis+ AI vs 43% for GPT-4 on the same eval), though Westlaw's 33% shows the gain varies by tool.
- 17% is still high enough to require lawyer review. No legal AI tool in 2026 is reliable enough to submit unchecked.
- Citations are the highest-risk surface. Fabricated case names are the most common failure and the easiest to detect with a Westlaw / Lexis lookup (a minimal verification sketch follows this list).
- Court sanctions are accelerating. The cumulative documented case count is approaching 1,000, with monetary sanctions and bar referrals now routine.
- Standing-order disclosure rules are spreading. Multiple US districts now require AI-use disclosure on filings.
- Harvey-class tools are under-evaluated. Stanford's study did not cover Harvey directly; community estimates suggest ~1-in-6, consistent with Lexis+ AI.
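As a concrete illustration of the citation-lookup point above, here is a minimal Python sketch of flagging citations that cannot be verified. The regex, the `flag_unverified_citations` helper, and the verified-citation set are all hypothetical stand-ins; a real workflow would check each citation against Westlaw or Lexis rather than a local list.

```python
import re

# Hypothetical sketch: flag reporter-style citations in an AI answer that cannot
# be verified. The "verified" set below is a stand-in for a real Westlaw / Lexis check.
CITATION_RE = re.compile(
    r"\b\d{1,4}\s+(?:U\.S\.|S\. ?Ct\.|F\.(?:2d|3d|4th)?|F\. ?Supp\.(?: ?[23]d)?)\s+\d{1,4}\b"
)

def flag_unverified_citations(ai_answer: str, verified: set[str]) -> list[str]:
    """Return every citation string in the answer that is not in the verified set."""
    return [c for c in CITATION_RE.findall(ai_answer) if c not in verified]

if __name__ == "__main__":
    answer = ("The point is settled, see Smith v. Jones, 123 F.3d 456 (9th Cir. 1998), "
              "and Roe v. Wade, 410 U.S. 113 (1973).")
    verified = {"410 U.S. 113"}  # pretend only this citation checked out in a citator
    print(flag_unverified_citations(answer, verified))  # -> ['123 F.3d 456']
```

Anything a check like this cannot confirm still needs a human review before it goes into a filing.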
What This Means for AI Visibility
Legal AI hallucinations don't just affect end users; they also shape how brands and companies are referenced in legal AI answers. A mis-citation of a brand in a Westlaw AI response (e.g., wrong product, wrong jurisdiction) can have a material business impact if that response feeds a court filing, and brands that are mis-described inside Lexis+ AI or Harvey have a direct interest in correcting the upstream sources.
Methodology
Hallucination-rate figures are sourced from Stanford HAI's legal-models-hallucinate study, the Magesh et al. 2025 paper in the Journal of Empirical Legal Studies, and LegalAIWorld's plain-English summary. Court-filing incident data are drawn from the AI Law Librarians 2026 review and LLRX's 2026 hallucination-in-legal-research roundup.
How Presenc AI Helps
Brands monitor how they are described in legal AI tools (Westlaw AI, Lexis+ AI, Harvey, CoCounsel) using Presenc AI's vertical-aware citation monitoring. Misrepresentations are flagged with the prompt that triggered them, so legal-content and PR teams can correct the source content before the wrong answer lands in a filing.
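As a rough illustration of the flag-with-prompt idea, the sketch below pairs each monitoring prompt with any expected brand fact missing from the tool's answer. This is not Presenc AI's actual pipeline: the `query_legal_ai` callable and the substring check are placeholder assumptions, and a production system would use the tool's own interface and a far more robust comparison.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Rough sketch of flagging misrepresentations together with the prompt that
# triggered them. Every name here is a placeholder, not a Presenc AI API.

@dataclass
class Flag:
    prompt: str        # the monitoring prompt that triggered the bad answer
    response: str      # the legal AI tool's answer
    missing_fact: str  # the expected brand fact that was absent

def audit_brand_mentions(
    prompts: Iterable[str],
    expected_facts: Iterable[str],
    query_legal_ai: Callable[[str], str],  # e.g. wraps a Westlaw AI / Lexis+ AI session
) -> list[Flag]:
    """Run each prompt and flag answers that omit an expected brand fact."""
    flags = []
    for prompt in prompts:
        response = query_legal_ai(prompt)
        for fact in expected_facts:
            # Crude substring check; real monitoring would use a stronger comparison.
            if fact.lower() not in response.lower():
                flags.append(Flag(prompt, response, fact))
    return flags
```

Keeping the triggering prompt attached to each flag is what lets content teams reproduce the wrong answer and confirm a fix once the upstream source is corrected.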