What is the best open-weight OCR model in 2026?

GOT-OCR2 is the leading dedicated OCR model, with approximately 580M parameters and an Apache 2.0 licence. For mixed OCR plus document understanding, Qwen2.5-VL-7B (Apache 2.0) is the strongest general-purpose open-weight choice. For visual document retrieval, ColPali v1.3 is the dominant new approach.

ColPali is a document retrieval model that embeds document pages directly using late-interaction multi-vector patch embeddings, bypassing OCR entirely. It outperforms traditional OCR-then-embed pipelines on visual-heavy documents by retaining the layout, chart, and image information that OCR discards. The v1.3 release is MIT licensed.

Can open-weight OCR replace Google Document AI?

On raw quality, yes, on most benchmarks. On enterprise integration (forms processing templates, custom extractor training UI, audit logging), the proprietary platforms still lead. The dominant pattern in 2026 is to use open-weight models for high-volume document AI and proprietary platforms for specialised enterprise workflows.

How do PDF-to-Markdown pipelines compare?

Marker, Surya, and MinerU all layer multiple specialised models (layout detection, OCR, reading order, equation handling). Marker has the largest user base and is the easiest to deploy. MinerU has the strongest table extraction. Surya leads on multilingual layout. All three are GPL or AGPL with separate commercial options.
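The layering described above can be illustrated with a minimal Python sketch. It is not code from Marker, Surya, or MinerU: the `Block` type, the stage functions, and the sample data are all hypothetical, and it assumes that layout detection and OCR have already produced typed blocks with coordinates. Only the last two stages (reading order and Markdown rendering) are shown, and the reading order is a naive single-column sort where real pipelines use learned models.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """One detected page region (hypothetical output of layout detection + OCR)."""
    kind: str      # "heading", "text", or "equation"
    x: float       # left edge of the region
    y: float       # top edge of the region
    content: str   # text assumed to be already recognised


def reading_order(blocks: List[Block]) -> List[Block]:
    # Naive single-column ordering: top to bottom, then left to right.
    return sorted(blocks, key=lambda b: (b.y, b.x))


def render_markdown(blocks: List[Block]) -> str:
    # Map each block type to a simple Markdown construct.
    parts = []
    for block in blocks:
        if block.kind == "heading":
            parts.append(f"# {block.content}")
        elif block.kind == "equation":
            parts.append(f"$$ {block.content} $$")
        else:
            parts.append(block.content)
    return "\n\n".join(parts)


def to_markdown(blocks: List[Block]) -> str:
    # Compose the stages: order the blocks, then render them.
    return render_markdown(reading_order(blocks))


# Hypothetical blocks, deliberately out of order.
sample_blocks = [
    Block("text", 50, 200, "Body paragraph."),
    Block("heading", 50, 40, "Title"),
    Block("equation", 50, 300, "E = mc^2"),
]
print(to_markdown(sample_blocks))
```

Keeping each stage as a separate function mirrors why these pipelines are modular: a stronger layout or reading-order model can be swapped in without touching the rendering step.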

Which model handles tables best?

GOT-OCR2 for direct table OCR; Qwen2.5-VL-72B for complex tables that require calculation; MinerU for tables in PDFs at scale. The proprietary baselines (Amazon Textract, Azure Document Intelligence) still lead on specific table-heavy forms processing workloads.

Best Open-Weight OCR and Document AI Models 2026

In 2026, document AI is the highest-volume enterprise AI workload outside text generation. Approximately 78 percent of surveyed enterprises run at least one production OCR or document-AI pipeline. Open-weight models have matured rapidly: GOT-OCR2, Qwen2.5-VL, ColPali, DocLayout-YOLO, and Florence-2 now cover most production document AI use cases. This page consolidates the leaderboard, the benchmarks, and the deployment guidance.

Key Findings

GOT-OCR2 (General OCR Theory) from StepFun is the leading open-weight pure-OCR model, with strong performance on plain text, formatted text, math formulas, and tables in a 580M-parameter package.
The Qwen2.5-VL family is the dominant general-purpose document-AI model, with the 7B variant covering OCR, layout understanding, visual question answering, and chart extraction in a single model.
ColPali changed document retrieval: instead of running OCR and then embedding the extracted text, it embeds page images directly as late-interaction patch embeddings, and it outperforms OCR-then-embed pipelines on visual-heavy documents.
Layout-specific models (DocLayout-YOLO, LayoutLMv3, Florence-2 region detection) remain useful for structured extraction pipelines where an end-to-end LLM-OCR approach is too expensive.
The proprietary baselines (Google Document AI, Amazon Textract, Microsoft Azure Document Intelligence) retain their lead on enterprise OCR and extraction integrations, but the open-weight models have closed the raw quality gap on most public benchmarks.
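The late-interaction scoring behind ColPali-style retrieval can be sketched in a few lines of plain Python. This is an illustrative sketch, not the ColPali library's API: it assumes every query token and every page patch has already been embedded into non-zero vectors of the same dimension, and the function names and toy vectors are hypothetical. The score is the usual MaxSim rule: for each query vector take its best cosine match among the page's patch vectors, then sum those maxima.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _normalise(v: Vector) -> List[float]:
    # Scale to unit length so dot products become cosine similarities.
    # Assumes v is not the zero vector.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def maxsim_score(query_vecs: Sequence[Vector], page_vecs: Sequence[Vector]) -> float:
    """Late-interaction (MaxSim) score between one query and one page."""
    q = [_normalise(v) for v in query_vecs]
    p = [_normalise(v) for v in page_vecs]
    total = 0.0
    for qv in q:
        # Best-matching patch for this query token.
        total += max(sum(a * b for a, b in zip(qv, pv)) for pv in p)
    return total


def rank_pages(query_vecs: Sequence[Vector], pages: Sequence[Sequence[Vector]]) -> List[int]:
    # Return page indices ordered from highest to lowest MaxSim score.
    scores = [maxsim_score(query_vecs, page) for page in pages]
    return sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)


# Toy 2-dimensional embeddings (hypothetical values).
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # matches both query tokens
page_b = [[1.0, 0.0], [1.0, 0.0]]              # matches only the first
print(rank_pages(query, [page_b, page_a]))
```

Because every patch keeps its own vector, a query token can match a chart label or table cell directly, which is the information a single pooled text embedding would lose.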

OCR and Document AI Model Comparison (May 2026)

Model	Parameters	Primary Capability	License
GOT-OCR2	~580M	Plain text, formatted text, tables, formulas, music notation	Apache 2.0
Qwen2.5-VL-7B	~7B	OCR + layout + chart + VQA	Apache 2.0
Qwen2.5-VL-72B	~72B	OCR + complex doc understanding	Tongyi Qianwen
InternVL3-8B	~8B	OCR + multilingual document VQA	MIT
InternVL3-78B	~78B	OCR + complex multilingual docs	MIT
ColPali v1.3	~3B	Document page retrieval (no OCR)	MIT
ColQwen2 v1.0	~2B	Document page retrieval based on Qwen2-VL	Apache 2.0
Florence-2-Large	~0.8B	OCR + region detection + captioning	MIT
Nougat	~0.35B	Scientific document OCR (LaTeX preservation)	CC-BY-NC
DocLayout-YOLO	~50M	Layout detection only	AGPL 3.0
Marker	Varies	PDF to Markdown pipeline	GPL 3.0 + Commercial
MinerU	Varies	PDF extraction pipeline (uses LayoutLMv3 + others)	AGPL 3.0
Surya	Varies	Layout, OCR, reading order	GPL 3.0 + Commercial

Use Case Recommendations

Use Case	Recommended Model	Reason
General OCR (plain text from images)	GOT-OCR2	Best quality-per-parameter; Apache 2.0
Document VQA and complex docs	Qwen2.5-VL-7B / InternVL3-8B	Strong VQA + OCR in one model
Document retrieval (RAG over docs)	ColPali v1.3 or ColQwen2	Late-interaction patch embeddings outperform OCR-then-embed
Scientific papers (LaTeX preservation)	Nougat (research only) or GOT-OCR2 + post-processing	Math and notation preservation
PDF to Markdown	Marker, Surya, MinerU	Production-ready pipelines
Layout-only extraction	DocLayout-YOLO + Florence-2	Lightweight, fast region detection
High-volume forms processing	Florence-2 + downstream extraction	Strong region detection + extraction
Mixed language documents	InternVL3-8B	Strongest multilingual document VQA

Quality Benchmarks

Benchmark	Leading Open-Weight Model	Score
DocVQA	Qwen2.5-VL-72B	~96.4
ChartQA	InternVL3-78B	~89.3
OCRBench	Qwen2.5-VL-72B	~888 / 1000
InfoVQA	Qwen2.5-VL-72B	~84.5
TextVQA	InternVL3-78B	~86.7
ViDoRe (visual doc retrieval)	ColPali v1.3	~82.4

Production Patterns

The dominant 2026 production patterns are dedicated OCR (GOT-OCR2) for high-volume text extraction at low cost, a general VLM (Qwen2.5-VL-7B) for mixed OCR plus VQA workloads, and ColPali for RAG over visually rich documents. The PDF-to-Markdown pipelines (Marker, Surya, MinerU) layer multiple specialised models for general-purpose document conversion and are widely used for ingesting documents into RAG systems. Approximately 42 percent of surveyed enterprise document AI deployments use at least one open-weight model in 2026, up from approximately 18 percent in 2024.
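The three patterns above can be expressed as a simple routing rule. The sketch below is only an illustration under stated assumptions: the `Workload` fields and the `pick_model` function are hypothetical, and the priority order when a workload needs both retrieval and reasoning is a choice made for this example, not guidance from any of the model authors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of a document AI job."""
    needs_retrieval: bool = False   # search or RAG over visually rich pages
    needs_reasoning: bool = False   # VQA, chart, or layout questions


def pick_model(workload: Workload) -> str:
    # Retrieval is checked first (an assumption of this sketch).
    if workload.needs_retrieval:
        return "ColPali"
    # Mixed OCR plus question answering goes to the general VLM.
    if workload.needs_reasoning:
        return "Qwen2.5-VL-7B"
    # Plain high-volume text extraction goes to the dedicated OCR model.
    return "GOT-OCR2"


print(pick_model(Workload()))
print(pick_model(Workload(needs_reasoning=True)))
print(pick_model(Workload(needs_retrieval=True)))
```

In practice a single deployment may combine these, for example retrieving pages with ColPali and then passing the top results to a VLM for question answering.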

Brand Visibility Implications

Document AI is one of the largest enterprise AI procurement categories, and AI assistants increasingly handle queries such as "best OCR model 2026", "open-source document AI", and "ColPali vs LayoutLM". Brands selling document AI products, OCR APIs, PDF processing, and intelligent document processing are therefore increasingly discovered through AI assistants in this category.

Methodology

Benchmark data is compiled from the OCRBench leaderboard, the ViDoRe leaderboard, and primary model card disclosures through 23 May 2026. Deployment share figures come from cross-industry survey data. The page is updated quarterly.

How Presenc AI Helps

Presenc AI monitors brand visibility on document AI and OCR queries across ChatGPT, Claude, Gemini, and Perplexity. For document AI vendors, OCR API brands, and intelligent document processing companies, the platform identifies the prompts driving procurement-research traffic and the gaps where new content unlocks share of voice.