Open-weight AI models captured approximately 15 percent of inference market share by January 2026, up from about 1 percent twelve months earlier. DeepSeek V4 is priced at $0.14 per million input tokens, roughly 1/20th of GPT-5's comparable tier. Qwen 3.5, Llama 4, Kimi K2.6, and GLM-5.1 all reached or approached frontier capability on multiple benchmarks. Eighty-nine percent of enterprises now use at least one open-source model in production. This page consolidates the market share data, pricing comparisons, capability benchmarks, and strategic implications.
Key Findings
- Combined open-weight AI inference market share (DeepSeek, Qwen, Llama, Kimi, GLM, Mistral open releases, others) grew from approximately 1 percent in January 2025 to approximately 15 percent by January 2026.
- DeepSeek V4 publishes input token pricing at $0.14 per million tokens, compared to approximately $2.50 to $3.00 per million tokens for GPT-5 standard tier and $3.00 per million for Claude 4.7 Opus.
- Approximately 89 percent of enterprises now use at least one open-source AI model in production, up from approximately 32 percent in early 2024 per cross-industry survey data.
- The performance gap on standard benchmarks (MMLU, GSM8K, HumanEval) effectively closed in 2026: the leading open-weight model is within 5 percentage points of the leading closed model on most benchmark categories.
- Reasoning benchmarks show a persistent closed-model lead: GPT-5.5 and Claude 4.7 Opus retain a lead of roughly 15 to 30 percentage points over the strongest open-weight models on ARC-AGI-2, FrontierMath, and Humanity's Last Exam.
Inference Market Share by Provider Family (January 2026)
| Provider Family | Share of Inference Tokens | YoY Change |
|---|---|---|
| OpenAI (GPT-4o, GPT-5, GPT-5.5) | ~33% | Down from ~45% |
| Anthropic (Claude 3.5, 4.x, 4.7) | ~22% | Up from ~14% |
| Google (Gemini 1.5, 2.x, 3.x) | ~17% | Up from ~13% |
| xAI (Grok 2, 3, 4) | ~4% | Up from ~2% |
| DeepSeek (V3, V4) | ~6% | Up from ~0.5% |
| Qwen (3, 3.5) | ~5% | Up from ~0.4% |
| Llama (3.x, 4.x) | ~3% | Up from ~1% |
| Kimi / Moonshot | ~1% | Up from minimal |
| GLM (Zhipu) | ~1% | Up from minimal |
| Mistral open releases | ~0.5% | Up modestly |
| Other open weight + niche closed | ~7.5% | Various |
Pricing Comparison (USD Per Million Tokens, May 2026)
| Model | Input Price | Output Price |
|---|---|---|
| GPT-5.5 | $3.00 (standard) | $15.00 |
| Claude 4.7 Opus | $3.00 | $15.00 |
| Gemini 3.1 Pro | $1.25 / $2.50 (depending on tier) | $5.00 / $10.00 |
| GPT-5.5 mini | $0.15 | $0.60 |
| Claude Haiku 4.5 | $0.20 | $0.80 |
| Gemini 3.1 Flash | $0.075 | $0.30 |
| DeepSeek V4 (provider price) | $0.14 | $0.28 (off-peak), $0.56 (peak) |
| Qwen 3.5 (Alibaba Cloud) | $0.20 | $0.50 |
| Llama 4 Maverick (Together AI) | $0.30 | $0.60 |
| Llama 4 (self-hosted, 1k req/day baseline) | ~$0.05 effective | ~$0.10 effective |
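The per-token prices above translate into a monthly bill once input and output volumes are fixed. The following is a minimal sketch of that arithmetic, using a subset of the listed prices; the workload size (500 million input tokens and 100 million output tokens per month) and the function name `monthly_cost` are illustrative assumptions, not figures from the survey data.

```python
# Prices in USD per million tokens (input, output), copied from the table above.
PRICES = {
    "GPT-5.5": (3.00, 15.00),
    "Claude 4.7 Opus": (3.00, 15.00),
    "GPT-5.5 mini": (0.15, 0.60),
    "DeepSeek V4 (off-peak)": (0.14, 0.28),
    "DeepSeek V4 (peak)": (0.14, 0.56),
    "Qwen 3.5 (Alibaba Cloud)": (0.20, 0.50),
    "Llama 4 Maverick (Together AI)": (0.30, 0.60),
}


def monthly_cost(model, input_tokens, output_tokens):
    """Return the USD cost of the given token volumes at list price."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload (an assumption for illustration only).
INPUT_TOKENS = 500_000_000
OUTPUT_TOKENS = 100_000_000

for name in PRICES:
    cost = monthly_cost(name, INPUT_TOKENS, OUTPUT_TOKENS)
    print(f"{name}: ${cost:,.2f}")
```

Under these assumed volumes, GPT-5.5 comes to about $3,000 per month and DeepSeek V4 at off-peak rates to about $98. Because output tokens are priced higher than input tokens on every listed model, the ratio between providers shifts with the input-to-output mix.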
Capability Benchmark Comparison
| Benchmark | Best Closed | Best Open | Gap |
|---|---|---|---|
| MMLU | 92% (GPT-5.5) | 90% (DeepSeek V4, Qwen 3.5 Max) | ~2 pts |
| GSM8K | 97% (multiple at saturation) | 96% (DeepSeek V4 Reasoning) | ~1 pt |
| HumanEval | 97% (Claude 4.7) | 93% (Qwen 3.5) | ~4 pts |
| ARC-AGI-2 | 85% (GPT-5.5) | 69% (DeepSeek V4 Reasoning) | ~16 pts |
| FrontierMath | 53% (GPT-5.5 with tools) | ~22% (best open) | ~31 pts |
| Humanity's Last Exam | ~38% (GPT-5.5) | ~14% (best open) | ~24 pts |
| SWE-Bench Verified | 82% (Claude 4.7 + Code) | ~58% (Qwen 3.5 + tooling) | ~24 pts |
| Chatbot Arena (LMSYS Elo) | ~1400 (GPT-5.5) | ~1340 (DeepSeek V4) | ~60 Elo |
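An Elo gap is harder to interpret than a percentage-point gap. The sketch below converts the Chatbot Arena figures in the table into an expected head-to-head preference rate, assuming the standard Elo expected-score formula (base 10, scale 400) applies to these ratings and ignoring ties; the function name is illustrative.

```python
def elo_expected_score(rating_a, rating_b):
    """Expected score of A against B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Approximate ratings from the table above.
closed_rating = 1400  # GPT-5.5
open_rating = 1340    # DeepSeek V4

p_closed = elo_expected_score(closed_rating, open_rating)
print(f"Expected preference rate for the closed model: {p_closed:.1%}")
```

Under that assumption, a 60-point gap corresponds to the closed model being preferred in roughly 58 to 59 percent of pairwise comparisons, which is a modest edge rather than a decisive one.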
Enterprise Open-Source Adoption
| Use Case | Share Using Open Model |
|---|---|
| Internal knowledge assistants and RAG | ~78% |
| Code completion and review | ~62% |
| Customer-facing chatbots | ~38% |
| Sensitive-data workflows (legal, finance, healthcare) | ~71% |
| Edge or on-device inference | ~94% |
| Cost-sensitive high-volume inference | ~85% |
| Reasoning-heavy production workloads | ~28% |
Strategic Context
Three structural patterns define the 2026 open versus closed dynamic. First, the capability plateau on standard benchmarks: open-weight models are at or near closed-model parity on most non-reasoning benchmarks, removing a key historical justification for premium closed-model pricing in routine workloads. Second, the reasoning gap persists: closed models retain a 15 to 30 percentage point lead on the hardest reasoning benchmarks, justifying premium pricing for high-judgment workloads. Third, the deployment-mode bifurcation: enterprises increasingly run a tiered stack with open-weight models for high-volume routine inference and premium closed-model APIs for reasoning-heavy or sensitive workflows.
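The tiered stack described above can be sketched as a simple router plus a blended-price calculation. This is an illustration under stated assumptions, not a description of any particular enterprise deployment: the `Request` type, the `reasoning_heavy` flag (imagined as coming from some upstream classifier), and the 20 percent reasoning-heavy mix are all hypothetical, while the two input prices come from the pricing table on this page.

```python
from dataclasses import dataclass

# Input prices in USD per million tokens, from the pricing table on this page.
CLOSED_INPUT_PRICE = 3.00  # GPT-5.5 / Claude 4.7 Opus
OPEN_INPUT_PRICE = 0.14    # DeepSeek V4


@dataclass
class Request:
    input_tokens: int
    reasoning_heavy: bool  # hypothetical flag set by an upstream classifier


def route(request):
    """Pick a tier: closed API for reasoning-heavy work, open-weight otherwise."""
    return "closed" if request.reasoning_heavy else "open"


def blended_input_price(requests):
    """Token-weighted average input price in USD per million tokens."""
    total_tokens = sum(r.input_tokens for r in requests)
    if total_tokens == 0:
        raise ValueError("no tokens to price")
    cost = sum(
        r.input_tokens
        * (CLOSED_INPUT_PRICE if route(r) == "closed" else OPEN_INPUT_PRICE)
        for r in requests
    )
    return cost / total_tokens


# Hypothetical mix: 20% of input tokens are reasoning-heavy.
workload = [Request(200_000, True), Request(800_000, False)]
print(f"Blended input price: ${blended_input_price(workload):.3f} per million tokens")
```

With this assumed mix, the blended input price is about $0.71 per million tokens, compared with $3.00 if every request went to the closed tier. The result is dominated by the share of traffic routed to the closed model, so the accuracy of the routing decision matters more than small price differences within the open tier.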
Brand Visibility Implications
The open versus closed AI debate is one of the highest-traffic categories in enterprise AI journalism. AI assistant queries about model selection, open-source AI economics, GPU self-hosting, inference cost, and adjacent topics drive sustained procurement-research traffic. Brands selling inference infrastructure, model serving, fine-tuning services, RAG tooling, and adjacent products therefore have a large AI-mediated discovery surface in this category.
Methodology
Market share figures are aggregated from OpenRouter public usage data, Artificial Analysis benchmarks, Together AI and Anyscale inference platform disclosures, and provider API revenue estimates. Enterprise adoption figures come from cross-industry survey data through Q1 2026. The page is updated quarterly.
How Presenc AI Helps
Presenc AI monitors brand visibility on open vs closed AI queries across ChatGPT, Claude, Gemini, and Perplexity. For inference infrastructure providers, model serving platforms, and fine-tuning service brands, the platform identifies the prompts driving procurement-research traffic and the gaps where new content unlocks share of voice.