Frontier AI token pricing collapsed approximately 50x from 2023 to mid-2026. DeepSeek V4 publishes at $0.14 per million input tokens, roughly 1/20th of GPT-5 standard tier. Open-weight self-hosted inference can reach approximately $0.05 effective per million input tokens at sufficient throughput. A 20 May 2026 CNBC analysis flagged this as a potential headwind for Anthropic and OpenAI IPO comparables. This page consolidates the pricing trajectory, the supply-side drivers, and the implications for AI lab gross margins.
Key Findings
- GPT-4 launched at approximately $30 per million input tokens in March 2023. GPT-5.5 standard tier is approximately $3.00 per million in May 2026, a 10x reduction over 26 months at the frontier-quality price point.
- DeepSeek V4 publishes at $0.14 per million input tokens, with off-peak output pricing at $0.28 per million, roughly 1/20th of GPT-5 standard tier on input.
- Self-hosted Llama 4 at sufficient request volume (approximately 1,000+ daily requests baseline) reaches effective per-million-token costs of approximately $0.05 input and $0.10 output, an order of magnitude below provider-hosted open-weight pricing.
- Frontier-lab gross margins on API revenue are estimated at approximately 50 to 65 percent in 2026, down from approximately 70 to 80 percent in 2023, reflecting both price compression and rising compute costs.
- Inference revenue at the AI lab level is increasingly subsidising consumer-subscription product economics: ChatGPT consumer at $20 per month is structurally loss-making at heavy-user query volumes, with the deficit covered by API and enterprise revenue.
Frontier-Quality Per-Million-Token Pricing Trajectory
| Date | Leading Model | Input Price (per million tokens) |
|---|---|---|
| Mar 2023 | GPT-4 (8k context) | ~$30.00 |
| Jun 2023 | GPT-4 (32k context) | ~$60.00 |
| Mar 2024 | GPT-4o | ~$5.00 |
| Oct 2024 | GPT-4o latest | ~$2.50 |
| Apr 2025 | GPT-4.5 | ~$2.00 (effective) |
| Aug 2025 | GPT-5 | ~$2.50 |
| Mar 2026 | GPT-5.5 | ~$3.00 (standard) / $1.25 (cached) |
| May 2026 | DeepSeek V4 (frontier-adjacent open weight) | $0.14 |
| May 2026 | Self-hosted Llama 4 effective | ~$0.05 |
Provider Pricing Comparison (May 2026, Per Million Tokens)
| Model | Input | Output |
|---|---|---|
| GPT-5.5 standard | $3.00 | $15.00 |
| GPT-5.5 mini | $0.15 | $0.60 |
| Claude 4.7 Opus | $3.00 | $15.00 |
| Claude Sonnet 4.6 | $0.75 | $3.75 |
| Claude Haiku 4.5 | $0.20 | $0.80 |
| Gemini 3.1 Pro | $1.25 / $2.50 (depending on tier) | $5.00 / $10.00 |
| Gemini 3.1 Flash | $0.075 | $0.30 |
| DeepSeek V4 (provider) | $0.14 | $0.28 off-peak / $0.56 peak |
| Qwen 3.5 Max (Alibaba Cloud) | $0.20 | $0.50 |
| Llama 4 (Together AI hosted) | $0.30 | $0.60 |
| Mistral Large 3 | $2.00 | $6.00 |
| Grok 4 | $3.00 | $15.00 |
Frontier-Lab Gross Margin Estimates
| Lab | 2023 API Gross Margin Estimate | 2026 API Gross Margin Estimate |
|---|---|---|
| OpenAI | ~75-80% | ~55-65% |
| Anthropic | ~70-75% | ~55-60% |
| Google AI (Gemini API) | ~70-75% | ~50-60% |
| Mistral AI | ~55-60% | ~45-50% |
| DeepSeek | n/a | ~10-25% (estimated) |
Strategic Context
Three patterns define the 2026 token-pricing landscape. First, the open-weight pricing pressure is asymmetric: it bites hardest on the routine-workload segment that historically had the highest margin contribution for frontier labs. Premium reasoning and long-context workloads remain less price-sensitive because the open-weight quality gap is largest there. Second, the consumer subscription cross-subsidy is structural: ChatGPT Plus at $20 per month is loss-making at high-usage tiers, but the API and enterprise revenue covers the deficit. Third, the IPO comparable risk is real: if frontier-lab gross margins continue to compress through 2027-2028, the public-market multiples that Anthropic and OpenAI IPO investors are underwriting will be tested.
Brand Visibility Implications
AI token pricing drives sustained procurement, technical, and investor journalism that translates to AI assistant queries about AI cost, AI inference pricing, AI cost optimization, and adjacent topics. Brands selling inference infrastructure, model serving optimization, AI cost monitoring, and adjacent products face strong AI-mediated discovery surface for this category.
Methodology
Pricing data sourced from primary provider pricing pages (OpenAI, Anthropic, Google, DeepSeek, Mistral, others). Self-hosted effective pricing estimated from GPU cost amortisation models and inference-platform benchmarks (Together AI, Anyscale, OpenRouter). Gross margin estimates from analyst reports and primary investor disclosures where available. Updated monthly given the pace of pricing changes.
How Presenc AI Helps
Presenc AI monitors brand visibility on AI pricing and AI cost queries across ChatGPT, Claude, Gemini, and Perplexity. For inference infrastructure brands, AI cost monitoring vendors, and procurement advisory firms, the platform identifies the prompts driving research traffic and the gaps where new content unlocks share of voice.