Research

Cheap AI Token Pricing Cliff 2026

DeepSeek V4 at $0.14/M input. The price-per-token collapse trajectory 2023-2026, implications for AI lab gross margins, and the impact on Anthropic and OpenAI IPO comparables.

By Ramanath, CTO & Co-Founder at Presenc AI · Last updated: May 2026

Frontier AI token pricing collapsed approximately 50x from 2023 to mid-2026. DeepSeek V4 publishes at $0.14 per million input tokens, roughly 1/20th of GPT-5 standard tier. Open-weight self-hosted inference can reach approximately $0.05 effective per million input tokens at sufficient throughput. A 20 May 2026 CNBC analysis flagged this as a potential headwind for Anthropic and OpenAI IPO comparables. This page consolidates the pricing trajectory, the supply-side drivers, and the implications for AI lab gross margins.

Key Findings

  1. GPT-4 launched at approximately $30 per million input tokens in March 2023. GPT-5.5 standard tier is approximately $3.00 per million in May 2026, a 10x reduction over 26 months at the frontier-quality price point.
  2. DeepSeek V4 publishes at $0.14 per million input tokens, with off-peak output pricing at $0.28 per million, roughly 1/20th of GPT-5 standard tier on input.
  3. Self-hosted Llama 4 at sufficient request volume (approximately 1,000+ daily requests baseline) reaches effective per-million-token costs of approximately $0.05 input and $0.10 output, an order of magnitude below provider-hosted open-weight pricing.
  4. Frontier-lab gross margins on API revenue are estimated at approximately 50 to 65 percent in 2026, down from approximately 70 to 80 percent in 2023, reflecting both price compression and rising compute costs.
  5. Inference revenue at the AI lab level is increasingly subsidising consumer-subscription product economics: ChatGPT consumer at $20 per month is structurally loss-making at heavy-user query volumes, with the deficit covered by API and enterprise revenue.

Frontier-Quality Per-Million-Token Pricing Trajectory

DateLeading ModelInput Price (per million tokens)
Mar 2023GPT-4 (8k context)~$30.00
Jun 2023GPT-4 (32k context)~$60.00
Mar 2024GPT-4o~$5.00
Oct 2024GPT-4o latest~$2.50
Apr 2025GPT-4.5~$2.00 (effective)
Aug 2025GPT-5~$2.50
Mar 2026GPT-5.5~$3.00 (standard) / $1.25 (cached)
May 2026DeepSeek V4 (frontier-adjacent open weight)$0.14
May 2026Self-hosted Llama 4 effective~$0.05

Provider Pricing Comparison (May 2026, Per Million Tokens)

ModelInputOutput
GPT-5.5 standard$3.00$15.00
GPT-5.5 mini$0.15$0.60
Claude 4.7 Opus$3.00$15.00
Claude Sonnet 4.6$0.75$3.75
Claude Haiku 4.5$0.20$0.80
Gemini 3.1 Pro$1.25 / $2.50 (depending on tier)$5.00 / $10.00
Gemini 3.1 Flash$0.075$0.30
DeepSeek V4 (provider)$0.14$0.28 off-peak / $0.56 peak
Qwen 3.5 Max (Alibaba Cloud)$0.20$0.50
Llama 4 (Together AI hosted)$0.30$0.60
Mistral Large 3$2.00$6.00
Grok 4$3.00$15.00

Frontier-Lab Gross Margin Estimates

Lab2023 API Gross Margin Estimate2026 API Gross Margin Estimate
OpenAI~75-80%~55-65%
Anthropic~70-75%~55-60%
Google AI (Gemini API)~70-75%~50-60%
Mistral AI~55-60%~45-50%
DeepSeekn/a~10-25% (estimated)

Strategic Context

Three patterns define the 2026 token-pricing landscape. First, the open-weight pricing pressure is asymmetric: it bites hardest on the routine-workload segment that historically had the highest margin contribution for frontier labs. Premium reasoning and long-context workloads remain less price-sensitive because the open-weight quality gap is largest there. Second, the consumer subscription cross-subsidy is structural: ChatGPT Plus at $20 per month is loss-making at high-usage tiers, but the API and enterprise revenue covers the deficit. Third, the IPO comparable risk is real: if frontier-lab gross margins continue to compress through 2027-2028, the public-market multiples that Anthropic and OpenAI IPO investors are underwriting will be tested.

Brand Visibility Implications

AI token pricing drives sustained procurement, technical, and investor journalism that translates to AI assistant queries about AI cost, AI inference pricing, AI cost optimization, and adjacent topics. Brands selling inference infrastructure, model serving optimization, AI cost monitoring, and adjacent products face strong AI-mediated discovery surface for this category.

Methodology

Pricing data sourced from primary provider pricing pages (OpenAI, Anthropic, Google, DeepSeek, Mistral, others). Self-hosted effective pricing estimated from GPU cost amortisation models and inference-platform benchmarks (Together AI, Anyscale, OpenRouter). Gross margin estimates from analyst reports and primary investor disclosures where available. Updated monthly given the pace of pricing changes.

How Presenc AI Helps

Presenc AI monitors brand visibility on AI pricing and AI cost queries across ChatGPT, Claude, Gemini, and Perplexity. For inference infrastructure brands, AI cost monitoring vendors, and procurement advisory firms, the platform identifies the prompts driving research traffic and the gaps where new content unlocks share of voice.

Frequently Asked Questions

Approximately 10x at the frontier-quality price point from March 2023 to May 2026, from approximately $30 per million input tokens (GPT-4) to approximately $3.00 (GPT-5.5 standard tier). Open-weight pricing has fallen further: DeepSeek V4 at $0.14 per million is approximately 200x cheaper than GPT-4 March 2023.
Yes on input token list pricing. DeepSeek V4 is $0.14 per million input tokens versus GPT-5.5 at $3.00 per million standard tier. The effective comparison depends on workload, off-peak pricing, caching, and reasoning capability differences; the headline price ratio is real but production economics vary by use case.
Likely at high-usage tiers. ChatGPT Plus at $20 per month is structurally loss-making at heavy-user query volumes because inference compute exceeds subscription revenue. The shortfall is covered by API and enterprise revenue. Sora\u2019s shutdown in April 2026 reflected a similar but more extreme unit-economics issue.
Most likely yes through 2027. Open-weight competition (DeepSeek, Qwen, Llama, Kimi, GLM) continues to put downward pressure on routine-workload pricing. The frontier-quality and reasoning premium are likely to remain more price-stable as long as the closed-model quality gap on hard reasoning tasks persists.
It is a real headwind for the equity story. If frontier-lab gross margins compress from approximately 70 percent in 2023 to approximately 55 to 65 percent in 2026 and continue downward through 2027-2028, the public-market multiples that IPO investors are underwriting will be tested. CNBC flagged this directly in a 20 May 2026 analysis.

Track Your AI Visibility

See how your brand appears across ChatGPT, Claude, Perplexity, and other AI platforms. Start monitoring today.