Frontier LLM API Pricing at a Glance, May 2026
This page consolidates published per-million-token API pricing for the major frontier LLMs in May 2026. Numbers below are pulled from each vendor's own pricing page or developer documentation and represent standard pay-as-you-go rates, not committed-use, batch, or enterprise pricing. Caching, batch, and long-context surcharges are noted where the vendor publishes them.
Frontier Tier Comparison (Standard Pricing, USD per 1M Tokens)
| Vendor | Model | Input | Output | Context |
|---|---|---|---|---|
| OpenAI | GPT-5.5 Pro | $30.00 | $180.00 | ~270K |
| OpenAI | GPT-5.5 | $5.00 | $30.00 | ~270K |
| OpenAI | GPT-5.4 | $2.50 | $15.00 | ~270K |
| Anthropic | Claude Opus 4.7 | $5.00 | $25.00 | 200K |
| Anthropic | Claude Sonnet 4.6 | $3.00 | $15.00 | 1M (flat) |
| Google | Gemini 3.1 Pro Preview | $2.00 / $4.00* | $12.00 / $18.00* | 1M |
| Google | Gemini 2.5 Pro | $1.25 / $2.50* | $10.00 / $15.00* | 1M |
| xAI | grok-4.3 | $1.25 | $2.50 | 1M |
| xAI | grok-4.20 (reasoning) | $1.25 | $2.50 | 2M |
| Mistral | Mistral Medium 3.5 | $1.50 | $7.50 | 128K |
| Cohere | Command R+ (08-2024) | $2.50 | $10.00 | 128K |
*Google tiered pricing: lower number for prompts up to 200K input tokens, higher for prompts above 200K.
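Because the higher tier kicks in per prompt, per-request cost on Gemini is a function of prompt size. Here is a minimal sketch of that calculation, assuming the higher rate applies to the whole request once the prompt crosses the threshold; the rates and threshold come from the table, and `gemini_tiered_cost` is an illustrative helper, not Google's SDK:

```python
def gemini_tiered_cost(input_tokens: int, output_tokens: int,
                       low: tuple[float, float], high: tuple[float, float],
                       threshold: int = 200_000) -> float:
    """USD cost of one request under two-tier pricing.

    `low` and `high` are (input, output) USD-per-1M-token rates.
    Assumption: the higher tier applies to the entire request once
    the prompt exceeds the threshold, per the footnote above.
    """
    rate_in, rate_out = low if input_tokens <= threshold else high
    return (input_tokens * rate_in + output_tokens * rate_out) / 1_000_000

# Gemini 3.1 Pro Preview, per the table: a 150K-token prompt stays in
# the lower tier; a 300K-token prompt is billed entirely at the higher one.
print(gemini_tiered_cost(150_000, 4_000, (2.00, 12.00), (4.00, 18.00)))  # 0.348
print(gemini_tiered_cost(300_000, 4_000, (2.00, 12.00), (4.00, 18.00)))  # 1.272
```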
Cost-Leader Tier Comparison (Standard Pricing, USD per 1M Tokens)
| Vendor | Model | Input | Output | Context |
|---|---|---|---|---|
| Anthropic | Claude Haiku 4.5 | $1.00 | $5.00 | 200K |
| OpenAI | GPT-5.4 Mini | $0.75 | $4.50 | ~270K |
| Google | Gemini 3 Flash Preview | $0.50 | $3.00 | 1M |
| Mistral | Mistral Large 3 | $0.50 | $1.50 | 128K |
| Mistral | Mistral Medium 3 | $0.40 | $2.00 | 128K |
| Google | Gemini 3.1 Flash-Lite | $0.25 | $1.50 | 1M |
| OpenAI | GPT-5.4 Nano | $0.20 | $1.25 | ~270K |
| xAI | grok-4-1-fast | $0.20 | $0.50 | 2M |
| DeepSeek | DeepSeek-V4-Flash | $0.14 | $0.28 | 1M |
| Google | Gemini 2.5 Flash-Lite | $0.10 | $0.40 | 1M |
| OpenAI | GPT-4.1 Nano | $0.10 | $0.40 | ~270K |
Inference-Speed Specialists (Groq, OSS Models)
Groq hosts open-weight models on custom LPU hardware and prices on tokens, not seconds. The headline figure for many workloads is throughput (tokens per second) rather than dollars per token; the sketch after the table estimates both for a fixed job.
| Model | Input | Output | Throughput |
|---|---|---|---|
| Llama 4 Scout (17B x 16E) | $0.11 | $0.34 | 594 tps |
| Llama 3.3 70B Versatile | $0.59 | $0.79 | 394 tps |
| Llama 3.1 8B Instant | $0.05 | $0.08 | 840 tps |
| GPT-OSS 120B (128K) | $0.15 | $0.60 | 500 tps |
| Qwen3 32B (131K) | $0.29 | $0.59 | 662 tps |
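Because Groq prices on tokens but competes on speed, a fixed job is best compared on dollars and seconds together. Here is a rough sketch using the table's rates and throughput figures, assuming throughput is sustained output speed; real jobs also pay prompt-processing and queueing time, so the seconds figure is a floor:

```python
# Rough cost-and-latency estimate for a job on Groq-hosted models,
# using the rates and throughput figures from the table above.
MODELS = {
    # name: (input $/1M, output $/1M, output tokens/sec)
    "llama-4-scout": (0.11, 0.34, 594),
    "llama-3.3-70b-versatile": (0.59, 0.79, 394),
    "llama-3.1-8b-instant": (0.05, 0.08, 840),
}

def estimate(name: str, input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (USD cost, seconds of generation) for one request.

    Assumption: throughput is sustained output speed; prompt
    processing and queueing are ignored.
    """
    rate_in, rate_out, tps = MODELS[name]
    cost = (input_tokens * rate_in + output_tokens * rate_out) / 1_000_000
    return cost, output_tokens / tps

for name in MODELS:
    cost, secs = estimate(name, input_tokens=8_000, output_tokens=2_000)
    print(f"{name}: ${cost:.4f}, ~{secs:.1f}s of generation")
```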
Five Things the Table Tells You
- Frontier-to-frugal spans roughly 450x on output tokens. GPT-5.5 Pro output is $180 per million; Gemini 2.5 Flash-Lite output is $0.40. The same workload routed to the right model can be nearly three orders of magnitude cheaper.
- Output costs 4-6x input at most vendors, with xAI and DeepSeek as outliers at roughly 2x. The output multiplier is the dominant cost lever. Workloads that read a lot and write a little (summarization, classification, extraction) are dramatically cheaper per request than workloads that write a lot (long-form generation, agentic loops).
- Among the big-three flagships, Sonnet 4.6 is the only one with a flat 1M-token context (xAI also prices its 1M and 2M contexts flat). Google charges roughly 2x above 200K input on Gemini 2.5 Pro and Gemini 3.1 Pro; OpenAI's long-context rates apply above ~270K. Anthropic's flat-rate 1M tier removes that surcharge planning step.
- DeepSeek-V4-Flash redefined the cost floor. At $0.14 input / $0.28 output with a 1M context, V4-Flash is roughly 35x cheaper than GPT-5.5 on input. Cache-hit pricing on V4-Flash falls to $0.0028 per million input tokens, which is functionally free for read-heavy retrieval workloads.
- Caching and batch discounts are the second cost lever after model choice. Anthropic batch processing halves token costs, and prompt caching reduces cached-input pricing by 90 percent. DeepSeek-V4-Pro's temporary 75 percent discount (effective through May 31, 2026) and Google's context caching offer comparable structural savings. Workloads with stable system prompts or retrieval contexts see the largest gains; the sketch after this list works through the arithmetic.
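To make the spread and the discount levers concrete, here is a minimal per-request cost sketch using rates from the tables above. The 90 percent cache discount and 50 percent batch discount follow the Anthropic figures in the last bullet; treating them as simple multipliers, and the `request_cost` helper itself, are illustrative simplifications rather than any vendor's billing logic:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 rate_in: float, rate_out: float,
                 cached_fraction: float = 0.0, cache_discount: float = 0.90,
                 batch: bool = False, batch_discount: float = 0.50) -> float:
    """USD cost of one request at $/1M-token rates, with optional
    prompt-cache and batch discounts applied as simple multipliers."""
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    cost = (fresh * rate_in
            + cached * rate_in * (1 - cache_discount)
            + output_tokens * rate_out) / 1_000_000
    return cost * (1 - batch_discount) if batch else cost

# Same 10K-in / 2K-out workload across the spread (rates from the tables):
print(request_cost(10_000, 2_000, 30.00, 180.00))  # GPT-5.5 Pro:           $0.66
print(request_cost(10_000, 2_000, 0.10, 0.40))     # Gemini 2.5 Flash-Lite: $0.0018
# Sonnet 4.6 with a stable 8K system prompt cached (80% of input), batched:
print(request_cost(10_000, 2_000, 3.00, 15.00, cached_fraction=0.8, batch=True))  # ~$0.019
```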
What This Means for AI Visibility and Brand-Recommendation Workloads
The model that a downstream agent uses to evaluate brand options has a direct effect on which brands surface. Cheap models (Flash-Lite, V4-Flash, Nano tiers) handle bulk routing and lightweight extraction; flagship models (Opus 4.7, GPT-5.5, Gemini 3.1 Pro) handle reasoning-heavy comparison and final recommendations. Brands optimizing for AI visibility should test across both tiers, because cheap models increasingly run upstream of the flagship that produces the consumer-visible answer; the sketch below shows the pattern.
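A hypothetical sketch of that two-tier pattern, with model names drawn from the tables above; the routing split and the `call_model` placeholder are illustrative assumptions, not any vendor's API:

```python
# Hypothetical two-tier brand-recommendation pipeline: a cost-leader model
# shortlists candidates, and the flagship only sees what survives that cut.
CHEAP_MODEL = "gemini-2.5-flash-lite"   # bulk shortlist / extraction tier
FLAGSHIP_MODEL = "claude-opus-4-7"      # final reasoning / recommendation tier

def call_model(model: str, prompt: str) -> str:
    """Placeholder for a real API call; illustrative only."""
    raise NotImplementedError

def recommend(query: str, candidate_brands: list[str]) -> str:
    # Tier 1: the cheap model filters the long tail. A brand that is
    # invisible here never reaches the flagship, regardless of how it
    # would score downstream.
    shortlist = call_model(
        CHEAP_MODEL,
        f"From {candidate_brands}, list the 5 most relevant to: {query}",
    )
    # Tier 2: the flagship reasons only over the survivors and produces
    # the consumer-visible answer.
    return call_model(
        FLAGSHIP_MODEL,
        f"Compare these options for '{query}' and recommend one: {shortlist}",
    )
```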
Methodology
Pricing data collected May 14, 2026 from vendor sources: OpenAI API pricing, Anthropic Claude API pricing, Gemini API pricing, Mistral pricing, DeepSeek pricing, xAI Grok models, Cohere pricing, and Groq pricing. All figures are USD per million tokens at standard pay-as-you-go rates. Long-context surcharges and tiered rates are flagged where the vendor publishes them. Pricing is refreshed quarterly. Treat figures as accurate at time of capture; verify against the vendor source before committing to enterprise spend.
How Presenc AI Helps
Presenc AI monitors brand-recommendation outputs across the major AI platforms, including the flagship and cost-leader tiers in the tables above. When a brand's visibility shifts because an upstream model swap routes more traffic through Gemini 2.5 Flash or DeepSeek-V4-Flash rather than the flagship, that change shows up in our platform-by-platform tracking. For brands building AI visibility strategy, the relevant question is not just which model is cheapest, but which models the recommendation pipeline actually uses end-to-end.