What this is
The cost of running LLM inference has fallen roughly 10x per year since 2021, but the trajectory is bifurcating: commodity inference is approaching free, while frontier reasoning models hold premium pricing. This page is a trajectory snapshot as of 15 May 2026.
GPT-4-Equivalent Inference Cost Over Time
| Date | Input $/M (approx.) | Notes |
|---|---|---|
| Nov 2021 (GPT-3-class) | ~$60 | Original davinci (GPT-3) pricing |
| Mar 2023 | ~$30 | GPT-4 launch (~$60/M output) |
| Nov 2023 | ~$10 | GPT-4 Turbo |
| May 2024 | ~$5 | GPT-4o |
| 2025 | ~$2.50 | Mid-tier GPT-4.x / Claude Sonnet |
| 2026 (May) | ~$0.40-$2.50 | Commodity to mid-tier; 150-1000x drop from 2021 |
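To make the cumulative drop concrete, here is a minimal sketch (Python) that divides the Nov 2021 baseline by the later input prices in the table; all figures are taken straight from the rows above.

```python
# Minimal sketch: cumulative price drop implied by the table above, using the
# Nov 2021 ~$60/M input figure as the baseline.
baseline = 60.0  # $/M input tokens, Nov 2021 (GPT-3-class)

snapshots = {
    "Nov 2023 (GPT-4 Turbo)": 10.0,
    "May 2024 (GPT-4o)": 5.0,
    "2025 (mid-tier)": 2.50,
    "May 2026 (commodity)": 0.40,
}

for label, price in snapshots.items():
    print(f"{label}: {baseline / price:.0f}x cheaper than Nov 2021")
# Nov 2023 (GPT-4 Turbo): 6x cheaper than Nov 2021
# May 2024 (GPT-4o): 12x cheaper than Nov 2021
# 2025 (mid-tier): 24x cheaper than Nov 2021
# May 2026 (commodity): 150x cheaper than Nov 2021
```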
Current Pricing by Tier (May 2026)
| Model | Input $/M | Output $/M | Tier |
|---|---|---|---|
| GPT-5.4 Pro | ~$30 | ~$60 | Frontier reasoning |
| Claude Opus 4.6 | $5 | $25 | Frontier |
| GPT-5.4 | $2.50 | $10 | Frontier general |
| Claude Sonnet 4.6 | $3 | $15 | Workhorse |
| Gemini 2.5 Pro | $1.25-$2.50 | $5-$10 | Workhorse |
| Gemini 2.5 Flash | $0.30 | $2.50 | Commodity |
| GPT-4.1 Nano | $0.10 | $0.40 | Commodity |
| DeepSeek V4 | $0.14 | $0.28 | Commodity open |
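To translate list prices into per-query cost, the sketch below applies the table's input/output rates to a hypothetical query of 2,000 input tokens and 500 output tokens; the token counts are assumptions chosen for illustration, not measurements.

```python
# Illustrative per-query cost at the listed May 2026 prices, for an assumed
# 2,000-token prompt and 500-token response.
pricing = {  # model: (input $/M, output $/M), from the table above
    "GPT-5.4 Pro":       (30.00, 60.00),
    "Claude Sonnet 4.6":  (3.00, 15.00),
    "Gemini 2.5 Flash":   (0.30,  2.50),
    "GPT-4.1 Nano":       (0.10,  0.40),
}
input_tokens, output_tokens = 2_000, 500

for model, (in_price, out_price) in pricing.items():
    cost = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    print(f"{model}: ${cost:.5f} per query")
# GPT-5.4 Pro: $0.09000 per query
# Claude Sonnet 4.6: $0.01350 per query
# Gemini 2.5 Flash: $0.00185 per query
# GPT-4.1 Nano: $0.00040 per query
```

At those assumed token counts the spread between the frontier-reasoning row and the cheapest commodity row is a bit over 200x, before any reasoning tokens enter the bill.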
Cost Compression Rate by Era
| Era | Compression rate | Driver |
|---|---|---|
| 2021-2025 | ~10x/year | Distillation + chip improvements + competition |
| 2025-2027 (expected) | ~3-5x/year | Diminishing returns + reasoning workloads |
| 2027+ (expected) | ~1.5-2x/year | Reasoning compute floor |
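As a hedged sketch of what those expected bands imply, the snippet below compresses the ~$0.40/M commodity price by 3-5x for one more year and then by 1.5-2x/year for two further years; this is illustrative arithmetic only, not a forecast of specific prices.

```python
# Where the ~$0.40/M commodity input price would land under the expected
# compression bands above (3-5x/year through 2027, then 1.5-2x/year).
commodity_2026 = 0.40  # $/M input tokens, May 2026

# One year at 3-5x/year compression (mid-2026 -> mid-2027)
low_2027, high_2027 = commodity_2026 / 5, commodity_2026 / 3
# Two further years at 1.5-2x/year compression (mid-2027 -> mid-2029)
low_2029, high_2029 = low_2027 / 2**2, high_2027 / 1.5**2

print(f"mid-2027: ${low_2027:.2f}-${high_2027:.2f} per M input tokens")
print(f"mid-2029: ${low_2029:.3f}-${high_2029:.3f} per M input tokens")
# mid-2027: $0.08-$0.13 per M input tokens
# mid-2029: $0.020-$0.059 per M input tokens
```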
Six Things the Trajectory Tells You
- GPT-4-equivalent performance is now ~150-1000x cheaper than in late 2021, an unprecedented rate of decline for any utility category.
- Commodity inference is at or below $0.40/M and falling. For many workloads, API cost is no longer a meaningful bottleneck.
- Frontier reasoning holds the line at $30/M. Reasoning workloads require thinking-mode compute that doesn't compress as fast.
- The 10x/year era is ending. Expect ~3-5x/year compression through 2027, then ~1.5-2x/year as reasoning compute hits a floor.
- Open-source closed the commodity gap. DeepSeek V4 at $0.14/M input is competitive with proprietary commodity tiers.
- Per-query economics now depend on reasoning depth, not raw token count. A "useful" query at frontier-reasoning prices costs orders of magnitude more than the same query at commodity prices, as the sketch below illustrates.
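A minimal sketch of that last point, assuming a hypothetical 2,000-token question with a 500-token answer, and assuming the frontier model also bills roughly 8,000 hidden reasoning tokens at its output rate (reasoning tokens are typically metered as output). The token counts are invented for illustration; the prices come from the pricing table above.

```python
# Per-query cost as a function of reasoning depth. Token counts are
# hypothetical; "thinking" tokens are assumed to bill at the output rate.
def query_cost(in_price, out_price, input_toks, answer_toks, thinking_toks=0):
    """Dollar cost of one query at $/M-token prices."""
    billed_output = answer_toks + thinking_toks
    return (input_toks * in_price + billed_output * out_price) / 1_000_000

# Same 2,000-token question, 500-token answer.
commodity = query_cost(0.10, 0.40, 2_000, 500)                        # GPT-4.1 Nano
frontier = query_cost(30.00, 60.00, 2_000, 500, thinking_toks=8_000)  # GPT-5.4 Pro

print(f"commodity: ${commodity:.5f}  frontier reasoning: ${frontier:.2f}  "
      f"ratio: ~{frontier / commodity:,.0f}x")
# commodity: $0.00040  frontier reasoning: $0.57  ratio: ~1,425x
```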
What This Means for AI Visibility
Cheap commodity inference means AI assistants can be deployed across more surfaces than ever before: agentic workflows, automated content generation, batch summarisation. For brand visibility, the implication is that the surface area where your brand can appear inside an AI-generated response keeps expanding even as frontier-reasoning costs hold steady.
Methodology
Historical and current pricing figures draw on Silicon Data's 2026 LLM cost-per-token guide, Introl's inference unit-economics analysis, Featherless's LLM API pricing comparison (2026), TLDL's LLM API pricing (2026), BenchLM's LLM pricing history dashboard, and CloudIDR's 105-model 2026 analysis.
How Presenc AI Helps
Cost compression means AI assistants run across more surfaces every quarter. Presenc AI tracks brand visibility across that expanding surface area (free vs paid tiers, mobile vs desktop, frontier vs commodity model deployments) so brand teams can see where new mention opportunities open up as inference costs fall.