What this is
The cost of running LLM inference has fallen roughly 10x per year since 2021, but the trajectory is bifurcating: commodity inference is approaching free, while frontier reasoning models hold premium pricing. This page is a trajectory snapshot as of 15 May 2026.
GPT-4-Equivalent Inference Cost Over Time
| Date | Input $/M (approx.) | Notes |
|---|---|---|
| Nov 2021 (GPT-3-class) | ~$60 | Original davinci (GPT-3) pricing |
| Mar 2023 | ~$30 | GPT-4 launch (~$60/M output) |
| Nov 2023 | ~$10 | GPT-4 Turbo |
| May 2024 | ~$5 | GPT-4o |
| 2025 | ~$2.50 | Mid-tier GPT-4.x / Claude Sonnet |
| 2026 (May) | ~$0.40-$2.50 | Commodity to mid-tier; 150-1000x drop from 2021 |
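To make the cumulative drop concrete, here is a minimal sketch (Python) that divides the Nov 2021 baseline by the later input prices in the table; all figures are taken straight from the rows above.

```python
# Minimal sketch: cumulative price drop implied by the table above, using the
# Nov 2021 ~$60/M input figure as the baseline.
baseline = 60.0  # $/M input tokens, Nov 2021 (GPT-3-class)

snapshots = {
    "Nov 2023 (GPT-4 Turbo)": 10.0,
    "May 2024 (GPT-4o)": 5.0,
    "2025 (mid-tier)": 2.50,
    "May 2026 (commodity)": 0.40,
}

for label, price in snapshots.items():
    print(f"{label}: {baseline / price:.0f}x cheaper than Nov 2021")
# Nov 2023 (GPT-4 Turbo): 6x cheaper than Nov 2021
# May 2024 (GPT-4o): 12x cheaper than Nov 2021
# 2025 (mid-tier): 24x cheaper than Nov 2021
# May 2026 (commodity): 150x cheaper than Nov 2021
```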
Current Pricing by Tier (May 2026)
| Model | Input $/M | Output $/M | Tier |
|---|---|---|---|
| GPT-5.4 Pro | ~$30 | ~$60 | Frontier reasoning |
| Claude Opus 4.6 | $5 | $25 | Frontier |
| GPT-5.4 | $2.50 | $10 | Frontier general |
| Claude Sonnet 4.6 | $3 | $15 | Workhorse |
| Gemini 2.5 Pro | $1.25-$2.50 | $5-$10 | Workhorse |
| Gemini 2.5 Flash | $0.30 | $2.50 | Commodity |
| GPT-4.1 Nano | $0.10 | $0.40 | Commodity |
| DeepSeek V4 | $0.14 | $0.28 | Commodity open |
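To translate list prices into per-query cost, the sketch below applies the table's input/output rates to a hypothetical query of 2,000 input tokens and 500 output tokens; the token counts are assumptions chosen for illustration, not measurements.

```python
# Illustrative per-query cost at the listed May 2026 prices, for an assumed
# 2,000-token prompt and 500-token response.
pricing = {  # model: (input $/M, output $/M), from the table above
    "GPT-5.4 Pro":       (30.00, 60.00),
    "Claude Sonnet 4.6":  (3.00, 15.00),
    "Gemini 2.5 Flash":   (0.30,  2.50),
    "GPT-4.1 Nano":       (0.10,  0.40),
}
input_tokens, output_tokens = 2_000, 500

for model, (in_price, out_price) in pricing.items():
    cost = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    print(f"{model}: ${cost:.5f} per query")
# GPT-5.4 Pro: $0.09000 per query
# Claude Sonnet 4.6: $0.01350 per query
# Gemini 2.5 Flash: $0.00185 per query
# GPT-4.1 Nano: $0.00040 per query
```

At those assumed token counts the spread between the frontier-reasoning row and the cheapest commodity row is a bit over 200x, before any reasoning tokens enter the bill.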
Cost Compression Rate by Era
| Era | Compression rate | Driver |
|---|---|---|
| 2021-2025 | ~10x/year | Distillation + chip improvements + competition |
| 2025-2027 (expected) | ~3-5x/year | Diminishing returns + reasoning workloads |
| 2027+ (expected) | ~1.5-2x/year | Reasoning compute floor |
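As a hedged sketch of what those expected bands imply, the snippet below compresses the ~$0.40/M commodity price by 3-5x for one more year and then by 1.5-2x/year for two further years; this is illustrative arithmetic only, not a forecast of specific prices.

```python
# Where the ~$0.40/M commodity input price would land under the expected
# compression bands above (3-5x/year through 2027, then 1.5-2x/year).
commodity_2026 = 0.40  # $/M input tokens, May 2026

# One year at 3-5x/year compression (mid-2026 -> mid-2027)
low_2027, high_2027 = commodity_2026 / 5, commodity_2026 / 3
# Two further years at 1.5-2x/year compression (mid-2027 -> mid-2029)
low_2029, high_2029 = low_2027 / 2**2, high_2027 / 1.5**2

print(f"mid-2027: ${low_2027:.2f}-${high_2027:.2f} per M input tokens")
print(f"mid-2029: ${low_2029:.3f}-${high_2029:.3f} per M input tokens")
# mid-2027: $0.08-$0.13 per M input tokens
# mid-2029: $0.020-$0.059 per M input tokens
```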
Six Things the Trajectory Tells You
- GPT-4-equivalent performance is now ~150-1000x cheaper than in late 2021, an unprecedented rate of decline for any utility category.
- Commodity inference is at or below $0.40/M and falling. For many workloads, API cost is no longer a meaningful bottleneck.
- Frontier reasoning holds the line at $30/M. Reasoning workloads require thinking-mode compute that doesn't compress as fast.
- The 10x/year era is ending. Expect ~3-5x/year compression through 2027, then ~1.5-2x/year as reasoning compute hits a floor.
- Open-source closed the commodity gap. DeepSeek V4 at $0.14/M input is competitive with proprietary commodity tiers.
- Per-query economics now depend on reasoning depth, not raw token count. A "useful" query at frontier-reasoning prices costs orders of magnitude more than the same query at commodity prices, as the sketch below illustrates.
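A minimal sketch of that last point, assuming a hypothetical 2,000-token question with a 500-token answer, and assuming the frontier model also bills roughly 8,000 hidden reasoning tokens at its output rate (reasoning tokens are typically metered as output). The token counts are invented for illustration; the prices come from the pricing table above.

```python
# Per-query cost as a function of reasoning depth. Token counts are
# hypothetical; "thinking" tokens are assumed to bill at the output rate.
def query_cost(in_price, out_price, input_toks, answer_toks, thinking_toks=0):
    """Dollar cost of one query at $/M-token prices."""
    billed_output = answer_toks + thinking_toks
    return (input_toks * in_price + billed_output * out_price) / 1_000_000

# Same 2,000-token question, 500-token answer.
commodity = query_cost(0.10, 0.40, 2_000, 500)                        # GPT-4.1 Nano
frontier = query_cost(30.00, 60.00, 2_000, 500, thinking_toks=8_000)  # GPT-5.4 Pro

print(f"commodity: ${commodity:.5f}  frontier reasoning: ${frontier:.2f}  "
      f"ratio: ~{frontier / commodity:,.0f}x")
# commodity: $0.00040  frontier reasoning: $0.57  ratio: ~1,425x
```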
What This Means for AI Visibility
Cheap commodity inference means AI assistants can be deployed across more surfaces than ever before: agentic workflows, automated content generation, batch summarisation. For brand visibility, the implication is that the surface area where your brand can appear inside an AI-generated response keeps expanding even as frontier-reasoning costs hold steady.
Methodology
Historical and current pricing figures draw on Silicon Data's 2026 LLM cost-per-token guide, Introl's inference unit-economics analysis, Featherless's LLM API pricing comparison (2026), TLDL's LLM API pricing (2026), BenchLM's LLM pricing history dashboard, and CloudIDR's 105-model 2026 analysis.
How Presenc AI Helps
Cost compression means AI assistants run across more surfaces every quarter. Presenc AI tracks brand visibility across that expanding surface area (free vs paid tiers, mobile vs desktop, frontier vs commodity model deployments) so brand teams can see where new mention opportunities open up as inference costs fall.