LLM Tokenizer Efficiency Comparison, May 2026

How LLM tokenizers differ in 2026: tokens-per-word ratios across Claude Opus 4.7, GPT-5.5, Gemini 3.1, DeepSeek V4, and Qwen 3.5. Why headline per-token pricing understates real cost.

By Ramanath, CTO & Co-Founder at Presenc AI · Last updated: May 2026

Why Tokenizer Efficiency Changes Effective LLM Cost

Per-token pricing is the headline figure, but real cost per word depends on tokenizer efficiency: the same English paragraph may produce 800 tokens for one vendor and 1,100 for another, and the same Chinese paragraph may produce 1.5x as many tokens on a Western-tuned tokenizer as on a Chinese-optimised one. Effective cost comparisons require normalising for tokenizer efficiency, not comparing headline rate-card numbers. This page consolidates the tokenizer-efficiency picture for the major vendors as of May 2026.
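The normalisation described above is simple arithmetic: effective cost scales with tokens per word, not just the rate card. A minimal sketch (the rates and tokens-per-word values are the illustrative figures used elsewhere on this page, not vendor-published constants):

```python
def effective_cost_per_1k_words(rate_per_mtok: float, tokens_per_word: float) -> float:
    """Effective input cost in USD to process 1,000 English words.

    rate_per_mtok:   rate-card input price per 1M tokens.
    tokens_per_word: the model's average tokens per English word.
    """
    tokens = 1_000 * tokens_per_word
    return rate_per_mtok * tokens / 1_000_000

# Two models with the identical $5/1M rate card diverge once
# tokenizer efficiency is applied:
gpt55 = effective_cost_per_1k_words(5.00, 1.25)   # ~$0.00625
opus47 = effective_cost_per_1k_words(5.00, 1.75)  # ~$0.00875
```

At the same rate card, the less efficient tokenizer costs 40 percent more here purely because it emits 40 percent more tokens (1.75 / 1.25 = 1.4).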

Approximate Tokens per English Word (May 2026)

| Model | Vendor | Tokens per English Word | Notes |
| --- | --- | --- | --- |
| GPT-4o-era and earlier | OpenAI | ~1.30 | cl100k_base tokenizer |
| GPT-5.5 | OpenAI | ~1.25 | o200k_base tokenizer, slightly more efficient |
| Claude Opus 4.6 and earlier | Anthropic | ~1.30 | Standard Claude tokenizer |
| Claude Opus 4.7 | Anthropic | ~1.75 | New tokenizer produces ~35% more tokens for the same input |
| Gemini 2.5 / 3.1 | Google | ~1.20 | SentencePiece-based |
| DeepSeek V4 | DeepSeek | ~1.30 (English) / ~1.1 tokens per Chinese character | Chinese-optimised tokenizer |
| Qwen 3.5 | Alibaba | ~1.20 (English) / ~1.0-1.1 tokens per Chinese character | Chinese-native tokenizer |
| Mistral / Codestral | Mistral | ~1.25 | Standard SentencePiece |

Chinese / Japanese / Korean Tokenization (Tokens per Character)

| Model | Chinese (Mandarin) Tokens per Character | Cost Multiplier vs English |
| --- | --- | --- |
| GPT-2 (legacy reference) | ~2.1 | 3.0x |
| GPT-4o / GPT-5.5 | ~1.4-1.8 | 2.0-2.5x |
| Claude Opus 4.7 | ~1.4-1.6 | 1.9-2.2x |
| Gemini 2.5 / 3.1 | ~1.2-1.4 | 1.5-1.8x |
| DeepSeek V4 | ~1.1 | 1.3x |
| Qwen 3.5 | ~1.0-1.1 | 1.2-1.4x |
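The cost multipliers above can be roughly reproduced from the per-character figures, under the loud assumption that a Chinese word-equivalent averages about 1.5 characters (an illustrative constant, not a figure from this page's measurements):

```python
def chinese_cost_multiplier(tokens_per_char: float, tokens_per_en_word: float,
                            chars_per_cn_word: float = 1.5) -> float:
    """Approximate cost multiplier of Chinese vs English content,
    comparing tokens per word-equivalent in each language.

    chars_per_cn_word is an assumed average; real values vary by text.
    """
    tokens_per_cn_word = tokens_per_char * chars_per_cn_word
    return tokens_per_cn_word / tokens_per_en_word

# DeepSeek V4: ~1.1 tokens/char (Chinese), ~1.30 tokens/word (English)
deepseek = chinese_cost_multiplier(1.1, 1.30)  # ~1.3x, matching the table
# GPT-5.5: midpoint ~1.6 tokens/char, ~1.25 tokens/word
gpt55 = chinese_cost_multiplier(1.6, 1.25)     # ~1.9x, near the table's range
```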

Effective Cost Recalculation (1,000-Word English Document)

| Model | Rate-Card Input ($/1M) | Tokens for 1K Words | Effective Cost |
| --- | --- | --- | --- |
| DeepSeek V4-Flash | $0.14 | ~1,300 | $0.00018 |
| Gemini 2.5 Flash-Lite | $0.10 | ~1,200 | $0.00012 |
| Claude Sonnet 4.6 | $3.00 | ~1,750 | $0.00525 |
| GPT-5.5 | $5.00 | ~1,250 | $0.00625 |
| Claude Opus 4.7 | $5.00 | ~1,750 | $0.00875 |

Note: Claude Opus 4.7's new tokenizer produces approximately 35 percent more tokens for the same English input, so at an unchanged per-token rate its effective cost per word is also ~35 percent higher than the rate card alone would suggest. Cost-conscious users should normalise across vendors using tokens per word, not rate-card price alone.
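The note's arithmetic can be checked directly: when the per-token rate is unchanged, token inflation and per-word cost inflation are the same ratio. A minimal sketch using the tokens-per-word figures from the first table:

```python
old_tpw, new_tpw = 1.30, 1.75  # Opus 4.6 vs Opus 4.7, tokens per English word
rate = 5.00                    # same $/1M input rate for both

token_inflation = new_tpw / old_tpw - 1        # ~0.346, i.e. ~35% more tokens

old_cost = rate * old_tpw * 1_000 / 1_000_000  # effective cost per 1,000 words
new_cost = rate * new_tpw * 1_000 / 1_000_000
cost_inflation = new_cost / old_cost - 1       # identical ~35%, rate cancels out
```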

Six Things the Tokenizer Picture Tells You

  1. Claude Opus 4.7's new tokenizer changed effective pricing. The ~35 percent token inflation versus prior Claude tokenizers means that at the same per-token rate, Opus 4.7 costs ~35 percent more per word than Opus 4.6 did. This is the single most consequential vendor-side change to AI economics in May 2026 and is not widely understood.
  2. DeepSeek wins on Chinese cost by roughly 20-40 percent. At ~1.1 tokens per Chinese character versus 1.4-1.8 for Western tokenizers, DeepSeek uses roughly 20-40 percent fewer tokens for the same Chinese content, so its effective Chinese-content cost is correspondingly lower before pricing differences are even considered.
  3. Qwen 3.5's tokenizer is the most efficient for Chinese content. 1.0-1.1 tokens per Chinese character means Qwen approaches one-token-per-character efficiency. Combined with Qwen's competitive English performance, Qwen is the most cost-efficient choice for bilingual EN-CN workloads.
  4. GPT-5.5's o200k_base is more efficient than cl100k_base. OpenAI's newer tokenizer produces ~4 percent fewer tokens for the same English input compared to GPT-4o-era tokenizers. Small per-call savings; meaningful at scale.
  5. Gemini is competitive on tokenizer efficiency. ~1.20 tokens per English word makes Gemini one of the more efficient tokenizers, partially offsetting Gemini's above-threshold long-context surcharges on a per-word basis.
  6. Code uses more tokens than prose. Code averages 1.4-1.8 tokens per "word" (where a word is loosely defined for code), making code-heavy workloads structurally more expensive per task than equivalent natural-language workloads. Plan capacity accordingly.

What This Means for AI Visibility

Tokenizer efficiency affects the unit economics of every AI-mediated workload, including brand-recommendation pipelines. A brand whose name is tokenised in 2 tokens versus 5 tokens has a small but measurable advantage in retrieval-augmented prompts that approach token budgets. More importantly, the cost-per-recommendation differs by vendor in ways that affect routing: an agent stack choosing between models for a Chinese-language brand-comparison query will route to DeepSeek or Qwen for cost reasons, which shifts brand visibility outcomes. Brands operating bilingual or multilingual visibility programmes should benchmark on both English-tokenizer and Chinese-tokenizer cost bases, not rate-card alone.
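The routing behaviour described above can be sketched as a cost-based selector. The model names, rates, and tokens-per-character midpoints below are illustrative values drawn from the tables on this page, not a real router's configuration:

```python
def route_by_cost(models: dict[str, tuple[float, float]], units: float) -> str:
    """Pick the cheapest model for a query.

    models maps name -> (rate per 1M tokens, tokens per unit of text),
    where a unit is a word for English or a character for Chinese.
    units is the estimated size of the query in those units.
    """
    return min(models, key=lambda m: models[m][0] * models[m][1] * units / 1_000_000)

# Illustrative Chinese-language query of 2,000 characters; tokens/char
# figures are midpoints of the ranges in the tables above:
chinese_models = {
    "deepseek-v4-flash": (0.14, 1.1),
    "gpt-5.5": (5.00, 1.6),
    "claude-opus-4.7": (5.00, 1.5),
}
print(route_by_cost(chinese_models, 2_000))  # deepseek-v4-flash
```

In this toy comparison the Chinese-optimised, cheaper model wins on both factors at once, which is the visibility-shifting routing effect the paragraph describes.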

Methodology

Tokenizer efficiency estimates compiled May 14, 2026 from vendor tokenizer documentation, third-party tokenizer-comparison analyses, and Presenc AI's own measurement against standardised English and Chinese corpora. Tokens-per-word figures vary with content style (formal prose, casual text, code, technical writing); the figures here represent typical English prose. Refreshed quarterly as vendors update tokenizers.

How Presenc AI Helps

Presenc AI tracks brand-mention rates across LLMs with attention to tokenizer-driven cost differences. When an agent stack routes to a cheaper model for a brand-comparison query and the brand's visibility shifts, the underlying tokenizer economics may be the cause. For brands optimising AI visibility across cost-sensitive deployments, tokenizer-aware analysis is now part of complete competitive intelligence.

Frequently Asked Questions

Why does Claude Opus 4.7 cost more per word than its rate card suggests?

Claude Opus 4.7 ships with a new tokenizer that produces approximately 35 percent more tokens for the same English input compared to prior Claude tokenizers. At the same per-token rate ($5/$25), Opus 4.7 therefore costs approximately 35 percent more per word in practice. This is one of the most consequential vendor-side changes to AI economics in May 2026 and is not widely understood.

Which frontier model has the most efficient English tokenizer?

Gemini 2.5/3.1, at approximately 1.20 tokens per English word, is the most efficient among major frontier models. GPT-5.5 (o200k_base) follows at approximately 1.25, then Mistral and prior Claude versions at approximately 1.25-1.30. Claude Opus 4.7's new tokenizer, at approximately 1.75, is the least efficient among current frontier models.

Which tokenizer is most efficient for Chinese content?

Qwen 3.5, at approximately 1.0-1.1 tokens per Chinese character, is the most efficient. DeepSeek V4, at approximately 1.1, is second. Western-tuned tokenizers (GPT, Claude) typically use 1.4-1.8 tokens per Chinese character, which translates to roughly 30-60 percent more tokens for the same Chinese text, independent of per-token pricing differences.

Does tokenizer efficiency affect AI visibility?

Yes, indirectly. Agent stacks that route between models often pick by cost per query, which is a function of tokenizer efficiency × rate-card price. Models with efficient tokenizers attract more routing volume, which shifts brand visibility outcomes downstream. For brands operating bilingual or multilingual programmes, tokenizer-aware analysis is now part of complete AI visibility instrumentation.

Track Your AI Visibility

See how your brand appears across ChatGPT, Claude, Perplexity, and other AI platforms. Start monitoring today.