LLM Tokenizer Efficiency Comparison, May 2026

How LLM tokenizers differ in 2026: tokens-per-word ratios across Claude Opus 4.7, GPT-5.5, Gemini 3.1, DeepSeek V4, and Qwen 3.5. Why headline per-token pricing understates real cost.

By Ramanath, CTO & Co-Founder at Presenc AI · Last updated: May 2026

Why Tokenizer Efficiency Changes Effective LLM Cost

Per-token pricing is the headline figure, but real cost per word depends on tokenizer efficiency: the same English paragraph may produce 800 tokens for one vendor and 1,100 for another, and the same Chinese paragraph may produce 1.5x as many tokens on a Western-tuned tokenizer as on a Chinese-optimised one. Effective cost comparisons require normalising for tokenizer efficiency, not comparing headline rate-card numbers. This page consolidates the tokenizer-efficiency picture for the major vendors as of May 2026.
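The normalisation described above is simple arithmetic: effective cost scales with tokens per word, not just the rate card. A minimal sketch (the rates and tokens-per-word values are the illustrative figures used elsewhere on this page, not vendor-published constants):

```python
def effective_cost_per_1k_words(rate_per_mtok: float, tokens_per_word: float) -> float:
    """Effective input cost in USD to process 1,000 English words.

    rate_per_mtok:   rate-card input price per 1M tokens.
    tokens_per_word: the model's average tokens per English word.
    """
    tokens = 1_000 * tokens_per_word
    return rate_per_mtok * tokens / 1_000_000

# Two models with the identical $5/1M rate card diverge once
# tokenizer efficiency is applied:
gpt55 = effective_cost_per_1k_words(5.00, 1.25)   # ~$0.00625
opus47 = effective_cost_per_1k_words(5.00, 1.75)  # ~$0.00875
```

At the same rate card, the less efficient tokenizer costs 40 percent more here purely because it emits 40 percent more tokens (1.75 / 1.25 = 1.4).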

Approximate Tokens per English Word (May 2026)

| Model | Vendor | Tokens per English Word | Notes |
| --- | --- | --- | --- |
| GPT-4o-era and earlier | OpenAI | ~1.30 | cl100k_base tokenizer |
| GPT-5.5 | OpenAI | ~1.25 | o200k_base tokenizer, slightly more efficient |
| Claude Opus 4.6 and earlier | Anthropic | ~1.30 | Standard Claude tokenizer |
| Claude Opus 4.7 | Anthropic | ~1.75 | New tokenizer produces ~35% more tokens for the same input |
| Gemini 2.5 / 3.1 | Google | ~1.20 | SentencePiece-based |
| DeepSeek V4 | DeepSeek | ~1.30 (English) / ~1.1 tokens per Chinese character | Chinese-optimised tokenizer |
| Qwen 3.5 | Alibaba | ~1.20 (English) / ~1.0-1.1 tokens per Chinese character | Chinese-native tokenizer |
| Mistral / Codestral | Mistral | ~1.25 | Standard SentencePiece |

Chinese / Japanese / Korean Tokenization (Tokens per Character)

| Model | Chinese (Mandarin) Tokens per Character | Cost Multiplier vs English |
| --- | --- | --- |
| GPT-2 (legacy reference) | ~2.1 | 3.0x |
| GPT-4o / GPT-5.5 | ~1.4-1.8 | 2.0-2.5x |
| Claude Opus 4.7 | ~1.4-1.6 | 1.9-2.2x |
| Gemini 2.5 / 3.1 | ~1.2-1.4 | 1.5-1.8x |
| DeepSeek V4 | ~1.1 | 1.3x |
| Qwen 3.5 | ~1.0-1.1 | 1.2-1.4x |
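The cost multipliers above can be roughly reproduced from the per-character figures, under the loud assumption that a Chinese word-equivalent averages about 1.5 characters (an illustrative constant, not a figure from this page's measurements):

```python
def chinese_cost_multiplier(tokens_per_char: float, tokens_per_en_word: float,
                            chars_per_cn_word: float = 1.5) -> float:
    """Approximate cost multiplier of Chinese vs English content,
    comparing tokens per word-equivalent in each language.

    chars_per_cn_word is an assumed average; real values vary by text.
    """
    tokens_per_cn_word = tokens_per_char * chars_per_cn_word
    return tokens_per_cn_word / tokens_per_en_word

# DeepSeek V4: ~1.1 tokens/char (Chinese), ~1.30 tokens/word (English)
deepseek = chinese_cost_multiplier(1.1, 1.30)  # ~1.3x, matching the table
# GPT-5.5: midpoint ~1.6 tokens/char, ~1.25 tokens/word
gpt55 = chinese_cost_multiplier(1.6, 1.25)     # ~1.9x, near the table's range
```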

Effective Cost Recalculation (1,000-Word English Document)

| Model | Rate-Card Input ($/1M) | Tokens for 1K Words | Effective Cost |
| --- | --- | --- | --- |
| DeepSeek V4-Flash | $0.14 | ~1,300 | $0.00018 |
| Gemini 2.5 Flash-Lite | $0.10 | ~1,200 | $0.00012 |
| Claude Sonnet 4.6 | $3.00 | ~1,750 | $0.00525 |
| GPT-5.5 | $5.00 | ~1,250 | $0.00625 |
| Claude Opus 4.7 | $5.00 | ~1,750 | $0.00875 |

Note: Claude Opus 4.7's new tokenizer produces approximately 35 percent more tokens for the same English input, so at an unchanged per-token rate its effective cost per word is also ~35 percent higher than the rate card alone would suggest. Cost-conscious users should normalise across vendors using tokens per word, not rate-card price alone.
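The note's arithmetic can be checked directly: when the per-token rate is unchanged, token inflation and per-word cost inflation are the same ratio. A minimal sketch using the tokens-per-word figures from the first table:

```python
old_tpw, new_tpw = 1.30, 1.75  # Opus 4.6 vs Opus 4.7, tokens per English word
rate = 5.00                    # same $/1M input rate for both

token_inflation = new_tpw / old_tpw - 1        # ~0.346, i.e. ~35% more tokens

old_cost = rate * old_tpw * 1_000 / 1_000_000  # effective cost per 1,000 words
new_cost = rate * new_tpw * 1_000 / 1_000_000
cost_inflation = new_cost / old_cost - 1       # identical ~35%, rate cancels out
```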

Six Things the Tokenizer Picture Tells You

  1. Claude Opus 4.7's new tokenizer changed effective pricing. The ~35 percent token inflation versus prior Claude tokenizers means that at the same per-token rate, Opus 4.7 costs ~35 percent more per word than Opus 4.6 did. This is the single most consequential vendor-side change to AI economics in May 2026 and is not widely understood.
  2. DeepSeek wins on Chinese cost by roughly 20-40 percent. At ~1.1 tokens per Chinese character versus 1.4-1.8 for Western tokenizers, DeepSeek uses roughly 20-40 percent fewer tokens for the same Chinese content, so its effective Chinese-content cost is correspondingly lower before pricing differences are even considered.
  3. Qwen 3.5's tokenizer is the most efficient for Chinese content. 1.0-1.1 tokens per Chinese character means Qwen approaches one-token-per-character efficiency. Combined with Qwen's competitive English performance, Qwen is the most cost-efficient choice for bilingual EN-CN workloads.
  4. GPT-5.5's o200k_base is more efficient than cl100k_base. OpenAI's newer tokenizer produces ~4 percent fewer tokens for the same English input compared to GPT-4o-era tokenizers. Small per-call savings; meaningful at scale.
  5. Gemini is competitive on tokenizer efficiency. ~1.20 tokens per English word makes Gemini one of the more efficient tokenizers, partially offsetting Gemini's above-threshold long-context surcharges on a per-word basis.
  6. Code uses more tokens than prose. Code averages 1.4-1.8 tokens per "word" (where a word is loosely defined for code), making code-heavy workloads structurally more expensive per task than equivalent natural-language workloads. Plan capacity accordingly.

What This Means for AI Visibility

Tokenizer efficiency affects the unit economics of every AI-mediated workload, including brand-recommendation pipelines. A brand whose name is tokenised in 2 tokens versus 5 tokens has a small but measurable advantage in retrieval-augmented prompts that approach token budgets. More importantly, the cost-per-recommendation differs by vendor in ways that affect routing: an agent stack choosing between models for a Chinese-language brand-comparison query will route to DeepSeek or Qwen for cost reasons, which shifts brand visibility outcomes. Brands operating bilingual or multilingual visibility programmes should benchmark on both English-tokenizer and Chinese-tokenizer cost bases, not rate-card alone.
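The routing behaviour described above can be sketched as a cost-based selector. The model names, rates, and tokens-per-character midpoints below are illustrative values drawn from the tables on this page, not a real router's configuration:

```python
def route_by_cost(models: dict[str, tuple[float, float]], units: float) -> str:
    """Pick the cheapest model for a query.

    models maps name -> (rate per 1M tokens, tokens per unit of text),
    where a unit is a word for English or a character for Chinese.
    units is the estimated size of the query in those units.
    """
    return min(models, key=lambda m: models[m][0] * models[m][1] * units / 1_000_000)

# Illustrative Chinese-language query of 2,000 characters; tokens/char
# figures are midpoints of the ranges in the tables above:
chinese_models = {
    "deepseek-v4-flash": (0.14, 1.1),
    "gpt-5.5": (5.00, 1.6),
    "claude-opus-4.7": (5.00, 1.5),
}
print(route_by_cost(chinese_models, 2_000))  # deepseek-v4-flash
```

In this toy comparison the Chinese-optimised, cheaper model wins on both factors at once, which is the visibility-shifting routing effect the paragraph describes.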

Methodology

Tokenizer efficiency estimates compiled May 14, 2026 from vendor tokenizer documentation, third-party tokenizer-comparison analyses, and Presenc AI's own measurement against standardised English and Chinese corpora. Tokens-per-word figures vary with content style (formal prose, casual text, code, technical writing); the figures here represent typical English prose. Refreshed quarterly as vendors update tokenizers.

How Presenc AI Helps

Presenc AI tracks brand-mention rates across LLMs with attention to tokenizer-driven cost differences. When an agent stack routes to a cheaper model for a brand-comparison query and the brand's visibility shifts, the underlying tokenizer economics may be the cause. For brands optimising AI visibility across cost-sensitive deployments, tokenizer-aware analysis is now part of complete competitive intelligence.

Frequently Asked Questions

Why does Claude Opus 4.7 cost more per word than its rate card suggests?

Claude Opus 4.7 ships with a new tokenizer that produces approximately 35 percent more tokens for the same English input compared to prior Claude tokenizers. At the same per-token rate ($5/$25), Opus 4.7 therefore costs approximately 35 percent more per word in practice. This is one of the most consequential vendor-side changes to AI economics in May 2026 and is not widely understood.

Which frontier model has the most efficient English tokenizer?

Gemini 2.5/3.1, at approximately 1.20 tokens per English word, is the most efficient among major frontier models. GPT-5.5 (o200k_base) follows at approximately 1.25, then Mistral and prior Claude versions at approximately 1.25-1.30. Claude Opus 4.7's new tokenizer, at approximately 1.75, is the least efficient among current frontier models.

Which tokenizer is most efficient for Chinese content?

Qwen 3.5, at approximately 1.0-1.1 tokens per Chinese character, is the most efficient. DeepSeek V4, at approximately 1.1, is second. Western-tuned tokenizers (GPT, Claude) typically use 1.4-1.8 tokens per Chinese character, which translates to roughly 30-60 percent more tokens for the same Chinese text, independent of per-token pricing differences.

Does tokenizer efficiency affect AI visibility?

Yes, indirectly. Agent stacks that route between models often pick by cost per query, which is a function of tokenizer efficiency × rate-card price. Models with efficient tokenizers attract more routing volume, which shifts brand visibility outcomes downstream. For brands operating bilingual or multilingual programmes, tokenizer-aware analysis is now part of complete AI visibility instrumentation.

Track Your AI Visibility

See how your brand appears across ChatGPT, Claude, Perplexity, and other AI platforms. Start monitoring today.