Why Tokenizer Efficiency Changes Effective LLM Cost
Per-token pricing is the headline figure, but real cost per word depends on tokenizer efficiency: the same English paragraph may produce 800 tokens for one vendor and 1,100 for another. The same Chinese paragraph may produce 1.5x as many tokens on a Western-tuned tokenizer as on a Chinese-optimised one. Effective cost comparisons require normalising for tokenizer efficiency, not comparing headline rate-card numbers. This page consolidates the tokenizer-efficiency picture for the major vendors in May 2026.
Approximate Tokens per English Word (May 2026)
| Model | Vendor | Tokens per English Word | Notes |
|---|---|---|---|
| GPT-4o-era and earlier | OpenAI | ~1.30 | cl100k_base tokenizer |
| GPT-5.5 | OpenAI | ~1.25 | o200k_base tokenizer, slightly more efficient |
| Claude Opus 4.6 and earlier | Anthropic | ~1.30 | Standard Claude tokenizer |
| Claude Opus 4.7 | Anthropic | ~1.75 | New tokenizer produces ~35% MORE tokens for same input |
| Gemini 2.5 / 3.1 | Google | ~1.20 | SentencePiece-based |
| DeepSeek V4 | DeepSeek | ~1.30 (English) / ~1.1 tokens per Chinese character | Chinese-optimised tokenizer |
| Qwen 3.5 | Alibaba | ~1.20 (English) / ~1.0-1.1 tokens per Chinese character | Chinese-native tokenizer |
| Mistral / Codestral | Mistral | ~1.25 | Standard SentencePiece |
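The ratios in this table can be turned into a quick token-count estimator. A minimal sketch in Python; the model keys and ratios below come from the table above, not from any vendor API, and real token counts vary with content style:

```python
# Approximate tokens-per-English-word ratios, taken from the table above.
TOKENS_PER_WORD = {
    "gpt-5.5": 1.25,
    "claude-opus-4.6": 1.30,
    "claude-opus-4.7": 1.75,
    "gemini-3.1": 1.20,
    "deepseek-v4": 1.30,
    "qwen-3.5": 1.20,
}

def estimate_tokens(word_count: int, model: str) -> int:
    """Estimate the token count for English prose of a given word count."""
    return round(word_count * TOKENS_PER_WORD[model])

# The same 1,000-word document tokenises very differently across vendors:
for model in ("claude-opus-4.6", "claude-opus-4.7"):
    print(model, estimate_tokens(1000, model))
```

This is the normalisation step the rest of the page relies on: compare vendors on estimated tokens for the same document, not on rate cards alone.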
Chinese / Japanese / Korean Tokenization (Tokens per Character)
| Model | Chinese (Mandarin) Tokens per Character | Cost Multiplier vs English |
|---|---|---|
| GPT-2 (legacy reference) | ~2.1 | 3.0x |
| GPT-4o / GPT-5.5 | ~1.4-1.8 | 2.0-2.5x |
| Claude Opus 4.7 | ~1.4-1.6 | 1.9-2.2x |
| Gemini 2.5 / 3.1 | ~1.2-1.4 | 1.5-1.8x |
| DeepSeek V4 | ~1.1 | 1.3x |
| Qwen 3.5 | ~1.0-1.1 | 1.2-1.4x |
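The multiplier column can be approximately reproduced if you assume how many Chinese characters carry the information of one English word. The sketch below uses ~1.5 characters per English-word-equivalent; that ratio is an illustrative assumption, not a figure from the table, which is why the computed multipliers only roughly match the column above:

```python
# ASSUMPTION: ~1.5 Chinese characters carry roughly the information content
# of one English word. This is an illustrative ratio, not a vendor figure.
CHARS_PER_WORD_EQUIV = 1.5

def chinese_cost_multiplier(tokens_per_char: float,
                            tokens_per_english_word: float) -> float:
    """Tokens needed for one 'word' of Chinese relative to one English word."""
    return (tokens_per_char * CHARS_PER_WORD_EQUIV) / tokens_per_english_word

# DeepSeek V4: 1.1 tokens/char and 1.30 tokens per English word -> ~1.3x,
# consistent with the table row above.
print(round(chinese_cost_multiplier(1.1, 1.30), 2))
```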
Effective Cost Recalculation (1,000-Word English Document)
| Model | Rate-Card Input ($/1M) | Tokens for 1K Words | Effective Cost |
|---|---|---|---|
| DeepSeek V4-Flash | $0.14 | ~1,300 | $0.00018 |
| Gemini 2.5 Flash-Lite | $0.10 | ~1,200 | $0.00012 |
| Claude Sonnet 4.6 | $3.00 | ~1,300 | $0.00390 |
| GPT-5.5 | $5.00 | ~1,250 | $0.00625 |
| Claude Opus 4.7 | $5.00 | ~1,750 | $0.00875 |
Note: Claude Opus 4.7's new tokenizer makes its effective cost ~35 percent higher than the rate-card-implied cost would suggest, because the tokenizer produces approximately 35 percent more tokens for the same English input. Cost-conscious users should normalise across vendors using tokens-per-word, not rate-card figures alone.
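The effective-cost column is simple arithmetic, and recomputing it is a useful sanity check. A sketch using the prices and token counts from the table above (not live rate cards):

```python
# Effective cost = rate-card input price per token * tokens for the document.
# Prices ($ per 1M input tokens) and token counts are from the table above.
MODELS = {
    "deepseek-v4-flash": (0.14, 1300),
    "gemini-2.5-flash-lite": (0.10, 1200),
    "gpt-5.5": (5.00, 1250),
    "claude-opus-4.7": (5.00, 1750),
}

def effective_cost(rate_per_million: float, tokens: int) -> float:
    """Dollar cost of submitting `tokens` input tokens at a given rate."""
    return rate_per_million * tokens / 1_000_000

for name, (rate, tokens) in MODELS.items():
    print(f"{name}: ${effective_cost(rate, tokens):.5f}")
```

Note how GPT-5.5 and Claude Opus 4.7 share the same $5.00 rate card but diverge on effective cost purely because of tokenizer efficiency.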
Six Things the Tokenizer Picture Tells You
- Claude Opus 4.7's new tokenizer changed effective pricing. The 35 percent token inflation versus prior Claude tokenizers means that at the same per-token rate, Opus 4.7 costs ~35 percent more per word than Opus 4.6 did. This is the single most consequential vendor-side change to AI economics in May 2026 and is not widely understood.
- DeepSeek wins on Chinese cost by roughly 20-40 percent. At 1.1 tokens per Chinese character versus 1.4-1.8 for Western tokenizers, DeepSeek's effective Chinese-content cost is roughly 20-40 percent lower than Western vendors' before pricing differences are even considered.
- Qwen 3.5's tokenizer is the most efficient for Chinese content. 1.0-1.1 tokens per Chinese character means Qwen approaches one-token-per-character efficiency. Combined with Qwen's competitive English performance, Qwen is the most cost-efficient choice for bilingual EN-CN workloads.
- GPT-5.5's o200k_base is more efficient than cl100k_base. OpenAI's newer tokenizer produces ~4 percent fewer tokens for the same English input compared to GPT-4o-era tokenizers. Small per-call savings; meaningful at scale.
- Gemini is competitive on tokenizer efficiency. ~1.20 tokens per English word makes Gemini one of the more efficient tokenizers, partially offsetting Gemini's above-threshold long-context surcharges on a per-word basis.
- Code uses more tokens than prose. Code averages 1.4-1.8 tokens per "word" (where a word is loosely defined for code), making code-heavy workloads structurally more expensive per task than equivalent natural-language workloads. Plan capacity accordingly.
What This Means for AI Visibility
Tokenizer efficiency affects the unit economics of every AI-mediated workload, including brand-recommendation pipelines. A brand whose name is tokenised in 2 tokens versus 5 tokens has a small but measurable advantage in retrieval-augmented prompts that approach token budgets. More importantly, the cost-per-recommendation differs by vendor in ways that affect routing: an agent stack choosing between models for a Chinese-language brand-comparison query will route to DeepSeek or Qwen for cost reasons, which shifts brand visibility outcomes. Brands operating bilingual or multilingual visibility programmes should benchmark on both English-tokenizer and Chinese-tokenizer cost bases, not rate-card alone.
Methodology
Tokenizer efficiency estimates compiled May 14, 2026 from vendor tokenizer documentation, third-party tokenizer-comparison analyses, and Presenc AI's own measurement against standardised English and Chinese corpora. Tokens-per-word figures vary with content style (formal prose, casual text, code, technical writing); the figures here represent typical English prose. Refreshed quarterly as vendors update tokenizers.
How Presenc AI Helps
Presenc AI tracks brand-mention rates across LLMs with attention to tokenizer-driven cost differences. When an agent stack routes to a cheaper model for a brand-comparison query and the brand's visibility shifts, the underlying tokenizer economics may be the cause. For brands optimising AI visibility across cost-sensitive deployments, tokenizer-aware analysis is now part of complete competitive intelligence.