What the Crowd Actually Prefers in May 2026
The LMSYS Chatbot Arena (now hosted at arena.ai, formerly chat.lmsys.org) is the most-cited blind human-preference benchmark in AI. Users compare two anonymous model responses side by side and pick a winner; scores on an Elo scale are then fit with a Bradley-Terry model over roughly 6 million pairwise votes accumulated since launch in May 2023. This page captures the top 25 of the Text leaderboard as of May 14, 2026, with vote counts and confidence intervals.
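The Bradley-Terry aggregation can be sketched on toy data. This is an illustrative minimum, not the Arena's actual pipeline; the model names and vote counts below are made up, and the MM (minorization-maximization) update is one standard way to fit Bradley-Terry strengths.

```python
import math

# Toy pairwise vote counts: wins[(a, b)] = times model a beat model b.
# Illustrative numbers only, not real Arena data.
wins = {
    ("A", "B"): 60, ("B", "A"): 40,
    ("A", "C"): 70, ("C", "A"): 30,
    ("B", "C"): 55, ("C", "B"): 45,
}
models = ["A", "B", "C"]

# Bradley-Terry strengths via MM iterations:
# p_i <- W_i / sum_j (n_ij / (p_i + p_j)), then renormalize.
p = {m: 1.0 for m in models}
for _ in range(200):
    new_p = {}
    for i in models:
        w_i = sum(wins.get((i, j), 0) for j in models if j != i)
        denom = sum(
            (wins.get((i, j), 0) + wins.get((j, i), 0)) / (p[i] + p[j])
            for j in models if j != i
        )
        new_p[i] = w_i / denom
    s = sum(new_p.values())
    p = {m: v / s for m, v in new_p.items()}

# Convert strengths to an Elo-like scale (400 * log10), anchored at 1000.
elo = {m: 1000 + 400 * math.log10(p[m] / p[models[0]]) for m in models}
for m in sorted(models, key=lambda m: -elo[m]):
    print(m, round(elo[m]))
```

The anchor choice is arbitrary: Bradley-Terry only identifies relative strengths, so any constant offset gives the same pairwise win probabilities.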
Top 25 Text Leaderboard (May 14, 2026)
| Rank | Model | Org | Elo | ± | Votes |
|---|---|---|---|---|---|
| 1 | claude-opus-4-6-thinking | Anthropic | 1502 | 5 | 24,925 |
| 2 | claude-opus-4-7-thinking | Anthropic | 1501 | 6 | 10,413 |
| 3 | claude-opus-4-6 | Anthropic | 1498 | 4 | 26,459 |
| 4 | claude-opus-4-7 | Anthropic | 1492 | 6 | 11,006 |
| 5 | muse-spark | Meta | 1491 | 6 | Preliminary |
| 6 | gemini-3.1-pro-preview | Google | 1490 | 4 | 31,012 |
| 7 | gemini-3-pro | Google | 1486 | 4 | 41,339 |
| 8 | gpt-5.5-high | OpenAI | 1484 | 7 | 7,877 |
| 9 | grok-4.20-beta1 | xAI | 1479 | 5 | 20,258 |
| 10 | gpt-5.4-high | OpenAI | 1479 | 5 | 18,521 |
| 11 | gpt-5.2-chat-latest-20260210 | OpenAI | 1477 | 4 | 25,130 |
| 12 | grok-4.20-beta-0309-reasoning | xAI | 1477 | 5 | 18,895 |
| 13 | gpt-5.5 | OpenAI | 1476 | 7 | 7,982 |
| 14 | grok-4.20-multi-agent-beta-0309 | xAI | 1474 | 5 | 19,137 |
| 15 | gemini-3-flash | Google | 1474 | 4 | 30,753 |
| 16 | ernie-5.1 | Baidu | 1473 | 7 | 6,949 |
| 17 | claude-opus-4-5-20251101-thinking-32k | Anthropic | 1473 | 4 | 37,127 |
| 18 | gpt-5.5-instant | OpenAI | 1472 | 8 | 4,927 |
| 19 | glm-5.1 | Z.ai | 1471 | 6 | 11,485 |
| 20 | claude-opus-4-5-20251101 | Anthropic | 1468 | 3 | 56,217 |
| 21 | grok-4.1-thinking | xAI | 1467 | 3 | 56,685 |
| 22 | claude-sonnet-4-6 | Anthropic | 1467 | 5 | 18,529 |
| 23 | gpt-5.4 | OpenAI | 1467 | 5 | 19,364 |
| 24 | mimo-v2.5-pro | Xiaomi | 1465 | 7 | 7,476 |
| 25 | qwen3.5-max-preview | Alibaba | 1465 | 5 | 15,533 |
Vendor Share of the Top 25
| Vendor | Models in Top 25 | Top Rank |
|---|---|---|
| Anthropic | 7 | #1 (Opus 4.6 Thinking, 1502) |
| OpenAI | 6 | #8 (GPT-5.5-high, 1484) |
| xAI (Grok) | 4 | #9 (Grok-4.20-beta1, 1479) |
| Google | 3 | #6 (Gemini-3.1-Pro-Preview, 1490) |
| Chinese vendors (Baidu, Z.ai, Xiaomi, Alibaba) | 4 | #16 (ERNIE-5.1, 1473) |
| Meta | 1 | #5 (Muse Spark, 1491 preliminary) |
Six Things the Rankings Tell You
- Anthropic holds 4 of the top 5 slots. Claude Opus 4.6 and Opus 4.7 (both standard and thinking variants) dominate the top of the leaderboard, with only Meta's preliminary Muse Spark breaking up the cluster. As of this snapshot, Anthropic leads human preference by a margin not seen since Claude 3 Opus briefly led in 2024.
- Thinking mode adds roughly 4-9 Elo. Pairing Claude Opus 4.6 vs 4.6-thinking (1498 vs 1502), Claude Opus 4.7 vs 4.7-thinking (1492 vs 1501), and GPT-5.5 vs GPT-5.5-high (1476 vs 1484) shows the pattern. The premium is small in absolute Elo but consistent across vendors that publish both variants.
- OpenAI has the most models in the top 25 but the lowest top rank. Six OpenAI entries (GPT-5.5-high, 5.5, 5.5-instant, 5.4-high, 5.4, 5.2-chat-latest) span ranks 8-23, but no GPT model breaks into the top 5. The gap between each vendor's best model is 18 Elo (1502 vs 1484).
- Chinese vendors hold four top-25 slots and are gaining ground. Baidu ERNIE 5.1, Z.ai GLM 5.1, Xiaomi MiMo V2.5 Pro, and Alibaba Qwen 3.5 Max Preview occupy ranks 16, 19, 24, and 25. All four are clustered in the 1465-1473 range with overlapping confidence intervals. The frontier-vs-Chinese gap is roughly 30 Elo points (1502 vs 1473).
- Meta's Muse Spark surprises at #5 with preliminary votes. An Elo of 1491 with the "Preliminary" votes label means Meta is shipping a frontier-grade model under a new brand, distinct from the Llama family. Once more votes accumulate, the confidence interval will tighten and the position may shift, but the entry signal is material.
- Confidence intervals tighten dramatically with votes. Claude Opus 4.5 at 56,217 votes has ±3 Elo. GPT-5.5-instant at 4,927 votes has ±8 Elo. For new model launches that ship with fewer than 10K votes, treat the absolute Elo as provisional and watch vote count alongside rank when deciding whether a leaderboard move is real.
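Two conversions underpin the bullets above: an Elo gap maps to a head-to-head win probability via the standard logistic formula, and interval width shrinks roughly with the square root of the vote count. A minimal sketch of both (function name is illustrative; the 1/sqrt scaling is a textbook approximation, not the Arena's published method):

```python
import math

def win_prob(elo_a: float, elo_b: float) -> float:
    """Expected probability that A beats B under the Elo/Bradley-Terry logistic model."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400))

# GPT-5.5-high vs GPT-5.5: an 8-point Elo gap is only a ~51% head-to-head edge.
print(round(win_prob(1484, 1476), 3))  # ~0.512

# Interval width shrinks roughly with the square root of the vote count:
# going from ~5K to ~56K votes tightens the CI by about sqrt(56217/4927) ~ 3.4x,
# consistent with the observed +/-8 -> +/-3 in the table.
shrink = math.sqrt(56217 / 4927)
print(round(shrink, 2))
```

The takeaway: single-digit Elo gaps between adjacent ranks correspond to near-coin-flip preferences, which is why overlapping confidence intervals matter more than raw rank order.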
What This Means for AI Visibility
Arena Elo shapes which model a consumer-facing assistant defaults users to, because vendors anchor pricing tiers, marketing claims, and default-routing decisions to leaderboard position. When a brand's mention rate shifts on Claude before it shifts on GPT, that often traces back to a leaderboard rank change that prompted user migration. Brands building an AI-visibility strategy should monitor the Arena leaderboard alongside their own platform-specific mention-rate tracking, because leaderboard movement leads consumer-behavior shifts by roughly 30-60 days.
Methodology
Rankings, Elo scores, confidence intervals, and vote counts were pulled from arena.ai/leaderboard/text/ on May 14, 2026. arena.ai is the rebranded operational home of the original LMSYS Chatbot Arena, run by lmarena-ai (formerly the LMSYS organization at UC Berkeley). Elo is computed via a Bradley-Terry pairwise model on roughly 6 million accumulated user votes. Confidence intervals reflect 95 percent bounds; "Preliminary" indicates that a model has not yet accumulated enough votes for a tight interval. The leaderboard refreshes weekly; treat this snapshot as accurate at time of capture and re-check arena.ai for the current state.
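The 95-percent bounds described here can be approximated by bootstrapping vote outcomes. A hedged sketch with synthetic data (the votes, the 55% win rate, and the `bootstrap_ci` helper are all illustrative; the Arena's actual interval computation may differ):

```python
import random
import statistics

random.seed(0)

# Synthetic outcomes for one matchup: 1 = model wins, 0 = loses (illustrative only).
votes = [1 if random.random() < 0.55 else 0 for _ in range(5000)]

def bootstrap_ci(data, n_boot=1000, alpha=0.05):
    """Percentile bootstrap CI for the mean win rate."""
    means = []
    for _ in range(n_boot):
        sample = random.choices(data, k=len(data))  # resample with replacement
        means.append(statistics.fmean(sample))
    means.sort()
    lo = means[int(alpha / 2 * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

lo, hi = bootstrap_ci(votes)
print(f"win-rate 95% CI: [{lo:.3f}, {hi:.3f}]")
```

With more votes the resampled means cluster more tightly, which is the same mechanism behind the narrowing intervals in the leaderboard table.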
How Presenc AI Helps
Presenc AI tracks brand-mention rates across the major AI platforms whose underlying models are ranked above. The Arena leaderboard tells you which model wins the consumer's blind preference vote; Presenc AI tells you which brand wins inside those models' recommendations. When a model rises on the leaderboard, the brand-visibility outcomes on its hosted assistant typically follow within a quarter, which makes leaderboard-tracking a leading indicator for AI-visibility teams.