What Was Released from Late April Through May 2026
This page tracks every frontier and significant open-weight large language model released or announced from late April 2026 through mid-May 2026, the window covering the latest active release cycle. May 2026 has been unusually quiet at the frontier compared with the rapid cadence of January through April, so the late-April releases still define the current state of the leaderboard. The page refreshes monthly, folding in each new month's update.
Release Timeline (Late April Through May 2026)
| Date | Model | Vendor | Channel | Headline Capability |
|---|---|---|---|---|
| 2026-04-23 | GPT-5.5 (general availability) | OpenAI | Plus, Pro, Business, Enterprise + API | Frontier reasoning; leads Terminal-Bench 2.0 at 82.7 percent |
| 2026-04 (mid-month) | Claude Opus 4.7 | Anthropic | claude.ai + API | Leads SWE-bench Pro at 64.3 percent; new tokenizer producing ~35 percent more tokens for same input |
| 2026-04 | Mistral Medium 3.5 | Mistral | La Plateforme + API | Mid-tier model at $1.50 input / $7.50 output per 1M tokens |
| 2026-04 | DeepSeek-V4 (Flash + Pro) | DeepSeek | DeepSeek API | 1M-token context; V4-Flash at $0.14 input / $0.28 output, the price-floor leader |
| 2026-04 to 05 | grok-4.3 / grok-4.20 (reasoning + non-reasoning + multi-agent) | xAI | x.ai API + Grok product | Up to 2M context; multi-agent variant published mid-cycle |
| 2026-04 to 05 | Muse Spark (preliminary on Arena) | Meta | Chatbot Arena (preview) | Debuted in Chatbot Arena top 5 at 1491 Elo with preliminary vote count |
| 2026-05 (early) | (No new frontier launches) | — | — | First quiet month at the frontier since February 2026 |
Benchmark Leadership Snapshot (May 14, 2026)
| Benchmark | Leader | Score |
|---|---|---|
| SWE-bench Pro (complex coding) | Claude Opus 4.7 | 64.3% |
| Terminal-Bench 2.0 (agentic terminal) | GPT-5.5 | 82.7% |
| GPQA Diamond (scientific reasoning) | Gemini 3.1 Pro | 94.3% |
| LMSYS Chatbot Arena Elo (human preference) | Claude Opus 4.6 Thinking | 1502 Elo |
| Cost per 1M output tokens (frontier-capable) | DeepSeek V4-Flash | $0.28 |
| Largest published context window | Llama 4 Scout (older release) | 10M tokens |
| Largest active flagship context | grok-4.20 | 2M tokens |
Pricing of the Active Frontier (May 14, 2026)
| Model | Input ($/1M) | Output ($/1M) | Context |
|---|---|---|---|
| GPT-5.5 | $5.00 | $30.00 | ~270K |
| Claude Opus 4.7 | $5.00 | $25.00 | 200K |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 1M flat |
| Gemini 3.1 Pro Preview | $2.00-$4.00 | $12.00-$18.00 | 1M |
| grok-4.3 | $1.25 | $2.50 | 1M |
| Mistral Medium 3.5 | $1.50 | $7.50 | 128K |
| DeepSeek V4-Flash | $0.14 | $0.28 | 1M |
A detailed cross-vendor pricing comparison is available in our LLM API Pricing Comparison, May 2026.
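To make the rate card above concrete, here is a minimal sketch that estimates monthly spend for a hypothetical workload against the published per-token prices. The workload figures (500M input / 50M output tokens per month) are illustrative assumptions, not vendor data, and the helper name is ours.

```python
# Rate-card figures from the pricing table above: (input $/1M, output $/1M).
RATES = {
    "GPT-5.5": (5.00, 30.00),
    "Claude Opus 4.7": (5.00, 25.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "grok-4.3": (1.25, 2.50),
    "Mistral Medium 3.5": (1.50, 7.50),
    "DeepSeek V4-Flash": (0.14, 0.28),
}

def monthly_cost(model: str, input_m: float, output_m: float) -> float:
    """Dollars per month for input_m / output_m millions of tokens."""
    rate_in, rate_out = RATES[model]
    return input_m * rate_in + output_m * rate_out

# Hypothetical workload: 500M input tokens, 50M output tokens per month.
for model in RATES:
    print(f"{model}: ${monthly_cost(model, 500, 50):,.2f}/month")
```

At these assumed volumes the spread is stark: the same workload costs roughly $4,000/month on GPT-5.5 and about $84/month on DeepSeek V4-Flash, which is why the price-floor row matters for high-volume pipelines.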
Five Trends From the Late-April Release Cycle
- The frontier is split, not centralised. Five distinct vendors now lead at least one major category: Anthropic (SWE-bench Pro), OpenAI (Terminal-Bench 2.0), Google (GPQA Diamond), xAI (largest active context), and DeepSeek (cost). No single model leads more than one of these five categories.
- Claude Opus 4.7's new tokenizer changes cost math. The Opus 4.7 tokenizer produces approximately 35 percent more tokens for the same input text versus the previous Opus tokenizer, which means published per-token pricing understates true cost per word for English content. Effective cost comparisons require normalising for tokenizer efficiency, not headline rate-card numbers.
- DeepSeek V4 reset the price floor. V4-Flash at $0.14 input / $0.28 output with a 1M context is roughly 36x cheaper than GPT-5.5 on input ($0.14 vs $5.00) and over 100x cheaper on output. The cache-hit pricing on V4-Flash falls to $0.0028 per million input tokens, functionally free for retrieval-heavy workloads with stable prompts.
- May 2026 broke the release cadence. January, February, March, and April all featured at least one frontier-class launch. May has so far featured none. The market may be settling into a quarterly rather than monthly cadence as model differentiation tightens.
- Meta is reentering the frontier conversation. Muse Spark debuted with a preliminary 1491 Elo on the Chatbot Arena, breaking Anthropic and Google's grip on the top of the leaderboard. The full release with hardened vote counts will likely arrive in May or June and would mark Meta's first frontier-grade entry distinct from the Llama family.
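The tokenizer point above is worth making explicit: if per-token prices hold steady but a tokenizer emits ~35 percent more tokens for the same English text, the effective cost per word rises by the same factor. A minimal sketch, assuming a hypothetical baseline of 1.3 tokens per English word (an illustrative figure, not a published one):

```python
BASE_TOKENS_PER_WORD = 1.3  # assumed average for the previous-generation tokenizer

def effective_cost_per_1m_words(rate_per_1m_tokens: float,
                                tokenizer_inflation: float = 1.0) -> float:
    """Dollars per 1M English words, given a per-token rate and a tokenizer
    that produces `tokenizer_inflation` times the baseline token count."""
    tokens_per_word = BASE_TOKENS_PER_WORD * tokenizer_inflation
    return rate_per_1m_tokens * tokens_per_word

old = effective_cost_per_1m_words(5.00)        # previous tokenizer at $5/1M tokens
new = effective_cost_per_1m_words(5.00, 1.35)  # same rate, ~35% more tokens
print(f"old: ${old:.2f}/1M words, new: ${new:.2f}/1M words (+{new / old - 1:.0%})")
```

The headline rate card is unchanged in both cases; only the tokenizer moved, yet per-word spend rose 35 percent. This is why cross-vendor comparisons should normalise on words or characters, not tokens.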
What This Means for AI Visibility
Each flagship release changes the model in front of which brands surface. A brand that ranked well on GPT-5.4 may rank differently on GPT-5.5 because of changed reasoning patterns; a brand that ranked well on Claude Opus 4.6 may rank differently on 4.7 because the new tokenizer changes how retrieval-augmented prompts are constructed. Brand-visibility programmes should re-test mention rates on their target platforms after every flagship release. Anthropic and OpenAI flagship releases in particular tend to ship with retraining or refreshed RLHF that changes brand recall patterns.
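The re-testing loop described above can be sketched in a few lines. This is a hypothetical harness: `query_mention_rate` is a stand-in for whatever sampling pipeline actually calls the models (its simulated numbers here are not real data), and the 5-point threshold is an arbitrary assumption.

```python
import random

def query_mention_rate(model: str, brand: str, prompts: list[str]) -> float:
    """Stand-in: fraction of sampled answers that mention the brand.
    Replace with real model calls; here we simulate deterministically."""
    rng = random.Random(sum(map(ord, model + brand)))  # deterministic fake
    return sum(rng.random() < 0.4 for _ in prompts) / len(prompts)

def rebaseline(brand: str, prompts: list[str],
               old_model: str, new_model: str,
               threshold: float = 0.05) -> dict:
    """Compare mention rates across a flagship upgrade and flag large deltas."""
    old = query_mention_rate(old_model, brand, prompts)
    new = query_mention_rate(new_model, brand, prompts)
    delta = new - old
    return {"old": old, "new": new, "delta": delta,
            "shifted": abs(delta) >= threshold}

report = rebaseline("ExampleBrand", [f"prompt {i}" for i in range(200)],
                    "gpt-5.4", "gpt-5.5")
print(report)
```

The operational point is the `shifted` flag: run the same prompt set against old and new flagships, and only escalate platforms whose mention-rate delta clears the threshold.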
Methodology
Release dates, capabilities, and benchmark scores were aggregated on May 14, 2026 from vendor announcement posts, the LLM Stats updates feed, the LM Council benchmark tracker, and the LMSYS Chatbot Arena Elo leaderboard. Pricing is taken from vendor pricing pages (see the linked comparison page for sources). The page refreshes monthly with the next month's roundup; each edition is archived at its own slug rather than re-using the URL, so quarter-by-quarter comparisons remain possible.
How Presenc AI Helps
Presenc AI monitors brand-mention rates across each platform whose model is listed above. After every flagship release, we re-baseline brand visibility on the new model within 48 hours and surface mention-rate deltas so brand teams know which platforms shifted and which held steady. For brand-visibility programmes targeting the active frontier, this is the operational signal that connects release news to recommendation-rate outcomes.