LLM and Foundation Model Releases, May 2026

Every large language model and foundation model released or announced in the late-April through May 2026 window, covering GPT-5.5, Claude Opus 4.7, DeepSeek V4, and Grok 4.20, plus the benchmark leadership picture across the active frontier, including Gemini 3.1 Pro.

By Ramanath, CTO & Co-Founder at Presenc AI · Last updated: May 2026

What Released in Late April Through May 2026

This page tracks every frontier and significant open-weight large language model released or announced from late April 2026 through mid-May 2026, the window covering the latest active release cycle. May 2026 itself has been unusually quiet at the frontier compared with the rapid cadence of January through April, so the late-April releases still define the active leaderboard. The page refreshes monthly, folding in each new month's releases.

Release Timeline (Late April Through May 2026)

| Date | Model | Vendor | Channel | Headline Capability |
|---|---|---|---|---|
| 2026-04-23 | GPT-5.5 (general availability) | OpenAI | Plus, Pro, Business, Enterprise + API | Frontier reasoning; leads Terminal-Bench 2.0 at 82.7 percent |
| 2026-04 (mid-month) | Claude Opus 4.7 | Anthropic | claude.ai + API | Leads SWE-bench Pro at 64.3 percent; new tokenizer producing ~35 percent more tokens for the same input |
| 2026-04 | Mistral Medium 3.5 | Mistral | La Plateforme + API | Mid-tier model at $1.50 input / $7.50 output per 1M tokens |
| 2026-04 | DeepSeek-V4 (Flash + Pro) | DeepSeek | DeepSeek API | 1M-token context; V4-Flash at $0.14 input / $0.28 output, the price-floor leader |
| 2026-04 to 05 | grok-4.3 / grok-4.20 (reasoning + non-reasoning + multi-agent) | xAI | x.ai API + Grok product | Up to 2M context; multi-agent variant published mid-cycle |
| 2026-04 to 05 | Muse Spark (preliminary on Arena) | Meta | Preview | Debuted in Chatbot Arena top 5 at 1491 Elo with preliminary vote count |
| 2026-05 (early) | (No new frontier launches) | — | — | First quiet month at the frontier since February 2026 |

Benchmark Leadership Snapshot (May 14, 2026)

| Benchmark | Leader | Score |
|---|---|---|
| SWE-bench Pro (complex coding) | Claude Opus 4.7 | 64.3% |
| Terminal-Bench 2.0 (agentic terminal) | GPT-5.5 | 82.7% |
| GPQA Diamond (scientific reasoning) | Gemini 3.1 Pro | 94.3% |
| LMSYS Chatbot Arena Elo (human preference) | Claude Opus 4.6 Thinking | 1502 Elo |
| Cost per 1M output tokens (frontier-capable) | DeepSeek V4-Flash | $0.28 |
| Largest published context window | Llama 4 Scout (older release) | 10M tokens |
| Largest active flagship context | grok-4.20 | 2M tokens |

Pricing of the Active Frontier (May 14, 2026)

| Model | Input ($/1M) | Output ($/1M) | Context |
|---|---|---|---|
| GPT-5.5 | $5.00 | $30.00 | ~270K |
| Claude Opus 4.7 | $5.00 | $25.00 | 200K |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 1M flat |
| Gemini 3.1 Pro Preview | $2.00-$4.00 | $12.00-$18.00 | 1M |
| grok-4.3 | $1.25 | $2.50 | 1M |
| Mistral Medium 3.5 | $1.50 | $7.50 | 128K |
| DeepSeek V4-Flash | $0.14 | $0.28 | 1M |

A detailed cross-vendor pricing comparison is available in our LLM API Pricing Comparison, May 2026.
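To make the rate card concrete, here is a minimal sketch that estimates the dollar cost of a single request at the published per-million-token rates above. The token counts in the example are illustrative assumptions, and Gemini 3.1 Pro Preview is omitted because its published pricing is a range.

```python
# Minimal sketch: per-request cost at the May 14, 2026 rate card above.
# Token counts in the example are illustrative assumptions, not measurements.

RATE_CARD = {
    # model: (input $ per 1M tokens, output $ per 1M tokens)
    "GPT-5.5": (5.00, 30.00),
    "Claude Opus 4.7": (5.00, 25.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "grok-4.3": (1.25, 2.50),
    "Mistral Medium 3.5": (1.50, 7.50),
    "DeepSeek V4-Flash": (0.14, 0.28),
}

def cost_per_request(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the published per-million-token rates."""
    in_rate, out_rate = RATE_CARD[model]
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# Example: a 20K-token prompt producing a 1K-token completion.
for model in RATE_CARD:
    print(f"{model}: ${cost_per_request(model, 20_000, 1_000):.4f}")
```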

Five Trends From the Late-April Release Cycle

  1. The frontier is split, not centralised. Five distinct vendors now lead at least one major benchmark: Anthropic (SWE-bench Pro), OpenAI (Terminal-Bench), Google (GPQA Diamond), xAI (largest active context), and DeepSeek (cost). No single model leads across all of these categories.
  2. Claude Opus 4.7's new tokenizer changes cost math. The Opus 4.7 tokenizer produces approximately 35 percent more tokens for the same input text versus the previous Opus tokenizer, which means published per-token pricing understates true cost per word for English content. Effective cost comparisons require normalising for tokenizer efficiency, not headline rate-card numbers; see the worked sketch after this list.
  3. DeepSeek V4 reset the price floor. V4-Flash at $0.14 input / $0.28 output with a 1M context is roughly 36x cheaper than GPT-5.5 on input. The cache-hit pricing on V4-Flash falls to $0.0028 per million input tokens, functionally free for retrieval-heavy workloads with stable prompts; the sketch after this list also works through the cache-hit arithmetic.
  4. May 2026 broke the release cadence. January, February, March, and April all featured at least one frontier-class launch. May has so far featured none. The market may be settling into a quarterly rather than monthly cadence as model differentiation tightens.
  5. Meta is reentering the frontier conversation. Muse Spark debuted with a preliminary 1491 Elo on the Chatbot Arena, breaking Anthropic and Google's grip on the top of the leaderboard. The full release with hardened vote counts will likely arrive in May or June and would mark Meta's first frontier-grade entry distinct from the Llama family.
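The sketch below, referenced from trends 2 and 3, works through both calculations. The ~1.3 tokens-per-English-word baseline and the assumption that the previous Opus rate card also charged $5.00 per 1M input tokens are illustrative assumptions, not published figures; the 1.35x inflation factor is the ~35 percent quoted above.

```python
# Hedged sketch of trends 2 and 3. Baseline tokenizer density and the prior
# Opus input rate are assumptions for illustration, not published figures.

def cost_per_million_words(rate_per_1m_tokens: float,
                           tokens_per_word: float) -> float:
    """Effective dollar cost per 1M input *words*, normalised for tokenizer density."""
    return rate_per_1m_tokens * tokens_per_word

baseline_tpw = 1.3                  # assumed prior-tokenizer density (tokens/word)
opus_47_tpw = baseline_tpw * 1.35   # ~35 percent more tokens for the same text

print(f"Prior Opus tokenizer: ${cost_per_million_words(5.00, baseline_tpw):.2f} per 1M words")
print(f"Opus 4.7 tokenizer:   ${cost_per_million_words(5.00, opus_47_tpw):.2f} per 1M words")

# Trend 3: DeepSeek V4-Flash input cost when part of the prompt is a stable,
# cached prefix billed at the $0.0028/1M cache-hit rate.
def v4_flash_input_cost(total_tokens: int, cache_hit_fraction: float) -> float:
    hit = total_tokens * cache_hit_fraction
    return hit / 1e6 * 0.0028 + (total_tokens - hit) / 1e6 * 0.14

print(f"1M-token prompt, 95% cached: ${v4_flash_input_cost(1_000_000, 0.95):.4f}")
```

At the same $5.00 rate card, the denser tokenizer raises effective input cost from roughly $6.50 to roughly $8.78 per million English words, which is the ~35 percent gap that headline per-token pricing hides.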

What This Means for AI Visibility

Each release shifts the model that brands surface in front of. A brand that ranked well on GPT-5.4 may rank differently on GPT-5.5 because of changed reasoning patterns; a brand that ranked well on Claude Opus 4.6 may rank differently on 4.7 due to the new tokenizer's effect on retrieval-augmented prompt construction. Brand-visibility programmes should re-test mention rates on their target platforms after every flagship release; a minimal sketch of that re-test loop follows. The Anthropic and OpenAI flagship cadence specifically tends to ship with retraining or refreshed RLHF that changes brand recall patterns.
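The sketch below shows one way such a re-test loop could look. Here `query_model` is a hypothetical stand-in for whichever platform client you actually use; the prompt set and brand string are placeholders, not part of any vendor API.

```python
# Hypothetical sketch of a post-release mention-rate re-test. `query_model`
# stands in for a real client call (OpenAI, Anthropic, etc.) that returns
# the model's text response for a prompt.

from typing import Callable

def mention_rate(query_model: Callable[[str], str],
                 prompts: list[str], brand: str) -> float:
    """Fraction of prompts whose response mentions the brand at all."""
    hits = sum(brand.lower() in query_model(p).lower() for p in prompts)
    return hits / len(prompts)

def rebaseline(old_model: Callable[[str], str],
               new_model: Callable[[str], str],
               prompts: list[str], brand: str) -> float:
    """Mention-rate delta (new minus old) across the same prompt set."""
    return (mention_rate(new_model, prompts, brand)
            - mention_rate(old_model, prompts, brand))

# Usage, once real clients are wired in (names are placeholders):
#   delta = rebaseline(query_gpt_54, query_gpt_55, prompt_set, "YourBrand")
# A positive delta means the new flagship surfaces the brand more often.
```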

Methodology

Release dates, capabilities, and benchmark scores were aggregated on May 14, 2026 from vendor announcement posts, the LLM Stats updates feed, the LM Council benchmark tracker, and the LMSYS Chatbot Arena Elo leaderboard. Pricing is taken from vendor pricing pages (see the linked comparison page for sources). The page refreshes monthly with the next month's roundup; prior snapshots are archived at their own slugs rather than re-using the URL, so quarter-by-quarter comparisons remain possible.

How Presenc AI Helps

Presenc AI monitors brand-mention rates across each platform whose model is listed above. After every flagship release, we re-baseline brand visibility on the new model within 48 hours and surface mention-rate deltas so brand teams know which platforms shifted and which held steady. For brand-visibility programmes targeting the active frontier, this is the operational signal that connects release news to recommendation-rate outcomes.

Frequently Asked Questions

What LLMs released in May 2026?

As of May 14, 2026, no new frontier LLM has launched within May itself; the most recent frontier-class releases were in late April. GPT-5.5 (OpenAI, April 23), Claude Opus 4.7 (Anthropic, mid-April), Mistral Medium 3.5, DeepSeek V4 Flash and Pro, and several xAI Grok variants are the active frontier. Meta Muse Spark appeared in preliminary Arena rankings during the late-April/early-May window.

Which model is best at coding right now?

Claude Opus 4.7 leads SWE-bench Pro at 64.3 percent (the standard complex-coding benchmark for the May 2026 era). GPT-5.5 leads Terminal-Bench 2.0 at 82.7 percent (the agentic-terminal-workflow benchmark). The two leaderships are complementary, with Claude Opus stronger on cold-start code synthesis and GPT-5.5 stronger on multi-turn agent loops.

What is the cheapest frontier-capable model?

DeepSeek V4-Flash at $0.14 input / $0.28 output per million tokens with a 1M-token context window. It is roughly 36x cheaper on input than GPT-5.5 and roughly 107x cheaper on output. Cache-hit pricing falls to $0.0028 per million input tokens, the lowest cache rate in the market.

Why has May 2026 been so quiet?

Most vendors are between major launches. OpenAI shipped GPT-5.5 in April; Anthropic shipped Opus 4.7 in mid-April; Google shipped Gemini 3.1 Pro in late February. Each typically waits roughly one quarter between flagship launches. The May quiet is consistent with the underlying release cadence rather than a structural slowdown.

How often does this page update?

Monthly, in the second week of each month, once the prior month's releases have stabilised. Each refresh adds the new month's launches and updates benchmark scores. We archive prior monthly snapshots at separate slugs so quarter-by-quarter comparisons remain possible.

Track Your AI Visibility

See how your brand appears across ChatGPT, Claude, Perplexity, and other AI platforms. Start monitoring today.