LLM and Foundation Model Releases, May 2026

Every large language model and foundation model released or announced in the late-April through May 2026 window, covering GPT-5.5, Claude Opus 4.7, DeepSeek V4, and Grok 4.20, plus the benchmark leadership picture across the active frontier, including Gemini 3.1 Pro.

By Ramanath, CTO & Co-Founder at Presenc AI · Last updated: May 2026

What Released in Late April Through May 2026

This page tracks every frontier and significant open-weight large language model released or announced from late April 2026 through mid-May 2026, the window covering the latest active release cycle. May 2026 itself has been unusually quiet at the frontier compared with the rapid cadence of January through April, so the late-April releases still define the active leaderboard. The page refreshes monthly, folding in each new month's releases.

Release Timeline (Late April Through May 2026)

| Date | Model | Vendor | Channel | Headline Capability |
|---|---|---|---|---|
| 2026-04-23 | GPT-5.5 (general availability) | OpenAI | Plus, Pro, Business, Enterprise + API | Frontier reasoning; leads Terminal-Bench 2.0 at 82.7 percent |
| 2026-04 (mid-month) | Claude Opus 4.7 | Anthropic | claude.ai + API | Leads SWE-bench Pro at 64.3 percent; new tokenizer producing ~35 percent more tokens for the same input |
| 2026-04 | Mistral Medium 3.5 | Mistral | La Plateforme + API | Mid-tier model at $1.50 input / $7.50 output per 1M tokens |
| 2026-04 | DeepSeek-V4 (Flash + Pro) | DeepSeek | DeepSeek API | 1M-token context; V4-Flash at $0.14 input / $0.28 output, the price-floor leader |
| 2026-04 to 05 | grok-4.3 / grok-4.20 (reasoning + non-reasoning + multi-agent) | xAI | x.ai API + Grok product | Up to 2M context; multi-agent variant published mid-cycle |
| 2026-04 to 05 | Muse Spark (preliminary on Arena) | Meta | Preview | Debuted in Chatbot Arena top 5 at 1491 Elo with preliminary vote count |
| 2026-05 (early) | (No new frontier launches) | — | — | First quiet month at the frontier since February 2026 |

Benchmark Leadership Snapshot (May 14, 2026)

| Benchmark | Leader | Score |
|---|---|---|
| SWE-bench Pro (complex coding) | Claude Opus 4.7 | 64.3% |
| Terminal-Bench 2.0 (agentic terminal) | GPT-5.5 | 82.7% |
| GPQA Diamond (scientific reasoning) | Gemini 3.1 Pro | 94.3% |
| LMSYS Chatbot Arena Elo (human preference) | Claude Opus 4.6 Thinking | 1502 Elo |
| Cost per 1M output tokens (frontier-capable) | DeepSeek V4-Flash | $0.28 |
| Largest published context window | Llama 4 Scout (older release) | 10M tokens |
| Largest active flagship context | grok-4.20 | 2M tokens |

Pricing of the Active Frontier (May 14, 2026)

| Model | Input ($/1M) | Output ($/1M) | Context |
|---|---|---|---|
| GPT-5.5 | $5.00 | $30.00 | ~270K |
| Claude Opus 4.7 | $5.00 | $25.00 | 200K |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 1M flat |
| Gemini 3.1 Pro Preview | $2.00-$4.00 | $12.00-$18.00 | 1M |
| grok-4.3 | $1.25 | $2.50 | 1M |
| Mistral Medium 3.5 | $1.50 | $7.50 | 128K |
| DeepSeek V4-Flash | $0.14 | $0.28 | 1M |

A detailed cross-vendor pricing comparison is available in our LLM API Pricing Comparison, May 2026.
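To make the rate card concrete, here is a minimal sketch that estimates the dollar cost of a single request at the published per-million-token rates above. The token counts in the example are illustrative assumptions, and Gemini 3.1 Pro Preview is omitted because its published pricing is a range.

```python
# Minimal sketch: per-request cost at the May 14, 2026 rate card above.
# Token counts in the example are illustrative assumptions, not measurements.

RATE_CARD = {
    # model: (input $ per 1M tokens, output $ per 1M tokens)
    "GPT-5.5": (5.00, 30.00),
    "Claude Opus 4.7": (5.00, 25.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "grok-4.3": (1.25, 2.50),
    "Mistral Medium 3.5": (1.50, 7.50),
    "DeepSeek V4-Flash": (0.14, 0.28),
}

def cost_per_request(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the published per-million-token rates."""
    in_rate, out_rate = RATE_CARD[model]
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# Example: a 20K-token prompt producing a 1K-token completion.
for model in RATE_CARD:
    print(f"{model}: ${cost_per_request(model, 20_000, 1_000):.4f}")
```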

Five Trends From the Late-April Release Cycle

  1. The frontier is split, not centralised. Five distinct vendors now lead at least one major benchmark: Anthropic (SWE-bench Pro), OpenAI (Terminal-Bench), Google (GPQA Diamond), xAI (largest active context), and DeepSeek (cost). No single model leads across all of these categories.
  2. Claude Opus 4.7's new tokenizer changes cost math. The Opus 4.7 tokenizer produces approximately 35 percent more tokens for the same input text versus the previous Opus tokenizer, which means published per-token pricing understates true cost per word for English content. Effective cost comparisons require normalising for tokenizer efficiency, not headline rate-card numbers; see the worked sketch after this list.
  3. DeepSeek V4 reset the price floor. V4-Flash at $0.14 input / $0.28 output with a 1M context is roughly 36x cheaper than GPT-5.5 on input. The cache-hit pricing on V4-Flash falls to $0.0028 per million input tokens, functionally free for retrieval-heavy workloads with stable prompts; the sketch after this list also works through the cache-hit arithmetic.
  4. May 2026 broke the release cadence. January, February, March, and April all featured at least one frontier-class launch. May has so far featured none. The market may be settling into a quarterly rather than monthly cadence as model differentiation tightens.
  5. Meta is reentering the frontier conversation. Muse Spark debuted with a preliminary 1491 Elo on the Chatbot Arena, breaking Anthropic and Google's grip on the top of the leaderboard. The full release with hardened vote counts will likely arrive in May or June and would mark Meta's first frontier-grade entry distinct from the Llama family.
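The sketch below, referenced from trends 2 and 3, works through both calculations. The ~1.3 tokens-per-English-word baseline and the assumption that the previous Opus rate card also charged $5.00 per 1M input tokens are illustrative assumptions, not published figures; the 1.35x inflation factor is the ~35 percent quoted above.

```python
# Hedged sketch of trends 2 and 3. Baseline tokenizer density and the prior
# Opus input rate are assumptions for illustration, not published figures.

def cost_per_million_words(rate_per_1m_tokens: float,
                           tokens_per_word: float) -> float:
    """Effective dollar cost per 1M input *words*, normalised for tokenizer density."""
    return rate_per_1m_tokens * tokens_per_word

baseline_tpw = 1.3                  # assumed prior-tokenizer density (tokens/word)
opus_47_tpw = baseline_tpw * 1.35   # ~35 percent more tokens for the same text

print(f"Prior Opus tokenizer: ${cost_per_million_words(5.00, baseline_tpw):.2f} per 1M words")
print(f"Opus 4.7 tokenizer:   ${cost_per_million_words(5.00, opus_47_tpw):.2f} per 1M words")

# Trend 3: DeepSeek V4-Flash input cost when part of the prompt is a stable,
# cached prefix billed at the $0.0028/1M cache-hit rate.
def v4_flash_input_cost(total_tokens: int, cache_hit_fraction: float) -> float:
    hit = total_tokens * cache_hit_fraction
    return hit / 1e6 * 0.0028 + (total_tokens - hit) / 1e6 * 0.14

print(f"1M-token prompt, 95% cached: ${v4_flash_input_cost(1_000_000, 0.95):.4f}")
```

At the same $5.00 rate card, the denser tokenizer raises effective input cost from roughly $6.50 to roughly $8.78 per million English words, which is the ~35 percent gap that headline per-token pricing hides.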

What This Means for AI Visibility

Each release shifts the model that brands surface in front of. A brand that ranked well on GPT-5.4 may rank differently on GPT-5.5 because of changed reasoning patterns; a brand that ranked well on Claude Opus 4.6 may rank differently on 4.7 due to the new tokenizer's effect on retrieval-augmented prompt construction. Brand-visibility programmes should re-test mention rates on their target platforms after every flagship release; a minimal sketch of that re-test loop follows. The Anthropic and OpenAI flagship cadence specifically tends to ship with retraining or refreshed RLHF that changes brand recall patterns.
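The sketch below shows one way such a re-test loop could look. Here `query_model` is a hypothetical stand-in for whichever platform client you actually use; the prompt set and brand string are placeholders, not part of any vendor API.

```python
# Hypothetical sketch of a post-release mention-rate re-test. `query_model`
# stands in for a real client call (OpenAI, Anthropic, etc.) that returns
# the model's text response for a prompt.

from typing import Callable

def mention_rate(query_model: Callable[[str], str],
                 prompts: list[str], brand: str) -> float:
    """Fraction of prompts whose response mentions the brand at all."""
    hits = sum(brand.lower() in query_model(p).lower() for p in prompts)
    return hits / len(prompts)

def rebaseline(old_model: Callable[[str], str],
               new_model: Callable[[str], str],
               prompts: list[str], brand: str) -> float:
    """Mention-rate delta (new minus old) across the same prompt set."""
    return (mention_rate(new_model, prompts, brand)
            - mention_rate(old_model, prompts, brand))

# Usage, once real clients are wired in (names are placeholders):
#   delta = rebaseline(query_gpt_54, query_gpt_55, prompt_set, "YourBrand")
# A positive delta means the new flagship surfaces the brand more often.
```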

Methodology

Release dates, capabilities, and benchmark scores were aggregated on May 14, 2026 from vendor announcement posts, the LLM Stats updates feed, the LM Council benchmark tracker, and the LMSYS Chatbot Arena Elo leaderboard. Pricing is taken from vendor pricing pages (see the linked comparison page for sources). The page refreshes monthly with the next month's roundup; prior snapshots are archived at their own slugs rather than re-using the URL, so quarter-by-quarter comparisons remain possible.

How Presenc AI Helps

Presenc AI monitors brand-mention rates across each platform whose model is listed above. After every flagship release, we re-baseline brand visibility on the new model within 48 hours and surface mention-rate deltas so brand teams know which platforms shifted and which held steady. For brand-visibility programmes targeting the active frontier, this is the operational signal that connects release news to recommendation-rate outcomes.

Frequently Asked Questions

What LLMs released in May 2026?

As of May 14, 2026, no new frontier LLM has launched within May itself; the most recent frontier-class releases were in late April. GPT-5.5 (OpenAI, April 23), Claude Opus 4.7 (Anthropic, mid-April), Mistral Medium 3.5, DeepSeek V4 Flash and Pro, and several xAI Grok variants are the active frontier. Meta Muse Spark appeared in preliminary Arena rankings during the late-April/early-May window.

Which model is best at coding right now?

Claude Opus 4.7 leads SWE-bench Pro at 64.3 percent (the standard complex-coding benchmark for the May 2026 era). GPT-5.5 leads Terminal-Bench 2.0 at 82.7 percent (the agentic-terminal-workflow benchmark). The two leaderships are complementary, with Claude Opus stronger on cold-start code synthesis and GPT-5.5 stronger on multi-turn agent loops.

What is the cheapest frontier-capable model?

DeepSeek V4-Flash at $0.14 input / $0.28 output per million tokens with a 1M-token context window. It is roughly 36x cheaper on input than GPT-5.5 and roughly 107x cheaper on output. Cache-hit pricing falls to $0.0028 per million input tokens, the lowest cache rate in the market.

Why has May 2026 been so quiet?

Most vendors are between major launches. OpenAI shipped GPT-5.5 in April; Anthropic shipped Opus 4.7 in mid-April; Google shipped Gemini 3.1 Pro in late February. Each typically waits roughly one quarter between flagship launches. The May quiet is consistent with the underlying release cadence rather than a structural slowdown.

How often does this page update?

Monthly, in the second week of each month, once the prior month's releases have stabilised. Each refresh adds the new month's launches and updates benchmark scores. We archive prior monthly snapshots at separate slugs so quarter-by-quarter comparisons remain possible.

Track Your AI Visibility

See how your brand appears across ChatGPT, Claude, Perplexity, and other AI platforms. Start monitoring today.