SWE-bench Verified Leaderboard June 2026

SWE-bench Verified leaderboard for June 2026. Claude Opus 4.7 and Mythos 5 lead the closed frontier; DeepSeek V4.1 and Qwen 3.7 close the open-weight gap.

By Ramanath, CTO & Co-Founder at Presenc AI · Last updated: June 2026

SWE-bench Verified is one of the most-cited real-world coding benchmarks for frontier LLMs. It measures the resolved-issue rate on a curated, human-validated set of GitHub issues from popular Python repositories. This page snapshots the public leaderboard as of June 2026.
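As a minimal sketch of what "resolved-issue rate" means: the score is the percentage of benchmark issues a model resolves. The function name and the counts below are hypothetical and chosen only for illustration; they are not taken from any vendor's results.

```python
def resolved_rate(resolved: int, total: int) -> float:
    """Return the percentage of benchmark issues that were resolved."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= resolved <= total:
        raise ValueError("resolved must be between 0 and total")
    return 100.0 * resolved / total


# Hypothetical run: 390 of 500 issues resolved.
print(f"{resolved_rate(390, 500):.1f}%")  # prints "78.0%"
```

A one-point difference on the leaderboard therefore corresponds to only a handful of issues, which is one reason small gaps should be read cautiously.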

June 2026 Leaderboard

Rank	Model	Vendor	SWE-bench Verified %
1	Claude Mythos 5	Anthropic	~78%
2	Claude Opus 4.7	Anthropic	~75%
3	GPT-5.6 Pro	OpenAI	~73%
4	GPT-5.6	OpenAI	~70%
5	DeepSeek V4.1 Pro	DeepSeek	~69%
6	Claude Sonnet 4.6	Anthropic	~68%
7	Qwen 3.7	Alibaba	~66%
8	Gemini 3.2 Pro	Google	~65%
9	DeepSeek V4.1 Flash	DeepSeek	~63%
10	GLM-6	Zhipu AI	~58%
11	Llama 4.5 Maverick	Meta	~55%
12	Mistral Large 3	Mistral AI	~52%

Key Takeaways

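Claude Mythos 5, which reached general availability in June 2026, took the top spot from Claude Opus 4.7.
Open-weight DeepSeek V4.1 Pro (~69%) sits roughly 9 points behind the top closed model, Claude Mythos 5, and about 6 points behind Claude Opus 4.7.
Qwen 3.7 (~66%) ranks second among models from Chinese labs on this leaderboard, behind DeepSeek V4.1 Pro.
The gap between the top closed and top open-weight models has narrowed to single digits over the past 12 months.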
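The size of the open-weight gap depends on which closed model is used as the reference point. The sketch below recomputes it from the approximate scores in the table above; the `scores` dictionary and the `gap` helper are illustrative names, not part of any official tooling.

```python
# Approximate scores (in %) copied from the June 2026 table on this page.
scores = {
    "Claude Mythos 5": 78,
    "Claude Opus 4.7": 75,
    "GPT-5.6 Pro": 73,
    "GPT-5.6": 70,
    "DeepSeek V4.1 Pro": 69,
    "Qwen 3.7": 66,
}


def gap(leader: str, challenger: str) -> int:
    """Return the difference in percentage points between two models."""
    return scores[leader] - scores[challenger]


print(gap("Claude Mythos 5", "DeepSeek V4.1 Pro"))  # prints 9
print(gap("Claude Opus 4.7", "DeepSeek V4.1 Pro"))  # prints 6
print(gap("GPT-5.6", "DeepSeek V4.1 Pro"))          # prints 1
```

Because the inputs are rounded, these differences are themselves approximate.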

Methodology

Scores are compiled from vendor disclosures, the public SWE-bench Verified leaderboard at swebench.com, and third-party replication where available. Numbers are rounded or expressed as ranges; treat them as directional pending independent verification. The page is updated monthly.
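One way to act on the "directional" caveat is to parse the approximate table cells and call one model ahead of another only when the difference exceeds a tolerance. This is a sketch under stated assumptions: the 2-point default tolerance is arbitrary, and `parse_score` and `clearly_ahead` are hypothetical helpers, not part of the methodology used on this page.

```python
import re


def parse_score(cell: str) -> float:
    """Convert a table cell such as '~78%' into the number 78.0."""
    match = re.fullmatch(r"~?\s*(\d+(?:\.\d+)?)\s*%", cell.strip())
    if match is None:
        raise ValueError(f"unrecognized score cell: {cell!r}")
    return float(match.group(1))


def clearly_ahead(a: str, b: str, tolerance: float = 2.0) -> bool:
    """Return True only if score a beats score b by more than the tolerance.

    The tolerance is an assumed margin for rounding and run-to-run noise.
    """
    return parse_score(a) - parse_score(b) > tolerance


print(clearly_ahead("~78%", "~75%"))  # prints True (3-point gap)
print(clearly_ahead("~70%", "~69%"))  # prints False (1-point gap)
```

Under this assumed margin, adjacent ranks separated by a single point are better read as a tie than as a strict ordering.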

How Presenc AI Helps

Presenc AI tracks how shifts in frontier coding capability shape brand visibility inside the developer tools and self-hosted enterprise deployments where these models are embedded.

Frequently Asked Questions

What is SWE-bench Verified?

A coding benchmark that measures the resolved-issue rate on a curated set of real GitHub issues from popular Python repositories. "Verified" means each issue was validated by human reviewers as solvable.

Which model leads SWE-bench Verified in June 2026?

Claude Mythos 5 from Anthropic, at approximately 78%, ahead of Claude Opus 4.7 (~75%) and GPT-5.6 Pro (~73%).

How close are open-weight models to the closed frontier?

Within roughly 9 percentage points of the top closed model as of June 2026, and about 6 points behind Claude Opus 4.7. DeepSeek V4.1 Pro, at approximately 69%, sits about one point behind GPT-5.6 and trails four closed models in total.

How often does the leaderboard change?

Material reorderings happen roughly every six to eight weeks as new frontier models ship. Smaller updates from fine-tunes and inference improvements happen weekly.
