What this is
Chinese open-weight coding models lead the open-weight coding benchmark leaderboards in 2026, with several outperforming Western open-weight alternatives. This page is a head-to-head comparison of the top Chinese coding model lines as of 2026-05-15.
Top Chinese Coding Models (2026)
| Model | SWE-Bench Verified | SWE-Bench Pro | HumanEval | Context (tokens) | Licence |
|---|---|---|---|---|---|
| DeepSeek V4-Pro | 83.7% (leader) | ~55% | 90% | 1M | MIT |
| Kimi K2.6 | ~76% | 58.6% (open-weight leader) | ~87% | 262K | Open weights |
| Qwen3-Coder-Plus | ~74% | ~52% | ~86% | 128K | Apache 2.0 |
| Yi-Coder (9B) | ~38% (size-adjusted) | n/a | ~78% | 128K | Apache-style open |
| GLM-4.7 (coding) | ~67% | ~48% | ~82% | 128K+ | Open weights |
| Doubao Seed 2.0 | ~65% | ~46% | ~80% | 256K | Proprietary |
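The table can also be treated as data when shortlisting by hard constraints. The following is a minimal Python sketch, assuming the figures above are transcribed as-is: approximate "~" values are stored as plain numbers, context sizes are in thousands of tokens (1M stored as 1000, "128K+" as 128), and `Model` and `shortlist` are illustrative names rather than part of any existing tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Model:
    name: str
    swe_verified: float        # SWE-Bench Verified, percent
    swe_pro: Optional[float]   # SWE-Bench Pro, percent (None where the table says n/a)
    context_k: int             # context window, thousands of tokens
    licence: str


# Values transcribed from the table above; "~" figures are approximate.
MODELS = [
    Model("DeepSeek V4-Pro", 83.7, 55.0, 1000, "MIT"),
    Model("Kimi K2.6", 76.0, 58.6, 262, "Open weights"),
    Model("Qwen3-Coder-Plus", 74.0, 52.0, 128, "Apache 2.0"),
    Model("Yi-Coder (9B)", 38.0, None, 128, "Apache-style open"),
    Model("GLM-4.7 (coding)", 67.0, 48.0, 128, "Open weights"),
    Model("Doubao Seed 2.0", 65.0, 46.0, 256, "Proprietary"),
]


def shortlist(models, min_context_k=0, licences=None):
    """Return models meeting the constraints, best SWE-Bench Verified first."""
    keep = [
        m for m in models
        if m.context_k >= min_context_k
        and (licences is None or m.licence in licences)
    ]
    return sorted(keep, key=lambda m: m.swe_verified, reverse=True)


# Example queries: long-context candidates, and MIT / Apache 2.0 only.
long_context = [m.name for m in shortlist(MODELS, min_context_k=256)]
permissive = [m.name for m in shortlist(MODELS, licences={"MIT", "Apache 2.0"})]
```

Sorting by a single benchmark is a simplification; a real shortlist would likely weigh SWE-Bench Pro and cost as well.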
Specialisation by Task
| Task | Best pick |
|---|---|
| Pure SWE-Bench Verified (autonomous code agent) | DeepSeek V4-Pro |
| SWE-Bench Pro (harder agentic tasks) | Kimi K2.6 |
| Lowest cost per useful query | DeepSeek V4 (~$0.14 per million input tokens) |
| Smallest model that handles real coding (under 10B) | Yi-Coder 9B or Qwen3-Coder 7B |
| Multimodal coding (vision + code) | Qwen3-VL or Seed 2.0 |
| On-prem with Cambricon hardware | GLM-4.7 |
| Permissive licence (no MAU cap) | DeepSeek V4 (MIT) or Qwen3-Coder (Apache 2.0) |
| Long-context refactor across a monorepo | DeepSeek V4 (1M) or Kimi K2.6 (262K) |
Six Things the Comparison Tells You
- DeepSeek V4-Pro leads SWE-Bench Verified at 83.7%. It beats every open-weight competitor and closes in on Claude Opus 4.6.
- Kimi K2.6 leads SWE-Bench Pro at 58.6%. It is the first open-weight model to beat GPT-5.4 (xhigh), which scores 57.7%. The two benchmarks are different cuts; each model leads in its own lane.
- Qwen3-Coder rounds out the top three. Its Apache 2.0 licence is among the friendliest of the top open-weight coding models.
- Yi-Coder remains relevant at the small-model tier. At 9B parameters with 128K context, it is still useful for on-device and embedded code workflows.
- Chinese coding models displaced Llama-based fine-tunes on the open leaderboard. The 2024 era of "Code Llama" derivatives is over.
- Cost is the silent differentiator. DeepSeek V4 at ~$0.14 per million input tokens via open hosters undercuts every proprietary coding API by 5-20x.
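The cost claim in the last point can be unpacked with simple arithmetic. The sketch below assumes the ~$0.14 figure is per million input tokens and reads "5-20x" as a multiplier on that price, so the resulting range is what the claim implies, not a list of verified vendor prices. The monthly token volume is a made-up example value.

```python
DEEPSEEK_USD_PER_M_INPUT = 0.14   # approximate figure from the text above
MULTIPLIER_RANGE = (5, 20)        # the "5-20x" from the text above


def implied_price_range(base_usd_per_m, multipliers):
    """Price range (USD per million input tokens) implied by the multiplier claim."""
    low, high = multipliers
    return (round(base_usd_per_m * low, 2), round(base_usd_per_m * high, 2))


def monthly_input_cost(usd_per_m, tokens):
    """Input-token cost in USD for a given token volume."""
    return round(usd_per_m * tokens / 1_000_000, 2)


proprietary_range = implied_price_range(DEEPSEEK_USD_PER_M_INPUT, MULTIPLIER_RANGE)

# Hypothetical workload: 500M input tokens per month (illustrative only).
example_tokens = 500_000_000
deepseek_monthly = monthly_input_cost(DEEPSEEK_USD_PER_M_INPUT, example_tokens)
implied_monthly_range = tuple(
    monthly_input_cost(price, example_tokens) for price in proprietary_range
)
```

Under these assumptions, the same workload costs about $70 on DeepSeek V4 against an implied $350 to $1,400 on proprietary APIs. Output-token pricing is not covered by the figure in the text and is left out here.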
What This Means for AI Visibility
Chinese coding models increasingly handle the inference behind AI coding tools, particularly outside the US frontier-vendor ecosystem. Brand-visibility teams at developer tool vendors should test how their products appear inside Cursor, Cline, Continue, and Aider when those tools route to DeepSeek V4, Kimi K2.6, or Qwen3-Coder, because the answers can diverge meaningfully from Claude Code or GPT-5.4 outputs.
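One way to make this kind of test concrete is to send the same prompt through each backend and compare which brands show up in the answers. The sketch below covers only the comparison step. It assumes the responses have already been collected as plain strings, and the brand names, backend labels, and sample responses are all placeholders, not real model outputs.

```python
import re


def brand_mentions(text, brands):
    """Return the set of brands mentioned in text (case-insensitive, whole word)."""
    found = set()
    for brand in brands:
        pattern = r"\b" + re.escape(brand) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.add(brand)
    return found


def divergence(responses, brands):
    """Map each backend to its mentioned brands, plus brands not mentioned by all."""
    per_backend = {
        name: brand_mentions(text, brands) for name, text in responses.items()
    }
    all_sets = list(per_backend.values())
    common = set.intersection(*all_sets) if all_sets else set()
    union = set.union(*all_sets) if all_sets else set()
    return per_backend, union - common


# Placeholder data: hypothetical brands and hand-written stand-in responses.
brands = ["AcmeLint", "FooCI"]
responses = {
    "backend-a": "Try AcmeLint or FooCI for this.",
    "backend-b": "FooCI is a common choice.",
}
per_backend, contested = divergence(responses, brands)
```

Here `contested` holds the brands that at least one backend mentions and at least one omits, which is the signal a visibility team would want to track over repeated runs.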
Methodology
Benchmark figures combine four sources: Spheron's DeepSeek vs Llama 4 vs Qwen 3 production comparison, the AkitaOnRails LLM coding benchmark (May 2026), BenchLM's best Chinese LLMs 2026, and Latent Space's coverage of Kimi K2.6 on SWE-Bench Pro.
How Presenc AI Helps
Presenc AI tracks how dev-tool brands appear inside AI coding workflows backed by Chinese coding models. As DeepSeek V4 and Kimi K2.6 absorb open-weight coding share, brand teams need monitoring across these surfaces alongside Claude Code and Copilot defaults.