Long-context open-weight LLMs had reached the 1-million-plus token regime by 2026. Llama 4 Scout shipped with an advertised 10 million token context window. Qwen3 long-context variants and MiniMax M1 (open weight) each reach 1 million tokens. Jamba 1.5 Large remains a strong production choice at 256K. Retrieval and reasoning quality across the full window, rather than the advertised window size, remains the binding evaluation metric. This page consolidates the long-context model landscape.
Key Findings
- Llama 4 Scout (109B MoE / 17B active) ships with an advertised 10 million token context window, the largest in the open-weight ecosystem as of May 2026.
- MiniMax M1 (456B MoE / ~45B active), released open-weight in 2025, reaches 1 million tokens with strong recall and reasoning at long range.
- Qwen3 long-context variants and Qwen2.5 1M Context both reach 1 million tokens, with long-range quality degradation comparable to that of frontier closed alternatives.
- Jamba 1.5 Large (Transformer + Mamba MoE hybrid) achieves 256K tokens with a much smaller KV cache than a pure Transformer, because only its attention layers cache keys and values, making it the most-deployed long-context model in production.
- Long-context quality is uneven: most models that claim 1M+ token contexts show measurable degradation in retrieval and reasoning quality beyond 128K to 256K tokens.
Open-Weight Long-Context Models (May 2026)
| Model | Context Window | Architecture | License |
|---|---|---|---|
| Llama 4 Scout | 10M tokens | MoE Transformer (109B/17B active) | Llama 4 Community |
| Llama 4 Maverick | 1M tokens | MoE Transformer (400B/17B active) | Llama 4 Community |
| Qwen2.5 1M Context | 1M tokens | Transformer | Apache 2.0 / Tongyi |
| Qwen3 long-context variants | 1M tokens | Transformer | Apache 2.0 / Tongyi |
| MiniMax M1 | 1M tokens | Lightning Attention + MoE (456B/~45B) | Apache 2.0 |
| Jamba 1.5 Large | 256K tokens | Transformer + Mamba MoE hybrid | Jamba Community License |
| Jamba 1.5 Mini | 256K tokens | Transformer + Mamba MoE hybrid | Jamba Community License |
| Mistral Large 3 | 256K tokens | Transformer | Mistral Research / Commercial |
| GLM-4-9B-1M / GLM-4.5-9B-1M | 1M tokens | Transformer | MIT / GLM License |
| InternLM-2.5-7B-1M | 1M tokens | Transformer | Apache 2.0 |
| Yi-200K family | 200K tokens | Transformer | Apache 2.0 |
| DeepSeek V3 / V4 | 128K tokens | MoE Transformer | MIT |
| Mamba 2 (Hybrid) | 1M+ tokens | State-space + attention hybrid | Apache 2.0 |
Long-Context Quality (RULER Benchmark at Effective Context)
| Model | Effective Context (RULER score above 85) |
|---|---|
| Llama 4 Scout | ~256K tokens (despite 10M claim) |
| Llama 4 Maverick | ~512K tokens |
| Qwen2.5 1M Context | ~128K tokens |
| MiniMax M1 | ~256K tokens |
| Jamba 1.5 Large | ~140K tokens (within 256K claim) |
| Mistral Large 3 | ~128K tokens |
| Mamba 2 Hybrid | ~256K tokens |
| Gemini 2.5 Pro (reference closed) | ~1M+ tokens (~95 RULER) |
| GPT-5.5 (reference closed) | ~256K tokens |
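The "effective context" column can be read as the longest tested length at which a model's RULER score stays above 85. The sketch below is one way to apply that rule, under stated assumptions: the function names `effective_context` and `usable_fraction` are hypothetical, the score curve is invented purely for illustration and is not a measurement of any model, and the advertised and effective figures are copied from the tables on this page with 1M rounded to 1,000K.

```python
from typing import Dict


def effective_context(scores: Dict[int, float], threshold: float = 85.0) -> int:
    """Largest tested length such that it and every shorter tested length
    score at or above the threshold. Returns 0 if even the shortest fails."""
    best = 0
    for length in sorted(scores):
        if scores[length] < threshold:
            break  # quality has dropped; longer lengths no longer count
        best = length
    return best


def usable_fraction(advertised_k: float, effective_k: float) -> float:
    """Share of the advertised window that is usable at the threshold."""
    return effective_k / advertised_k


# HYPOTHETICAL score curve (context length in tokens -> RULER score).
# Illustrative values only, not measurements of any model.
hypothetical_curve = {
    4_096: 96.0,
    32_768: 94.5,
    131_072: 90.1,
    262_144: 86.2,
    524_288: 78.4,
    1_048_576: 61.0,
}

# (advertised, effective) in thousands of tokens, copied from this page's
# tables. Assumption: 1M is treated as 1,000K and 10M as 10,000K.
page_figures_k = {
    "Llama 4 Scout": (10_000, 256),
    "Llama 4 Maverick": (1_000, 512),
    "Qwen2.5 1M Context": (1_000, 128),
    "MiniMax M1": (1_000, 256),
    "Jamba 1.5 Large": (256, 140),
}

if __name__ == "__main__":
    print("effective context (tokens):", effective_context(hypothetical_curve))
    for model, (advertised, effective) in page_figures_k.items():
        share = usable_fraction(advertised, effective)
        print(f"{model:<20} {share:6.1%} of advertised window")
```

Requiring every shorter length to pass as well is a deliberately conservative choice: a model that recovers at a longer length after dipping below the threshold is still credited only up to the dip.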
Hardware Requirements for Long-Context Inference
| Context Length | Total VRAM (FP16 weights + KV cache, Llama 3.1 70B-class) |
|---|---|
| 32K | ~150 GB (KV cache ~10 GB) |
| 128K | ~180 GB (KV cache ~40 GB) |
| 256K | ~220 GB (KV cache ~80 GB) |
| 512K | ~300 GB (KV cache ~160 GB) |
| 1M | ~460 GB (KV cache ~320 GB) |
| 10M (Llama 4 Scout) | Multi-node required |
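Hybrid attention (Jamba) and state-space (Mamba 2) architectures keep a KV cache only for their attention layers, while state-space layers carry a fixed-size state. The cache therefore grows far more slowly with context length than in a pure Transformer, materially reducing long-context VRAM requirements.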
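The KV cache figures in the table above follow from simple arithmetic: two tensors (keys and values) per attention layer, per KV head, per token. The sketch below reproduces them using the published Llama 3.1 70B shape (80 layers, 8 KV heads under grouped-query attention, head dimension 128) at 2 bytes per value. The `HYBRID` configuration is a hypothetical stand-in, assuming only 1 in 8 layers is an attention layer, and does not describe any specific released model. The helper name `kv_cache_bytes` is also an assumption of this sketch, and the estimate ignores activations and runtime overhead.

```python
GIB = 1024 ** 3


def kv_cache_bytes(tokens: int, n_attn_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """KV cache size for one sequence: keys and values (factor of 2)
    stored for every attention layer, KV head, and token."""
    return 2 * n_attn_layers * n_kv_heads * head_dim * bytes_per_value * tokens


# Llama 3.1 70B shape: 80 layers, 8 KV heads (GQA), head dimension 128.
DENSE = dict(n_attn_layers=80, n_kv_heads=8, head_dim=128)

# HYPOTHETICAL hybrid: same shape, but only 10 of 80 layers use attention.
# The remaining layers are assumed to hold a fixed-size state (not modeled).
HYBRID = dict(n_attn_layers=10, n_kv_heads=8, head_dim=128)

# FP16 weights for a 70B-parameter model: 70e9 params * 2 bytes = 140 GB.
WEIGHTS_GB = 70e9 * 2 / 1e9

if __name__ == "__main__":
    for label, tokens in [("32K", 32 * 1024), ("128K", 128 * 1024),
                          ("256K", 256 * 1024), ("512K", 512 * 1024),
                          ("1M", 1024 * 1024)]:
        dense_gib = kv_cache_bytes(tokens, **DENSE) / GIB
        hybrid_gib = kv_cache_bytes(tokens, **HYBRID) / GIB
        print(f"{label:>5}: dense KV {dense_gib:6.1f} GiB, "
              f"hybrid KV {hybrid_gib:5.1f} GiB, "
              f"dense total ~{WEIGHTS_GB + dense_gib:.0f} GB")
```

In this simplified model the hybrid cache still scales with token count, but at one eighth of the dense rate, which is where the VRAM saving comes from. The sketch treats GiB and GB as interchangeable for the rough totals, in line with the approximate values in the table.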
Brand Visibility Implications
Long-context AI is a fast-growing procurement category. Queries to AI assistants about "1 million token LLM", "long-context AI", "open-source long context", and similar terms feed directly into production decisions for codebase analysis, long-document understanding, and agentic workloads. Brands selling AI infrastructure, long-document processing, and agentic platforms therefore have a large AI-mediated discovery surface in this category.
Methodology
Benchmark data are compiled from RULER long-context evaluations, primary model card disclosures, and long-context-specific benchmark publications through 23 May 2026. Updated quarterly.
How Presenc AI Helps
Presenc AI monitors brand visibility on long-context AI queries across ChatGPT, Claude, Gemini, and Perplexity. For AI infrastructure brands, long-document processing vendors, and agentic platform firms, the platform identifies the prompts driving procurement-research traffic and the gaps where new content unlocks share of voice.