
Open-Weight Long-Context Models 2026

Open-weight long-context LLMs 2026: Qwen3 1M context, Llama 4 Scout 10M, Jamba 256k, MiniMax M1 1M. Long-context benchmarks, recall accuracy, deployment cost, hardware requirements.

By Ramanath, CTO & Co-Founder at Presenc AI · Last updated: May 2026

By 2026, open-weight long-context LLMs have reached the million-plus-token regime. Llama 4 Scout ships with an advertised 10 million token context window. Qwen3 long-context variants and MiniMax M1 (open weight) each reach 1 million tokens. Jamba 1.5 Large remains a strong production choice at 256K. Retrieval and reasoning quality across the full window, rather than the headline window size, remains the deciding evaluation metric. This page consolidates the long-context model landscape as of May 2026.

Key Findings

  1. Llama 4 Scout (109B MoE / 17B active) ships with an advertised 10 million token context window, the largest in the open-weight ecosystem as of May 2026. Its effective context on RULER is far shorter, at roughly 256K tokens.
  2. MiniMax M1 (456B MoE / ~45B active), released open-weight in 2025, reaches 1 million tokens with strong recall and reasoning at long range.
  3. Qwen3 long-context variants and Qwen2.5 1M Context both reach 1 million tokens, with long-range quality degradation comparable to that of frontier closed alternatives.
  4. Jamba 1.5 Large (Transformer + Mamba MoE hybrid) reaches 256K tokens with a much smaller KV cache than a pure Transformer, since only its attention layers keep one, making it the most-deployed long-context model in production.
  5. Long-context quality is uneven: most models that claim 1M+ token contexts show measurable degradation in retrieval and reasoning once inputs exceed roughly 128K to 256K tokens.

Open-Weight Long-Context Models (May 2026)

Model | Context Window | Architecture | License
Llama 4 Scout | 10M tokens | MoE Transformer (109B/17B active) | Llama 4 Community
Llama 4 Maverick | 1M tokens | MoE Transformer (400B/17B active) | Llama 4 Community
Qwen2.5 1M Context | 1M tokens | Transformer | Apache 2.0 / Tongyi
Qwen3 long-context variants | 1M tokens | Transformer | Apache 2.0 / Tongyi
MiniMax M1 | 1M tokens | Lightning Attention + MoE (456B/~45B) | Apache 2.0
Jamba 1.5 Large | 256K tokens | Transformer + Mamba MoE hybrid | Jamba Community Licence
Jamba 1.5 Mini | 256K tokens | Transformer + Mamba MoE hybrid | Jamba Community Licence
Mistral Large 3 | 256K tokens | Transformer | Mistral Research / Commercial
GLM-4-9B-1M / GLM-4.5-9B-1M | 1M tokens | Transformer | MIT / GLM Licence
InternLM-2.5-7B-1M | 1M tokens | Transformer | Apache 2.0
Yi-200K family | 200K tokens | Transformer | Apache 2.0
DeepSeek V3 / V4 | 128K tokens | MoE Transformer | MIT
Mamba 2 (Hybrid) | 1M+ tokens | State-space + attention hybrid | Apache 2.0

Long-Context Quality (RULER Benchmark at Effective Context)

Model | Effective Context (RULER score above 85)
Llama 4 Scout | ~256K tokens (despite 10M claim)
Llama 4 Maverick | ~512K tokens
Qwen2.5 1M Context | ~128K tokens
MiniMax M1 | ~256K tokens
Jamba 1.5 Large | ~140K tokens (within 256K claim)
Mistral Large 3 | ~128K tokens
Mamba 2 Hybrid | ~256K tokens
Gemini 2.5 Pro (reference closed) | ~1M+ tokens (~95 RULER)
GPT-5.5 (reference closed) | ~256K tokens
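
As a rough illustration of how an "effective context" figure like those above can be derived, the sketch below takes per-length benchmark scores and returns the longest evaluated length at which the score, and every score at shorter lengths, stays at or above a threshold. This is one plausible reading of "RULER score above 85", not necessarily the exact procedure behind the table. The function name `effective_context`, the treatment of a score exactly at the threshold as passing, and the sample scores are all hypothetical and are not measured results.

```python
from typing import Mapping, Optional


def effective_context(
    scores_by_length: Mapping[int, float], threshold: float = 85.0
) -> Optional[int]:
    """Longest evaluated length whose score, and all shorter ones, meet the threshold.

    Returns None if even the shortest evaluated length falls below the threshold.
    Assumption: a score exactly equal to the threshold counts as passing.
    """
    effective = None
    for length in sorted(scores_by_length):
        if scores_by_length[length] < threshold:
            break  # quality has dropped; longer lengths no longer count
        effective = length
    return effective


# Hypothetical scores for an imaginary model with a 1M-token advertised window.
example_scores = {
    4_096: 96.0,
    32_768: 93.5,
    131_072: 88.0,
    262_144: 85.2,
    524_288: 79.0,
    1_048_576: 70.1,
}

print(effective_context(example_scores))  # 262144
```

In this made-up case the model accepts 1M tokens but would be reported with an effective context of 256K, which is the kind of gap between headline and effective context that the table describes.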

Hardware Requirements for Long-Context Inference

Context Length | VRAM Requirement (FP16, Llama 3.1 70B-class)
32K | ~150 GB (KV cache ~10 GB)
128K | ~180 GB (KV cache ~40 GB)
256K | ~220 GB (KV cache ~80 GB)
512K | ~300 GB (KV cache ~160 GB)
1M | ~460 GB (KV cache ~320 GB)
10M (Llama 4 Scout) | Multi-node required

Hybrid attention (Jamba) and state-space (Mamba 2) architectures keep a KV cache only for their attention layers, so the cache grows far more slowly with context than in a pure Transformer, materially reducing long-context VRAM requirements.
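
The KV cache figures in the table above can be approximated with a short back-of-the-envelope calculation. The sketch below assumes a Llama 3.1 70B-style configuration (80 layers, 8 key-value heads under grouped-query attention, head dimension 128), a single sequence, and roughly 140 GB of FP16 weights. These values and the helper name `kv_cache_bytes` are assumptions made for illustration. Treating "1M" as 2^20 tokens and reporting sizes in GiB appears to reproduce the table's KV cache column.

```python
GIB = 1024 ** 3
WEIGHTS_GB = 140  # assumption: ~70B parameters at 2 bytes each (FP16)


def kv_cache_bytes(
    tokens: int,
    n_layers: int = 80,            # assumed Llama 3.1 70B-style depth
    n_kv_heads: int = 8,           # assumed grouped-query attention KV heads
    head_dim: int = 128,           # assumed per-head dimension
    bytes_per_value: float = 2.0,  # 2.0 = FP16, 1.0 = FP8, 0.5 = INT4
) -> float:
    """KV cache size in bytes for one sequence of the given length."""
    # Factor of 2: one key vector and one value vector per layer, head, and token.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * tokens


for label, tokens in [
    ("32K", 32 * 1024),
    ("128K", 128 * 1024),
    ("256K", 256 * 1024),
    ("512K", 512 * 1024),
    ("1M", 1024 * 1024),
]:
    kv_gib = kv_cache_bytes(tokens) / GIB
    print(f"{label:>4}: KV cache ~{kv_gib:.0f} GiB, total ~{WEIGHTS_GB + kv_gib:.0f} GB")

# Effect of KV cache quantization at 1M tokens.
for name, width in [("FP16", 2.0), ("FP8", 1.0), ("INT4", 0.5)]:
    print(f"{name}: ~{kv_cache_bytes(1024 * 1024, bytes_per_value=width) / GIB:.0f} GiB")
```

Under these assumptions the cache costs 320 KiB per token, which gives 10, 40, 80, 160, and 320 GiB at the five lengths, and 160 GiB or 80 GiB at 1M tokens with an FP8 or INT4 cache. A hybrid model would use a smaller `n_layers` value here, counting only its attention layers.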

Brand Visibility Implications

Long-context AI is a fast-growing procurement category. AI assistant queries about "1 million token LLM", "long-context AI", "open-source long context", and similar terms feed directly into production decisions for codebase analysis, long-document understanding, and agentic workloads. Brands selling AI infrastructure, long-document processing, and agentic platforms are therefore increasingly discovered through AI assistants in this category.

Methodology

Benchmark data is compiled from RULER long-context evaluations, primary model card disclosures, and long-context benchmark publications through 23 May 2026. The page is updated quarterly.

How Presenc AI Helps

Presenc AI monitors brand visibility on long-context AI queries across ChatGPT, Claude, Gemini, and Perplexity. For AI infrastructure brands, long-document processing vendors, and agentic platform firms, the platform identifies the prompts driving procurement-research traffic and the gaps where new content unlocks share of voice.

Frequently Asked Questions

Which open-weight model has the longest context window?
Llama 4 Scout, with a claimed 10 million token context window. Its effective high-quality context on the RULER benchmark is closer to 256K. Llama 4 Maverick and MiniMax M1 (both 1M claims) are alternatives in the 1M class, with effective context of around 256K to 512K.

How should advertised context windows be interpreted?
Cautiously. Headline context windows reflect maximum input length, not quality across that window. RULER and Needle-in-a-Haystack benchmarks measure the effective context within which retrieval and reasoning still work. Most models with 1M+ claims show measurable quality degradation beyond approximately 128K to 256K tokens.

How much VRAM does 1M-token inference require?
For Llama 3.1 70B-class models at 1M tokens in FP16, approximately 460 GB of VRAM in total (the KV cache alone is approximately 320 GB). Hybrid architectures (Jamba, Mamba 2) keep a KV cache only for their attention layers, which reduces the requirement substantially. Quantizing the KV cache to FP8 or INT4 can reduce it by a further 2-4x.

When is long context a better choice than RAG?
Long context is better when retrieval boundaries are unknown, for code refactoring across an entire repo, or for long-document synthesis where chunking loses context. RAG is more cost-effective when retrieval can target specific document sections. The 2026 pattern increasingly mixes both: RAG for first-pass retrieval, long context for downstream reasoning over the retrieved set.

Which hybrid architectures are used for long context?
Jamba 1.5 Large (Transformer + Mamba MoE hybrid) is the most-deployed production long-context model in the 256K class, with a KV cache that grows far more slowly than a pure Transformer's. Mamba 2 hybrids reach 1M+ tokens with similar memory advantages. The hybrid approach is a common architectural choice for models designed specifically around long context.
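
The mixed pattern mentioned above, with RAG as a first pass and long context for the downstream reasoning, can be sketched as follows. This is a toy illustration under stated assumptions, not a production retriever: the `Chunk` type, the keyword-overlap scoring, the whitespace-based token estimate, and the function names are all hypothetical stand-ins for a real embedding retriever and tokenizer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    doc_id: str
    text: str


def rough_token_count(text: str) -> int:
    # Crude stand-in for a real tokenizer: counts whitespace-separated words.
    return len(text.split())


def overlap_score(query: str, chunk: Chunk) -> int:
    # Toy relevance score: how many words in the chunk also appear in the query.
    query_terms = set(query.lower().split())
    return sum(1 for word in chunk.text.lower().split() if word in query_terms)


def build_context(query: str, chunks: List[Chunk], token_budget: int) -> List[Chunk]:
    """First-pass retrieval: rank chunks, then pack the best ones into the budget."""
    scored = [(overlap_score(query, chunk), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    selected: List[Chunk] = []
    used = 0
    for score, chunk in scored:
        if score == 0:
            break  # everything after this point is unrelated to the query
        cost = rough_token_count(chunk.text)
        if used + cost > token_budget:
            continue  # too large for the remaining budget; try smaller chunks
        selected.append(chunk)
        used += cost
    return selected


chunks = [
    Chunk("a", "kv cache grows with context length"),
    Chunk("b", "licensing terms for model weights"),
    Chunk("c", "context length and kv cache quantization options"),
]
query = "kv cache context length"
print([chunk.doc_id for chunk in build_context(query, chunks, token_budget=20)])  # ['a', 'c']
```

The selected chunks would then be concatenated into a single long-context prompt. Setting `token_budget` near the model's effective context rather than its advertised window keeps the reasoning step inside the range where quality holds up.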
