Embeddings are the retrieval layer of most retrieval-augmented generation (RAG) systems and one of the most widely deployed AI primitives in 2026. The open-weight embedding model ecosystem matured rapidly in 2025-2026, with Qwen3-Embedding, BGE-M3, E5-Mistral, Nomic Embed, and Stella all approaching or exceeding the leading closed-source embedding APIs on the MTEB benchmark. This page consolidates the open-weight leaderboard, multilingual coverage by model, and practical deployment considerations.
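To make the retrieval role concrete, here is a minimal sketch of the lookup an embedding index performs: rank stored document vectors by cosine similarity to a query vector. The three-dimensional vectors and document IDs are made-up placeholders; a real pipeline would get vectors from one of the models below and would typically use a vector database rather than a linear scan.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 2):
    """Return (doc_id, score) pairs for the k documents most similar to the query."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings"; real models emit hundreds to thousands of dimensions.
docs = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "warranty-terms": [0.7, 0.2, 0.3],
}
query = [0.8, 0.15, 0.1]
print(top_k(query, docs, k=2))
```

The top-k documents would then be passed to the generator as context; swapping the embedding model changes the vectors, not this lookup.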
Key Findings
- Qwen3-Embedding (released in June 2025) sits at the top of the MTEB v2 leaderboard among open-weight models, with the 8B variant averaging a score of approximately 75 across MTEB tasks.
- BGE-M3 from BAAI remains the most-downloaded open-weight embedding model on Hugging Face (cumulative downloads in the tens of millions), valued for supporting dense, sparse, and multi-vector retrieval in a single model.
- The most-used embedding model in production RAG deployments per the LangChain and LlamaIndex telemetry reports is BGE-M3, followed by Qwen3-Embedding, OpenAI text-embedding-3-large, and Voyage AI voyage-3.
- Multilingual coverage is now a baseline expectation: Qwen3-Embedding, BGE-M3, Multilingual-E5, and jina-embeddings-v3 each support roughly 90 or more languages at competitive quality.
- License terms vary widely: Apache 2.0 (Nomic Embed v2, GTE-Qwen2, mxbai-embed-large), MIT (BGE-M3, Stella, E5-Mistral), a custom commercial license (Qwen3-Embedding), and CC-BY-NC (NV-Embed-v2, SFR-Embedding-Mistral, and some Linq-Embed variants) together cover most production deployments.
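The BGE-M3 finding above mentions dense, sparse, and multi-vector retrieval in one model. A minimal sketch of how those three relevance signals can be fused into one ranking score, assuming precomputed toy vectors and term weights: a dense cosine score, a lexical score summed over shared terms, and a ColBERT-style MaxSim score, combined by a weighted sum. This is not the FlagEmbedding API, and the default weights are an illustrative assumption to tune per corpus.

```python
import math


def dense_score(q: list[float], d: list[float]) -> float:
    """Cosine similarity between single pooled query and document vectors."""
    dot = sum(x * y for x, y in zip(q, d))
    norm = math.sqrt(sum(x * x for x in q)) * math.sqrt(sum(y * y for y in d))
    return dot / norm if norm else 0.0


def sparse_score(q_weights: dict[str, float], d_weights: dict[str, float]) -> float:
    """Lexical match: sum of weight products over terms present in both texts."""
    return sum(w * d_weights[t] for t, w in q_weights.items() if t in d_weights)


def multi_vector_score(q_vecs: list[list[float]], d_vecs: list[list[float]]) -> float:
    """Late interaction (MaxSim): best dot product per query token vector, averaged."""
    best = [max(sum(a * b for a, b in zip(qv, dv)) for dv in d_vecs) for qv in q_vecs]
    return sum(best) / len(best)


def hybrid_score(dense: float, sparse: float, multi: float,
                 weights: tuple[float, float, float] = (0.4, 0.2, 0.4)) -> float:
    """Weighted sum of the three signals (weights are an assumed starting point)."""
    w_dense, w_sparse, w_multi = weights
    return w_dense * dense + w_sparse * sparse + w_multi * multi


# Hypothetical toy inputs for one query/document pair.
score = hybrid_score(
    dense_score([1.0, 0.0], [0.6, 0.8]),
    sparse_score({"refund": 0.5, "policy": 0.3}, {"refund": 0.4, "days": 0.2}),
    multi_vector_score([[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.2, 0.7]]),
)
print(round(score, 3))
```

Keeping the three scores separate until the final weighted sum makes it easy to drop one signal (for example, skipping multi-vector scoring to save index space) without changing the rest of the pipeline.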
MTEB v2 Open-Weight Leaderboard (May 2026)
| Model | Parameters | Avg MTEB | License |
|---|---|---|---|
| Qwen3-Embedding-8B | ~8B | ~75.1 | Tongyi Qianwen (custom commercial) |
| Qwen3-Embedding-4B | ~4B | ~72.4 | Tongyi Qianwen |
| Qwen3-Embedding-0.6B | ~0.6B | ~67.8 | Tongyi Qianwen |
| NV-Embed-v2 | ~7B | ~72.3 | CC-BY-NC |
| BGE-M3 | ~0.6B | ~68.2 | MIT |
| BGE-Large-EN-v1.5 | ~0.3B | ~64.2 | MIT |
| E5-Mistral-7B-Instruct | ~7B | ~66.6 | MIT |
| GTE-Qwen2-7B-Instruct | ~7B | ~70.2 | Apache 2.0 |
| Stella-en-1.5B-v5 | ~1.5B | ~69.4 | MIT |
| Nomic Embed Text v2 | ~0.5B | ~63.8 | Apache 2.0 |
| Linq-Embed-Mistral | ~7B | ~68.2 | CC-BY-NC |
| SFR-Embedding-Mistral | ~7B | ~67.6 | CC-BY-NC |
| jina-embeddings-v3 | ~0.6B | ~65.5 | CC-BY-NC + Commercial |
| mxbai-embed-large-v1 | ~0.3B | ~64.7 | Apache 2.0 |
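One practical use of the leaderboard is shortlisting models whose licenses allow commercial use. The sketch below transcribes the table rows and keeps only MIT and Apache 2.0 entries, sorted by score; treating just those two license strings as permissive is an assumption, and each model card should be checked before deployment.

```python
# Rows transcribed from the leaderboard table above: (model, approx. avg MTEB, license).
LEADERBOARD = [
    ("Qwen3-Embedding-8B", 75.1, "Tongyi Qianwen"),
    ("Qwen3-Embedding-4B", 72.4, "Tongyi Qianwen"),
    ("Qwen3-Embedding-0.6B", 67.8, "Tongyi Qianwen"),
    ("NV-Embed-v2", 72.3, "CC-BY-NC"),
    ("BGE-M3", 68.2, "MIT"),
    ("BGE-Large-EN-v1.5", 64.2, "MIT"),
    ("E5-Mistral-7B-Instruct", 66.6, "MIT"),
    ("GTE-Qwen2-7B-Instruct", 70.2, "Apache 2.0"),
    ("Stella-en-1.5B-v5", 69.4, "MIT"),
    ("Nomic Embed Text v2", 63.8, "Apache 2.0"),
    ("Linq-Embed-Mistral", 68.2, "CC-BY-NC"),
    ("SFR-Embedding-Mistral", 67.6, "CC-BY-NC"),
    ("jina-embeddings-v3", 65.5, "CC-BY-NC + Commercial"),
    ("mxbai-embed-large-v1", 64.7, "Apache 2.0"),
]

# Assumption: only these license strings count as permissive for commercial use.
PERMISSIVE = {"MIT", "Apache 2.0"}


def permissive_ranking(rows, permissive=PERMISSIVE):
    """Return rows whose license is in `permissive`, highest score first."""
    kept = [row for row in rows if row[2] in permissive]
    return sorted(kept, key=lambda row: row[1], reverse=True)


for model, score, license_name in permissive_ranking(LEADERBOARD):
    print(f"{model:<24} {score:>5.1f}  {license_name}")
```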
Multilingual Coverage
| Model | Languages Supported | Notes |
|---|---|---|
| BGE-M3 | ~100 | Dense + sparse + multi-vector in one model |
| Qwen3-Embedding | ~119 | Strong on Chinese and other Asian languages |
| Multilingual-E5-Large | ~94 | Wide language base |
| jina-embeddings-v3 | ~89 | Long-context multilingual |
| mxbai-embed-large-v1 | English only | Strong English |
| NV-Embed-v2 | English-focused | Strong English MTEB; CC-BY-NC license |
| Stella-en-1.5B-v5 | English only | Compact, strong English |
Deployment Profile
| Use Case | Recommended Model | Reason |
|---|---|---|
| General-purpose multilingual RAG | BGE-M3 or Qwen3-Embedding-8B | Strong multilingual; commercial licensing options |
| English-only high-quality retrieval | NV-Embed-v2 or Stella-en-1.5B-v5 | Top English MTEB scores among English-focused models (NV-Embed-v2 is CC-BY-NC) |
| Resource-constrained edge / on-device | BGE-M3 (~0.6B) or Qwen3-Embedding-0.6B | Sub-billion parameters with strong quality |
| Long-context document chunks | jina-embeddings-v3 or BGE-M3 | Both support 8,192-token context windows |
| Code embeddings | GTE-Qwen2-7B-Instruct, jina-embeddings-v3 | Strong on code retrieval benchmarks |
| Permissive commercial deployment | BGE-M3, mxbai-embed-large-v1 | MIT and Apache 2.0 licenses, respectively |
| Maximum-quality unconstrained | Qwen3-Embedding-8B | Highest open-weight MTEB score |
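Even with 8K-token context models, documents longer than the window still need chunking. A minimal sliding-window sketch, assuming the text is already tokenized; `chunk_tokens` is a hypothetical helper, and splitting on whitespace in the usage lines is only a stand-in for the model's own tokenizer, which usually yields more tokens than words.

```python
def chunk_tokens(tokens: list, max_tokens: int = 8192, overlap: int = 256) -> list[list]:
    """Split tokens into windows of at most max_tokens; consecutive windows share
    `overlap` tokens so text cut at a boundary still appears whole in one chunk."""
    if overlap >= max_tokens:
        raise ValueError("overlap must be smaller than max_tokens")
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


# Hypothetical usage: whitespace "tokens" undercount subword tokens,
# so leave headroom below the 8,192-token window.
document = "embedding " * 20_000  # placeholder text
chunks = chunk_tokens(document.split(), max_tokens=6_000, overlap=200)
print(len(chunks), [len(c) for c in chunks])
```

The overlap trades a little extra embedding cost for fewer sentences split across chunk boundaries; in practice the counts should come from the embedding model's tokenizer rather than `str.split`.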
Open vs Closed Embedding API Comparison
| Model | Avg MTEB | Cost (USD per million tokens) |
|---|---|---|
| OpenAI text-embedding-3-large | ~64.6 | $0.13 |
| OpenAI text-embedding-3-small | ~62.3 | $0.02 |
| Voyage AI voyage-3 | ~74.0 | $0.06 |
| Cohere embed-v3-english | ~64.5 | $0.10 |
| Google Gemini text-embedding-005 | ~67.7 | $0.025 |
| Qwen3-Embedding-8B (self-hosted) | ~75.1 | ~$0.005 effective |
| BGE-M3 (self-hosted) | ~68.2 | ~$0.002 effective |
Strategic Context
Three patterns define the 2026 embedding landscape. First, the open-weight quality gap has effectively closed: on the MTEB figures above, Qwen3-Embedding-8B (~75.1) edges past the strongest closed API listed, Voyage AI voyage-3 (~74.0), while NV-Embed-v2 and GTE-Qwen2-7B outscore the OpenAI, Cohere, and Google entries. Second, the economics strongly favour self-hosting at scale: a production RAG workload embedding about 3.3B tokens per day (roughly 100B tokens per month) pays approximately $13,000 per month on OpenAI text-embedding-3-large at $0.13 per million tokens, versus roughly $200 per month on self-hosted BGE-M3 at the ~$0.002 effective rate, with GPU cost amortised across an L40S or H100. At 100M tokens per day the same API bill is only about $390 per month, so the savings depend on sustained volume that keeps the GPU busy. Third, multilingual coverage is no longer a differentiator: BGE-M3, Qwen3-Embedding, jina-v3, and Multilingual-E5 all cover roughly 90 to 120 languages at competitive quality, so vendor selection in 2026 turns on operational fit (latency, batch throughput, license) more than language coverage.
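A minimal sketch of the monthly-cost arithmetic, using the per-million-token rates from the comparison table; the 30-day month and the two daily volumes in the loop are illustrative assumptions.

```python
def monthly_cost_usd(tokens_per_day: int, usd_per_million_tokens: float, days: int = 30) -> float:
    """Monthly embedding bill for a steady daily token volume at a per-million-token rate."""
    return tokens_per_day * days / 1_000_000 * usd_per_million_tokens


# Rates from the comparison table above (USD per million tokens).
OPENAI_3_LARGE = 0.13
BGE_M3_SELF_HOSTED = 0.002  # "effective" rate; only holds if the GPU stays well utilised

for tokens_per_day in (100_000_000, 3_300_000_000):
    api = monthly_cost_usd(tokens_per_day, OPENAI_3_LARGE)
    self_hosted = monthly_cost_usd(tokens_per_day, BGE_M3_SELF_HOSTED)
    print(f"{tokens_per_day:>13,} tokens/day: API ${api:,.0f}/month vs self-hosted ${self_hosted:,.0f}/month")
```

Because the self-hosted rate is an amortised figure, the comparison is only meaningful at volumes high enough to keep a dedicated GPU busy; at low volume the fixed GPU cost dominates.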
Brand Visibility Implications
Embedding model selection is one of the highest-traffic procurement-research categories in AI engineering. AI assistants increasingly handle queries such as "best embedding model for RAG", "BGE vs OpenAI embedding", "multilingual embeddings 2026", and similar long-tail terms that drive direct production decisions. Brands selling RAG infrastructure, vector databases, embedding fine-tuning services, and reranker stacks therefore have a large AI-mediated discovery surface in this category.
Methodology
Benchmark scores are compiled from the MTEB leaderboard, the MMTEB multilingual leaderboard, and primary model card disclosures through 23 May 2026. Closed-API costs are providers' list prices; self-hosted figures amortise GPU cost across realistic throughput assumptions. Updated quarterly.
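The self-hosted "effective" rates amortise GPU cost across throughput. One way to express that, as an assumed formula rather than the exact method behind the table: divide the GPU's hourly price by the millions of tokens it embeds per paid hour, scaled by utilisation. The GPU price and throughput values used here are hypothetical placeholders, not measurements.

```python
def effective_usd_per_million_tokens(gpu_usd_per_hour: float,
                                     tokens_per_second: float,
                                     utilisation: float = 1.0) -> float:
    """Amortised self-hosting cost per million embedded tokens.

    utilisation is the fraction of paid GPU hours spent doing useful embedding work;
    idle time raises the effective rate proportionally.
    """
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    millions_of_tokens_per_hour = tokens_per_second * 3600 * utilisation / 1_000_000
    return gpu_usd_per_hour / millions_of_tokens_per_hour


# Hypothetical inputs: a GPU rented at $1.80/hour sustaining 100,000 tokens/s.
print(effective_usd_per_million_tokens(1.80, 100_000))
print(effective_usd_per_million_tokens(1.80, 100_000, utilisation=0.5))
```

Halving utilisation doubles the effective rate, which is why self-hosted figures depend so heavily on the throughput assumptions.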
How Presenc AI Helps
Presenc AI monitors brand visibility on embedding model queries across ChatGPT, Claude, Gemini, and Perplexity. For RAG infrastructure brands, vector database vendors, and embedding fine-tuning service firms, the platform identifies the prompts driving procurement-research traffic and the gaps where new content unlocks share of voice.