Best Open-Weight Embedding Models 2026

Open-weight embedding model leaderboard 2026: Qwen3-Embedding, BGE-M3, E5-Mistral, Nomic, Stella, GTE-Qwen2, Linq-Embed. MTEB benchmarks, multilingual coverage, license analysis.

By Ramanath, CTO & Co-Founder at Presenc AI · Last updated: May 2026

Embeddings are the foundation of every retrieval-augmented generation (RAG) system and the most-deployed AI primitive in 2026. The open-weight embedding model ecosystem matured rapidly in 2025-2026 with Qwen3-Embedding, BGE-M3, E5-Mistral, Nomic Embed, and Stella all approaching or exceeding the leading closed-source embedding APIs on the MTEB benchmark. This page consolidates the open-weight leaderboard, the multilingual coverage by model, and the practical deployment considerations.

Key Findings

  1. Qwen3-Embedding (released in mid-2025) sits at the top of the MTEB v2 leaderboard among open-weight models, with the 8B variant averaging approximately 75.1 across MTEB tasks.
  2. BGE-M3 from BAAI remains the most-downloaded open-weight embedding model on Hugging Face (cumulative downloads in the tens of millions), largely because it supports dense, sparse, and multi-vector retrieval in a single model.
  3. The most-used embedding model in production RAG deployments per the LangChain and LlamaIndex telemetry reports is BGE-M3, followed by Qwen3-Embedding, OpenAI text-embedding-3-large, and Voyage AI voyage-3.
  4. Multilingual coverage is now a baseline expectation: Qwen3-Embedding (~119 languages), BGE-M3 (~100), Multilingual-E5-Large (~94), and jina-embeddings-v3 (~89) all cover the major commercial languages at competitive quality.
  5. License terms vary widely: Apache 2.0 (Nomic Embed v2, GTE-Qwen2, mxbai-embed-large-v1), MIT (BGE-M3, Stella, E5-Mistral), custom commercial (Qwen3-Embedding), and CC-BY-NC (NV-Embed-v2, Linq-Embed, SFR-Embedding). The license therefore needs checking before any commercial deployment.

MTEB v2 Open-Weight Leaderboard (May 2026)

Model | Parameters | Avg MTEB | License
Qwen3-Embedding-8B | ~8B | ~75.1 | Tongyi Qianwen (custom commercial)
Qwen3-Embedding-4B | ~4B | ~72.4 | Tongyi Qianwen
Qwen3-Embedding-0.6B | ~0.6B | ~67.8 | Tongyi Qianwen
NV-Embed-v2 | ~7B | ~72.3 | CC-BY-NC
BGE-M3 | ~0.6B | ~68.2 | MIT
BGE-Large-EN-v1.5 | ~0.3B | ~64.2 | MIT
E5-Mistral-7B-Instruct | ~7B | ~66.6 | MIT
GTE-Qwen2-7B-Instruct | ~7B | ~70.2 | Apache 2.0
Stella-en-1.5B-v5 | ~1.5B | ~69.4 | MIT
Nomic Embed Text v2 | ~0.5B | ~63.8 | Apache 2.0
Linq-Embed-Mistral | ~7B | ~68.2 | CC-BY-NC
SFR-Embedding-Mistral | ~7B | ~67.6 | CC-BY-NC
jina-embeddings-v3 | ~0.6B | ~65.5 | CC-BY-NC + Commercial
mxbai-embed-large-v1 | ~0.3B | ~64.7 | Apache 2.0

Multilingual Coverage

Model | Languages Supported | Notes
BGE-M3 | ~100 | Dense + sparse + multi-vector in one model
Qwen3-Embedding | ~119 | Strong on Chinese, Asian languages
Multilingual-E5-Large | ~94 | Wide language base
jina-embeddings-v3 | ~89 | Long-context multilingual
mxbai-embed-large-v1 | English only | Strong English
NV-Embed-v2 | English-focused | Highest English MTEB
Stella-en-1.5B-v5 | English only | Compact, strong English
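
The "dense + sparse + multi-vector" note for BGE-M3 refers to three ways of scoring a query against a document. The sketch below illustrates the three scoring styles on toy inputs in plain Python. It is not the BGE-M3 API: in practice the model would produce the vectors and token weights, and the function names and fusion weights here are hypothetical choices made for illustration.

```python
import math
from typing import Mapping, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalise(v: Vector) -> list:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm else list(v)


def dense_score(query: Vector, doc: Vector) -> float:
    """Dense retrieval: cosine similarity of one vector per text."""
    return dot(normalise(query), normalise(doc))


def sparse_score(query_w: Mapping[str, float], doc_w: Mapping[str, float]) -> float:
    """Sparse (lexical) retrieval: sum of weight products over shared tokens."""
    return sum(w * doc_w[tok] for tok, w in query_w.items() if tok in doc_w)


def multi_vector_score(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Late interaction: best-matching document token per query token, averaged."""
    if not query_vecs or not doc_vecs:
        return 0.0
    doc_unit = [normalise(d) for d in doc_vecs]
    best = [max(dot(normalise(q), d) for d in doc_unit) for q in query_vecs]
    return sum(best) / len(best)


def hybrid_score(dense: float, sparse: float, multi: float,
                 weights: Sequence[float] = (0.4, 0.2, 0.4)) -> float:
    """Weighted average of the three scores; the default weights are illustrative."""
    w_dense, w_sparse, w_multi = weights
    total = w_dense + w_sparse + w_multi
    return (w_dense * dense + w_sparse * sparse + w_multi * multi) / total


if __name__ == "__main__":
    d = dense_score([1.0, 0.0], [2.0, 0.0])
    s = sparse_score({"rag": 0.5, "embedding": 1.0}, {"rag": 0.4})
    m = multi_vector_score([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]])
    print(d, s, m, hybrid_score(d, s, m))
```

The practical appeal of a single model that emits all three signals is that one forward pass can feed a vector index, a lexical index, and a reranking-style late-interaction score, with the fusion weights tuned per workload.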

Deployment Profile

Use Case | Recommended Model | Reason
General-purpose multilingual RAG | BGE-M3 or Qwen3-Embedding-8B | Strong multilingual; commercial licensing options
English-only high-quality retrieval | NV-Embed-v2 or Stella-en-1.5B-v5 | Top MTEB English scores
Resource-constrained edge / on-device | BGE-M3 (~0.6B) or Qwen3-Embedding-0.6B | Sub-billion parameters with strong quality
Long-context document chunks | jina-embeddings-v3 or BGE-M3 | Both support 8K-token context windows
Code embeddings | GTE-Qwen2-7B-Instruct, jina-embeddings-v3 | Strong on code retrieval benchmarks
Permissive commercial deployment | BGE-M3, mxbai-embed-large-v1 | MIT / Apache licenses
Maximum-quality unconstrained | Qwen3-Embedding-8B | Highest open MTEB scores
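
A shortlist like the one above can be derived mechanically from the leaderboard by filtering on license and size, then ranking by score. The following is a minimal sketch under assumptions: the data is copied from the leaderboard table on this page (approximate scores and parameter counts, licenses as listed there), while the `Model` class, the `PERMISSIVE` set, and the `shortlist` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Model:
    name: str
    params_b: float   # approximate parameter count, in billions
    avg_mteb: float   # approximate average MTEB score from the table
    license: str      # license as listed in the table


LEADERBOARD = [
    Model("Qwen3-Embedding-8B", 8.0, 75.1, "Tongyi Qianwen"),
    Model("Qwen3-Embedding-4B", 4.0, 72.4, "Tongyi Qianwen"),
    Model("Qwen3-Embedding-0.6B", 0.6, 67.8, "Tongyi Qianwen"),
    Model("NV-Embed-v2", 7.0, 72.3, "CC-BY-NC"),
    Model("BGE-M3", 0.6, 68.2, "MIT"),
    Model("BGE-Large-EN-v1.5", 0.3, 64.2, "MIT"),
    Model("E5-Mistral-7B-Instruct", 7.0, 66.6, "MIT"),
    Model("GTE-Qwen2-7B-Instruct", 7.0, 70.2, "Apache 2.0"),
    Model("Stella-en-1.5B-v5", 1.5, 69.4, "MIT"),
    Model("Nomic Embed Text v2", 0.5, 63.8, "Apache 2.0"),
    Model("Linq-Embed-Mistral", 7.0, 68.2, "CC-BY-NC"),
    Model("SFR-Embedding-Mistral", 7.0, 67.6, "CC-BY-NC"),
    # Listed as "CC-BY-NC + Commercial"; treated as non-permissive here.
    Model("jina-embeddings-v3", 0.6, 65.5, "CC-BY-NC"),
    Model("mxbai-embed-large-v1", 0.3, 64.7, "Apache 2.0"),
]

# Assumption: only MIT and Apache 2.0 count as "permissive" for this sketch.
PERMISSIVE = {"MIT", "Apache 2.0"}


def shortlist(models: Iterable[Model],
              max_params_b: Optional[float] = None,
              permissive_only: bool = False) -> List[Model]:
    """Filter by size and license, then rank by average MTEB score (highest first)."""
    picked = [
        m for m in models
        if (max_params_b is None or m.params_b <= max_params_b)
        and (not permissive_only or m.license in PERMISSIVE)
    ]
    return sorted(picked, key=lambda m: m.avg_mteb, reverse=True)


if __name__ == "__main__":
    for m in shortlist(LEADERBOARD, max_params_b=1.0, permissive_only=True):
        print(f"{m.name}: {m.avg_mteb} ({m.license})")
```

Average score is only one axis. Language coverage, context length, and latency would need further fields, and the license strings should be checked against each model card before relying on the filter.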

Open vs Closed Embedding API Comparison

Model | Avg MTEB | Cost (per million tokens)
OpenAI text-embedding-3-large | ~64.6 | $0.13
OpenAI text-embedding-3-small | ~62.3 | $0.02
Voyage AI voyage-3 | ~74.0 | $0.06
Cohere embed-v3-english | ~64.5 | $0.10
Google Gemini text-embedding-005 | ~67.7 | $0.025
Qwen3-Embedding-8B (self-hosted) | ~75.1 | ~$0.005 effective
BGE-M3 (self-hosted) | ~68.2 | ~$0.002 effective

Strategic Context

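Three patterns define the 2026 embedding landscape. First, the open-weight quality gap has effectively closed: on the MTEB v2 averages above, Qwen3-Embedding-8B (~75.1) edges past the strongest closed API listed (Voyage AI voyage-3, ~74.0), while NV-Embed-v2 and GTE-Qwen2-7B outscore the OpenAI, Cohere, and Google embedding APIs. Second, the economics favour self-hosting at scale, but only past a volume threshold: at the $0.13 per million tokens list price, a production RAG workload embedding 100M tokens per day (about 3B tokens per month) costs approximately $390 per month on OpenAI text-embedding-3-large, against approximately $500 per month for self-hosted BGE-M3 amortised across an L40S or H100. The two meet at roughly 130M tokens per day; above that, the API bill grows linearly with volume (approximately $13,000 per month at 100B tokens per month), while the self-hosted cost stays roughly flat until the GPU's throughput is saturated. Third, multilingual coverage is no longer differentiating: BGE-M3, Qwen3-Embedding, jina-v3, and Multilingual-E5 all cover roughly 90 or more languages at competitive quality, so vendor selection in 2026 turns on operational fit (latency, batch throughput, license) more than language coverage.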
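
The API-versus-self-hosted comparison reduces to a few lines of arithmetic. The sketch below uses the $0.13 per million tokens list price from the comparison table and treats self-hosting as a flat $500 per month, which is a simplifying assumption (real self-hosted cost steps up once a single GPU is saturated). The function names and the 30-day month are illustrative choices.

```python
# List price from the comparison table (USD per million tokens).
API_PRICE_PER_M = 0.13          # OpenAI text-embedding-3-large
# Assumption: one dedicated GPU at a flat monthly cost.
SELF_HOSTED_MONTHLY = 500.0
DAYS_PER_MONTH = 30


def api_monthly_cost(tokens_per_day: float,
                     price_per_m: float = API_PRICE_PER_M) -> float:
    """Monthly API bill for a steady daily token volume."""
    return tokens_per_day * DAYS_PER_MONTH / 1_000_000 * price_per_m


def break_even_tokens_per_day(fixed_monthly: float = SELF_HOSTED_MONTHLY,
                              price_per_m: float = API_PRICE_PER_M) -> float:
    """Daily volume at which the API bill equals the fixed self-hosting cost."""
    return fixed_monthly / price_per_m * 1_000_000 / DAYS_PER_MONTH


if __name__ == "__main__":
    for volume in (100e6, 1e9, 3.3e9):
        cost = api_monthly_cost(volume)
        print(f"{volume:>14,.0f} tokens/day -> ${cost:,.0f}/month on the API")
    print(f"break-even: {break_even_tokens_per_day():,.0f} tokens/day")
```

Because the API bill is linear in volume and the self-hosted cost is modelled as fixed, the break-even volume scales directly with the monthly GPU cost and inversely with the API price.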

Brand Visibility Implications

Embedding model selection is one of the highest-traffic procurement-research categories in AI engineering. AI assistants increasingly handle queries about "best embedding model for RAG", "BGE vs OpenAI embedding", "multilingual embeddings 2026", and similar long-tail terms that drive direct production decisions. Brands selling RAG infrastructure, vector databases, embedding fine-tuning services, and reranker stacks are therefore increasingly discovered through AI-generated answers in this category.

Methodology

Benchmark scores are compiled from the MTEB leaderboard, the MMTEB multilingual leaderboard, and primary model card disclosures through 23 May 2026. Cost estimates use providers' list prices for closed APIs; self-hosted figures amortise GPU cost across realistic throughput assumptions. Updated quarterly.
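
One way to amortise GPU cost into an "effective" price per million tokens is to divide the hourly GPU cost by the tokens embedded per hour. The sketch below shows that calculation. The hourly rate, throughput, and utilisation used in the example call are placeholder values, not measurements from this page, and the function name is hypothetical.

```python
def effective_cost_per_million(gpu_cost_per_hour: float,
                               tokens_per_second: float,
                               utilisation: float = 1.0) -> float:
    """Amortised USD cost per million embedded tokens.

    gpu_cost_per_hour: rental or amortised hardware cost in USD per hour.
    tokens_per_second: sustained embedding throughput while the GPU is busy.
    utilisation: fraction of each hour the GPU is actually embedding (0 to 1].
    """
    if gpu_cost_per_hour < 0:
        raise ValueError("gpu_cost_per_hour must be non-negative")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    tokens_per_hour = tokens_per_second * 3600 * utilisation
    return gpu_cost_per_hour / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    # Placeholder inputs: a $1.00/hour GPU, 50,000 tokens/s, busy half the time.
    cost = effective_cost_per_million(1.00, 50_000, utilisation=0.5)
    print(f"effective cost: ${cost:.4f} per million tokens")
```

Utilisation matters as much as raw throughput: a GPU that sits idle half the time doubles the effective per-token cost, which is why low-volume workloads often favour a metered API.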

How Presenc AI Helps

Presenc AI monitors brand visibility on embedding model queries across ChatGPT, Claude, Gemini, and Perplexity. For RAG infrastructure brands, vector database vendors, and embedding-fine-tuning service firms, the platform identifies the prompts driving procurement-research traffic and the gaps where new content unlocks share of voice.

Frequently Asked Questions

What is the best open-weight embedding model in 2026?
Qwen3-Embedding-8B leads the MTEB v2 leaderboard among open-weight models with an average score of approximately 75. BGE-M3 is the most-downloaded for its combination of dense, sparse, and multi-vector retrieval in a single model. NV-Embed-v2 is the strongest on English-only benchmarks.

How do open-weight embedding models compare with OpenAI embeddings?
On benchmark quality, the leading open-weight models (Qwen3-Embedding-8B, NV-Embed-v2) exceed OpenAI text-embedding-3-large by roughly 8 to 10 points on MTEB v2 (~75.1 and ~72.3 versus ~64.6). On operational deployment, OpenAI APIs are simpler; on cost at scale, self-hosted open weights are cheaper.

Which open-weight embedding models support the most languages?
Qwen3-Embedding supports approximately 119 languages, BGE-M3 supports approximately 100, Multilingual-E5-Large supports approximately 94, and jina-embeddings-v3 supports approximately 89. All four cover the major commercial languages at competitive quality.

How much does self-hosting save compared with an embedding API?
For a production RAG workload of 100 million tokens per day (about 3 billion per month), OpenAI text-embedding-3-large costs approximately $390 per month at its $0.13 per million tokens list price. Self-hosted BGE-M3 amortised across an L40S or H100 is approximately $500 per month, so the break-even point is roughly 130 million tokens per day. At 100 billion tokens per month the API bill reaches approximately $13,000, while the self-hosted cost stays roughly flat until the GPU is saturated.

Which licence is simplest for commercial deployment?
For unrestricted commercial deployment without negotiation, MIT (BGE-M3) or Apache 2.0 (GTE-Qwen2, Nomic Embed v2, mxbai-embed-large-v1) are simplest. Qwen3-Embedding uses the Tongyi Qianwen licence, which permits commercial use but has scale and competitive-use restrictions. CC-BY-NC licences (NV-Embed-v2, Linq-Embed, SFR-Embedding) restrict commercial deployment.
