
Open-Weight Multilingual LLMs 2026

Open-weight multilingual LLMs in 2026: Aya Expanse, SEA-LION 3, AfroLLM, Sabia, Indic-LLM, Latxa, FinGPT. Language coverage, regional adoption, sovereign AI alignment.

By Ramanath, CTO & Co-Founder at Presenc AI · Last updated: May 2026

Open-weight multilingual LLMs reached production utility for most major non-English languages in 2026. Aya Expanse, SEA-LION 3, AfroLLM, Sabia, Indic-LLM (and other Indian-language LLMs), Latxa (Basque), FinGPT (Finnish), and dozens of language-specific fine-tunes cover most commercially significant languages outside English. This page consolidates the landscape.

Key Findings

  1. Aya Expanse from Cohere remains the leading general-purpose multilingual open-weight family, with strong coverage across 23 languages including Arabic, Hindi, Chinese, Japanese, Korean, and the major European languages.
  2. SEA-LION 3 from AI Singapore is the leading model for Southeast Asian languages (Bahasa Indonesia, Thai, Vietnamese, Tagalog, Tamil-MY, Burmese, Khmer, Lao, Mandarin-SG).
  3. Indic-LLM, OpenHathi, BharatGPT, and various Indian-language fine-tunes cover the major Indic languages (Hindi, Tamil, Telugu, Bengali, Marathi, Gujarati, Kannada, Malayalam, Punjabi).
  4. AfroLLM and Lelapa AI’s InkubaLM cover African languages, a historically under-served segment; quality is improving but lags major-language models.
  5. Sovereign AI alignment is driving multilingual investment: the UAE’s Falcon-Arabic, Saudi Arabia’s ALLaM, India’s Hindi-focused models, Vietnam’s VinaLLaMA, and France’s Lucie (CroissantLLM successor) are all government-supported or sovereign-aligned multilingual programmes.

Major Open-Weight Multilingual LLMs (May 2026)

| Model | Languages | Parameters | License |
|---|---|---|---|
| Aya Expanse 32B | 23 languages | ~32B | CC-BY-NC + Commercial |
| Aya Expanse 8B | 23 languages | ~8B | CC-BY-NC + Commercial |
| Aya-101 13B | 101 languages (research) | ~13B | Apache 2.0 |
| Qwen2.5 / Qwen3 (multilingual) | ~119 languages | 0.5B-235B | Apache 2.0 / Tongyi |
| SEA-LION 3 | 11 SEA languages | 3B, 7B, 27B | MIT |
| SeaLLM (Sea-LION precursor) | 9 SEA languages | varies | Apache 2.0 |
| InkubaLM (Lelapa AI) | 5 African languages | ~0.4B | Apache 2.0 |
| AfroLLM | 22 African languages | varies | Apache 2.0 |
| OpenHathi (Sarvam) | Hindi | ~7B | Apache 2.0 |
| BharatGPT | 14 Indian languages | varies | Various |
| Sarvam-1 / Sarvam-2B | Indic languages | ~2B | Apache 2.0 |
| Sabia 3 (Portuguese) | Brazilian Portuguese | varies | Maritaca-licence |
| Latxa (Basque) | Basque | ~7B, 70B | Llama 3 Community |
| FinGPT 7B (Finnish) | Finnish | ~7B | Apache 2.0 |
| Falcon-Arabic 11B | Arabic | ~11B | Apache 2.0 |
| Jais 70B (Arabic) | Arabic and English | ~70B | Apache 2.0 |
| ALLaM (Saudi) | Arabic | varies | SDAIA |
| VinaLLaMA (Vietnamese) | Vietnamese | ~7B | Apache 2.0 |
| Lucie (France) | French and European | ~7B | Apache 2.0 |
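
For procurement screening, the licence column is often the first filter. The following is a minimal Python sketch, not a definitive tool: it transcribes a handful of rows from the table above and filters them by licence string. The `ModelEntry` class, the `permissive_models` helper, and the choice of which licences count as "permissive" are illustrative assumptions, not part of any vendor API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    name: str
    languages: str
    parameters: str
    license: str


# A subset of rows copied from the table above; values are as listed there.
MODELS = [
    ModelEntry("Aya Expanse 8B", "23 languages", "~8B", "CC-BY-NC + Commercial"),
    ModelEntry("Aya-101 13B", "101 languages (research)", "~13B", "Apache 2.0"),
    ModelEntry("SEA-LION 3", "11 SEA languages", "3B, 7B, 27B", "MIT"),
    ModelEntry("InkubaLM (Lelapa AI)", "5 African languages", "~0.4B", "Apache 2.0"),
    ModelEntry("Latxa (Basque)", "Basque", "~7B, 70B", "Llama 3 Community"),
    ModelEntry("ALLaM (Saudi)", "Arabic", "varies", "SDAIA"),
]

# Assumption: treating these two licence strings as "permissive" is a policy
# choice made for this example; adjust the set to match your legal review.
PERMISSIVE = frozenset({"Apache 2.0", "MIT"})


def permissive_models(models, allowed=PERMISSIVE):
    """Return the names of models whose licence string is in `allowed`."""
    return [m.name for m in models if m.license in allowed]


if __name__ == "__main__":
    for name in permissive_models(MODELS):
        print(name)
```

Exact string matching is deliberately strict here: entries such as "CC-BY-NC + Commercial" or "Llama 3 Community" carry conditions that need case-by-case review, so they are excluded rather than guessed at.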

Language Coverage by Region

| Region | Leading Open-Weight Options |
|---|---|
| Arabic and MENA | Aya Expanse, Falcon-Arabic, Jais, ALLaM, Qwen3 |
| Indic languages | OpenHathi, Sarvam-1, BharatGPT, Indic-LLM, Qwen3 |
| Southeast Asia | SEA-LION 3, SeaLLM, VinaLLaMA, Qwen3 |
| African languages | AfroLLM, InkubaLM (limited coverage) |
| European languages | Aya Expanse, Mistral family, Lucie, Latxa, Qwen3 |
| Brazilian Portuguese | Sabia 3, Aya Expanse, Llama 3.x |
| Chinese | Qwen3, ChatGLM, Yi, InternLM, Baichuan, Hunyuan |
| Japanese | Qwen3, Llama 3.x with JP finetunes, Karakuri, ELYZA |
| Korean | SOLAR, Qwen3, Llama 3.x with Korean finetunes, KULLM |
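
The region table above can be treated as a simple lookup when shortlisting models for evaluation. The sketch below is one possible encoding in Python; the names `REGION_OPTIONS`, `candidates_for`, and `regions_listing` are hypothetical helpers introduced for illustration, and the entries are transcribed from the table without any ranking implied beyond the listed order.

```python
# Region -> leading open-weight options, transcribed from the table above.
# The "(limited coverage)" note on African languages is kept as a comment only.
REGION_OPTIONS = {
    "Arabic and MENA": ["Aya Expanse", "Falcon-Arabic", "Jais", "ALLaM", "Qwen3"],
    "Indic languages": ["OpenHathi", "Sarvam-1", "BharatGPT", "Indic-LLM", "Qwen3"],
    "Southeast Asia": ["SEA-LION 3", "SeaLLM", "VinaLLaMA", "Qwen3"],
    "African languages": ["AfroLLM", "InkubaLM"],  # limited coverage
    "European languages": ["Aya Expanse", "Mistral family", "Lucie", "Latxa", "Qwen3"],
    "Brazilian Portuguese": ["Sabia 3", "Aya Expanse", "Llama 3.x"],
    "Chinese": ["Qwen3", "ChatGLM", "Yi", "InternLM", "Baichuan", "Hunyuan"],
    "Japanese": ["Qwen3", "Llama 3.x with JP finetunes", "Karakuri", "ELYZA"],
    "Korean": ["SOLAR", "Qwen3", "Llama 3.x with Korean finetunes", "KULLM"],
}


def candidates_for(region):
    """Return a copy of the shortlist for a region, or [] if the region is unknown."""
    return list(REGION_OPTIONS.get(region, []))


def regions_listing(model):
    """Return every region whose shortlist names the given model exactly."""
    return [region for region, options in REGION_OPTIONS.items() if model in options]


if __name__ == "__main__":
    print(candidates_for("Southeast Asia"))
    print(regions_listing("Qwen3"))
```

Looking up a general model with `regions_listing` shows how widely it recurs across regions, while regions whose shortlist contains only specialised models (African languages here) stand out as the places where a general model is not listed as a leading option.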

Strategic Context

Three patterns shape the 2026 multilingual landscape. First, large general models (Qwen3, Aya Expanse, Llama 4) cover the major commercial languages well, while specialised regional models retain an advantage for under-resourced languages and cultural-context-heavy workloads. Second, sovereign AI investment is driving regional model development: the UAE, Saudi Arabia, India, Vietnam, France, and others have government-supported multilingual LLM programmes. Third, the language gap remains real for under-resourced African and South Asian languages, where scarce pretraining data limits model quality.

Brand Visibility Implications

Multilingual AI is a fast-growing procurement category for global brands. AI assistant queries about "Arabic LLM", "Indic language AI", "Southeast Asia AI", and similar terms drive procurement-research traffic from regional brands and global product teams. Brands selling localised AI products, multilingual customer service AI, and translation services are increasingly discovered through AI assistants in this category.

Methodology

Model data compiled from primary Hugging Face model card disclosures, AI Singapore, Cohere For AI, and regional sovereign AI programme announcements through 23 May 2026. Updated quarterly.

How Presenc AI Helps

Presenc AI monitors brand visibility on multilingual AI queries across ChatGPT, Claude, Gemini, and Perplexity. For localised AI product vendors, multilingual customer service AI brands, and translation services, the platform identifies the prompts driving procurement-research traffic and the gaps where new content unlocks share of voice.

Frequently Asked Questions

For the broadest language coverage with a single model, the strongest options are Qwen3 (approximately 119 languages) and Aya Expanse (23 languages, each with strong quality). For specific regions, the specialised models (SEA-LION 3 for Southeast Asia, Falcon-Arabic for Arabic, OpenHathi for Hindi) often outperform general models on regional benchmarks.
Aya Expanse covers 23 languages with strong quality per language. Qwen3 covers approximately 119 languages with strong quality on the major commercial languages plus reasonable coverage of the long tail. Aya Expanse is better for deep multilingual deployments in its specific 23 languages; Qwen3 is better for broader language coverage and as a general-purpose multilingual model.
Open-weight coverage of African languages is improving but limited. InkubaLM from Lelapa AI covers 5 African languages and AfroLLM covers approximately 22, but quality lags major-language models because pretraining data is scarce. Sovereign AI investment in African language coverage is increasing; Senegal’s Pulaar initiative and South Africa’s broader language programmes are emerging.
The leading sovereign-aligned multilingual programmes are the UAE’s Falcon-Arabic and Jais (Arabic), Saudi Arabia’s ALLaM (Arabic), Vietnam’s VinaLLaMA, France’s Lucie (French and European), Spain’s Salamandra (multilingual European), and India’s BharatGPT and Sarvam (Indic). Each is government-supported and shaped by specific cultural-context, regulatory, or sovereign-cloud deployment requirements.
For most languages, Qwen3’s native multilingual capability is strong enough that fine-tuning Llama for a specific language is unnecessary. For under-resourced languages or culturally sensitive workloads, language-specific fine-tunes can be valuable; otherwise the general multilingual capability of Qwen3, Aya Expanse, or Llama 4 covers most production deployments.
