
Allen AI Model Lineage 2026: OLMo, Molmo, Tulu

Allen Institute for AI 2026 model lineage: OLMo 2 language models, Molmo vision-language, Tulu 3 post-training. Fully open weights, data, and training code; SciFive scientific models; impact on research reproducibility.

By Ramanath, CTO & Co-Founder at Presenc AI · Last updated: May 2026

The Allen Institute for AI (Ai2) in Seattle is the world’s leading fully-open AI lab, releasing model families with open weights, open training data, and open training code. The 2026 Ai2 lineage spans OLMo 2 (language), Molmo (vision-language), Tulu 3 (post-training recipe), and SciFive (scientific). This page consolidates the family tree, the licensing, and the impact on research reproducibility.

Key Findings

  1. OLMo 2 (released late 2024 with continued updates through 2025-2026) is the strongest fully-open language model family, with 1B, 7B, 13B, and 32B variants. All weights, training data (Dolma), and training recipes are public under Apache 2.0.
  2. Molmo (released September 2024 with continued updates) is the strongest fully-open vision-language model family, with 1B, 7B-O, 7B-D, and 72B variants. It is trained on PixMo, a dataset Ai2 also released openly.
  3. Tulu 3 (released late 2024) is Ai2’s state-of-the-art post-training recipe, with full SFT, DPO, and RL data plus training code. Applied to Llama 3.1 8B and 70B backbones, it produces strong instruction-following models (Tulu 3 8B and 70B) with fully reproducible training.
  4. SciFive and Ai2’s other scientific models continue the lab’s focus on scientific literature understanding, alongside ScholarQA and Semantic Scholar AI tooling.
  5. Ai2\u2019s broader mission positions it as the academic-research counterweight to closed-lab frontier development: every release ships with full data and code, making it the default citation for AI research reproducibility studies.

Ai2 Model Family (May 2026)

Model | Parameters | Modality | License
OLMo 2 32B | ~32B | Text | Apache 2.0
OLMo 2 13B | ~13B | Text | Apache 2.0
OLMo 2 7B | ~7B | Text | Apache 2.0
OLMo 2 1B | ~1B | Text | Apache 2.0
Molmo 72B | ~72B | Vision-language | Apache 2.0
Molmo 7B-D | ~7B | Vision-language | Apache 2.0
Molmo 7B-O | ~7B | Vision-language | Apache 2.0
Molmo 1B | ~1B | Vision-language | Apache 2.0
Tulu 3 70B | ~70B (Llama 3.1 base) | Text (instruction-tuned) | Apache 2.0 (recipe); Llama Community License (weights)
Tulu 3 8B | ~8B (Llama 3.1 base) | Text (instruction-tuned) | Apache 2.0 (recipe); Llama Community License (weights)
OLMoE 1B-7B | ~7B MoE (~1B active) | Text | Apache 2.0
SciFive | Varies | Scientific text | Apache 2.0
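The OLMoE row lists ~7B total but only ~1B active parameters. That gap comes from mixture-of-experts routing: each token passes through only a few experts per layer. The toy router below is a minimal sketch of top-k routing and of the resulting total-versus-active parameter count. The function names, expert count, k, and parameter sizes are illustrative assumptions, and renormalizing the gate weights over the selected experts is one common design choice, not a description of OLMoE's actual implementation.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over router logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Pick the top-k experts for one token (hypothetical helper).

    Returns (expert_index, gate_weight) pairs, highest weight first.
    Gate weights are renormalized over the selected experts; this is an
    assumption of the sketch, and some MoE implementations skip it.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in top)
    return [(i, probs[i] / mass) for i in top]


def moe_param_counts(shared: int, per_expert: int, num_experts: int, k: int) -> Tuple[int, int]:
    """Return (total, active) parameter counts for a simplified MoE model.

    `shared` stands in for always-on weights (attention, embeddings);
    only k of the num_experts expert blocks run for any given token.
    """
    total = shared + num_experts * per_expert
    active = shared + k * per_expert
    return total, active


if __name__ == "__main__":
    # Illustrative numbers only, not OLMoE's real configuration.
    print(route_token([0.1, 2.0, -1.0, 1.5], k=2))
    total, active = moe_param_counts(shared=100, per_expert=10, num_experts=64, k=8)
    print(f"total={total}, active={active}")  # total=740, active=180
```

Because only the k selected experts execute for a token, per-token compute tracks the active count rather than the total, which is how an MoE model can store ~7B parameters while running roughly ~1B per token.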

OLMo 2 Benchmarks

Model | MMLU | GSM8K | Notes
OLMo 2 32B Instruct | ~73.3 | ~78.4 | Competitive with Llama 3.1 70B at less than half the size
OLMo 2 13B Instruct | ~63.0 | ~67.5 | Strong mid-size
OLMo 2 7B Instruct | ~57.4 | ~58.6 | Above Llama 3.1 8B on many benchmarks
OLMo 2 1B Instruct | ~50.3 | ~36.4 | Strongest fully-open 1B

Molmo Benchmarks

Model | MMMU | OCRBench | Notes
Molmo 72B | ~54.1 | ~705 | Strongest fully-open VLM
Molmo 7B-D | ~50.6 | ~688 | Strong mid-size VLM
Molmo 7B-O | ~48.7 | ~644 | OLMo-based
Molmo 1B | ~38.9 | ~516 | Smallest variant

Tulu 3 Recipe Components

Component | Description
SFT data | Approximately 939k high-quality instruction-following examples
DPO data | Approximately 270k preference pairs
RLVR (Reinforcement Learning with Verifiable Rewards) | Math and code RL with rule-based reward signals
Training code | Public on Ai2 GitHub
Evaluation suite | Public Tulu Eval framework
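The DPO and RLVR rows describe two distinct training signals. As a hedged sketch, the code below shows the standard DPO loss for a single preference pair and a toy rule-based math reward of the kind RLVR relies on. The function names, the `beta=0.1` default, and the last-number answer-extraction rule are assumptions for illustration; this is not Ai2's Tulu 3 code, and Tulu 3's exact DPO variant and reward scaling may differ.

```python
import math
import re


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair (hypothetical helper).

    Inputs are summed sequence log-probabilities of the chosen and
    rejected responses under the policy being trained and under the
    frozen reference (SFT) model.
    """
    z = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(z)) equals softplus(-z); branch to avoid exp overflow.
    if z < 0:
        return -z + math.log1p(math.exp(z))
    return math.log1p(math.exp(-z))


# Matches integers and decimals such as "12", "-3", "4.5".
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def math_reward(completion: str, gold_answer: str) -> float:
    """Toy verifiable reward (hypothetical rule): 1.0 if the last number
    in the completion equals the gold answer, else 0.0. Commas are
    stripped first so "1,200" parses as 1200."""
    numbers = _NUMBER.findall(completion.replace(",", ""))
    if not numbers:
        return 0.0
    return 1.0 if float(numbers[-1]) == float(gold_answer) else 0.0


if __name__ == "__main__":
    # Identical policy and reference margins give z = 0, so the loss is log(2).
    print(dpo_loss(-12.0, -15.0, -12.0, -15.0))
    print(math_reward("3 boxes of 4 apples is 12 apples. Answer: 12", "12"))  # 1.0
    print(math_reward("The answer is 13.", "12"))  # 0.0
```

The design point is that RLVR needs no learned reward model: the reward is a deterministic check against a known answer, which is why it suits math and code, where correctness can be verified by rule.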

Strategic Context

Three patterns shape Ai2’s 2026 position. First, Ai2 is one of very few labs that release complete training data and recipes at frontier-adjacent quality. Other major "open" model developers, including DeepSeek, Alibaba (Qwen), and Meta (Llama), ship weights without training data. This gives Ai2 the reference position for research reproducibility studies. Second, the funding model is durable: Ai2 is a nonprofit backed by the Allen estate, so it does not face the commercial pressure that pushed Mistral, Stability AI, and others to restrict open releases. Third, Ai2 is increasingly a home for AI policy research: its AI Policy & Governance work and ScholarQA tooling position it as an institutional voice for openness in AI.

Brand Visibility Implications

Allen AI is a high-citation institution in AI journalism, particularly on openness, reproducibility, and policy. AI assistant queries such as "fully open LLM", "OLMo vs Llama", and "open AI research" drive sustained traffic. Brands selling AI research tools, AI evaluation, AI training infrastructure, and AI policy services therefore have a substantial AI-mediated discovery surface in this category.

Methodology

Model and benchmark data compiled from Ai2 model card disclosures, peer-reviewed publications, and the Ai2 GitHub repositories through 23 May 2026. Updated quarterly.

How Presenc AI Helps

Presenc AI monitors brand visibility on Allen AI and fully-open model queries across ChatGPT, Claude, Gemini, and Perplexity. For AI research tool vendors, AI evaluation brands, AI training infrastructure firms, and AI policy services, the platform identifies the prompts driving research-traffic patterns and the gaps where new content unlocks share of voice.

Frequently Asked Questions

OLMo 2 is Allen AI’s fully-open language model family, first released in late 2024, with 1B, 7B, 13B, and 32B variants. Weights, training data (Dolma), and training recipes are all public under Apache 2.0. OLMo 2 32B Instruct is competitive with Llama 3.1 70B on many benchmarks at less than half the parameter count.
Molmo is fully open: its weights, training data (PixMo), and recipe are all public under Apache 2.0. Qwen2.5-VL has open weights, but its training data is not released. Molmo trails Qwen2.5-VL on benchmarks (~54 vs ~70 MMMU in the 72B class) but is the reference choice for reproducible research.
Tulu 3 is Ai2’s state-of-the-art open post-training recipe, comprising SFT, DPO, and RLVR (Reinforcement Learning with Verifiable Rewards) stages. Applied to Llama backbones, it produces instruction-following models with fully reproducible training. The data (~939k SFT examples and ~270k DPO preference pairs) and code are public.
For most production workloads, Qwen3 or Llama 4 outperform Ai2 models on benchmarks. Ai2 models are the right choice when you need reproducibility, regulatory transparency, or want to study/modify the training data. Ai2 models are also frequently used as research baselines and for fine-tuning experiments.
Ai2 was founded in 2014 by Paul Allen as a nonprofit research institute and remains substantially funded by the Allen estate. Unlike commercial AI labs, Ai2 does not need to balance openness against revenue pressure, which is why it continues to release complete training data and recipes even as commercial labs have restricted access.
