AI2’s new Olmo 3.1 extends reinforcement learning training for stronger reasoning benchmarks

by SkillAiNest

The Allen Institute for AI (AI2) recently released what it calls its most powerful family of models yet, Olmo 3. But the company kept iterating on the models, extending their reinforcement learning (RL) runs to create Olmo 3.1.

The new Olmo 3.1 models focus on efficiency, transparency and control for enterprises.

AI2 updated two of the three versions of Olmo 3: Olmo 3.1 Think 32B, the flagship model optimized for advanced reasoning, and Olmo 3.1 Instruct 32B, designed for instruction following, multi-turn dialog and tool use.

The third version, Olmo 3 Base, targets programming, comprehension and mathematics, and also serves as a strong starting point for continued fine-tuning.
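As a rough illustration of what continued fine-tuning on the base model could look like, here is a minimal sketch using Hugging Face transformers. The repository id and dataset name below are placeholders, not confirmed identifiers from AI2; check AI2's Hugging Face organization for the exact checkpoint names.

```python
# Minimal sketch: continued fine-tuning of an Olmo 3 Base checkpoint for causal
# language modeling. MODEL_ID and DATASET_ID are placeholders, not official names.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL_ID = "allenai/Olmo-3-7B"          # placeholder: confirm exact repo id on Hugging Face
DATASET_ID = "your-org/domain-corpus"   # hypothetical in-house text dataset with a "text" column

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# Tokenize a plain-text corpus for next-token prediction.
raw = load_dataset(DATASET_ID, split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = raw.map(tokenize, batched=True, remove_columns=raw.column_names)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="olmo3-continued-ft",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    num_train_epochs=1,
    bf16=True,
    logging_steps=10,
)

Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator).train()
```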

To upgrade Olmo 3 Think 32B to Olmo 3.1, AI2 said its researchers extended their best RL runs with a longer training schedule.

“After the original Olmo 3 launch, we restarted our RL training run for Olmo 3 Think 32B, with an additional 21 days of training on 224 GPUs and additional runs on our Dolci Think RL dataset,” AI2 said in a blog post. “This led to Olmo 3.1 Think 32B, which achieves substantial gains in math, reasoning and instruction-following benchmarks: a 5+ point improvement on AIME, 4+ points on ZebraLogic and 20+ points on IFBench, along with strong performance on coding and complex multi-step tasks.”

To create Olmo 3.1 Instruct 32B, AI2 said its researchers applied the recipe behind the smaller 7B Instruct model to the larger model.

Olmo 3.1 Instruct 32B is “optimized for chat, tool use, and multi-turn dialog, all in one,” AI2 said in a post on X.

For now, the new checkpoints are available in the AI2 Playground or on Hugging Face, with API access coming soon.
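Since the checkpoints are distributed through Hugging Face, a minimal local test might look like the sketch below. The repository id is an assumed placeholder and should be confirmed on AI2's Hugging Face page, and running a 32B model this way assumes sufficient GPU memory.

```python
# Minimal sketch: running an Olmo 3.1 checkpoint with Hugging Face transformers.
# MODEL_ID is a placeholder, not a confirmed repo name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "allenai/Olmo-3.1-32B-Instruct"  # placeholder: verify the exact repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

# The Instruct and Think variants are chat models, so use the chat template.
messages = [{"role": "user", "content": "Briefly explain reinforcement learning."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```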

Better performance on benchmarks

The Olmo 3.1 models performed well on benchmark tests, surpassing the Olmo 3 models.

Olmo 3.1 Think 32B outperforms Qwen 3 32B on the AIME 2025 benchmark and performs close to Gemma 3 27B.

Olmo 3.1 Instruct 32B performed strongly against its open-source peers, even beating models like Gemma 3 on math benchmarks.

“Olmo 3.1 Instruct 32B is a large-scale instruction-tuned model built for chat, tool use, and multi-turn dialog. It is our most capable fully open chat model to date and, in our assessment, the most robust fully open instruction model at the 32B scale,” the company said.

AI2 also upgraded its RL Zero 7B models for math and coding. Both models benefit from longer and more stable training, the company said on X.

Commitment to transparency and open source

AI2 previously told VentureBeat that it designed the Olmo 3 family of models to offer enterprises and research labs greater control and understanding of the data and training that go into models.

Organizations can add to the model’s data mix and retrain it to learn from what has been added.

This has long been the promise of AI2, which also offers a tool called OlmoTrace that shows how closely an LLM’s output matches its training data.

“Together, Olmo 3.1 Think 32B and Olmo 3.1 Instruct 32B show that openness and efficiency can go hand in hand. By extending the same model flow, we continue to improve capabilities while maintaining end-to-end transparency over data, code and training decisions,” AI2 said.
