Bolmo’s architecture unlocks efficient byte-level LM training without sacrificing quality

by SkillAiNest

Enterprises that want a tokenizer-free multilingual model are moving to byte-level language models to reduce fragility on noisy or low-resource text. To tap this niche, and make it practical at scale, the Allen Institute for AI (AI2) introduced Bolmo, a new family of models that builds on the Olmo 3 models by "byteifying" them and repurposing their backbone and capabilities.

The company launched two versions, Bolmo 7B and Bolmo 1B, which AI2 describes as "the first fully open byte-level language models." Both models performed competitively with, and in some cases outperformed, other byte-level and character-based models, the company said.

Byte-level language models operate directly on raw UTF-8 bytes, eliminating the need for a predefined vocabulary or tokenizer. This allows them to handle misspellings, rare languages and unconventional text more reliably.
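
To make the idea concrete, here is a minimal sketch (not Bolmo's actual preprocessing) of tokenizer-free input preparation: any string, in any script, maps onto the same fixed 256-symbol vocabulary of byte values.

```python
# Minimal sketch of tokenizer-free, byte-level input preparation.
# Any text, in any language or with any misspelling, maps onto the
# same fixed 256-symbol "vocabulary" of byte values (0-255).

def text_to_byte_ids(text: str) -> list[int]:
    """Encode text as UTF-8 and return one integer ID per byte."""
    return list(text.encode("utf-8"))

print(text_to_byte_ids("hello"))   # [104, 101, 108, 108, 111]
print(text_to_byte_ids("héllo"))   # accented characters become multi-byte sequences
print(text_to_byte_ids("日本語"))   # any script works; no tokenizer required
```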

For enterprises deploying AI across multiple languages, noisy user inputs, or constrained environments, tokenizer-free models offer a way to reduce operational complexity. AI2's Bolmo is an attempt to implement this approach at scale, without training from scratch.

How Bolmo Works and How It Was Made

AI2 said it trained the Bolmo models using its Dolma 3 data mix, which also helped train its flagship Olmo models, along with some open code datasets and character-level data.

The company said its goal is to "provide a reproducible, testable blueprint for byteifying strong subword language models in a way that the community can adopt and scale." To accomplish this, AI2 is releasing its checkpoints, code, and a full paper to help other organizations build byte-level models on top of its Olmo ecosystem.

Because training a byte-level model completely from scratch can be expensive, the AI2 researchers instead chose to byteify the existing Olmo 3 7B checkpoint in two stages.

In the first stage, AI2 froze the Olmo 3 transformer so that only a few new components were trained: the local encoder and decoder, the boundary predictor, and the language modeling head. This stage was designed to be "cheap and fast," requiring only 9.8 billion tokens.
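
A minimal PyTorch sketch of what such a freeze-and-train phase could look like; the module names (`backbone`, `local_encoder`, `local_decoder`, `boundary_predictor`, `lm_head`) are hypothetical stand-ins, not AI2's actual code:

```python
import torch.nn as nn

def freeze_backbone_for_phase1(model: nn.Module) -> list[nn.Parameter]:
    """Stage 1 of byteification (sketch): freeze the pretrained
    transformer; leave only the new byte-level components trainable."""
    # Freeze every parameter of the pretrained subword backbone.
    for p in model.backbone.parameters():
        p.requires_grad = False

    # Train only the newly added byte-level modules.
    new_modules = [
        model.local_encoder,       # maps raw bytes into patch representations
        model.local_decoder,       # maps patch representations back to bytes
        model.boundary_predictor,  # decides where byte patches begin and end
        model.lm_head,             # next-byte prediction head
    ]
    trainable = []
    for module in new_modules:
        for p in module.parameters():
            p.requires_grad = True
            trainable.append(p)
    return trainable  # hand this list to the optimizer
```

Only the returned parameters would be passed to the optimizer, so gradient updates in this stage touch nothing but the new byte-level modules, which is what keeps it cheap.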

The next stage unfreezes the model and trains it end-to-end on additional tokens. AI2 said the byte-level approach allows Bolmo to avoid the vocabulary constraints that limit traditional subword models.
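
Continuing the same hypothetical sketch, the second stage would simply unfreeze all parameters and continue training end-to-end:

```python
import torch.nn as nn

def unfreeze_for_phase2(model: nn.Module) -> None:
    """Stage 2 (sketch): unfreeze everything, backbone included,
    so the whole model is trained end-to-end on additional tokens."""
    for p in model.parameters():
        p.requires_grad = True
```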

Strong performance among its peers

Byte-level language models are not as mainstream as small language models or LLMs, but they are a growing field of research. Meta released research on its Byte Latent Transformer (BLT) architecture over the past year, aimed at delivering a model that is robust, processes raw bytes, and does not rely on a fixed vocabulary.

Other research models in this space include ByT5, Stanford's MrT5, and CANINE.

AI2 evaluated Bolmo using its assessment suite, which covered math, STEM reasoning, question answering, general knowledge, and code.

Bolmo 7B performed strongly, outperforming other byte-level baselines on character-based benchmarks, and even improved accuracy over the base Olmo 3 model.

Bolmo 7B outperformed models of comparable size in coding, math, multiple-choice QA, and character-level understanding.

Why Enterprises May Choose Byte-Level Models

Enterprises increasingly find value in hybrid model stacks that use a mix of models and model sizes.

AI2 makes the case that organizations should consider byte-level models not only for robustness and multilingual understanding, but because the approach "plugs naturally into existing model ecosystems."

"One of the main benefits of a dynamic patching setup is that compression becomes a toggleable knob," the company said.
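
As an illustration of the "toggleable knob" idea, under our own assumptions rather than AI2's implementation: if a boundary predictor emits a per-byte boundary probability, a single inference-time threshold trades off compression against granularity.

```python
import torch

def patch_boundaries(boundary_probs: torch.Tensor, threshold: float) -> torch.Tensor:
    """Given per-byte boundary probabilities from a (hypothetical)
    boundary predictor, return a boolean mask of patch starts.
    Raising `threshold` yields fewer, longer patches (more compression);
    lowering it yields shorter patches (finer granularity)."""
    return boundary_probs >= threshold

probs = torch.tensor([0.9, 0.1, 0.2, 0.8, 0.3, 0.7])
print(patch_boundaries(probs, threshold=0.5).sum().item())   # 3 patches
print(patch_boundaries(probs, threshold=0.75).sum().item())  # 2 patches: more compression
```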

For enterprises already running heterogeneous model stacks, Bolmo suggests that byte-level models may no longer be purely academic. By retrofitting a strong subword model rather than training from scratch, AI2 is signaling a lower-risk path for organizations that want robustness without abandoning existing infrastructure.
