
When Liquid AI, a startup founded by MIT computer scientists in 2023, introduced its Liquid Foundation Model Series 2 (LFM2) in July 2025, the pitch was straightforward: deliver the fastest on-device foundation models on the market, using a new "liquid" architecture and training recipe to make small models a serious alternative to cloud-only large language models (LLMs) such as OpenAI’s GPT series and Google’s Gemini.
Initial releases shipped dense checkpoints at 350M, 700M, and 1.2B parameters, a hybrid architecture weighted toward gated short convolutions, and benchmark numbers that put LFM2 ahead of similarly sized competitors such as Qwen3, Llama 3.2, and Gemma 3 on both quality and CPU throughput. The message to businesses was clear: real-time, privacy-preserving AI on phones, laptops, and in cars no longer needs to sacrifice capability for latency.
In the months since that launch, Liquid has expanded LFM2 into a broader product line, including task- and domain-specific variants, small vision-language and audio models, and an edge deployment stack called LEAP, and positioned the models as a control layer for on-device and on-prem agentic systems.
Now, with the publication of a detailed, 51-page LFM2 technical report on arXiv, the company is going a step further: laying out the architecture behind these models, the training data mix, the distillation objective, the curriculum strategy, and the post-training pipeline.
And unlike many earlier open models, LFM2 is built around a repeatable recipe: a hardware-in-the-loop architecture search process, a training curriculum that compensates for a small parameter budget, and a post-training pipeline focused on instruction following and tool use.
Rather than simply offering weights and an API, Liquid is effectively publishing a detailed blueprint that other organizations can use as a reference to train their own small, efficient models from scratch, tailored to their own hardware and deployment constraints.
A model family designed around real constraints, not GPU labs
The technical report begins with a premise businesses are intimately familiar with: real AI systems hit constraints long before they hit benchmarks. Latency budgets, peak memory ceilings, and thermal throttling define what can actually run in production, especially on laptops, tablets, commodity servers, and mobile devices.
To address this, Liquid AI evaluated candidate architectures directly on target hardware, including Snapdragon mobile SoCs and Ryzen laptop CPUs. The result is consistent across sizes: a minimal hybrid architecture dominated by gated short convolution blocks with a small number of grouped-query attention (GQA) layers. This design was repeatedly chosen over more exotic linear-attention and SSM hybrids because it delivered a better quality-latency-memory Pareto profile under real device conditions.
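As a rough illustration of the dominant block type, the sketch below shows one way a gated short (depthwise) convolution layer can be written in PyTorch. The kernel size, gating arrangement, and class name are assumptions made for illustration, not Liquid AI's exact operator.

```python
import torch
import torch.nn as nn

class GatedShortConvBlock(nn.Module):
    """Illustrative gated short convolution block (not the exact LFM2 operator)."""

    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        self.in_proj = nn.Linear(dim, 2 * dim)            # value and gate streams
        self.conv = nn.Conv1d(dim, dim, kernel_size,
                              padding=kernel_size - 1,     # pad, then trim => causal
                              groups=dim)                  # depthwise: cheap on CPU
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        v, g = self.in_proj(x).chunk(2, dim=-1)
        v = self.conv(v.transpose(1, 2))[..., : x.shape[1]].transpose(1, 2)
        return self.out_proj(v * torch.sigmoid(g))         # multiplicative gate

block = GatedShortConvBlock(dim=64)
print(block(torch.randn(2, 16, 64)).shape)                 # torch.Size([2, 16, 64])
```

The short, depthwise kernel is the point: per-token compute and memory stay low compared with full attention, which is why a stack of such blocks plus a few GQA layers suits CPU and mobile inference.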
This matters to enterprise teams in three ways:
Predictability. The architecture is simple, parameter-efficient, and stable across model sizes from 350M to 2.6B.
Operational portability. The dense and MoE variants share the same structural backbone, which simplifies deployment across mixed hardware fleets.
On-device feasibility. Prefill and decode throughput on CPU exceeds comparable open models by about 22% in many cases, reducing the need to offload routine tasks to cloud inference endpoints.
Rather than chasing academic novelty, the report frames the models as a systematic effort to design what enterprises can actually ship.
That is notable, and more practical for enterprises, in a field where many open models quietly assume multi-H100 clusters even at evaluation time.
A training pipeline for enterprise-specific behaviors
LFM2 adopts a training approach that compensates for the models' small scale through structure rather than brute force. Key elements include:
Pre-training on 10–12T tokens, plus an additional 32K-context mid-training phase that extends the models' useful context window without exploding compute costs.
A top-K knowledge distillation objective that addresses the instability of standard KL-based distillation when the teacher provides only its top-K logits (a sketch follows this list).
A three-stage post-training pipeline of SFT, length-normalized preference alignment, and model merging, designed to produce more reliable instruction-following and tool-use behavior.
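A minimal sketch of what such a top-K distillation loss can look like is below. Renormalizing the teacher's top-K probabilities over the retained entries, and all names and shapes, are assumptions for illustration rather than the report's exact objective.

```python
import torch
import torch.nn.functional as F

def top_k_distillation_loss(student_logits: torch.Tensor,
                            teacher_topk_logits: torch.Tensor,
                            teacher_topk_indices: torch.Tensor,
                            temperature: float = 1.0) -> torch.Tensor:
    """Illustrative top-K distillation loss (sketch, not LFM2's exact formulation).

    The teacher exposes only its top-K logits per position, so the KL term is
    computed over that restricted support, with the teacher distribution
    renormalized over the K retained entries.

    Shapes:
      student_logits:       (batch, seq, vocab)
      teacher_topk_logits:  (batch, seq, K)
      teacher_topk_indices: (batch, seq, K)
    """
    # Teacher probabilities restricted to its top-K entries.
    teacher_probs = F.softmax(teacher_topk_logits / temperature, dim=-1)

    # Student log-probabilities over the full vocab, gathered at the teacher's top-K ids.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    student_topk_log_probs = student_log_probs.gather(-1, teacher_topk_indices)

    # Forward KL on the restricted support (constant teacher-entropy term dropped).
    loss = -(teacher_probs * student_topk_log_probs).sum(dim=-1)
    return loss.mean()

# Toy usage
B, T, V, K = 2, 4, 32, 8
student = torch.randn(B, T, V, requires_grad=True)
teacher_full = torch.randn(B, T, V)
topk_vals, topk_idx = teacher_full.topk(K, dim=-1)
print(top_k_distillation_loss(student, topk_vals, topk_idx))
```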
For enterprise AI developers, the value is that LFM2 models behave less like “little LLMs” and more like practical agents: emitting structured formats, adhering to JSON schemas, and managing multi-turn chat flows. Many open models of similar size fail not for lack of reasoning ability, but because of brittle handling of instruction templates. The LFM2 post-training pipeline directly targets these rough edges.
In other words: Liquid AI tuned these small models for operational reliability, not just leaderboard scores.
Multimodality designed for device constraints, not a lab demo
The LFM2-VL and LFM2-Audio variants reflect another shift: multimodality built around token efficiency.
Instead of embedding a large vision transformer directly into the LLM, LFM2-VL connects a SigLIP2 encoder via a connector that aggressively reduces the visual token count through pixel unshuffle. High-resolution inputs automatically trigger dynamic tiling, keeping token budgets under control even on mobile hardware. LFM2-Audio uses a dual-path design for audio input and output.
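As a rough sketch of the token-reduction idea, the snippet below shows how a pixel-unshuffle step trades spatial resolution for channel depth, cutting the number of visual tokens by the square of the downscale factor. The grid size, dimensions, and connector here are illustrative assumptions, not LFM2-VL's actual configuration.

```python
import torch
import torch.nn as nn

# Illustrative numbers only: a 27x27 grid of patch embeddings from a vision encoder.
grid, dim, factor = 27, 768, 3

patch_embeddings = torch.randn(1, dim, grid, grid)      # (batch, channels, H, W)

# Pixel unshuffle folds each factor x factor neighborhood into the channel dim,
# so 27*27 = 729 visual tokens become 9*9 = 81 tokens with 9x the channels.
unshuffle = nn.PixelUnshuffle(downscale_factor=factor)
reduced = unshuffle(patch_embeddings)                    # (1, dim * factor**2, 9, 9)

# A connector layer then maps the widened channels into the LLM's embedding width.
llm_dim = 1024  # assumed value for illustration
connector = nn.Linear(dim * factor**2, llm_dim)
tokens = connector(reduced.flatten(2).transpose(1, 2))   # (1, 81, llm_dim)

print(grid * grid, "->", tokens.shape[1], "visual tokens")
```

Fewer visual tokens means shorter sequences for the language model to process, which is what keeps prefill latency manageable on phone-class hardware.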
For enterprise platform architects, this design points to a functional future where:
Document comprehension occurs directly at endpoints such as field devices.
Audio transcription and speech agents run locally for privacy compliance.
Multimodal agents operate in fixed-latency envelopes without streaming data off-device.
The throughline is the same: multimodal capability without the need for a GPU form factor.
Retrieval models designed for agent systems, not legacy search
LFM2-ColBERT brings late-interaction retrieval down to a footprint small enough for enterprise deployments that need multilingual RAG without heavyweight vector-database infrastructure or dedicated accelerators.
This is especially meaningful as organizations begin to orchestrate fleets of agents. Fast local retrieval – delivered on the same hardware as the reasoning model – reduces latency and provides a governance win: documents never leave the device boundary.
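For readers unfamiliar with late interaction, the sketch below shows the core MaxSim scoring step that ColBERT-style retrievers use: every query token is matched against its best document token, and the per-token maxima are summed. The shapes and function names are illustrative, not LFM2-ColBERT's API.

```python
import torch

def maxsim_score(query_embs: torch.Tensor, doc_embs: torch.Tensor) -> torch.Tensor:
    """Late-interaction (MaxSim) relevance score, ColBERT-style.

    query_embs: (num_query_tokens, dim)   L2-normalized token embeddings
    doc_embs:   (num_doc_tokens, dim)     L2-normalized token embeddings
    Returns a scalar: sum over query tokens of the best-matching doc token similarity.
    """
    sim = query_embs @ doc_embs.T          # (num_query_tokens, num_doc_tokens)
    return sim.max(dim=1).values.sum()     # max over doc tokens, sum over query tokens

# Toy usage with random, normalized embeddings (stand-ins for real encoder output).
dim = 128
q = torch.nn.functional.normalize(torch.randn(8, dim), dim=-1)
docs = [torch.nn.functional.normalize(torch.randn(n, dim), dim=-1) for n in (50, 120, 80)]

scores = torch.stack([maxsim_score(q, d) for d in docs])
print("best document index:", scores.argmax().item())
```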
Taken together, the VL, Audio, and ColBERT variants position LFM2 as a modular system, not a one-off model drop.
An emerging blueprint for hybrid enterprise AI architectures
Across all of its variants, the LFM2 report sketches what tomorrow’s enterprise AI stack is likely to look like: hybrid local-cloud orchestration, where small, fast models running on devices handle time-critical inference, formatting, device-level requests, and routing decisions, while large models in the cloud perform heavyweight reasoning when needed.
Several trends converge here:
Cost control. Handling routine inference locally avoids unpredictable cloud billing.
Latency determinism. Time-to-first-token (TTFT) and decode stability matter in agent workflows; on-device execution eliminates network jitter.
Governance and compliance. Local execution simplifies PII handling, data residency, and auditability.
Resilience. Agentic systems degrade gracefully if the path to the cloud is unavailable.
Enterprises adopting these architectures will likely treat smaller on-device models as the “control plane” of agent workflows, with larger cloud models serving as on-demand accelerators.
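A minimal sketch of that control-plane pattern, using placeholder functions and a deliberately naive routing heuristic, might look like this:

```python
from dataclasses import dataclass

@dataclass
class Task:
    prompt: str
    needs_deep_reasoning: bool = False   # e.g. long-horizon planning, complex synthesis
    latency_budget_ms: int = 500

def run_local_model(prompt: str) -> str:
    """Placeholder for an on-device LFM2-class model served by a local runtime."""
    return f"[local] {prompt[:40]}..."

def run_cloud_model(prompt: str) -> str:
    """Placeholder for a frontier cloud model called over the network."""
    return f"[cloud] {prompt[:40]}..."

def route(task: Task, cloud_available: bool = True) -> str:
    # The on-device model is the default control plane: fast, private, deterministic latency.
    # Escalate to the cloud only for heavyweight reasoning, and only when it is reachable
    # and the latency budget allows a network round trip.
    if task.needs_deep_reasoning and cloud_available and task.latency_budget_ms > 2000:
        return run_cloud_model(task.prompt)
    return run_local_model(task.prompt)

print(route(Task("Extract the invoice fields as JSON", latency_budget_ms=300)))
print(route(Task("Draft a multi-step migration plan", needs_deep_reasoning=True, latency_budget_ms=10000)))
```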
LFM2 is one of the clearest open source foundations for this control layer to date.
Strategic Takeaway: On-device AI is now a design choice, not a compromise
For years, organizations building AI features have accepted that “real AI” requires cloud compute. LFM2 challenges that assumption. The models perform competitively in reasoning, instruction following, multilingual tasks, and retrieval-augmented generation (RAG).
For CIOs and CTOs finalizing 2026 roadmaps, the direct implication is this: small, open, on-device models are now robust enough to carry meaningful chunks of production workloads.
LFM2 will not replace frontier cloud models for frontier-scale reasoning. But it offers something businesses arguably need more: a reproducible, open, and actionable foundation for agentic systems that run anywhere, from phones to industrial endpoints to secure air-gapped facilities.
In the broader landscape of enterprise AI, LFM2 is less a research milestone than a marker of an architectural shift. The future isn’t the cloud or the edge; it’s the two working in concert. And releases like LFM2 give organizations the building blocks to reach that hybrid future by design rather than by accident.