
The latest entrant in the wave of small models for enterprises comes from AI21 Labs, which is betting that bringing models onto devices will free up traffic in data centers.
AI21’s Jamba Reasoning 3B is a “small” open-source model that can handle extended reasoning, code generation and generate responses grounded in source material. Jamba Reasoning 3B handles a context window of more than 250,000 tokens and can run inference on edge devices.
The company said Jamba Reasoning 3B works on devices such as laptops and mobile phones.
AI21 co-CEO Ori Goshen told VentureBeat that the company sees more enterprise use cases for smaller models, mainly because moving most inference onto devices frees up capacity in data centers.
“What we’re seeing now in the industry is an economics problem, where data centers are very expensive, and the revenue from data centers compared to the depreciation rate of all their chips shows the math doesn’t add up,” he said.
He added that in the future, “the industry will be hybrid in the sense that some computation will happen locally on devices and other inference will be shifted to GPUs.”
Tested on a MacBook
Jamba Reasoning 3B combines Mamba architecture with transformers, allowing it to run a 250K-token context window on-device. AI21 said it can deliver 2-4x faster inference, and Goshen said the Mamba architecture contributed significantly to the model’s speed.
Jamba Reasoning 3B’s hybrid architecture also reduces its memory needs, and with them its compute requirements.
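The memory argument can be shown with a back-of-the-envelope calculation. In the sketch below, all layer counts and dimensions are illustrative assumptions, not Jamba’s actual configuration; the point is that a pure transformer’s KV cache grows linearly with context length, while a Mamba-style state-space layer keeps a fixed-size state:

```python
# Rough illustration of why hybrid Mamba/transformer layers cut memory.
# All numbers below are illustrative assumptions, not Jamba's real config.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_val=2):
    """Transformer KV cache: grows linearly with context length."""
    # 2x for keys and values; fp16 = 2 bytes per value.
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_val

def ssm_state_bytes(n_layers, d_model, state_dim, bytes_per_val=2):
    """Mamba-style SSM state: constant size regardless of context length."""
    return n_layers * d_model * state_dim * bytes_per_val

# Hypothetical 3B-class model: 32 layers, with only 4 of them attention.
full_attn = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128,
                           context_len=250_000)
hybrid = (kv_cache_bytes(n_layers=4, n_kv_heads=8, head_dim=128,
                         context_len=250_000)
          + ssm_state_bytes(n_layers=28, d_model=2048, state_dim=16))

print(f"all-attention cache at 250K tokens: {full_attn / 1e9:.1f} GB")
print(f"hybrid cache + state at 250K tokens: {hybrid / 1e9:.2f} GB")
```

Under these toy numbers, the all-attention cache alone would overwhelm a laptop’s memory at 250K tokens, while the hybrid layout stays in single-digit gigabytes, which is the effect a hybrid design trades on.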
AI21 tested the model on a standard MacBook Pro and found it could process 35 tokens per second.
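At that rate, rough response latencies are easy to estimate (the response lengths below are illustrative assumptions, and prompt-processing time is ignored):

```python
# Back-of-the-envelope decode latency at 35 tokens/sec.
# Response sizes are illustrative; prompt-processing time is not counted.
TOKENS_PER_SEC = 35

for label, n_tokens in [("short answer", 100),
                        ("meeting agenda", 350),
                        ("code sketch", 1000)]:
    print(f"{label:>14}: {n_tokens} tokens -> "
          f"{n_tokens / TOKENS_PER_SEC:.1f} s")
```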
Goshen said the model works best for function calling, policy-grounded generation and tool-routing tasks. Simple requests, such as asking for information about the next meeting and asking the model to draft an agenda for it, can be handled on-device, he said, while more complex reasoning can be reserved for GPU clusters.
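That hybrid split could be wired up with a simple rule-based router. The sketch below is a minimal, hypothetical illustration (the heuristics and handler strings are made up, not AI21’s API): short, tool-like requests stay on the local model, and longer open-ended jobs escalate to a GPU backend.

```python
# Minimal sketch of hybrid on-device / GPU-cluster routing.
# Heuristics and handler strings are hypothetical, not AI21's API.

LOCAL_KEYWORDS = ("meeting", "agenda", "calendar", "remind")

def estimate_complexity(prompt: str) -> str:
    """Crude heuristic: short, tool-like requests run locally;
    anything long or open-ended escalates to the GPU cluster."""
    short = len(prompt.split()) < 40
    tool_like = any(k in prompt.lower() for k in LOCAL_KEYWORDS)
    return "local" if short and tool_like else "gpu"

def route(prompt: str) -> str:
    """Dispatch a request to the on-device model or the GPU backend."""
    if estimate_complexity(prompt) == "local":
        return f"[on-device 3B model] handling: {prompt!r}"
    return f"[GPU cluster] handling: {prompt!r}"

print(route("What time is my next meeting? Draft an agenda for it."))
print(route("Analyze this 200-page contract and summarize the liability "
            "clauses across all subsidiaries, flagging policy conflicts."))
```

A production router would more likely use a classifier or the small model’s own confidence signal rather than keyword matching, but the division of labor is the same one Goshen describes.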
Small models in the enterprise
Businesses have been interested in using a mix of small models, some designed specifically for their industry and others that are distilled versions of larger LLMs.
In September, Meta released MobileLLM-R1, a family of reasoning models ranging from 140M to 950M parameters. These models are designed for math, coding and scientific reasoning rather than chat applications, and can run on compute-constrained devices.
Google’s Gemma, one of the first small models to come to market, was designed to run on portable devices such as laptops and mobile phones. Gemma has since been expanded.
Companies like FICO have also begun building their own models. FICO launched its finance-focused small language models, which will only answer questions related to finance.
Goshen said his model’s biggest differentiator is that it is smaller than most models yet can still perform reasoning without sacrificing quality.
Benchmark testing
In benchmark testing, Jamba Reasoning 3B performed strongly against other small models, including Qwen 4B, Meta’s Llama 3.2 3B and Microsoft’s Phi-4-Mini.
It outperformed all of those models on IFBench and Humanity’s Last Exam, though it came in second behind Qwen 4 on MMLU-Pro.
Another advantage of small models like Jamba Reasoning 3B is that they are significantly faster and give enterprises better privacy options, because inference is not sent to a server somewhere else, Goshen said.
“I do believe there’s a world where you optimize for customers’ needs and experiences, and models that sit on devices will be a huge part of it,” he said.