French AI darling Mistral is keeping the releases coming this summer.
Just days after announcing Mistral Compute, its own domestic AI-optimized cloud service, the well-funded company has issued an update to its 24B-parameter open source model Mistral Small, jumping from the 3.1 release to 3.2-24B Instruct-2506.
The new version builds directly on Mistral Small 3.1, aiming to improve specific behaviors such as instruction following, output stability, and function calling robustness. While overall architectural details remain unchanged, the update introduces targeted refinements that affect both internal evaluations and public benchmarks.
According to Mistral AI, Small 3.2 is better at following precise instructions and reduces the likelihood of infinite or repetitive generations.
Likewise, the function calling template has been upgraded to support more reliable tool-use scenarios, particularly in frameworks like vLLM.
And at the same time, it can run on a setup with a single Nvidia A100/H100 80GB GPU, which opens up options for businesses with tight compute resources and/or budgets.
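To make the tool-use claim concrete, here is a minimal, hypothetical sketch of calling a function through Small 3.2 served on a single 80GB GPU via vLLM's OpenAI-compatible server. The repo ID is an assumption based on Mistral's Hugging Face naming, and get_weather is a placeholder tool invented for this example; neither comes from Mistral's documentation.

```python
# Hedged sketch: tool calling against a locally served Small 3.2.
# Assumes the server was started with something like:
#   vllm serve mistralai/Mistral-Small-3.2-24B-Instruct-2506 \
#       --tokenizer-mode mistral --enable-auto-tool-choice --tool-call-parser mistral
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, defined only for this example
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="mistralai/Mistral-Small-3.2-24B-Instruct-2506",  # assumed repo ID
    messages=[{"role": "user", "content": "What's the weather in Paris right now?"}],
    tools=tools,
)

# If the upgraded template behaves as advertised, the model should return a
# structured tool call rather than describing the call in free text.
print(resp.choices[0].message.tool_calls)
```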
A new version just 3 months after its predecessor
Mistral Small 3.1 was announced in March 2025 as a flagship open release in the 24B parameter range. It offered full multimodal capabilities, multilingual understanding, and long-context processing of up to 128K tokens.
The model was explicitly positioned against proprietary peers like GPT-4o Mini, Claude 3.5 Haiku, and Gemma 3-it, and, according to Mistral, outperformed them on many tasks.
Small 3.1 also emphasized efficient deployment, with claims of running inference at 150 tokens per second and support for on-device use with 32GB of RAM.
That release came with both base and instruct checkpoints, offering flexibility for fine-tuning across domains such as legal, medical, and technical fields.
In contrast, Small 3.2 focuses on surgical improvements to behavior and reliability. It does not aim to introduce new capabilities or architectural changes. Instead, it acts as a maintenance release: cleaning up edge cases in output generation, tightening instruction compliance, and refining how the model handles system prompts.
Small 3.2 vs. Small 3.1: What changes?
Instruction-following benchmarks show small but measurable improvements. Mistral's internal accuracy measure rose from 82.75% in Small 3.1 to 84.78% in Small 3.2.

Similarly, performance on external datasets such as Wildbench v2 and Arena Hard v2 improved significantly: Wildbench rose by nearly 10 percentage points, while Arena Hard more than doubled, jumping from 19.56% to 43.10%.
Internal metrics also suggest reduced output repetition. The rate of infinite generations dropped to 1.29% in Small 3.2, roughly half the rate of Small 3.1. This makes the model more dependable for developers building applications that require consistent, bounded responses.
Performance on text and coding benchmarks paints a more nuanced picture. Small 3.2 showed gains on HumanEval Plus (88.99% to 92.90%), MBPP Pass@5 (74.63% to 78.33%), and SimpleQA. It also modestly improved MMLU Pro and MATH results.

Vision benchmarks remained largely consistent, with only slight fluctuations. ChartQA and DocVQA saw marginal gains, while AI2D and Mathvista declined by less than two percentage points. Average vision performance dipped slightly from 81.39% in Small 3.1 to 81.00% in Small 3.2.

This is consistent with Mistral's stated intent: Small 3.2 is not a model overhaul, but a refinement. As such, most benchmarks fall within expected variance, and some regressions appear to be trade-offs for targeted improvements elsewhere.
However, as AI power user and influencer @chatgpt21 posted on X: "It got worse on MMLU," referring to the Massive Multitask Language Understanding benchmark, a multidisciplinary test covering 57 subjects designed to assess broad LLM performance across domains. Indeed, Small 3.2 scored 80.50%, slightly below Small 3.1's 80.62%.
Open source license will appeal to cost-conscious and customization-focused users
Both Small 3.1 and 3.2 are available under the Apache 2.0 license and can be accessed via the popular AI code-sharing repository Hugging Face (itself a startup based in France and NYC).
Small 3.2 is supported by frameworks such as vLLM and Transformers and requires roughly 55 GB of GPU RAM to run in bf16 or fp16 precision.
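As a rough illustration of what that looks like in practice, the sketch below loads the checkpoint in bf16 with Hugging Face Transformers. The repo ID and the image-text-to-text pipeline task are assumptions based on Mistral's naming and the model's multimodal design, not an official example, so verify both against the model card.

```python
# Hedged sketch: load Small 3.2 in bf16 with Hugging Face Transformers.
import torch
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",  # assumed task; the Small series is multimodal
    model="mistralai/Mistral-Small-3.2-24B-Instruct-2506",  # assumed repo ID
    torch_dtype=torch.bfloat16,  # bf16 keeps memory near the ~55 GB figure
    device_map="auto",           # place weights across available GPU memory
)

messages = [
    {"role": "user",
     "content": [{"type": "text", "text": "Summarize the Apache 2.0 license in one sentence."}]},
]
out = pipe(text=messages, max_new_tokens=128)
print(out[0]["generated_text"][-1]["content"])
```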
For developers looking to build applications or services on top of the model, the model repository provides system prompts and inference examples.
While Mistral Small 3.1 is already integrated into platforms like Google Cloud Vertex AI and is slated for deployment on NVIDIA NIM and Microsoft Azure, Small 3.2 currently appears limited to self-serve access via Hugging Face and direct deployment.
What enterprises should know when considering Mistral Small 3.2 for their use cases
Mistral Small 3.2 may not shift competitive positioning in the open-weight model space, but it represents Mistral AI's commitment to iterative model refinement.
With noticeable improvements in reliability and task handling, especially around instruction precision and tool usage, Small 3.2 offers a cleaner user experience for developers and enterprises building on Mistral's models.
The fact that it comes from a French startup and complies with EU rules and regulations such as the GDPR and the EU AI Act also makes it appealing for enterprises working in that part of the world.
Still, for those seeking the biggest jumps in benchmark performance, Small 3.1 remains a reference point, especially given that in some cases, such as MMLU, Small 3.2 does not outperform its predecessor. Depending on the use case, this makes the update more of a stability-focused option than a pure upgrade.