Researchers find that retraining only small parts of AI models can cut costs and prevent forgetting

by SkillAiNest


Businesses often discover that fine-tuning a large language model (LLM), an effective way of adapting it to a specific purpose with their own data, can cause the model to lose some of its abilities. After fine-tuning, some models forget how to perform tasks they had already learned.

Research from the University of Illinois Urbana-Champaign proposes a new approach to training models that avoids "catastrophic forgetting," in which a model loses some of its prior knowledge. The work focuses on two LLMs that process images: LLaVA and Qwen 2.5-VL.

The approach encourages enterprises to retrain only narrow parts of an LLM rather than the entire model, avoiding a significant increase in compute costs. The team argues that catastrophic forgetting is not true memory loss, but rather a side effect of bias drift in the model's outputs.

"Training a new LMM can cost millions of dollars, take weeks, and emit hundreds of tonnes of CO2, so finding ways to update existing models more efficiently and effectively is a pressing concern," the team wrote in its paper. "Guided by this result, we look for tuning recipes that preserve learning while limiting output shifts."

The researchers focused on a model's internal decision-making component, the multi-layer perceptron (MLP).

Catastrophic forgetting

The researchers first wanted to confirm that catastrophic forgetting actually occurs in these models, and to understand why.

To do this, they created a set of target tasks for the models to complete. The models were then fine-tuned and evaluated to determine whether they exhibited forgetting. But as the process continued, the researchers found that the models recovered some of their abilities.

"We also observed a surprising result: after training on the counting task, the model's performance on the holdout benchmarks would drop significantly, but it would then mostly recover on PathVQA, another specialized task that is not represented in the benchmarks," the researchers said. "Meanwhile, while running the forgetting experiments, we tried tuning only the self-attention projection (SA Proj) layers or only the MLP layers, motivated by the finding that tuning only the LLM was better than tuning the full model. This led to the very surprising result that tuning only the self-attention projection layers learned the target tasks well with apparently no forgetting, even when training on the tasks in sequence."
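As a rough illustration of what tuning only the self-attention projection layers looks like in practice, the sketch below freezes every parameter of a LLaMA/Qwen-style decoder except the attention projections. This is not the authors' code; the model ID is a placeholder and the module names (`self_attn.q_proj`, etc.) are assumptions that vary by architecture.

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("your-base-model")  # hypothetical model id

# Names of the attention projection modules in LLaMA/Qwen-style blocks
# (an assumption; other architectures use different names).
ATTN_PROJ_KEYS = ("self_attn.q_proj", "self_attn.k_proj",
                  "self_attn.v_proj", "self_attn.o_proj")

for name, param in model.named_parameters():
    # Unfreeze a parameter only if it belongs to a self-attention projection.
    param.requires_grad = any(key in name for key in ATTN_PROJ_KEYS)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable: {trainable:,} of {total:,} parameters ({trainable / total:.1%})")

# Only the unfrozen parameters are passed to the optimizer.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-5
)
```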

The researchers said they believe that "forgetting or interference after fine-tuning on a narrow target task appears to be bias in the output distribution caused by the shift in task distribution."

Narrow retraining

This finding proved key to the experiments. The researchers noted that tuning the MLP increases the likelihood of "outputting numeric tokens and a highly correlated drop in held-out task accuracy." What this showed is that the apparent forgetting is temporary, not a permanent loss of knowledge.
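One way to picture this kind of output distribution bias, though not a method taken from the paper, is to compare the next-token distributions of the base and fine-tuned models on held-out prompts; a large divergence suggests the outputs have drifted rather than that knowledge has been erased. The model IDs and prompts below are placeholders.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoints: the original model and its fine-tuned copy.
base = AutoModelForCausalLM.from_pretrained("base-model")
tuned = AutoModelForCausalLM.from_pretrained("fine-tuned-model")
tok = AutoTokenizer.from_pretrained("base-model")

prompts = ["How many apples are on the table?"]  # placeholder held-out prompts

base.eval()
tuned.eval()
with torch.no_grad():
    for text in prompts:
        ids = tok(text, return_tensors="pt")
        p = F.log_softmax(base(**ids).logits[:, -1, :], dim=-1)   # base next-token log-probs
        q = F.log_softmax(tuned(**ids).logits[:, -1, :], dim=-1)  # tuned next-token log-probs
        # KL(tuned || base): larger values mean the fine-tuned model's output
        # distribution has drifted further from the base model's.
        kl = F.kl_div(p, q, log_target=True, reduction="batchmean")
        print(f"{text!r}: KL divergence = {kl.item():.4f}")
```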

The researchers said: "To avoid this output distribution shift, we tune only the MLP up/gate projections while keeping the down projection frozen, and show that this achieves similar learning to full MLP tuning with little forgetting."

This gives a more straightforward and more reproducible way to improve the model.
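A minimal sketch of that recipe, assuming LLaMA/Qwen-style MLP blocks with modules named `gate_proj`, `up_proj`, and `down_proj` (actual names depend on the implementation), might look like this:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("your-base-model")  # hypothetical model id

for name, param in model.named_parameters():
    # Tune only the MLP up/gate projections; everything else, including the
    # MLP down projection, stays frozen.
    param.requires_grad = ("mlp.up_proj" in name) or ("mlp.gate_proj" in name)

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-5
)
# ...run a standard fine-tuning loop; only the unfrozen weights are updated.
```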

Instead of retraining the model wholesale, enterprises can reduce costs by focusing on a narrow segment of it. This also allows for better control over the output.

However, the research focused on only two models, both of which handle vision and language. The researchers noted that, due to limited resources, they were unable to run the experiments on other models.

Even so, their results could extend to other LLMs, including models of different modalities.
