
A new study by Shanghai Jiao Tong University and the SII Generative AI Research Lab (GAIR) shows that large-scale datasets are not required to train large language models (LLMs) for complex, autonomous tasks.
Their framework, LIMI (Less Is More for Intelligent Agency), builds on similar work in other areas of research and shows that “machine autonomy emerges from the strategic curation of high-quality agentic demonstrations, not from an abundance of data.”
In other words, it is data quality, not quantity, that matters.
In experiments, the researchers found that with a small but carefully curated dataset of just 78 examples, they could train an LLM to outperform models trained on thousands of examples, beating them on a key industry benchmark by a wide margin.
This finding could have important implications for enterprise applications where data is scarce or expensive.
The challenge of building agents that work
The researchers describe agency as the emergent capacity of an AI system to operate as an autonomous agent. In other words, these are AI systems that “don't just think, but also work.”
The problem is that current training frameworks assume higher agentic intelligence requires more data, following the classic scaling laws of language modeling. The researchers argue that this mindset leads to increasingly complex training pipelines and steep resource requirements. Moreover, in many domains agentic data is hard to obtain in large quantities and very expensive to collect.
However, research in other domains suggests that you do not necessarily need more data to reach training goals with LLMs.
For example, LIMA, a 2023 paper, showed that a model can be effectively aligned with just 1,000 curated examples. More recently, LIMO showed that complex mathematical reasoning can emerge from only 817 training samples.
With LIMI, the researchers set out to apply the same “less is more” principle to the complex world of AI agents.
How LIMI works
The LIMI framework shows that sophisticated agentic intelligence can emerge from a minimal but strategically curated set of demonstrations of autonomous behavior. The key to the framework is a pipeline for collecting high-quality demonstrations of agentic tasks.
Each demonstration consists of two parts: a query and a trajectory. A query is a natural-language request from a user, such as a software development requirement or a scientific research goal.
The trajectory is the series of steps the AI takes to address the query, including its internal reasoning, its calls to external tools such as a code interpreter, and the observations it receives from the environment. For example, a query might be “build a simple chat application,” and the trajectory would include the agent's internal reasoning and action plan, the code it writes and executes, and the resulting output or errors.
The trajectory can include multiple iterations of planning, execution, and reflection until the agent achieves the desired goal.
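To make that structure concrete, here is a minimal Python sketch of how a single query-and-trajectory demonstration could be represented. The class and field names are illustrative assumptions, not the format of the team's released dataset.

```python
# Illustrative sketch only: field names and structure are assumptions,
# not the actual schema of the LIMI dataset.
from dataclasses import dataclass, field

@dataclass
class Step:
    reasoning: str      # the agent's internal reasoning at this step
    action: str         # e.g. a tool call, such as code to run in an interpreter
    observation: str    # what the environment returned (output, errors, etc.)

@dataclass
class Demonstration:
    query: str                      # natural-language request from the user
    trajectory: list[Step] = field(default_factory=list)

demo = Demonstration(
    query="Build a simple chat application",
    trajectory=[
        Step(
            reasoning="Plan: scaffold a minimal web app with a /chat endpoint.",
            action="write app.py and run it",
            observation="Server started; endpoint returns 200.",
        ),
    ],
)
```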
To build their dataset, the researchers collected 60 queries from real-world scenarios faced by professional developers and researchers. They then expanded this pool by using GPT-5 to synthesize additional queries from GitHub pull requests.
They employed a team of four computer science PhD students to vet the quality of the synthesized queries and select 18 of them, creating a high-quality set of 78 queries focused on software development and research workflows.
To create the trajectories, the same PhD students collaborated with a CLI coding agent powered by GPT-5 to complete the 78 tasks.
They followed an iterative process until each task was successfully completed, capturing the entire interaction to obtain the full arc of realistic human-AI collaboration, including communication and course corrections. For more complex queries, the collected trajectories could grow to more than 152,000 tokens.
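The collection loop can be pictured roughly as in the sketch below. It is hypothetical: the agent call is a stand-in stub, while the real process involved the PhD annotators working with a GPT-5-powered CLI agent.

```python
# Hypothetical sketch of the iterate-until-success collection loop.
# All intermediate steps, including failures and corrections, are kept
# so the demonstration records the full problem-solving arc.
from typing import Callable

def collect_trajectory(
    query: str,
    agent_step: Callable[[str, list[dict]], dict],
    max_steps: int = 50,
) -> list[dict]:
    trajectory: list[dict] = []
    for _ in range(max_steps):
        step = agent_step(query, trajectory)  # reasoning, action, observation
        trajectory.append(step)               # failed attempts are kept, not discarded
        if step.get("task_completed"):        # success as judged by the human annotator
            break
    return trajectory

# Stub agent for illustration only; a real run would call the coding agent.
def fake_agent_step(query: str, history: list[dict]) -> dict:
    return {"reasoning": "plan next step", "action": "run code", "observation": "ok",
            "task_completed": len(history) >= 2}

print(collect_trajectory("Build a simple chat application", fake_agent_step))
```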
The researchers write, “This approach ensures that our models learn not only from successful outcomes but also from the complete problem-solving process, including how to adapt strategies and recover from failures during collaborative execution.”
LIMI in action
To test their framework, the team evaluated models on AgencyBench, a benchmark designed to measure agentic skills, as well as another set of benchmarks covering tool use and coding.
They fine-tuned GLM-4.5, a powerful open-source model, on their 78-sample dataset and compared its performance against several frontier models, including the base GLM-4.5, Kimi-K2-Instruct, and DeepSeek-V3.1. The LIMI-trained model scored an average of 73.5% on AgencyBench, significantly outperforming all baseline models, the best of which (GLM-4.5) scored 45.1%.
This edge extended to the tool-use, coding, and scientific-computing benchmarks, where LIMI also outperformed all baselines.
More importantly, the study shows that the LIMI model, trained on only 78 examples, outperformed models trained on 10,000 samples from other datasets: higher performance with roughly 128 times less data (10,000 / 78 ≈ 128).
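For teams curious what training on such a small set might look like in practice, here is a minimal, hypothetical sketch of supervised fine-tuning with Hugging Face's TRL library. The dataset file, hyperparameters, and the assumption that each record is a single text-rendered demonstration are illustrative, not the authors' recipe; their released training code is the authoritative reference, and a model of GLM-4.5's size would in practice require a large multi-GPU setup.

```python
# Hypothetical sketch of supervised fine-tuning on a tiny set of agentic
# demonstrations using TRL. File name, hyperparameters and the "text"
# field are assumptions for illustration, not the authors' actual setup.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Assumed format: one JSON line per demonstration, with the query and the
# full trajectory rendered into a single "text" field.
dataset = load_dataset("json", data_files="limi_demonstrations.jsonl", split="train")

trainer = SFTTrainer(
    model="zai-org/GLM-4.5",  # base model named in the article; illustrative only
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="limi-sft",
        num_train_epochs=3,              # small datasets typically need few epochs
        per_device_train_batch_size=1,   # long trajectories leave little room per batch
        learning_rate=1e-5,
    ),
)
trainer.train()
```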
The researchers write, “This finding fundamentally reshapes how we develop autonomous AI systems, suggesting that mastering agency requires understanding its essence, not scaling up training data. As industries shift from thinking AI to working AI, LIMI offers a template for the sustainable cultivation of agentic intelligence.”
The researchers have released the code for data synthesis and training, along with the model weights. For enterprises, this approach offers a practical path toward developing highly capable AI agents.
Instead of launching large-scale data-collection projects, organizations can draw on their in-house skills and subject-matter experts to create small, high-quality datasets for bespoke agentic tasks. This lowers the barrier to entry and lets businesses build custom AI agents that can provide a competitive edge on the workflows that matter most to them.