
Researchers at the University of Illinois Urbana-Champaign and Google Cloud AI Research have developed a framework that enables large language model (LLM) agents to organize their experiences into a memory bank, helping them handle complex tasks more effectively over time.
The framework, called ReasoningBank, distills successful and unsuccessful attempts at solving problems into "general reasoning strategies." The agent then draws on this memory to avoid repeating past mistakes and make better decisions as it faces new problems. The researchers show that when combined with test-time scaling techniques, where an agent makes multiple attempts at a task, ReasoningBank significantly improves both the performance and the efficiency of LLM agents.
Their results show that ReasoningBank consistently outperforms classic memory mechanisms on web browsing and software engineering benchmarks, offering a practical path to building more adaptive and reliable AI agents for enterprise applications.
The memory challenge of LLM agents
As LLM agents are deployed in long-running applications, they face a continuous stream of tasks. One of the key limitations of current LLM agents is their failure to learn from this accumulated experience. By approaching each task in isolation, they inevitably repeat past mistakes, discard valuable insights from related problems, and fail to develop the skills that would make them more capable over time.
The solution to this limitation is to give agents some kind of memory. Previous efforts to equip agents with memory have focused on reusing past interactions by organizing information in various forms, from plain text to structured graphs. However, these approaches often fall short. Many use raw interaction logs or store only examples of successful tasks. This means they cannot distill higher-level, transferable reasoning patterns and, crucially, they cannot extract and use the valuable information contained in an agent's failures. As the researchers note in their paper, existing memory designs are often limited to rigid, case-specific records rather than providing general guidance for future decisions.
How ReasoningBank works
ReasoningBank is a memory framework designed to overcome these limitations. Its core idea is to distill useful strategies and lessons from an agent's previous attempts into structured memory items that can be stored and reused.
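As a rough illustration of what such a structured memory item might look like, here is a minimal Python sketch; the field names are assumptions based on the article's description, not the paper's exact schema:

```python
from dataclasses import dataclass

@dataclass
class MemoryItem:
    """One distilled unit of past experience. Field names are illustrative
    assumptions, not ReasoningBank's exact schema."""
    title: str          # short name for the strategy
    description: str    # one-line summary of when the strategy applies
    content: str        # actionable guidance distilled from a past attempt
    from_success: bool  # whether it came from a successful or a failed attempt

# Hypothetical item distilled from the failed headphone search described
# later in this article:
narrow_search = MemoryItem(
    title="Narrow down products with category filters",
    description="Broad product searches return too many irrelevant results",
    content="When a search returns thousands of items, apply category "
            "filters before scanning the results.",
    from_success=False,
)
```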
According to Jun Yan, a research scientist at Google and co-author of the paper, this represents a fundamental shift in how agents operate. "Traditional agents operate statically: every task is handled in isolation," Yan explained. "ReasoningBank turns every task experience, successful or not, into structured, reusable reasoning memory. As a result, the agent doesn't start from scratch with every user. It recalls and applies proven strategies from similar past cases."
The framework distills both successful and unsuccessful experiences into a repertoire of useful strategies and lessons. The agent judges success and failure through an LLM-as-a-judge scheme, eliminating the need for human labeling.
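A minimal sketch of how such an LLM-as-a-judge check might be wired up is shown below; the prompt wording and the `llm` callable are stand-ins of ours, not the paper's implementation:

```python
def judge_success(task: str, trajectory: str, llm) -> bool:
    """Label a finished trajectory as success or failure using an LLM,
    replacing human annotation. `llm` is any callable that maps a prompt
    string to a completion string (a stand-in for a real model API)."""
    prompt = (
        "You are grading an agent's attempt at a task.\n"
        f"Task: {task}\n"
        f"Agent trajectory:\n{trajectory}\n"
        "Did the agent accomplish the task? Answer YES or NO."
    )
    return llm(prompt).strip().upper().startswith("YES")
```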
Yan offers a practical example of this process. An agent tasked with finding Sony headphones might fail because its overly broad search query returns more than 4,000 irrelevant products. "ReasoningBank will first try to figure out why this approach failed," Yan said. "Then it will distill strategies such as 'refine search queries' and 'narrow down products with category filtering.' Those strategies will be extremely useful for successfully completing similar tasks in the future."
This process runs in a closed loop. When an agent faces a new task, it uses embedding-based similarity search to retrieve relevant memories from the bank and guide its actions. These memories are inserted into the agent's system prompt, providing context for its decision-making. Once the task is completed, the framework creates new memory items that distill insights from both successes and failures. This new knowledge is then analyzed, distilled, and merged back into ReasoningBank, allowing the agent to continuously evolve and improve its abilities.
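Putting the pieces together, the closed loop the article describes could be sketched as follows. Everything here (the `embed`, `agent`, and `distill` callables, the cosine ranking, and the prompt format) is our illustrative assumption, not the authors' code:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def run_task(task, bank, agent, embed, judge, distill, top_k=3):
    """One pass of the retrieve -> act -> judge -> distill -> store loop.
    `agent`, `embed`, `judge`, and `distill` stand in for the LLM-backed
    components described in the article."""
    # 1. Retrieve: embedding-based similarity search over stored memories.
    query_vec = embed(task)
    ranked = sorted(bank, key=lambda m: cosine(query_vec, embed(m.content)),
                    reverse=True)
    relevant = ranked[:top_k]

    # 2. Act: inject the retrieved memories into the agent's system prompt.
    system_prompt = "Useful strategies from past tasks:\n" + "\n".join(
        f"- {m.title}: {m.content}" for m in relevant
    )
    trajectory = agent(system_prompt, task)

    # 3. Judge: label the outcome with LLM-as-a-judge (no human labels).
    succeeded = judge(task, trajectory)

    # 4. Distill and store: fold new memory items back into the bank.
    bank.extend(distill(task, trajectory, succeeded))
    return trajectory
```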
Supercharging memory with test-time scaling
The researchers found a powerful synergy between memory and test-time scaling. Classic test-time scaling involves generating multiple independent answers to the same query, but the researchers argue that this vanilla approach "is suboptimal because it does not take advantage of the inherent contrastive signal that arises from redundant exploration of the same problem."
To address this, they propose memory-aware test-time scaling (MaTTS), which integrates scaling with ReasoningBank. MaTTS comes in two forms. In "parallel scaling," the system generates multiple rollouts of the same query, then compares and contrasts them to identify consistent reasoning patterns. In "sequential scaling," the agent iteratively refines its reasoning within a single attempt, with intermediate notes and corrections also serving as valuable memory signals.
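In code, the parallel variant might look something like the sketch below; the `rollout`, `judge`, and `contrast` callables are hypothetical stand-ins for the LLM-backed components, and the pick-a-winner selection at the end is our simplification:

```python
def parallel_matts(task, bank, rollout, judge, contrast, n=5):
    """Memory-aware parallel test-time scaling, as the article describes it:
    generate several attempts at the same task, then compare and contrast
    them to distill consistent reasoning patterns back into memory."""
    # Generate n diverse rollouts of the same query (the seed varies sampling).
    attempts = [rollout(task, bank, seed=i) for i in range(n)]
    labeled = [(a, judge(task, a)) for a in attempts]

    # Contrast successes against failures: strategies that held up across
    # attempts become new memory items stored back in the bank.
    bank.extend(contrast(task, labeled))

    # Return a judged-successful attempt if any; otherwise the first one.
    return next((a for a, ok in labeled if ok), attempts[0])
```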
This creates a virtuous cycle: the memory in ReasoningBank steers the agent toward more promising solutions, while the diverse experiences generated by scaling let the agent create higher-quality memories to store back in the bank.
As the researchers write, "This positive feedback loop establishes memory-driven experience scaling as a new scaling dimension for agents."
ReasoningBank in action
The researchers tested their framework on WebArena (web browsing) and SWE-Bench-Verified (software engineering). They compared ReasoningBank against baselines including memory-free agents and agents that used trajectory-based or workflow-based memory frameworks.
The results show that ReasoningBank consistently outperforms these baselines across LLM backbones. On WebArena, it improved the overall success rate by 8.3% compared with a memory-free agent. It also generalized better on harder, cross-domain tasks, while reducing the number of interaction steps needed to complete tasks. When combined with MaTTS, both parallel and sequential scaling boosted performance further, consistently beating standard test-time scaling.
These performance gains translate directly into lower operational costs. Yan points to a case where a memory-free agent took eight trial-and-error steps to find the correct product filter on a website. "ReasoningBank can avoid this trial-and-error cost by leveraging relevant insights from the bank," he noted. "In this case, we save on operational costs twofold," which also improves the user experience by solving problems faster.
For enterprises, ReasoningBank can help in building cost-effective agents that learn from experience in areas such as complex workflows, software development, customer support, and data analysis. As the paper concludes, "our findings suggest a practical path toward adaptive and lifelong-learning agents."
Yan said their findings point toward a future of composable intelligence. For example, a coding agent could learn distinct skills such as API integration and database management from separate tasks. "Over time, these modular skills … become building blocks the agent can flexibly recombine to solve more complex tasks," he said, suggesting a future where agents can compose their knowledge independently to handle entire workflows with minimal human oversight.