
Agent memory remains a problem that enterprises want to fix, as agents tend to forget instructions or conversations over the course of a long-running task.
Anthropic hopes to solve that problem with its Claude Agent SDK, using a two-part approach that allows an agent to operate across different context windows.
“The main challenge of long-running agents is that they must operate in discrete sessions, and each new session begins with no memory of the previous one,” Anthropic wrote in a blog post. “Because context windows are limited, and because most complex projects cannot be completed in a single window, agents need a way to bridge the gap between coding sessions.”
Anthropic engineers proposed a two-part approach for the Claude Agent SDK: an initializer agent to set up the environment, and a coding agent to make incremental progress each session and leave notes for the next.
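The two-part pattern can be illustrated with a minimal sketch. This is hypothetical harness code, not the real Claude Agent SDK API: an initializer records the goal and shared state on disk, then each bounded session reads the previous session's notes, does one increment of work, and writes notes for the next session.

```python
# Illustrative sketch of the initializer/coding-agent split described above.
# All names (NOTES, initialize, run_session) are hypothetical, not SDK calls.
import json
from pathlib import Path

NOTES = Path("PROGRESS.json")  # the only state that survives across sessions

def initialize(goal: str) -> None:
    """One-time setup: record the goal and seed the task list."""
    NOTES.write_text(json.dumps({"goal": goal, "done": [], "next": ["plan features"]}))

def run_session(do_work) -> None:
    """One bounded session: load notes, make incremental progress, save notes."""
    state = json.loads(NOTES.read_text())
    if not state["next"]:
        return  # nothing left; a real harness would verify and stop here
    task = state["next"].pop(0)
    new_tasks = do_work(state["goal"], task)  # a model call would go here
    state["done"].append(task)
    state["next"].extend(new_tasks)
    NOTES.write_text(json.dumps(state))
```

Each `run_session` call stands in for a fresh context window; the JSON file is the only memory carried between them.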
Agent memory problem
Because agents are built on foundation models, they are limited by context windows, which are constantly expanding but still finite. For long-running agents, this can cause a major problem, leading the agent to forget instructions and behave abnormally while performing a task. Extending agent memory becomes essential for consistent, business-safe performance.
A number of approaches have emerged over the past year, all trying to bridge the gap between context windows and agent memory. LangChain, with its LangMem SDK, Memobase and OpenAI are among the companies offering memory solutions. Research on agentic memory has also recently exploded, with frameworks such as Memp and Google's Nested Learning introducing new alternatives for enhancing memory.
Many existing memory frameworks are open source and designed to adapt to the different large language models (LLMs) powering agents. Anthropic's approach, by contrast, builds on its own Claude Agent SDK.
How it works
Anthropic pointed out that although the Claude Agent SDK has context-management capabilities and “it should be possible for an agent to continue doing useful work for a long time,” this was not enough. The company said in its blog post that a model like Opus 4.5 running on the Claude Agent SDK could fail to produce a production-quality web app if it is only given a high-level prompt, such as “create a clone of claude.ai.”
Anthropic said the failures showed up in two patterns. In the first, the agent tried to do too much, causing the model to run out of context mid-task; the next agent then had to guess what had happened, because no clear instructions were left behind. The second failure occurred later, after some features had already been built: the agent saw that progress had been made and simply declared the task done.
Anthropic researchers broke the solution into two parts: setting up an initial environment that lays the foundation for the features, and prompting each agent to make incremental progress toward the goal while still leaving a clean state at the end of the session.
This is where the two-part design of Anthropic’s Agent SDK approach comes in. The initializer agent configures the environment, including what the agent does and which files are included. The coding agent then directs the model to make incremental progress and to leave structured updates behind.
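The blog post does not specify the exact format of these structured updates, but one plausible shape is a note that records what a session finished, where it stopped, and what the next session should do. The class and field names below are assumptions for illustration only.

```python
# Hypothetical shape of the "structured update" a coding session leaves
# behind; the real format used by Anthropic is not specified in the post.
from dataclasses import dataclass, field

@dataclass
class ProgressNote:
    completed: list = field(default_factory=list)   # features finished this session
    in_progress: str = ""                           # where this session stopped
    next_steps: list = field(default_factory=list)  # instructions for the next agent

    def to_markdown(self) -> str:
        """Render the note as a checklist the next session can read."""
        lines = ["## Progress note"]
        lines += [f"- [x] {item}" for item in self.completed]
        if self.in_progress:
            lines.append(f"- [ ] (in progress) {self.in_progress}")
        lines += [f"- [ ] {item}" for item in self.next_steps]
        return "\n".join(lines)
```

Writing the note in a fixed, machine-readable layout is what lets a fresh context window pick up exactly where the last one left off, rather than guessing from the code.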
“The inspiration for these methods came from what effective software engineers do every day,” Anthropic said.
The researchers said they added testing tools to the coding agent, improving the ability to identify and fix bugs that weren’t obvious from the code alone.
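A testing tool of this kind can be sketched simply: run the project's test suite in a subprocess and hand the result back to the agent as feedback for its next increment. This is hypothetical harness code, not part of the Claude Agent SDK.

```python
# Sketch of a testing tool a coding agent could call between increments:
# run the suite and return (passed, output) so the agent can react to
# bugs that are not obvious from reading the code alone.
import subprocess

def run_tests(cmd=("python", "-m", "pytest", "-q")) -> tuple[bool, str]:
    """Run the test command; report success and combined output."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr
```

On failure, the captured output becomes part of the next session's context, which is how hidden bugs surface without the agent rereading the whole codebase.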
Future research
Anthropic noted that its approach is “one possible set of solutions” for long-running agent use. However, this is just the beginning of what could become a vast research area for many in the AI space.
The company said its experiments in developing long-term memory for agents have not yet shown whether a single general-purpose coding agent or a multi-agent structure works best.
Its demo also focused on full-stack web app development, so further experiments should focus on generalizing the results to other tasks.
“It’s likely that some or all of these lessons could be applied to other kinds of tasks that agents run over long stretches, for example, scientific research or financial modeling,” Anthropic said.