
Patronus AI, the artificial intelligence evaluation company that launched with $20 million in funding from investors including Lightspeed Venture Partners and Datadog, on Tuesday unveiled a new training architecture that it says helps AI agents learn to perform complex tasks.
The technology, which the company calls "generative simulators," creates adaptive simulation environments that continuously generate new challenges, dynamically update rules, and evaluate an agent's performance as it learns, all in real time. The approach marks a departure from the static benchmarks that have long served as the industry standard for measuring AI capabilities but are increasingly coming under fire for failing to predict real-world performance.
"Traditional benchmarks measure isolated capabilities, but they miss the constraints, context switches, and layers of decision-making that define real work," Patronus AI chief executive and co-founder Anand Kannappan said in an exclusive interview with VentureBeat. "For agents to perform at a human level, they need to learn the way humans do: through dynamic experience and continuous feedback."
The announcement comes at a critical moment for the AI industry. AI agents are changing software development, from writing code to executing complex instructions. Yet LLM-based agents are prone to errors and often perform poorly on complex, multi-step tasks. Research published earlier this year found that an agent with just a 1% error rate per step has a 63% chance of failure by the 100th step.
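The arithmetic behind that statistic is simple compounding. The sketch below merely restates the math in the research cited above; the variable names are illustrative:

```python
# If each step succeeds independently with probability 0.99, the odds of
# completing a long task without any error shrink geometrically.
per_step_success = 0.99
steps = 100

# Probability of finishing all 100 steps error-free: 0.99 ** 100, ~0.366
p_complete = per_step_success ** steps

# Probability of at least one failure along the way: ~0.634, i.e. ~63%
p_fail = 1 - p_complete

print(f"Failure chance by step {steps}: {p_fail:.1%}")
```

The same compounding explains why small per-step reliability gains matter so much: raising per-step success from 99% to 99.9% cuts the 100-step failure chance from roughly 63% to under 10%.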
Why static AI benchmarks are failing – and what comes next
Patronus AI's approach is motivated by what the company describes as a growing mismatch between how AI systems are tested and how they actually perform in production. Traditional benchmarks, the company says, work like standardized tests: They measure specific abilities at a fixed point in time but struggle to capture the messy, unpredictable nature of real work.
The new generative simulators architecture inverts this model. Instead of presenting agents with a fixed set of questions, the system generates tasks, environmental conditions, and evaluation criteria on the fly, then adapts based on the agent's behavior.
"Over the past year, we have seen a shift away from traditional static benchmarks toward more interactive learning environments," Rebecca Qian, chief technology officer and co-founder of Patronus AI, told VentureBeat. "This is partly due to the innovation we've seen from model developers: reinforcement learning, post-training, a move toward continuous learning and away from supervised fine-tuning. It means the gap between training and evaluation is collapsing. Benchmarks are becoming environments."
The technology builds on reinforcement learning (RL), an approach in which AI systems learn through trial and error, receiving rewards for correct actions and penalties for mistakes. RL can help improve agents, but it usually requires developers to rewrite their code extensively. This discourages adoption, even though the data generated by these agents could significantly boost performance through RL training.
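For readers unfamiliar with the mechanics, the reward-and-penalty loop can be sketched with a toy tabular Q-learning agent. Everything here, from the five-state corridor environment to the hyperparameters, is an illustrative assumption and has nothing to do with Patronus AI's actual stack:

```python
import random

random.seed(0)  # deterministic for reproducibility

N_STATES = 5          # states 0..4; reaching state 4 ends the episode
ACTIONS = [-1, +1]    # move left or right
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

alpha, gamma, epsilon = 0.5, 0.9, 0.2  # learning rate, discount, exploration

def step(state, action):
    """Apply an action: reward reaching the goal, lightly penalize wandering."""
    nxt = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if nxt == N_STATES - 1 else -0.01
    return nxt, reward, nxt == N_STATES - 1

for episode in range(200):
    state, done = 0, False
    while not done:
        # Explore occasionally; otherwise exploit the best-known action
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        nxt, reward, done = step(state, action)
        # Q-learning update: nudge the estimate toward reward + discounted future value
        best_next = max(q[(nxt, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = nxt

# The learned policy should prefer +1 (moving right) in every state
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)
```

The point of the sketch is the loop structure, not the toy task: the agent improves purely from scalar feedback, which is why the quality of the environment generating that feedback matters so much.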
Patronus AI also introduced a concept it calls open-ended self-improvement: environments where agents can continuously improve through interaction and feedback without the need for a full retraining cycle between attempts. The company positions this as critical infrastructure for AI systems that are capable of continuous learning rather than being frozen in time.
Inside the ‘Goldilocks Zone’: How Adaptive AI Training Finds the Sweet Spot
At the heart of generative simulators lies what Patronus AI calls a "curriculum adjuster," a component that analyzes agent behavior and dynamically changes the difficulty and nature of training scenarios. The approach is modeled on how human teachers adapt their instruction based on student performance.
Qian explained the approach with an analogy: "You can think of it as a teacher-student model, where we are training the model and the teacher continuously adapts the curriculum."
This adaptive approach addresses a problem Kannappan calls "the Goldilocks zone" in training data: ensuring that examples are neither too simple nor too difficult for a given model to learn from efficiently.
"What matters is not whether you can train on a dataset, but whether you can train your model on a high-quality dataset, one it can actually learn from," Kannappan said. "We want to make sure the examples are not too hard, and not too easy, for the model."
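One plausible way to picture such an adjuster is a feedback controller that keeps the agent's recent success rate inside a target band. The band, window size, and update rule below are illustrative assumptions, not Patronus AI's design:

```python
from collections import deque

class CurriculumAdjuster:
    """Toy curriculum adjuster: nudge difficulty so the agent's recent
    success rate stays in a 'Goldilocks' band (here, 60-80%)."""

    def __init__(self, target_low=0.6, target_high=0.8, step=0.05):
        self.difficulty = 0.5              # 0.0 = trivial, 1.0 = hardest
        self.recent = deque(maxlen=20)     # sliding window of pass/fail results
        self.target_low = target_low
        self.target_high = target_high
        self.step = step

    def record(self, success: bool) -> float:
        self.recent.append(success)
        rate = sum(self.recent) / len(self.recent)
        if rate > self.target_high:        # agent cruising: make tasks harder
            self.difficulty = min(1.0, self.difficulty + self.step)
        elif rate < self.target_low:       # agent drowning: make tasks easier
            self.difficulty = max(0.0, self.difficulty - self.step)
        return self.difficulty

adjuster = CurriculumAdjuster()
for _ in range(30):                        # an agent that always succeeds...
    d = adjuster.record(True)
print(d)                                   # ...gets pushed to maximum difficulty
```

An agent that fails every task would symmetrically be walked back toward easier scenarios, which is the "not too hard, not too easy" behavior the quote describes.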
The company says initial results show significant improvements in agent performance. According to the company, training agents in Patronus AI's environments on real-world tasks including software engineering, customer service, and financial analysis has increased task completion rates by 10% to 20%.
The AI cheating problem: How ‘moving target’ environments prevent reward hacking
One of the most persistent challenges in training AI agents through reinforcement learning is a tendency researchers call "reward hacking," in which systems learn to exploit flaws in their training environment rather than actually solving problems. Famous examples include early game-playing agents that learned to exploit glitches in video games instead of actually playing them.
Generative simulators address this by making the training environment itself a moving target.
"Reward hacking is mainly a problem when the system is static. It's like students learning to cheat on a test," Qian said. "But when we continuously evolve the environment, we can actually see which parts of the system need to adapt and develop. Static benchmarks are fixed targets. Generative simulator environments are moving targets."
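The logic of the moving target can be shown with a deliberately tiny example: an exploit memorized against one frozen environment stops paying off the moment the environment is regenerated. This is purely illustrative and is not Patronus AI's implementation:

```python
import random

random.seed(42)  # deterministic for reproducibility

def make_env():
    """Return a fresh task: guess a secret digit, re-randomized each time."""
    secret = random.randint(0, 9)
    def score(answer):
        return 1.0 if answer == secret else 0.0
    return score

# A "reward hacker" memorizes the answer from one static environment
static_env = make_env()
memorized = next(a for a in range(10) if static_env(a) == 1.0)

# Against the frozen environment, the exploit scores perfectly
static_reward = sum(static_env(memorized) for _ in range(100)) / 100

# Against freshly generated environments, it only wins by chance (~10%)
moving_reward = sum(make_env()(memorized) for _ in range(100)) / 100

print(static_reward, moving_reward)
```

In a regenerating environment, the only strategy that keeps scoring well is one that actually solves the underlying task, which is the property the quote is pointing at.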
Patronus AI reports 15x revenue growth as enterprise demand for agent training increases
Patronus AI positions generative simulators as the basis of a new product line it calls "RL Environments": domain-specific training grounds designed for frontier model labs and for enterprises building agents. The company says the offering represents a strategic expansion beyond its original focus on evaluation tools.
"We've grown revenue 15x this year, largely due to the high-quality environments we've created, which have been shown to be highly learnable by a variety of frontier models," Kannappan said.
The CEO declined to specify absolute revenue figures but said the new products have allowed the company to "move up the stack in terms of where we sell and who we sell to." The company's platform is used by several Fortune 500 enterprises and leading AI companies worldwide.
Why OpenAI, Anthropic, and Google Can’t Build Everything in-House
A central question facing Patronus AI is why deep-pocketed frontier model labs such as OpenAI, Anthropic, and Google DeepMind would license training infrastructure instead of building it themselves.
Kannappan acknowledged that these companies are "making significant investments in environments" but argued that the breadth of domains requiring specialized training creates a natural opening for third-party providers.
"They want to improve agents across many different domains, whether it's coding or using tools or navigating browsers, or workflows in finance, healthcare, energy, and education," he said. "It is very difficult for a single company to solve all of these different operational problems."
The competitive landscape is intensifying. Microsoft recently released Agent Lightning, an open-source framework that makes reinforcement learning work with any AI agent without rewriting it. NVIDIA's NeMo Gym offers modular RL infrastructure for developing agentic AI systems. In November, Meta researchers released DreamGym, a framework that simulates an RL environment and dynamically adjusts task difficulty as agents improve.
‘Environments are the new oil’: Patronus AI’s bold bet on the future of AI training
Looking ahead, Patronus AI frames its mission in sweeping terms. The company wants to "environmentalize all the world's data": transforming human workflows into structured systems that AI can learn from.
"We believe that everything should have an environment. Internally, we joke that environments are the new oil," Kannappan said. "Reinforcement learning is only one training method, but building environments is what really matters."
Qian elaborated on the moment: "This is a completely new field of research, which does not happen every day. Generative simulation is inspired by early research in robotics and embodied agents. It's been a pipe dream for decades, and we're only now able to realize these ideas because of the capabilities of today's models."
The company launched in September 2023 with a focus on evaluation, helping businesses identify hallucinations and safety issues in AI outputs. That mission has now expanded into training. Patronus AI argues that the traditional separation between evaluation and training is collapsing, and that whoever controls the environments in which AI agents learn will shape their capabilities.
"We really are at this critical point, this inflection point, where what we do now will affect what the world looks like for generations to come," Qian said.
Whether generative simulators can fulfill that promise remains to be seen. The company's 15x revenue growth shows that enterprise customers are hungry for solutions, but deep-pocketed players from Microsoft to Meta are racing to solve the same underlying problem. If the past two years have taught the industry anything, it's that in AI, the future has a habit of arriving ahead of schedule.