
Engineering teams are shipping more AI-generated code than ever before. But when that code reaches production, they hit a wall.
The problem isn’t necessarily the AI-generated code itself. Traditional monitoring tools typically don’t provide the granular, function-level data AI agents need to understand how code actually behaves in complex production environments. Without this context, agents can’t detect problems or propose improvements that account for the reality of production.
It’s a challenge that startup HUD is looking to help solve with Wednesday’s release of its runtime code sensor. The company’s autonomous sensor runs alongside production code, automatically detecting how each function behaves and giving developers a vantage point on what actually happens in a deployment.
"Every software team building at scale faces the same fundamental challenge: building high-quality products that work well in the real world," Roe Adler, CEO and founder of HUD, told VentureBeat in an exclusive interview. "In the new era of AI-accelerated development, not knowing how code behaves in production becomes another big part of the challenge."
What software developers are struggling with
The pain points engineering organizations face are remarkably consistent. Moshek Alvin, Group Tech Lead at Peer.com, oversees 130 engineers and describes a familiar frustration with traditional monitoring tools.
"When you get an alert, you’re usually checking an endpoint that has a high error rate or high latency, and you want to drill down to see downstream dependencies," Alvin told VentureBeat. "A lot of times it’s the actual application, and then it’s a black box. You just see that 80% of the request’s latency is downstream."
The next step usually involves manual detective work across multiple tools: check the logs, then try to reconstruct the request from its timestamps to see what it was doing. For novel cases deep in a large codebase, teams often lack precise data.
Daniel Marshall, CTO and co-founder of Drata, saw his engineers spend hours on what he called an "investigation tax." "They were mapping a generic alert to a specific code owner, then digging through the logs to reconstruct the state of the application," Marshall told VentureBeat. "We wanted to eliminate this so our team could fully focus on fixes rather than discovery."
Drata’s architecture adds to the challenge. The company integrates with a long list of external services to deliver automated compliance, which makes investigations complicated when issues arise. Engineers track behavior across threat, compliance, integration, and reporting modules spanning a very large codebase.
Marshall identified three specific issues that led Drata to invest in runtime sensors. The first was the cost of context switching.
"Our data was fragmented, so our engineers had to act as human bridges between disconnected tools," he said.
Another problem, he noted, is alert fatigue. "When you have a complex distributed system, general alert channels become a constant stream of background noise, which our team describes as the ‘ding, ding, ding’ effect that eventually gets ignored," Marshall said.
A third key driver was the need to integrate with the company’s AI strategy.
"An AI agent can write code, but it can’t fix a production bug if it can’t see the runtime variables or root cause," Marshall said.
Why traditional APMs can’t easily solve the problem
Enterprises have long relied on a class of tools and services known as application performance monitoring (APM).
Given the current pace of agentic AI development and modern development workflows, neither Peer.com nor Drata could get the visibility they needed from existing APM tools.
"If I wanted to get this information from Datadog or from Coralogix, I would just have to put in tons of logs or tons of spans, and I would pay a lot of money," Alvin said.
Alvin noted that Peer.com used very low sampling rates due to cost constraints. This meant that they were often missing the precise data needed to debug problems.
Traditional application performance monitoring tools also require foresight, which is a problem because sometimes a developer simply doesn’t know what they don’t know.
"Traditional observability requires you to guess what you will need to debug," Marshall said. "But when debugging a novel case, especially deep within a large, complex codebase, you often lack precise data."
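The "guess in advance" model can be sketched in a few lines of Python. This is an illustrative sketch, not code from any of the tools mentioned: the developer decides up front which functions get a timing span, and anything left unwrapped stays invisible in production.

```python
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("spans")

def traced(fn):
    """Manually opted-in timing span for a single function."""
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            log.info("%s took %.1f ms", fn.__name__,
                     (time.perf_counter() - start) * 1000)
    return wrapper

@traced                      # instrumented: visible in production
def handle_request():
    return parse_payload()

def parse_payload():         # not instrumented: if this is the function
    time.sleep(0.01)         # that regresses, the spans won't show it
    return {"ok": True}

handle_request()
```

The blind spot is structural: coverage is only as good as the developer’s up-front guesses about where problems will appear.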
Drata reviewed a number of solutions in the AI site reliability engineering and automated incident response categories and found them lacking.
"Most of the tools we reviewed were good at handling incident processes, routing tickets, summarizing Slack threads, or building graphs," he said. "But they often stop short of the code itself. They can tell us that ‘service A is down,’ but they can’t tell us specifically why."
Some tools, including error monitors like Sentry, can also catch exceptions. The challenge, according to Adler, is that while awareness of exceptions is useful, these tools don’t connect them to business impact or provide the execution context AI agents need to suggest improvements.
How runtime sensors work differently
Runtime sensors push intelligence to the edge where code is executed. HUD’s sensor runs as an SDK that integrates with a single line of code. It watches each function execute but only sends lightweight aggregate data unless something goes wrong.
When errors or slowdowns occur, the sensor automatically collects deep forensic data, including HTTP parameters, database queries and responses, and the full execution context. The system establishes performance baselines within a day and can alert on both dramatic slowdowns and the subtler outliers that threshold-based monitoring misses.
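In Python terms, the pattern described above (cheap aggregates on every call, deep capture only on failure) might look roughly like this illustrative sketch. It is not HUD’s actual SDK, just a minimal model of the behavior:

```python
import time
import traceback
from collections import defaultdict

# Every call updates cheap aggregate counters; deep forensic context
# is captured only when a call fails. Illustrative sketch only.
aggregates = defaultdict(lambda: {"calls": 0, "errors": 0, "total_ms": 0.0})
forensics = []  # deep captures, recorded only on error

def sensor(fn):
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        except Exception:
            aggregates[fn.__name__]["errors"] += 1
            forensics.append({
                "function": fn.__name__,
                "args": repr(args),
                "traceback": traceback.format_exc(),
            })
            raise
        finally:
            agg = aggregates[fn.__name__]
            agg["calls"] += 1
            agg["total_ms"] += (time.perf_counter() - start) * 1000
    return wrapper

@sensor
def lookup_user(user_id):
    if user_id < 0:
        raise ValueError("invalid id")
    return {"id": user_id}

lookup_user(42)
try:
    lookup_user(-1)
except ValueError:
    pass

# Both calls counted cheaply; one deep forensic capture for the failure.
```

The key trade-off is that the per-call overhead stays small because the expensive capture path only runs when something goes wrong.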
"Now we just get all this information for all these functions regardless of what level they are at, even for base packages," Alvin said. "Sometimes you can have a problem that is very deep, and we still see it very quickly."
The platform provides data through four channels:
A web application for centralized monitoring and analysis.
An IDE extension for VS Code, JetBrains and Cursor that surfaces production metrics directly where the code is written.
An MCP server that feeds structured data to AI coding agents.
An alerting system that flags problems without manual configuration.
The MCP server integration is critical for AI-assisted development. Peer.com engineers now query production behavior directly within Cursor.
"I can just ask Cursor a question: Hey, why is this endpoint slow?" Alvin said. "Through HUD’s MCP server, I get all the granular metrics, like that the function is 30% slower after this deployment. Then I can also find the root cause."
The same shift applies to incident response. Instead of starting in Datadog and digging through layers, engineers ask an AI agent to diagnose the problem. The agent has immediate access to function-level production data.
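As an illustration of why structured data matters for agents, a function-level record served over MCP might look like the following. The field names and values here are hypothetical, not HUD’s actual schema:

```python
# Hypothetical function-level record an AI agent might receive over MCP.
# All field names and values are illustrative only.
record = {
    "function": "billing.charge_customer",
    "deployed_at": "2026-02-11T09:30:00Z",
    "p95_latency_ms": 412,
    "baseline_p95_latency_ms": 318,
    "error_rate": 0.002,
    "last_error": {
        "type": "TimeoutError",
        "db_query": "SELECT id FROM invoices WHERE customer_id = ?",
        "http_params": {"customer_id": "<redacted>"},
    },
}

# Unlike raw logs, structured fields let an agent apply rules directly:
change_pct = round(
    100 * (record["p95_latency_ms"] - record["baseline_p95_latency_ms"])
    / record["baseline_p95_latency_ms"]
)
if change_pct > 20:
    print(f"{record['function']} is {change_pct}% slower than baseline")
```

Because the regression, the failing query, and the deployment timestamp arrive as fields rather than free text, an agent can reason over them without the log-correlation work a human engineer would otherwise do.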
From voodoo incidents to minutes-long investigations
The shift from theoretical potential to practical impact becomes evident in how engineering teams actually use runtime sensors. What used to take hours or days of detective work is now solved in minutes.
"I’m used to these voodoo cases where there’s a CPU spike and you don’t know where it’s coming from," Alvin said. "A few years ago, I had an incident like that and had to build my own tool to take a CPU profile and memory dump. Now I just have all the function data, and I’ve seen the engineers solve these so fast."
At Drata, the quantitative impact is dramatic. The company built an internal /triage command that runs within its engineers’ AI assistants and helps them identify root causes. Manual triage work has dropped from three hours to less than 10 minutes per day. Mean time to resolution is down about 70%.
The team also produces a daily "heads up" report of quick-win errors. Since the root cause is already captured, developers can fix these issues in minutes. Support engineers now perform forensic diagnostics that previously required a senior developer, increasing ticket throughput without expanding the L2 team.
Where this technology fits
Runtime sensors occupy a distinct niche from traditional APMs, which excel at service-level monitoring but struggle with granular, cost-effective function-level data. They differ from error monitors that catch exceptions without business context.
The technical requirements for supporting AI coding agents differ from those for human-facing tools. Agents need structured, function-level data they can reason over; they can’t skim and correlate raw logs the way humans do. Traditional observability also assumes you can predict what you will need to debug and instrument accordingly. That approach breaks down with AI-generated code, where engineers don’t deeply understand every function.
"I think we’re entering a new era of AI-generated code, and to match that puzzle, a new stack is emerging," Adler said. "I just don’t think the cloud-computing-era observability stack is going to fit neatly into what the future looks like."
What does this mean for businesses?
For organizations already using AI coding assistants like GitHub Copilot or Cursor, runtime intelligence provides a safety layer for production deployments. The technology enables what Peer.com calls "agentic investigation" instead of manual tool hopping.
The broader implications come down to trust. "With AI-accelerated development, we’re getting a lot of AI-generated code, and engineers are starting to not know all of the code," Alvin said.
Runtime sensors bridge this knowledge gap by surfacing production context directly in the IDE where the code is written.
For enterprises looking to scale AI code generation beyond pilots, runtime intelligence solves a fundamental problem. AI agents generate code based on assumptions about the system’s behavior. Production environments are complex and unpredictable. Automatically capturing function-level behavior data from production gives agents the context they need to produce reliable code at scale.
Organizations should assess whether their current observability stack can cost-effectively provide the function-level data their AI agents need. If achieving function-level visibility requires dramatically higher ingestion costs or manual instrumentation, runtime sensors may offer a more sustainable architecture for the AI-accelerated development workflows already emerging across the industry.