AI's Capacity Crunch: The Risk of Delays, Rising Costs, and the Coming Breaking Point of Surge Pricing

by SkillAiNest


The latest big headline in AI isn't model size or multimodality: it's a lack of capacity. Val Bercovici, Chief AI Officer at WEKA, joined VentureBeat CEO Matt Marshall at VentureBeat's latest AI Impact stop in NYC to discuss what it really takes to scale AI amid rising latency, cloud lock-in, and runaway costs.

These forces, Bercovici argued, are driving AI toward its own version of surge pricing. Uber famously introduced surge pricing, bringing real-time market rates to ride-sharing for the first time; now, he said, AI is moving toward the same economic calculus, particularly as the focus shifts to profitability.

"We don’t have the actual market rate today. We have subsidized rates. It’s important to enable a lot of innovation that’s happening, but sooner or later – considering the trillions of dollars of capex we’re talking about right now, and the limited energy opex – the true market rates are going to show. Maybe next year, definitely by 2027," He said. "When they do, it will fundamentally change the industry and drive a deeper, deeper focus on efficiency."

The economics of token explosion

"The first rule is that this is an industry where there is more. More tokens equals more business value," Bercovici said.

But until now, no one has figured out how to make that sustainable. The classic business triad – cost, quality and speed – translates in AI to cost, latency and accuracy (measured largely in output tokens). And accuracy is non-negotiable. That holds not only for consumer interactions with agents like ChatGPT, but also for high-stakes use cases like drug discovery and business workflows in heavily regulated industries such as financial services and healthcare.

"It is non-negotiable," Bercovici said. "You need to have a lot of tokens for high uniqueness accuracy, especially when you add security to the mix, guardrail models, and quality models. Then you’re trading off latency and cost. That’s where you have some flexibility. If you can tolerate high latency, and sometimes you can for consumer use cases. If you can, you can have free tiers and lower cost with more tiers."

However, latency is a major hurdle for AI agents. These agents no longer work in a singular sense, Bercovici noted; they operate in swarms.

In a swarm, groups of agents work in parallel to accomplish a larger goal. An orchestrator agent, typically the smartest model, sits at the center, determining subtasks and key requirements: architecture choices, cloud versus on-prem implementation, performance constraints, and security considerations. The swarm then executes all subtasks, effectively spinning up multiple concurrent sessions in parallel. Finally, an evaluation model decides whether the overall task was completed successfully.
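
The pattern Bercovici describes can be sketched in a few lines of Python. The sketch below is a simplified illustration under assumed interfaces: the orchestrator, worker and evaluator functions are stubs standing in for model calls, not any particular framework's API.

```python
# Simplified sketch of the swarm pattern: an orchestrator decomposes a goal
# into subtasks, worker agents run them in parallel (concurrent sessions),
# and an evaluator decides whether the overall task succeeded.
# The model calls are stubbed; in practice each function would hit an LLM endpoint.

import asyncio

async def orchestrator(goal: str) -> list[str]:
    # The "smartest" model plans subtasks and constraints
    # (architecture, cloud vs. on-prem, performance, security). Stubbed here.
    return [f"{goal}: subtask {i}" for i in range(4)]

async def worker(subtask: str) -> str:
    await asyncio.sleep(0.1)  # stands in for one or more model turns
    return f"result of {subtask}"

async def evaluator(results: list[str]) -> bool:
    # Evaluation model: did the swarm actually complete the goal?
    return all(r.startswith("result of") for r in results)

async def run_swarm(goal: str) -> bool:
    subtasks = await orchestrator(goal)
    results = await asyncio.gather(*(worker(s) for s in subtasks))  # parallel sessions
    return await evaluator(results)

if __name__ == "__main__":
    print(asyncio.run(run_swarm("migrate billing service")))
```

The `asyncio.gather` call is where the "multiple concurrent users in parallel sessions" behavior shows up: every subtask is an independent model conversation running at the same time.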

“These swarms go through what’s called multiple turns, hundreds if not thousands of prompts and responses, until the swarm converges on the answer,” Bercovici said.

“And if you have a certain amount of latency in those thousands of turns, it becomes unsustainable. So latency is really, really important. And that means it’s generally subsidized today, and that’s going to come down over time.”
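
A back-of-the-envelope calculation shows why per-turn latency compounds so quickly in a swarm; the numbers below are hypothetical.

```python
# Hypothetical illustration: per-turn latency compounds across a multi-turn swarm run.
TURNS = 1_000
for per_turn_latency_s in (0.2, 1.0, 3.0):
    total_minutes = TURNS * per_turn_latency_s / 60
    print(f"{per_turn_latency_s:.1f}s/turn over {TURNS} turns ≈ {total_minutes:.0f} minutes")
```

At one second per turn, a thousand-turn run already takes roughly 17 minutes of pure wait time, before any of the work itself.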

Reinforcement learning as a new paradigm

Until around May of this year, agents simply didn't perform, Bercovici explained. Then context windows got big enough, and enough GPUs became available, to support agents that could accomplish advanced tasks like writing reliable software. It is now estimated that in some cases, 90% of software is developed by coding agents. Now that agents have essentially come of age, Bercovici notes, reinforcement learning is the new conversation among data scientists at well-known labs such as OpenAI, Anthropic, and Google's Gemini team, who see it as an important path forward in AI innovation.

"The current AI season is reinforcement learning. “It combines many elements of training and estimation into a unified workflow,” Bercovici said. This is the latest and greatest scaling law of the mythical milestone we are all trying to reach called AGI – Artificial General Intelligence. "What’s interesting to me is that you have to apply all the best practices for training models, in addition to how you train the models, to be able to repeat these thousands of reinforcement learning loops and advance the whole field."

The path to AI profitability

There's no one answer when it comes to building the infrastructure foundation that makes AI profitable, Bercovici said, because it's still an emerging field. There is no cookie-cutter approach. Going all in on-prem may be the right choice for some, especially frontier model builders, while going cloud-native or running in a hybrid environment may be a better path for organizations that want to innovate aggressively and responsively. Regardless of which path they initially choose, organizations will need to adapt their AI infrastructure strategy as their business needs evolve.

"Unit economics mainly matters here," Bercocci said. "We’re definitely in a boom, or even a bubble, you might say, in some cases, since the underlying AI economics are being subsidized. But that doesn’t mean that if tokens become more expensive, you’ll stop using them. Depending on how you use them, you’ll get great grains."

Bercovici concluded that leaders should focus less on individual token prices and more on transaction-level economics, where efficiency and impact become visible.

The key question businesses and AI companies should be asking, Bercovici said, is “What is the real cost for my unit economics?”
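
One way to frame that question is to compute cost and margin per completed transaction rather than per token. All figures in the sketch below are illustrative assumptions, not benchmarks from Bercovici or WEKA.

```python
# Hypothetical unit-economics check: price at the transaction level, not per token.

TOKENS_PER_TRANSACTION = 250_000   # full agent-swarm run for one business task (assumed)
PRICE_PER_MILLION = 10.00          # assumed blended $/1M tokens
VALUE_PER_TRANSACTION = 40.00      # e.g. an hour of analyst work automated (assumed)

cost = TOKENS_PER_TRANSACTION / 1_000_000 * PRICE_PER_MILLION
print(f"cost per transaction: ${cost:.2f}, value: ${VALUE_PER_TRANSACTION:.2f}, "
      f"margin: ${VALUE_PER_TRANSACTION - cost:.2f}")

# Even if per-token prices tripled at true market rates, the transaction can stay profitable:
print(f"margin at 3x token prices: ${VALUE_PER_TRANSACTION - 3 * cost:.2f}")
```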

Seen through this lens, the way forward isn’t about doing less with AI — it’s about doing it better and more efficiently.
