Anthropic’s Claude Opus 4.5 is here: cheaper AI, unlimited chats, and coding skills that beat humans

by SkillAiNest


Anthropic released its most capable artificial intelligence model yet on Monday, claiming state-of-the-art performance on software engineering tasks while cutting prices by nearly two-thirds.

The new model, Claude Opus 4.5, scored higher on the company’s toughest internal engineering assessment than any human job candidate in its history, according to materials reviewed by VentureBeat. The result highlights how rapidly AI systems are advancing and raises questions about how the technology will reshape white-collar occupations.

The Amazon-backed company is pricing Claude Opus 4.5 at $5 per million input tokens and $25 per million output tokens, a dramatic reduction from the $15 and $75 rates for its predecessor, Claude Opus 4.1, released earlier this year. The move makes frontier AI capabilities accessible to a wider segment of developers and enterprises while putting pressure on competitors to match both performance and pricing.
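As a rough illustration of the new rates, the per-token prices above can be turned into a quick cost comparison. The prices come from the article; the workload size is an arbitrary example:

```python
# Per-million-token rates reported for each model, in dollars.
OPUS_4_5 = {"input": 5.00, "output": 25.00}
OPUS_4_1 = {"input": 15.00, "output": 75.00}

def cost(rates, input_tokens, output_tokens):
    """Total dollar cost of a workload at the given per-million-token rates."""
    return (input_tokens / 1e6) * rates["input"] + (output_tokens / 1e6) * rates["output"]

# Example workload: 2M input tokens and 500K output tokens.
new = cost(OPUS_4_5, 2_000_000, 500_000)  # $22.50
old = cost(OPUS_4_1, 2_000_000, 500_000)  # $67.50
print(f"${new:.2f} vs ${old:.2f}: {1 - new / old:.0%} cheaper")
```

For this mix of input and output tokens, the new pricing works out to roughly a two-thirds reduction, matching the headline claim.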

"We want to make sure it really works for people who want to work with these models," In an exclusive interview with VentureBeat, Alex Albert, Anthropic’s head of developer relations, said: "That’s really our focus: How can we make Cloud better at helping you do things that you don’t necessarily want to do in your work?"

The announcement comes as Anthropic races to maintain its position in an increasingly crowded field. OpenAI recently released GPT-5.1 and a specialized coding model called Codex Max, which can operate autonomously for extended periods. Google unveiled Gemini 3 just last week, a launch that has reportedly raised concerns at OpenAI about the search giant’s progress, according to a recent report in The Information.

Developers say that Opus 4.5 demonstrates better judgment on real-world tasks

Anthropic’s internal testing revealed what the company describes as a qualitative leap in Claude Opus 4.5’s reasoning capabilities. The model achieved 80.9% accuracy on SWE-bench Verified, a benchmark that measures real-world software engineering tasks, outperforming Anthropic’s own Sonnet 4.5 (77.2%) and Google’s Gemini 3 Pro (76.2%), according to company data.

But benchmarks tell only part of the story. Albert said employee testers have consistently reported that the model demonstrates better judgment and intuition across a variety of tasks, a shift the company says matters most in real-world contexts.

"The model becomes just one type," Albert said. "It’s just developed such intuitiveness and judgment on so many real-world things that it feels competently like a huge leap from past models."

He pointed to his own workflow as an example. Previously, Albert said, he would ask AI models to gather information but was hesitant to trust their synthesis or prioritization. With Opus 4.5, he is delegating more complete tasks, connecting the model to Slack and internal documents to create cohesive summaries that match his priorities.

Opus 4.5 outscores all human candidates on the company’s toughest engineering test

The model’s performance on Anthropic’s internal engineering assessment marks a significant milestone. The take-home exam, designed for prospective performance engineering candidates, assesses technical ability and judgment under a prescribed two-hour time limit.

Using a technique called parallel test-time compute, which collects multiple attempts from the model and selects the best result, Opus 4.5 scored higher than any human candidate who has taken the test, according to the company. Without the time limit, the model matched the performance of the best human candidate ever when used within Claude Code, Anthropic’s coding environment.
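Parallel test-time compute can be sketched as a simple best-of-N loop: sample several candidate solutions concurrently, grade each one, and keep the best. This is a minimal illustration under assumed interfaces, not Anthropic’s implementation; `generate` and `score` are hypothetical stand-ins for a model call and an automated grader:

```python
import random
from concurrent.futures import ThreadPoolExecutor

def best_of_n(generate, score, prompt, n=8):
    """Sample n candidate solutions in parallel and keep the highest-scoring one."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(lambda _: generate(prompt), range(n)))
    return max(candidates, key=score)

# Toy stand-ins: a real system would call the model and grade each attempt.
def generate(prompt):
    return {"attempt_for": prompt, "tests_passed": random.randint(0, 10)}

def score(candidate):
    return candidate["tests_passed"]

best = best_of_n(generate, score, "fix the failing build")
```

Because only the single best attempt is kept, the technique trades extra compute for a higher expected score, which is why it is reported separately from single-attempt results.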

The company acknowledged that the test does not measure other important professional skills such as interpersonal ability, communication, or the instincts that develop over years of experience. Still, Anthropic concluded that the result "raises questions about how AI will change engineering as a profession."

Albert emphasized the significance of the finding. "I think it’s a sign, maybe, of just how useful these models can really be in work contexts and for our jobs," he said. "Of course, it was an engineering test, and I would say the models are relatively advanced there compared to other fields, but I think it’s a really important signal to pay attention to."

Dramatic efficiency gains cut token usage by 76% on key benchmarks

Beyond raw performance, Anthropic is betting that efficiency will differentiate Claude Opus 4.5 in the market. The company says the model uses dramatically fewer tokens (the units of text that AI systems process) to achieve similar or better results than its predecessors.

At the medium effort level, Opus 4.5 matches Sonnet 4.5’s best SWE-bench Verified score while using 76% fewer tokens, according to Anthropic. At the highest effort level, Opus 4.5 outperforms Sonnet 4.5 by 4.3 percentage points while still using 48% fewer tokens.

To give developers more control, Anthropic introduced an "effort parameter" that lets users adjust how much computational work the model applies to each task, balancing capability against latency and cost.
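In practice, an effort setting like this would be attached to each request. The sketch below is purely illustrative: the `effort` field name, its allowed values, and the model identifier are assumptions for this example, not documented API details:

```python
# Hypothetical request builder; the "effort" field and its values are
# illustrative assumptions, not a confirmed API surface.
def build_request(task, effort="medium"):
    if effort not in ("low", "medium", "high"):
        raise ValueError(f"unknown effort level: {effort}")
    return {
        "model": "claude-opus-4-5",
        "effort": effort,  # higher effort: better results, more tokens and latency
        "messages": [{"role": "user", "content": task}],
    }

# Cheap, fast call for a latency-sensitive task vs. maximum effort for a hard one.
quick = build_request("suggest a variable name", effort="low")
thorough = build_request("refactor this module for thread safety", effort="high")
```

The design choice mirrors the benchmark framing above: the same model can sit at different points on the cost/quality curve depending on the task.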

Enterprise users provided early validation of the efficiency claims. "Opus 4.5 beat Sonnet 4.5 and the competition on our internal benchmarks, using fewer tokens to solve the same problems," Michele Catasta, president of cloud-based coding platform Replit, said in a statement to VentureBeat. "At scale, those efficiency gains compound."

Mario Rodriguez, GitHub’s chief product officer, said initial testing shows that Opus 4.5 "outperforms on internal coding benchmarks while cutting token usage in half, and is particularly well-suited for tasks like code migration and code refactoring."

Early users report AI agents that learn from experience and improve their skills

One of the most striking capabilities demonstrated by early users involves what Anthropic calls "self-improving agents": AI systems that can improve their own performance through iterative learning.

Rakuten, the Japanese e-commerce and internet company, tested Claude Opus 4.5 on automating office tasks. "Our agents were able to improve their skills autonomously – achieving high performance in 4 iterations while other models could not match this standard after 10," said Yusuke Kaji, Rakuten’s general manager of AI for business.

Albert explained that the model is not updating its weights, the fundamental parameters that define an AI system’s behavior, but iteratively improving the tools and approaches it uses to solve problems. "It was iteratively improving a skill for a task, looking at the skill and refining it to achieve better performance so it could accomplish that task," he said.
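The loop Albert describes can be sketched as: run the task with the current skill (for example a prompt or tool script), evaluate the outcome, and rewrite the skill when the result falls short, with no weight updates involved. `run_task`, `evaluate`, and `revise_skill` are hypothetical stand-ins, not Anthropic’s implementation:

```python
def improve_skill(skill, run_task, evaluate, revise_skill, target=9, max_iters=10):
    """Iteratively refine a reusable 'skill' (a prompt or tool script).

    The model's weights never change; only the skill artifact does.
    """
    history = []
    for _ in range(max_iters):
        result = run_task(skill)
        score = evaluate(result)
        history.append(score)
        if score >= target:
            break
        skill = revise_skill(skill, result)  # rewrite instructions/tooling and retry
    return skill, history

# Toy stand-ins: each revision adds a fix that raises the score (out of 10) by 2.
run = lambda skill: len(skill)              # "result" = number of accumulated fixes
evaluate = lambda result: min(10, 3 + 2 * result)
revise = lambda skill, result: skill + ["fix"]

final_skill, scores = improve_skill([], run, evaluate, revise)  # scores: [3, 5, 7, 9]
```

In this toy run the agent hits the target on its fourth attempt, echoing the "high performance in 4 iterations" pattern Rakuten reported, though the numbers here are purely illustrative.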

The capability extends beyond coding. Albert said Anthropic has seen significant improvements in creating professional documents, spreadsheets and presentations. "They’re saying it’s the biggest jump they’ve seen between model generations," Albert said. "Even going from Sonnet 4.5 to Opus 4.5 is a bigger leap than between any two models in the past."

Basic Research Labs, a financial modeling firm, reported that "accuracy on our internal evals increased by 20%, efficiency by 15%, and complex tasks that once seemed out of reach became achievable," according to co-founder Nico Christie.

New features target Excel users, Chrome workflows and remove chat length limits

Alongside the model release, Anthropic rolled out a suite of product updates aimed at enterprise users. Claude for Excel became generally available to Max, Team, and Enterprise users, with new support for pivot tables, charts, and file uploads. The Chrome browser extension is now available to more users.

Perhaps most notably, Anthropic introduced "unlimited chats," a feature that eliminates context limitations by automatically summarizing the earlier parts of conversations as they grow longer. "Within Claude, within the product itself, you effectively get this kind of infinite context window because of the compression, plus some of the memory stuff that we’re doing," Albert explained.
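The mechanism, summarizing the oldest turns once a conversation outgrows its budget, can be sketched roughly as follows. The word-count token proxy and the `summarize` function are simplified stand-ins for whatever Anthropic actually uses:

```python
def compress_history(messages, summarize, max_tokens=1000, keep_recent=4):
    """Fold the oldest turns into one summary message once the budget is exceeded."""
    def count(msgs):  # crude token proxy: whitespace-separated words
        return sum(len(m["content"].split()) for m in msgs)
    if count(messages) <= max_tokens or len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = {"role": "system", "content": summarize(old)}
    return [summary] + recent

# Six 300-word turns blow past the 1,000-token budget, so the oldest turns
# are replaced by a single summary message.
messages = [{"role": "user", "content": "word " * 300} for _ in range(6)]
summarize = lambda old: f"summary of {len(old)} earlier messages"
compressed = compress_history(messages, summarize)  # 1 summary + 4 recent turns
```

Applied on every turn, this keeps the visible context bounded while older material survives only in summarized form, which is why the chat feels "unlimited" to the user.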

For developers, Anthropic released "programmatic tool calling," which allows Claude to write and execute code that calls functions directly. Claude Code received an updated "plan mode," and a desktop preview now lets developers run multiple AI agent sessions in parallel.
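The idea behind programmatic tool calling, letting the model emit code that invokes tools directly instead of round-tripping each call through the API, can be sketched as executing generated code against a namespace of exposed functions. This is an illustrative sandbox sketch (note that `exec` is unsafe for untrusted code in production), and the tool names are invented for the example:

```python
def run_generated_code(code, tools):
    """Execute model-written code with only the given tool functions in scope."""
    namespace = {"__builtins__": {}, **tools}  # expose the tools and little else
    exec(code, namespace)                      # sketch only; sandbox properly in production
    return namespace.get("result")

# Hypothetical tools the model's code may call directly, without one API
# round-trip per tool invocation.
tools = {
    "get_price_cents": lambda sku: {"A1": 999, "B2": 450}[sku],
    "sum_": sum,
}

# Code as the model might write it: loop over items and aggregate in one pass.
generated = "result = sum_(get_price_cents(sku) for sku in ['A1', 'B2'])"
total = run_generated_code(generated, tools)  # 1449
```

Batching many tool invocations inside one generated script is what saves tokens and latency compared with emitting a separate structured tool call for each lookup.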

The market heats up as OpenAI and Google race to compete on performance and pricing

Anthropic reached $2 billion in annualized revenue during the first quarter of 2025, more than doubling from $1 billion in the previous period. The number of customers spending more than $100,000 annually has increased eightfold year over year.

The rapid release of Opus 4.5, just weeks after Haiku 4.5 in October and Sonnet 4.5 in September, reflects broader industry dynamics. OpenAI released several GPT-5 variants in 2025, including the specialized Codex Max model in November, which can work autonomously for 24 hours. Google shipped Gemini 3 in mid-November after months of development.

Albert attributes Anthropic’s rapid pace in part to using Claude to accelerate its own development. "We’re seeing a lot of support and acceleration from Claude itself, whether it’s on the actual product-building side or the model research side," he said.

The price cut for Opus 4.5 could put pressure on margins while potentially expanding the addressable market. "I’m expecting a lot of startups to start incorporating it into their products and featuring it prominently," Albert said.

Yet profits remain elusive for leading AI labs as they invest heavily in computing infrastructure and research. The AI market could reach a trillion dollars in revenue within a decade, but no single provider has established a dominant position, even as models reach a threshold where they can automate meaningfully complex knowledge work.

Michael Truell, CEO of AI-powered code editor Cursor, called Opus 4.5 "a significant improvement over previous Claude models within Cursor, with improved pricing and intelligence on difficult coding tasks." Another AI coding startup reported "strong results on our most rigorous assessments and consistent performance through 30-minute independent coding sessions."

For businesses and developers, the competition translates into rapidly improving capabilities at declining costs. But as AI performance on technical tasks approaches, and sometimes exceeds, the level of a human expert, the technology’s impact on professional work becomes less theoretical.

When asked what the engineering test results signaled about AI’s trajectory, Albert was direct: "I think it’s a really important signal to pay attention to."
