
Zoom Video Communications, the company best known for keeping remote workers connected during the pandemic, announced last week that it had achieved the highest score on record on one of the most demanding tests of artificial intelligence. The claim sent waves of wonder, skepticism, and genuine curiosity through the technology industry.
The San Jose-based company said its AI system scored 48.1 percent on Humanity's Last Exam, a benchmark designed by subject matter experts from around the world to challenge even the most advanced AI models. The result displaces Google's Gemini 3 Pro, whose previous record was 45.8 percent.
"Zoom has achieved a new state-of-the-art result on the full set of the challenging Humanity's Last Exam benchmark, scoring 48.1 percent, which represents a substantial 2.3 percent improvement over the previous SOTA results," Zoom's chief technology officer, Xuedong Huang, wrote in a blog post.
The announcement raises a tantalizing question that has consumed AI watchers for days: how did a video conferencing company, one with no public history of training large language models, suddenly vault past Google, OpenAI, and Anthropic on a benchmark designed to measure the frontiers of machine intelligence?
The answer says as much about where AI is headed as it does about Zoom's own technological ambitions. And depending on whom you ask, it is either an ingenious display of practical engineering or a hollow claim that appropriates credit for the work of others.
How Zoom built an AI traffic controller instead of training its own model
Zoom did not train its own large language model. Instead, the company developed what it calls a "federated AI approach": a system that queries multiple existing models from OpenAI, Google, and Anthropic, then uses proprietary software to select, combine, and refine their outputs.
At the heart of this system sits what Zoom calls its "Z-Scorer," a method that evaluates the responses of different models and selects the best one for any given task. The company pairs it with what it describes as federated exploration strategies: an agentic workflow that balances exploratory reasoning with verification across multiple AI systems.
"Our federated approach combines Zoom's own small language model with leading open-source and closed-source models," Huang wrote. The framework "orchestrates diverse models to generate, challenge, and improve reasoning through dialectical collaboration."
Put simply: Zoom built a sophisticated traffic controller for AI, not the AI itself.
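A minimal sketch of that traffic-controller pattern helps make the idea concrete. Everything below is a hypothetical stand-in: the provider functions mimic API calls, and the scoring heuristic is a toy, not Zoom's actual Z-Scorer or any real vendor SDK.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Candidate:
    provider: str
    answer: str
    score: float

# Hypothetical stand-ins for real model API calls (e.g., OpenAI, Google, Anthropic).
def ask_provider_a(prompt: str) -> str:
    return "Paris"

def ask_provider_b(prompt: str) -> str:
    return "Paris, the capital of France"

def ask_provider_c(prompt: str) -> str:
    return "I am not sure"

def score(answer: str) -> float:
    """Toy heuristic: penalize hedging, mildly reward detail.
    A production scorer would use a judge model or a verification step."""
    s = 1.0
    if "not sure" in answer.lower():
        s -= 0.8
    s += 0.1 * min(len(answer.split()), 5)
    return s

def federated_answer(prompt: str, providers: Dict[str, Callable[[str], str]]) -> Candidate:
    """Query every provider, score each response, return the best candidate."""
    candidates = []
    for name, fn in providers.items():
        ans = fn(prompt)
        candidates.append(Candidate(name, ans, score(ans)))
    return max(candidates, key=lambda c: c.score)

best = federated_answer(
    "What is the capital of France?",
    {"a": ask_provider_a, "b": ask_provider_b, "c": ask_provider_c},
)
```

The real engineering difficulty lives in the scoring function: judging which of several plausible answers is best is itself a hard AI problem, which is why Zoom treats its Z-Scorer as proprietary.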
That distinction matters enormously in an industry where bragging rights, and billions in valuation, often hinge on who can claim the most capable model. Major AI labs have spent millions of dollars training frontier systems on vast computing clusters. Zoom's achievement, by contrast, rests on clever integration of those existing systems.
Why AI researchers are divided over what counts as true innovation
The response from the AI community was swift and sharply divided.
Max Rumpf, an AI engineer who says he has trained state-of-the-art language models, posted a pointed critique on social media. "Zoom glued together API calls to Gemini, GPT, Claude et al. and slightly improved on a benchmark that offers no value to its customers," he wrote. "They then claim SOTA."
Rumpf did not dismiss the technical approach itself. Using multiple models for different tasks, he said, is "actually pretty smart and most applications should do this." He pointed to Sierra, an AI customer service company, as an example of the multi-model strategy executed effectively.
His objection was more specific: "They didn't train the model, but only hinted at that fact in the tweet. The unfairness of taking credit for the work of others sits deep with people."
But other observers saw the achievement differently. Hongcheng Zhu, a developer, offered a more measured assessment: "To max out AI, you'll likely need model federation, like Zoom did. An analogy is that every Kaggle competitor knows that you have to combine models to win the competition."
The comparison to Kaggle, the competitive data science platform where combining multiple models is standard practice among winning teams, recasts Zoom's approach as industry best practice rather than sleight of hand. Academic research has long established that ensemble methods routinely outperform individual models.
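A toy illustration shows why ensembles beat their members: if models that are individually 80 percent accurate make their mistakes on different examples, a majority vote can correct every one. The data and "model predictions" below are synthetic, purely to demonstrate the mechanism.

```python
from collections import Counter

# Ground-truth labels for ten synthetic examples.
truth = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]

# Three models, each 80% accurate, but wrong on DIFFERENT examples.
model_a = [1, 0, 1, 1, 0, 1, 0, 0, 0, 1]  # errs on indices 8 and 9
model_b = [1, 0, 0, 1, 0, 1, 0, 1, 1, 0]  # errs on indices 2 and 7
model_c = [0, 1, 1, 1, 0, 1, 0, 0, 1, 0]  # errs on indices 0 and 1

def accuracy(pred, labels):
    """Fraction of predictions matching the ground truth."""
    return sum(p == t for p, t in zip(pred, labels)) / len(labels)

def majority_vote(*models):
    """Per-example majority vote across the model predictions."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*models)]

ensemble = majority_vote(model_a, model_b, model_c)
```

Because no two models err on the same example, every wrong vote is outvoted two-to-one, and the ensemble scores 100 percent while each member scores 80. Real models' errors are correlated, so real gains are smaller, but the direction of the effect is the same.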
Still, the debate exposed a fault line in how the industry defines progress. Ryan Prem, founder of Axoria AI, was dismissive: "Zoom is just wrapping one LLM around another and reporting it. It's just noise." Another commenter captured the sheer unexpectedness of the news: "That video conferencing app Zoom produced a SOTA model that achieved 48% HLE was not on my bingo card."
Perhaps the most pointed criticism concerned priorities. Rumpf argued that Zoom could have directed its resources at its customers' actual problems. "Better retrieval from call transcripts is not 'solved' by SOTA LLMs," he wrote. "I think Zoom users would care more about that than HLE."
A Microsoft veteran is betting his reputation on a different kind of AI
If Zoom's benchmark results came out of nowhere, its chief technology officer didn't.
Xuedong Huang joined Zoom from Microsoft, where he spent decades building the company's AI capabilities. He founded Microsoft's speech technology group in 1993 and led teams that achieved what he described as human parity in speech recognition, machine translation, natural language understanding, and computer vision.
Huang holds a Ph.D. in electrical engineering from the University of Edinburgh. He is an elected member of the National Academy of Engineering and the American Academy of Arts and Sciences, as well as a fellow of both the IEEE and the ACM. His credentials make him one of the most accomplished AI executives in the industry.
His presence at Zoom signals that the company's AI ambitions are serious, even if its methods differ from those of the research labs that dominate the headlines. In a post celebrating the benchmark results, Huang framed the achievement as validation of Zoom's strategy: "We have unlocked strong capabilities in exploration, reasoning and multi-model collaboration, surpassing the performance limitations of any model."
That last clause, "surpassing the performance limitations of any model," may be the most important. Huang isn't claiming that Zoom has built a better model. He is claiming that Zoom has built a better system for using models.
Inside a test designed to stump the world’s smartest machines
The benchmark at the center of the controversy, Humanity's Last Exam, was designed to be exceptionally difficult. Unlike earlier tests that AI systems learned to game through pattern matching, HLE presents problems that require genuine understanding, multi-step reasoning, and synthesis of information across complex domains.
The exam draws on questions from experts around the world, spanning advanced mathematics, philosophy, and specialized scientific knowledge. A score of 48.1 percent may seem unimpressive to anyone accustomed to school grading curves, but in the context of HLE it represents the current ceiling of machine performance.
"The benchmark was developed by subject matter experts worldwide and has become an important metric for measuring AI's progress toward human-level performance on challenging intellectual tasks," Zoom's announcement noted.
The company’s improvement of 2.3 percentage points over Google’s previous best may look modest in isolation. But in competitive benchmarking, where gains often come in fractions of a percent, such a jump commands attention.
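For a sense of scale, the arithmetic behind the reported figures is simple; the scores below are the ones Zoom and Google reported, and the "relative" framing is this article's own.

```python
prev_sota = 45.8   # Gemini 3 Pro's reported HLE score, in percent
zoom_score = 48.1  # Zoom's reported HLE score, in percent

# Absolute gain in percentage points, and gain relative to the old record.
absolute_gain = zoom_score - prev_sota
relative_gain = absolute_gain / prev_sota

print(f"{absolute_gain:.1f} points absolute, {relative_gain:.1%} relative")
```

The 2.3-point jump works out to roughly a 5 percent relative improvement over the previous record, which is why it registered in a field where records usually move by fractions of a point.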
What Zoom’s Vision Reveals About the Future of Enterprise AI
Zoom's approach has implications beyond benchmark leaderboards. The company is signaling a vision for enterprise AI that is fundamentally different from the model-centric strategies pursued by OpenAI, Anthropic, and Google.
Instead of betting everything on building the most capable model, Zoom is positioning itself as the orchestration layer — a company that can integrate the best capabilities from multiple providers and deliver them through products that businesses already use every day.
This strategy addresses a significant uncertainty in the AI market: no one knows which model will be best next month, let alone next year. By building infrastructure that can swap between providers, Zoom avoids vendor lock-in while, in theory, offering users the best available AI for any task.
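The vendor-neutral idea can be sketched as a thin abstraction layer: application code targets one interface, and the backing provider becomes a configuration choice. The provider classes and the `summarize_meeting` helper below are illustrative stand-ins, not real vendor SDKs or Zoom's actual product code.

```python
from typing import Dict, Protocol

class ChatModel(Protocol):
    """The one interface application code is allowed to depend on."""
    def complete(self, prompt: str) -> str: ...

# Hypothetical providers; real ones would wrap vendor API clients.
class ProviderX:
    def complete(self, prompt: str) -> str:
        return f"[X] {prompt[:30]}"

class ProviderY:
    def complete(self, prompt: str) -> str:
        return f"[Y] {prompt[:30]}"

REGISTRY: Dict[str, ChatModel] = {"x": ProviderX(), "y": ProviderY()}

def summarize_meeting(transcript: str, provider: str = "x") -> str:
    """Depends only on ChatModel, so switching vendors is a config change."""
    model = REGISTRY[provider]
    return model.complete(f"Summarize: {transcript}")
```

Because `summarize_meeting` never imports a vendor library directly, replacing a provider that falls behind, or routing different tasks to different providers, requires no change to application code.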
OpenAI's announcement of GPT-5.2 the next day underscored this dynamic. OpenAI's own communications named Zoom as a customer that had evaluated the new model on "their AI workloads and saw measurable benefits across the board." Zoom, in other words, is at once a customer of the frontier labs and a competitor on their benchmarks, using their own technology against them.
This arrangement may prove sustainable. The large model providers have every incentive to sell API access widely, even to companies that aggregate their outputs. The more interesting question is whether Zoom's orchestration capabilities constitute real intellectual property or merely sophisticated prompt engineering that others can copy.
The real test comes when Zoom’s 300 million users start asking questions
Zoom titled the section of its announcement about industry relations "A Collaborative Future," and Huang struck a conciliatory note. "The future of AI is collaborative, not competitive," he wrote. "By combining the best innovations from across the industry with our own research advances, we create solutions that are greater than the sum of their parts."
This framing positions Zoom as a benevolent integrator, bringing together the industry's best work for the benefit of enterprise customers. Critics see something else: a company claiming the prestige of an AI lab without the underlying research to back it up.
The debate will likely be settled by products, not leaderboards. When AI Companion 3.0 reaches Zoom's hundreds of millions of users in the coming months, they will render their verdicts, not on benchmarks they have never heard of, but on whether the meeting summaries are actually accurate, whether the action items make sense, and whether the AI saved them time or wasted it.
In the end, Zoom's most provocative claim may not be that it topped a benchmark. It may be the implicit argument that in the age of AI, the best model isn't the one you build; it's the one you know how to use.