Baidu Unveils Proprietary Ernie 5

by SkillAiNest

Just hours after OpenAI updated its flagship foundation model from GPT-5 to GPT-5.1 to reduce overall token usage and give it a friendlier personality, the Chinese search giant Baidu unveiled its next-generation foundation model, Ernie 5.0, along with a suite of AI product upgrades and a strategic international expansion.

The goal: to position itself as a global contender in the increasingly competitive enterprise AI market.

Announced at the company’s Baidu World 2025 event, Ernie 5.0 is a proprietary, native omni-modal model designed to jointly process and produce content in text, images, audio and video.

Unlike Baidu’s recently released Ernie-4.5-VL-28B-A3B-Thinking, which is open source under the permissive, enterprise-friendly Apache 2.0 license, Ernie 5.0 is a proprietary model. It is available only through Baidu’s Ernie Bot website (I needed to select it manually from the model picker dropdown) and, for enterprise users, through the Qianfan cloud platform’s application programming interface (API).

Along with the model launch, Baidu introduced major updates to its Digital Human platform, no-code tools, and general-purpose AI agents—all aimed at expanding its AI footprint outside of China.

Alongside the general preview model, the company also introduced a separate variant tuned for text-based tasks rather than balanced across modalities.

Baidu emphasized that Ernie 5.0 represents a shift in how intelligence is deployed at scale, with CEO Robin Li saying: “When you internalize AI, it becomes a native capability and transforms intelligence from a cost to a source of productivity.”

Where Ernie 5.0 beats GPT-5 and Gemini 2.5 Pro

Ernie 5.0’s benchmark results suggest that Baidu has achieved parity with the top Western foundation models across a wide range of tasks.

In public benchmark slides shared during the Baidu World 2025 event, the Ernie 5.0 preview matches or exceeds OpenAI’s GPT-5-High and Google’s Gemini 2.5 Pro in multimodal reasoning, document understanding, and image-based QA, while also demonstrating strong language modeling and code execution capabilities.

The company emphasized the model’s ability to handle inputs and outputs jointly across modalities, rather than relying on post-hoc fusion, and framed this as a technological differentiator.

On visual tasks, Ernie 5.0 achieved leading scores on OCRBench, DocVQA, and ChartQA, three benchmarks that test document recognition, document understanding, and structured data reasoning.

Baidu claims the model beat both GPT-5-High and Gemini 2.5 Pro on these document- and chart-based benchmarks, areas it describes as central to enterprise applications such as automated document processing and financial analysis.

In image generation, Ernie 5.0 tied or exceeded Google’s Veo 3 on measures including semantic alignment and image quality, according to an evaluation based on Baidu’s internal benchmarks. Baidu claimed that the model’s multimodal integration allows it to produce and interpret visual content with greater context awareness than models that rely on modality-specific encoders.

For audio and speech tasks, Ernie 5.0 demonstrated competitive results on the MM-AU and TUT2017 audio comprehension benchmarks, as well as on question answering from spoken-language inputs. Its audio performance, while not as heavily emphasized as vision or text, suggests a broad capability footprint aimed at supporting full-spectrum multimodal applications.

In language tasks, the model showed robust results on instruction following, factual question answering, and mathematical reasoning. These are the areas that define the enterprise utility of large language models.

The Preview 1022 variant of Ernie 5.0, designed for textual performance, appeared in early developer access with further language-specific results. Although Baidu does not claim broad superiority in general language reasoning, its internal evaluation shows that Ernie 5.0 Preview 1022 closes the gap with top-tier English-language models and outperforms them in Chinese-language performance.

Although Baidu hasn’t publicly released full benchmark details or raw scores, its performance positioning signals a deliberate effort to present Ernie 5.0 not as a niche multimodal system, but as a flagship model competing with the biggest closed models in general-purpose reasoning.

Where Baidu claims clear superiority is in document understanding, visual chart reasoning, and the integration of multiple modalities into a single, native modeling architecture. These results are pending independent validation, but the breadth of claimed capabilities positions Ernie 5.0 as a serious alternative in the multimodal foundation model landscape.

Enterprise Pricing Strategy

Ernie 5.0 is positioned at the premium end of Baidu’s model pricing structure. The company has released specific pricing for API usage on its Qianfan platform, and it’s cost-competitive with other high-end offerings from Chinese rivals like Alibaba.

| Model | Input cost (per 1K tokens) | Output cost (per 1K tokens) | Source |
|---|---|---|---|
| Ernie 5.0 | $0.00085 (¥0.006) | $0.0034 (¥0.024) | Qianfan |
| Ernie 4.5 Turbo (e.g.) | $0.00011 (¥0.0008) | $0.00045 (¥0.0032) | Qianfan |
| Qwen3 (Coder, e.g.) | $0.00085 (¥0.006) | $0.0034 (¥0.024) | Qianfan |

The contrast in cost between Ernie 5.0 and earlier models such as Ernie 4.5 Turbo underscores Baidu’s strategy of differentiating between high-volume, low-cost models and high-capability models designed for complex tasks and multimodal reasoning.

Compared to US alternatives, it sits in the mid-range on price:

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Source |
|---|---|---|---|
| GPT-5.1 | $1.25 | $10.00 | OpenAI |
| Ernie 5.0 | $0.85 | $3.40 | Qianfan |
| Ernie 4.5 Turbo (e.g.) | $0.11 | $0.45 | Qianfan |
| Claude Opus 4.1 | $15.00 | $75.00 | Anthropic |
| Gemini 2.5 Pro | $1.25 (≤200K) / $2.50 (>200K) | $10.00 (≤200K) / $15.00 (>200K) | Google Vertex AI Pricing |
| Grok 4 (grok-4-0709) | $3.00 | $15.00 | xAI API |
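
The two pricing tables express the same Qianfan rates at different scales (per 1K versus per 1M tokens). As a rough sanity check, the Python sketch below converts between the two and estimates the cost of a single job. The function names and the example workload (1M input tokens, 200K output tokens) are illustrative assumptions rather than anything published by Baidu; only the per-1K rates are taken from the article's figures.

```python
# Per-1K-token USD rates as quoted in this article for Qianfan:
# (input rate, output rate). Actual billing may differ.
PRICES_PER_1K = {
    "Ernie 5.0": (0.00085, 0.0034),
    "Ernie 4.5 Turbo": (0.00011, 0.00045),
}


def per_million(price_per_1k: float) -> float:
    """Convert a per-1K-token price to a per-1M-token price, rounded to cents."""
    return round(price_per_1k * 1000, 2)


def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one job from its input and output token counts."""
    input_rate, output_rate = PRICES_PER_1K[model]
    return input_tokens / 1000 * input_rate + output_tokens / 1000 * output_rate


if __name__ == "__main__":
    for model, (inp, out) in PRICES_PER_1K.items():
        print(f"{model}: ${per_million(inp)} in / ${per_million(out)} out per 1M tokens")

    # Hypothetical workload, chosen only for illustration.
    cost = job_cost("Ernie 5.0", input_tokens=1_000_000, output_tokens=200_000)
    print(f"Ernie 5.0, 1M in + 200K out: ${cost:.2f}")
```

Because output tokens are billed at four times the input rate for both Ernie models in these figures, output-heavy workloads will cost noticeably more than the headline input price suggests.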

Global expansion: products and platforms

In conjunction with the release of the model, Baidu is expanding internationally:

  • GenFlow 3.0, now with 20M+ users, is the company’s largest general-purpose AI agent and features enhanced memory and multimodal task handling.

  • Famou, a self-evolving agent capable of dynamically solving complex problems, is now commercially available by invitation.

  • MeDo, the international version of Baidu’s no-code builder Miaoda, is now live globally at medo.dev.

  • Oreate, a productivity workspace with document, slide, image, video, and podcast support, has reached over 1.2 million users worldwide.

Baidu’s Digital Human platform, already deployed in Brazil, is also part of the global push. According to company data, 83% of livestreamers used Baidu’s digital human tech during this year’s “Double 11” shopping event in China, which contributed to a 91% increase in GMV.

Meanwhile, Baidu’s autonomous ride-hailing service Apollo Go has crossed 17 million rides, operating a driverless fleet in 22 cities and claiming the title of the world’s largest robotaxi network.

The open-source vision-language model gains industry attention

Two days before the flagship Ernie 5.0 event, Baidu also released an open-source multimodal model under the Apache 2.0 license: Ernie-4.5-VL-28B-A3B-Thinking.

As my colleague Michael Nuñez reported at VentureBeat, the model activates only 3 billion parameters per token out of a total of 28 billion, using a mixture-of-experts (MoE) architecture for efficient inference.

Key technological innovations include:

  • “Thinking with images”, which enables dynamic zoom-based visual analysis

  • Support for chart interpretation, document comprehension, visual grounding, and temporal awareness in video

  • Runtime on a single 80 GB GPU, making it accessible to medium-sized organizations

  • Full compatibility with Transformers, vLLM, and Baidu’s FastDeploy toolkits
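
The single-GPU claim can be checked with back-of-the-envelope arithmetic. The Python sketch below assumes 16-bit weights (2 bytes per parameter), which the article does not state, and counts weights only, ignoring activations, the KV cache, and any framework overhead. The helper name is hypothetical.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes.

    bytes_per_param=2 assumes 16-bit weights; this is an assumption,
    not a published detail of the model.
    """
    return num_params * bytes_per_param / 1e9


TOTAL_PARAMS = 28e9   # total parameters, as reported in the article
ACTIVE_PARAMS = 3e9   # parameters activated per token, as reported
GPU_MEMORY_GB = 80    # single-GPU capacity cited in the article

total_gb = weight_memory_gb(TOTAL_PARAMS)
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

if __name__ == "__main__":
    print(f"Weights at 16-bit: about {total_gb:.0f} GB of {GPU_MEMORY_GB} GB")
    print(f"Share of parameters active per token: {active_fraction:.1%}")
```

Under these assumptions, all 28 billion parameters must still be resident in memory even though only about a tenth of them are used per token. The MoE design reduces compute per token, not the weight footprint.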

This release adds pressure on closed-source competitors. With Apache 2.0 licensing, Ernie-4.5-VL-28B-A3B-Thinking becomes a viable foundation model for commercial applications without licensing restrictions.

Community feedback and Baidu’s response

Following the release of Ernie 5.0, developer and AI reviewer Lisan al Gaib (@scaling01) posted a mixed review on X. Although initially impressed with the model’s benchmark performance, they reported a persistent issue where Ernie 5.0 would repeatedly invoke tools, even when explicitly instructed not to, during SVG generation tasks.

“The Ernie 5.0 benchmarks looked crazy until I tested it…unfortunately it’s either RL brain-damaged or they have a serious issue with their chat platform/system prompt,” Lisan wrote.

Within hours, Baidu’s developer-focused support account, @ErnieforDevs, replied:

“Thanks for the feedback! This is a known bug – certain syntax can consistently trigger it. We’re working on a fix. You can try rephrasing or changing the prompt to avoid it for now.”

The quick turnaround reflects Baidu’s growing emphasis on developer communications, especially as it courts international customers through both proprietary and open-source offerings.

Outlook for Baidu and its Ernie foundation LLM family

Baidu’s Ernie 5.0 marks a strategic escalation in the global foundation model race. With performance claims that place it on par with cutting-edge systems from OpenAI and Google, and a mix of premium pricing and open-access alternatives, Baidu is signaling its ambition to become not only a domestic AI leader, but a trusted global infrastructure provider.

At a time when enterprise AI users are increasingly demanding multimodal performance, flexible licensing, and deployment efficiency, Baidu’s two-track approach—premium hosted APIs and open source releases—may broaden its appeal to both the corporate and developer communities.

Whether the company’s performance claims hold up under third-party scrutiny remains to be seen. But in a landscape shaped by rising costs, model complexity, and compute constraints, Ernie 5.0 and its supporting ecosystem give Baidu a competitive position in the next wave of AI deployment.
