Meta's Llama API runs 18x faster than OpenAI: Cerebras partnership delivers 2,600 tokens per second

by SkillAiNest



Meta today announced a partnership with Cerebras Systems to power its new Llama API, offering developers inference speeds up to 18 times faster than traditional GPU-based solutions.

The announcement, made at Meta's inaugural LlamaCon developer conference in Menlo Park, positions the company to compete directly with OpenAI, Anthropic, and Google in the rapidly growing AI inference services market, where developers buy tokens by the billions to power their applications.

During a press briefing, Cerebras chief marketing officer Julie Shin Choi said, "Meta has selected Cerebras to collaborate to deliver the ultra-fast inference that they need to serve developers through their new Llama API. We at Cerebras are really, really excited to announce our first CSP hyperscaler partnership to deliver ultra-fast inference to all developers."

The partnership marks Meta's formal entry into the business of selling AI computation, turning its popular open-source Llama models into a commercial service. While Meta's Llama models have accumulated over one billion downloads, until now the company had not offered first-party cloud infrastructure for developers to build applications with them.

"This is very interesting even without talking about Cerebras specifically," said James Wang, a senior executive at Cerebras. "OpenAI, Anthropic, Google have built an entire new business from scratch, which is the AI inference business. Developers who are building AI apps will buy tokens by the millions, sometimes by the billions. And these are like the new compute instructions that people need in order to build AI applications."

In a benchmark chart, Cerebras is shown processing Llama 4 at 2,648 tokens per second, dramatically outpacing rivals SambaNova (747), Groq (600), and GPU-based services from Google and others, explaining Meta's choice of hardware for its new API. (Credit: Cerebras)

Breaking the speed barrier: how Cerebras supercharges Llama models

What sets Meta's offering apart is the dramatic speed increase delivered by Cerebras's specialized AI chips. The Cerebras system delivers over 2,600 tokens per second for Llama 4 Scout, compared with roughly 130 tokens per second for ChatGPT and about 25 tokens per second for DeepSeek, according to benchmarks from Artificial Analysis.

"If you just compare on an API-to-API basis, Gemini and GPT, they're all great models, but they all run at GPU speeds, which is roughly 100 tokens per second," Wang explained. "And 100 tokens per second is okay for chat, but it's very slow for reasoning. It's very slow for agents. And people are struggling with that today."

This speed advantage enables entirely new categories of applications that were previously impractical, including real-time agents, conversational low-latency voice systems, interactive code generation, and instant multi-step reasoning, all of which require chaining multiple large language model calls.
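To see why raw throughput matters so much for chained calls, consider a rough back-of-the-envelope estimate. The sketch below uses illustrative numbers only (a hypothetical 5-call agent generating 500 tokens per call) and ignores network and prompt-processing overhead, which also contribute to real latency:

```python
# Rough latency estimate for an agent that chains several sequential LLM calls.
# Illustrative only: ignores network round-trips and prompt-processing time.

def pipeline_latency(calls: int, tokens_per_call: int, tokens_per_second: float) -> float:
    """Seconds spent generating tokens across a chain of sequential calls."""
    return calls * tokens_per_call / tokens_per_second

# A hypothetical agent making 5 sequential calls, each generating 500 tokens:
for name, tps in [("GPU-class (~100 tok/s)", 100), ("Cerebras (~2,600 tok/s)", 2600)]:
    print(f"{name}: {pipeline_latency(5, 500, tps):.1f} s")
# GPU-class (~100 tok/s): 25.0 s
# Cerebras (~2,600 tok/s): 1.0 s
```

At GPU-class speeds the user waits nearly half a minute for the chain to finish; at Cerebras-class speeds the same chain completes in about a second, which is the difference between an unusable agent and an interactive one.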

The Llama API represents a significant shift in Meta's AI strategy, transitioning the company from primarily a model provider into a full-service AI infrastructure company. By offering the API, Meta creates a revenue stream from its AI investments while maintaining its commitment to open models.

"Meta is now in the business of selling tokens, and it's great for the American kind of AI ecosystem," Wang noted during a press conference. "They bring a lot to the table."

The API will offer tools for fine-tuning and evaluation, starting with the Llama 3.3 8B model, allowing developers to generate data, train on it, and test the quality of their custom models. Meta emphasizes that it will not use customer data to train its own models, and models built using the Llama API can be transferred to other hosts.
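The article does not document the API surface itself. As a purely hypothetical sketch of what an OpenAI-style chat request against such a service might look like, the endpoint URL, model identifier, and field names below are all assumptions for illustration, not Meta's published Llama API:

```python
import json

# Hypothetical sketch: the URL, model id, and payload fields are assumptions
# for illustration only, not Meta's documented Llama API surface.
API_URL = "https://api.llama.example/v1/chat/completions"  # placeholder URL

def build_chat_request(prompt: str, model: str = "llama-3.3-8b") -> dict:
    """Assemble an OpenAI-style chat payload; real field names may differ."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }

payload = build_chat_request("Summarize Meta's Cerebras partnership.")
print(json.dumps(payload, indent=2))
```

The portability claim in the article (models can be moved to other hosts) is one reason an OpenAI-compatible request shape is a plausible, though unconfirmed, design choice.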

Cerebras will power Meta's new service through its network of data centers located throughout North America, including facilities in Dallas, Oklahoma, Minnesota, Montreal, and California.

"All of our data centers that serve inference are in North America at this time," Choi said. "We will be serving Meta with the full capacity of Cerebras. The workload will be balanced across all of these different data centers."

The business arrangement follows what Choi described as the "classic compute provider to a hyperscaler" model, similar to how Nvidia provides hardware to major cloud providers. "They are reserving blocks of our compute that they can serve their developer population," she said.

Beyond Cerebras, Meta has also announced a partnership with Groq to provide fast inference options, giving developers high-performance alternatives to traditional GPU-based inference.

Meta's entry into the inference API market with superior performance metrics could potentially disrupt the established order dominated by OpenAI, Google, and Anthropic. By combining the popularity of its open-source models with dramatically faster inference capabilities, Meta is positioning itself as a formidable competitor in the commercial AI space.

"Meta is in a unique position with 3 billion users, hyperscale data centers, and a huge developer ecosystem," according to Cerebras's presentation materials. The integration of Cerebras technology "helps Meta leapfrog OpenAI and Google in performance by approximately 20x."

For Cerebras, the partnership represents a major milestone and a validation of its specialized AI hardware approach. "We have been building this wafer-scale engine for years, and we always knew that the technology was first-rate, but ultimately it had to end up as part of someone else's hyperscale cloud. That was the final target from a commercial strategy perspective, and we have finally reached that milestone," Wang said.

The Llama API is currently available as a limited preview, with Meta planning a broader rollout in the coming weeks and months. Developers interested in ultra-fast Llama 4 inference can request early access through the Llama API.

"If you imagine a developer who doesn't know anything about Cerebras, because we're a relatively small company, they can just click two buttons in Meta's standard SDK, generate an API key, select the Cerebras flag, and all of a sudden their tokens are being processed at massive scale," Wang explained. "Having us sit at the back end of Meta's whole developer ecosystem is just tremendous for us."

Meta's choice of specialized silicon signals something deeper: in the next phase of AI, what matters is not only what your model knows, but how quickly it can think. In that future, speed isn't just a feature; it's the whole point.
