Web Bench: 10x Better Benchmark for AI Browser Agents

by SkillAiNest

TL;DR: Web Bench is a new dataset for evaluating web browsing agents. It consists of 5,750 tasks across 452 different websites, and 2,454 of those tasks are open sourced. It builds on the foundations of WebVoyager, which did not represent the internet well because it covered only 15 websites. Anthropic Sonnet 3.7 CUA is the current SOTA, with Skyvern being the best agent for write-heavy tasks. Detailed results here.

I bet you have seen a bunch of shiny demos of web browsing agents, looked at the insane scores on benchmarks, and tested them excitedly … only to realize that they do not work as well as advertised.

The reason is that the previous benchmark (WebVoyager) had only 643 tasks across only 15 websites. Although it was a good starting point, it did not capture how adversarial the internet is towards browser automation, or how difficult tasks that change data on a website can be.

As a result, Skyvern and Halluminate teamed up to create a new benchmark that better quantifies these failures. Our goal was to create a new, consistent measurement system for AI web agents by expanding on the foundations laid by WebVoyager:

  1. Expand the number of websites from 15 → 452 and the number of tasks from 643 → 5,750, to test agent performance on a wider variety of websites

  2. Introduce the concept of read vs. write tasks

    1. Read tasks involve navigating websites and fetching data

    2. Write tasks involve entering data, downloading files, logging in, solving 2FA, etc., and were not well represented in the WebVoyager dataset

  3. Measure the impact of browser infrastructure (e.g. accessing websites, solving captchas, not crashing, etc.)
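To make the read/write split and the infrastructure measurement concrete, here is a minimal sketch of how per-category success rates could be tallied from a list of task outcomes. The `TaskResult` fields, the function names, and the sample records are all hypothetical and made up for illustration; they are not the Web Bench schema, its scoring code, or its results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TaskResult:
    # Hypothetical field names, not the actual Web Bench schema.
    website: str
    category: str                 # "read" or "write"
    succeeded: bool
    infra_failure: bool = False   # e.g. site blocked, captcha unsolved, browser crash


def success_rate_by_category(results):
    """Return {category: fraction of tasks in that category that succeeded}."""
    totals = defaultdict(int)
    wins = defaultdict(int)
    for r in results:
        totals[r.category] += 1
        if r.succeeded:
            wins[r.category] += 1
    return {cat: wins[cat] / totals[cat] for cat in totals}


def infra_failure_share(results):
    """Return the fraction of failed tasks attributed to browser infrastructure."""
    failed = [r for r in results if not r.succeeded]
    if not failed:
        return 0.0
    return sum(r.infra_failure for r in failed) / len(failed)


# Made-up records purely to exercise the functions; not benchmark data.
sample = [
    TaskResult("example.com", "read", True),
    TaskResult("example.com", "read", False, infra_failure=True),
    TaskResult("example.org", "write", False),
    TaskResult("example.org", "write", True),
    TaskResult("example.net", "write", False, infra_failure=True),
]

rates = success_rate_by_category(sample)
print(rates)
print(infra_failure_share(sample))
```

Reporting read and write tasks separately keeps a strong read score from hiding weak write performance, and tracking infrastructure failures on their own separates "the agent reasoned badly" from "the browser never got to the page".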

We ran the benchmark and open sourced 2,454 of the tasks to help the industry move towards a new standard, and the results surprised us:

  1. The best-performing model is Anthropic's CUA model

  2. All models performed very poorly on write-heavy tasks

  3. Browser infrastructure played a bigger role than initially expected in agents' ability to take actions

If you are interested, read the full report here.

What are some cool use cases for browser agents? Reply below and let me know.
