- Hugging Face has launched an AI tool that navigates the web for you
- Open Computer Agent uses a real web browser to complete tasks such as getting directions or booking tickets
- The agent and its open-source demo can see what is on the screen, click buttons, fill out forms, and move step by step through tasks like a human
Hugging Face has introduced its take on the growing number of semi-autonomous AI agents that can carry out tasks online for people. The new and free (if limited) Open Computer Agent lives in your web browser as a personal assistant.
Part of the company’s ongoing “smolagents” push, Open Computer Agent can engage with websites and apps the way you do, controlling an invisible mouse and keyboard to complete tasks. The AI can open a browser, type things into forms, click on buttons, and more. Tell it to find directions, and it will go to Google Maps, enter the origin and destination, and show you the route like a digital chauffeur.
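The observe-then-act cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Hugging Face's implementation: `FakeBrowser`, `Action`, `scripted_policy`, and `run_agent` are hypothetical names, the "screenshot" is a text summary rather than pixels, the place names are made up, and a fixed script stands in for the vision model that would normally choose each step.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Action:
    kind: str          # "type", "click", or "done"
    target: str = ""   # which on-screen element the action applies to
    text: str = ""     # text to enter, for "type" actions


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for a real browser; it only records actions."""
    fields: Dict[str, str] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def screenshot(self) -> str:
        # A real agent would receive pixels; here the state is summarized as text.
        return f"fields={sorted(self.fields.items())}"

    def apply(self, action: Action) -> None:
        if action.kind == "type":
            self.fields[action.target] = action.text
        self.log.append(f"{action.kind}:{action.target}")


# Assumed plan for a "find directions" request, with made-up place names.
PLAN = [
    Action("type", "origin", "Paris"),
    Action("type", "destination", "Lyon"),
    Action("click", "Directions"),
]


def scripted_policy(observation: str, step: int) -> Action:
    """Stand-in for the model: returns the next scripted action, then "done"."""
    return PLAN[step] if step < len(PLAN) else Action("done")


def run_agent(browser: FakeBrowser,
              policy: Callable[[str, int], Action],
              max_steps: int = 10) -> int:
    """Look at the screen, pick one action, apply it, repeat until done."""
    for step in range(max_steps):
        observation = browser.screenshot()
        action = policy(observation, step)
        if action.kind == "done":
            return step  # number of actions actually performed
        browser.apply(action)
    return max_steps


browser = FakeBrowser()
steps_taken = run_agent(browser, scripted_policy)
```

The `max_steps` cap is a design choice worth keeping in any real version: an agent that misreads the screen can otherwise loop indefinitely.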
You can try it directly from the demo. Fair warning: its popularity is causing some delays and errors due to the backlog.
We're launching Computer Use in smolagents! 🥳 -> As vision models become more capable, they become able to power complex agentic workflows. Especially Qwen-VL models, which support built-in grounding, i.e. the ability to locate any element in an image by its coordinates, thus… pic.twitter.com/mi8muwzkis May 6, 2025
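To make "grounding" concrete: once a vision model has located an element by its coordinates, the agent still has to turn that location into a click. The snippet below is a rough sketch under assumptions not stated in the post: the model is assumed to return a bounding box as `(x1, y1, x2, y2)` in the pixel space of a resized screenshot, and `click_point` is a hypothetical helper name.

```python
from typing import Tuple


def click_point(box: Tuple[float, float, float, float],
                model_size: Tuple[int, int],
                screen_size: Tuple[int, int]) -> Tuple[int, int]:
    """Map the center of a box in model-image pixels to real screen pixels."""
    x1, y1, x2, y2 = box
    model_w, model_h = model_size
    screen_w, screen_h = screen_size
    # Take the center of the box, then rescale each axis to the real screen.
    center_x = (x1 + x2) / 2 * screen_w / model_w
    center_y = (y1 + y2) / 2 * screen_h / model_h
    return round(center_x), round(center_y)


# A box found in a 1000x500 screenshot, clicked on a 2000x1000 screen.
point = click_point((100, 50, 200, 90), (1000, 500), (2000, 1000))
```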
Agentic AI
Open Computer Agent is a different take on the idea behind similar tools such as OpenAI's Operator, Browser Use, Proxy 1.0, and Opera’s Browser Operator. As with those tools, Hugging Face's AI agent is meant to be an active partner rather than a passive source of information.
Like Browser Use, Open Computer Agent is open source, meaning anyone can see how it works and build on it, or at least adapt it for niche use cases. The agent is the beginning of something more flexible, not a finished product wrapped in a million legal disclaimers. That also means the demo is exactly that: a demonstration, not a polished package. It can get things wrong, and you may need to step in for logins and CAPTCHA tests.
Booking tickets, checking stores, searching for directions, and clicking through menus are all things that many people will be able to do with the same natural language prompts. It is one thing to ask ChatGPT how to find cheap flights. It is another to go to a travel website, scroll through the listings, and try to click on “Book now.”
It may be clumsy and slow for now, but Open Computer Agent represents an approach to AI that could soon be as widespread as AI image generators are today.