Bright Data, the Israeli web scraping company that defeated both Meta and Elon Musk in federal court, on Wednesday unveiled a comprehensive AI infrastructure suite designed to give artificial intelligence systems broad access to real-time web data, a capability the company says Big Tech platforms are trying to monopolize.
The announcement of Deep Search, Browser.ai and new Model Context Protocol (MCP) servers represents a dramatic expansion for the decade-old company, which has transformed from a specialized web scraping service into what CEO Or Lenchner calls “a unique infrastructure layer for AI companies.” The move comes as artificial intelligence companies increasingly struggle to access the web information needed to power chatbots, autonomous agents and other AI applications.
“The intelligence of today’s LLMs is no longer the limiting factor,” Lenchner said in an exclusive interview with VentureBeat. “We’ve spent the past decade fighting for open access to public web data, and these new offerings bring us into the next chapter of our journey: a future built on truly accessible data and the rise of context-aware agents.”
The launch follows Bright Data’s high-profile legal victories in 2024, when federal judges rejected lawsuits from both Meta and X accusing the company of illegally scraping their platforms. These rulings set an important legal precedent defining what counts as “public data” on the internet: information that can be viewed without logging in and can therefore be legally collected and used.
The court cases revealed that both Meta and X had been Bright Data customers even while suing the company, highlighting the contradictory stance many tech companies take toward web scraping. The rulings have broad implications for the AI industry, which relies heavily on web data to train and run language models.
“It was revealed in court that they were both Bright Data customers, because everyone needs data, especially everyone who is building a model,” Lenchner explained. “We are the only company that has the financial resources and, I would even say, the courage.”
Judge William Alsup, who presided over the X case, wrote that giving social media companies “free rein to decide, on any basis, who can collect and use data” risks “the possible creation of information monopolies that would disserve the public interest.” The ruling established that data viewable without login credentials is public information that can be legally scraped.
Bright Data had previously filed a countersuit against X, accusing the platform of violating antitrust laws by trying to monopolize public data for the benefit of Musk’s AI company, xAI. However, the matter has since been settled. “While the terms are confidential, Bright Data has never wavered from its core belief that public data should be available to the public. In keeping with this belief, we are pleased to report that Bright Data will continue to provide the industry-leading services that the public and our customers have always expected.”
Deep Search and Browser.ai target AI companies struggling with data access
The company’s new products address what Lenchner identifies as the three basic requirements for AI systems: algorithms, computing power and access to data. While Bright Data does not develop algorithms or provide computing resources, it aims to be the definitive solution for the third requirement.
Deep Search functions as a natural language research engine designed to answer complex, multi-layered business questions in real time. Unlike general-purpose search engines or AI chatbots that provide summaries, Deep Search specializes in comprehensive results for queries that begin with “find all.” For example, users can ask for “all shipping companies that passed through the Panama and Suez canals in 2023 and whose Q3 revenue declined by more than 2%.”
The system draws on Bright Data’s massive web archive, which currently contains more than 200 billion HTML pages and grows by 15 billion pages each month. By next year, the archive is expected to exceed 500 billion pages. “These aren’t just random web pages; they are in fact the world, because our 20,000 customers represent billions of internet users,” Lenchner noted.
Browser.ai represents what the company calls “the industry’s first unblockable, AI-native browser.” Designed specifically for autonomous AI agents, the cloud-based service mimics human behavior to access websites without triggering bot detection systems. It supports natural language commands and can carry out complex web interactions such as booking flights or making restaurant reservations.
According to the company, its browser infrastructure already handles more than 150 million web actions daily. “Almost all of them are customers,” Lenchner said of AI agent companies. “Because what we found is that we solve this problem of getting onto a website and performing a web action on it without being blocked.”
The company’s MCP servers, built on the Model Context Protocol, allow developers to create AI systems that can access current information rather than relying entirely on training data.
Patent portfolio and proxy network create a competitive moat against blocking
Bright Data’s competitive advantage lies in what Lenchner describes as its mastery of the methods websites use to block scrapers. The company has filed more than 5,500 patent claims on its technology and operates the world’s largest proxy network, with more than 150 million IP addresses in 195 countries.
“We have a very good view of the internet. For a long time now, we have been mapping the internet, and for a long time we have also been saving large parts of it,” Lenchner explained.
The company’s approach relies on sophisticated techniques that mimic human behavior, using real devices, IP addresses and browser fingerprints rather than simple automated scripts. This makes its traffic extremely difficult for websites to detect and block.
“In practice, the only way to stop us is to put the data behind a login, and then we won’t even try,” Lenchner said. “Sometimes there is a new blocking logic that we won’t solve immediately. It will take our research team anywhere from 12 hours to three days at most, and we will unblock it.”
Annual revenue tops $100 million as AI demand explodes after ChatGPT
While Bright Data is privately held by a private equity firm, Lenchner confirmed that the company’s annual recurring revenue surpassed $100 million a year ago. Since ChatGPT launched in late 2022, the business has experienced explosive growth as AI companies have raced to access training data and real-time information.
“Starting in March 2023, around when GPT-3 exploded across the world, our number one use case as a company became AI, or what we call data for AI,” Lenchner said. “Everything else is growing too, because everyone needs more data, period. But this use case is growing like nothing we have seen before.”
The company serves more than 20,000 businesses, including Fortune 500 companies and major AI labs. Traditional customers include e-commerce platforms monitoring competitor pricing, financial services firms gathering market intelligence, and business research firms.
GDPR compliance and ethical practices set Bright Data apart from rivals
Bright Data has invested heavily in compliance infrastructure to address privacy concerns around data collection. The company follows Europe’s GDPR and California’s CCPA regulations, automatically notifying individuals when their personal information has been collected from public sources and offering them the option to delete it.
“The rules and legislation are clear, because Europe’s GDPR and, at least in California, the CCPA regulations are in play,” Lenchner said. “If we collect your email address, for example, we will automatically send you an email saying, ‘Hey, this is who we are. We have collected your personal information from the public domain. Here is a big button you can click if you want to review it, and you can delete it.’”
The company maintains a large compliance team and extensive documentation, which proved valuable during the court proceedings. “Enterprises especially love us because we have our own ethical stance, which has been examined twice in U.S. courts,” Lenchner said.
Web access wars intensify as tech giants seek data monopolies
The battle over access to web data reflects broader tensions in the AI industry over information control and competitive advantage. As AI systems become more sophisticated, access to current, comprehensive web data is becoming increasingly valuable and contested.
Lenchner predicts the web will become “more closed” over time, with Google retaining privileged access through its own web crawling capabilities while others must rely on alternative services. “A few tech companies will be able to access every website with their agents,” he said. “The rest will need to use our infrastructure or someone else’s infrastructure.”
The company is also seeing new trends, including the use of AI chatbots for marketing purposes and the emergence of new protocols such as MCP that enable AI agents to interact more effectively with web services.
“All these players using data at massive scale, and they are all using us, are moving toward building the brains of robots,” Lenchner said. “It makes sense that you have a chatbot talking to a human, because that’s what the robot will eventually do.”
Robot brains and the agent economy drive the next stage of development
Bright Data’s shift from web scraping service to AI infrastructure provider reflects the rapidly evolving needs of the artificial intelligence industry. As companies rush to deploy AI agents and autonomous systems, access to real-time web data is becoming as important as computing power and algorithmic sophistication.
The legal precedents established by Bright Data’s court victories may prove as important as its technical innovations, potentially shaping how the entire AI industry accesses and uses web information. As major tech platforms increasingly restrict data access while simultaneously developing their own AI systems, independent infrastructure providers such as Bright Data may be essential to maintaining a competitive balance in the AI ecosystem.
“We are an infrastructure company,” Lenchner said. “We are very talented engineers who hardly go anywhere; we just sit at our computers and write code. We’re good at it. We have no intention of doing anything else.”
Deep Search launches in beta on Tuesday for business users, with access for the general public available through a waitlist. Browser.ai and the MCP servers are already available to enterprise clients through Bright Data’s existing platform.