Anthropic CEO Dario Amodei made an urgent push in April for the need to understand how AI models think.
That comes at a crucial time. As Anthropic battles for position in the global AI rankings, it's worth noting what sets it apart from the other top AI labs. Since its founding in 2021, when seven OpenAI employees broke off over concerns about AI safety, Anthropic has built AI models that adhere to a set of human-valued principles, a system it calls Constitutional AI. These principles are meant to ensure that models are "helpful, honest and harmless" and generally act in the best interests of society. At the same time, Anthropic's research arm is diving deep to understand how its models think about the world, and why they produce helpful (and sometimes harmful) answers.
Anthropic's flagship model, Claude 3.7 Sonnet, dominated coding benchmarks when it launched in February, proving that AI models can excel at both performance and safety. And the recent release of Claude 4.0 Opus and Sonnet again put Claude at the top of coding benchmarks. However, in today's fast-moving and hyper-competitive AI market, Anthropic's rivals, like Google's Gemini 2.5 Pro and OpenAI's o3, have impressive showings of their own for coding prowess, while they already outperform Claude at math, creative writing and overall reasoning across many languages.
If Amodei's views are any indication, he is planning for the future and its implications in critical fields like medicine, psychology and law, where model safety and human values are imperative. And it shows: Anthropic is the AI lab best known for focusing strictly on developing "interpretable" AI, meaning models that let us understand, with some degree of certainty, what the model is thinking and how it arrives at a particular conclusion.
Amazon and Google have already invested billions of dollars in Anthropic even as they build their own AI models, so perhaps Anthropic's competitive advantage is still budding. Interpretable models, as Anthropic suggests, could significantly reduce the long-term operational costs associated with debugging, auditing and mitigating risks in complex AI deployments.
Sayash Kapoor, an AI safety researcher, suggests that while interpretability is valuable, it is just one of many tools for managing AI risk. In his view, "interpretability is neither necessary nor sufficient" to ensure models behave safely; it matters most when paired with filters, verifiers and human-centered design. This broader view sees interpretability as part of a larger ecosystem of control strategies, particularly in real-world AI deployments where models are components in broader decision-making systems.
The need for interpretable AI
Until recently, many thought AI was still years away from advancements like those that are now helping Claude, Gemini and ChatGPT boast exceptional market adoption. While these models are already pushing the frontiers of human knowledge, their widespread use owes to just how good they are at solving a wide range of practical problems that require creative problem-solving or detailed analysis. As models are put to work on increasingly critical problems, it is important that they produce accurate answers.
Amodei fears that when an AI responds to a prompt, "we have no idea … why it chooses certain words over others, or why it occasionally makes a mistake despite usually being accurate." Such errors, whether hallucinated misinformation or responses that are not in line with human values, will hold AI models back from reaching their full potential. Indeed, we've seen many examples of AI continuing to struggle with hallucinations and unethical behavior.
For Amodei, the best way to solve these problems is to understand how an AI thinks: "Our inability to understand models' internal mechanisms means that we cannot meaningfully predict such [harmful] behaviors, and therefore struggle to rule them out … If instead it were possible to look inside models, we might be able to systematically block them."
Amodei also sees the opacity of current models as a barrier to deploying AI in "high-stakes financial or safety-critical settings, because we can't fully set the limits on their behavior, and a small number of mistakes could be very harmful." In decision-making that affects humans directly, like medical diagnosis or mortgage assessments, legal regulations require AI to explain its decisions.
Imagine a financial institution using a large language model (LLM) for fraud detection: interpretability could mean being able to explain a denied loan application to a customer, as the law requires. Or a manufacturing firm optimizing its supply chain: understanding why an AI recommends a particular supplier could unlock efficiencies and prevent unforeseen bottlenecks.
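To make the loan example concrete, here is a minimal sketch of decision-level explanation on a deliberately simple, transparent model. It is not Anthropic's interpretability tooling, and the feature names and data are invented for illustration; the point is the kind of per-decision rationale regulators ask for, which is exact for a linear model but remains an open research problem for LLMs:

```python
# Minimal sketch: explaining a single loan decision from a
# logistic-regression model (hypothetical features and data).
import numpy as np
from sklearn.linear_model import LogisticRegression

features = ["income", "debt_ratio", "late_payments", "account_age"]
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))                   # standardized applicant features
y = (X @ np.array([1.5, -2.0, -1.0, 0.5]) + rng.normal(size=500)) > 0

model = LogisticRegression().fit(X, y)

applicant = np.array([[-0.3, 1.8, 2.1, -0.5]])  # one hypothetical applicant
approved = model.predict(applicant)[0]

# For a linear model, each feature's contribution to the log-odds is
# exactly coefficient * value, so the decision decomposes cleanly.
contributions = model.coef_[0] * applicant[0]
print(f"approved={bool(approved)}")
for name, c in sorted(zip(features, contributions), key=lambda t: t[1]):
    print(f"{name:>15}: {c:+.2f} log-odds")
```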
Because of this, Amodei explains, Anthropic is "doubling down on interpretability, and we have a goal of getting to 'interpretability can reliably detect most model problems' by 2027."
To that end, Anthropic recently participated in a $50 million investment in Goodfire, an AI research lab making breakthrough progress on AI "brain scans." Its model inspection platform, Ember, is a model-agnostic tool that identifies learned concepts within models and lets users manipulate them. In a recent demo, the company showed how Ember can recognize individual visual concepts within an image-generation AI and then let users paint these concepts onto a canvas to generate new images that follow the user's design.
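The core move behind such tools, finding a direction in a model's internal activations that corresponds to a concept and then nudging the model along it, can be sketched in a few lines. The toy network, layer choice and concept vector below are all assumptions for illustration; this is not Goodfire's actual API:

```python
# Toy sketch of "identify a concept, then manipulate it" via
# activation steering on a tiny network (all values made up).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

# Pretend interpretability analysis found a direction in the first
# layer's output that corresponds to a human-readable concept.
concept_direction = torch.randn(16)
concept_direction /= concept_direction.norm()

def steer(module, inputs, output, strength=3.0):
    # Push the hidden activation along the concept direction,
    # "turning up" the concept in the model's computation.
    return output + strength * concept_direction

x = torch.randn(1, 8)
handle = model[0].register_forward_hook(steer)
steered = model(x)
handle.remove()
baseline = model(x)
print("output shift from steering:", (steered - baseline).norm().item())
```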
Anthropic's investment in Ember hints at the fact that developing interpretable models is hard enough that Anthropic doesn't have the manpower to achieve interpretability on its own. Building interpretable models requires new toolchains and skilled developers to create them.
Broader context: an AI researcher's perspective
To break down Amodei's perspective and add much-needed context, VentureBeat interviewed Kapoor, an AI safety researcher at Princeton. Kapoor co-authored the book AI Snake Oil, a critical examination of exaggerated claims surrounding the capabilities of leading AI models. He is also a co-author of "AI as Normal Technology," in which he advocates for treating AI as a standard, transformational tool like the internet or electricity, and promotes a realistic perspective on its integration into everyday systems.
Kapoor doesn't dispute that interpretability is valuable. However, he is skeptical of treating it as the central pillar of AI alignment. "It's not a silver bullet," Kapoor told VentureBeat. Many of the most effective safety techniques, such as post-response filtering, don't require opening up the model at all, he said.
He also warns against what researchers call the "fallacy of inscrutability": the idea that if we don't fully understand a system's internals, we can't use or regulate it responsibly. In practice, full transparency isn't how most technologies are evaluated. What matters is whether a system performs reliably under real conditions.
This isn't the first time Amodei has warned about the risks of AI outpacing our understanding. In his October 2024 post "Machines of Loving Grace," he sketched a vision of increasingly capable models that could take meaningful real-world actions (and maybe double our lifespans).
According to Kapoor, there's an important distinction to be made here between a model's capability and its power. Model capabilities are undoubtedly increasing rapidly, and they may soon become intelligent enough to find solutions to many complex problems challenging humanity today. But a model is only as powerful as the interfaces we give it to interact with the real world, including where and how models are deployed.
Amodei has separately argued that the U.S. should maintain a lead in AI development, in part through export controls that limit access to powerful models. The idea is that authoritarian governments might use frontier AI systems irresponsibly, or seize the geopolitical and economic edge that comes with deploying them first.
For Kapoor, "even the biggest proponents of export controls agree that it will give us at most a year or two." He thinks we should treat AI as a "normal technology" like electricity or the internet. While revolutionary, it took decades for both technologies to be fully realized throughout society. Kapoor believes it's the same for AI: the best way to maintain a geopolitical edge is to focus on the "long game" of transforming industries to use AI effectively.
Others criticize Amodei
Kapoor isn't the only one critical of Amodei's stance. Last week at VivaTech in Paris, Jensen Huang, CEO of Nvidia, declared his disagreement with Amodei's views. Huang questioned whether the authority to develop AI should be limited to a few powerful entities like Anthropic. He said: "If you want things to be done safely and responsibly, you do it in the open … Don't do it in a dark room and tell me it's safe."
In response, Anthropic stated: "Dario has never claimed that 'only Anthropic' can build safe and powerful AI. As the public record will show, Dario has advocated for a national transparency standard for AI developers (including Anthropic) so the public and policymakers are aware of the models' capabilities and risks."
It's also worth noting that Anthropic isn't alone in its pursuit of interpretability: Google DeepMind's interpretability team, led by Neel Nanda, has also made serious contributions to interpretability research.
Ultimately, top AI labs and researchers are providing strong evidence that interpretability could be a key differentiator in the competitive AI market. Enterprises that prioritize interpretability early may gain a significant competitive edge by building more trusted, compliant and adaptable AI systems.