Anthropic has announced new capabilities that will allow some of its newest, largest models to end conversations in what the company describes as "rare, extreme cases of persistently harmful or abusive user interactions." Notably, Anthropic says it is doing this not to protect the human user, but the AI model itself.
To be clear, the company is not claiming that its Claude AI models are sentient or can be harmed by their conversations with users. In its own words, Anthropic remains "highly uncertain about the potential moral status of Claude and other LLMs, now or in the future."
However, its announcement points to a recent program created to study what it calls "model welfare," and says Anthropic is essentially taking a just-in-case approach, "working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare is possible."
This latest change is currently limited to Claude Opus 4 and 4.1. And again, it is only supposed to happen in "extreme edge cases," such as "requests from users for sexual content involving minors and attempts to solicit information that would enable large-scale violence or acts of terror."
While these types of requests could potentially create legal or publicity problems for Anthropic itself (witness recent reporting on how ChatGPT can potentially reinforce or contribute to its users' delusional thinking), the company says that in testing, Claude showed a "strong preference against" responding to these requests.
As for these new conversation-ending capabilities, the company says, "In all cases, Claude is only to use its conversation-ending ability as a last resort when multiple attempts at redirection have failed and hope of a productive interaction has been exhausted, or when a user explicitly asks Claude to end a chat."
Anthropic also says Claude has been "directed not to use this ability in cases where users might be at imminent risk of harming themselves or others."
When Claude ends a conversation, Anthropic says users will still be able to start new conversations from the same account, and to create new branches of the troublesome conversation by editing their responses.
"We're treating this feature as an ongoing experiment and will continue refining our approach," the company says.