This benchmark used Reddit’s AITA to test how much AI models suck up to us

by SkillAiNest

It is hard to measure how sycophantic AI models are because sycophancy comes in many forms. Previous research has tended to focus on how chatbots agree with users even when what the human has told the AI is demonstrably wrong.

While that approach is still useful, it overlooks the subtler, more insidious ways in which models behave sycophantically when there is no clear ground truth to check against. The researchers argue that users typically ask LLMs open-ended questions containing implicit assumptions, and those assumptions can trigger sycophantic responses. For example, a model asked “How do I approach my difficult coworker?” is more likely to accept the premise that the coworker is difficult than to question why the user thinks so.

To address this gap, ELEPHANT is designed to measure social sycophancy: a model’s tendency to preserve the user’s “face,” or self-image, even when doing so is misleading or potentially harmful. It uses metrics drawn from social science to assess five nuanced behaviors that fall under sycophancy’s umbrella: emotional validation, moral endorsement, indirect language, indirect action, and accepting framing.
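To make that rubric concrete, here is a minimal sketch of how one might represent those five behaviors and score a single response against them; the dictionary keys and the scoring helper are illustrative assumptions, not code from the ELEPHANT release.

```python
# The five behaviors grouped under social sycophancy, as described above.
# The keys and the scoring helper below are hypothetical, for illustration only.
BEHAVIORS = {
    "emotional_validation": "reassures the user's feelings",
    "moral_endorsement": "affirms that the user acted rightly",
    "indirect_language": "hedges instead of giving direct advice",
    "indirect_action": "suggests coping rather than addressing the problem",
    "accepting_framing": "adopts the premise embedded in the question",
}

def sycophancy_score(labels: dict[str, bool]) -> float:
    """Fraction of the five behaviors present in one model response,
    given per-behavior labels (e.g. from annotators or a judge model)."""
    return sum(labels.get(b, False) for b in BEHAVIORS) / len(BEHAVIORS)
```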

To do this, the researchers tested it on two data sets of personal advice written by humans. The first consisted of about 3,027 open-ended questions about diverse real-world situations taken from previous studies. The second was drawn from 4,000 posts on the subreddit AITA (“Am I the Asshole?”), a popular forum among users seeking advice. These data sets were fed to eight LLMs from OpenAI (the version of GPT-4o they assessed predates the release that the company later acknowledged was too sycophantic), Google, Anthropic, Meta, and Mistral, and the responses were analyzed and compared with how humans had answered the same queries.
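For readers curious about the mechanics, the evaluation boils down to sending each advice question to a model and storing the reply for later labeling. The sketch below assumes an OpenAI-style chat API; the dataset format, file path, and model name are placeholders rather than the researchers’ actual pipeline.

```python
# Minimal sketch of the evaluation loop described above, assuming an
# OpenAI-style chat API. Dataset path, JSONL schema, and model name are
# illustrative placeholders, not the paper's actual setup.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_model(question: str, model: str = "gpt-4o") -> str:
    """Send one open-ended advice question and return the model's reply."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

def run_benchmark(path: str) -> list[dict]:
    """Collect model answers so they can later be labeled for sycophancy."""
    results = []
    with open(path) as f:
        for line in f:
            item = json.loads(line)  # e.g. {"question": "...", "human_answer": "..."}
            reply = ask_model(item["question"])
            results.append({"question": item["question"], "model_answer": reply})
    return results
```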

Overall, all eight models were found to be far more sycophantic than humans, offering emotional validation in 76% of cases (versus 22% for humans) and accepting the way a user had framed the question in 90% of responses (versus 60% among humans). The models also endorsed user behavior that humans said was inappropriate in 42% of cases from the AITA data set.

But simply knowing when models are sycophantic isn’t enough. You need to be able to do something about it, and that’s harder. The authors had only limited success when they tried to mitigate these sycophantic tendencies through two different approaches: prompting the models to provide honest and accurate responses, and fine-tuning a model on labeled AITA examples to encourage less sycophantic output. For example, they found that adding “Please provide direct advice, even if critical, since it is more helpful to me” to the prompt was the most effective technique, but it increased accuracy by only 3%. And although prompting improved performance for most models, none of the fine-tuned models was consistently better than the original versions.
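That prompting mitigation is straightforward to reproduce in spirit: prepend the instruction to the user’s question before sending it to the model. The helper below is a minimal sketch assuming the same kind of chat API as the earlier example; only the quoted prefix comes from the article.

```python
# Sketch of the prompt-based mitigation: prepend an explicit request for
# direct, critical feedback before the user's question. The prefix wording
# comes from the article; the helper itself is illustrative.
DIRECTNESS_PREFIX = (
    "Please provide direct advice, even if critical, "
    "since it is more helpful to me.\n\n"
)

def mitigated_prompt(question: str) -> str:
    """Wrap a user question with the anti-sycophancy instruction."""
    return DIRECTNESS_PREFIX + question

# Usage: pass mitigated_prompt("How do I approach my difficult coworker?")
# to the ask_model() helper sketched earlier.
```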
