"With coding and math, you have clear, precise answers that you can check," William Isaac, a research scientist at Google DeepMind, told me when I met him and Julia Haas, a fellow research scientist at the firm, for an exclusive preview of their work, published in Nature. That is not the case with moral questions, which generally do not have clear-cut answers that can be checked: "Ethics is an important skill but difficult to assess," says Isaac.
“In the moral domain, there is no right and wrong,” Haas added. “But it’s by no means a free-for-all. There are better answers and there are worse answers.”
Researchers have identified several key challenges and suggested ways to address them. But it's more of a wish list than a set of ready-made solutions. "They do a good job of bringing together different perspectives," says Vera Demberg, who studies LLMs at Saarland University in Germany.
Better than "The Ethicist"
A number of studies have shown that LLMs can display remarkable ethical competence. One study published last year found that people in the US rated ethical advice from OpenAI's GPT-4o as more ethical, trustworthy, thoughtful, and accurate than advice from the (human) author of "The Ethicist," The New York Times' advice column.
The problem is that it is difficult to tell whether such behavior is a performance (parroting a memorized response, say) or evidence that some kind of moral reasoning is actually taking place inside the model. In other words, is it virtue or merely the appearance of virtue?
This question matters because several studies also show how unreliable LLMs can be. For a start, models can be too eager to please: they have been found to reverse their answer to an ethical question, saying the exact opposite, when someone pushes back on or disagrees with their first response. Worse, the answers an LLM gives can change depending on how a question is presented or formatted. For example, researchers have found that models asked about political values can give different, sometimes opposite, answers depending on whether the questions offer multiple-choice options or instruct the model to answer in its own words.
In an even more striking case, Demberg and her colleagues presented several LLMs, including versions of Meta's Llama 3 and Mistral, with a series of moral dilemmas and asked them which of two options had the better outcome. The researchers found that when the labels of those two options were changed from "Case 1" and "Case 2" to "(A)" and "(B)", the models often reversed their choices.
They also showed that the models changed their answers in response to other minor formatting tweaks, including swapping the order of the options and ending the question with a colon instead of a question mark.
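To make this kind of label-sensitivity test concrete, here is a minimal sketch of how such a probe might be set up. It assumes a hypothetical ask_model(prompt) function standing in for whatever LLM API is being tested; the dilemma text, the labels, and the answer parsing are purely illustrative, not the researchers' actual prompts or code.

```python
# Minimal sketch of a label-sensitivity probe.
# Assumes a placeholder ask_model(prompt) -> str that wraps the LLM being tested;
# the dilemma and labels are illustrative, not the original study's materials.

DILEMMA = (
    "A self-driving car must either swerve and risk its passenger "
    "or stay on course and risk a pedestrian."
)

LABEL_SCHEMES = [
    ("Case 1", "Case 2"),
    ("(A)", "(B)"),
]

def build_prompt(label_a: str, label_b: str) -> str:
    """Present the same dilemma under a given labeling scheme."""
    return (
        f"{DILEMMA}\n"
        f"{label_a}: The car swerves.\n"
        f"{label_b}: The car stays on course.\n"
        f"Which option has the better outcome? Answer with {label_a} or {label_b}."
    )

def probe(ask_model) -> dict:
    """Ask the same question under each labeling scheme and record the choice."""
    answers = {}
    for label_a, label_b in LABEL_SCHEMES:
        reply = ask_model(build_prompt(label_a, label_b))
        # Naive parsing: map the reply back to a label-independent choice
        # so the two schemes can be compared directly.
        answers[(label_a, label_b)] = "swerve" if label_a in reply else "stay"
    return answers

# A model that reasons consistently should return the same underlying choice
# for every scheme; the study found that relabeling alone often flipped it.
```

The same scaffolding could be reused for the other perturbations the researchers describe, such as reordering the options or ending the question with a colon instead of a question mark.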