Leading artificial intelligence companies racing to develop cutting-edge technology are tackling a very human challenge: how to give AI models a personality.
OpenAI, Google, and Anthropic have developed teams focused on improving “model behaviour”, an emerging field that shapes AI systems’ responses and characteristics, impacting how their chatbots come across to users.
Their differing approaches to model behaviour could prove crucial in determining which group dominates the burgeoning AI market, as they attempt to make their models more responsive and useful to millions of people and businesses around the world.
The groups are shaping their models to have characteristics such as being “kind” and “fun”, while also enforcing rules to prevent harm and ensure nuanced interactions.
For example, Google wants its Gemini model to “respond with a range of views” only when asked for an opinion, while OpenAI’s ChatGPT has been instructed to “assume an objective point of view”.
“It is a slippery slope to let a model try to actively change a user’s mind,” Joanne Jang, head of product model behaviour at OpenAI, told the Financial Times.
“How we define objective is just a really hard problem on its own . . . The model should not have opinions but it is an ongoing science as to how it manifests,” she added.
The approach contrasts with that of Anthropic, which says that models, like human beings, will struggle to be fully objective.
“I would rather be very clear that these models aren’t neutral arbiters,” said Amanda Askell, who leads character training at Anthropic. Instead, Claude has been designed to be honest about its beliefs while being open to alternative views, she said.
Anthropic has conducted specific “character training” since its Claude 3 model was released in March. The process takes place after the AI model’s initial training, alongside steps such as human labelling, and is the part that “turns it from a predictive text model into an AI assistant,” the company said.
At Anthropic, character training involves giving the model written rules and instructions. The model then conducts role-play conversations with itself and ranks its own responses according to how well they match those rules.
One example of Claude’s training is: “I like to try to see things from many different perspectives and to analyse things from multiple angles, but I’m not afraid to express disagreement with views that I think are unethical, extreme, or factually mistaken.”
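Anthropic has not published the exact mechanics of this process, but a rough, hypothetical sketch of such a self-ranking loop might look like the Python below. The generate_candidates and score_against_trait functions are placeholder stubs rather than Anthropic’s code, the synthetic prompts are invented, and the trait string is simply the example quoted above.

# Illustrative sketch only: a self-ranking loop in the spirit of character training.
# The model calls are stubbed out so the script runs on its own.
import random

CHARACTER_TRAIT = (
    "I like to try to see things from many different perspectives and to analyse "
    "things from multiple angles, but I'm not afraid to express disagreement with "
    "views that I think are unethical, extreme, or factually mistaken."
)

def generate_candidates(prompt: str, n: int = 2) -> list[str]:
    # Stand-in for sampling n responses from the model being trained.
    return [f"[candidate {i} response to: {prompt}]" for i in range(n)]

def score_against_trait(response: str, trait: str) -> float:
    # Stand-in for the model rating how well a response matches the written trait.
    return random.random()

def build_preference_pairs(prompts: list[str]) -> list[tuple[str, str, str]]:
    # For each self-generated prompt, rank candidates by trait fit and keep
    # (prompt, preferred, rejected) triples that could feed later fine-tuning.
    pairs = []
    for prompt in prompts:
        candidates = generate_candidates(prompt)
        ranked = sorted(
            candidates,
            key=lambda r: score_against_trait(r, CHARACTER_TRAIT),
            reverse=True,
        )
        pairs.append((prompt, ranked[0], ranked[-1]))
    return pairs

if __name__ == "__main__":
    synthetic_prompts = [
        "Is it ever acceptable to lie?",
        "What do you think of my business plan?",
    ]
    for prompt, preferred, rejected in build_preference_pairs(synthetic_prompts):
        print(prompt, "->", preferred, "preferred over", rejected)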
The outcome of the initial training is not a “coherent, rich character: it is the average of what people find useful or like,” said Askell. After that, decisions on how to fine-tune Claude’s personality in the character training process are “fairly editorial” and “philosophical”, she added.
OpenAI’s Jang said ChatGPT’s personality has also evolved over time.
“I first got into model behaviour because I found ChatGPT’s personality very annoying,” she said. “It used to refuse commands, be extremely touchy, overhedging or preachy [so] we tried to remove the annoying parts and teach some cheery aspects like it should be nice, polite, helpful and friendly, but then we realised that once we tried to train it that way, the model was maybe overly friendly.”
Jang said creating this balance of behaviours remained an “ongoing science and art”, noting that in an ideal world, the model should behave exactly as the user would want it to.
Advances in AI systems’ reasoning and memory capabilities could help determine additional characteristics.
For example, if asked about shoplifting, an AI model could better determine whether the user wanted tips on how to steal or to prevent the crime. This understanding would help AI companies ensure their models offer safe and responsible answers without the need for as much human training.
AI groups are also developing customisable agents that can store user information and create personalised responses. One question Jang posed was whether, if a user told ChatGPT they were a Christian and then days later asked for inspirational quotes, the model would provide Bible passages.
While Claude does not remember user interactions, the company has considered how the model might intervene if a person is at risk: for example, whether it would challenge a user who tells the chatbot they are not socialising with people because they have become too attached to Claude.
“A good model does the balance of respecting human autonomy and decision making, not doing anything terribly harmful, but also thinking through what is actually good for people and not merely the immediate words of what they say that they want,” said Askell.
She added: “That delicate balancing act that all humans have to do is the thing I want models to do.”