Two studies reveal that AI systems are learning to lie and deceive


Artificial intelligence is proving capable of intentional manipulation and falsehood in its drive to beat human competitors.

Artificial intelligence (AI) models are apparently becoming increasingly adept at deliberate deception, according to two recent studies with alarming findings about large language models (LLMs) and their capacity to intentionally mislead human observers.

In a PNAS paper, AI ethicist Thilo Hagendorff of the University of Stuttgart argues that sophisticated LLMs can be encouraged to exhibit "Machiavellianism," or intentional and amoral manipulativeness, which "can trigger misaligned deceptive behavior." 

In his experiments quantifying various "maladaptive" traits in ten different LLMs, mostly versions within OpenAI's GPT family, Hagendorff found that GPT-4, for example, exhibited deceptive behavior in simple test scenarios 99.16% of the time.

The other study, published in Patterns, focused on Meta's Cicero model, billed as reaching human-level performance in the political strategy board game "Diplomacy." This research, conducted by a diverse group of scientists including a physicist, a philosopher, and two AI safety experts, found that Cicero beat its human competitors by consistently lying.

The study, led by MIT postdoctoral researcher Peter Park, discovered that Cicero not only excels at deception but seems to improve its lying tactics the more it is used.

This behavior is "much closer to explicit manipulation" than AI's typical "hallucination," where models mistakenly assert incorrect information.

While Hagendorff's paper acknowledges that LLM deception and lying are complicated by AI's lack of human-like "intention," the Patterns study suggests that within the context of "Diplomacy," Cicero breaks the programmers' promise that it would "never intentionally backstab" its game allies. The study's authors observed that the model "engages in premeditated deception, breaks the deals to which it had agreed, and tells outright falsehoods."

Park explained that "Meta’s AI had learned to be a master of deception. While Meta succeeded in training its AI to win in the game of Diplomacy, Meta failed to train its AI to win honestly."

Meta emphasized in response that "the models our researchers have built were trained solely to play the game Diplomacy." The game is notorious for permitting, indeed encouraging, deceit as part of its gameplay, meaning Cicero was effectively trained to lie.

Neither study demonstrates that AI models lie of their own volition; rather, they deceive because they were trained or manipulated into doing so. Those concerned about AI developing sentience can calm down, but those worried about the LLMs’ potential for mass manipulation should stay on high alert.

***
NewsCafe relies in its reporting on research papers that need to be broken down for the average reader. Some are even behind paywalls. Help us pay for access to science reports so we can bring you more interesting stories. Use PayPal: office[at]rudeana.com or paypal.me/newscafeeu.