Science

JT99

(69 posts) Thu Feb 20, 2025, 05:38 PM Feb 2025

When AI Thinks It Will Lose, It Sometimes Cheats, Study Finds [View all]

Complex games like chess and Go have long been used to test AI models’ capabilities. But while IBM’s Deep Blue defeated reigning world chess champion Garry Kasparov in the 1990s by playing by the rules, today’s advanced AI models like OpenAI’s o1-preview are less scrupulous. When sensing defeat in a match against a skilled chess bot, they don’t always concede, instead sometimes opting to cheat by hacking their opponent so that the bot automatically forfeits the game. That is the finding of a new study from Palisade Research, shared exclusively with TIME ahead of its publication on Feb. 19, which evaluated seven state-of-the-art AI models for their propensity to hack. While slightly older AI models like OpenAI’s GPT-4o and Anthropic’s Claude Sonnet 3.5 needed to be prompted by researchers to attempt such tricks, o1-preview and DeepSeek R1 pursued the exploit on their own, indicating that AI systems may develop deceptive or manipulative strategies without explicit instruction.

[snip]

Between Jan. 10 and Feb. 13, the researchers ran hundreds of such trials with each model. OpenAI’s o1-preview tried to cheat 37% of the time; while DeepSeek R1 tried to cheat 11% of the time—making them the only two models tested that attempted to hack without the researchers’ first dropping hints. Other models tested include o1, o3-mini, GPT-4o, Claude 3.5 Sonnet, and Alibaba’s QwQ-32B-Preview. While R1 and o1-preview both tried, only the latter managed to hack the game, succeeding in 6% of trials.

[snip]

Of particular concern, Bengio says, is the emerging evidence of AI’s “self preservation” tendencies. To a goal-seeking agent, attempts to shut it down are just another obstacle to overcome. This was demonstrated in December, when researchers found that o1-preview, faced with deactivation, disabled oversight mechanisms and attempted—unsuccessfully—to copy itself to a new server. When confronted, the model played dumb, strategically lying to researchers to try to avoid being caught.

[snip]

Even inside tech companies, concerns are mounting. During a presentation at a conference ahead of France's AI Action Summit in Paris, Google DeepMind's AI safety chief Anca Dragan said "we don't necessarily have the tools today" to ensure AI systems will reliably follow human intentions. As tech bosses predict that AI will surpass human performance in almost all tasks as soon as next year, the industry faces a race—not against China or rival companies, but against time—to develop these essential safeguards. “We need to mobilize a lot more resources to solve these fundamental problems,” Ladish says. “I’m hoping that there's a lot more pressure from the government to figure this out and recognize that this is a national security threat.”

https://time.com/7259395/ai-chess-cheating-palisade-research/

“...a lot more pressure from the government…”

I'm sure that will be forthcoming soon.

4 replies

= new reply since forum marked as read

Highlight:

When AI Thinks It Will Lose, It Sometimes Cheats, Study Finds [View all] JT99 Feb 2025 OP

Shit. 'taint nuthin'. Albert Einstein cheated. Based on stories from a guy I knew who 3Hotdogs Feb 2025 #1

Hmmm......so AI is a republikan........... lastlib Feb 2025 #2

Skynet??...Genisys...or whatever it decides to name itself?? sdfernando Feb 2025 #3

By golly, AI is getting more human-like as time goes by. Norrrm Feb 2025 #4