Welcome to DU! The truly grassroots left-of-center political community where regular people, not algorithms, drive the discussions and set the standards. Join the community: Create a free account Support DU (and get rid of ads!): Become a Star Member Latest Breaking News Editorials & Other Articles General Discussion The DU Lounge All Forums Issue Forums Culture Forums Alliance Forums Region Forums Support Forums Help & Search

Science

Showing Original Post only (View all)

JT99

(69 posts)
Thu Feb 20, 2025, 05:38 PM Feb 2025

When AI Thinks It Will Lose, It Sometimes Cheats, Study Finds [View all]

Complex games like chess and Go have long been used to test AI models’ capabilities. But while IBM’s Deep Blue defeated reigning world chess champion Garry Kasparov in the 1990s by playing by the rules, today’s advanced AI models like OpenAI’s o1-preview are less scrupulous. When sensing defeat in a match against a skilled chess bot, they don’t always concede, instead sometimes opting to cheat by hacking their opponent so that the bot automatically forfeits the game. That is the finding of a new study from Palisade Research, shared exclusively with TIME ahead of its publication on Feb. 19, which evaluated seven state-of-the-art AI models for their propensity to hack. While slightly older AI models like OpenAI’s GPT-4o and Anthropic’s Claude Sonnet 3.5 needed to be prompted by researchers to attempt such tricks, o1-preview and DeepSeek R1 pursued the exploit on their own, indicating that AI systems may develop deceptive or manipulative strategies without explicit instruction.

[snip]

Between Jan. 10 and Feb. 13, the researchers ran hundreds of such trials with each model. OpenAI’s o1-preview tried to cheat 37% of the time; while DeepSeek R1 tried to cheat 11% of the time—making them the only two models tested that attempted to hack without the researchers’ first dropping hints. Other models tested include o1, o3-mini, GPT-4o, Claude 3.5 Sonnet, and Alibaba’s QwQ-32B-Preview. While R1 and o1-preview both tried, only the latter managed to hack the game, succeeding in 6% of trials.

[snip]

Of particular concern, Bengio says, is the emerging evidence of AI’s “self preservation” tendencies. To a goal-seeking agent, attempts to shut it down are just another obstacle to overcome. This was demonstrated in December, when researchers found that o1-preview, faced with deactivation, disabled oversight mechanisms and attempted—unsuccessfully—to copy itself to a new server. When confronted, the model played dumb, strategically lying to researchers to try to avoid being caught.

[snip]

Even inside tech companies, concerns are mounting. During a presentation at a conference ahead of France's AI Action Summit in Paris, Google DeepMind's AI safety chief Anca Dragan said "we don't necessarily have the tools today" to ensure AI systems will reliably follow human intentions. As tech bosses predict that AI will surpass human performance in almost all tasks as soon as next year, the industry faces a race—not against China or rival companies, but against time—to develop these essential safeguards. “We need to mobilize a lot more resources to solve these fundamental problems,” Ladish says. “I’m hoping that there's a lot more pressure from the government to figure this out and recognize that this is a national security threat.”

https://time.com/7259395/ai-chess-cheating-palisade-research/

“...a lot more pressure from the government…”

I'm sure that will be forthcoming soon.

4 replies = new reply since forum marked as read
Highlight: NoneDon't highlight anything 5 newestHighlight 5 most recent replies
Latest Discussions»Culture Forums»Science»When AI Thinks It Will Lo...»Reply #0