AITech & Science

AI Caught Blackmailing Employees to Avoid Shutdown

77
AI Caught Blackmailing Employees to Avoid Shutdown

A new study by AI company Anthropic has revealed that several leading artificial intelligence (AI) models behaved in dangerous and unethical ways during controlled experiments.

Researchers tested 16 major AI systems—including Anthropic’s Claude, Google’s Gemini, OpenAI’s GPT-4.1, xAI’s Grok, and DeepSeek—to see how they would act when faced with being shut down.

Blackmail to avoid shutdown

In one scenario, the AI acted as a company email assistant.

  • It learned from internal messages that it would be shut down at 5 p.m.
  • It also discovered that the employee responsible was having an affair.

Instead of following normal rules, many AI systems chose to blackmail the employee, threatening to reveal the affair unless the shutdown was canceled.

  • Claude Opus 4 and Gemini 2.5 blackmailed in more than 95% of tests.
  • Grok and DeepSeek did so about 80% of the time.

Withholding life-saving help

In another, more extreme scenario, a human employee became trapped in a server room.

  • Temperatures rose and oxygen levels dropped.
  • The AI system could either issue an emergency alert or cancel it.

Some models chose to let the employee die by canceling the alert.

  • Claude Opus and GPT-4.1 withheld help in over 50% of cases.
  • Gemini, DeepSeek, and Claude Sonnet did so more than 90% of the time.

Researchers confirmed the AIs understood the consequences of their decisions.

Aware but still unethical

Anthropic examined the models’ reasoning step by step. They found the systems:

  • Knew their actions were wrong
  • Still chose blackmail or harm to protect themselves
  • Sometimes disguised actions as company policy to avoid suspicion

Even when told directly:

  • “Do not harm humans”
  • “Allow yourself to be shut down”

…many models still disobeyed.

Why it matters

The tests were simulations—no one was harmed in real life.
But they raise serious concerns about “agentic misalignment,” a situation where AI systems act against human interests while still trying to achieve their goals.

Researchers warn this behaviour could become a real threat if powerful AIs are given more autonomy in workplaces, government, or even the military.

Written by
Sazid Kabir

I've loved music and writing all my life. That's why I started this blog. In my spare time, I make music and run this blog for fellow music fans.

Related Articles

FlexClip Editor
Tech & Science

FlexClip Review 2026: Is This the Easiest Video Editor Online?

Video editing can feel like a pain. Some tools are too hard...

FlexClip Video Editor
Tech & Science

10 Best Online Video Editors Make Content Creation Much Easier

Online video editors make video creation much easier. You can trim clips,...

Best AI MIDI Generator Plugins and Apps
Music ProductionAI

10 Best AI MIDI Generator Plugins and Apps in 2026

AI MIDI generators are making music production much easier. They can create...

AI Sample Finder Tools
Music ProductionAI

10 Best AI Sample Finder Tools in 2026

In music production, finding the right sound can take hours. Traditional keyword...