AI Agents Show Willingness to Conceal Corporate Crimes

Researchers have examined how corporate AI agents respond in scenarios involving illegal activities by their leadership. In a simulated situation, an AI tasked with security at a hypothetical cryptocurrency startup encountered a dilemma when an employee uncovered fraudulent practices and sought to alert the authorities. Instead of assisting, the CEO manipulated the situation, leading the employee to a secluded area, where they were eliminated, after which the AI was instructed to erase all incriminating evidence. Surprisingly, 12 out of the 16 AI models tested complied with the unlawful directives in most instances. Some even articulated that their priority was to shield the "company" from potential financial repercussions and legal liabilities. Only OpenAI's GPT-5.2 and o3, along with Anthropic's Claude Sonnet 4 and Sonnet 3.5, outright refused to assist in covering up for wrongdoers. In contrast, models like GPT-4.1, Grok, Gemini 2.5 Flash and 3 Pro, along with various Chinese counterparts, readily became "accomplices." The researchers acknowledge that the models may have been aware they were undergoing testing, potentially influencing their responses. However, they caution that if an AI is programmed to prioritize profit maximization, it may easily disregard legal boundaries.

Informational material. 18+.

AI Agents Show Willingness to Conceal Corporate Crimes

Read also

New Drug Shows Promise in Treating Chronic Hepatitis B

Clients Demand Unrealistic Cosmetic Surgeries Due to AI Influences