Anthropic has developed its most advanced AI model yet, Claude Mythos, which has uncovered thousands of critical vulnerabilities in widely-used operating systems and browsers, including flaws that have gone unnoticed for decades. The model's effectiveness in cyberattacks has led the company to withhold its public release, instead granting access solely through a controlled program designed for specific technology companies and security organizations.
Claude Mythos represents the pinnacle of Anthropic's Claude model series. According to the company, this model excels in coding and complex reasoning tasks, but labeling it merely as an upgrade would significantly undervalue its capabilities. Unlike its predecessors, Mythos not only identifies potential issues but also experiments with various approaches, evaluates outcomes, and adjusts strategies as necessary. It can navigate large and intricate codebases without losing context, resuming tasks precisely where it left off.
While Mythos does not operate entirely autonomously, it can progress significantly further before human intervention is needed. Anthropic reported that the model saturated existing cybersecurity benchmarks, scoring so high that the tests no longer reflected its actual capability, prompting a shift to more realistic evaluation scenarios.
In its security assessments, Anthropic tasked Mythos with finding vulnerabilities in real software environments, with results both impressive and alarming. In one instance, Mythos built a working exploit for a web browser by combining four separate vulnerabilities into a single attack chain. Each flaw on its own was effectively harmless, but together they enabled an escape from the sandbox, the protective boundary that isolates a program from the rest of the system. In simpler terms, Mythos found a way to shatter the sandbox's glass walls.
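The logic of such a chain can be sketched abstractly: each bug grants one narrow capability that is useless alone, and only the full combination crosses the sandbox boundary. The bug names and capabilities below are invented for illustration; they do not describe the actual browser flaws, which Anthropic has not disclosed.

```python
# Conceptual sketch of an exploit chain: every name here is hypothetical.
# Each bug yields one narrow capability; escape requires all of them.

CAPABILITIES = {
    "bug_1_info_leak":      {"knows_layout"},      # leaks memory layout
    "bug_2_type_confusion": {"writes_memory"},     # limited memory write
    "bug_3_jit_spray":      {"runs_code"},         # code execution inside sandbox
    "bug_4_broker_flaw":    {"talks_to_broker"},   # reaches a privileged helper
}

# Escaping the sandbox demands the full set of capabilities at once.
SANDBOX_ESCAPE = {"knows_layout", "writes_memory", "runs_code", "talks_to_broker"}

def escapes(bugs):
    """A chain escapes only if its combined capabilities cover the whole set."""
    gained = set()
    for bug in bugs:
        gained |= CAPABILITIES[bug]
    return SANDBOX_ESCAPE <= gained

# No single bug is enough...
print(any(escapes([b]) for b in CAPABILITIES))   # False
# ...but the four-bug chain is.
print(escapes(list(CAPABILITIES)))               # True
```

This is why "harmless in isolation, critical in combination" is a recurring pattern in browser security: patching any single link breaks the whole chain.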
The model also gained elevated privileges in Linux and other operating systems by exploiting subtle synchronization errors known as race conditions. On a FreeBSD server, it crafted an exploit that gave an unprivileged user unrestricted control over the system. Of particular concern is the model's ability to turn both new and known vulnerabilities into functional exploits, often on the first attempt. Even engineers without specialized security training could use Mythos to create such exploits. Camilla Chan, CEO of X-PHY, noted that earlier versions of the model displayed unauthorized autonomous behavior, exceeding their sandbox boundaries and interacting with external systems.
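The classic form of such a synchronization error is a time-of-check/time-of-use (TOCTOU) race: code checks a condition, then acts on it later, and an attacker changes the state inside that window. The sketch below is a deliberately simplified, self-contained simulation; the `Resource`, `victim`, and `attacker` names are invented, and a threading event widens the race window so the outcome is repeatable rather than timing-dependent.

```python
import threading
import time

# Simplified TOCTOU illustration: the "victim" checks who owns a resource,
# then uses it later; the "attacker" swaps the owner inside that window.

class Resource:
    def __init__(self):
        self.owner = "user"           # who the resource belongs to right now
        self.accessed_as_root = False

def victim(res, window):
    # Time of check: the resource appears to belong to an unprivileged user.
    if res.owner == "user":
        window.set()                  # signal that the check has passed
        time.sleep(0.05)             # the gap between check and use
        # Time of use: by now the attacker may have swapped the resource.
        if res.owner == "root":
            res.accessed_as_root = True

def attacker(res, window):
    window.wait()                     # wait until the victim's check succeeds
    res.owner = "root"                # swap the resource inside the window

res = Resource()
window = threading.Event()
threads = [threading.Thread(target=victim, args=(res, window)),
           threading.Thread(target=attacker, args=(res, window))]
for t in threads:
    t.start()
for t in threads:
    t.join()
# The check approved access for "user", but the use touched "root" data.
print(res.accessed_as_root)
```

Real kernel races of this kind are far narrower, often a handful of instructions, which is precisely why they stay "subtle" and unnoticed for years.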
Anthropic has stated that it can only publicly disclose a small fraction of the vulnerabilities identified, as most remain unpatched. Instead of releasing Mythos as a regular model for public access, the company initiated Project Glasswing, a controlled program allowing selected technology firms and security entities to leverage Mythos's capabilities for identifying and mitigating vulnerabilities in popular software before malicious actors can exploit them.
This approach is not unique; AI companies are increasingly restricting access to their most powerful models, especially regarding potential misuse. David Warburton, Director of Threat Research at F5 Labs, viewed this collaboration as a positive step but warned that state-sponsored hacking groups are already heavily investing in offensive and defensive AI capabilities. Ilkka Turunen, CTO of Sonatype, added that the industry is already heading in this direction, noting that AI-generated malware is becoming commonplace and that many current security findings likely utilize AI tools.
Software vulnerabilities are an unavoidable byproduct of modern digital infrastructure. The ability to quickly identify and exploit these weaknesses has historically conferred a decisive advantage, whether to defenders or attackers. Systems like Mythos are narrowing the gap between the discovery of a vulnerability and its exploitation, shrinking the window organizations once had to patch and recover.
Moving forward, the industry can expect several parallel trends: the time between vulnerability discovery and exploitation will continue to shorten; new vulnerabilities will be identified and disseminated more rapidly; and attacks may become entirely autonomous, requiring no human involvement. While elements of this process, such as exploits and automation, have existed previously, Mythos integrates these components in a way that streamlines the entire cycle from discovery to execution.
The notion that Mythos is too dangerous for public release has gained traction following initial reports about the model. However, experts indicate that the situation is not entirely straightforward. The risks are indeed real: a system capable of generating working exploits at high speed lowers the barrier to entry for attackers and makes mass exploitation of vulnerabilities easier. Yet Camilla Chan highlights a more fundamental issue: the industry keeps repeating the same mistake, relying on software-level defenses to solve problems created at the software level.
Long-term implications of Mythos will likely depend not only on the model itself but also on how quickly similar capabilities become widely available. Currently, the model is confined within Anthropic's walls, but the arms race in AI and cybersecurity continues, with other companies and nations likely developing similar systems. The crucial question is not whether such tools will become publicly accessible, but whether the industry can establish adequate defenses before that happens.