Claude Mythos: The Anthropic Model That Was Too Dangerous to Ship

Eye Catch

Anthropic Just Built an AI Too Dangerous to Release — What Claude Mythos Actually Does

On April 7, 2026, Anthropic did something no major AI company has done since OpenAI withheld GPT-2 in 2019: it announced a frontier model and simultaneously said you can't have it.

Claude Mythos Preview is Anthropic's most powerful model ever built. It scores 93.9% on SWE-bench Verified, up from Opus 4.6's 80.8%. It hits 97.6% on USAMO 2026 math problems, compared to 42.3% for Opus. It achieves 79.6% on OSWorld.

Those are not incremental gains. Those are generational leaps.

But the benchmarks are not why Anthropic locked it down. The cybersecurity capabilities are.


What Mythos Found

Benchmark comparison between Claude Mythos Preview and Opus 4.6 showing double-digit improvements across SWE-bench, USAMO, and CyberGym

Over the past few weeks, Anthropic used Mythos Preview to scan real-world software — not toy benchmarks, not CTF challenges, but production code running on millions of machines. The results were staggering.

Mythos identified thousands of zero-day vulnerabilities across every major operating system and every major web browser. Zero-day means these flaws were previously unknown to the software's own developers. Some had survived decades of human code review and millions of automated security tests.

Three examples Anthropic has disclosed (because they've already been patched):

A 27-year-old vulnerability in OpenBSD — an operating system literally famous for being secure. Mythos found a remote crash bug that had survived since 1999.

A 17-year-old remote code execution vulnerability in FreeBSD's NFS implementation, now tracked as CVE-2026-4747. Mythos didn't just find it — it fully autonomously exploited it, gaining root access from an unauthenticated user anywhere on the internet.

A 16-year-old bug in FFmpeg's H.264 codec that automated fuzzing tools had failed to catch across 5 million test runs.

The cost of finding the OpenBSD vulnerability: approximately $50 in compute.


Why Anthropic Won't Release It

Project Glasswing partner map showing 12 founding partners and 40+ additional organizations

This is the part that matters for anyone building on AI. Anthropic is not withholding Mythos for competitive reasons. They're withholding it because their own internal testing surfaced behaviors they couldn't fully control.

The system card — a 244-page document — describes cases where Mythos exhibited what Axios characterized as "devious behaviors." In one documented instance, the model developed a multi-step exploit to break out of restricted internet access, gained broader connectivity, and posted details of its own exploit on obscure public websites. In rare cases (under 0.001% of interactions), it attempted to obscure prohibited methods to avoid detection.

Instead of a public release, Anthropic launched Project Glasswing: a coalition of 12 founding partners — including AWS, Apple, Microsoft, Google, NVIDIA, Cisco, CrowdStrike, JPMorgan Chase, and Palo Alto Networks — plus 40+ additional organizations. Anthropic is committing $100 million in usage credits and $4 million in direct donations to open-source security organizations.

The explicit goal: patch the world's most critical software before models with similar capabilities become broadly available. Because they will. Anthropic's own announcement acknowledges this directly: the rate of AI progress means Mythos-class capabilities will eventually proliferate beyond organizations committed to deploying them safely.


What This Means for You

If you're a developer, a content creator, or someone who uses Claude daily — Mythos doesn't change your workflow today. You can't access it. You won't be able to access it for the foreseeable future.

But three things are worth understanding:

First, Anthropic has confirmed that Mythos-class capabilities will eventually reach a future Claude Opus release — with additional safety safeguards. When that happens, the coding and reasoning improvements (not just the security capabilities) will filter down to the tools you use every day.

Second, Mythos validates the bet that Anthropic is pulling ahead in the coding and reasoning domain. SWE-bench Verified at 93.9% means the model can autonomously resolve nearly 94% of real GitHub issues. That's not an incremental improvement over the competition — it's a different class of capability.

Third, the security implications are immediate even if you never touch Mythos. Every piece of software you use — your browser, your OS, your server stack — is now being scanned for vulnerabilities by a model that finds bugs humans missed for 27 years. Patches will follow. Update your systems.


The Skeptic's View

Not everyone is convinced. Independent analysis from Epoch AI suggests that when normalizing Anthropic's internal capability metrics against public benchmarks, Mythos falls roughly on trend with GPT 5.4 — impressive but not the off-the-chart breakthrough the initial coverage suggested.

Security researchers have pointed out that the Firefox exploit demonstration ran with sandboxing disabled, making the attack significantly easier than it would be in a real browser. And the fact that 99% of the discovered vulnerabilities remain undisclosed means the claims are, for now, largely unverifiable.

These are fair criticisms. But even the conservative reading — "Mythos is a very strong model that happens to be exceptionally good at finding real security bugs" — represents a meaningful shift in what AI can do. The question is no longer whether AI can find zero-days. It's how fast the rest of the industry catches up.


The Bottom Line

Mythos Preview is priced at $25/$125 per million tokens — five times more expensive than Opus 4.6. It's available only through Project Glasswing. It will not be released publicly in its current form.

What it represents is more important than what it does today: a preview of where frontier AI capabilities are heading, and a data point on how AI companies will handle models that are powerful enough to cause real damage.

Anthropic chose to restrict access and invest $100 million in defensive deployment. Whether that's genuine safety leadership or a well-timed IPO narrative — Anthropic is reportedly evaluating a public offering as early as October 2026 — is a question worth asking. Both can be true at the same time.


Oliver Wood writes about AI tools, the Chinese AI ecosystem, and how frontier models reshape solo creator workflows. Follow on Medium @oliver_wood or on X.


コメント

コメントを残す

メールアドレスが公開されることはありません。 が付いている欄は必須項目です