Skip to Content

Claude Mythos Leaked

Anthropic AI Model More Powerful Than Anything Public
May 17, 2026, 15:41 Eastern Daylight Time by
Claude Mythos Leaked
Claude Mythos, Anthropic’s unreleased frontier model codenamed Capybara, has reportedly achieved a record-breaking 93.9% on SWE-bench Verified and 94.6% on GPQA Diamond. Part of the restricted "Project Glasswing" initiative, Mythos is currently withheld from public access due to its advanced offensive cybersecurity capabilities, which Anthropic warns could "exploit vulnerabilities faster than defenders can patch them."

What You'll Learn

  • The leaked benchmark results (SWE-bench, GPQA, USAMO)
  • Details of Project Glasswing and the restricted preview rollout
  • Performance comparison vs GPT-5.5 Pro and Claude Opus 4.7
  • The cybersecurity risks preventing a general public release

The AI arms race of 2026 has a new apex predator: Claude Mythos. While the industry has been focused on the GPT-5.5 vs Grok 4.3 price war, Anthropic has quietly developed a model so powerful that they are afraid to release it. Internally codenamed Capybara, Claude Mythos isn't just an incremental update to the Opus line; it represents a fundamental step change in AI reasoning and autonomous software engineering.

Leaked internal documents from May 2026 suggest that Mythos has already saturated existing benchmarks. With a near-perfect score on mathematical reasoning and a coding capability that dwarfs current frontier models, Mythos is the primary reason Anthropic launched Project Glasswing—a security-first defensive program designed to help infrastructure providers prepare for a world where AI can find and exploit zero-day vulnerabilities in real-time.

Current Status & Latest Data

As of May 18, 2026, Claude Mythos remains in a highly restricted "Restricted Preview" phase. Access is limited to select cybersecurity partners, government agencies, and major cloud providers. The model reportedly runs on an architecture featuring approximately 10 trillion parameters, which allows it to handle the most complex multi-step reasoning tasks ever evaluated by the UK AI Security Institute (AISI).

In a recent simulation, Mythos became the first model to complete a 32-step corporate network attack end-to-end without human intervention. This capability is exactly why Anthropic is holding back. Unlike DeepSeek V4 or Llama 4, which focus on general-purpose utility, Mythos is tuned for specialized "Zero-Mistake" accuracy in high-leverage domains.

Key Factors Driving the Market

The "Mythos Factor" is forcing other labs to rethink their release schedules. OpenAI’s GPT-5.5 (codenamed "Spud") was rushed to market in April 2026 to counter the rumors of Mythos' dominance. However, leaked numbers show a significant gap in coding intelligence. While GPT-5.5 manages a respectable 58.6% on SWE-bench Pro, Claude Mythos Preview is reportedly hitting 77.8% on the same high-difficulty evaluation.

Furthermore, the integration of Markovian RSA (Reasoning Search Augmentation) allows Mythos to "think" with massive compute budgets, achieving an 89.6% accuracy on USAMO-level math problems. This makes it more of a "logic engine" than a traditional language model, a shift we also observed in our OpenAI o3 Mini analysis.

Expert Analysis & Insights

The real separator for Claude Mythos is the GPQA Diamond score of 94.6%. This benchmark, designed to be "Google-proof" even for graduate-level experts, has historically seen models struggle in the 70-80% range. Mythos’ jump to near-human expert levels suggests that it has developed an internal model of the world that allows for genuine scientific discovery. This is the same high-reasoning threshold developers look for when comparing Vector Databases for AI Agents.

Benchmark Claude Mythos GPT-5.5 Pro Claude Opus 4.7
SWE-bench Verified93.9%80.6%87.6%
GPQA Diamond94.6%~91.0%91.3%
SWE-bench Pro77.8%58.6%64.3%
USAMO 2026 (Math)97.6%74.4%79.2%
OSWorld-Verified79.6%~78.0%72.7%

Future Outlook

Anthropic has stated that the release timeline for Claude Mythos will be "determined by safety evaluation outcomes, not a commercial schedule." However, prediction markets are currently betting on a June 2026 public launch window for the Capybara tier. This would likely be a "staged release," moving first to Enterprise customers before reaching the general Claude Pro user base. This cautious approach is similar to what we saw with the Cloudflare Pingora 2026 rollout, where stability took precedence over speed.

What This Means for Investors

For those invested in AI security and infrastructure, Claude Mythos is a double-edged sword. It creates a massive demand for new defensive AI tools, while simultaneously threatening the current dominance of proprietary models. If Mythos becomes the standard for software engineering, companies like Rakuten (who saw 79% faster feature delivery with Claude Code) will see even more dramatic ROI. The "intelligence moat" has never been wider, but the gates to entry have never been more heavily guarded.

Conclusion

Claude Mythos is the phantom of the AI opera—everyone knows it’s there, but very few have seen it in action. Its record-breaking benchmarks and cybersecurity focus make it the most anticipated model of 2026. Key Takeaways:

  • Mythos leads every major benchmark, specifically outperforming GPT-5.5 in coding (+19pts) and math (+23pts).
  • Project Glasswing is the gateway through which all future "offensive-capable" AI models must pass.
  • Public availability is likely 2-4 months away, with a focus on defensive security applications first.
Stay ahead of the next release by checking our AI ROI tracking guide.

Last Updated: May 18, 2026 | Source: Anthropic Project Glasswing & UK AISI Reports

Frequently Asked Questions

Claude Mythos, codenamed Capybara, is Anthropic’s unreleased frontier AI model. It represents a "step change" in intelligence, specifically designed for autonomous software engineering and advanced cybersecurity tasks.
Mythos has achieved 93.9% on SWE-bench Verified and 94.6% on GPQA Diamond, outperforming GPT-5.5 Pro by a significant margin in both coding and graduate-level reasoning.
Project Glasswing is Anthropic’s $100 million defensive security initiative. It provides restricted early access to Mythos for high-leverage organizations to build defenses against AI-powered cybersecurity threats.
Currently, Claude Mythos is in a Restricted Preview and not available to the general public. Access is limited to cybersecurity partners and select government agencies due to the model's offensive capabilities.
While no official date is set, prediction markets and industry rumors point to a staged release starting in June 2026, beginning with Enterprise and Pro users.