Exclusive: Anthropic is testing ‘Mythos,’ its ‘most powerful AI model ever’
The amount of hype Mythos got from what is essentially a PR marketing post is insane.
Independent testing instead shows an incremental increase in capability over previous SOTA models, not some new paradigm or “game changer”:
AI Security Institute
Our evaluation of Claude Mythos Preview’s cyber capabilities | AISI Work
We conducted cyber evaluations of Anthropic’s Claude Mythos Preview and found continued improvement in capture-the-flag (CTF) challenges and significant improvement on multi-step cyber-attack simulations.
That said, LLMs are advancing rapidly and keep getting better at cybersecurity tasks, with Mythos at the top for now:
Mythos Preview’s success on one cyber range indicates that it is at least capable of autonomously attacking small, weakly defended and vulnerable enterprise systems where access to a network has been gained. However, our ranges have important differences from real-world environments that make them easier targets. They lack security features that are often present, such as active defenders and defensive tooling.