When I first read what Mythos was capable of, it was clear we had reached an inflection point sooner than expected. Anthropic has demonstrated something the industry can’t ignore: vulnerability discovery has scaled beyond human constraints. Mythos can identify and exploit vulnerabilities at a level that was previously impractical. More impressive still, it was not trained to be a hacker; these offensive capabilities are a “downstream consequence” of improved reasoning and autonomy.
Even with all of this in mind, there’s no denying that Mythos has tremendous potential. Rather than releasing Claude Mythos publicly, Anthropic has made it available to a limited set of partners: large technology companies, security vendors, and select open source contributors.
After reading articles proclaiming the death of AppSec tools and services, I asked a few CISOs for their perspective on the impact of Mythos. The answer was clear: they want the benefits of Mythos without the associated risks and management overhead. Most are not prepared to run Mythos or similar models in-house to find vulnerabilities.
There are three primary reasons for this:
- The AI models themselves pose security risks. Data and IP leakage, operational disruption, adversarial manipulation, and hallucination are all real issues that could cause the very model used to find vulnerabilities to introduce new ones.
- AI models today do not have sufficient business context to find most complex vulnerabilities. Each industry has its unique context, regulations, and data flows. While the models can certainly find many vulnerabilities, human understanding of a business and human creativity are very much still required.
- According to Anthropic, the model can produce a high rate of false positives and struggles with complex tasks. Most companies do not have the internal staffing required to validate and triage the resulting volume of findings.
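To make that triage burden concrete, here is a minimal sketch, using hypothetical field names rather than any vendor’s actual schema, of how a team might deduplicate and rank model-generated findings before a human ever validates one:

```python
def triage(findings):
    """Deduplicate model-generated findings and rank them for human review.

    Each finding is a dict with hypothetical keys: 'type', 'endpoint',
    'severity' (0-10), and 'confidence' (0.0-1.0).
    """
    # Collapse duplicates: keep the highest-confidence report per (type, endpoint).
    best = {}
    for f in findings:
        key = (f["type"], f["endpoint"])
        if key not in best or f["confidence"] > best[key]["confidence"]:
            best[key] = f
    # Rank by severity, then confidence, so humans validate the riskiest first.
    return sorted(best.values(), key=lambda f: (-f["severity"], -f["confidence"]))

findings = [
    {"type": "sqli", "endpoint": "/login",  "severity": 9, "confidence": 0.4},
    {"type": "sqli", "endpoint": "/login",  "severity": 9, "confidence": 0.8},
    {"type": "xss",  "endpoint": "/search", "severity": 6, "confidence": 0.9},
]
queue = triage(findings)
# Three raw findings collapse to two; the SQL injection ranks first.
```

Even with automation like this doing the easy filtering, every item left in the queue still needs a human to confirm it is real and exploitable, which is exactly the staffing gap most companies face.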
Enterprises want the speed and scale these models offer—without the security risks that come with them. But that tradeoff doesn’t hold. The same advances that make these systems powerful also make them risky.
The conclusion: the very AI tools that find vulnerabilities can also cause security and safety issues.
The Evolution of Pentesting as a Service
At Cobalt, we are using AI to automate many parts of the pentest process. Our AI models are continuously learning from each pentest we do (~5,000 per year). We are human-led, AI-powered, meaning we bring our customers the benefits of both: AI finds vulnerabilities, and humans refine them with their understanding of business logic, exploitability, and impact. In addition, our PTaaS platform helps enterprises manage their entire offensive security program so that they can reduce exposure time and minimize the likelihood of a breach.
What does this mean for the industry?
Is this a labor replacement? Obviously not in the very short term. This validates the need for access to teams of high-talent ethical hackers to help you find your exposures. But what about six months from now?
The only way to stay ahead is to be more aggressive than the adversary. This doesn't mean just running more scans; it means continuous, high-context offensive testing that maps your actual risk before an autonomous agent does it for you.
Imagine this high-performance engine in the hands of an elite driver. In less experienced hands, systems like Mythos can be unpredictable and risky. But in the hands of a highly skilled pentester or hacker? The impact is significant.
Used effectively, this kind of offensive capability can reduce reliance on traditional AST tooling by shifting focus toward real, exploitable risk rather than theoretical findings. The result is a more efficient workflow, one where developers can spend more time addressing what actually matters.
This is exactly what we foresaw at Cobalt: these models are high-performance vehicles that require an elite driver.
Our platform uses AI to automate rote reconnaissance at machine speed. Our toolchain is a direct output of our years of pentesting experience, so that every pentest informs the next.
This gives our 500+ elite pentesters, who average 11 years of experience, the bandwidth to focus on chained exploitation, business logic abuse, and sophisticated lateral movement. Safety and accountability are non-negotiable in offensive ops, and those are human-led traits.
