The speed and scale of generative AI adoption are staggering. Our latest research, compiled in the State of LLM Application Security Report, found that 94% of organizations have seen a significant increase in the adoption of this technology in the last year alone. Much like the rush to the cloud, this surge in development is creating a massive security oversight gap.
The data paints a stark and frankly alarming picture. While security leaders are right to be concerned—72% rank AI-related attacks as their top IT risk—the reality of our collective preparedness falls dangerously short. This isn't just a gap; it's a blind spot that widens with every new AI feature pushed to market.
For our new report, we analyzed data from thousands of penetration tests, including a dedicated focus on LLM applications, and surveyed 450 security leaders and practitioners. The findings should be a wake-up call for every CISO and security team.
Nearly one-third (32%) of all vulnerabilities discovered in LLM application pentests are classified as "serious," meaning they pose a high or critical risk. This is the highest proportion of serious findings we've seen across any asset type since we began testing LLM applications.
What’s even more concerning is what happens—or rather, what doesn’t happen—after these critical flaws are found.
Only 21% of these serious LLM vulnerabilities are ever resolved.

Let that sink in. To put that number in perspective, our data shows that across all types of penetration tests, organizations manage to fix 69% of serious vulnerabilities. The 21% remediation rate for serious LLM flaws isn't just low; it's a stark outlier that signals a profound failure in how we are managing AI risk. Nearly 80% of the most critical security flaws in this new generation of applications are being left unaddressed.

Paradoxically, for the few serious LLM flaws that are fixed, the process is incredibly fast. The Mean Time to Resolve (MTTR) is a mere 19 days, the quickest of any pentest category. This tells me two things. First, when organizations can fix an issue, they do so with urgency. Second, the shockingly low overall resolution rate suggests that organizations are constrained to fixing only the easiest issues. The more complex vulnerabilities, especially those dependent on third-party model providers, are being left to fester. This "fix what's easy" approach creates a false sense of security while accumulating a significant and dangerous risk portfolio.
The reasons behind this remediation gap are complex, touching on the classic triad of people, process, and technology.
It's also crucial to recognize that the vulnerabilities themselves are a mix of the old and the new. Our pentesting data shows that classic weaknesses like SQL Injection are still the most common finding in LLM applications, accounting for nearly 20% of all issues. This indicates that in the rush to deploy AI, teams are neglecting foundational security practices, allowing well-understood vulnerabilities to be exploited through this new, highly interactive interface. Attackers are using novel vectors like Prompt Injection (11.5% of findings) and Insecure Output Handling (14.5%) to exploit timeless security flaws.
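To make that combination concrete, here is a minimal, hypothetical sketch of how a classic flaw surfaces through the new interface. The endpoint name and schema are invented for illustration: an application takes LLM output (which an attacker can steer via prompt injection) and interpolates it directly into a SQL query, so the old vulnerability becomes reachable through the model. The fix is the same foundational practice it has always been: treat model output as untrusted input and parameterize the query.

```python
# Hypothetical sketch: an LLM-backed product search that forwards model
# output into SQL. Names and schema are illustrative, not from the report.
import sqlite3

def build_query_insecure(llm_output: str) -> str:
    # VULNERABLE: model output is interpolated straight into the SQL string,
    # so a prompt-injected response can carry a SQL payload.
    return f"SELECT name, price FROM products WHERE name LIKE '%{llm_output}%'"

def build_query_safe(llm_output: str):
    # Safer: model output is bound as a parameter, never parsed as SQL.
    return ("SELECT name, price FROM products WHERE name LIKE ?",
            (f"%{llm_output}%",))

# A model response an attacker steered via prompt injection:
malicious = "x%' UNION SELECT username, password FROM users --"

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE products (name TEXT, price REAL)")
cur.execute("CREATE TABLE users (username TEXT, password TEXT)")
cur.execute("INSERT INTO users VALUES ('admin', 'hunter2')")

# Insecure path: the UNION clause leaks credentials as "search results".
leaked = cur.execute(build_query_insecure(malicious)).fetchall()
print(leaked)  # → [('admin', 'hunter2')]

# Parameterized path: the payload is matched as a literal string, nothing leaks.
sql, params = build_query_safe(malicious)
safe = cur.execute(sql, params).fetchall()
print(safe)  # → []
```

The point of the sketch is that nothing here is novel AI exploitation: the injection primitive is decades old, and the LLM merely provides a new, highly interactive path for attacker-controlled text to reach it.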
The current trajectory is unsustainable, but it is reversible. Security leaders must take decisive action to close this gap.
To gain a deeper understanding of these critical issues, the specific vulnerabilities being exploited (including case studies of real LLM pentests), and recommendations for securing your LLM applications, I urge you to read the full State of LLM Application Security Report. It’s time to move from assumption to evidence and build a secure foundation for the AI-powered future.