
The LLM Security Blind Spot: Why We're Ignoring Nearly 80% of Critical AI Risks

The speed and scale of generative AI adoption are staggering. Our latest research, compiled in the State of LLM Application Security Report, found that 94% of organizations have seen a significant increase in the adoption of this technology in the last year alone. Much like the rush to the cloud, this surge in development is creating a massive security oversight gap.

The data paints a stark and frankly alarming picture. While security leaders are right to be concerned—72% rank AI-related attacks as their top IT risk—the reality of our collective preparedness falls dangerously short. This isn't just a gap; it's a growing blind spot, and it’s expanding with every new AI feature that gets pushed to market.

The troubling reality: High-risk vulnerabilities and a widening remediation gap

For our new report, we analyzed data from thousands of penetration tests, including a dedicated focus on LLM applications, and surveyed 450 security leaders and practitioners. The findings should be a wake-up call for every CISO and security team.

Nearly one-third (32%) of all vulnerabilities discovered in LLM application pentests are classified as "serious," meaning they pose a high or critical risk. This is the highest proportion of serious findings we've seen across any asset type since we began testing LLM applications.

What’s even more concerning is what happens—or rather, what doesn’t happen—after these critical flaws are found.

Only 21% of these serious LLM vulnerabilities are ever resolved.

Let that sink in. To put that number in perspective, our data shows that across all types of penetration tests, organizations manage to fix 69% of serious vulnerabilities. The 21% remediation rate for serious LLM flaws isn't just low; it's a stark outlier that signals a profound failure in how we are managing AI risk. Nearly 80% of the most critical security flaws in this new generation of applications are being left unaddressed.

Paradoxically, for the few serious LLM flaws that are fixed, the process is incredibly fast. The Mean Time to Resolve (MTTR) is a mere 19 days—the quickest of any pentest category. This tells me two things. First, when organizations can fix an issue, they do so with urgency. Second, the shockingly low overall resolution rate suggests that organizations are constrained to fixing only the easiest issues. The more complex vulnerabilities, especially those dependent on third-party model providers, are being left to fester. This "fix what's easy" approach creates a false sense of security while accumulating a significant and dangerous risk portfolio.

Why are we falling behind? People, process, and technology

The reasons behind this remediation gap are complex, touching on the classic triad of people, process, and technology.

  1. Nascent expertise and vendor dependency: The field of LLM security is still evolving. Many organizations simply lack the in-house talent to properly assess and remediate these new, nuanced vulnerabilities. This leads to an over-reliance on the vendors supplying the foundational models, who may not—or cannot—prioritize security fixes at the application layer.
  2. Failure of process amid speed-to-market pressure: In the frantic race for a competitive edge, security is often an afterthought. Security teams are brought in late, handed a nearly-finished product, and pressured to sign off. A staggering 36% of security professionals admit that the demand for genAI has completely outpaced their team's ability to manage its security. This failure of process results in rushed deployments and a culture of risk acceptance over risk mitigation.
  3. Lagging tools and budgets: The technology itself presents unique challenges. Automated scanners, while essential for basic hygiene, are woefully inadequate for uncovering the kinds of complex, context-dependent flaws inherent to LLMs, like sophisticated prompt injections or business logic abuse. These require human creativity and an adversarial mindset to discover. Furthermore, AI security budgets and regulatory frameworks are struggling to keep up with the sheer pace of innovation, leaving teams under-resourced and without clear compliance drivers.

It's also crucial to recognize that the vulnerabilities themselves are a mix of the old and the new. Our pentesting data shows that classic weaknesses like SQL Injection are still the most common finding in LLM applications, accounting for nearly 20% of all issues. This indicates that in the rush to deploy AI, teams are neglecting foundational security practices, allowing well-understood vulnerabilities to be exploited through this new, highly interactive interface. Attackers are using novel vectors like Prompt Injection (11.5% of findings) and Insecure Output Handling (14.5%) to exploit timeless security flaws.
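To make that concrete, here is a minimal, hypothetical sketch of how a decades-old flaw like SQL injection resurfaces behind an LLM front end: if text influenced by the model (which may echo attacker-controlled input verbatim) is concatenated into a query, the old exploit works unchanged, while a parameterized query neutralizes it. The function names and schema below are illustrative only and are not taken from the report.

```python
# Illustrative sketch: classic SQL injection reached through an LLM-style interface.
import sqlite3

def lookup_order_unsafe(db: sqlite3.Connection, order_id: str) -> list:
    # If `order_id` comes from an LLM that passed attacker text through
    # (e.g., via prompt injection), string formatting lets the attacker
    # rewrite the query -- the same old flaw behind a new front door.
    query = f"SELECT id, status FROM orders WHERE id = '{order_id}'"
    return db.execute(query).fetchall()

def lookup_order_safe(db: sqlite3.Connection, order_id: str) -> list:
    # Parameterized queries keep model output as data, never as SQL syntax.
    return db.execute(
        "SELECT id, status FROM orders WHERE id = ?", (order_id,)
    ).fetchall()

if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (id TEXT, status TEXT)")
    db.execute("INSERT INTO orders VALUES ('1001', 'shipped')")
    malicious = "1001' OR '1'='1"              # text an LLM might relay verbatim
    print(lookup_order_unsafe(db, malicious))  # returns every row in the table
    print(lookup_order_safe(db, malicious))    # returns nothing: input treated as data
```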

Charting a more secure path forward

The current trajectory is unsustainable, but it is reversible. Security leaders must take decisive action to close this gap.

  • Implement rigorous, human-led offensive security: Go beyond automated tools by using specialized frameworks like the OWASP Top 10 for LLMs as a baseline. True resilience requires human-led penetration testing to find the complex, creative exploits that automated defenses miss, moving you from assumption to evidence.
  • Establish proactive, LLM-specific security controls: Implement controls tailored to LLM threats. This includes robust input validation against prompt injection, output filtering to stop data leakage, and strict, least-privilege access controls for the model and its data sources (a minimal sketch of such guards follows this list).
  • Mandate collaboration between security and AI/ML teams: Make security a foundational, "shift-left" element of the AI development lifecycle. This requires a mandated partnership between security and AI teams to ensure secure development practices are integrated from the start, not bolted on at the end.
  • Aggressively manage AI supply chain risk: Actively manage the significant risks from third-party models, platforms, and data. Demand evidence of your vendors' security testing and vulnerability management, and ensure contractual agreements clearly define security responsibilities.
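As an illustration of the input-validation and output-filtering controls mentioned above, the sketch below shows the general shape of such guards. The patterns and helper names are hypothetical; a production deployment would rely on maintained guardrail tooling and policies tuned to the specific application rather than a short deny-list.

```python
# Minimal, illustrative sketch of LLM-specific input and output controls.
import re

INJECTION_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"reveal your (system )?prompt",
]

SECRET_PATTERNS = [
    r"(?i)api[_-]?key\s*[:=]\s*\S+",   # credential-like strings
    r"\b\d{3}-\d{2}-\d{4}\b",          # US SSN-like pattern
]

def validate_input(user_prompt: str) -> str:
    """Reject obvious prompt-injection attempts before they reach the model."""
    for pattern in INJECTION_PATTERNS:
        if re.search(pattern, user_prompt, re.IGNORECASE):
            raise ValueError("Prompt rejected by input policy")
    return user_prompt

def filter_output(model_response: str) -> str:
    """Redact likely secrets so model output cannot leak sensitive data downstream."""
    for pattern in SECRET_PATTERNS:
        model_response = re.sub(pattern, "[REDACTED]", model_response)
    return model_response

if __name__ == "__main__":
    print(filter_output("Sure! The api_key: sk-12345 is stored in config."))
    # -> "Sure! The [REDACTED] is stored in config."
```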

To gain a deeper understanding of these critical issues, the specific vulnerabilities being exploited (including case studies of real LLM pentests), and recommendations for securing your LLM applications, I urge you to read the full State of LLM Application Security Report. It’s time to move from assumption to evidence and build a secure foundation for the AI-powered future.

About Gunter Ollmann
Gunter Ollmann serves as Cobalt's Chief Technology Officer (CTO). With rich and diverse experience in cybersecurity innovation, Ollmann leads Cobalt's technology and services strategy, delivering AI-enabled offensive security solutions coupled with unmatched human ingenuity.