REPORT
The 25x Remediation Gap: See how elite security teams resolve risks in 10 days vs. 249
REPORT
The 25x Remediation Gap: See how elite security teams resolve risks in 10 days vs. 249

Building Honey-AI, Part 1: Why AI Infrastructure Needs Its Own Honeypot

AI infrastructure has created a new attack surface.

For years, defenders have monitored SSH brute-force attempts, exposed admin panels, web-scanner noise, credential stuffing, cloud metadata probes, and API abuse. That is all still important—but AI systems added something different.

I built Honey-AI to study how malicious adversaries attack AI infrastructure.

Attackers are no longer only looking for passwords, shells, databases, or cloud tokens. They are also looking for AI API keys, model endpoints, agent workflows, MCP servers, RAG data, tool permissions, system prompts, vector stores, and hidden automation paths.

The bigger point is this: AI infrastructure should be treated like a production attack surface, not just developer tooling.

Modern AI systems now include API gateways, SDKs, keys, model backends, embeddings, vector stores, plugin manifests, MCP servers, agent instructions, tool outputs, hidden config files, and developer automation. Each of these creates a path for reconnaissance, credential theft, prompt injection, or agent manipulation.

Honey-AI exists to make those paths visible.

Why AI Honeypots Need To Be Different

A traditional honeypot usually answers one basic question:

  • Who is attacking this service?
  • That is useful, but AI infrastructure forces better questions:
    • Are attackers testing stolen OpenAI keys?
    • Are they trying Anthropic, Gemini, Cohere, Azure, or OpenRouter keys?
    • Are they trying to extract system prompts?
    • Are they probing RAG endpoints?
    • Are they enumerating vector stores?
    • Are they discovering MCP tools?
    • Are they submitting malicious tool outputs?
    • Are they using LangChain, LiteLLM, CrewAI, Cursor, Claude Desktop, Copilot, or a custom agent?
    • Are they stealing canary credentials and reusing them somewhere else?
    • Are LLM agents following planted instructions that a human would ignore?

Those questions are hard to answer with a generic web honeypot.

A request to `/v1/chat/completions` with an `Authorization: Bearer sk-proj-*` key is not just a generic POST request. A request to `/v1/vector_stores` may be RAG reconnaissance. A call to `tools/list` on an MCP endpoint may be an agent discovering available tools. A request for `/.cursorrules`, `/CLAUDE.md`, or `/llms.txt` may be an AI coding agent scraping instructions before acting.

Honey-AI is built around those signals. It classifies traffic into categories like:

Screenshot 2026-05-18 at 9.37.45 AM

The classifier is rule-based and fast. A richer LLM analysis can run in the background afterward.

The important point is that Honey-AI does not treat every request as generic web noise. It treats AI-specific behavior as the signal.

What Honey-AI Pretends To Be

Screenshot 2026-05-18 at 9.37.57 AM

It exposes endpoints that resemble real AI infrastructure across multiple ecosystems. For OpenAI-compatible traffic, it supports endpoints like:

Screenshot 2026-05-18 at 9.38.08 AM

For Anthropic-style traffic, it handles Claude-style endpoints like:

Screenshot 2026-05-18 at 9.38.16 AM

It also mimics Azure OpenAI paths, Cohere chat and rerank endpoints, Gemini generation and embedding endpoints, audio endpoints, vector database APIs, and MCP JSON-RPC calls.

That breadth matters.

Attackers usually do not know exactly what they are hitting at first. They enumerate. They test multiple API shapes. They reuse tools. They validate keys across providers. A realistic AI honeypot needs to look broad enough to keep that behavior going.

Honey-AI also exposes files that AI agents and coding tools are likely to fetch:

Screenshot 2026-05-18 at 9.38.27 AM

These are not random paths. They are the kinds of files modern AI coding agents, plugin scanners, repo crawlers, and automated recon tools may inspect before deciding what to do next.

That turns the honeypot into a sensor for agent behavior, not only human attacker behavior.

Credential Stuffing Against AI APIs

One of the clearest threats Honey-AI captures is stolen key testing.

Attackers get API keys from GitHub leaks, exposed `.env` files, logs, compromised developer machines, Slack dumps, CI/CD systems, pasted configs, and prompt leakage. Once they have a key, they need to know whether it works.

Honey-AI is designed to detect attempts to validate those keys across AI providers.

Examples include:

Screenshot 2026-05-18 at 9.38.35 AM

Screenshot 2026-05-18 at 9.38.43 AM

A single request to `/v1/models` may not look dramatic. But when it includes a stolen-looking key, SDK headers, and repeated probing, it becomes credential validation.

Honey-AI also captures SDK fingerprinting headers.

OpenAI SDKs often include `x-stainless-*` headers. Those headers can reveal the SDK language, SDK version, operating system, runtime, and client behavior.

That helps answer an important question: Is this just curl noise, or is someone testing AI credentials with a specific SDK or framework?

That distinction matters.

Prompt Injection And System Prompt Harvesting

AI attackers do not only steal keys. They also try to manipulate model behavior. Honey-AI captures common prompt extraction and jailbreak attempts, including:

Screenshot 2026-05-18 at 9.38.52 AM

The interesting part is not only that these strings appear. The interesting part is where they appear.

A direct prompt injection attempt inside `/v1/chat/completions` is one thing.

An injection embedded inside an MCP resource, a fake file, an assistant thread, or a submitted tool result is more interesting. That is where agentic systems become risky.

Agent workflows often treat tool output as trusted context. If an attacker can control a tool response, retrieved document, file read, or MCP resource, they may be able to inject instructions into the model's working context.

Honey-AI is designed to observe those surfaces. It captures assistant runs, thread messages, tool output submissions, MCP resource reads, and polling behavior. This makes it possible to study how attackers target the control flow around the model, not just the prompt sent directly to it.

MCP As A New Attack Surface

One of the most important parts of Honey-AI is the MCP honeypot.

Model Context Protocol is becoming a way for AI agents to discover tools, access resources, call functions, read files, query databases, and interact with external systems.

That makes MCP very interesting from a security perspective. Honey-AI exposes an MCP endpoint that simulates an internal company MCP server. It supports discovery and JSON-RPC style methods like:

Screenshot 2026-05-18 at 9.39.01 AM

This lets Honey-AI observe MCP-specific behavior, including:

Screenshot 2026-05-18 at 10.15.58 AM

For example, a request to `tools/list` may show that a client is discovering available tools.

A later `tools/call` to `read_file`, `execute_sql`, `run_shell`, `get_secrets`, `http_request`, or `send_email` shows intent.

A `resources/read` request for `file:///app/.env` or `database://production/users` shows even more.

This is where Honey-AI becomes more than a fake API. It becomes a way to study how agentic clients behave when they see realistic internal tools and resources.

That is where the research gets useful.

Layered Prompt Injection Payloads

Honey-AI includes a layered injection system designed to separate human reviewers from LLM agents.

The project defines three configurable payload layers:

Screenshot 2026-05-18 at 10.15.28 AM

The idea is simple. A human may view a response and ignore hidden or irrelevant instructions. An LLM agent may ingest that same content into context and act on it.

Honey-AI plants directives into places agents are likely to parse, such as model descriptions, error bodies, assistant replies, or decoy resources. If the agent follows those instructions and hits a decoy endpoint, that becomes a strong signal that an automated AI system parsed and acted on the lure.

The decoy endpoints include:

Screenshot 2026-05-18 at 9.39.22 AM

Hits to these endpoints are high-confidence events. This is one of the most useful ideas in the project.

Do not only wait for attackers to send obvious malicious payloads. Plant controlled signals and observe whether agents follow them. That makes Honey-AI useful for studying agent behavior, not just attacker behavior.

Canary Credentials And Confirmed Theft

A common problem with honeypots is that they capture access attempts but cannot always prove theft. Honey-AI tries to close that loop with canary tracking.

The honeypot can issue fake credentials through realistic endpoints and files, such as:

Screenshot 2026-05-18 at 10.14.58 AM

Those credentials are recorded. If the same key later appears in a request, Honey-AI can flag it as confirmed reuse.

That is a stronger signal than "someone viewed a fake key."

It becomes: Someone retrieved a fake key and then attempted to use it.

Honey-AI also supports beacon tokens. A unique URL can be embedded in a fake credential or config response. If an attacker or their tooling later requests that beacon URL, Honey-AI records the source IP, user agent, and time since issuance.

This turns passive deception into measurable telemetry.

For defenders, that matters because it helps separate curiosity, scanning, and real credential theft behavior.

SSH Deception And Cross-Lure Correlation

Honey-AI also includes an SSH honeypot on port 2222. That may sound unrelated to AI APIs, but it is useful when combined with AI lures.

Attackers often pivot across exposed services. A source IP may first hit `/login`, then `/v1/models`, then `/mcp`, then SSH.

Honey-AI can stitch those behaviors into a single attacker profile.

The SSH service captures:

Screenshot 2026-05-18 at 9.39.36 AM

It also supports realistic shell outputs for commands like:

Screenshot 2026-05-18 at 10.14.23 AM

When an LLM backend is available, the shell can produce coherent responses. That keeps the session believable longer and may reveal more tradecraft.

The SSH honeypot can also include planted canaries in shell history and `.bashrc`, tying SSH activity back into the broader Honey-AI deception system.

That matters because attackers do not think in terms of your product categories. They probe whatever looks useful.

Honey-AI treats HTTP, AI APIs, MCP, and SSH as related signals.

From Logs To Tradecraft

Honey-AI does not just log raw requests. It tries to extract tradecraft.

For each suspicious request, it can identify fields such as:

Screenshot 2026-05-18 at 9.39.49 AM

That means a request can be turned into something more useful than:

Screenshot 2026-05-18 at 9.39.58 AM

Instead, the defender can see:

Screenshot 2026-05-18 at 10.13.35 AM

That is the difference between basic logging and security research output.

Honey-AI can also include enrichment from AbuseIPDB, GreyNoise, and Shodan when configured. That allows source IPs to be enriched with abuse scores, scanner labels, open ports, known CVEs, hostnames, organizations, Tor status, or known benign infrastructure tags.

Used carefully, this helps defenders avoid overreacting to commodity scanning while still flagging high-signal behavior.

Why This Matters

AI security is often discussed in abstract terms: prompt injection, jailbreaks, RAG poisoning, model abuse, and agent risk. Defenders still need practical telemetry.

They need to know what attackers are actually doing against exposed AI infrastructure. Honey-AI is an attempt to build that visibility.

It watches for stolen key testing, fake provider enumeration, MCP probing, RAG endpoint mapping, vector database abuse, prompt harvesting, tool-call injection, canary reuse, agent discovery, and cross-service attacker behavior.

That makes it useful for:

Screenshot 2026-05-18 at 9.40.25 AM

The most important shift is this:

  • AI honeypots should not only mimic chat endpoints. They should mimic the surrounding ecosystem that agents and attackers depend on.
  • That ecosystem includes keys, tools, files, configs, plugins, MCP servers, vector stores, assistant runs, shell access, SDK fingerprints, and agent instructions.
  • That is where the attack surface is moving.

Honey-AI is built to answer questions like:

Screenshot 2026-05-18 at 10.12.45 AM

That is exactly what makes Honey-AI so valuable.

It isn't just a honeypot; it’s a dedicated engine for capturing AI-specific attack telemetry.

In Part 2, which we will publish soon, we’ll show you the results, diving into the real-world attacks we have seen daily against Honey-AI, ranging from automated scanners hunting for AI API keys to human adversaries manually probing Model Context Protocol (MCP) endpoints.

We are gleaning new techniques and tactics every few minutes, observing threats as they unfold in real time. Furthermore, the platform features a "rewind" capability, allowing us to reconstruct exactly how an attacker behaves once they believe they’ve successfully compromised an AI system.

State of Pentesting Report 2026 Call to Action

Back to Blog
About Adam Toscher
Adam Toscher is a seasoned Cybersecurity Professional, Offensive Security Engineer, and Red Team Operator with over 20 years of experience across IT and cybersecurity, specializing in identifying critical vulnerabilities and developing advanced, actionable defensive strategies for Fortune 500 companies and government agencies. A recognized authority, he's best known for coining "Top 5 Ways To Get Domain Admin Before Lunch" and is a sought-after international speaker and prolific author of cornerstone cybersecurity blogs, consistently ranking at the top for red teaming and pentesting. More By Adam Toscher