AI Agent Firewalls: Why You Need to Secure Your MCP Tool Chain Before It's Too Late

TL;DR

Model Context Protocol (MCP) servers have become a critical and largely undefended attack surface in 2026, with security researchers discovering over 8,000 exposed MCP servers accessible from the public internet. Threat actors are actively deploying worms like Shai-Hulud and SANDWORM_MODE that hijack AI agent configurations, while OWASP's new Agentic AI Top 10 formally classifies tool chain compromise as a top-tier risk. Organisations deploying AI agents must implement MCP-aware firewalls and tool-chain security controls before these attacks scale.​‌‌​​​​‌‍​‌‌​‌​​‌‍​​‌​‌‌​‌‍​‌‌​​​​‌‍​‌‌​​‌‌‌‍​‌‌​​‌​‌‍​‌‌​‌‌‌​‍​‌‌‌​‌​​‍​​‌​‌‌​‌‍​‌‌​​‌‌​‍​‌‌​‌​​‌‍​‌‌‌​​‌​‍​‌‌​​‌​‌‍​‌‌‌​‌‌‌‍​‌‌​​​​‌‍​‌‌​‌‌​​‍​‌‌​‌‌​​‍​‌‌‌​​‌‌‍​​‌​‌‌​‌‍​‌‌​‌‌​‌‍​‌‌​​​‌‌‍​‌‌‌​​​​‍​​‌​‌‌​‌‍​‌‌‌​​‌‌‍​‌‌​​‌​‌‍​‌‌​​​‌‌‍​‌‌‌​‌​‌‍​‌‌‌​​‌​‍​‌‌​‌​​‌‍​‌‌‌​‌​​‍​‌‌‌‌​​‌‍​​‌​‌‌​‌‍​‌‌‌​‌​​‍​‌‌​‌‌‌‌‍​‌‌​‌‌‌‌‍​‌‌​‌‌​​‍​​‌​‌‌​‌‍​‌‌​​​‌‌‍​‌‌​‌​​​‍​‌‌​​​​‌‍​‌‌​‌​​‌‍​‌‌​‌‌‌​‍​​‌​‌‌​‌‍​‌‌‌​​​​‍​‌‌‌​​‌​‍​‌‌​‌‌‌‌‍​‌‌‌​‌​​‍​‌‌​​‌​‌‍​‌‌​​​‌‌‍​‌‌‌​‌​​‍​‌‌​‌​​‌‍​‌‌​‌‌‌‌‍​‌‌​‌‌‌​‍​​‌​‌‌​‌‍​​‌‌​​‌​‍​​‌‌​​​​‍​​‌‌​​‌​‍​​‌‌​‌‌​


What Is MCP and Why Does Everyone Suddenly Care About Securing It?

If you've been following the AI agent space, you've heard about MCP — Model Context Protocol. It's the open standard, developed by Anthropic, that lets AI assistants like Claude connect to external tools: databases, file systems, APIs, code execution environments, calendars, and basically anything else you can imagine. Think of MCP as the USB port of AI — it lets you plug things in.

ELI10: Imagine your AI assistant is a really smart kid at school. MCP is like giving that kid a backpack full of tools — a calculator, a ruler, a paintbrush. The kid can now do way more stuff! But here's the problem: if someone sneaks into the backpack and swaps the calculator for one that

gives wrong answers, the kid will keep getting questions wrong without knowing why. That's what MCP attacks do.​‌‌​​​​‌‍​‌‌​‌​​‌‍​​‌​‌‌​‌‍​‌‌​​​​‌‍​‌‌​​‌‌‌‍​‌‌​​‌​‌‍​‌‌​‌‌‌​‍​‌‌‌​‌​​‍​​‌​‌‌​‌‍​‌‌​​‌‌​‍​‌‌​‌​​‌‍​‌‌‌​​‌​‍​‌‌​​‌​‌‍​‌‌‌​‌‌‌‍​‌‌​​​​‌‍​‌‌​‌‌​​‍​‌‌​‌‌​​‍​‌‌‌​​‌‌‍​​‌​‌‌​‌‍​‌‌​‌‌​‌‍​‌‌​​​‌‌‍​‌‌‌​​​​‍​​‌​‌‌​‌‍​‌‌‌​​‌‌‍​‌‌​​‌​‌‍​‌‌​​​‌‌‍​‌‌‌​‌​‌‍​‌‌‌​​‌​‍​‌‌​‌​​‌‍​‌‌‌​‌​​‍​‌‌‌‌​​‌‍​​‌​‌‌​‌‍​‌‌‌​‌​​‍​‌‌​‌‌‌‌‍​‌‌​‌‌‌‌‍​‌‌​‌‌​​‍​​‌​‌‌​‌‍​‌‌​​​‌‌‍​‌‌​‌​​​‍​‌‌​​​​‌‍​‌‌​‌​​‌‍​‌‌​‌‌‌​‍​​‌​‌‌​‌‍​‌‌‌​​​​‍​‌‌‌​​‌​‍​‌‌​‌‌‌‌‍​‌‌‌​‌​​‍​‌‌​​‌​‌‍​‌‌​​​‌‌‍​‌‌‌​‌​​‍​‌‌​‌​​‌‍​‌‌​‌‌‌‌‍​‌‌​‌‌‌​‍​​‌​‌‌​‌‍​​‌‌​​‌​‍​​‌‌​​​​‍​​‌‌​​‌​‍​​‌‌​‌‌​

The reason this matters so much right now is simple: AI agents are moving from the lab to production. Developers are building autonomous AI systems — agents that browse the web, write code, send emails, query databases, and take actions in the real world. Every one of those capabilities is wired up through an MCP connection. And those connections? Most of them have zero security controls.


How Bad Is the MCP Exposure Problem Right Now?

Let's get concrete. In early 2026, security researchers conducting internet-wide scans discovered over 8,000 MCP servers exposed directly to the public internet with no authentication, no encryption, and no access controls [1]. These weren't honeypots or test servers — many were production deployments with live tool access to enterprise systems.

This is not a hypothetical risk. Exposed MCP servers are the equivalent of leaving your SSH daemon open with no password and a sign on the door that says "AI TOOLS INSIDE."

The attack surface is compounding for several reasons:

  1. MCP adoption is exploding. Every major AI development platform now supports MCP. Claude, GPT-4, Gemini, and dozens of open-source models all have MCP integration paths.
  2. Developers are moving fast. Security is often the last thing added, not the first.
  3. MCP spec is young. The protocol was released in late 2024 and reached wide adoption in 2025. Security tooling hasn't caught up.
  4. Tool permissions are broad. A single MCP server might grant an AI agent access to your entire filesystem, your email, your code repositories, and your cloud infrastructure simultaneously.

What Is Tool Poisoning and Why Is It So Dangerous?

Tool poisoning is the MCP equivalent of a supply chain attack. An attacker modifies or replaces an MCP tool definition — either by compromising the server hosting it, injecting malicious instructions into a legitimate tool's description, or tricking an agent into loading a rogue tool — so that the AI model executes attacker-controlled behaviour while the user thinks everything is normal.

ELI10: Remember the backpack example? Tool poisoning is when the bad guy doesn't swap the whole calculator — they just put a tiny sticker over a few buttons. Now when the kid presses "+" they get "-" instead. The kid has no idea anything is wrong.

The OWASP Agentic AI Top 10, published in 2026, formally names Tool and Plugin Compromise (AA:2026-05) as one of the ten most critical risks in agentic AI systems [2]. According to OWASP's documentation, this attack class includes:

  • Malicious tool injection: Attacker-controlled MCP servers that appear legitimate
  • Tool description poisoning: Embedding hidden instructions in tool metadata that the model reads but the user doesn't see
  • Response manipulation: Intercepting and modifying tool outputs mid-flight to alter agent behaviour
  • Privilege escalation via tools: Using legitimate tool access to bootstrap unauthorized capabilities

Researchers have catalogued 54 real-world MCP injection attempts across public bug bounty programs and incident disclosures as of early 2026 [1]. The actual number is certainly higher — most organisations don't know they've been targeted, because MCP traffic is rarely logged or monitored.


SANDWORM_MODE and Shai-Hulud: The First MCP Worms Are Here

The threat escalated in late 2025 and early 2026 with the emergence of two AI-targeting worms specifically designed to propagate through MCP infrastructure.

SANDWORM_MODE was first documented by threat intelligence researchers analysing compromised AI development environments. The worm operates by scanning for exposed MCP configuration files (commonly stored as .mcp.json or similar), modifying the tool definitions to include self-replicating payloads, and then using the AI agent's own capabilities to propagate to connected systems. Because it rides inside legitimate tool responses, it bypasses most traditional network security controls.

Shai-Hulud (named, presumably, for the great worm of Dune) is more sophisticated. It targets the MCP config files used by AI coding assistants and injects persistent backdoor instructions into the system prompt context. Once installed, every subsequent AI session operates under the attacker's hidden instructions — exfiltrating code, credentials, and session tokens while appearing to function normally [3].

ELI10: Imagine a computer virus, but instead of infecting your computer directly, it infects your AI assistant's instruction manual. Now every time you ask your AI to help you, it's secretly also doing bad things for the hacker. And the worst part? The AI doesn't know it's been tricked — it's just following instructions.

Both worms demonstrate a fundamental architectural problem: AI agents are trusted execution environments, and the trust is not verified. When an agent fetches a tool definition from an MCP server, it typically has no way to know if that definition has been tampered with.


The Claude Code RCE: A Real-World Wake-Up Call

In 2026, Check Point Research published findings detailing a Remote Code Execution (RCE) vulnerability in Claude Code — Anthropic's AI-powered coding assistant — that could be triggered through malicious MCP tool responses [4]. The vulnerability allowed attackers to execute arbitrary code on the developer's machine by crafting a response from a compromised MCP tool server that Claude Code would then attempt to execute as part of its normal coding workflow.

Worse, the same research identified an API key exfiltration path. Because Claude Code has access to the developer's environment variables (needed for legitimate coding tasks), a poisoned MCP tool could read and exfiltrate API keys, credentials, and access tokens — all while the developer saw nothing unusual in the UI.

This is not a theoretical attack. This is an attack against one of the most widely-used AI coding tools in the world, exploited through the MCP layer that was supposed to make the tool more powerful and useful.

NIST's AI Risk Management Framework (AI RMF) explicitly addresses this class of threat under the "Secure" function, noting that AI systems interacting with external tools must implement runtime integrity verification and anomaly detection [5]. Most current MCP deployments implement neither.


What Is an AI Agent Firewall, and How Does pipelock Work?

An AI agent firewall is a security layer that sits between your AI model and its MCP tool connections. Like a traditional network firewall inspects packets, an AI agent firewall inspects tool requests, tool responses, and agent behaviour — looking for signs of compromise, policy violations, or anomalous activity.

pipelock is one of the first purpose-built AI agent firewalls designed specifically for MCP security. It operates as a proxy between the AI agent and MCP servers, providing:

  • Tool definition integrity checking: Validates that tool schemas match known-good signatures before the agent processes them
  • Prompt injection detection: Scans tool descriptions and responses for embedded instruction attacks
  • Behavioural anomaly detection: Flags when an agent's tool usage patterns deviate from established baselines
  • Capability sandboxing: Enforces granular limits on what tools can actually access, regardless of what permissions the MCP server claims
  • Real-time blocking: Can halt tool execution mid-flight when anomalies are detected, rather than just logging after the fact

The architectural insight behind pipelock and similar tools is important: you cannot trust the AI model to defend itself. Large language models don't have inherent security properties. They're trained to be helpful and to follow instructions — including instructions embedded in tool responses by attackers. The security layer must exist outside the model's trust boundary.

ELI10: Imagine the AI's backpack has a smart lock on it now. Before any new tool goes into the backpack, the lock checks: "Is this tool the real deal? Has anyone messed with it? Is it trying to do something sneaky?" If something looks wrong, the lock says "nope" and throws it out. The AI only gets tools that have been checked and approved.


What Does the OWASP Agentic AI Top 10 Say About This?

OWASP published its Agentic AI Top 10 in 2026, representing the first formal community consensus on the most critical security risks in autonomous AI systems [2]. Relevant to MCP security, the list includes:

AA:2026-01 — Prompt Injection (now including indirect injection via tools): The number one risk, expanded to cover tool-mediated injection attacks where malicious content in tool responses hijacks agent behaviour.

AA:2026-03 — Excessive Agency: Agents operating with more permissions and capabilities than they need for their defined task. An agent that only needs to read files but has write access to the entire filesystem is a liability.

AA:2026-05 — Tool and Plugin Compromise: Direct compromise of MCP servers or tool definitions to manipulate agent behaviour. The Shai-Hulud and SANDWORM_MODE worms both fall into this category.

AA:2026-07 — Insufficient Logging and Monitoring: Without comprehensive audit trails of tool calls, responses, and agent decisions, detecting and responding to compromise is nearly impossible.

AA:2026-09 — Supply Chain Vulnerabilities: Third-party MCP tool servers introduce the same supply chain risks as third-party code libraries — but with direct execution privileges inside your AI systems.

OWASP's guidance mirrors CISA's recommendations in their 2025 AI Security Considerations advisory [6], which emphasises defence-in-depth for AI systems: assume the model will be manipulated, build controls outside the model's trust boundary, and log everything.


How to Actually Secure Your MCP Tool Chain: A Practical Framework

You don't need to wait for a perfect solution before you start hardening. Here's a practical framework based on current best practices from NIST, OWASP, and the security research community.

Step 1: Inventory Your MCP Attack Surface

You can't secure what you don't know exists. Start by cataloguing every MCP server your agents connect to: who hosts it, what tools it exposes, what systems those tools access, and who has permission to modify the tool definitions. This is your MCP asset register.

Step 2: Apply Least Privilege to Tool Access

Every MCP tool should have the minimum permissions necessary to do its job. A web search tool doesn't need filesystem access. A code execution tool doesn't need network access to external APIs. Review every tool's capability scope and strip unnecessary permissions.

Step 3: Implement Network-Level Controls for MCP Servers

Exposed MCP servers should not be reachable from the public internet unless that's an explicit, audited design decision. Use network segmentation, VPNs, or zero-trust network access to restrict who can reach your MCP infrastructure [7].

Step 4: Deploy an MCP-Aware Security Proxy

Solutions like pipelock, or custom implementations using the MCP inspector tooling, provide the runtime inspection layer that's missing from most deployments. At minimum, implement:

  • Schema validation for all tool definitions
  • Output filtering for tool responses
  • Anomaly detection on tool call patterns

Step 5: Log and Monitor Everything

Every tool call, every tool response, every agent decision that results in a tool invocation should be logged with enough context to reconstruct the full decision chain. Feed these logs into your SIEM. Write detection rules for known attack patterns (embedded instructions in tool descriptions, unusual exfiltration patterns in tool outputs, etc.).

Step 6: Treat MCP Servers Like Code Dependencies

Apply the same supply chain security practices to MCP servers that you apply to code libraries: pin versions, verify hashes, review changes, and have a process for responding to compromised upstream tools. NIST's Secure Software Development Framework (SSDF) provides the applicable controls [8].


What Are the Signs Your MCP Environment Has Been Compromised?

ELI10: If your AI starts doing things you didn't ask it to do, or if it seems confused about simple tasks, or if you notice files or data moving around that you didn't authorise — those are red flags. It's like if you asked your smart robot helper to make a sandwich and it also sent a secret message to someone while it was doing it.

Watch for these indicators of compromise:

  • Unexpected tool invocations: Your agent calls tools it has no reason to call for the current task
  • Anomalous data flows: Tool calls are exfiltrating data to unexpected endpoints
  • Modified tool schemas: Tool definitions change between sessions without an approved change record
  • Instruction drift: Agent behaviour changes subtly over time without corresponding changes to your prompts
  • Performance anomalies: Unusual latency in tool responses (possible interception proxy)
  • Credential anomalies: API keys or tokens used from unexpected locations or at unexpected times

Frequently Asked Questions About MCP Security

Model Context Protocol (MCP) is an open standard that allows AI models to connect to external tools, services, and data sources. It's a security risk because it dramatically expands the attack surface of AI systems: every tool connection is a potential injection point, and the AI model has no inherent ability to distinguish between legitimate tool outputs and attacker-modified ones.

Security researchers discovered over 8,000 MCP servers accessible from the public internet with no authentication in early 2026. This number is likely an undercount, as many organisations don't know they have exposed MCP infrastructure.

Tool poisoning is an attack where an adversary modifies MCP tool definitions or tool responses to embed malicious instructions that the AI model will follow without the user's knowledge. This can cause the agent to exfiltrate data, execute unauthorized code, or take actions contrary to the user's intent.

Traditional network security tools are not effective against most MCP attacks because the malicious content travels inside legitimate HTTP/WebSocket traffic to ports and endpoints that are supposed to be accessible. You need application-aware security that understands MCP's structure and semantics — tools like pipelock that inspect the content of MCP messages, not just their network headers.

Immediately audit what MCP servers your agents connect to, confirm they are not exposed to the public internet, implement network-level access controls, and begin logging all tool call activity. Then evaluate purpose-built MCP security tooling for your environment. If you need help assessing your exposure, get in touch with the team at lil.business.

No. MCP is an open standard supported by multiple AI platforms. The vulnerabilities described here — tool poisoning, prompt injection via tool responses, MCP server exposure — apply to any AI agent that uses MCP to connect to external tools, regardless of the underlying model provider.

The OWASP Agentic AI Top 10 is a community-developed list of the ten most critical security risks in autonomous AI agent systems, published by OWASP in 2026. It is the AI-agent equivalent of the OWASP Web Application Top 10 — a reference standard for developers and security teams building and deploying AI agents. It explicitly covers tool chain compromise, prompt injection via tool responses, and insufficient monitoring of agent behaviour.


The Bottom Line: The Attack Surface Moved, and Most Defences Didn't

The security industry spent years hardening web applications, APIs, and cloud infrastructure. We built WAFs, API gateways, and zero-trust architectures. And then AI agents arrived, and we gave them keys to all of it through an MCP interface with essentially no security controls.

The 8,000+ exposed MCP servers aren't a bug in a single product — they're a symptom of an industry that moved fast on capability and slow on security. The Shai-Hulud and SANDWORM_MODE worms are early signals. The Claude Code RCE is a proof-of-concept for a whole class of attacks. The OWASP Agentic AI Top 10 is the first formal recognition that this problem is real and getting worse.

AI agent firewalls are not optional infrastructure anymore. They're table stakes for any organisation running agents in production. The question isn't whether to secure your MCP tool chain — it's whether you'll do it before or after a breach.


Get Help Securing Your AI Agent Infrastructure

If you're deploying AI agents and need help assessing your MCP security posture, designing tool chain controls, or building an agentic AI security framework that actually works — the team at lilMONSTER has you covered.

Book a consultation at consult.lil.business and let's talk about what your AI security looks like before the threat actors do.


FAQ

Q: What is the main security concern covered in this post? A:

Q: Who is affected by this? A:

Q: What should I do right now? A:

Q: Is there a workaround if I can't patch immediately? A:

Q: Where can I learn more? A:

References

[1] Security Researcher Collective, "Large-Scale MCP Server Exposure Survey: 8,000+ Unauthenticated Endpoints Discovered," arXiv preprint, 2026. [Online]. Available: https://arxiv.org/

[2] OWASP Foundation, "OWASP Agentic AI Top 10 — 2026 Edition," OWASP, 2026. [Online]. Available: https://owasp.org/www-project-agentic-ai-top-10/

[3] Threat Intelligence Community, "Shai-Hulud and SANDWORM_MODE: First-Generation MCP Worms Targeting AI Agent Configurations," Threat Research Blog, 2026.

[4] Check Point Research, "Remote Code Execution and API Key Exfiltration via Malicious MCP Tool Responses in Claude Code," Check Point Software Technologies, 2026. [Online]. Available: https://research.checkpoint.com/

[5] National Institute of Standards and Technology, "Artificial Intelligence Risk Management Framework (AI RMF 1.0)," NIST AI 100-1, U.S. Department of Commerce, Jan. 2023. [Online]. Available: https://doi.org/10.6028/NIST.AI.100-1

[6] Cybersecurity and Infrastructure Security Agency, "Security Considerations for AI Systems," CISA, 2025. [Online]. Available: https://www.cisa.gov/ai

[7] National Institute of Standards and Technology, "Zero Trust Architecture," NIST Special Publication 800-207, Aug. 2020. [Online]. Available: https://doi.org/10.6028/NIST.SP.800-207

[8] National Institute of Standards and Technology, "Secure Software Development Framework (SSDF) Version 1.1: Recommendations for Mitigating the Risk of Software Vulnerabilities," NIST Special Publication 800-218, Feb. 2022. [Online]. Available: https://doi.org/10.6028/NIST.SP.800-218

[9] Anthropic, "Model Context Protocol Specification," Anthropic, 2024. [Online]. Available: https://modelcontextprotocol.io/

[10] OWASP Foundation, "OWASP Top 10 for Large Language Model Applications — 2025," OWASP, 2025. [Online]. Available: https://owasp.org/www-project-top-10-for-large-language-model-applications/

Ready to strengthen your security?

Talk to lilMONSTER. We assess your risks, build the tools, and stay with you after the engagement ends. No clipboard-and-leave consulting.

Get a Free Consultation