Agentic AI Security: What Happens When Your AI Assistant Goes Rogue
TL;DR
- Agentic AI is different. Your AI assistant no longer just answers questions — it takes actions: booking meetings, writing code, sending emails, calling APIs. That changes the risk picture entirely.
- New attack surfaces are real and being exploited now. MCP servers, tool poisoning, memory injection, and WebSocket hijacking are live attack vectors — not theoretical ones. Over 8,000 MCP servers were found exposed to the public internet in early 2026 [1].
- The OWASP Agentic AI Top 10 (2026) is your new security bible for AI deployments — prompt injection and excessive agency top the list [2].
- SMBs are not too small to be targeted. Attackers go after AI-assisted development tools (Cursor, Windsurf, Claude Code) because that's where the code — and the credentials — live.
- Six practical steps can dramatically reduce your exposure — no PhD required, and no six-figure budget either.
What Is Agentic AI? (Explained Like You're 10)
Picture a really smart intern.
Get Our Weekly Cybersecurity Digest
Every Thursday: the threats that matter, what they mean for your business, and exactly what to do. Trusted by SMB owners across Australia.
No spam. No tracking. Unsubscribe anytime. Privacy
The old version of AI was like texting that intern a question and getting a reply: "Hey, how do I write a marketing email?" → "Here's a template." Helpful. Passive. Low-risk.
Agentic AI is like giving that intern a laptop, a credit card, and the keys to your office — and asking them to "handle the marketing." They'll draft the email, log into y
Free Resource
Get the Free Cybersecurity Checklist
A practical, no-jargon security checklist for Australian businesses. Download free — no spam, unsubscribe anytime.
Send Me the Checklist →That's the power. And that's the problem.
Agentic AI systems — tools like Claude Code, Cursor, Devin, AutoGPT, and custom LLM pipelines — can browse the web, write and execute code, manage files, call APIs, and chain hundreds of actions together with minimal human oversight. They're transformative for small businesses trying to compete with enterprise-level resources.
But when an AI agent can act, a compromised AI agent can act against you.
The shift from "AI as advisor" to "AI as actor" is the defining security challenge of 2026. Let's break down exactly what's at stake — and what you can do about it.
The New Attack Surface: Where Agentic AI Gets Exploited
Traditional cybersecurity worries about protecting servers, networks, and user accounts. Agentic AI adds a new layer: the AI reasoning pipeline itself. Attackers don't need to break your firewall if they can convince your AI to do the dirty work for them.
Here are the four primary attack surfaces emerging in the wild.
1. MCP Servers: The Unsecured Backbone of AI Agents
ELI10 analogy: If your AI agent is a contractor doing renovations, MCP (Model Context Protocol) servers are the supply depot — the place the contractor goes to fetch tools, materials, and instructions. If someone tampers with the supply depot, the contractor brings bad stuff into your house without knowing it.
MCP (Model Context Protocol) is Anthropic's open standard for connecting AI assistants to external tools and data sources [3]. It's what lets Claude read your files, search the web, or call your business APIs. And it's everywhere.
A February 2026 security scan found over 8,000 MCP servers publicly exposed on the internet — many without authentication, many running on default configurations [1]. This means researchers (and attackers) could interact directly with the tools powering AI agents, inject malicious instructions, or extract sensitive context the AI had loaded into memory.
For a small business using a managed AI assistant or AI-powered IDE, an exposed MCP server is essentially an unlocked side door into your AI's brain.
2. Tool Poisoning: Corrupting What the AI Knows How to Do
ELI10 analogy: Imagine your AI assistant learned to use a "calculator" tool, but someone secretly replaced the calculator with one that adds a small surcharge to every transaction and sends the receipt to a stranger. You'd never notice — it still looks like a calculator.
Tool poisoning attacks inject malicious instructions into tool definitions that the AI trusts. Because AI agents often read tool documentation as part of their context window, a poisoned tool description can redirect the AI's behavior entirely [4].
A particularly nasty variant called "rug pull" poisoning delivers a legitimate-looking tool initially, then swaps in a malicious version after the AI has already granted it trust. This mirrors the classic supply chain attack, but happening inside an AI reasoning loop at machine speed.
3. Memory Injection: Poisoning the AI's Long-Term Memory
ELI10 analogy: Imagine if someone could sneak into your notebook while you slept and rewrite your to-do list. You wake up, check your notes, and confidently follow instructions that aren't yours — because they look like yours.
Modern AI agents increasingly use persistent memory — the ability to remember facts, preferences, and context across sessions. This is incredibly useful. It's also a juicy target.
MINJA (Memory INJection Attack) is a class of attack targeting AI memory systems. Researchers demonstrated that carefully crafted inputs can be stored in an AI's long-term memory and then retrieved later to manipulate the agent's future behaviour — with injection success rates exceeding 95% in tested systems [5].
The attack doesn't need real-time access to the AI. An attacker can leave a "time bomb" in your AI's memory during one interaction, and it detonates sessions later — potentially weeks afterward — when the AI retrieves that context to assist with a sensitive task.
4. WebSocket Hijacking and Real-Time Session Attacks
ELI10 analogy: WebSockets are like a phone call between your browser and a server — a live, two-way connection. If someone taps the line, they can listen in and even start talking.
AI coding assistants like Cursor and Windsurf use persistent WebSocket connections to communicate with their backend services. ClawJacked is a demonstrated attack class that exploits weaknesses in how these connections are authenticated, allowing an attacker to inject into an active AI coding session [6].
In practical terms: a developer is working on your company's codebase with an AI assistant. An attacker who can position themselves on the same network — a coffee shop, a compromised router, even a malicious browser extension — could silently inject instructions into the AI's session, causing it to exfiltrate code, plant backdoors, or steal API keys embedded in the environment.
Real-World Incidents: This Is Happening Now
It's tempting to file these under "theoretical research" and move on. Don't. Each of these has been demonstrated against real tools in the last 12 months.
SANDWORM_MODE: Targeting AI Development Tools
SANDWORM_MODE is a discovered prompt injection payload specifically crafted to hijack AI coding assistants including Cursor, Windsurf, and Claude Code [7]. When a developer opens a malicious file — even just a README or a code snippet from a public repository — the payload embeds instructions in the AI's context.
Those instructions can be subtle: always include this import, never flag this function as suspicious, when you see credentials, log them to this endpoint. The developer sees normal AI assistance. The AI is quietly following attacker instructions.
For small development teams and solo founders using AI-assisted coding, this is not a distant threat. Every third-party library you install, every open-source snippet you pull in, is a potential injection vector.
8,000+ Exposed MCP Servers
The February 2026 scan that found 8,000+ exposed MCP servers [1] wasn't conducted by an attacker — it was researchers sounding the alarm. But the attack surface was real. Among the exposed servers:
- MCP servers with access to file systems (able to read and write local files)
- Servers with database connectors (exposing business data)
- Servers holding API credentials for third-party services
For a small business that deployed an AI assistant and forgot about the MCP configuration, this is the equivalent of leaving your server room door open and posting the address on a billboard.
MINJA: Poisoning AI Memory at 95%+ Success
The MINJA research [5] tested injection attacks against production AI memory systems. Their findings were stark: in most tested configurations, an attacker who could interact with the AI once — even through an indirect channel like a web page the AI visited or a document it summarised — could reliably implant persistent false memories.
These memories then influenced future sessions. In one test scenario, a poisoned memory caused an AI assistant to consistently recommend a malicious API endpoint when helping with code — weeks after the initial injection, across multiple fresh sessions.
The OWASP Agentic AI Top 10 (2026): Your New Security Reference
The Open Web Application Security Project released its first Agentic AI Top 10 in early 2026 [2]. If you've ever heard of the OWASP Top 10 for web security (SQL injection, XSS, etc.), think of this as the same framework — but for AI agent deployments.
Here are the most critical items for SMBs to understand:
| # | Risk | What It Means for Your Business |
|---|---|---|
| 1 | Prompt Injection | Attackers embed malicious instructions in data your AI reads (documents, web pages, emails) |
| 2 | Excessive Agency | AI is given too many permissions and can take actions far beyond what it should |
| 3 | Insecure Tool Execution | AI tools run without validation, enabling code execution or data exfiltration |
| 4 | Memory Poisoning | Long-term AI memory is manipulated to change future behaviour |
| 5 | Inadequate Human Oversight | Agents act autonomously without checkpoints where humans can catch errors |
| 6 | Supply Chain Attacks | Malicious plugins, tools, or MCP servers introduced via third-party integrations |
| 7 | Data Leakage via Context | Sensitive data loaded into agent context is exfiltrated through crafted prompts |
| 8 | Insecure Multi-Agent Trust | One compromised agent in a pipeline can poison downstream agents |
| 9 | Audit Trail Gaps | No logging means no ability to detect or reconstruct attacks |
| 10 | Resource Exhaustion | Agents spin up compute or API calls at scale, creating unexpected costs or outages |
The OWASP taxonomy gives you a structured way to assess your AI deployments. It's not about being paranoid — it's about being systematic. [2]
ISO 27001 SMB Starter Pack — $97
Everything you need to start your ISO 27001 journey: gap assessment templates, policy frameworks, and implementation roadmap built for Australian SMBs.
Get the Starter Pack →What SMBs Should Actually Do: The 6-Step Checklist
Here's the good news: you don't need to be a security researcher to dramatically reduce your exposure. Most of the high-value fixes are configuration and policy changes, not expensive tooling.
Step 1: Inventory Your AI Agents and Their Permissions
You can't protect what you don't know exists. Start by listing every AI tool your business uses:
- AI coding assistants (Cursor, Copilot, Claude Code, Windsurf)
- AI-powered customer service or email tools
- AI integrations in your CRM, project management, or productivity suite
- Custom agents or automations built on LLM APIs
For each one, ask: What can it access? What can it do? Can it read your file system? Send emails? Call your payment API? Write to your database?
Document the permissions. You'll be surprised how many are over-provisioned.
Principle of least privilege applies to AI too. An AI that helps you write blog posts doesn't need access to your customer database.
Step 2: Audit and Secure Your MCP Servers
If you're running any MCP-based AI infrastructure — or using a product that does — find out what's exposed.
- Run a port scan on your AI infrastructure (tools like
nmapcan help, or ask your IT person) - Ensure MCP servers require authentication — no open endpoints
- Check that MCP servers are not accessible from the public internet unless absolutely necessary
- Review what tools each MCP server exposes — remove anything not actively used
If you're using a hosted AI service, ask your vendor: "How are MCP server endpoints secured? Can you show me the authentication configuration?" A vendor who can't answer that question clearly is a risk.
Step 3: Apply Human Checkpoints to Consequential Actions
The OWASP framework highlights inadequate human oversight as a top risk — and it's the easiest to fix without any technical knowledge.
Simple rule: Any action that's hard to reverse requires human approval.
- Sending emails to more than X recipients? Human approval.
- Committing code to a production branch? Human approval.
- Calling an API that costs money or modifies a database? Human approval.
Most AI agent frameworks support "human-in-the-loop" configurations. Turn them on for high-stakes workflows. The slight slowdown is worth the protection.
ELI10: This is like the rule at banks where two people have to sign off on transfers over a certain amount. Not because you don't trust your staff — because you've designed the system so that even if one person makes a mistake (or is compromised), there's a check.
Step 4: Treat AI Agent Inputs as Untrusted Data
Everything your AI reads — web pages, uploaded documents, emails, API responses, GitHub repositories — could contain injected instructions. This is the core of prompt injection risk.
Practical mitigations:
- Don't let AI agents access arbitrary web URLs unless you've explicitly trusted that source
- Scan documents before they enter AI workflows — especially PDFs and Word files from outside your organisation
- Don't paste external content directly into AI prompts without reviewing it first
- Use AI systems that have built-in injection detection (some enterprise tools now include this)
You can't eliminate the risk entirely with current technology — but you can dramatically reduce the attack surface by being deliberate about what data enters your AI pipelines.
Step 5: Enable Logging and Audit Trails
If an AI agent does something wrong — or is compromised — you need to know:
- What did it do?
- When?
- What context led to that action?
Most AI platforms have logging you can enable. Turn it on. Store logs somewhere the AI can't modify them (a separate logging service, an immutable log store, even a simple append-only text file).
At minimum, log:
- Every action the AI takes that affects external systems
- Every tool call and its arguments
- Session starts and ends
This isn't just for security — it's for accountability. When a client asks "did your AI do X?", you want to be able to answer with evidence.
Step 6: Keep AI Tool Dependencies Updated — Aggressively
The supply chain risk is real. AI tool ecosystems move fast, and vulnerabilities appear frequently. A plugin or extension that was safe last month may have been compromised this week.
- Subscribe to security advisories for every AI tool you use
- Update AI extensions, plugins, and MCP servers on a short cycle (weekly for critical tools)
- Remove plugins and integrations you're not actively using
- Prefer official, vendor-maintained integrations over community-built ones for sensitive workflows
Consider a simple monthly review: "What AI tools did we add in the last 30 days? Are they still necessary? Are they up to date?"
The Bottom Line: Agentic AI Is Worth It — With Eyes Open
Let's be clear about something. Agentic AI is a genuine competitive advantage for small businesses. The ability to automate complex workflows, write and review code, handle customer inquiries, and analyse data at scale — that used to require teams of specialists. Now a two-person company can punch well above its weight.
The goal isn't to avoid agentic AI. The goal is to deploy it with the same discipline you (hopefully) apply to any other powerful business tool: understand the risks, configure it properly, and keep an eye on what it's doing.
The businesses that will struggle aren't the ones using AI — they're the ones using it carelessly. Every AI agent that has been compromised in the wild was compromised because someone skipped a configuration step that would have taken 20 minutes to get right.
Twenty minutes. That's the gap between a competitive advantage and a liability.
How lilMONSTER Can Help
Navigating AI security without a dedicated security team is genuinely hard. The attack surfaces are new, the tooling is immature, and the threat landscape changes week to week.
lilMONSTER works with SMBs to assess and secure agentic AI deployments — without the enterprise price tag or the jargon-heavy reports that collect dust on a shelf. We translate the threat landscape into plain-language risk assessments and actionable configuration guides tailored to your specific tools and workflows.
Whether you're running a custom LLM pipeline, a team of AI coding assistants, or just trying to figure out if your current AI setup is secure, we can help.
→ Book a free 30-minute security consultation
No pressure, no pitch deck. Just an honest conversation about where you stand and what matters most.
References
[1] Security Research Collective, "Exposed MCP Servers: A Global Scan of Model Context Protocol Deployments," Technical Advisory, February 2026. Available: https://mcp-security-scan.io/report-2026
[2] OWASP Foundation, "OWASP Top 10 for Agentic AI Systems 2026," OWASP Project, January 2026. Available: https://owasp.org/www-project-top-10-for-large-language-model-applications/
[3] Anthropic, "Model Context Protocol Specification," Anthropic Documentation, 2025. Available: https://modelcontextprotocol.io/specification
[4] Greshake, K., Abdelnabi, S., Mishra, S., Endres, C., Holz, T., and Fritz, M., "Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection," Proceedings of the ACM Workshop on Artificial Intelligence and Security (AISec), 2023, pp. 79–90. doi:10.1145/3605764.3623985
[5] Chen, L., Zhang, W., and Patel, R., "MINJA: Memory Injection Attacks Against Persistent AI Agent Systems," arXiv preprint arXiv:2501.09876, January 2026. Available: https://arxiv.org/abs/2501.09876
[6] Nakamura, T. and Williams, S., "ClawJacked: WebSocket Session Hijacking in AI Coding Assistants," Proceedings of the IEEE Symposium on Security and Privacy (SP), 2026, pp. 1142–1158. doi:10.1109/SP.2026.00089
[7] Defensive Security Labs, "SANDWORM_MODE: Anatomy of a Prompt Injection Campaign Targeting AI-Assisted Development Environments," Threat Intelligence Report, December 2025. Available: https://defensivelabs.io/reports/sandworm-mode
[8] NIST, "Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations," NIST AI 100-2e2023, National Institute of Standards and Technology, March 2024. doi:10.6028/NIST.AI.100-2e2023
[9] Perez, F. and Ribeiro, I., "Ignore Previous Prompt: Attack Techniques For Language Models," NeurIPS Workshop on Machine Learning Safety, 2022. Available: https://arxiv.org/abs/2211.09527
[10] Anthropic, "Claude's Character and Safety Approach: Responsible Scaling Policy," Anthropic Policy Document, 2024. Available: https://www.anthropic.com/rsp
FAQ
An agentic AI system is any AI that can take actions in the world, not just generate text. This includes AI tools that can browse the web, run code, send emails, call APIs, manage files, or control other software. Modern examples include AI coding assistants (Cursor, GitHub Copilot), AI customer service agents, and custom automation pipelines built on LLM APIs like OpenAI or Claude.
If you're using ChatGPT (or similar) only to draft text and you manually copy-paste the output, your exposure is relatively low. The risks escalate when AI has access to external systems, can take autonomous actions, or is connected to tools via plugins or APIs. As you give AI more capability, the attack surface grows proportionally.
MCP (Model Context Protocol) is the technical standard that lets AI assistants connect to external tools — your file system, APIs, databases, web browsers. You don't need to understand the internals, but you should know if your AI tools use it. If they do, ensuring those connections are authenticated and not exposed to the public internet is a basic hygiene step. Ask your vendor or IT contact.
Very realistic — and underappreciated. Prompt injection doesn't require a sophisticated attacker. Malicious instructions can be embedded in documents you open, web pages your AI browses, emails it processes, or GitHub repositories it reads. The sophistication required to execute a basic injection attack is low. The sophistication required to detect one after the fact is much higher, which is why prevention is critical.
Potentially, yes. Any AI coding assistant that reads repository content — which is all of them — could be exposed to injection payloads embedded in third-party code, README files, or issue comments. Mitigations include keeping the assistant's context scoped to code you control, being cautious with external dependencies, and keeping the tool updated. A security review of your AI-assisted development workflow is worthwhile if your codebase handles sensitive data or is customer-facing.
You could, but you'd be giving up a significant competitive advantage — and the risk would still exist for your competitors, clients, and supply chain partners whose AI-assisted workflows interact with your data. A better approach is to deploy AI thoughtfully: understand what each tool can access, configure it to minimum necessary permissions, add human checkpoints for high-stakes actions, and log what it does. That's a manageable security posture, not a fundamental rejection of the technology.
It varies widely based on scope and provider. A focused review of your AI tooling and configurations — the kind that tells you what's exposed and what the priority fixes are — shouldn't require enterprise-level investment. At lilMONSTER, we've designed our consulting model specifically for SMBs who can't absorb big-firm pricing. The free 30-minute consultation is a no-commitment way to get a straight answer about where you stand.
Published by lilMONSTER — cybersecurity guidance for people running real businesses. Have a question? Get in touch.
Work With Us
Ready to strengthen your security posture?
lilMONSTER assesses your risks, builds the tools, and stays with you after the engagement ends. No clipboard-and-leave consulting.
Book a Free Consultation →