Does prompt caching work automatically in Zapier, Make, or n8n?

Partly. OpenAI caching is automatic and needs no parameter, so it fires through any node as long as the prompt prefix is byte-identical across calls. Anthropic caching requires an explicit cache_control breakpoint, which the native AI nodes in those tools generally do not expose as of June 2026, so you set it through a raw HTTP request step. Either way, the discount only applies if a later call reuses the same static prefix before the cache expires.

Why am I not getting a cache discount even though caching is on?

Two usual causes. Your automation runs less often than once every five minutes, so the cache evicts between runs and every call is a cold miss. Or your prompt puts the variable data (this lead, today's date, this invoice) before the static instructions, so the prefix changes every call and there is nothing shared to cache. Fix the second by moving all static content to the front and the variable record to the end.

How often does my automation need to run for caching to help?

Roughly more than once every five minutes against the same static prefix. Both providers evict the cache after about five to ten minutes of inactivity, and Anthropic's default lifetime is five minutes, refreshed free on each read. A workflow that processes a lead every forty minutes or an invoice a few times a day is cold on every run. A burst that drains a queue in seconds, or a loop over a list, reuses the prefix while it is still warm.

Is prompt caching cheaper than the Batch API for bulk work?

For a one-off bulk job, the Batch API is usually the cleaner win. It gives 50 percent off with no cadence or prefix-ordering requirement, on a separate rate-limit pool. Caching can stack on top of a high-frequency real-time workload, but it does nothing for a low-frequency job that never reuses a warm prefix. If no one is waiting on each item, batch first, then consider caching the shared prefix inside the batch.

What is the minimum prompt size that can be cached?

1,024 tokens on OpenAI and on Claude Sonnet 4.6 and Opus 4.8, and 4,096 tokens on Claude Haiku 4.5, as of June 2026. If your static prefix is shorter than the model's minimum, nothing caches no matter how often you call. A short system prompt plus a small record will never qualify; a long instruction block with examples and a schema usually will.

Prompt caching in automations: when it pays off

Prompt caching discounts the part of your prompt that stays identical from one call to the next, but only if a later call reuses that exact prefix before the cache expires, which is about five minutes by default. Most business automations fail both conditions. They fire less often than once every five minutes, so the cache is cold on every run, and they build the prompt with the variable record at the top, so there is no shared prefix to cache in the first place. Caching earns its discount, up to 90 percent off the cached tokens, on bursty or batched workloads that send many similar calls inside the window, structured with the static content first. For the typical low-frequency automation it does nothing, and on Anthropic a misplaced cache breakpoint can make the call cost more than not caching at all.

The advice you will read everywhere is "turn on caching, cut your AI bill by 90 percent." That number is real at the API level. It is also irrelevant to most of the automations operators actually run, for reasons the cost-cutting guides never mention. The discount is conditional, and the conditions are exactly the ones a no-code workflow tends to break.

What prompt caching actually discounts

Caching does not discount your prompt. It discounts the longest run of tokens at the start of your prompt that is byte-for-byte identical to a request the provider has recently seen. That shared opening is the prefix: usually your system instructions, few-shot examples, and output schema. The variable part (this customer's message, this invoice, today's date) is never cached, because it is different every time. The whole mechanism rests on reuse of a stable prefix.

The numbers are current as of June 2026.

	OpenAI	Anthropic
How you turn it on	Automatic, no parameter	Explicit cache_control breakpoint
Discount on cached tokens	Up to 90 percent off on current GPT-5.x models	Cache read is 10 percent of the input price, a 90 percent discount
Cost to write the cache	None	25 percent surcharge on the cached tokens for the 5-minute cache, 100 percent for the 1-hour cache, paid once per write
Minimum prefix to cache	1,024 tokens	1,024 tokens (Sonnet 4.6, Opus 4.8), 4,096 (Haiku 4.5)
Default lifetime	Evicts after 5 to 10 minutes of inactivity, max about 1 hour	5 minutes, refreshed free on every read
What must match	Longest identical prefix from the very start	Everything up to and including the breakpoint

Two rows decide whether caching does anything for you, and neither is the discount. The lifetime row sets how often you have to call to keep the cache warm. The "what must match" row sets how you have to build the prompt so there is a shared prefix at all. Get either wrong and the 90 percent never arrives.

Why most automations never get a cache hit

Two structural facts about business automations defeat caching, and they are independent. You can fix one and still lose to the other.

The first is cadence against the cache window. Both providers evict the cache after roughly five to ten minutes of inactivity. Anthropic's default lifetime is five minutes, extended only when a read refreshes it. Now look at how often a real workflow fires. A lead-intake automation might run when a form is submitted, which could be every forty minutes. An invoice-extraction flow runs a handful of times a day. A nightly summarization job runs once. In every one of those, the gap between calls is longer than the cache lives, so each run starts from a cold cache and pays full price. Caching only stays warm under sustained traffic: a chatbot mid-conversation, a queue being drained, a loop running back to back. The low-frequency, event-triggered pattern that defines most operator automations is the worst case for it.

The second is prefix ordering, and this one is self-inflicted. The discount applies only to the identical opening of the prompt. If your prompt template interpolates the variable record first ("Lead: Jane Doe, ACME Corp, submitted 14:32. Now follow these instructions...") then the prefix is different on every single call and nothing is shared, even under heavy traffic. Both providers say the same thing in their docs: put static content at the beginning and variable content at the end. OpenAI states it as structure guidance; Anthropic enforces it through where you place the cache breakpoint. No-code AI nodes and hand-written prompt templates routinely do the opposite, dropping the record at the top because that reads naturally to a human. That single ordering choice is a guaranteed cache miss on every run.

This connects to a point we have made before: the model is usually a rounding error in an automation's monthly bill, and the platform's per-step billing dominates. Caching attacks the rounding error, not the dominant cost. Even when it works perfectly, you are shaving the smallest line on the invoice. That is worth doing when it is free, which on OpenAI it is. It is not worth contorting a workflow around.

The Anthropic trap: a breakpoint that costs more

On OpenAI, a cache miss is harmless. Caching is automatic and free, so when the prefix does not match you simply pay the normal input price, the same as if caching did not exist. There is no downside to a miss.

Anthropic is different, and this is the part almost no guide flags. To cache on Anthropic you place a cache_control breakpoint, and writing the cache costs 25 percent more than the base input price for the cached tokens (100 percent more for the one-hour cache). You pay that write premium every time the cache is cold. If your automation runs less often than the cache lives, every run is a cold write with no read to amortize it, so you pay the surcharge and never collect the discount.

Walk the math on a Sonnet-class model at $3 per million input tokens, with a 3,000-token static prefix and 200 tokens of variable lead data.

No caching: 3,200 tokens at $3 per million is about $0.0096 per run.
Warm cache read: the 3,000 static tokens at 10 percent plus 200 at full price is about $0.0015 per run, roughly 84 percent cheaper. This is the number the guides quote.
Cold cache write, which is what a low-frequency flow gets on every run: the 3,000 tokens at the 1.25x write rate plus 200 at full price is about $0.012 per run, about 25 percent more expensive than not caching at all.

So an operator who reads "caching saves 90 percent," switches it on for a flow that fires every forty minutes, and walks away has not saved anything. They have raised the per-run cost by a quarter, quietly, on every run, because the cache is never warm when the next one arrives. Caching on Anthropic is a bet that a read will follow your write inside five minutes. If you cannot guarantee that, do not place the breakpoint.

When caching actually pays off

The decision is one question: will another call reuse this exact static prefix within the cache window? If yes, structure the prompt static-first and cache. If no, do not bother, and on Anthropic actively leave it off.

Workload	Reuses a warm prefix?	Caching verdict
Live chatbot mid-conversation	Yes, turns arrive seconds apart	Cache the system prompt and history
Draining a queue or looping a list back to back	Yes, calls are continuous	Cache the shared instructions
Many parallel calls in one burst	Yes, but warm the cache with one call first	Cache after the first response returns
Lead intake every 30 to 60 minutes	No, cache evicts between runs	Skip it (OpenAI), leave off (Anthropic)
Invoice extraction a few times a day	No, always cold	Skip it
Nightly summarization, once per run	No	Skip it; batch instead

A practical note for the burst case. The cache entry only becomes available after the first response starts coming back. If you fire fifty parallel calls at once from a cold cache, all fifty write the cache and none of them read it, so you pay fifty writes. Send one call first, let it return, then fan out the rest against the now-warm prefix. That ordering is the difference between paying the write premium once and paying it fifty times.

What to do instead, and how to start

For the most common expensive case, a bulk job where no single item is urgent, caching is the wrong lever. The right one is the asynchronous Batch API, which runs the same model at half the token cost with no cadence requirement and no prefix-ordering gymnastics. You can stack caching on top of a batch if every request shares a long static prefix, but the 50 percent batch discount does the heavy lifting and arrives without any of the conditions caching imposes. Reach for batching first on overnight and backlog work, and reach for caching only when sustained real-time traffic shares a stable prefix.

If you do want the caching discount, the build is small and worth getting right. First, restructure every AI prompt in the flow so all static content (instructions, examples, schema, reference text) sits at the top and the per-call variable goes last. That one change is free, it helps on OpenAI automatically, and it is the precondition for any cache hit on either provider. Second, confirm your static prefix clears the model's minimum, 1,024 tokens for most current models, or nothing caches regardless of frequency. Third, on Anthropic, only place the breakpoint where you can reasonably expect a read within five minutes, and use the one-hour cache only if the read might lag. On a flow that fires every forty minutes, the honest answer is to leave caching off and look at batching or model choice instead.

Pick one AI step you run often and check its real cadence and its prompt order before you touch a caching setting. If it does not fire more than once every few minutes against an identical prefix, caching is not your cost lever and you should stop optimizing it. If you want a second set of eyes on where the money in your workflow automations actually goes, and which steps are worth caching, batching, or leaving alone, tell us what you are running.

Prompt caching in automations: when it pays off

What prompt caching actually discounts

Why most automations never get a cache hit

The Anthropic trap: a breakpoint that costs more

When caching actually pays off

What to do instead, and how to start

Frequently Asked Questions

SOURCES & CITATIONS

About Alexey Yushkin

Related reading

Batch vs real-time AI calls in automations

How to test an AI automation before you trust it

How to stop an automation from creating duplicates

Want this kind of system in your business?