Should my automation call the AI for each item or in a batch?

Ask whether a person is waiting on that exact result. A chatbot reply, a ticket that needs routing now, a lead a salesperson will call in minutes: those are real time. Re-scoring an existing list, summarizing last night's transcripts, classifying a backlog of records: no one is blocked on any single item, so those belong in a batch. The split is latency tolerance, not how many items you have or which model you use.

How much cheaper is the Batch API?

Both OpenAI and Anthropic charge 50 percent of the synchronous token price for batched requests, as of June 2026. It is the same model and the same output quality. The discount applies to input and output tokens. You are paying less in exchange for accepting that the answer arrives within 24 hours instead of in a second.

How long does a batch take to come back?

The published completion window is 24 hours for both providers, but most batches finish much faster. Anthropic states most batches complete in under an hour. OpenAI says batches complete within 24 hours and often more quickly. If your work can wait until the next business morning, a batch almost always returns well before then.

Can I use structured outputs or a JSON schema with the Batch API?

Yes. A batched request is a normal model request submitted asynchronously, so the same structured-output and JSON-schema controls you use in a real-time call work inside a batch. That matters because the output of a bulk job usually feeds another step, so you still want a guaranteed shape rather than free text you have to parse.

Do Zapier, Make, and n8n have a batch button for AI calls?

No native async Batch API node, as of June 2026. Their built-in AI actions and the looping nodes call the synchronous endpoint once per item, at full price. To use the Batch API you submit it yourself through an HTTP or code step, then poll for the result in a second flow. The platforms default you into the per-item loop because their whole model is one-trigger-one-item.

Batch vs real-time AI calls in automations

Decide per AI step by whether a person is waiting on that exact result. If a human is blocked on this one item right now, call the model in real time. If nobody is waiting, send the work to the provider's async Batch API, which runs the same model at half the token cost on a separate, higher rate-limit pool, with results back inside 24 hours. The deciding factor is latency tolerance, not how many items you have and not which model you picked. Most operators get this backwards, looping a synchronous call over a list at full price for work that could have run overnight for half.

The reason it goes backwards is the tool, not the operator. Zapier, Make, and n8n are built around one event producing one item to process. So when you have five thousand records to classify, the natural move inside those tools is to loop the AI action five thousand times, each call synchronous and full-price. The async option that exists in every major provider's API is invisible from inside the no-code UI, so it never enters the decision.

The decision is latency tolerance, not volume

Volume feels like the axis. It is not. You can have one item that must be real time and a million that should be batched. The real question is who, if anyone, is blocked on the result of a single item.

A live chatbot answer is real time because a visitor is staring at a typing indicator. An inbound support ticket that needs routing is usually real time, because the faster it lands in the right queue the faster someone works it, though the model doing the routing can be a cheap classifier. A new lead your sales team calls within ten minutes is real time. In all three, the value of the answer decays in seconds.

Now flip it. Re-scoring your entire existing lead list against a new model. Summarizing last night's call transcripts before the morning standup. Categorizing two years of old tickets so you can report on them. Generating first drafts of next week's content. In none of these is a person waiting on any single record. The whole job has a deadline, the next morning or the end of the week, but no individual item does. That gap, between a job deadline and an item deadline, is exactly where batching wins and where the synchronous loop wastes money.

What the Batch API actually gives you

Both major providers expose an asynchronous batch endpoint. You submit a file of requests, the provider processes them when it has spare capacity, and you collect the results later. The trade is explicit: you give up immediacy and you get a discount plus a separate, much larger throughput allowance. The numbers below are current as of June 2026.

Property	OpenAI Batch API	Anthropic Message Batches
Token discount	50% off synchronous	50% off synchronous (input and output)
Completion window	Within 24 hours, often faster	When all complete or after 24 hours, most under 1 hour
Requests per batch	Up to 50,000	Up to 100,000 (or 256 MB)
Input size cap	Up to 200 MB	256 MB per batch
Rate limits	Separate pool, does not consume your standard per-model limits	Separate batch rate limits
Results retention	Retrievable after completion	Results downloadable for 29 days

Two of these rows matter more than the discount. The first is the separate rate-limit pool. When you loop a synchronous call over a big list, you are racing your own real-time traffic against the same per-minute token ceiling. You hit 429 errors, your platform retries, and now you are paying for failed calls and duplicate work. Batched requests draw from a different, higher allowance, so a five-thousand-item job does not throttle the chatbot serving your customers. If rate-limit errors are pushing you toward retry logic on an AI step, batching the non-urgent half of the load is the cleaner fix than tuning backoff.

The second is the 50 percent itself, and this is where it ties into total cost. We have written before that the model is usually a rounding error in an automation's monthly bill and the platform's billing unit dominates. Batching attacks both at once. It halves the token price, and because you submit one request containing many items through a single code or HTTP step, you can avoid multiplying the platform's per-step or per-execution charge across every record. A loop that costs you one platform operation per item becomes one operation for the whole job.

Why the no-code default is the expensive one

Here is the part worth sitting with. The most common AI automation pattern in Zapier, Make, and n8n is the one that should almost never be used for bulk work: trigger fires, AI action runs, repeat. It is the path of least resistance because it matches how those tools think. One item in, one item out.

That default is fine when the trigger is a single real event and a person is waiting. It is wasteful the moment you are processing a list. You pay full token price instead of half. You share your real-time rate limit instead of a separate pool. You incur a platform operation per item instead of one for the set. And the job runs slower, because synchronous calls go one at a time or a few in parallel, while a batch lets the provider fan the work out across its own fleet.

None of the three no-code platforms has a native async batch node, as of June 2026. Their AI integrations call the synchronous endpoint. So the cheaper path is not a setting you toggle. You have to build it, which is why most operators never do, and why the bill on a bulk AI job is often double what it needed to be.

What to batch and what to keep real time

A short mapping for common operator jobs. The test in every row is the same: is a person blocked on this single item.

Job	Person waiting on one item?	Pattern
Chatbot reply to a live visitor	Yes	Real time
Route an inbound ticket to a queue	Yes	Real time, cheap model
Enrich and score a lead that just arrived	Usually	Real time
Extract fields from a just-uploaded invoice	Yes	Real time
Re-score or re-enrich an entire existing list	No	Batch
Summarize last night's transcripts or emails	No	Batch
Classify a backlog of old records	No	Batch
Generate draft content for the week	No	Batch
Extract fields from a nightly dump of documents	No	Batch

The pattern is visible. Anything triggered by a live person or a fresh event with someone downstream is real time. Anything you could schedule for 2 a.m. and read in the morning is a batch. A bulk lead-enrichment pass like the kind behind leads.geninfos.com is the textbook batch case: thousands of records, no individual one urgent, a clear job deadline.

How to run a batch when your platform has no batch node

You build it as two flows. The first accumulates and submits. The second polls and processes the results. This is more work than dropping in an AI action, which is the honest reason the synchronous loop wins by default. It pays off on any recurring bulk job.

Flow one, submit. Collect the items you need processed, whether that is a query against your database, a read of new rows, or a file you assembled. Format each as a request line, include your structured-output schema so the results come back in a fixed shape instead of free text you have to parse, and POST the whole set to the provider's batch endpoint through an HTTP or code step. You get back a batch ID. Store it.

Flow two, retrieve. On a schedule, check the batch status by its ID. When it reports complete, download the results, match each one back to its source record by the custom ID you assigned, validate the values, and write them where they belong. Anthropic keeps results available for 29 days after creation, so a daily poll has ample margin. Build the same value-checking and human-review fallback you would for any AI output, because a batched answer can still be schema-valid and wrong.

One caution on the window. A batch can expire if the provider does not finish within 24 hours under heavy load. Design the retrieve flow to notice an expired or partial batch and resubmit the missing items, rather than assuming every batch returns clean. For work scheduled overnight against a morning deadline, that margin is almost always enough.

How to start

Pick one bulk AI job you already run as a per-item loop. Re-scoring a list, summarizing a daily dump, classifying a backlog, any job where no single item is urgent. Run it once through the provider's Batch API and compare the bill and the wall-clock time against the loop you have now. The token line will be roughly half, the platform-operation count will drop, and the job will likely finish faster because it stopped fighting your real-time rate limit.

Then make the rule permanent. For every new AI step, ask the one question before you build it: is a person waiting on this exact result. If yes, real time. If no, batch. That single check, applied per step, is what separates an automation that pays full price for patience it does not need from one that does not. If you want help drawing that line across an existing set of workflow automations, or want a bulk job moved off the synchronous loop, tell us what you are running.

Batch vs real-time AI calls in automations

The decision is latency tolerance, not volume

What the Batch API actually gives you

Why the no-code default is the expensive one

What to batch and what to keep real time

How to run a batch when your platform has no batch node

How to start

Frequently Asked Questions

SOURCES & CITATIONS

About Alexey Yushkin

Related reading

How to test an AI automation before you trust it

How to stop an automation from creating duplicates

Why Your AI Automation Returns Broken JSON

Want this kind of system in your business?