Batch vs real-time AI calls in automations
Decide per AI step by whether a person is waiting on that exact result. If yes, call the model in real time. If not, send the work to the provider's async Batch API, which runs the same model at 50 percent lower token cost on a separate, higher rate-limit pool with a 24-hour completion window. The decision axis is latency tolerance, not volume or model choice.
Decide per AI step by whether a person is waiting on that exact result. If a human is blocked on this one item right now, call the model in real time. If nobody is waiting, send the work to the provider's async Batch API, which runs the same model at half the token cost on a separate, higher rate-limit pool, with results back inside 24 hours. The deciding factor is latency tolerance, not how many items you have and not which model you picked. Most operators get this backwards, looping a synchronous call over a list at full price for work that could have run overnight for half.
The reason it goes backwards is the tool, not the operator. Zapier, Make, and n8n are built around one event producing one item to process. So when you have five thousand records to classify, the natural move inside those tools is to loop the AI action five thousand times, each call synchronous and full-price. The async option that exists in every major provider's API is invisible from inside the no-code UI, so it never enters the decision.
The decision is latency tolerance, not volume
Volume feels like the axis. It is not. You can have one item that must be real time and a million that should be batched. The real question is who, if anyone, is blocked on the result of a single item.
A live chatbot answer is real time because a visitor is staring at a typing indicator. An inbound support ticket that needs routing is usually real time, because the faster it lands in the right queue the faster someone works it, though the model doing the routing can be a cheap classifier. A new lead your sales team calls within ten minutes is real time. In all three, the value of the answer decays in seconds.
Now flip it. Re-scoring your entire existing lead list against a new model. Summarizing last night's call transcripts before the morning standup. Categorizing two years of old tickets so you can report on them. Generating first drafts of next week's content. In none of these is a person waiting on any single record. The whole job has a deadline, the next morning or the end of the week, but no individual item does. That gap, between a job deadline and an item deadline, is exactly where batching wins and where the synchronous loop wastes money.
What the Batch API actually gives you
Both major providers expose an asynchronous batch endpoint. You submit a file of requests, the provider processes them when it has spare capacity, and you collect the results later. The trade is explicit: you give up immediacy and you get a discount plus a separate, much larger throughput allowance. The numbers below are current as of June 2026.
| Property | OpenAI Batch API | Anthropic Message Batches |
|---|---|---|
| Token discount | 50% off synchronous | 50% off synchronous (input and output) |
| Completion window | Within 24 hours, often faster | When all complete or after 24 hours, most under 1 hour |
| Requests per batch | Up to 50,000 | Up to 100,000 (or 256 MB) |
| Input size cap | Up to 200 MB | 256 MB per batch |
| Rate limits | Separate pool, does not consume your standard per-model limits | Separate batch rate limits |
| Results retention | Retrievable after completion | Results downloadable for 29 days |
Two of these rows matter more than the discount. The first is the separate rate-limit pool. When you loop a synchronous call over a big list, you are racing your own real-time traffic against the same per-minute token ceiling. You hit 429 errors, your platform retries, and now you are paying for failed calls and duplicate work. Batched requests draw from a different, higher allowance, so a five-thousand-item job does not throttle the chatbot serving your customers. If rate-limit errors are pushing you toward retry logic on an AI step, batching the non-urgent half of the load is the cleaner fix than tuning backoff.
The second is the 50 percent itself, and this is where it ties into total cost. We have written before that the model is usually a rounding error in an automation's monthly bill and the platform's billing unit dominates. Batching attacks both at once. It halves the token price, and because you submit one request containing many items through a single code or HTTP step, you can avoid multiplying the platform's per-step or per-execution charge across every record. A loop that costs you one platform operation per item becomes one operation for the whole job.
Why the no-code default is the expensive one
Here is the part worth sitting with. The most common AI automation pattern in Zapier, Make, and n8n is the one that should almost never be used for bulk work: trigger fires, AI action runs, repeat. It is the path of least resistance because it matches how those tools think. One item in, one item out.
That default is fine when the trigger is a single real event and a person is waiting. It is wasteful the moment you are processing a list. You pay full token price instead of half. You share your real-time rate limit instead of a separate pool. You incur a platform operation per item instead of one for the set. And the job runs slower, because synchronous calls go one at a time or a few in parallel, while a batch lets the provider fan the work out across its own fleet.
None of the three no-code platforms has a native async batch node, as of June 2026. Their AI integrations call the synchronous endpoint. So the cheaper path is not a setting you toggle. You have to build it, which is why most operators never do, and why the bill on a bulk AI job is often double what it needed to be.
What to batch and what to keep real time
A short mapping for common operator jobs. The test in every row is the same: is a person blocked on this single item.
| Job | Person waiting on one item? | Pattern |
|---|---|---|
| Chatbot reply to a live visitor | Yes | Real time |
| Route an inbound ticket to a queue | Yes | Real time, cheap model |
| Enrich and score a lead that just arrived | Usually | Real time |
| Extract fields from a just-uploaded invoice | Yes | Real time |
| Re-score or re-enrich an entire existing list | No | Batch |
| Summarize last night's transcripts or emails | No | Batch |
| Classify a backlog of old records | No | Batch |
| Generate draft content for the week | No | Batch |
| Extract fields from a nightly dump of documents | No | Batch |
The pattern is visible. Anything triggered by a live person or a fresh event with someone downstream is real time. Anything you could schedule for 2 a.m. and read in the morning is a batch. A bulk lead-enrichment pass like the kind behind leads.geninfos.com is the textbook batch case: thousands of records, no individual one urgent, a clear job deadline.
How to run a batch when your platform has no batch node
You build it as two flows. The first accumulates and submits. The second polls and processes the results. This is more work than dropping in an AI action, which is the honest reason the synchronous loop wins by default. It pays off on any recurring bulk job.
Flow one, submit. Collect the items you need processed, whether that is a query against your database, a read of new rows, or a file you assembled. Format each as a request line, include your structured-output schema so the results come back in a fixed shape instead of free text you have to parse, and POST the whole set to the provider's batch endpoint through an HTTP or code step. You get back a batch ID. Store it.
Flow two, retrieve. On a schedule, check the batch status by its ID. When it reports complete, download the results, match each one back to its source record by the custom ID you assigned, validate the values, and write them where they belong. Anthropic keeps results available for 29 days after creation, so a daily poll has ample margin. Build the same value-checking and human-review fallback you would for any AI output, because a batched answer can still be schema-valid and wrong.
One caution on the window. A batch can expire if the provider does not finish within 24 hours under heavy load. Design the retrieve flow to notice an expired or partial batch and resubmit the missing items, rather than assuming every batch returns clean. For work scheduled overnight against a morning deadline, that margin is almost always enough.
How to start
Pick one bulk AI job you already run as a per-item loop. Re-scoring a list, summarizing a daily dump, classifying a backlog, any job where no single item is urgent. Run it once through the provider's Batch API and compare the bill and the wall-clock time against the loop you have now. The token line will be roughly half, the platform-operation count will drop, and the job will likely finish faster because it stopped fighting your real-time rate limit.
Then make the rule permanent. For every new AI step, ask the one question before you build it: is a person waiting on this exact result. If yes, real time. If no, batch. That single check, applied per step, is what separates an automation that pays full price for patience it does not need from one that does not. If you want help drawing that line across an existing set of workflow automations, or want a bulk job moved off the synchronous loop, tell us what you are running.
Frequently Asked Questions
SOURCES & CITATIONS
- Batch API — OpenAIhttps://developers.openai.com/api/docs/guides/batch
- Batch processing — Anthropichttps://platform.claude.com/docs/en/docs/build-with-claude/batch-processing
About Alexey Yushkin
Alexey is the founder of GENERAL INFORMATICS LLC. He designs and ships AI and automation systems for businesses and operators across the US.
Related reading
Want this kind of system in your business?
We build practical AI and automation systems for operators. Send us your current workflow and we will show you what to automate first.
Request a Workflow Review