Why does my automation create duplicate records?

Almost always one of three causes: the trigger fired more than once (a webhook re-delivery, a double-clicked form, an overlapping poll), a step retried after the write had already succeeded, or two scheduled runs read the same record before either finished writing. None of these are random. Each produces a second create, a second email, or a second charge unless a deduplication check sits in front of the step.

What should I use as a deduplication key?

A value that is identical every time the same business event happens, and unique across different events. An order ID, an invoice number, a webhook's event ID, or a lowercased email address are good keys. A freshly generated UUID, a timestamp, or a row count are bad keys, because they change on every run and so never match a prior attempt. The whole point is that a repeat of the same event produces the same key.

Does Zapier or n8n prevent duplicates automatically?

Only at the trigger, and only sometimes. Zapier deduplicates polling triggers on the item's id field, so a polled item triggers once. It does not dedupe webhook (Catch Hook) triggers or any action step. n8n has a Remove Duplicates node with a 'Remove Items Processed in Previous Executions' mode that remembers a key across runs, but you have to place the node and choose the key yourself. Make has no automatic action-level dedup; you build it with a Data store.

How long should a dedup gate remember a key?

At least as long as the duplicate can arrive. Stripe prunes idempotency keys after about 24 hours, which matches a typical retry window. If your trigger can re-deliver for a day, your store needs to remember the key for more than a day. If the key is a value that should only ever occur once, like an order ID, you can keep it for the life of the record. Too short reopens the gap; too long can block a legitimate repeat.

Is preventing duplicates the same as fixing retries?

Related but not the same. A retry is one source of duplicates, and the answer there is an idempotency key on the write. But most operator duplicates come from re-fired triggers and overlapping reads, where the destination tool accepts no key at all. The general fix is a deduplication gate you enforce in the workflow, which covers retries, re-deliveries, and overlaps in one place.

How to stop an automation from creating duplicates

An automation creates duplicates for one of three reasons: the trigger fired more than once, a step retried after a write had already gone through, or two scheduled runs read the same record before either finished. The fix is the same in all three cases. Put a deduplication check in front of every step that creates, sends, or charges, and key that check on a value derived from the business event, like an order ID or a webhook's event ID, not on a fresh value generated inside the run. The catch most guides skip: the tools operators actually write to (Google Sheets, Airtable, a CRM, an inbox) accept no idempotency key of their own, so you enforce the check in the workflow, not the destination.

That last point is the whole reason this is harder than it looks. Stripe and a handful of billing and email APIs accept an `Idempotency-Key` header and collapse repeats for you. The systems where most duplicates actually land do not. Append a row to a sheet twice and you get two rows. Send a Gmail message twice and the customer gets two emails. There is no header to fix that. The dedup logic has to live one step earlier, in your flow, before the side effect runs.

Where duplicates actually come from

Before you build anything, name the source. The three causes need the same fix but they look different in the run history, and knowing which one you have tells you what to key on.

The first is a re-fired trigger. Webhooks use at-least-once delivery, which means a provider that does not get a fast 2xx back will send the same event again. A form behind a slow page gets submitted twice by an impatient user. A polling trigger with a too-short interval reads the same new record on two consecutive polls. In every case the automation starts twice for one real-world event.

The second is a retry of a non-idempotent write. The step failed, the platform re-ran it, and the original attempt had actually succeeded. This is the one that turns one failed run into two charges. It has its own full treatment in "When should an automation retry a failed step," so I will not re-derive it here. The short version: a write that creates or sends something is only safe to retry if a repeat of it is recognized as the same operation.

The third is an overlapping read window. Two scheduled runs, or a manual run on top of a scheduled one, both query "records added since last time," and because neither has finished writing its results back, they both pick up the same record and both process it. This is the quiet one. It does not show up as an error anywhere. You just find two of something and cannot explain it. The duplicate-trigger and webhook-double-send failure modes are covered alongside the others in "Why automations silently break."

The fix is a key derived from the event, not generated in the run

Here is the part the top results get wrong. Most guides tell you to "add an idempotency key" and then show a node that generates a UUID. A UUID generated inside the run is useless for this. It is different on every execution, so the second attempt's key never matches the first, and the gate that was supposed to catch the duplicate waves it straight through. You have built a lock and thrown away the only thing it was meant to recognize.

A dedup key has to be deterministic: identical every time the same business event happens, and distinct across different events. That means you derive it from the data, not invent it. If the same order is processed twice, both runs must compute the same key from that order. The simplest correct keys are the IDs the source system already assigns.
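To make the contrast concrete, here is a minimal Python sketch of a derived key next to a generated one. The function names and the `contact:` / `order:` prefixes are illustrative assumptions, not part of any platform's API; the point is only that the key is computed from the event's own data.

```python
import uuid


def contact_key(email: str) -> str:
    """Dedup key for 'create a CRM contact': the email, trimmed and lowercased.

    The "contact:" prefix is a hypothetical namespace so that keys for
    different kinds of side effect cannot collide in a shared store.
    """
    return "contact:" + email.strip().lower()


def confirmation_key(order_id: str) -> str:
    """Dedup key for 'send an order confirmation': the order ID plus the action."""
    return f"order:{order_id}:confirmation"


# Two deliveries of the same lead, with cosmetic differences in the payload.
first = {"email": "  Ada@Example.com "}
second = {"email": "ada@example.com"}

# Derived keys match, so a gate can recognize the repeat.
print(contact_key(first["email"]) == contact_key(second["email"]))  # True

# Keys generated inside the run do not match, so every repeat looks new.
print(str(uuid.uuid4()) == str(uuid.uuid4()))  # False
```

Normalizing before hashing or comparing matters as much as the choice of field: without the trim and lowercase step, the two payloads above would produce different keys for the same person.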

Side effect	Bad key (changes every run)	Key derived from the event
Create a CRM contact	a fresh UUID	the lead's email, lowercased and trimmed
Send an order confirmation	a random message ID	`order_id` plus `"confirmation"`
Append a row to a sheet	the current row count or `now()`	the source record's primary ID
Post a Slack alert for a ticket	the timestamp	`ticket_id` plus the status it alerts on
Charge a card	a UUID generated per attempt	the invoice or order ID

Read the right-hand column and the pattern is obvious: the key is something that was already true about the event before your automation ran. The order had an ID. The webhook carried an `event.id`. The lead had an email. You are not creating identity; you are reusing the identity that already exists.

Once you have that key, the gate is mechanical. Before the side effect, look the key up in a store. If it is there, stop. If it is not, record it, then run the step. The only real design decisions left are where the store lives and how long it remembers, and the platforms differ sharply on both.
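The look-up, record, then run sequence can be sketched in a few lines of Python. This is one possible shape, assuming a SQLite table as the store; `open_store`, `claim`, `run_once`, and the `seen_keys` table are hypothetical names chosen for illustration.

```python
import sqlite3
from typing import Callable


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the key store. The table name is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS seen_keys (key TEXT PRIMARY KEY)")
    conn.commit()
    return conn


def claim(conn: sqlite3.Connection, key: str) -> bool:
    """Record the key; return True only if this call recorded it first."""
    # INSERT OR IGNORE does the lookup and the write in one statement,
    # so the primary key decides which caller wins.
    cur = conn.execute("INSERT OR IGNORE INTO seen_keys (key) VALUES (?)", (key,))
    conn.commit()
    return cur.rowcount == 1


def run_once(conn: sqlite3.Connection, key: str, side_effect: Callable[[], object]) -> bool:
    """Run side_effect only if the key has not been seen before."""
    if not claim(conn, key):
        return False  # duplicate: stop before the side effect
    side_effect()
    return True


store = open_store()
sent = []
key = "order:1001:confirmation"

print(run_once(store, key, lambda: sent.append(key)))  # True
print(run_once(store, key, lambda: sent.append(key)))  # False
print(len(sent))  # 1
```

Folding the check and the record into a single insert is a deliberate choice here: with a separate read followed by a write, two overlapping runs could both read "not found" before either writes. What should happen to the key when the step itself fails after the key is recorded is a separate decision this sketch leaves out.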

What each tool dedupes for you, and what it does not

The dangerous assumption is that your platform already handles this. It handles a slice of it, at the trigger, and leaves the rest to you. Here is the honest map.

Tool	What it dedupes natively	What it does not
Zapier	Polling triggers dedupe on the item's `id` field; a polled item fires once and Zapier remembers seen IDs	Webhook (Catch Hook) triggers and every action step. Nothing after the trigger is deduped for you
n8n	The Remove Duplicates node, in "Remove Items Processed in Previous Executions" mode, remembers a key across runs and drops repeats	It is not automatic. You place the node, pick the key field, and set the scope (per-node or per-workflow) yourself
Make	Nothing at the action level	You build the gate with a Data store: search for the key, branch if found, write the key when you proceed
Destination API (Stripe and similar)	An `Idempotency-Key` header collapses repeats and returns the original result for about 24 hours	Only where the API supports it. Most CRMs, sheets, and email sends do not, so the header has nowhere to go

Zapier's trigger dedup is genuinely useful and worth understanding precisely: it works only when your data carries a unique id, and it covers the trigger, not the actions downstream. If your Zap's trigger is a webhook rather than a poll, even that protection is off. n8n gives you the most direct tool of the three, a node built for exactly this, but it does nothing until you add it and tell it what to key on. Make leans on its Data store, which is the same hand-built pattern as a check-then-write in any other system. The through-line is simple. Trigger-level dedup is partial and conditional. Action-level dedup, the kind that stops a duplicate charge or a duplicate email, is on you in every one of these tools.

This is also why "just turn on the platform's duplicate handling" is not an answer. There is no switch that covers the action steps. There is a node, a Data store, or an API header, and each one needs a deterministic key you chose on purpose.
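Where the destination does accept an `Idempotency-Key` header, the same rule applies to the value you put in it. The sketch below only builds the request pieces and sends nothing; the URL, the form fields, and the `charge:` prefix are placeholders, so check the real API's documentation for its actual endpoint and parameters.

```python
from urllib.parse import urlencode


def build_charge_request(invoice_id: str, amount_cents: int, currency: str = "usd"):
    """Return (url, headers, body) for a charge keyed on the invoice ID.

    The endpoint and field names are hypothetical; only the header name
    comes from the discussion above.
    """
    headers = {
        # Same invoice -> same key, so a retry or re-delivery repeats the key
        # and an API that honors the header can collapse it.
        "Idempotency-Key": f"charge:{invoice_id}",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = urlencode({"amount": amount_cents, "currency": currency})
    return "https://api.example.com/v1/charges", headers, body


url, headers, body = build_charge_request("inv_42", 1999)
_, retry_headers, _ = build_charge_request("inv_42", 1999)

# The retry carries the same key as the first attempt.
print(headers["Idempotency-Key"] == retry_headers["Idempotency-Key"])  # True
```

Generating the header value with a per-attempt UUID would make every retry look like a new charge, which is the failure the header exists to prevent.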

How long should the gate remember a key

A dedup store is only as good as its memory. Remember a key for too short a window and the late duplicate sails through after the entry has expired. Remember it too long with a key that is supposed to repeat, and you block a legitimate run.

The sizing rule: remember a key for at least as long as the duplicate can plausibly arrive. Stripe sets the reference point by pruning idempotency keys after about 24 hours, which is tuned to a normal retry window. Match that logic to your source. If a webhook provider re-delivers failed events for up to a day, your store has to outlast a day. If a polling overlap can only happen within a few minutes, a short window is fine and keeps the store small.

Then check the nature of the key itself. A key that should occur exactly once in the life of the business, like an order ID or an invoice number, can be remembered for the life of that record with no downside. A key that is meant to recur, like a daily-summary job keyed only on the date, must be allowed to fire again tomorrow, so the key needs the date baked in rather than a fixed string. Get this backwards and you build the opposite bug: an automation that refuses to run when it should, which is harder to notice than a duplicate because nothing happens at all. Pair the gate with a clear run record so that when you do find two of something, you can tell which run created which, and when.
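Both halves of that rule, the window and the recurring key, can be shown with a small in-memory sketch. `ExpiringGate` and `daily_summary_key` are hypothetical names, the 36-hour window is an arbitrary example, and the clock is passed in explicitly so the behavior is easy to follow; a real store would persist entries somewhere durable.

```python
from datetime import date, datetime, timedelta, timezone


class ExpiringGate:
    """Remembers each key for a fixed window, then lets it through again."""

    def __init__(self, window: timedelta):
        self.window = window
        self._expires = {}  # key -> datetime when the entry stops blocking

    def claim(self, key: str, now: datetime) -> bool:
        expiry = self._expires.get(key)
        if expiry is not None and now < expiry:
            return False  # still remembered: treat as a duplicate
        self._expires[key] = now + self.window
        return True


def daily_summary_key(day: date) -> str:
    """A recurring job's key carries the date, so tomorrow's run gets a new key."""
    return f"daily-summary:{day.isoformat()}"


gate = ExpiringGate(window=timedelta(hours=36))
t0 = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
next_day = t0 + timedelta(days=1)

# A re-delivery 24 hours later is still inside the 36-hour window.
print(gate.claim("evt_123", t0))        # True
print(gate.claim("evt_123", next_day))  # False

# A fixed string blocks tomorrow's legitimate run; a dated key does not.
print(gate.claim("daily-summary", t0))        # True
print(gate.claim("daily-summary", next_day))  # False
print(gate.claim(daily_summary_key(t0.date()), t0))              # True
print(gate.claim(daily_summary_key(next_day.date()), next_day))  # True
```

With the window shortened to 12 hours, the second `evt_123` claim would return True, which is the late duplicate sailing through after the entry has expired.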

What to do next

Open your highest-stakes automation and find every step that creates, sends, charges, or deletes. For each one, answer two questions in order. What deterministic value identifies the business event behind this step? And where will I store that value so I can check it before the step runs? If the answer to the first is "a UUID I generate in the flow," you do not have a dedup key yet; you have a placeholder. Go back to the source data and find the ID that was already there.

Then decide the window. Match it to how long a duplicate can arrive from your specific trigger, and make sure any key that is meant to recur carries the recurring part inside it. We build this gate into every workflow automation system we ship: a deterministic key on the event, a check-then-write in front of every side effect, and a window sized to the trigger, so a re-delivery or a retry collapses into a single action instead of a second charge. If you have an automation that keeps producing duplicates and you cannot pin down which of the three sources is doing it, send us the flow and we will trace it.


About Alexey Yushkin
