Why automations silently break: 6 failure modes and fixes
Automations fail silently in six recurring ways: duplicate triggers, partial failure mid-flow, missing idempotency keys, webhook-retry double-sends, polling race conditions, and no dead-letter handling. Each has a specific fix in n8n, Zapier, and Make that prevents double-charges and dropped records.
Most automations do not fail with a red error. They fail quietly. The same invoice goes out twice, a lead never reaches the CRM, a customer gets charged again when a webhook retries. These are not random bugs. They are six recurring failure modes, and each one has a specific fix in n8n, Zapier, and Make. If your flow touches money, messages, or records, you have to design for them before they cost you a client.
The hard part is that these failures are invisible while you build. A workflow that double-sends looks perfect in testing, because you only fired the trigger once. The problem shows up later, in production, when the provider retries or two records land in the same second. Here is the full list, what each one looks like from the operator's seat, and how to close it in each tool.
The six ways an automation fails silently
Read this as symptoms first. You almost never see the technical cause. You see the result a customer or a teammate reports.
| Failure mode | What you actually see | Root cause |
|---|---|---|
| Duplicate trigger firing | The flow ran twice for one event | The trigger fired more than once, or two processes both picked up the job |
| Partial failure mid-flow | Half the steps ran, half did not | A step failed after an earlier irreversible step already committed |
| No idempotency key | The same source record processed twice | Nothing checks whether this exact item was already handled |
| Webhook retry double-send | A duplicate charge, email, or message | The provider re-delivered the same event, as most webhooks are allowed to |
| Polling race condition | New records skipped or reprocessed | The source API returned items out of order or without a stable cursor |
| No dead-letter handling | An item silently never arrived | A run failed and the failed item was discarded with no error you would notice |
Five of these six are about doing something twice or not at all. That is the whole game in automation reliability. The systems you connect mostly promise to deliver each message at least once, which is a polite way of saying sometimes more than once. Your job is to make a second delivery harmless.
The fix grid: n8n, Zapier, and Make
Most reliability advice is written for one platform, or for developers wiring raw webhooks in code. Operators run no-code and low-code tools. The grid below maps the same six failure modes to the specific node, setting, or pattern that fixes each one in the three tools small businesses actually use.
| Failure mode | Fix in n8n | Fix in Zapier | Fix in Make |
|---|---|---|---|
| Duplicate trigger | On self-hosted queue mode with separate webhook processors, set N8N_DISABLE_PRODUCTION_MAIN_PROCESS=true so the main process does not also handle production webhooks; on Cloud this is handled | Polling triggers dedupe on the id field automatically; instant (REST hook) triggers do not, so add a Filter or Storage check | Check a Data Store for the record key before the action, or space out the schedule so two runs cannot overlap |
| Partial failure | Order steps so the irreversible one is last; add an Error Trigger workflow to catch and log the rest | Reorder so the irreversible step is last; Autoreplay retries transient failures | Set the failing module to Break so the run goes to Incomplete Executions and resumes from that module, not the start |
| No idempotency key | Remove Duplicates node with the "Remove Items Processed in Previous Executions" operation, keyed on the event id (history defaults to 10,000 items) | Use the source record's stable id as the dedupe key; for updates, synthesize id + "-" + updatedAt | Data Store keyed on the event id; look it up and only continue if it is new |
| Webhook retry double-send | Dedupe on the provider's event id before the side-effecting node | Storage check on the event id before the action step | Data Store lookup on the event id before the action |
| Polling race | Sort the source newest-first, or dedupe on a monotonic key with Remove Duplicates | Return items in reverse chronological order keyed on id; that is how Zapier dedupes polling | Sort the search module and store a last-seen id or timestamp as a cursor |
| No dead-letter | Error Trigger workflow that writes failures to a table or Slack; set node-level retries | Watch Zap History and route failures to a "failed items" sheet; enable Autoreplay on paid plans | Break directive sends failures to the Incomplete Executions queue for retry; enable storing incomplete executions in scenario settings |
This grid is the article. The rest is one worked example and the checklist you can run on a flow you already have.
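For readers who think in code, the polling and idempotency-key rows of the grid can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any of the three tools works internally: `PollState`, `dedupe_key`, and `select_new` are hypothetical names, records are assumed to carry `id` and `updatedAt` fields, and timestamps are assumed to be ISO-8601 strings in one timezone so they sort correctly as text.

```python
from dataclasses import dataclass, field


@dataclass
class PollState:
    """What a poller remembers between runs (hypothetical structure)."""
    cursor: str = ""  # newest updatedAt handled so far
    seen_keys: set = field(default_factory=set)


def dedupe_key(record: dict) -> str:
    # The id alone would hide later updates to the same record,
    # so the key combines id and updatedAt.
    return f"{record['id']}-{record['updatedAt']}"


def select_new(records: list, state: PollState) -> list:
    """Return only the records not handled before, oldest first."""
    fresh = []
    # Sort so an out-of-order API response is still processed deterministically.
    for record in sorted(records, key=lambda r: (r["updatedAt"], r["id"])):
        key = dedupe_key(record)
        if key in state.seen_keys:
            continue  # already handled in this or an earlier poll
        state.seen_keys.add(key)
        state.cursor = max(state.cursor, record["updatedAt"])
        fresh.append(record)
    return fresh
```

The cursor is the value the next poll would ask the source for, inclusively, so two records that land in the same second are both returned. The set of seen keys absorbs the overlap that inclusive query creates.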
A worked example: the Stripe payment that charges twice
Take a common small-business flow. A customer pays through Stripe, and your automation creates an invoice record and posts a "new sale" message to Slack. You test it once, it works, you ship it.
Two weeks later a customer emails: they got two invoices for one purchase. Your Slack channel shows the sale twice. Here is what happened. Stripe states plainly that a webhook endpoint "might occasionally receive the same event more than once," and it retries delivery for up to three days with exponential backoff in live mode. Your flow has no memory. Every time the event arrives, it creates another invoice.
The fix is one step, placed first. After the webhook trigger, before the invoice and the Slack message, you add a dedupe gate keyed on the Stripe event id, the value that starts with evt_. If you have seen that id before, stop. If it is new, record it and continue. In n8n that is the Remove Duplicates node set to remove items processed in previous executions. In Zapier it is a Storage lookup or a Filter on the event id. In Make it is a Data Store check. Stripe itself recommends exactly this: log the event ids you have processed, and do not process already-logged events.
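The gate described above fits in a few lines. The sketch below is a plain-Python illustration, not the internals of any of the three tools: `handle_stripe_event` is a hypothetical name, the in-memory set stands in for a Data Store, Storage, or node history, and `side_effects` stands in for the invoice and Slack steps.

```python
from typing import Callable


def handle_stripe_event(
    event: dict,
    processed_ids: set,
    side_effects: Callable[[dict], None],
) -> bool:
    """Run the side effects once per Stripe event id. Returns True if the event was new."""
    event_id = event["id"]       # Stripe event ids start with "evt_"
    if event_id in processed_ids:
        return False             # seen before: this is a retry, so stop here
    processed_ids.add(event_id)  # record it first, then continue
    side_effects(event)          # e.g. create the invoice, post to Slack
    return True
```

Recording the id before acting favors never sending twice. The cost is that an event whose side effect fails will not be picked up by a later retry, so it needs a visible error path.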
One detail matters for the store you dedupe against. Because Stripe retries for up to three days, the record of handled ids has to outlive that window. n8n's Remove Duplicates node keeps a default history of 10,000 items, which is fine for most small-business volume. If you genuinely process more than 10,000 events in three days, raise the history size or move the dedupe to a database table. Most operators never hit that, but it is the kind of thing that bites at the worst time, so size it on purpose.
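If you do move the dedupe to a database table, one possible shape is sketched below with Python's built-in `sqlite3` module. The table name, the function names, and the four-day retention (one day longer than Stripe's three-day retry window) are assumptions chosen for illustration, not a recommendation from Stripe or n8n.

```python
import sqlite3
import time
from typing import Optional

# Assumption: keep ids for 4 days, one day longer than the 3-day retry window.
RETENTION_SECONDS = 4 * 24 * 60 * 60


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the table of handled event ids."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_events ("
        "event_id TEXT PRIMARY KEY, seen_at REAL NOT NULL)"
    )
    return conn


def claim_event(conn: sqlite3.Connection, event_id: str, now: Optional[float] = None) -> bool:
    """Return True the first time an event id is seen, False on repeats."""
    now = time.time() if now is None else now
    with conn:  # commits on success, rolls back on error
        # Drop ids older than the retention window so the table stays small.
        conn.execute(
            "DELETE FROM processed_events WHERE seen_at < ?",
            (now - RETENTION_SECONDS,),
        )
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_events (event_id, seen_at) VALUES (?, ?)",
            (event_id, now),
        )
    # rowcount is 1 when the row was inserted, 0 when the id already existed.
    return cur.rowcount == 1
```

Pruning by age rather than by count ties the size of the store to the retry window instead of to your volume, which is the property the paragraph above asks you to size for.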
The order of operations problem
Idempotency stops repeats. It does not save you from a flow that dies halfway. That is the partial-failure mode, and the cheapest defense costs nothing: reorder your steps.
Put the irreversible action last. If your flow charges a card, sends an email, or creates an external record, do everything that can fail cheaply first, then do the one thing you cannot take back. A flow that validates the data, looks up the customer, builds the message, and only then sends it will fail before the send when something is wrong. A flow that sends first and validates second has already done the damage when it errors.
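The same ordering, written as a small Python sketch. The function name, the field names, and the `send` callable are hypothetical; the point is only the sequence: validate, look up, build, and send last.

```python
from typing import Callable


def process_order(order: dict, customers: dict, send: Callable[[str, str], None]) -> str:
    # 1. Validate: cheap, no side effects.
    if order.get("amount_cents", 0) <= 0:
        raise ValueError("order has no positive amount")

    # 2. Look up the customer: this can fail, but there is still nothing to undo.
    customer = customers.get(order.get("customer_id", ""))
    if customer is None:
        raise LookupError("unknown customer")

    # 3. Build the message: pure computation.
    message = f"Hi {customer['name']}, we received ${order['amount_cents'] / 100:.2f}."

    # 4. Only now do the one thing that cannot be taken back.
    send(customer["email"], message)
    return message
```

If the customer lookup fails, the function raises before `send` is ever called, so a bad record costs an error and nothing else.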
When you cannot reorder, catch the failure instead of letting it vanish. This is the difference between Make's Break directive and its default behavior. Make gives you five error directives: Ignore, Resume, Commit, Rollback, and Break. Break is the one that matters for reliability, because it sends the failed run to an Incomplete Executions queue where you can fix the issue and reprocess it without losing the data. The default is to discard. n8n's equivalent is an Error Trigger workflow that fires when any run fails and logs the item somewhere you will see it. Zapier surfaces failures in Zap History, with Autoreplay to retry transient ones on paid plans. Pick one per tool and turn it on. A failure you can see and retry is an inconvenience. A failure that disappears is a lost customer.
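Stripped of tool specifics, the idea behind all three error paths is a dead-letter list: a failed item is parked with its error instead of being dropped. The sketch below is a hedged illustration in plain Python, with hypothetical function names and an in-memory list standing in for a table, a sheet, or the Incomplete Executions queue.

```python
import traceback
from typing import Callable


def run_with_dead_letter(
    items: list,
    step: Callable[[dict], None],
    dead_letters: list,
) -> int:
    """Run step on each item and park failures instead of dropping them.

    Returns the number of items that succeeded.
    """
    succeeded = 0
    for item in items:
        try:
            step(item)
            succeeded += 1
        except Exception as exc:
            # Keep the original item and the reason so it can be fixed and retried.
            dead_letters.append(
                {"item": item, "error": repr(exc), "trace": traceback.format_exc()}
            )
    return succeeded


def retry_dead_letters(step: Callable[[dict], None], dead_letters: list) -> int:
    """Re-run every parked item; anything that fails again is parked again."""
    pending = [entry["item"] for entry in dead_letters]
    dead_letters.clear()
    return run_with_dead_letter(pending, step, dead_letters)
```

One failing item does not stop the rest of the batch, and nothing leaves the list until a retry succeeds.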
The six-line pre-launch reliability check
Before you turn on any flow that touches money, messages, or customer records, run it against these six questions. We use a version of this on every client build, including the lead-routing pipeline behind leads.geninfos.com.
- Does every irreversible step have an idempotency key, a stable id you check before acting?
- Is the irreversible action the last step, after everything that can fail cheaply?
- If the trigger is a webhook, are you assuming it can fire the same event more than once?
- When a run fails halfway, does the item land somewhere visible and retryable?
- On self-hosted n8n in queue mode, is N8N_DISABLE_PRODUCTION_MAIN_PROCESS=true so the main process is not double-running jobs?
- Do you get an alert when a run fails, or do failures only surface when a customer complains?
If you cannot answer yes to a line, that line is your next hour of work. Question one and question four catch the most expensive failures, so start there.
How to harden a flow you already have
You do not have to rebuild anything. Open your highest-stakes automation, the one that touches payments or leads, and walk it through the grid above. Find the first side-effecting step and ask what happens if the trigger fires twice. If the answer is "two of something the customer sees," add the dedupe gate. Then check where a failed item goes today. If the answer is "nowhere," wire up the error path for your tool.
This is unglamorous work, and it is the difference between an automation that saves you time and one that quietly creates cleanup. We build workflow automation systems with these defenses in from the start, and we also run free reviews of flows operators have already built. Bring the one that scares you most, and we will tell you where it can fail twice.
SOURCES & CITATIONS
- Receive Stripe events in your webhook endpoint, Stripe Documentation: https://docs.stripe.com/webhooks
- How deduplication works in Zapier, Zapier Platform Docs: https://docs.zapier.com/platform/build/deduplication
- Remove Duplicates node documentation, n8n Documentation: https://docs.n8n.io/integrations/builtin/core-nodes/n8n-nodes-base.removeduplicates/
- Error handlers, Make Help Center: https://help.make.com/error-handlers
About Alexey Yushkin
Alexey is the founder of GENERAL INFORMATICS LLC. He designs and ships AI and automation systems for small businesses and operators across the US.