
How to get alerted when an automation stops running

Built-in error alerts in n8n, Zapier, and Make only fire when a workflow runs and fails, so they cannot catch the failure that matters most: the automation that silently stops running. To catch that, add a heartbeat or dead man's switch that alerts on the absence of an expected signal, not on the presence of an error.

Alexey Yushkin, Founder, GENERAL INFORMATICS · 3 min read

Your automation platform's built-in error alert has a blind spot, and it is the failure that costs the most. Error alerts fire when a workflow runs and a step fails. They cannot fire when the workflow never runs at all. A deactivated flow, an expired OAuth token at the trigger, a schedule that quietly stopped, the instance itself being down: none of these produce an error, because none of them produce a run. To catch the automation that silently stopped, you need the inverse of error alerting. You need a heartbeat that pages you when an expected run does not arrive.

This is the failure operators discover the slow way. The intake automation that has run flawlessly for four months stops one Tuesday, and nobody notices until Friday when a customer asks why they never heard back. The platform was silent the whole time, and it was right to be, by its own logic. Here is why error alerts miss this, and the three-layer setup that closes the gap.

Why your error alerts have a blind spot

Every error notification in n8n, Zapier, and Make is downstream of an execution. Something has to run before something can fail. In n8n, an error workflow (a separate workflow that starts with an Error Trigger node and is selected in the monitored workflow's settings) fires when an execution of that workflow fails. Zapier surfaces failures in Zap History and will turn a Zap off if 95 percent of its runs error in the last 7 days. Make routes a failed run to Incomplete Executions when you set a Break directive. All three are real and worth turning on. All three share the same precondition: a run has to happen.

So picture the failures that produce no run. The workflow got deactivated, maybe by you during a fix, maybe by the platform after a string of errors, and never got turned back on. The trigger's OAuth credential expired, so the polling trigger cannot even start. The scheduled trigger stopped firing after a platform incident and did not resume. The self-hosted n8n container crashed and did not restart. In every one of these, the count of failed runs is zero, because the count of runs is zero. There is nothing for an error alert to react to. The automation is dead and the dashboard is green.
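A toy model makes the asymmetry concrete. The sketch below is illustrative Python, not any platform's code, and `Run`, `error_alert_fires`, and `liveness_alert_fires` are hypothetical names. A rule keyed on failed runs gives the same answer for a healthy flow and a dead one; only a rule that asks whether the expected runs showed up can tell them apart.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One execution record (hypothetical; real platforms log far more)."""
    succeeded: bool


def error_alert_fires(runs: list[Run]) -> bool:
    """Error alerting: fires only if some run that happened also failed."""
    return any(not run.succeeded for run in runs)


def liveness_alert_fires(runs: list[Run], expected_runs: int) -> bool:
    """The inverse question: did the runs we expected actually show up?"""
    return len(runs) < expected_runs


# The last 24 hours of an hourly flow.
healthy = [Run(succeeded=True) for _ in range(24)]
dead: list[Run] = []  # deactivated: zero runs, therefore zero failures

print(error_alert_fires(healthy), error_alert_fires(dead))  # False False
print(liveness_alert_fires(healthy, 24), liveness_alert_fires(dead, 24))  # False True
```

The error rule cannot distinguish the two cases because both contain zero failures; the liveness rule can, because it counts runs rather than errors.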

This is not a bug in those platforms. It is a category they do not cover. Error monitoring answers "did a run fail?" It does not answer "is this thing still alive?" Those are different questions, and the second one needs a different tool.

The two kinds of failure, and what catches each

Sort every automation failure into two buckets by one test: does it produce an error event?

Failure | Produces an error event? | What catches it
A step throws mid-run (bad data, API 500, timeout) | Yes | Built-in error alert or error workflow
A run partially completes, then fails | Yes | Error workflow, Make Incomplete Executions
Workflow deactivated and not re-enabled | No | Heartbeat / dead man's switch
Trigger credential (OAuth) expired | Usually no | Heartbeat, plus credential-expiry reminders
Schedule silently stopped firing | No | Heartbeat
Self-hosted instance down | No | Heartbeat (hosted off the instance)
Runs fine but processes zero real items | No | Business-metric / volume check

The top two rows are the ones your platform already handles. The bottom five are the ones that take down a working automation for days, and not one of them shows up as an error. The last row is the sneakiest of all, and we will come back to it, because a flow that runs green while doing nothing useful is its own failure mode.
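The credential-expiry reminders in the table need no special tooling if you record when each trigger credential is due to expire. A minimal sketch, assuming you keep that inventory yourself (for example, noting the expiry date whenever you reconnect an account); the `credentials` dict, the `expiring_soon` helper, and the 14-day window are all hypothetical.

```python
from datetime import date, timedelta


def expiring_soon(credentials: dict[str, date], today: date,
                  within_days: int = 14) -> list[str]:
    """Return credential names expiring within the window, or already expired."""
    cutoff = today + timedelta(days=within_days)
    return sorted(name for name, expires in credentials.items() if expires <= cutoff)


# Hypothetical inventory: credential name -> known expiry date.
credentials = {
    "crm-oauth": date(2024, 6, 10),
    "forms-api-token": date(2024, 9, 1),
}
print(expiring_soon(credentials, today=date(2024, 6, 1)))  # ['crm-oauth']
```

Run it daily from any scheduler and send the result to the inbox your error alerts use. It complements the heartbeat rather than replacing it, because a credential can also be revoked before its recorded expiry.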

What "monitoring on absence" actually means

A heartbeat, also called a dead man's switch, flips the logic. Instead of waiting for something to go wrong, it waits for something to go right, and alerts you when that something is late. Your workflow sends a short HTTP ping to a monitoring service every time it finishes successfully. The service knows how often to expect that ping. As long as pings arrive on time, it stays quiet. The moment one is overdue past its grace window, it pages you.
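On the workflow side, the heartbeat is just "finish the work, then ping." A minimal Python sketch of that ordering follows. `run_with_heartbeat` and `http_ping` are hypothetical helpers, and the URL only mimics the shape of a Healthchecks.io ping URL (`https://hc-ping.com/<uuid>`), so substitute the URL your own check gives you.

```python
import urllib.request
from typing import Callable

# Placeholder: use the ping URL your heartbeat service gives you for this check.
PING_URL = "https://hc-ping.com/your-check-uuid"


def http_ping(url: str) -> None:
    """Send the heartbeat. The short timeout keeps a slow monitor from stalling the run."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        resp.read()


def run_with_heartbeat(job: Callable[[], None], ping_url: str,
                       send: Callable[[str], None] = http_ping) -> None:
    """Run the job, and ping only if it returned normally."""
    job()  # if any step raises, execution stops here and no ping is sent
    try:
        send(ping_url)
    except OSError:
        # A monitor outage should not turn a successful run into a failed one.
        # If the ping is lost, the monitor reports the check as late instead.
        pass
```

The design choice is the ordering. Because the ping is the last thing that happens, a run that fails partway and a run that never starts look identical to the monitor: no ping arrives.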

Healthchecks.io describes exactly this model: it listens for pings from your jobs and stays silent while they arrive on schedule, then raises an alert as soon as one does not. The reason it works for the silent failures is that it does not depend on your automation running. It depends on your automation having run. If the flow is deactivated, the ping never comes, and the absence is the signal. The monitor lives outside your automation platform, so it survives the platform itself going down, which an in-platform check cannot.

You set two numbers per check: the expected period and a grace time. A flow that should finish hourly gets a 60-minute period and maybe a 15-minute grace, so you are alerted after 75 minutes of silence rather than on the first slightly slow run. That grace window is what keeps a heartbeat from crying wolf on normal variation.
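The arithmetic behind that 75-minute figure fits in a few lines. A sketch of the monitor-side rule, assuming a check is marked down once period plus grace has elapsed since the last ping; the function names are hypothetical, and a heartbeat service implements this logic for you.

```python
from datetime import datetime, timedelta


def alert_deadline(last_ping: datetime, period: timedelta, grace: timedelta) -> datetime:
    """Latest time the next ping may arrive before the check is considered down."""
    return last_ping + period + grace


def is_down(last_ping: datetime, now: datetime,
            period: timedelta, grace: timedelta) -> bool:
    return now > alert_deadline(last_ping, period, grace)


last_ping = datetime(2024, 6, 4, 10, 0)
period, grace = timedelta(minutes=60), timedelta(minutes=15)

print(alert_deadline(last_ping, period, grace))  # 2024-06-04 11:15:00
print(is_down(last_ping, datetime(2024, 6, 4, 11, 10), period, grace))  # False: slow, within grace
print(is_down(last_ping, datetime(2024, 6, 4, 11, 20), period, grace))  # True: 80 minutes of silence
```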

What each platform gives you, and the gap it leaves

This is the part most "monitor your automations" advice skips. Each tool has real built-in handling, and each leaves the same hole.

Platform | Built-in failure handling | What it does not catch
n8n | Error Trigger workflow fires on a failed execution; node-level retries | An inactive workflow, a dead instance, or a schedule that never fired produces no failed execution to trigger on
Zapier | Zap History, plus auto-off when 95 percent of runs error in 7 days, with an owner email and a 24h (Team) or 72h (Enterprise) grace period | A trigger that stops returning items, or a manually paused Zap, never crosses the error threshold, so no email is sent
Make | Break directive sends failed runs to Incomplete Executions; the scenario auto-deactivates after repeated errors, with a notice | A scenario you forgot to re-enable, or one whose trigger went quiet, generates no error and no deactivation notice

Read down the right column. It is the same sentence three times: the safety net is woven from error events, and these failures throw none. That is not a knock on the tools. It is the precise reason a heartbeat is not optional for any automation you actually depend on.

A three-layer alerting setup you can build this week

You do not need an observability stack. You need three layers, each answering a different question, and you can stand all three up in an afternoon.

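Layer 1, the heartbeat: did it run at all? Create one check per critical workflow in a heartbeat service. Add a final step to the workflow that pings the check's URL, so the ping fires only when every step before it succeeded. In n8n that is an HTTP Request node at the end of the happy path. In Zapier it is a Webhooks by Zapier POST as the last action. In Make it is an HTTP module after your last real step. Set the period and grace to match the schedule. This single layer catches the four silent failures in the table above that stop runs entirely (deactivation, an expired trigger credential, a stalled schedule, a dead instance), because every one of them stops the ping. It cannot catch the fifth, a flow that runs green while doing nothing useful, because that flow still pings on schedule; that gap is what Layer 3 closes.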
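With several critical workflows, creating the checks by script keeps each period and grace consistent with its schedule. A sketch against Healthchecks.io's Management API, assuming its v3 create-check endpoint, the `X-Api-Key` header, the `timeout` (period, in seconds) and `grace` fields, and a `ping_url` field in the response; confirm all of these against the current API docs before relying on them. `build_create_check_request` is a hypothetical helper, and it only builds the request, so nothing is sent until you call `urlopen`.

```python
import json
import urllib.request

# Assumed endpoint for creating a check (Healthchecks.io Management API, v3).
CREATE_CHECK_URL = "https://healthchecks.io/api/v3/checks/"


def build_create_check_request(api_key: str, name: str,
                               period_s: int, grace_s: int) -> urllib.request.Request:
    """Build, but do not send, a request that creates one check for one workflow."""
    body = {
        "name": name,
        "timeout": period_s,  # expected time between pings, in seconds
        "grace": grace_s,     # extra slack before the check is marked down
    }
    return urllib.request.Request(
        CREATE_CHECK_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Hourly intake flow: 60-minute period, 15-minute grace.
req = build_create_check_request("YOUR_API_KEY", "intake-to-crm", 3600, 900)
# To actually create the check:
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(json.load(resp)["ping_url"])  # paste into the workflow's final step
```

The ping URL that comes back is what the final HTTP step in n8n, Zapier, or Make should call.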

Layer 2, the error alert: did a run fail? Turn on what the platform already offers. Build the n8n Error Trigger workflow that writes failures to Slack or a table. Set Make modules to Break so failures land in Incomplete Executions. In Zapier, confirm error notifications go to an inbox a human reads, not a shared alias nobody checks. This layer catches the loud failures, the top two rows.
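For n8n, the error workflow mostly reshapes the Error Trigger's output into a readable message. A sketch of that reshaping in Python; the same logic could live in an n8n Code node, or you can template the fields directly in a Slack node. The field names follow the sample error data in n8n's error-handling documentation and should be treated as assumptions to verify against your version. The `{"text": ...}` payload is the basic Slack incoming-webhook format, and `slack_message_from_n8n_error` is a hypothetical name.

```python
def slack_message_from_n8n_error(item: dict) -> dict:
    """Turn one Error Trigger item into a Slack incoming-webhook payload."""
    execution = item.get("execution", {})
    workflow = item.get("workflow", {})
    error = execution.get("error", {})
    lines = [
        f"Workflow failed: {workflow.get('name', 'unknown workflow')}",
        f"Last node: {execution.get('lastNodeExecuted', 'unknown')}",
        f"Error: {error.get('message', 'no message')}",
    ]
    if execution.get("url"):
        lines.append(f"Execution: {execution['url']}")  # link back to the failed run
    return {"text": "\n".join(lines)}


sample = {
    "execution": {
        "id": "231",
        "url": "https://n8n.example.com/execution/231",
        "error": {"message": "Example Error Message"},
        "lastNodeExecuted": "Node With Error",
    },
    "workflow": {"id": "1", "name": "Example Workflow"},
}
print(slack_message_from_n8n_error(sample)["text"])
```

Defaulting every lookup keeps the alert path from failing on a partial payload, which matters because an error handler that itself errors is one more silent failure.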

Layer 3, the business-metric check: did it do the right amount of work? This is the layer almost nobody builds, and it catches the failure that hides in plain sight. An intake flow that normally creates 15 to 30 CRM leads a day can run green while creating zero, because an upstream form provider changed its payload and now every record fails a filter silently. No error, healthy heartbeat, empty pipeline. The fix is a second scheduled workflow that counts the real output over a window and alerts if it falls outside a sane range. "Alert if today's processed-lead count is under five by 4 p.m." catches the quiet drift that the first two layers wave through. If you already keep a structured run log, this check reads straight off it. (For more on why working automations break without a sound, see our piece on why automations silently break.)
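Layer 3 is the easiest to describe and the easiest to skip, so here is the rule above as code. A minimal sketch, assuming a structured run log where each row has an `at` timestamp and a `status` field (a hypothetical schema; adapt it to whatever your log records). `count_processed` and `volume_alert` are hypothetical names, and the thresholds are this article's example numbers, not recommendations for every flow.

```python
from datetime import date, datetime
from typing import Optional


def count_processed(run_log: list[dict], day: date) -> int:
    """Count records that actually reached the CRM on `day` (hypothetical log schema)."""
    return sum(1 for row in run_log
               if row["status"] == "created" and row["at"].date() == day)


def volume_alert(processed: int, now: datetime,
                 min_by_cutoff: int = 5, cutoff_hour: int = 16) -> Optional[str]:
    """Alert text if output is implausibly low by the cutoff hour, else None."""
    if now.hour < cutoff_hour:
        return None  # too early in the day to judge
    if processed < min_by_cutoff:
        return (f"Only {processed} leads processed by {cutoff_hour}:00 "
                f"(expected at least {min_by_cutoff}); check upstream payloads and filters.")
    return None


# Every run succeeded and pinged, but every record was filtered out.
now = datetime(2024, 6, 4, 16, 5)
run_log = [{"at": datetime(2024, 6, 4, h, 30), "status": "filtered_out"} for h in range(8, 16)]
print(volume_alert(count_processed(run_log, now.date()), now))
```

Schedule it as its own workflow shortly after the cutoff, and give that workflow a heartbeat too, so the checker cannot die silently either.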

Three questions, three layers: is it alive, did a run fail, and is it producing the right volume. Most teams build the middle one, assume it covers them, and find out otherwise during an outage.

How to start

Pick your single highest-stakes automation, the one whose silent death would cost a customer or a sale, and give it Layer 1 today. Sign up for a heartbeat service, create one check, add the success ping as the last step, and set the period to the flow's real cadence. That one move turns your worst blind spot into a page as soon as a run is overdue by more than its grace time. Add Layer 2 and Layer 3 to that same flow over the week, then repeat for the next automation down your list. You do not have to instrument everything. You have to instrument the ones you cannot afford to lose quietly.

This is the operational layer we build into every system we ship, alongside the run logging and dead-letter handling that make a failure visible instead of invisible. If you want monitoring designed in from the start, that is the core of our workflow automation systems and operational intelligence systems. Or bring us the automation that would hurt most if it died on a Friday, and we will help you wire the alert that wakes you before your customer does.


SOURCES & CITATIONS

  1. "Error handling." n8n Documentation. https://docs.n8n.io/flow-logic/error-handling/
  2. "How to troubleshoot errors in Zap workflows." Zapier Help Center. https://help.zapier.com/hc/en-us/articles/8496037690637-How-to-troubleshoot-errors-in-Zap-workflows
  3. "Incomplete executions." Make Help Center. https://help.make.com/incomplete-executions
  4. "How to Monitor Cron Jobs with Healthchecks.io." Healthchecks.io Documentation. https://healthchecks.io/docs/monitoring_cron_jobs/

About Alexey Yushkin

Alexey is the founder of GENERAL INFORMATICS LLC. He designs and ships AI and automation systems for businesses and operators across the US.
