What is the difference between AI and rule-based automation?

A rule is deterministic IF-THEN logic: the same input always produces the same output, and you can read the logic to know what it will do. An AI step interprets rather than matches, which lets it handle messy input and fuzzy categories, but its output is not guaranteed to be identical on repeat runs. Most workflows need both: rules for the steps that have one correct mechanical answer, a model for the steps that require reading or writing language.

Can I unit-test an AI step in an automation?

Not the way you test a rule. A rule has a fixed expected output you can assert against, but a model is non-deterministic, so you cannot assert that a given input yields one exact string. You can test the boundary instead: constrain the model's output to a closed set of values and assert it returns one of them. Steps that generate open text cannot be checked this way and should be gated or logged, not trusted blind.

Should I use an LLM to route or categorize inputs?

Only when the input is unstructured. Routing by a number, a ZIP code, or an exact field match is a lookup, and a lookup is faster, deterministic, and wrong only if the table itself is wrong. Categorizing a free-text email or a support message by topic is a language problem, so a model fits, but constrain its output to a fixed list of categories so the result is validatable and the next step is a plain rule.

Is it cheaper to use rules instead of AI?

A single model call is cheap, often a fraction of a cent, so cost is rarely the deciding factor. The real cost of an unnecessary model step is operational: latency, a result you cannot reproduce, and a step you cannot unit-test or replay cleanly. Use a rule where a rule works because it is more reliable and easier to debug, not because it saves a tenth of a cent.

When should an automation use AI instead of a rule?

Use AI on a step only when the input is unstructured (free text, an email body, a PDF) and a rule cannot parse it, or when the output is genuinely open (a written reply, a summary) with no fixed list of answers. If the input is structured and the output is a value from a set you control, that step is a rule, even when the rest of the workflow uses AI. The decision is per step, not per workflow.

Decide per step, not per workflow. A workflow is a sequence of steps, and most of those steps have exactly one correct mechanical answer that a rule produces faster and more reliably than any model. Reach for AI on a step only when one of two things is true: the input is unstructured (free text, an email body, a PDF) and a rule cannot parse it, or the output is genuinely open (a written reply, a summary) and no fixed list of answers exists. If the input is structured and the output is a value from a set you control, that step is a rule, even when the rest of the workflow is full of AI.

That framing matters because the usual advice stops one level too high. You will read that rules handle the predictable 80 percent and AI handles the fuzzy 20 percent. Directionally true, and useless when you sit down to build, because you do not ship "20 percent AI." You ship named steps, and each one is either a rule or a model call. The decision lives at the step.

Why "should I use AI or rules" is the wrong question

Asked at the workflow level, the question produces mush. A lead-handling flow is not "an AI workflow" or "a rules workflow." It is: receive the form, look up the ZIP to assign a territory, read the free-text "what do you need" field to guess intent, dedupe against existing contacts, and draft a first-touch email. Five steps, and the right answer is different for each one. Territory assignment is a lookup table. Dedupe is an exact match. Intent from free text is a language problem. The email draft is open generation. Only two of those five steps want a model.
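The two lookup-style steps in that lead flow can be sketched as plain functions. This is only an illustration: the ZIP-prefix table, the territory names, and the function names are hypothetical, and the dedupe check assumes the stored emails are already lowercased.

```python
# Hypothetical lookup table: ZIP prefix -> territory (illustrative values only).
TERRITORY_BY_ZIP_PREFIX = {"100": "northeast", "606": "midwest", "941": "west"}


def assign_territory(zip_code: str) -> str:
    """Rule: a table lookup. Unknown ZIPs get a named default, never a guess."""
    return TERRITORY_BY_ZIP_PREFIX.get(zip_code.strip()[:3], "unassigned")


def is_duplicate(email: str, existing_emails: set[str]) -> bool:
    """Rule: exact compare after normalizing case and whitespace.

    Assumes every entry in existing_emails is already lowercased.
    """
    return email.strip().lower() in existing_emails
```

Both functions return the same answer for the same input on every run, which is what makes them cheap to test and safe to re-run.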

When you decide at the workflow level instead, you get one of two predictable failures. You wrap a model around the whole thing and end up paying latency and unpredictability for steps that were a lookup table with worse manners. Or you swear off AI, then hand-code regex to pull a name and address out of a free-text paragraph and spend the next month patching edge cases that a model would have read correctly on the first try. Both come from answering the question at the wrong altitude.

The two axes: is the input structured, is the output fixed

Two questions decide every step. First: is the input structured? A number, a date, a ZIP code, a dropdown value, a database field, all structured. An email body, a chat message, a scanned invoice, a voice transcript, all unstructured. Second: is the output a fixed set you control? "Approver A or approver B," "urgent, normal, or spam," "match or no match," all fixed. A sentence written for a specific customer, a summary of a call, all open.

Those two answers land the step in one of four boxes.

	Output is a fixed set you control	Output is open-ended
Input is structured	Rule: IF/THEN, lookup, regex	Template, or a model only if the text is truly bespoke
Input is unstructured	Model: extract or classify into the set, then a rule acts	Model: generate, then gate and log

The top-left box is most of your workflow and none of your AI budget. The bottom-left is the box people miss: the input is messy, but the answer is a value from a list you already defined, so the right move is to let a model read the mess and pick from your list, not to let it improvise. The top-right is a quiet trap. Structured input plus open output usually means you are filling a template (Hi {name}, your {product} ships {date}), and a template is a rule, not a model job. Only the bottom-right, unstructured in and open out, is a step the model genuinely owns: summarizing, drafting, rewriting. That is also the one box you cannot fully trust, for the reason that comes next.
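As a minimal sketch, the four boxes reduce to a two-boolean decision. The function names and return strings below are invented for illustration; the template line is the one quoted above.

```python
def choose_mechanism(input_structured: bool, output_fixed: bool) -> str:
    """Map the two questions onto the four boxes of the grid."""
    if input_structured and output_fixed:
        return "rule"
    if input_structured:
        # Structured in, open out: usually a template, which is still a rule.
        return "template"
    if output_fixed:
        # Unstructured in, fixed out: the model reads, a rule acts.
        return "model: classify or extract, then a rule acts"
    return "model: generate, then gate and log"


def render_shipping_notice(name: str, product: str, date: str) -> str:
    """The top-right box in practice: a template filled from structured fields."""
    return f"Hi {name}, your {product} ships {date}"
```

For example, `choose_mechanism(False, True)` lands in the bottom-left box, and `render_shipping_notice("Ana", "kettle", "Friday")` produces the shipping line with no model involved.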

The consequence the rules-vs-AI posts skip: a model step can't be replayed

Here is what the 80/20 articles never tell you, and it is the reason the per-step decision matters. A model step is non-deterministic. Run it twice with the identical input and you can get two different outputs, even with the sampling temperature set to zero. OpenAI describes its API as only "mostly deterministic." Anthropic notes that temperature zero is not fully deterministic. A 2025 analysis from Thinking Machines Lab traced the real cause to how inference servers batch concurrent requests, which is to say it happens on the provider's side and you cannot switch it off from yours.

That has two hard operational consequences. First, you cannot unit-test a model step the way you test a rule. A rule has a fixed expected output, so you write "input X should produce output Y" and assert it. A model has no fixed Y. The best you can do is constrain the output and assert it falls inside a known set, which only works if you designed the step that way. Second, a model step breaks clean replay. When a run fails halfway and you re-run it, every rule step reproduces exactly what it did before, but every model step may now decide differently, so a retry is a new execution rather than a repeat of the old one, and that has to factor into deciding when an automation should retry a failed step. The same goes for your audit trail: the run record (the subject of what to log in every automation) captures what the model said this time, not what it would say if you asked again. A rule's behavior is in its source. A model's behavior is only in the log.
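The difference between the two kinds of test can be sketched like this. Everything here is an assumption made for illustration: `fake_classify` is a stand-in that picks a label at random to imitate run-to-run variation, not a real model call, and the approver names are invented.

```python
import random

LABELS = {"invoice", "complaint", "inquiry"}
SENIOR_APPROVAL_THRESHOLD = 10_000  # dollars; invoices over this go to a senior approver


def route_invoice(amount: float) -> str:
    """Rule step: the same amount always yields the same approver."""
    return "senior_approver" if amount > SENIOR_APPROVAL_THRESHOLD else "standard_approver"


def fake_classify(email_body: str) -> str:
    """Stand-in for a model call. It varies between runs on purpose."""
    return random.choice(sorted(LABELS))


def test_rule_exact_output():
    # A rule has a fixed expected output, so assert the exact value.
    assert route_invoice(12_500) == "senior_approver"
    assert route_invoice(10_000) == "standard_approver"


def test_model_stays_inside_the_set():
    # A model step has no fixed expected output, so assert the boundary instead.
    for _ in range(20):
        assert fake_classify("Where is my order?") in LABELS
```

The second test says nothing about whether the label is correct. It only proves the step cannot hand the next rule a value it does not know how to handle.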

This is why the default is a rule and the model has to earn its slot. Not because models are bad, but because every model step you add is a step you can no longer fully test or replay. You want as few of those as the job allows.

Push the model early and narrow: extract and classify, don't decide

When a step does need a model, make it do the smallest possible job: turn unstructured input into structured data, then hand the actual decision to a rule. Read the email and label it; do not read the email and route it. The labeling is the language problem the model is good at. The routing is a lookup the model has no business owning, because a lookup is testable and a model is not.
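A rough sketch of "label, then route", with the model passed in as a plain callable so the routing rule can be tested without one. The queue names and the `classify` parameter are hypothetical.

```python
from typing import Callable

# Rule: a lookup table you can read, diff, and test (queue names are made up).
ROUTES = {
    "invoice": "accounts_payable",
    "complaint": "support_lead",
    "inquiry": "sales_inbox",
}
FALLBACK_QUEUE = "human_review"


def route_email(body: str, classify: Callable[[str], str]) -> str:
    """The model labels the email; a dictionary lookup makes the routing decision."""
    label = classify(body)  # model step: reads language, returns a label
    return ROUTES.get(label, FALLBACK_QUEUE)  # rule step: deterministic lookup
```

Because the routing lives in `ROUTES`, changing where complaints go is a one-line edit with a predictable effect, and an unexpected label falls through to a person instead of being acted on.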

The key move that makes this safe is constraining the output. OpenAI's Structured Outputs feature uses constrained decoding so the model cannot return a value outside the schema you define: if a field can only be "approved" or "rejected," the model cannot emit "maybe." OpenAI reports 100 percent schema adherence in its evals with this enabled, against the unreliable hand-parsed JSON you get from just asking the model to "respond in JSON." Anthropic's tool use and most major model APIs offer a comparable schema-based constraint. Once the model can only return one of N values you chose, the step is testable at the boundary (you can assert the output is one of the N) and the next step is a plain rule reading a clean field.
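Provider-side constraints are configured through each provider's own API, which is not shown here. As a complementary sketch, the same closed set can also be enforced locally at the boundary. The field name `decision` and the function name are assumptions for illustration.

```python
import json
from enum import Enum


class Decision(str, Enum):
    APPROVED = "approved"
    REJECTED = "rejected"


def parse_decision(raw_json: str) -> Decision:
    """Return a Decision, or raise ValueError for anything outside the closed set."""
    payload = json.loads(raw_json)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    # Decision(...) raises ValueError for "maybe", a missing field, or any other value.
    return Decision(payload.get("decision"))
```

With this in place, `parse_decision('{"decision": "approved"}')` returns `Decision.APPROVED`, while a payload carrying `"maybe"` raises before any downstream rule sees it.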

Here is the per-step decision applied to eight steps you will actually find in a small-business workflow.

Step	Input	Output	Right mechanism
Route an invoice over $10k to a senior approver	amount (number)	which approver (fixed)	Rule
Assign a lead to a sales territory by ZIP	ZIP (structured)	territory (fixed list)	Rule, lookup table
Dedupe a contact by email	email (structured)	match or no match	Rule, exact compare
Tag an inbound email as invoice, complaint, or inquiry	email body (unstructured)	one of three labels	Model: classify into a closed enum, then a rule routes
Pull the PO number and total from a vendor PDF	PDF (unstructured)	structured fields	Model: extract, then a rule validates the format
Decide whether a refund meets policy	order data (structured)	approve or deny	Rule if the policy is codable; otherwise gate to a human
Draft a personalized reply to a customer	the thread (unstructured)	written text (open)	Model: generate, then gate or log
Summarize a sales call transcript	transcript (unstructured)	summary (open)	Model: generate

Notice the refund row. When the policy is "refund any order under $50 within 30 days," that is a rule, full stop. When it depends on judgment a model cannot be trusted to make alone, it is not an AI step either; it is a human-approval step. "I cannot write this as a rule" does not automatically mean "let the model decide." Sometimes it means a person decides.
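The codable version of that refund policy might look like the sketch below. The constants come from the example policy above. Sending everything outside the policy to a person, rather than denying it automatically, is an assumption made for this illustration.

```python
from datetime import date

MAX_AUTO_REFUND = 50.00   # dollars: "any order under $50"
REFUND_WINDOW_DAYS = 30   # "within 30 days"


def refund_decision(order_total: float, order_date: date, today: date) -> str:
    """Rule: approve inside the written policy; everything else goes to a person."""
    age_in_days = (today - order_date).days
    within_window = 0 <= age_in_days <= REFUND_WINDOW_DAYS
    if order_total < MAX_AUTO_REFUND and within_window:
        return "approve"
    # Assumption: out-of-policy requests are escalated to a human, not auto-denied.
    return "human_review"
```

No model appears anywhere in this step: the policy either applies mechanically or a human makes the call.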

What to do next

Open your most important workflow and write out every step as one line. For each line, answer the two questions: is the input structured, and is the output a fixed set you control. Two structured answers means it is a rule, so if a model is doing that step today, replace it and remove a thing you cannot test. An unstructured input with a fixed output means a model should classify or extract, with its output constrained to your list, and a rule should act on the result. Only the open-output, unstructured-input steps belong fully to the model, and those are the ones to gate or log, not trust blind.

The systems that hold up in production are mostly rules with a few narrow model calls placed exactly where language enters or leaves. We build every workflow automation system this way, and the same discipline governs the AI assistants and agents we ship: the model reads and writes the language, and deterministic logic makes every decision it can. If you are not sure which steps in your flow actually need a model, send us the flow and we will mark each step rule, classify, extract, or generate.

About Alexey Yushkin
