Single vs Multi-Agent AI: When More Agents Pay Off

Use multiple AI agents only when a job splits into independent subtasks that can run in parallel, overflows a single context window, or needs isolated tools and permissions. Anthropic's own data shows multi-agent systems use about 15x the tokens of a single chat, so for the sequential, interdependent tasks most small businesses run, one tool-using agent is cheaper, faster to debug, and usually just as good.

Alexey Yushkin, Founder, GENERAL INFORMATICS

Use more than one AI agent only when the job genuinely splits: independent subtasks that can run in parallel, work that overflows a single context window, or parts that need different tools and permissions. If it passes none of those, one agent does it better. The reason is cost. Anthropic, which ships some of the most capable agents in production, reports that a single agent already uses about four times the tokens of a normal chat, and a multi-agent system uses about fifteen times. Splitting a job across agents is not a capability upgrade you turn on for free. It is a parallelism bet, and for most of what a small business automates, the bet does not pay.

This is the question you reach after you have already decided you need an agent at all. That earlier call, agent versus a plain workflow with a model inside a step, screens out most projects. The few that survive it, the jobs whose path really is unknowable until they run, then face a second fork that vendors love to push you toward: one agent or a team of them. The honest answer is usually one.

Multi-agent is a parallelism bet, not a capability upgrade

The case for multiple agents is real, but narrow. Anthropic built a multi-agent research system where a lead agent plans an approach and spins up subagents to explore different directions at the same time. On their internal research eval, that setup "outperformed single-agent Claude Opus 4 by 90.2%." That is a big number, and it is the one that gets quoted in every pitch deck. The part that does not get quoted is why it worked and what it cost.

It worked because research is the ideal shape for splitting. The question is large, the directions are independent, and three subagents reading three different sources at once finish faster than one agent reading them in sequence. Anthropic is explicit about the boundary: "multi-agent systems excel at valuable tasks that involve heavy parallelization, information that exceeds single context windows, and interfacing with numerous complex tools." And they are just as explicit about where it falls down. "Some domains that require all agents to share the same context or involve many dependencies between agents are not a good fit for multi-agent systems today. For instance, most coding tasks involve fewer truly parallelizable tasks than research."

So the 90.2% is not evidence that more agents are smarter. It is evidence that a parallelizable task got parallelized. If your task does not parallelize, you do not get the upside, but you still pay the bill.
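
A minimal sketch of that point, with no real model calls in it: `read_source` is a hypothetical stand-in for a subagent reading one source, and the 0.1 second sleep is an assumed latency. The work done is identical in both runs. Only the independent version gets faster when it fans out.

```python
import asyncio
import time

async def read_source(name: str, seconds: float = 0.1) -> str:
    # Hypothetical stand-in for one subagent reading one source.
    # The sleep simulates model and network latency.
    await asyncio.sleep(seconds)
    return f"notes on {name}"

async def run_sequential(sources: list[str]) -> list[str]:
    # One agent reads each source in turn.
    return [await read_source(s) for s in sources]

async def run_parallel(sources: list[str]) -> list[str]:
    # Three independent reads start together and are awaited as a group.
    return list(await asyncio.gather(*(read_source(s) for s in sources)))

def timed(coro):
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start

sources = ["source A", "source B", "source C"]
seq_notes, seq_time = timed(run_sequential(sources))
par_notes, par_time = timed(run_parallel(sources))

print(f"sequential: {seq_time:.2f}s, parallel: {par_time:.2f}s")
print("same notes either way:", seq_notes == par_notes)
```

If each read needed the previous read's output, `run_parallel` could not be written at all, and the speedup would disappear with it.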

The three-gate split test

Before you split a job across agents, run it through three gates. If it clears at least one, a multi-agent design earns its cost. If it clears none, build one agent and stop.

Gate one: are the subtasks independent and parallel? Can two parts of the job run at the same time without waiting on each other's output? Researching five competitors, where each lookup is its own thread, passes. A five-step approval chain where step three needs the result of step two fails. If the parts have to happen in order, there is nothing to parallelize, and a second agent just sits idle or duplicates work.

Gate two: does the work overflow one context window? Modern context windows are large, often a million tokens. If a single agent can hold everything it needs to reason about in one window, splitting only adds the overhead of passing context between agents. You split for context when one job legitimately spans more material than a single model can hold at once, like reading hundreds of documents, not because the window feels full.

Gate three: do the parts need isolated tools or permissions? Sometimes the reason to separate agents is not performance but blast radius. An agent that drafts public replies should not also hold your payment credentials. Giving each agent a narrow toolset is a real reason to split, and it overlaps with the question of how much access any single agent should have. This is the one gate where multi-agent is about safety, not speed.

Most business automations clear none of these. They are one chain of dependent steps, comfortably inside one context window, touching one set of tools. That is a single-agent job, full stop.
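
The three gates can be written down as a small checklist. This is only a sketch: the `Task` fields and the `recommend` function are illustrative names, and the yes or no answers still come from someone who knows the job.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    parallel_subtasks: bool   # gate one: independent parts that can run at once
    exceeds_context: bool     # gate two: more material than one window holds
    needs_isolation: bool     # gate three: parts need separate tools or permissions

def recommend(task: Task) -> str:
    gates = [
        ("independent parallel subtasks", task.parallel_subtasks),
        ("overflows one context window", task.exceeds_context),
        ("isolated tools or permissions", task.needs_isolation),
    ]
    cleared = [label for label, passed in gates if passed]
    if cleared:
        return "multi-agent (" + ", ".join(cleared) + ")"
    return "single agent"

approval_chain = Task("five-step approval chain", False, False, False)
competitor_research = Task("research five competitors", True, False, False)

print(approval_chain.name, "->", recommend(approval_chain))
print(competitor_research.name, "->", recommend(competitor_research))
```

The function returns which gate was cleared, not just a verdict, so the reason for splitting stays attached to the decision.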

What a second agent actually costs

The token math is the part that turns an architecture preference into a budget decision. Take Anthropic's two published multipliers at face value: a single agent runs at roughly 4x the tokens of a chat, a multi-agent system at roughly 15x. That makes a multi-agent design about three to four times more expensive than a single agent doing comparable work, because every subagent re-reads the shared context and runs its own chain of model calls, and the orchestrator spends tokens planning and merging on top of that.

Put real numbers on it. Suppose a single tool-using agent handles one task in about 25,000 input and 7,000 output tokens. On Claude Sonnet 4.6, priced at 3 dollars per million input and 15 dollars per million output as of June 2026, that is roughly 18 cents a run. Rebuild the same task as an orchestrator plus three subagents and you are near four times the tokens, call it 72 cents a run. At 1,500 runs a month, that is the difference between about 270 dollars and about 1,080 dollars. You are paying roughly 810 dollars more a month, and unless the task actually needed the parallelism, you bought nothing with it. These figures are illustrative, but the ratio is the point: the multiplier is structural, not a tuning detail you can optimize away.
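
The same arithmetic as a short script, so you can swap in your own volumes. The prices and token counts are the illustrative ones from the paragraph above, and the 4x multiplier is the rounded ratio of Anthropic's two figures (15 / 4 = 3.75). The function and constant names are just for this sketch.

```python
INPUT_PRICE_PER_MTOK = 3.00    # dollars per million input tokens (figure quoted above)
OUTPUT_PRICE_PER_MTOK = 15.00  # dollars per million output tokens (figure quoted above)

def cost_per_run(input_tokens: int, output_tokens: int, multiplier: float = 1.0) -> float:
    # multiplier scales total token use, e.g. 4.0 for a multi-agent rebuild
    base = (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000
    return base * multiplier

RUNS_PER_MONTH = 1_500

single = cost_per_run(25_000, 7_000)
multi = cost_per_run(25_000, 7_000, multiplier=4.0)

print(f"single agent: ${single:.2f} per run, ${single * RUNS_PER_MONTH:,.0f} per month")
print(f"multi-agent:  ${multi:.2f} per run, ${multi * RUNS_PER_MONTH:,.0f} per month")
print(f"extra spend:  ${(multi - single) * RUNS_PER_MONTH:,.0f} per month")
```

Changing the model or the prices moves both lines together. The gap between them only closes if the multiplier does.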

Cost is only half of it. The other half is failure surface. A single agent is one loop, one toolset, one place to look when something goes wrong. An orchestrator with three subagents is four independent model loops plus three kinds of handoff: the orchestrator briefing each subagent, the subagents returning results, and the orchestrator merging them. Each handoff is a spot where a truncated, dropped, or misformatted result can quietly corrupt the final answer without throwing an error. You went from one thing to debug to seven. For anything that touches customers or money, that is the cost that hurts long after the token bill stops surprising you.
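
Here is a toy version of a quiet handoff failure. Everything in it is hypothetical: `naive_merge` stands in for an orchestrator that joins whatever came back, and `checked_merge` is one assumed way to make the same failure loud. It is a sketch of the failure mode, not a recommended orchestration design.

```python
def naive_merge(results: dict[str, str]) -> str:
    # Joins whatever came back. A dropped or empty result raises nothing.
    return "\n".join(text for text in results.values() if text)

def checked_merge(expected: list[str], results: dict[str, str]) -> str:
    # Refuses to merge unless every expected subagent returned non-empty text.
    missing = [name for name in expected if not results.get(name, "").strip()]
    if missing:
        raise ValueError(f"missing or empty subagent results: {missing}")
    return "\n".join(results[name] for name in expected)

expected = ["pricing", "reviews", "hiring"]
# One subagent returned an empty string, and one never returned at all.
returned = {"pricing": "Raised prices in Q2.", "reviews": ""}

summary = naive_merge(returned)
print("naive merge succeeded with:", repr(summary))

try:
    checked_merge(expected, returned)
except ValueError as err:
    print("checked merge failed loudly:", err)
```

The naive version hands back a clean-looking summary built from one of three sources. A single agent has no such seam to guard in the first place.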

Most small-business tasks are the wrong shape

Here is the uncomfortable part for anyone shopping multi-agent platforms. The tasks small businesses actually automate are almost all sequential and interdependent, which is the precise shape Anthropic flags as a poor fit. Each step feeds the next, so there is nothing to run in parallel, and splitting the chain across agents only adds handoffs to a job that was a straight line.

Task | Shape | Build it as
Lead comes in: enrich, score, route to a rep, notify Slack | Sequential chain, one toolset | Single agent or workflow
Support triage: read ticket, classify, draft reply, escalate if needed | Dependent steps, one context | Single agent
Reconcile invoices against the bank feed, flag mismatches | Fixed sequence, one context | Workflow with a model in a step
Follow-up sequence: check status, pick next message, send | Strictly ordered, no parallelism | Single agent or workflow
Research 8 prospects at once, each an independent lookup | Independent, parallel threads | Multi-agent earns it
Read 400 contracts and summarize obligations across all of them | Exceeds one context window | Multi-agent earns it
Draft public replies while a separate agent holds billing access | Needs isolated permissions | Multi-agent for blast radius

The pattern is clean. The rows that justify multiple agents are research-shaped: many independent threads, more material than one window holds, or a hard wall between what each agent is allowed to touch. The rows that do not are the daily operational work of running a business, where the steps are a line, not a fan. That kind of system is most of what we build in custom software and AI platforms: one agent, scoped tightly, doing the judgment-heavy work inside a sequence we can test and replay. We have shipped exactly this pattern in production tools like field2flow.geninfos.com, where the win came from one well-bounded system, not a swarm.

How to decide this week

Take the agentic task you are weighing and run the three gates out loud. Are the subtasks independent enough to run in parallel? Does the work genuinely exceed a single context window? Do different parts need isolated tools or permissions? If you cannot answer yes to at least one, build a single agent, and put the money you saved into testing it well and watching what it does in production. The cost discipline here is the same one behind what an AI automation should cost per month: pay for capability you use, not architecture you were sold.

The instinct to reach for a team of agents comes from the same place as agent-washing in general, the assumption that more moving parts means more intelligence. Usually it means more tokens and more handoffs. Start with one agent that does the whole job, measure it, and split only when a specific gate forces your hand. If you want a second opinion on whether your task is one agent or several, tell us what it does and we will run the gates with you and recommend the build that ships for less.

SOURCES & CITATIONS

  1. How we built our multi-agent research system. Anthropic. https://www.anthropic.com/engineering/multi-agent-research-system
  2. Building Effective AI Agents. Anthropic. https://www.anthropic.com/research/building-effective-agents
  3. Claude API Pricing. Anthropic. https://platform.claude.com/docs/en/about-claude/pricing

About Alexey Yushkin

Alexey is the founder of GENERAL INFORMATICS LLC. He designs and ships AI and automation systems for businesses and operators across the US.
