Workflow AutomationAIn8nTool Use

Why Your AI Automation Returns Broken JSON

Broken JSON from an AI step in an automation is fixed by the provider's structured-output mode, which uses constrained decoding to guarantee your exact schema, not by a better prompt. Schema-valid output is still not guaranteed correct, so the automation must validate the values and route failures to human review.

Alexey YushkinFounder, GENERAL INFORMATICS2 min read

When an AI step in your automation returns JSON the next node cannot parse, the fix is not a better prompt. It is the model provider's structured-output mode, which constrains the model's tokens during generation so the response always matches the exact schema you defined. OpenAI has offered this since August 2024, and Anthropic made it generally available in late 2025. Turn it on and the parse failures stop. What it does not fix is the values inside that JSON, and that second failure mode is the one that actually costs you money.

This is the part most guides skip. They show you the API parameter, the parse error disappears, and they call it done. In a production automation, getting valid structure is the easy half. The hard half is knowing that valid structure and correct content are two different guarantees, and only one of them comes from the API.

Why "return only JSON" prompts keep breaking

Telling the model to "respond with only valid JSON, no other text" works in a demo and fails on a schedule. The model wraps the output in a markdown code fence. It adds a polite "Here is the JSON you requested" preamble. It emits a trailing comma, uses smart quotes, or hits the token limit and truncates mid-object. Any one of those turns your downstream parse into an error, and the automation either dies or, worse, swallows the error and moves on with empty data.

Regex extraction to pull the JSON out of the prose is the usual patch. It is also the thing that breaks silently three weeks later when the model phrases its preamble differently. You are debugging string cleanup instead of running a workflow. There is a better tier, and it has been available for a while.

The three reliability tiers, and why you want the top one

There are three ways to get JSON out of a model. They are not equal, and the gap between them is exactly where automations break.

ApproachWhat it guaranteesWhat still breaks
Prompt and parse ("return only JSON")NothingCode fences, preamble text, trailing commas, truncation, wrong keys
JSON mode (json_object)Output is syntactically valid JSONYour keys, types, and required fields. Valid JSON can still be the wrong shape
Structured outputs (json_schema, constrained decoding)The exact schema you defined: keys, types, required fields, enumsThe correctness of the values

The middle row trips up the most people, because "JSON mode" sounds like the answer. It only promises that the string parses. The model can hand you {"status": "ok", "data": null} when your next node expected {"customer_email": "...", "amount": 123.45}. That is valid JSON and a broken workflow at the same time.

The top row is what you want for anything that runs unattended. Use it as the default for every AI step that feeds a structured destination.

What structured outputs actually do

Structured outputs work by compiling your JSON schema into a grammar and then restricting which tokens the model is allowed to produce at each step of generation. The model literally cannot emit a token that would violate the schema. This is called constrained decoding, and it is a property of the inference engine, not a request you are hoping the model honors.

The reliability difference is large and measured. In OpenAI's own evaluation of complex schema following, gpt-4o-2024-08-06 with structured outputs scored a perfect 100 percent, while gpt-4-0613 using prompting alone scored under 40 percent. Anthropic ships the same capability, generally available across Claude Sonnet 4.5 and 4.6, Opus 4.5 through 4.8, and Haiku 4.5, set through the output_config.format field and also available on Amazon Bedrock and Google Vertex AI. Both vendors use grammar-constrained decoding, so the schema is enforced rather than suggested.

If you are wiring an AI node into a workflow automation system, this is table stakes in 2026. The structure problem is solved. The next problem is not.

Schema-valid is not the same as correct

Here is the point the API documentation will never make for you: constrained decoding guarantees the shape of the answer, not the truth of it. A schema that requires a number field will always get a number. It will not always get the right number.

A field extraction step pulling an invoice total can return a clean, well-typed {"invoice_total": 0.00} when it failed to find the figure. A category classifier constrained to a fixed enum will always return one of your valid categories, including when none of them fit and it picks the closest wrong one. The JSON parses every time. The data is wrong some of the time. If your automation treats a successful parse as a successful answer, you have built a machine that creates clean, confident, incorrect records.

This is the failure that survives the fix. Operators turn on structured outputs, watch the parse errors vanish, and assume the reliability problem is closed. It moved. It did not close.

The pattern that holds in production

Three layers, in order, every time an AI step feeds a structured destination.

First, use structured outputs so the shape is guaranteed. This removes the parsing failure mode entirely and is not optional.

Second, validate the values before you trust them. Required fields should be non-empty, not just present. Numbers should fall in a sane range. Dates should be real dates, not the year 0001. Add a confidence or found field to the schema and check it. This validation lives in a step after the AI node, and it is plain logic, not another model call.

Third, branch on failure. When validation fails, the workflow routes to a human review queue or a holding state. It does not pass the record downstream. This is the difference between an automation that flags the one bad invoice out of forty and one that quietly files all forty, including the wrong one.

A concrete version: an AI step reads an inbound invoice email and a "create record" step writes it to your system. With structured outputs, the record always arrives in the right shape. Your validation step checks that invoice_total is greater than zero and vendor_name is non-empty. If either check fails, the item goes to a review queue instead of becoming a row. We run typed extraction like this in our own permit data pipeline at ma-permits.geninfos.com, where a malformed field is caught and held rather than written into a record people rely on. Deciding which failures route to a person is its own design question, covered in when to require human approval.

How to wire it without new failure modes

Keep the schema small and flat. Deeply nested schemas are slower to compile, harder to validate, and give the model more room to fill a field it should have left blank. Ask for the five fields you actually use, not the twenty you might.

Give the model a way to say "I did not find this." Because a required field plus constrained decoding forces the model to emit some value, a required string with no escape hatch invites fabrication. Add an explicit null option or an "unknown" enum value so declining is a legal move. This single change removes a large share of the confident-but-wrong outputs, because the model no longer has to invent a value to satisfy the structure.

Log the raw model output alongside the validated result. When a record looks wrong a week later, you want to see exactly what the model returned and which validation rule did or did not catch it. The schema tells you the shape; the log tells you what happened. Before any of this runs unattended, put it through a real values check, which is the whole point of testing AI automation before production. And if the extraction logic is doing real work, it is usually worth building as a proper step rather than a chain of patches, which is where a custom software platform earns its place.

What to do next

Open the AI step that has been returning broken JSON and switch it to structured outputs with a defined schema. That alone ends the parse errors. Then add the two layers that the API will not give you: a validation step that checks the values, and a failure branch that routes bad output to a person instead of downstream. Keep the schema flat, add an explicit "unknown" option, and log the raw output.

If you have an automation that is producing clean JSON full of wrong values and you want a second set of eyes on the validation and fallback design, tell us what it is feeding and we will walk through where it should branch.

Frequently Asked Questions

SOURCES & CITATIONS

  1. Introducing Structured Outputs in the API OpenAIhttps://openai.com/index/introducing-structured-outputs-in-the-api/
  2. Structured model outputs guide OpenAIhttps://developers.openai.com/api/docs/guides/structured-outputs
  3. Structured outputs Anthropichttps://platform.claude.com/docs/en/build-with-claude/structured-outputs

About Alexey Yushkin

Alexey is the founder of GENERAL INFORMATICS LLC. He designs and ships AI and automation systems for businesses and operators across the US.

Connect on LinkedIn

Related reading

Want this kind of system in your business?

We build practical AI and automation systems for operators. Send us your current workflow and we will show you what to automate first.

Request a Workflow Review