Does the OpenAI or Anthropic API train on my data?

No, not by default. OpenAI has excluded API and business data from model training since 2023, and Anthropic's commercial terms (the API, Claude for Work, and API access through Amazon Bedrock or Google Vertex AI) do not use your inputs or outputs for training. You have to actively opt in to share API data. An automation built on the paid API is the safe door.

Does ChatGPT or Claude train on what I type into the app?

Usually yes, unless you opt out. ChatGPT Free and Plus train on your conversations unless you turn off 'Improve the model for everyone' in settings. As of the August 2025 terms update, Claude Free, Pro, and Max train on your chats and coding sessions unless you opt out. The consumer Gemini app uses your activity to improve Google products as well. These are different products from the paid APIs, with different defaults.

Did Anthropic change its policy on training in 2025?

Yes. On August 28, 2025, Anthropic updated its consumer terms so that Claude Free, Pro, and Max accounts are used to train new models unless the user opts out. Existing users had to make a selection by October 8, 2025. If training is left on, retention extends to five years; if you opt out, it stays at 30 days. The change does not touch Claude for Work, the API, Claude for Government, or Claude for Education.

Is the free Gemini API safe for customer data?

Treat the unpaid Gemini API tier (the free Google AI Studio key) as a consumer surface, not a business one. On the unpaid tier Google uses your prompts and responses to improve its products, and human reviewers may read API input and output. Once you enable billing on the Gemini Developer API, or use Vertex AI, Google no longer uses your data for training and retains it only for abuse and security.

Does 'not used for training' mean my data is deleted right away?

No. Training and retention are two different questions. The paid APIs do not train on your data, but they still keep a copy for a window (OpenAI retains API data for about 30 days for abuse monitoring, with zero-data-retention available on eligible endpoints). And your own automation logs keep the full payload of every step long after the model forgets it. No training does not mean no copy.

Does AI train on your business data?

Whether an AI vendor trains on your business data is not a property of the company. It is a property of which door you walked through. The paid APIs from OpenAI, Anthropic, and Google do not train on your inputs by default, so an automation built on the paid API goes through the safe door. The consumer chat apps (ChatGPT Free and Plus; Claude Free, Pro, and Max; the Gemini app) do train on what you type unless you opt out. Same logos, opposite defaults. So the honest answer to "does AI train on my data" is a question back: which product are you actually using?

This matters because most operators ask the question about the wrong surface. They worry about the automation, which is almost always built on the API and almost always fine. Meanwhile a salesperson is pasting a customer's full account history into a personal ChatGPT account to draft a reply, and that is the surface that trains.

The answer is set by the tier, not the vendor

Here is where each major product stands as of June 2026. The split is consistent across all three vendors: consumer apps train by default, paid API and enterprise tiers do not.

Product you actually use	Trains on your data by default?	What happens to the data
ChatGPT Free / Plus (consumer app)	Yes, unless you turn off "Improve the model for everyone"	Used to improve models until you opt out or delete
ChatGPT Team / Enterprise / Business	No	Excluded from training by default
OpenAI API	No, since 2023	Retained ~30 days for abuse monitoring, then deleted; zero data retention available on eligible endpoints
Claude Free / Pro / Max (consumer app)	Yes, unless you opt out (Aug 2025 terms)	5 years if training stays on, 30 days if you opt out
Claude for Work, API, Bedrock, Vertex	No	Governed by commercial terms, not used for training
Gemini app (consumer)	Yes	Activity used to improve Google products; human review possible
Gemini API, unpaid tier (free AI Studio key)	Yes	Used to improve products; human reviewers may read input and output
Gemini API paid tier / Vertex AI	No	Retained only for abuse and security

The pattern is the whole point. If you build on the API and pay for it, your data is not training the next model. If you or your team use the free chat app, it usually is. The vendor name tells you almost nothing. The tier tells you everything.
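To make the pattern checkable, the table can be encoded as data. The following is a minimal sketch, not a vendor API: the `Surface` class, the tier labels (`consumer`, `unpaid_api`, `paid_or_enterprise`), and both helper functions are names invented here for illustration, and the rows simply transcribe the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Surface:
    product: str
    tier: str  # hypothetical labels: "consumer", "unpaid_api", "paid_or_enterprise"
    trains_by_default: bool


# Rows transcribed from the table above; tier labels are an assumption of this sketch.
SURFACES = [
    Surface("ChatGPT Free / Plus", "consumer", True),
    Surface("ChatGPT Team / Enterprise / Business", "paid_or_enterprise", False),
    Surface("OpenAI API", "paid_or_enterprise", False),
    Surface("Claude Free / Pro / Max", "consumer", True),
    Surface("Claude for Work, API, Bedrock, Vertex", "paid_or_enterprise", False),
    Surface("Gemini app", "consumer", True),
    Surface("Gemini API, unpaid tier", "unpaid_api", True),
    Surface("Gemini API paid tier / Vertex AI", "paid_or_enterprise", False),
]


def allowed_for_customer_data(surface: Surface) -> bool:
    """A surface qualifies only if it is paid/enterprise and does not train by default."""
    return surface.tier == "paid_or_enterprise" and not surface.trains_by_default


def tier_predicts_default(surfaces) -> bool:
    """True when the tier alone determines the training default for every row."""
    return all(
        (s.tier == "paid_or_enterprise") == (not s.trains_by_default)
        for s in surfaces
    )


safe_products = [s.product for s in SURFACES if allowed_for_customer_data(s)]
```

Here `tier_predicts_default(SURFACES)` holds for every row, which is the claim in code form: the vendor column is never consulted, only the tier.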

Your automation runs through the safe door

The good news for anyone shipping AI automations: the thing you are most worried about is the thing least at risk. An n8n, Zapier, or Make workflow that calls an LLM does it through the paid API, with an API key tied to a billed account. That is the no-training door for all three vendors. OpenAI has excluded API data from training since 2023. Anthropic's commercial terms cover the API directly, including access through Amazon Bedrock and Google Vertex AI. Google stops using your data once billing is enabled on the Gemini Developer API.

So when a customer's order details or a support ticket flows through your automation and into a model to be classified or summarized, that content is not becoming training data. It is processed, a response comes back, and the inputs sit in a short-lived retention window before deletion. The automation is the boring, safe case. We build on the API precisely because it is the controllable surface: one billed account, one set of terms, one place to enforce zero data retention if a contract requires it.

If you run AI inside a workflow automation system, the training question is essentially settled the moment you choose the API over a copy-pasted chat. The work that remains is about retention and access, not training.

The leak is the side door your team uses

The real exposure is not the pipeline you designed. It is the personal AI accounts your team opened without telling you. A free ChatGPT login, a personal Claude Pro subscription, the Gemini app on a phone. Every one of those is a consumer surface, and on a consumer surface the default is to learn from what gets typed in.

Anthropic made this concrete in 2025. On August 28, 2025, it changed its consumer terms so that Claude Free, Pro, and Max accounts are used to train new models unless the user opts out, with existing users required to choose by October 8, 2025. Leave the setting on and your chats are retained for five years. Opt out and it drops back to 30 days. This was a real reversal from the prior consumer policy, and most people clicked "accept" on the pop-up without reading it. If a member of your team did that and then pasted a spreadsheet of customer emails into Claude to clean it up, that data is now inside a five-year training window. Not because the technology failed. Because the wrong door was open.

Google's unpaid Gemini tier carries a sharper version of the same risk. On the free AI Studio key, not only is your data used to improve products, but human reviewers may read, annotate, and process the API input and output. Google disconnects it from your account first, but a person can still see the content. That is fine for a throwaway prototype. It is not fine for a real customer record.

This is why a written rule beats a policy debate. Decide which AI accounts are allowed to touch customer data, make those the paid API or the enterprise tier, and tell the team plainly that the free apps are for public information only. The control that works is removing the side door, not trusting everyone to read terms-of-service pop-ups.

What "not used for training" does not mean

No training is not the same as no copy. These are two separate questions and conflating them is how operators talk themselves into a false sense of safety.

The paid APIs keep your data for a window even though they never train on it. OpenAI retains API inputs and outputs for about 30 days for abuse monitoring before deleting them, with zero data retention available on eligible endpoints for customers who need it. That is a reasonable, short, documented window. But it is not zero by default, and "they don't train on it" was never a claim that the data vanishes the instant the response comes back.

The bigger copy is the one you make yourself. Your automation platform's execution history saves the full input and output of every step by default, which means the customer data you sent to the model is sitting in your own logs long after the vendor deleted its copy. We wrote about this in "Automation logs hold more PII than your database", and it is the retention surface most teams never check. The model forgetting your data does nothing about the verbatim copy in your run history, on a retention clock you set by accident.

So the full picture has three layers, not one. Does the vendor train on it? (That depends on the tier, and on the paid API the answer is no.) How long does the vendor keep it? (A short abuse-monitoring window on the API.) How long do you keep it? (Often far longer, in your own logs.) Answering only the first question and stopping is the common mistake.
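The three layers can be written down per copy of the data rather than per vendor. This is a rough sketch under stated assumptions: `DataCopy` and `longest_lived` are hypothetical names, the 30-day figure is the approximate OpenAI API window quoted above, and the run-history entry assumes no pruning has been configured, which you would need to verify for your own platform.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCopy:
    holder: str
    used_for_training: bool        # layer 1: does the holder train on it?
    retention_days: Optional[int]  # layers 2 and 3: None means no expiry is set


def longest_lived(copies):
    """Return the copy that outlives the others; no expiry outranks any finite window."""
    return max(
        copies,
        key=lambda c: float("inf") if c.retention_days is None else c.retention_days,
    )


copies = [
    # From the article: about 30 days for abuse monitoring, no training.
    DataCopy("OpenAI API (abuse monitoring)", used_for_training=False, retention_days=30),
    # Assumption for illustration: execution history with no pruning configured.
    DataCopy("own automation run history", used_for_training=False, retention_days=None),
]

worst = longest_lived(copies)
```

Neither copy is used for training, yet `worst` is the run history you keep yourself, which is why stopping at the first layer misses the exposure.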

What to do next

Sort your AI usage by door, not by vendor. Anything that touches customer or proprietary data goes through the paid API or an enterprise tier, where the default is no training and you can enforce a retention policy. Anything on a free consumer app is restricted to public information, in writing, so a team member cannot accidentally feed a customer list into a five-year training window.

Three concrete moves this week. First, inventory the AI accounts in use, including the personal ones, and identify which are consumer surfaces. Second, on any consumer account that must exist, turn off the training setting ("Improve the model for everyone" in ChatGPT, the training opt-out in Claude) so the default does not work against you. Third, audit your own automation logs, because the longest-lived copy of customer data is usually the one you are keeping, not the one the vendor is.
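The first two moves can start as a simple inventory you review line by line. The sketch below is illustrative only: the `Account` fields, the surface labels, and the sample entries are all hypothetical, and the rule it applies is the one stated above (customer data belongs on the paid API or an enterprise tier, and any consumer account should have training turned off).

```python
from dataclasses import dataclass


@dataclass
class Account:
    owner: str
    product: str
    surface: str                 # hypothetical labels: "consumer" or "api_or_enterprise"
    training_on: bool            # is the account's training setting still enabled?
    touches_customer_data: bool


def findings(account: Account) -> list:
    """List the actions an account needs under the 'sort by door' rule."""
    issues = []
    if account.surface == "consumer":
        if account.touches_customer_data:
            issues.append("move customer data to the paid API or an enterprise tier")
        if account.training_on:
            issues.append("turn off the training setting")
    return issues


# Made-up inventory entries for illustration.
inventory = [
    Account("sales rep", "ChatGPT Free (personal)", "consumer", True, True),
    Account("n8n workflow", "OpenAI API", "api_or_enterprise", False, True),
    Account("designer", "Claude Pro (personal)", "consumer", False, False),
]

report = {a.owner: findings(a) for a in inventory}
```

An empty list means the account already sits behind the right door. Anything else is the week's to-do list.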

If you want help drawing that line cleanly (moving real customer data onto the API door, scoping retention, and keeping the free apps away from anything sensitive), that is the kind of work we do in a data intelligence platform build. Tell us what your team is pasting where and we will map which doors are open.

About Alexey Yushkin
