Does AI train on your business data?
Whether an AI vendor trains on your business data is not a property of the company. It is a property of which door you walked through. The paid APIs from OpenAI, Anthropic, and Google do not train on your inputs by default, so any automation built on the API is the safe door. The consumer chat apps (ChatGPT Free, Claude Free, Pro and Max, the Gemini app) do train on what you type unless you opt out. Same logos, opposite defaults. So the honest answer to "does AI train on my data" is a question back: which product are you actually using?
This matters because most operators ask the question about the wrong surface. They worry about the automation, which is almost always built on the API and almost always fine. Meanwhile a salesperson is pasting a customer's full account history into a personal ChatGPT account to draft a reply, and that is the surface that trains.
The answer is set by the tier, not the vendor
Here is where each major product stands as of June 2026. The split is consistent across all three vendors: consumer apps train by default, paid API and enterprise tiers do not.
| Product you actually use | Trains on your data by default? | What happens to the data |
|---|---|---|
| ChatGPT Free / Plus (consumer app) | Yes, unless you turn off "Improve the model for everyone" | Used to improve models until you opt out or delete |
| ChatGPT Team / Enterprise / Business | No | Excluded from training by default |
| OpenAI API | No, since 2023 | Retained ~30 days for abuse monitoring, then deleted; zero data retention available on eligible endpoints |
| Claude Free / Pro / Max (consumer app) | Yes, unless you opt out (Aug 2025 terms) | 5 years if training stays on, 30 days if you opt out |
| Claude for Work, API, Bedrock, Vertex | No | Governed by commercial terms, not used for training |
| Gemini app (consumer) | Yes | Activity used to improve Google products; human review possible |
| Gemini API, unpaid tier (free AI Studio key) | Yes | Used to improve products; human reviewers may read input and output |
| Gemini API paid tier / Vertex AI | No | Retained only for abuse and security |
The pattern is the whole point. If you build on the API and pay for it, your data is not training the next model. If you or your team use the free chat app, it usually is. The vendor name tells you almost nothing. The tier tells you everything.
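The table can be encoded as a small lookup, so an onboarding checklist or internal script answers by tier rather than by vendor. This is a minimal sketch that assumes the defaults listed above as of June 2026. The dictionary keys and the `trains_by_default` helper are illustrative names, not identifiers any vendor publishes.

```python
# Training defaults copied from the table above (as of June 2026).
# Keys are illustrative labels, not official product identifiers.
TRAINS_BY_DEFAULT = {
    "chatgpt_free_plus": True,
    "chatgpt_team_enterprise_business": False,
    "openai_api": False,
    "claude_free_pro_max": True,
    "claude_work_api_bedrock_vertex": False,
    "gemini_app": True,
    "gemini_api_unpaid": True,
    "gemini_api_paid_vertex": False,
}


def trains_by_default(product: str) -> bool:
    """Return the training default for a product tier.

    Unknown products are treated as training surfaces, so a tier
    nobody has checked is never assumed to be safe.
    """
    return TRAINS_BY_DEFAULT.get(product, True)


if __name__ == "__main__":
    for product in ("openai_api", "claude_free_pro_max", "some_new_app"):
        print(product, trains_by_default(product))
```

Treating unknown products as training surfaces is a design choice for this sketch: it keeps the failure mode on the cautious side when a new tool shows up that nobody has reviewed.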
Your automation runs through the safe door
The good news for anyone shipping AI automations: the thing you are most worried about is the thing least at risk. An n8n, Zapier, or Make workflow that calls an LLM typically does so through the paid API, with an API key tied to a billed account. That is the no-training door for all three vendors. OpenAI has excluded API data from training since 2023. Anthropic's commercial terms cover the API directly, including access through Amazon Bedrock and Google Vertex AI. Google stops using your data to improve its products once billing is enabled on the Gemini Developer API. The one exception to check for is a workflow still running on a free AI Studio key, which sits on the training side of the line.
So when a customer's order details or a support ticket flows through your automation and into a model to be classified or summarized, that content is not becoming training data. It is processed, a response comes back, and the inputs sit in a short-lived retention window before deletion. The automation is the boring, safe case. We build on the API precisely because it is the controllable surface: one billed account, one set of terms, one place to enforce zero data retention if a contract requires it.
If you run AI inside a workflow automation system, the training question was essentially settled the moment you chose the paid API over a copy-pasted chat. The work that remains is about retention and access, not training.
The leak is the side door your team uses
The real exposure is not the pipeline you designed. It is the personal AI accounts your team opened without telling you. A free ChatGPT login, a personal Claude Pro subscription, the Gemini app on a phone. Every one of those is a consumer surface, and on a consumer surface the default is to learn from what gets typed in.
Anthropic made this concrete in 2025. On August 28, 2025, it changed its consumer terms so that chats from Claude Free, Pro, and Max accounts are used to train new models unless the user opts out, with existing users required to choose by October 8, 2025. Leave the setting on and your chats are retained for five years. Opt out and retention drops back to 30 days. This was a real reversal from the prior consumer policy, and most people clicked "accept" on the pop-up without reading it. If a member of your team did that and then pasted a spreadsheet of customer emails into Claude to clean it up, that data is now inside a five-year training window. Not because the technology failed. Because the wrong door was open.
Google's unpaid Gemini tier carries a sharper version of the same risk. On the free AI Studio key, not only is your data used to improve products, but human reviewers may read, annotate, and process the API input and output. Google disconnects it from your account first, but a person can still see the content. That is fine for a throwaway prototype. It is not fine for a real customer record.
This is why a written rule beats a policy debate. Decide which AI accounts are allowed to touch customer data, make those the paid API or the enterprise tier, and tell the team plainly that the free apps are for public information only. The control that works is removing the side door, not trusting everyone to read terms-of-service pop-ups.
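A written rule of this kind is short enough to express as code, which also makes it easy to drop into an intake form or a pre-send check. The sketch below is one way to phrase it, not a standard. The `Surface` and `DataClass` categories and the `is_allowed` helper are hypothetical names chosen for illustration.

```python
from enum import Enum


class Surface(Enum):
    PAID_API = "paid_api"
    ENTERPRISE = "enterprise_tier"
    CONSUMER_APP = "consumer_app"
    UNPAID_API = "unpaid_api"


class DataClass(Enum):
    PUBLIC = "public"
    CUSTOMER = "customer"


# Surfaces approved for customer data: the paid API and the enterprise tier.
APPROVED_FOR_CUSTOMER_DATA = {Surface.PAID_API, Surface.ENTERPRISE}


def is_allowed(surface: Surface, data_class: DataClass) -> bool:
    """Public information may go anywhere; customer data needs an approved surface."""
    if data_class is DataClass.PUBLIC:
        return True
    return surface in APPROVED_FOR_CUSTOMER_DATA


if __name__ == "__main__":
    print(is_allowed(Surface.CONSUMER_APP, DataClass.PUBLIC))    # True
    print(is_allowed(Surface.CONSUMER_APP, DataClass.CUSTOMER))  # False
    print(is_allowed(Surface.PAID_API, DataClass.CUSTOMER))      # True
```

The rule is an allowlist rather than a blocklist, so a surface nobody has classified yet is refused for customer data by default.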
What "not used for training" does not mean
No training is not the same as no copy. These are two separate questions and conflating them is how operators talk themselves into a false sense of safety.
The paid APIs keep your data for a window even though they never train on it. OpenAI retains API inputs and outputs for about 30 days for abuse monitoring before deleting them, with zero data retention available on eligible endpoints for customers who need it. That is a reasonable, short, documented window. But it is not zero by default, and "they don't train on it" was never a claim that the data vanishes the instant the response comes back.
The bigger copy is the one you make yourself. Your automation platform's execution history saves the full input and output of every step by default, which means the customer data you sent to the model is sitting in your own logs long after the vendor deleted its copy. We wrote about this in "Automation logs hold more PII than your database", and it is the retention surface most teams never check. The model forgetting your data does nothing about the verbatim copy in your run history, on a retention clock you set by accident.
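Setting that retention clock on purpose comes down to a cutoff date applied to your run history. The sketch below shows the idea on a plain list of records. The record shape (a dict with a timezone-aware `finished_at`) and the `prune_executions` helper are assumptions made for illustration and do not correspond to any platform's API. If your platform offers its own retention setting, that is likely the better place to enforce this.

```python
from datetime import datetime, timedelta, timezone


def prune_executions(executions, keep_days, now=None):
    """Return only the execution records newer than the retention cutoff.

    Each record is assumed to be a dict with a timezone-aware
    'finished_at' datetime. This shape is hypothetical.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=keep_days)
    return [run for run in executions if run["finished_at"] >= cutoff]


if __name__ == "__main__":
    now = datetime(2026, 6, 30, tzinfo=timezone.utc)
    runs = [
        {"id": 1, "finished_at": now - timedelta(days=90), "input": "customer email"},
        {"id": 2, "finished_at": now - timedelta(days=5), "input": "support ticket"},
    ]
    kept = prune_executions(runs, keep_days=30, now=now)
    print([run["id"] for run in kept])  # [2]
```

Passing `now` explicitly keeps the function testable. In a scheduled job you would leave it out and let it default to the current UTC time.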
So the full picture has three layers, not one. Does the vendor train on it? That depends on the tier, and the paid API says no. How long does the vendor keep it? On the API, a short abuse-monitoring window. How long do you keep it? Often far longer, in your own logs. Answering only the first question and stopping is the common mistake.
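The three layers fit in one small record per data flow, which makes it harder to stop after the first question. This is a sketch only. The `DataExposure` class and its field names are hypothetical, and the numbers in the example are placeholders apart from the roughly 30-day API window mentioned above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataExposure:
    vendor_trains: bool                    # layer 1: does the vendor train on it?
    vendor_retention_days: int             # layer 2: how long the vendor keeps it
    own_log_retention_days: Optional[int]  # layer 3: how long you keep it (None = never deleted)

    def longest_copy(self) -> str:
        """Name which copy of the data lives longest."""
        if self.own_log_retention_days is None:
            return "own logs (no expiry set)"
        if self.own_log_retention_days > self.vendor_retention_days:
            return "own logs"
        return "vendor"


if __name__ == "__main__":
    # Paid API flow: no training, about 30 days at the vendor,
    # and run history that nobody ever configured to expire.
    flow = DataExposure(vendor_trains=False, vendor_retention_days=30,
                        own_log_retention_days=None)
    print(flow.longest_copy())  # own logs (no expiry set)
```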
What to do next
Sort your AI usage by door, not by vendor. Anything that touches customer or proprietary data goes through the paid API or an enterprise tier, where the default is no training and you can enforce a retention policy. Anything on a free consumer app is restricted to public information, in writing, so a team member cannot accidentally feed a customer list into a five-year training window.
Three concrete moves this week. First, inventory the AI accounts in use, including the personal ones, and identify which are consumer surfaces. Second, on any consumer account that must exist, turn off the training setting (switch off "Improve the model for everyone" in ChatGPT, opt out of training in Claude) so the default does not work against you. Third, audit your own automation logs, because the longest-lived copy of customer data is usually the one you are keeping, not the one the vendor is.
If you want help drawing that line cleanly (moving real customer data onto the API door, scoping retention, and keeping the free apps away from anything sensitive), that is the kind of work we do in a data intelligence platform build. Tell us what your team is pasting where and we will map which doors are open.
SOURCES & CITATIONS
- Updates to Consumer Terms and Privacy Policy. Anthropic. https://www.anthropic.com/news/updates-to-our-consumer-terms
- How your data is used to improve model performance. OpenAI. https://help.openai.com/en/articles/5722486-how-your-data-is-used-to-improve-model-performance
- Gemini API Additional Terms of Service. Google. https://ai.google.dev/gemini-api/terms
- Enterprise privacy at OpenAI. OpenAI. https://openai.com/enterprise-privacy/
About Alexey Yushkin
Alexey is the founder of GENERAL INFORMATICS LLC. He designs and ships AI and automation systems for businesses and operators across the US.