Does fine-tuning teach an AI model new facts about my business?

Mostly no. Fine-tuning teaches a model how to format and behave, not what is true. Research from Gekhman and co-authors (EMNLP 2024) found that models learn fine-tuning examples carrying genuinely new facts much more slowly than examples that match what they already know, and that as those new facts finally stick, the model's tendency to hallucinate rises. If the problem is that the model does not know your current prices or policies, use retrieval or a longer prompt, not fine-tuning.

Is OpenAI fine-tuning being shut down?

OpenAI is winding down self-serve fine-tuning on a phased schedule published on its deprecations page. As of May 7, 2026, only organizations that had already run fine-tuning can create new jobs; from July 2, 2026, that narrows to organizations with inference activity on a fine-tuned model in the prior 60 days; and on January 6, 2027, even active customers can no longer create new fine-tuning jobs. Inference on models you already fine-tuned keeps working until the underlying base model is deprecated. Google Vertex AI and Amazon Bedrock still offer fine-tuning for some models.

When should a small business actually fine-tune a model?

Rarely, and only after prompting has already proven the behavior works. The surviving case is a narrow, repetitive task at high volume, like classifying thousands of tickets a day, where you want to copy a behavior a large model performs well into a smaller, cheaper, faster model to cut cost and latency. If you run a few thousand jobs a month, that math almost never works out in favor of fine-tuning.

Fine-tuning versus RAG versus prompting: which is cheapest?

Prompting is the cheapest to start and the cheapest to change. Retrieval (RAG) or a long-context prompt is the right way to give a model knowledge it does not have, and it stays current because you update the source, not the model. Fine-tuning has the highest upfront cost (a curated dataset, a training run) and the highest ongoing cost (re-training when behavior drifts, and a custom model welded to a base model the vendor will eventually retire).

Can I fine-tune Claude or Gemini instead of GPT?

Yes, with limits. Google Vertex AI supports supervised fine-tuning for Gemini 2.5 models (Pro, Flash, and Flash-Lite as of mid-2026), and Amazon Bedrock has offered fine-tuning for Claude 3 Haiku since November 2024. Both add a training charge plus an hourly hosting fee for the tuned model on top of normal inference. The newest frontier models are generally not yet available for self-serve fine-tuning.

Do you need to fine-tune an AI model, or just prompt it?

Fine-tuning changes how a model writes, not what it knows. If your goal is for the AI to answer from your prices, policies, or product docs, fine-tuning is the wrong tool: it teaches form and behavior, not facts, and pushing new facts in through training tends to make a model hallucinate more. Most "we should train our own AI on our data" projects are knowledge problems, and a retrieval step or a longer prompt solves them better and cheaper. In 2026 the case shrank again from the supply side: OpenAI is winding down self-serve fine-tuning. What remains is a narrow tool for one job: copying a behavior you have already proven into a smaller, cheaper model at high volume.

This matters because "train it on our data" is the single most common way a small AI project starts in the wrong place. The phrase sounds like fine-tuning. It almost never is.

Fine-tuning teaches form, not facts

There are three ways to shape what an AI does, and they are not interchangeable. Prompting gives instructions and examples in the request itself. Retrieval, often called RAG, fetches the relevant facts at query time and puts them in the prompt. Fine-tuning continues training the model's own weights on a set of your examples so it absorbs a pattern of behavior.

The distinction that gets lost is what each one moves. Prompting and retrieval change the information in front of the model. Fine-tuning changes the model's instincts: the shape of its answers, its tone, its default format, how it handles a recurring kind of input. It is good at "always reply in this exact JSON structure" or "classify every message into one of these six buckets the way our senior rep would." It is bad at "know that our return window changed to 45 days last week."

This is not a style opinion. A 2024 study by Gekhman and co-authors, presented at EMNLP, tested what happens when you fine-tune a model on facts it did not already know. Two findings stand out. The model learns those new-fact examples much more slowly than examples that agree with what it already knew. And as the new facts finally take hold, the model's rate of hallucination rises in step. The authors' read is that models mostly acquire facts during pre-training, and fine-tuning teaches them to use what they have more efficiently. Translated for an operator: you cannot reliably teach a model your data by fine-tuning, and the harder you try, the more confidently it will make things up.

What you actually want, mapped to the right tool

Almost every fine-tuning request we get is really one of these wishes wearing the wrong label. Here is the version we keep on hand.

What you want the AI to do	The real example	The right tool
Answer in the same fixed structure every time	"Return name, email, and intent as JSON"	Structured outputs / schema-constrained prompt
Know your current prices, policies, or docs	"Answer from our handbook and quote the right number"	Retrieval or a long-context prompt, kept current
Sound like our brand and follow our tone	"Write replies the way our team writes them"	System prompt plus a few good examples
Follow a long, multi-rule procedure consistently	"Apply our 14-step intake SOP without skipping"	The SOP in the prompt, plus a checking step
Do one narrow task at very high volume, cheaply	"Classify 8,000 tickets a day at low cost"	Fine-tune a small model (the surviving case)
Choose the next step based on the last result	"Decide what to do depending on what it found"	That is an agent design question, not training

The first four rows cover the overwhelming majority of small-business AI work, and none of them needs fine-tuning. The format and tone wishes are solved by deciding per step what the model should and should not own and writing a tight prompt. The knowledge wishes are solved by retrieval. The sixth row is a design question rather than a training one. Only the fifth row is a genuine fine-tuning case, and it has conditions.
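To make the first row concrete, here is a minimal sketch of a schema-constrained prompt plus a checking step, using the name, email, and intent fields from the table. The function names and the prompt wording are our own illustration, not a vendor API, and the call to the model itself is deliberately left out.

```python
import json

# Fields taken from the table's first row; everything else here is illustrative.
REQUIRED_FIELDS = ("name", "email", "intent")


def build_prompt(message: str) -> str:
    """Assemble a prompt that pins the reply to one fixed JSON structure."""
    return (
        "Extract the sender's name, email, and intent from the message below.\n"
        "Reply with a single JSON object with exactly these keys: "
        + ", ".join(REQUIRED_FIELDS)
        + ".\n\nMessage:\n"
        + message
    )


def parse_reply(reply: str) -> dict:
    """Checking step: reject any reply that is not the agreed structure."""
    data = json.loads(reply)  # raises ValueError if the reply is not JSON
    if not isinstance(data, dict) or set(data) != set(REQUIRED_FIELDS):
        raise ValueError(f"unexpected structure: {data!r}")
    return data
```

A reply that fails `parse_reply` can be retried or routed to a person. In this sketch that check, not training, is what keeps the format consistent.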

Why "train it on our data" is almost always a knowledge problem

When a client says "train the AI on our data," picture what the data usually is: a handbook, a price list, a policy set, a folder of past support answers. That is reference material the model should read, not behavior it should absorb. The moment you treat reference material as training data, you walk into two failure modes we have watched break real builds.

The first is the stale snapshot. Fine-tuning bakes the data in at the moment you train. Your prices change, a policy gets revised, a product is renamed, and the model keeps answering from the version it was trained on. Nothing errors out to warn you. A retrieval step reads the live document, so when you update the source the answers update with it. A fine-tuned model has to be retrained to learn that the return window moved, and most teams never do.
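A toy sketch of why retrieval does not go stale: the prompt is rebuilt from the source documents on every question, so editing the source changes the next answer with no retraining. The keyword scoring, the document store, and the 30-day starting value are all assumptions made for illustration; a real build would use a proper search or embedding step.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word set, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: dict[str, str], top_k: int = 1) -> list[str]:
    """Rank documents by shared words with the question (toy scoring)."""
    question_words = tokens(question)
    ranked = sorted(
        documents.values(),
        key=lambda text: len(question_words & tokens(text)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question: str, documents: dict[str, str]) -> str:
    """Put the freshest matching text in front of the model at query time."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical live document store; in practice this would be read from disk or a CMS.
docs = {
    "returns": "Our return window is 30 days from delivery.",
    "shipping": "Standard shipping takes 5 business days.",
}
before = build_prompt("What is the return window?", docs)

# The policy changes: update the source, and the very next prompt reflects it.
docs["returns"] = "Our return window is 45 days from delivery."
after = build_prompt("What is the return window?", docs)
```

Here `before` carries the 30-day text and `after` carries the 45-day text. A fine-tuned model would still be answering from whatever it saw at training time.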

The second is the confident hallucination, which is the Gekhman finding showing up in production. You feed the model a few hundred question-and-answer pairs hoping it learns the answers. What it actually learns is the shape of a plausible answer in your domain. Ask something just outside the training set and it fills the blank with a response that looks exactly right and is wrong, delivered with the same confidence as a real one. That is worse than a model that says it does not know, because nobody catches it.

If the underlying need is "answer from our documents," the build decision is retrieval versus a long-context prompt, not fine-tuning at all. We wrote the whole decision out separately in "Do you need a vector database for AI on your docs?" Fine-tuning is not even on that ballot.

The 2026 shift: OpenAI is winding down self-serve fine-tuning

The market just made this argument for us. OpenAI, which for years was the default place a small team would fine-tune a model, is phasing out self-serve fine-tuning. Its deprecations page lays out the schedule. As of May 7, 2026, only organizations that had already run a fine-tuning job can create new ones. From July 2, 2026, that narrows again to organizations with inference activity on a fine-tuned model in the prior 60 days. On January 6, 2027, even active customers can no longer create new fine-tuning jobs. Inference on models you already fine-tuned keeps working, but only until the underlying base model is itself deprecated.

Read that last clause twice, because it names the failure mode operators never price in: the deprecation treadmill. A fine-tuned model is welded to one specific base model. When the vendor retires that base, your custom model goes with it, and you are back to retraining on whatever replaces it. You do not control that clock. The companies that fine-tuned on older GPT snapshots are living through exactly this now.

Fine-tuning has not vanished. It has moved to the cloud platforms and narrowed to specific models. Google Vertex AI offers supervised fine-tuning for Gemini 2.5 models (Pro, Flash, and Flash-Lite as of mid-2026, with the newest 3.x line not yet supported). Amazon Bedrock has offered fine-tuning for Claude 3 Haiku since November 2024. Both charge for the training run and then add an hourly hosting fee to keep your tuned model available, on top of normal per-token inference. The frontier models you would most want to fine-tune are generally the ones you cannot fine-tune yet.

The one case where fine-tuning still earns its keep

Here is the surviving case, stated precisely so you can check whether you are in it. You have a narrow, repetitive task. A large model already does it well when you prompt it. You run it at high enough volume that the per-call cost or the latency of the large model genuinely hurts. So you fine-tune a small, cheap model to copy the large model's behavior on that one task, and you swap it in to save money and time.

Notice the order. You prove the behavior with prompting first, on the best model you have, and only then consider distilling it down. Fine-tuning is the optimization at the end, not the starting move. If you have not already got the task working with a prompt, fine-tuning will not rescue it. It will just bake your current mistakes into the weights.

And notice the volume bar. Distillation pays off when you are running tens of thousands of calls a day and the gap between a frontier model and a small one is real money. At a few thousand jobs a month, a Haiku-class or Flash-class model with a good prompt is already cheap enough that the training cost, the dataset curation, the hosting fee, and the retraining-on-deprecation tax never earn back. The dataset alone is a project: you need hundreds of clean, correctly labeled examples, and you have to maintain that set as the task drifts. Most operators who think they want fine-tuning want a better prompt and a cheaper base model, which they can have today with no training at all.
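The volume bar is easiest to see as arithmetic. The sketch below is a rough break-even estimate, and every number in it is a made-up placeholder rather than a real vendor price: plug in your own per-call costs, your dataset and training spend, and the hosting fee. It also ignores the retraining-on-deprecation tax, so it flatters fine-tuning if anything.

```python
from typing import Optional


def months_to_break_even(
    calls_per_month: int,
    prompted_cost_per_call: float,
    tuned_cost_per_call: float,
    upfront_cost: float,
    hosting_per_month: float,
) -> Optional[float]:
    """Months until a fine-tuned small model repays its upfront cost.

    Returns None when the monthly saving never covers the hosting fee.
    """
    monthly_saving = (
        calls_per_month * (prompted_cost_per_call - tuned_cost_per_call)
        - hosting_per_month
    )
    if monthly_saving <= 0:
        return None
    return upfront_cost / monthly_saving


# All figures below are hypothetical placeholders, in dollars.
# A few thousand jobs a month against an already-cheap prompted model:
low_volume = months_to_break_even(3_000, 0.002, 0.0005, 2_000, 500)
# None: the hosting fee alone exceeds the saving

# 8,000 tickets a day (about 240,000 a month) against a pricier frontier model:
high_volume = months_to_break_even(240_000, 0.01, 0.001, 2_000, 500)
# about 1.2 months with these made-up numbers
```

The shape matters more than the figures: at low volume the fixed hosting fee swamps the per-call saving, and only at high volume does the saving become large enough to repay the dataset and training work.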

How to decide this week

Run three checks before anyone writes a training script. First, name what you are actually trying to change: the model's format and behavior, or the facts it has access to. If it is facts, stop: you want retrieval or a longer prompt. Second, if it is genuinely behavior, try to get it with a system prompt and a handful of examples, because most format and tone goals fall out of a good prompt in an afternoon. Third, only if a prompt provably cannot hold the behavior at the volume and cost you need, scope a fine-tune, and scope it as distilling a proven behavior into a smaller model, not as teaching the model your business.
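The three checks can be written down as a small decision function. This is one reading of the checks, not a formal rule: the `Tool` names and the three yes/no inputs are our own labels, and a real decision will involve more judgment than three booleans.

```python
from enum import Enum


class Tool(Enum):
    RETRIEVAL = "retrieval or a longer prompt"
    PROMPT = "system prompt plus a handful of examples"
    FINE_TUNE = "distill the proven behavior into a smaller model"


def choose_tool(
    changing_facts: bool,
    prompt_proves_behavior: bool,
    prompt_affordable_at_volume: bool,
) -> Tool:
    """Apply the three checks in order and return the first tool that fits."""
    # Check 1: if the goal is new facts, training is the wrong lever.
    if changing_facts:
        return Tool.RETRIEVAL
    # Check 2: behavior goals start with a prompt; an unproven behavior
    # stays a prompting problem, because fine-tuning will not rescue it.
    if not prompt_proves_behavior or prompt_affordable_at_volume:
        return Tool.PROMPT
    # Check 3: proven behavior that is too costly or slow at volume.
    return Tool.FINE_TUNE
```

For example, `choose_tool(False, True, False)` returns `Tool.FINE_TUNE`, the only combination of answers that does.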

The honest answer for most small and mid-sized operators in 2026 is that you will not fine-tune anything, and you will ship faster and maintain less for it. When we build custom AI software for a client, fine-tuning is the rare exception we reach for after prompting and retrieval are exhausted, not the headline. If you are staring at a "train it on our data" request and are not sure which of the three tools it really calls for, tell us what you are building and we will point you at the simplest version that works.

About Alexey Yushkin
