Do AI models remember previous conversations?

No. A language model is stateless: each API call sees only the text you send in that request and nothing from any earlier call. Any sense of memory comes from infrastructure you build around it, either resending the conversation each turn, storing it server-side, or writing facts to a store the agent can query later. OpenAI's Responses API, for example, is stateless by default and keeps stored responses for 30 days unless you turn that off.

Does my AI agent need a vector database for memory?

Usually only for one slice of it. Structured facts like a customer's plan tier, last order, or open ticket belong in a database row keyed by customer ID and retrieved with an exact lookup. A vector store is justified only for unstructured recall, past interactions in the customer's own words with no clean field to key on, and even then you should scope the search to that one customer.

What is the difference between agent memory and RAG?

RAG retrieves business knowledge such as your docs, policies, and FAQs to help answer a question. Memory is about retaining context across turns or across runs: what was said earlier in this session, or what happened last time with this customer. They overlap in mechanism because both can use a vector store, but they solve different problems, and conflating them is a common reason agents pull the wrong thing back.

What does Anthropic's memory tool do?

Launched September 29, 2025, it lets Claude create, read, update, and delete files in a memory directory you host, so information persists across conversations. It is paired with context editing that automatically drops stale tool results as the agent nears its token limit. Anthropic reported a 39 percent gain on an internal agentic-search evaluation and an 84 percent token reduction over a 100-turn task. It is a working scratchpad and context manager, not a replacement for your system of record.

Where should customer data for an AI agent live?

In your existing system of record, your CRM or application database, with the agent given a tool that looks records up by key. That keeps retrieval exact and auditable and keeps one source of truth. Copying customer data into a separate memory store creates a second, fuzzier copy that drifts out of sync and complicates deletion requests.

Does Your AI Agent Need Memory or Just a Database?

Before you give an AI agent a memory, figure out which of three different problems you are actually solving, because two of them are not memory at all. What teams lump together as "agent memory" splits cleanly into conversation state (what was said earlier in this session), business knowledge (facts about your company the agent should know), and cross-run recall (what happened last time with this customer). The largest of the three is usually plain structured data, and the right place for it is a database row keyed by customer ID, retrieved with an exact lookup, not a semantic memory store. Reach for a vector-backed memory by default and you build a system that fuzzily forgets the exact facts it should have looked up.

Almost every "our agent needs memory" conversation we get is really one of those three things wearing the same coat. Sorting them apart is the difference between a build that returns the right answer every time and one that occasionally blends two customers together.

Why an AI model forgets between calls

A language model has no memory of its own. Each call to it is independent: it processes the tokens you send in that one request and nothing carries over from the last call, not from yesterday and not from the message a second ago. There is no built-in spool of history. Statefulness is something you bolt on around the model and feed back in.

OpenAI's Responses API is explicit about this. It is stateless by default, so on each turn you either resend the conversation yourself or pass a previous_response_id and let the platform stitch the prior turns back in. Stored responses are kept for 30 days unless you set store to false. The point is that "memory" is never something the model possesses. It is something you assemble and hand it. So the only real question is what to assemble and from where, and that is where the three buckets matter.

The three things operators call memory

Pull "memory" apart and it is always one of these three, each with a different cheapest answer.

Conversation state. Within a single chat or a single task run, the agent needs to know what was already said. This is the easiest, and it is not really memory, it is state. You keep the transcript and pass it back on the next turn, or you let the platform hold it for you with a thread object or a response ID. It is exact, it is cheap, and it expires when the session ends. Nobody needs a vector database for this.

Business knowledge. Facts about your company the agent should always have on hand: your products, your prices, your return policy, the answers to your top fifty support questions. This is knowledge, not memory, and the full decision for it lives in do you need a vector database. The short version: if the corpus fits in a long-context window and changes rarely, a cached prompt beats RAG, and a vector store earns its place only past real size, freshness, and access-control thresholds.

Cross-run recall. What happened last time, across sessions and across days: this customer's plan tier, their last three orders, the ticket they opened in March. This is the bucket people mean when they say the agent "needs memory," and it is where the expensive mistake hides. Most of what lives here is structured, and structured data has a home that is not a memory product.

Where each kind of memory actually belongs

Here is how the things an agent might "remember" sort by what they really are and where they should live.

What you want it to remember	What it really is	Right store	Why
What the user said earlier in this same chat	Conversation state	Passed-back transcript, or the platform's thread / response ID	Exact, and it expires with the session
Your products, prices, policies, FAQs	Business knowledge	A cached prompt if it fits the window, a vector store only if it does not	Knowledge, not memory; run the vector-database test first
This customer's email, plan, last order, open ticket	Structured cross-run facts	A row in your CRM or app database, keyed by customer ID	You can look it up exactly, with no retrieval guesswork
What this customer asked three months ago, in their words	Unstructured cross-run recall	A semantic / vector memory store, scoped to that customer	The one case that genuinely needs semantic memory
What the agent worked out mid-task and needs ten steps later	Working scratchpad	The agent's context window or a memory file the run writes to	Lives and dies with the run

Four of those five rows are not jobs for a vector memory store. That is the whole point.

The mistake: a vector store for facts you could key

The default move in 2026 is to pipe everything the agent should remember into a vector memory product and let semantic search pull it back. That works for one slice and quietly breaks the rest.

A customer's email address, plan tier, and last order date are structured facts with a key, the customer ID. The right store is the row you already have in your CRM or your orders table, and the right retrieval is a lookup: give me the record for this customer. It returns the same exact value every time, and you can audit it.

Push those same facts into a vector store and retrieval becomes approximate. Semantic search returns the chunks nearest to the query, not the one true record, so the agent can surface a stale order, blend two customers whose notes read alike, or miss the field entirely when the phrasing does not match. You have taken data that supported an exact lookup and turned its recall into a probability. For a plan tier or a balance, "probably right" is the wrong guarantee.

The rule we use: anything you can key, key it. Store it in a real table, give the agent a tool that fetches it by ID, and reserve semantic memory for the genuinely unstructured slice, the past conversation in the customer's own words that has no clean field to live in. Even then, scope the memory to the customer so the search runs inside their history, not across everyone's. This is the same instinct behind the agent-versus-workflow call: match the tool to the shape of the work, and do not let a fashionable default pick for you.

What the new memory tools actually solve

In late 2025 the platforms shipped real memory features, and it is worth being precise about which bucket each one fills, because the marketing blurs them.

Anthropic released a memory tool and context editing on September 29, 2025. The memory tool lets Claude create, read, update, and delete files in a directory you host, so information persists across conversations, and context editing automatically removes stale tool calls and results as the agent nears its token limit. In Anthropic's own evaluation the two together lifted agentic-search performance by 39 percent, and context editing alone cut token use by 84 percent over a 100-turn web-search task. OpenAI's Responses and Conversations APIs give you server-side conversation state and a durable conversation object you can carry across sessions and devices.

These are real and genuinely useful. But notice what they are for. The memory tool is a scratchpad and a context manager for a long-running agent, the working-memory slice: it keeps a model from drowning in its own context over a hundred turns. The Conversations API is bucket one, conversation state, handled for you. Neither is a substitute for your system of record. They make an agent better at holding a long task together. They do not, and should not, become the place your customer data lives. Your CRM stays the source of truth, and the agent reaches it through a tool that queries it by key.

How to decide what to store where

For each thing you want your agent to remember, ask one question: does it have a key. If it has a key, a customer ID, an order number, a ticket ID, it belongs in a table, and the agent gets it with a lookup tool, exact and auditable. If it is the live back-and-forth of the current session, it is state, and the platform or a passed-back transcript holds it. If it is company knowledge, run the vector-database test before you build any index. Only the unstructured residue, the past interactions with no clean field, justifies a semantic memory store, and even that should be scoped per customer.

Most of what we are asked to give an agent "memory" for turns out to be a query against data the business already has, wired up as a tool the model can call. We build that lookup layer into the custom software and AI assistant work we ship, and the customer record behind it is usually the same kind of lead and account data that drives leads.geninfos.com. If you are scoping an agent and trying to decide what it should remember and where, tell us what it needs to recall and we will sort it into the three buckets and point each one at the cheapest store that returns the right answer.

Does Your AI Agent Need Memory or Just a Database?

Why an AI model forgets between calls

The three things operators call memory

Where each kind of memory actually belongs

The mistake: a vector store for facts you could key

What the new memory tools actually solve

How to decide what to store where

Frequently Asked Questions

SOURCES & CITATIONS

About Alexey Yushkin

Related reading

AI Agent vs Workflow: When You Need an Agent

Do you need a vector database for AI on your docs?

When should an automation use AI instead of a rule?

Want this kind of system in your business?