Agent Memory: From Conversation History to Persistent Knowledge

This article explores the concept of agent memory in AI, detailing various memory types including conversational, semantic, episodic, procedural, entity, working, and summary memory. It discusses the challenges of building effective memory systems and how Oracle's AI Agent Memory Package (OAMP) leverages an AI database to provide a unified memory solution.

Source: O'Reilly AI & ML Radar. Author: Angie Jones

The following article originally appeared on Angie Jones’s LinkedIn page and is being republished here with the author’s permission.

I’m fascinated by the concept of agent memory. LLMs are stateless by design, meaning they have no memory or awareness of past interactions. Each prompt you send to an LLM is treated as a completely isolated event.

When you have a continuous chat with an AI agent, it feels like the AI remembers previous messages. However, the interface itself is faking it. Behind the scenes, your agent takes the entire conversation history and resends all of it to the LLM as one giant, combined prompt.

Companies, researchers, and even indie devs are all trying to crack agent memory. Because once an agent can remember, the entire interaction changes. It can build on what it learned, adapt to the user, resume work after a restart, and develop a sense of continuity.

Recently, I spent time with Richmond Alake, who has been in the trenches working on agent memory at Oracle.

We talked about the different kinds of memory, why memory is harder than it sounds, and what it takes to build a memory system that is actually useful in production.

That conversation made something very clear to me. When people say, “agent memory,” they often mean very different things.

So let’s unpack the various types of memory.

Conversational memory

Conversational memory is the one most people think of first. It stores the messages exchanged between the user and the assistant.

This makes sense. If I ask, “What did I say was the ultimate goal of this task?” the agent needs access to the conversation in order to answer. Without that history, every turn starts from zero.

But this is also where many memory systems go wrong.

The most common first attempt is to keep appending prior messages to the prompt. For example:

User: I’m building a customer support agent.

Assistant: Great, what should it do?

User: It should look up past tickets and draft replies.

Assistant: Got it.

User: Also, I prefer Python and FastAPI.

Then on the next call, we send all of that back to the model along with the new question.

This works for a short conversation, but the agent only “remembers” because we keep reminding it. This is not really memory engineering.
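To make the "keep reminding it" pattern concrete, here is a minimal sketch. The `NaiveChat` class and the `fake_llm` stand-in are hypothetical names used only for illustration; a real agent would call an actual model API in place of `fake_llm`.

```python
from typing import Callable

Message = dict[str, str]


def fake_llm(messages: list[Message]) -> str:
    """Stand-in for a real model call; it only reports how much context it received."""
    return f"(reply based on {len(messages)} messages)"


class NaiveChat:
    """Keeps the full transcript and resends all of it on every turn."""

    def __init__(self, llm: Callable[[list[Message]], str]) -> None:
        self.llm = llm
        self.history: list[Message] = []

    def send(self, user_text: str) -> str:
        self.history.append({"role": "user", "content": user_text})
        # The model is stateless, so the whole history goes out again each time.
        reply = self.llm(list(self.history))
        self.history.append({"role": "assistant", "content": reply})
        return reply


chat = NaiveChat(fake_llm)
chat.send("I'm building a customer support agent.")
chat.send("It should look up past tickets and draft replies.")
last_reply = chat.send("Also, I prefer Python and FastAPI.")
print(last_reply)  # (reply based on 5 messages)
```

The payload grows with every turn, and nothing in this loop decides which messages still matter.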

Eventually, the conversation gets too long and the model receives a giant blob of context where some details are important, some are stale, and some are completely irrelevant. The agent may technically have the information, but that doesn’t mean it can use it well.

So yes, conversation history is a valid and important type of memory. But it shouldn’t be the whole memory strategy. Real agent memory requires deciding what should be stored, where it should be stored, how it should be retrieved, and when it should be summarized, forgotten, or compressed.

Semantic memory

Semantic memory stores durable facts.

These are things that should outlive the exact conversation where they were learned:

The user prefers Python over TypeScript for backend work.

The customer support agent needs access to past tickets.

The production system handles 50,000 queries per day.

This is different from conversational memory because the exact wording and sequence are less important. What matters is the meaning.

If the agent needs to recall what stack the user is using, it should retrieve the memory even if the user never says those exact words again.

Vector search is useful for this. The memory can be embedded and retrieved by semantic similarity.

The benefit is that the agent doesn’t need to replay the full conversation. It can retrieve the few durable facts that are relevant to the current request.
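The retrieval step can be sketched roughly as follows. This is an illustration under loud assumptions: `embed` here is a toy word-count vector, not a real embedding model, so it only matches on shared words. A production system would swap in a real embedder and a vector index. `SemanticMemory` is a hypothetical name.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticMemory:
    """Stores facts with their vectors and returns the closest matches to a query."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def add(self, fact: str) -> None:
        self._items.append((fact, embed(fact)))

    def search(self, query: str, k: int = 1) -> list[str]:
        query_vec = embed(query)
        ranked = sorted(
            self._items,
            key=lambda item: cosine(query_vec, item[1]),
            reverse=True,
        )
        return [fact for fact, _ in ranked[:k]]


memory = SemanticMemory()
memory.add("The user prefers Python over TypeScript for backend work.")
memory.add("The customer support agent needs access to past tickets.")
memory.add("The production system handles 50,000 queries per day.")

top = memory.search("Which language does the user prefer for backend work?", k=1)
print(top[0])
```

Only the top-ranked fact is returned to the prompt, which is the point: the agent retrieves a few relevant facts instead of replaying the transcript.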

Episodic memory

Episodic memory stores events.

This is the “what happened” layer of memory:

The agent searched the web for recent API gateway patterns.

The agent generated a draft response for ticket #4821.

The workflow failed at the compliance review step.

Episodic memory is especially useful for debugging, auditing, and long-running workflows.

For example, if an agent makes a decision, I may want to know what happened right before that decision (e.g., What tools did it call? What data did it retrieve?).

This type of memory often benefits from structured storage.

For example:

Find all failed tool calls from the mortgage approval workflow in the last 24 hours.

That is a database query problem, not just a vector search problem.
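As a rough sketch of that kind of query, the example below uses Python's built-in `sqlite3` module as a stand-in for whatever database actually holds the event log. The `tool_calls` table, its columns, and the sample rows are hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE tool_calls (
        id INTEGER PRIMARY KEY,
        workflow TEXT NOT NULL,
        tool TEXT NOT NULL,
        status TEXT NOT NULL,
        called_at TEXT NOT NULL  -- ISO 8601 UTC timestamp
    )
    """
)

now = datetime.now(timezone.utc)
events = [
    ("mortgage_approval", "credit_check", "failed", now - timedelta(hours=2)),
    ("mortgage_approval", "compliance_review", "ok", now - timedelta(hours=1)),
    ("mortgage_approval", "credit_check", "failed", now - timedelta(days=3)),
    ("support_reply", "ticket_lookup", "failed", now - timedelta(hours=5)),
]
conn.executemany(
    "INSERT INTO tool_calls (workflow, tool, status, called_at) VALUES (?, ?, ?, ?)",
    [(w, t, s, ts.isoformat()) for w, t, s, ts in events],
)

# Timestamps share one ISO format, so string comparison matches time order.
cutoff = (now - timedelta(hours=24)).isoformat()
recent_failures = conn.execute(
    """
    SELECT tool, called_at FROM tool_calls
    WHERE workflow = ? AND status = 'failed' AND called_at >= ?
    ORDER BY called_at
    """,
    ("mortgage_approval", cutoff),
).fetchall()

for tool, called_at in recent_failures:
    print(tool, called_at)
```

Exact filters on workflow, status, and time are something a similarity search cannot express reliably, which is why structured storage fits this memory type.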

Procedural memory

Procedural memory is about how to do things.

For example:

When investigating a failed deployment, check logs first, then recent config changes, then dependency updates.

When drafting a customer support reply, include the ticket summary, likely cause, recommended fix, and next step.

When creating a database-aware agent, scan table comments, column comments, constraints, and recent workload patterns.

This is the kind of memory that helps an agent improve its process. That’s powerful because agents are often asked to operate in messy real-world environments. With procedural memory, an agent can reuse proven approaches.

The value extends beyond just knowing things to actually knowing how to proceed.
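One simple way to picture procedural memory is a lookup from task type to an ordered list of steps that gets rendered into the prompt. The sketch below assumes that shape; the `PROCEDURES` mapping and `procedure_prompt` helper are hypothetical, and the steps are taken from the examples above.

```python
PROCEDURES: dict[str, list[str]] = {
    "failed_deployment": [
        "Check logs",
        "Review recent config changes",
        "Review dependency updates",
    ],
    "support_reply": [
        "Include the ticket summary",
        "State the likely cause",
        "Give the recommended fix",
        "State the next step",
    ],
}


def procedure_prompt(task_type: str) -> str:
    """Render a stored procedure as numbered steps, or an empty string if none is known."""
    steps = PROCEDURES.get(task_type, [])
    return "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))


print(procedure_prompt("failed_deployment"))
```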

Entity memory

Entity memory stores facts about specific people, accounts, projects, systems, tickets, or objects.

For example:

Angie prefers practical examples over abstract explanations.

Customer Acme Corp has strict data residency requirements.

Ticket #4821 is related to a billing reconciliation issue.

Entity memory matters because many agent tasks are scoped around a particular thing.

If I ask, “What do we know about Acme Corp?” I don’t want every memory in the system. I want memories attached to that customer.

This is also where memory safety becomes important.

Agents should not accidentally mix memories between users, customers, or projects. A memory system needs strong scoping so one user’s context does not leak into another user’s response.
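A minimal sketch of that scoping idea, assuming every memory is keyed by both an owner and an entity. The class name, the tenant IDs, and the second sample fact are hypothetical and exist only to show that lookups do not cross scopes.

```python
from collections import defaultdict


class ScopedEntityMemory:
    """Stores facts under a (tenant, entity) key so lookups never cross tenants."""

    def __init__(self) -> None:
        self._facts: dict[tuple[str, str], list[str]] = defaultdict(list)

    def remember(self, tenant_id: str, entity: str, fact: str) -> None:
        self._facts[(tenant_id, entity)].append(fact)

    def recall(self, tenant_id: str, entity: str) -> list[str]:
        # .get avoids creating empty entries for unknown keys.
        return list(self._facts.get((tenant_id, entity), []))


entity_memory = ScopedEntityMemory()
entity_memory.remember("team-a", "Acme Corp", "Has strict data residency requirements.")
entity_memory.remember("team-b", "Acme Corp", "Hypothetical note owned by another team.")

team_a_view = entity_memory.recall("team-a", "Acme Corp")
print(team_a_view)
```

Because the owner is part of the key, there is no code path that returns another tenant's facts by accident.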

Working memory

Working memory is the short-term scratchpad for the current task.

This is where the agent keeps temporary information while reasoning through a problem.

Working memory is usually not meant to last forever. It’s useful during the task, but it may not deserve to become durable memory.

If an agent stores every temporary thought as long-term memory, the memory store gets noisy very quickly. The agent may later retrieve half-baked assumptions as if they were facts, which is dangerous.

Not everything the agent observes or thinks should be remembered permanently.
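One possible way to enforce that boundary is to make promotion explicit: notes live in a per-task scratchpad, and only those marked as confirmed are handed to long-term storage when the task ends. The `WorkingMemory` class below is a hypothetical sketch of that idea, not a description of any particular framework.

```python
from dataclasses import dataclass, field


@dataclass
class WorkingMemory:
    """Per-task scratchpad; only explicitly confirmed notes are promoted."""

    notes: list[tuple[str, bool]] = field(default_factory=list)

    def jot(self, note: str, confirmed: bool = False) -> None:
        self.notes.append((note, confirmed))

    def finish(self) -> list[str]:
        """Return confirmed notes for long-term storage and clear the scratchpad."""
        durable = [note for note, confirmed in self.notes if confirmed]
        self.notes.clear()
        return durable


scratch = WorkingMemory()
scratch.jot("Maybe the failure is a timeout?")  # unverified guess
scratch.jot("Workflow failed at the compliance review step.", confirmed=True)
promoted = scratch.finish()
print(promoted)
```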

Summary memory

Summary memory is a type many agent users are already familiar with. It addresses the problem of limited context windows.

Even with large context models, you can’t keep appending forever. At some point, you need to compress.

Summary memory stores a compact version of a longer thread or context window. The original details can still live in the thread, but the prompt gets a smaller representation.

For example, instead of sending 80 turns of conversation, the agent might send:

The user is building a SaaS customer support agent. They prefer Python and FastAPI, deploy on OCI, and want the agent to retrieve past tickets before drafting replies. They are currently evaluating memory strategies for production usage.
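The compression step might look something like the sketch below, which keeps the most recent turns verbatim and folds everything older into one summary line. Both function names are hypothetical, and `placeholder_summarizer` stands in for an LLM call that would produce a real summary like the one above.

```python
from typing import Callable


def compact_history(
    turns: list[str],
    summarize: Callable[[list[str]], str],
    keep_recent: int = 4,
) -> list[str]:
    """Replace all but the most recent turns with a single summary line."""
    if len(turns) <= keep_recent:
        return list(turns)
    split = len(turns) - keep_recent
    older, recent = turns[:split], turns[split:]
    return [f"Summary of earlier conversation: {summarize(older)}"] + recent


def placeholder_summarizer(turns: list[str]) -> str:
    # Stand-in for an LLM call; it only reports how many turns were folded away.
    return f"{len(turns)} earlier turns condensed."


conversation = [f"turn {i}" for i in range(1, 81)]
prompt_turns = compact_history(conversation, placeholder_summarizer, keep_recent=4)
print(len(prompt_turns))  # 5
```

The full 80 turns can still be stored elsewhere; only the prompt receives the compact version.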

Why memory is hard for agents

At first, memory sounds straightforward: store things, retrieve them later.

But the hard part is judgment, not storage.

What should be remembered? If the user says, “I usually prefer Python,” that’s probably worth remembering. If they say, “Let’s try Python for this one experiment,” maybe not. The agent needs to distinguish durable details from temporary context.

When should memory be updated? People change their minds, and systems and requirements change. If a user used to prefer FastAPI but now works mostly in Java, should the old memory be deleted, overwritten, or kept with a timestamp? A memory system needs a correction strategy.

How much memory should be retrieved? Retrieving too little means the agent misses important context. Retrieving too much means the prompt becomes noisy. This balance matters because more context isn’t always better.

How do we prevent memory leaks? If memories are shared across users, agents, or tenants, scoping is critical. The agent should only retrieve memories it’s allowed to use. This is especially important in enterprise systems where agents may operate across many customers, teams, or workflows.

How do we know whether memory helped? Memory should improve the agent’s behavior. It should reduce repeated questions, improve continuity, lower token usage, and help the agent produce more relevant responses. If memory just adds complexity without improving outcomes, it isn’t doing its job.
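For the update question, one option among several is to keep old values with timestamps and mark them as superseded instead of deleting them. The sketch below assumes that strategy; `MemoryRecord` and `VersionedMemory` are hypothetical names, and the FastAPI-to-Java change mirrors the example above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class MemoryRecord:
    key: str
    value: str
    recorded_at: datetime
    superseded_at: Optional[datetime] = None


class VersionedMemory:
    """Keeps old values for audit but only returns the current one."""

    def __init__(self) -> None:
        self._records: list[MemoryRecord] = []

    def set(self, key: str, value: str) -> None:
        now = datetime.now(timezone.utc)
        # Mark any live record for this key as superseded before adding the new one.
        for record in self._records:
            if record.key == key and record.superseded_at is None:
                record.superseded_at = now
        self._records.append(MemoryRecord(key, value, now))

    def current(self, key: str) -> Optional[str]:
        for record in self._records:
            if record.key == key and record.superseded_at is None:
                return record.value
        return None

    def history(self, key: str) -> list[str]:
        return [r.value for r in self._records if r.key == key]


prefs = VersionedMemory()
prefs.set("preferred_backend", "FastAPI")
prefs.set("preferred_backend", "Java")
print(prefs.current("preferred_backend"))  # Java
```

Retrieval sees only the current value, while the superseded record remains available for auditing or for answering "what did the user prefer before?"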

How Oracle is approaching agent memory

Richmond was gracious enough to share how Oracle is tackling this with the Oracle AI Agent Memory Package (OAMP), built on top of Oracle AI Database 26ai.

Yes, an AI database! Think of it as a database that can store and query the kinds of data AI applications need, not just rows and columns. That includes embeddings and JSON documents along with text search and regular SQL. These live together in the database, so an agent does not have to bounce between separate systems just to gather context.

The idea is to make Oracle AI Database the memory core for agents. Instead of stitching together a vector database, a relational database, a document store, and custom thread management, OAMP provides agent-friendly memory primitives on top of a database that already supports multiple data access patterns.

At a high level, OAMP gives you:

Users and agents to scope memory ownership

Memories for durable facts and extracted knowledge

Threads for conversation history and continuity

Context cards for compact, prompt-ready memory retrieval

Summaries for long-running conversations

Vector search for semantic recall

Database-backed persistence so memory survives restarts

This matters because, again, agent memory is not only a vector search problem. Some memory needs semantic retrieval. Some needs ordered reads or exact SQL filtering. A database-backed memory system gives you room to support all of those patterns.

Here’s a small example of what that looks like in code:

from oracleagentmemory.core import OracleAgentMemory
from oracleagentmemory.core.llms import Llm

# `connection` is assumed to be an already-open database connection created elsewhere.
client = OracleAgentMemory(
    connection=connection,
    embedder="text-embedding-3-small",
    llm=Llm("gpt-5.5"),
    extract_memories=True,
    schema_policy="create_if_necessary",
)

# Register the user and the agent so memories have an owner.
client.add_user(
    "angie",
    "Developer exploring agent memory patterns."
)

client.add_agent(
    "memory-demo-agent",
    "Assistant that demonstrates Oracle AI Agent Memory."
)

# Store a durable fact scoped to this user and this agent.
client.add_memory(
    "Angie is fascinated by agent memory and prefers practical examples over abstract explanations.",
    user_id="angie",
    agent_id="memory-demo-agent",
)

There are a few important ideas packed into this snippet.

The OracleAgentMemory client is the bridge between the agent application and Oracle AI Database. The database connection tells OAMP where memory lives. The embedder tells it how to turn memory text into vectors for semantic retrieval. The LLM enables automatic memory extraction and summary generation. And schema_policy="create_if_necessary" lets OAMP manage the underlying memory schema instead of making every application reinvent it.

The user and agent registration may look like simple setup code, but it’s actually part of the memory model. Memories need ownership. In a real system, you don’t want one user’s preferences showing up in another user’s session, and you don’t want memories written by one agent casually mixed with another agent’s context. The user ID and agent ID give the memory layer a way to scope what gets stored and retrieved.

The add_memory() call stores a durable fact. This is a piece of information the agent may need later, even if the exact conversation has moved on.
