AI Agents & Automation

Short-Term vs. Long-Term Memory

Human memory is often described in terms of two broad systems: working memory, which holds a small amount of information actively in mind, and long-term memory, which stores a vast archive of knowledge, experiences, and skills that can be retrieved when needed. AI agents face a structurally similar problem, and the usual solution follows the same two-tier structure. Short-term memory for an agent is the context window: everything the model can see in the current call. Long-term memory is any persistent store outside the model (a database, a vector store, a file system) that survives between calls and can be selectively retrieved.

The Two-Tier Architecture

Short-term memory: the context window. Fast and immediately available to the model, but bounded and ephemeral: it disappears when the call ends. Long-term memory: an external persistent store. Capacity is effectively unbounded and the contents persist across calls, but they must be explicitly retrieved before the model can use them.

Short-Term Memory: The Context Window as Working Memory

Working memory, in the cognitive science sense, is classically estimated to hold roughly seven items at once, give or take two. The context window is a far more generous working memory (a 128,000-token window can hold roughly a book's worth of text), but it shares the essential characteristic of working memory: it is finite and it resets. When the call ends, whatever was in the context is gone unless the application code saves it somewhere.

For an agent, short-term memory includes everything explicitly placed in the current prompt: the system prompt, the conversation history replayed so far, any tool outputs retrieved this step, the current task description, and any intermediate reasoning the model has produced in chain-of-thought format. The key property is that the model can attend to all of this directly, with no lookup, no retrieval step, and no translation. It is simply there, immediately accessible.
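To make "everything explicitly placed in the current prompt" concrete, here is a minimal sketch of how application code might assemble a context under a token budget. The names `Turn`, `estimate_tokens`, and `build_context` are hypothetical, and the four-characters-per-token estimate is a rough assumption standing in for a real tokenizer.

```python
from dataclasses import dataclass


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: assumes about 4 characters per token."""
    return max(1, len(text) // 4)


@dataclass
class Turn:
    role: str      # "system", "user", "assistant", or "tool"
    content: str


def build_context(system_prompt: str, task: str,
                  history: list[Turn], budget: int) -> list[Turn]:
    """Always keep the system prompt and task; add the newest history that fits."""
    system_turn = Turn("system", system_prompt)
    task_turn = Turn("user", task)
    remaining = budget - estimate_tokens(system_prompt) - estimate_tokens(task)

    kept: list[Turn] = []
    # Walk the history newest-first so recent turns survive trimming.
    for turn in reversed(history):
        cost = estimate_tokens(turn.content)
        if cost > remaining:
            break
        kept.append(turn)
        remaining -= cost
    kept.reverse()  # restore chronological order

    return [system_turn, *kept, task_turn]


# Example: three 100-token turns, but room for only two of them.
history = [
    Turn("user", "a" * 400),
    Turn("assistant", "b" * 400),
    Turn("tool", "c" * 400),
]
context = build_context("You are a helpful agent.", "Summarize the findings.",
                        history, budget=220)
```

Anything trimmed here is simply invisible to the model on this call, which is why older material has to live somewhere else if it still matters.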

The costs of relying exclusively on short-term memory are predictable. First, cost: long prompts consume more tokens and cost more per call. Second, capacity: even a 200,000-token window fills up — a month of agent interactions will far exceed it. Third, noise: a context packed with mostly irrelevant history degrades model performance. The model has to attend over everything, including material that is no longer relevant to the current step.
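The capacity claim can be checked with back-of-the-envelope arithmetic. The workload figures below (40 interactions per day, 500 tokens each) are invented purely for illustration; only the 200,000-token window comes from the text.

```python
CONTEXT_WINDOW_TOKENS = 200_000  # window size used in the text

# Hypothetical workload, chosen only to illustrate the arithmetic.
INTERACTIONS_PER_DAY = 40
TOKENS_PER_INTERACTION = 500
TOKENS_PER_DAY = INTERACTIONS_PER_DAY * TOKENS_PER_INTERACTION


def accumulated_tokens(days: int) -> int:
    """Total history produced after the given number of days."""
    return days * TOKENS_PER_DAY


def first_day_over(window: int) -> int:
    """First whole day on which the accumulated history exceeds the window."""
    return window // TOKENS_PER_DAY + 1


month_total = accumulated_tokens(30)                 # 600,000 tokens
overflow_day = first_day_over(CONTEXT_WINDOW_TOKENS)  # day 11
```

Under these assumptions a month of history is three times the window, and the window overflows on day 11, so replaying everything stops being an option well before the month is out.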

Long-Term Memory: Persistent Stores Outside the Model

Long-term memory for an agent is any external system that persists information beyond a single model call. In practice, agents use several distinct types of external store depending on what they need to remember.

Relational databases store structured records: user profiles, task logs, completed steps, configuration. They support precise queries (find all tasks completed since yesterday, get the user's preferred language) but struggle with fuzzy, conceptual similarity searches.

Vector databases store dense numerical embeddings of text. They are optimized for semantic similarity search: given a query like 'papers about climate adaptation in agriculture,' retrieve the ten stored documents whose meaning is closest to that query. This is the engine behind retrieval-augmented generation, covered in depth in the next lesson.

Key-value stores like Redis give agents a fast scratchpad for small pieces of state: the current task's status, intermediate counters, session flags. They are simple and extremely fast but offer little beyond exact key lookup: the agent must know exactly which key to ask for.

File systems store raw text, code, or binary data. An agent that processes large documents often stores the raw content on disk and reads it back only when needed, rather than keeping it in the context.
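The difference between a precise relational query and a similarity search can be sketched with the standard library alone. This is an illustration under stated assumptions, not a production design: the `tasks` table, its rows, and the three-dimensional "embeddings" are all made up by hand, whereas a real system would obtain embeddings from an embedding model and store them in a vector database.

```python
import math
import sqlite3

# Relational store: structured records and exact, filterable queries.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks (id INTEGER PRIMARY KEY, name TEXT, status TEXT, completed_at TEXT)"
)
conn.executemany(
    "INSERT INTO tasks (name, status, completed_at) VALUES (?, ?, ?)",
    [
        ("fetch report", "done", "2024-05-02"),
        ("draft summary", "done", "2024-05-03"),
        ("send email", "pending", None),
    ],
)
done_since = conn.execute(
    "SELECT name FROM tasks WHERE status = 'done' AND completed_at >= ? "
    "ORDER BY completed_at",
    ("2024-05-03",),
).fetchall()


# Vector store (toy version): rank documents by cosine similarity to a query.
def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-written vectors standing in for real embeddings.
documents = {
    "drought-resistant crops": [0.9, 0.8, 0.1],
    "irrigation under climate change": [0.8, 0.9, 0.2],
    "history of jazz": [0.1, 0.0, 0.9],
}


def top_k(query_vec: list[float], k: int) -> list[str]:
    ranked = sorted(documents,
                    key=lambda name: cosine(query_vec, documents[name]),
                    reverse=True)
    return ranked[:k]


closest = top_k([0.85, 0.85, 0.1], k=2)
```

The SQL query returns exactly the rows that satisfy its conditions, while `top_k` returns whatever is nearest in meaning, even when no stored document shares a keyword with the query.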

Match each storage option to the scenario where it is the best choice for the agent.

Terms

Relational database
Vector database
Key-value store
File system
In-context (short-term only)

Definitions

Persisting a 200-page PDF the agent downloaded so it can be re-read in future calls without re-downloading
Finding the three most semantically relevant past conversations when a new user query arrives
Tracking which of 500 tasks the agent has completed, with timestamps and status codes
Holding the last three tool results that the model must reason about simultaneously right now
Storing a session flag indicating whether the user has already confirmed their billing address

The Write-Read Cycle

Long-term memory is only useful if the agent both writes to it and reads from it at the right moments. The write step happens after the model produces output worth keeping: a completed task, a useful piece of retrieved information, a decision the agent made that future steps must respect. The read step happens before a model call that needs that information: retrieve relevant records from the store and inject them into the prompt. This write-read cycle is explicit and engineered — it does not happen automatically. An agent that never writes to long-term memory cannot benefit from it. An agent that writes but never retrieves is just filling a store that does nothing. Getting both steps right, at the right moments in the agent loop, is the core skill of memory system design.
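The write-read cycle described above can be sketched as follows. `LongTermMemory` and `build_prompt` are hypothetical helpers written for this example, not part of any library; SQLite stands in for whatever persistent store the agent actually uses.

```python
import sqlite3


class LongTermMemory:
    """Tiny persistent fact store (illustrative helper, not a library API)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS facts (user_id TEXT, topic TEXT, fact TEXT)"
        )

    def write(self, user_id: str, topic: str, fact: str) -> None:
        """Write step: called after the agent learns something worth keeping."""
        self.conn.execute("INSERT INTO facts VALUES (?, ?, ?)", (user_id, topic, fact))
        self.conn.commit()

    def read(self, user_id: str, topic: str) -> list[str]:
        """Read step: called before a model call that needs these facts."""
        rows = self.conn.execute(
            "SELECT fact FROM facts WHERE user_id = ? AND topic = ?",
            (user_id, topic),
        ).fetchall()
        return [row[0] for row in rows]


def build_prompt(memory: LongTermMemory, user_id: str, topic: str,
                 user_message: str) -> str:
    """Retrieve stored facts and inject them ahead of the user's message."""
    parts: list[str] = []
    facts = memory.read(user_id, topic)
    if facts:
        parts.append("Known facts about the user:")
        parts.extend(f"- {fact}" for fact in facts)
    parts.append(f"User: {user_message}")
    return "\n".join(parts)


memory = LongTermMemory()
# Earlier call: the agent stores a decision that later steps must respect.
memory.write("u1", "preferences", "Prefers replies in Spanish.")
# Later call: the fact is retrieved and placed in the prompt.
prompt = build_prompt(memory, "u1", "preferences", "What is the weather?")
```

Neither step happens by itself: if `write` is never called, `read` returns nothing, and if `build_prompt` skips the `read`, the stored fact never reaches the model.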

Design Both Directions

When designing an agent's memory system, always specify both the write trigger (what event causes this information to be stored, and in what format) and the read trigger (at what point in the agent loop is this information retrieved and injected into the prompt). A store with no read path is a black hole.
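One way to enforce this habit is to make the design record itself refuse incomplete entries. The `MemorySpec` class below is a hypothetical sketch: it holds both triggers and the storage format as plain descriptions and rejects any spec that leaves one blank.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemorySpec:
    """Design record for one kind of remembered information (illustrative only)."""
    name: str
    write_trigger: str   # what event causes the information to be stored
    storage_format: str  # the shape in which it is stored
    read_trigger: str    # where in the agent loop it is retrieved and injected

    def __post_init__(self) -> None:
        for field_name in ("write_trigger", "storage_format", "read_trigger"):
            if not getattr(self, field_name).strip():
                raise ValueError(f"{self.name}: {field_name} must be specified")


language_preference = MemorySpec(
    name="language_preference",
    write_trigger="user states a preferred language",
    storage_format="row in a preferences table",
    read_trigger="before every response is drafted",
)
```

A spec with an empty `read_trigger` raises `ValueError` at construction time, which surfaces a store with no read path during design rather than in production.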

An agent is helping a user plan a week-long trip. On day 1 of the planning session, the user specifies they are vegetarian. On day 3, the user asks for restaurant recommendations. The agent suggests a steakhouse. What memory failure occurred?

Which statement best describes the primary advantage of long-term memory over short-term memory for agents?

Memory Tier Assignment

  1. You are designing memory for an AI coding assistant agent. The assistant helps developers across multiple sessions, each potentially days apart.
  2. Below is a list of information the agent might need. For each item, decide: (A) short-term memory only, (B) long-term memory only, or (C) both (written to long-term memory after discovery, retrieved into short-term memory when needed). Justify each decision.
     1. The user's preferred programming language (Python vs TypeScript)
     2. The current file the user is editing right now
     3. A bug fix the agent applied in a previous session that must not be reverted
     4. The last three error messages the model is currently debugging
     5. The full git history of the repository (thousands of commits)
     6. The fact that the user prefers concise explanations over verbose ones
     7. The current function the model is in the middle of rewriting
  3. After classifying all seven items, answer: what single memory failure would cause the most damage to this agent, and why?