Designing a Memory System

Every working agent has a memory system. The question is whether that system was designed or merely accumulated. An accidental memory system — one that grew from ad hoc decisions made during development — tends to be fragile, difficult to debug, and riddled with the failure modes studied in the previous lesson. A designed memory system starts from explicit requirements, makes deliberate tradeoffs, and produces a clear architecture that any engineer can understand, test, and improve.

Designing a memory system means answering five questions before writing any code: What does the agent need to remember? For how long? In what format? How is it written? How is it retrieved? These questions force the designer to think about memory as a first-class concern rather than an implementation detail bolted on after the core logic is working.

Memory as a First-Class Concern

Professional agent engineers design the memory system at the same time as the agent loop, not after. Memory requirements are discovered by asking: what information does the model need at each step to make the right decision? That list of requirements drives the memory architecture.

Step 1: Identify What Must Be Remembered

Walk through the agent's loop step by step. At each step, ask: what information does the model need here that it cannot derive from the current step's inputs alone? Every answer to that question is a memory requirement.

Group the requirements by lifespan:

  - Ephemeral: needed for only one step. It can live in the context and be dropped after the step.
  - Session-scoped: needed for the entire duration of the current session. Keep it alive in context or in a session store.
  - Persistent: must survive across sessions, potentially forever. It requires a durable store.

Also group them by type:

  - World state (facts about the world): ages out and needs refresh.
  - Committed actions (decisions made by the agent): must never be lost.
  - User preferences: accumulate and should survive indefinitely.
  - Error history: prevents repeated mistakes.
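
A minimal Python sketch of the step 1 inventory follows. It assumes a hypothetical `MemoryRequirement` record with `Lifespan` and `Kind` enums; none of these names come from a library. Each requirement records which loop step needs it, and grouping by lifespan produces the buckets that step 2 assigns storage to. The example requirements are illustrative only.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Lifespan(Enum):
    EPHEMERAL = "ephemeral"    # needed for one step; lives in context only
    SESSION = "session"        # needed for the whole current session
    PERSISTENT = "persistent"  # must survive across sessions


class Kind(Enum):
    WORLD_STATE = "world_state"            # ages out, needs refresh
    COMMITTED_ACTION = "committed_action"  # must never be lost
    USER_PREFERENCE = "user_preference"    # accumulates, kept indefinitely
    ERROR_HISTORY = "error_history"        # prevents repeated mistakes


@dataclass(frozen=True)
class MemoryRequirement:
    name: str       # what the model needs
    needed_at: str  # which loop step needs it
    lifespan: Lifespan
    kind: Kind


def group_by_lifespan(reqs):
    """Bucket requirement names by lifespan so each bucket can get a storage strategy."""
    groups = defaultdict(list)
    for req in reqs:
        groups[req.lifespan].append(req.name)
    return dict(groups)


# Hypothetical requirements for a customer-support agent.
requirements = [
    MemoryRequirement("latest tool output", "decide next action",
                      Lifespan.EPHEMERAL, Kind.WORLD_STATE),
    MemoryRequirement("refund already issued", "before any payment call",
                      Lifespan.PERSISTENT, Kind.COMMITTED_ACTION),
    MemoryRequirement("preferred language", "every response",
                      Lifespan.PERSISTENT, Kind.USER_PREFERENCE),
    MemoryRequirement("search API timed out twice", "before retrying search",
                      Lifespan.SESSION, Kind.ERROR_HISTORY),
]

print(group_by_lifespan(requirements))
```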

Step 2: Choose a Storage Strategy for Each Category

Each category of information identified in step 1 maps to a storage strategy. The mapping is driven by the information's lifespan, its query pattern, and its sensitivity to staleness.

  - Ephemeral, single-step information: context window. Simple, immediate, no overhead.
  - Session-scoped conversational history: context window with compression. Replay the full history in each prompt until it approaches the context limit, then apply a rolling window with pinned messages or LLM-generated summaries.
  - Structured persistent facts: relational database. Examples: user ID, preferences, task history, account data. Queried by exact key, updated transactionally, never lost.
  - Semantic knowledge that requires fuzzy search: vector database. Examples: document corpora, prior conversation summaries, knowledge base articles. Retrieved by similarity.
  - Fast mutable session flags: key-value store. Examples: current task status, loop counters, temporary flags that must be visible across steps within a session.
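
The mapping can be expressed as a small decision function. This is a sketch under simplifying assumptions: `choose_storage`, the `Storage` enum, and the query-pattern labels (`"replay"`, `"exact_key"`, `"similarity"`, `"flag"`) are hypothetical, and sensitivity to staleness is not modeled. A combination the table does not cover raises an error, so the gap surfaces during design rather than in production.

```python
from enum import Enum


class Storage(Enum):
    CONTEXT_WINDOW = "context window"
    COMPRESSED_CONTEXT = "context window with compression"
    RELATIONAL_DB = "relational database"
    VECTOR_DB = "vector database"
    KEY_VALUE = "key-value store"


def choose_storage(lifespan: str, query_pattern: str) -> Storage:
    """Map an information category to a storage strategy (hypothetical labels).

    lifespan: "ephemeral", "session", or "persistent"
    query_pattern: "replay" (history in prompt), "exact_key",
                   "similarity", or "flag" (small mutable value)
    """
    if lifespan == "ephemeral":
        return Storage.CONTEXT_WINDOW
    if lifespan == "session":
        if query_pattern == "flag":
            return Storage.KEY_VALUE  # visible across steps, cheap to mutate
        return Storage.COMPRESSED_CONTEXT  # conversational history
    if lifespan == "persistent":
        if query_pattern == "similarity":
            return Storage.VECTOR_DB
        if query_pattern == "exact_key":
            return Storage.RELATIONAL_DB
    # Anything else is an unmapped category: a design gap, not a default.
    raise ValueError(f"no strategy for lifespan={lifespan!r}, query={query_pattern!r}")


print(choose_storage("persistent", "exact_key"))
```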

Match each information category to the correct storage strategy for a long-running agent.

Terms

The user's account ID and billing tier
The three most relevant policy paragraphs for the current user question
Whether the current loop iteration has already sent a notification email
The last 8 messages of the current conversation
A running summary of the 200-message conversation from two hours ago

Definitions

Context window — replayed in the prompt for immediate model access
Compressed summary injected into context — compacted history with pinned key facts
Relational database — structured, persistent, queried by exact key
Vector database — retrieved by semantic similarity at query time
Key-value store — fast mutable flag scoped to the current session


Step 3: Define Write and Read Triggers

For every piece of information stored outside the context window, the designer must specify two triggers explicitly.

The write trigger answers: what event causes this information to be written to the store, and in what format? Common write triggers are: after a tool call completes (write the result), after the model makes a decision (write the decision and its rationale), after a session ends (write a session summary), and at a regular interval (checkpoint the current state). Without an explicit write trigger, information accumulates in context but never makes it to the store.

The read trigger answers: at what point in the agent loop is this information retrieved from the store and injected into the prompt? Common read triggers are: at the start of every loop iteration (inject persistent user preferences), before a specific category of action (inject error history before attempting a risky operation), and when a semantic similarity threshold is met (inject retrieved documents when the user question matches stored content). Without an explicit read trigger, information sits in the store unused.
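
One way to make both triggers explicit in code is to place each read and write at a named point in the loop. In this sketch, `InMemoryStore`, `agent_step`, `call_model`, and `call_tool` are hypothetical stand-ins (a real agent would use a database client and an LLM API); only the placement of the triggers matters here. Session-end summaries and interval checkpoints are omitted.

```python
class InMemoryStore:
    """Hypothetical stand-in for an external store; swap in a real database."""

    def __init__(self):
        self.records = []

    def write(self, kind, payload):
        self.records.append({"kind": kind, "payload": payload})

    def read(self, kind):
        return [r["payload"] for r in self.records if r["kind"] == kind]


def agent_step(store, call_model, call_tool):
    # READ TRIGGER: start of every iteration -> inject persistent preferences.
    prompt = ["Preferences: " + "; ".join(store.read("preference"))]
    # READ TRIGGER: before a risky action -> inject error history.
    past_errors = store.read("error")
    if past_errors:
        prompt.append("Avoid repeating: " + "; ".join(past_errors))

    action = call_model("\n".join(prompt))
    # WRITE TRIGGER: the model made a decision -> record the decision.
    store.write("decision", action)

    try:
        result = call_tool(action)
    except RuntimeError as exc:
        # WRITE TRIGGER: a tool call failed -> append to error history.
        store.write("error", f"{action}: {exc}")
        return None
    # WRITE TRIGGER: a tool call completed -> write the result.
    store.write("tool_result", result)
    return result
```

Because a failed tool call is written to the error store, the next iteration's prompt includes it before the agent tries again.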

Document Both Triggers

For every external store in your memory architecture, write two sentences in your design doc: one describing the write trigger, one describing the read trigger. If you cannot write either sentence clearly, the store is not fully designed yet.
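
The two-sentence rule is easy to check mechanically. A minimal sketch follows, assuming the design doc is kept as hypothetical `StoreSpec` records; `undesigned_stores` reports every store whose write or read trigger sentence is missing. The store names and trigger sentences are illustrative.

```python
from dataclasses import dataclass


@dataclass
class StoreSpec:
    name: str
    write_trigger: str = ""  # one sentence: what event writes, in what format
    read_trigger: str = ""   # one sentence: when it is injected into the prompt


def undesigned_stores(specs):
    """Return (store, missing trigger) pairs for every store lacking a trigger sentence."""
    gaps = []
    for spec in specs:
        if not spec.write_trigger.strip():
            gaps.append((spec.name, "write"))
        if not spec.read_trigger.strip():
            gaps.append((spec.name, "read"))
    return gaps


specs = [
    StoreSpec(
        "user_preferences (relational)",
        write_trigger="Written when the user states or corrects a preference.",
        read_trigger="Injected into the prompt at the start of every loop iteration.",
    ),
    StoreSpec(
        "policy_docs (vector)",
        write_trigger="Written by a nightly ingestion job.",
    ),
]

print(undesigned_stores(specs))
```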

Step 4: Plan for Failure

A memory system that works only when everything goes right is not production-ready. Design explicitly for failure cases:

  - What happens if the vector database returns zero results? The agent should not call the model with an empty context block. It should fall back to a broader query, use a default context, or ask the user to clarify.
  - What happens if the relational database is temporarily unavailable? The agent should either use a cached version of critical data (acceptable if slightly stale) or fail gracefully with an explanation rather than crashing or hallucinating.
  - What happens if an LLM-generated summary introduces an error? The agent should keep the original pinned messages for critical facts so that a corrupt summary cannot override them.
  - What happens when context approaches the limit unexpectedly? The agent should detect this condition proactively and trigger compression before hitting the ceiling, not after.
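
Three of these failure cases can be sketched directly, assuming hypothetical hooks: `search` and `broaden` stand in for a vector-store query and a query-broadening step, and `count_tokens` and `summarize` stand in for a tokenizer and an LLM summarization call. `retrieve_with_fallback` never silently returns an empty context, and `maybe_compress` compresses once usage crosses a threshold below the limit (80% here, an arbitrary choice) while keeping pinned messages verbatim. The cached fallback for an unavailable relational database follows the same pattern and is not shown.

```python
def retrieve_with_fallback(search, query, broaden, default_context):
    """Return (context_chunks, status) without silently passing an empty block.

    search(query) -> list[str] and broaden(query) -> str are hypothetical hooks
    onto whatever vector store and query-rewriting step the agent uses.
    """
    hits = search(query)
    if hits:
        return hits, "ok"
    hits = search(broaden(query))  # fallback 1: retry with a broader query
    if hits:
        return hits, "broadened"
    if default_context:
        return [default_context], "default"  # fallback 2: default context
    return [], "ask_user"  # fallback 3: caller asks the user to clarify


def maybe_compress(messages, count_tokens, limit, summarize, threshold=0.8, keep_last=8):
    """Compress older history once usage crosses threshold * limit.

    messages: list of dicts with a "text" key and an optional "pinned" flag.
    Pinned messages are kept verbatim, so a faulty summary cannot replace them.
    """
    total = sum(count_tokens(m["text"]) for m in messages)
    if total < threshold * limit:
        return messages  # still comfortably under the ceiling
    split = max(len(messages) - keep_last, 0)
    older, recent = messages[:split], messages[split:]
    pinned = [m for m in older if m.get("pinned")]
    unpinned = [m for m in older if not m.get("pinned")]
    if not unpinned:
        return messages  # nothing left to compress
    summary = {"text": summarize([m["text"] for m in unpinned]), "pinned": False}
    # Pinned facts first, then the summary, then the recent window verbatim.
    return pinned + [summary] + recent
```

The summary replaces only unpinned older messages, so a faulty summary can lose detail but cannot overwrite a pinned fact. The compressed history is not guaranteed to fit under the limit; a production version would re-check the token count after compressing.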

Putting It Together: A Memory Architecture Document

The output of the design process is a memory architecture document — a concise specification that lists every category of information the agent uses, its storage strategy, its write trigger, its read trigger, its lifespan, and its failure handling. This document serves as the single source of truth for the memory system. When a memory failure occurs in production, engineers consult this document to trace the root cause. In frameworks like LangChain and LangGraph, much of this architecture is expressed explicitly in code: memory objects are declared, their read and write methods are wired into the graph nodes, and their storage backends are configured. The framework enforces the structure; the design document captures the reasoning behind the choices.
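
The architecture document can also be kept as structured data next to the code, so it is versioned and reviewed with it. This is a sketch with a hypothetical `MemoryEntry` record and illustrative entries; it is plain Python, not a LangChain or LangGraph API. `entries_for` shows the root-cause use: given the store that failed, list the categories that depend on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryEntry:
    category: str
    storage: str
    write_trigger: str
    read_trigger: str
    lifespan: str
    on_failure: str


# Illustrative entries only; a real document lists every category the agent uses.
ARCHITECTURE = [
    MemoryEntry(
        category="recent conversation",
        storage="context window (rolling window + pinned messages)",
        write_trigger="Appended after every user or model turn.",
        read_trigger="Replayed in every prompt.",
        lifespan="session",
        on_failure="Compress proactively before reaching the context limit.",
    ),
    MemoryEntry(
        category="user preferences",
        storage="relational database",
        write_trigger="Upserted when the user states or changes a preference.",
        read_trigger="Injected at the start of every loop iteration.",
        lifespan="persistent",
        on_failure="Use the cached copy; otherwise explain that preferences are unavailable.",
    ),
    MemoryEntry(
        category="knowledge base articles",
        storage="vector database",
        write_trigger="Embedded and upserted when an article is published or edited.",
        read_trigger="Retrieved when the user question matches stored content.",
        lifespan="persistent",
        on_failure="Broaden the query, then use a default context, then ask the user.",
    ),
]


def entries_for(storage_keyword):
    """Trace a production failure to the categories that depend on a given store."""
    return [e.category for e in ARCHITECTURE if storage_keyword in e.storage]


print(entries_for("database"))
```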

A designer says: 'We store the user's dietary preferences in a relational database.' An engineer asks: 'When do we read them back into the prompt?' The designer has no answer. What design step was skipped?

An agent's memory architecture includes a vector store for product knowledge. The vector store returns zero results for a customer's unusual question about a discontinued product. What should the agent do?

Designing a memory system begins by identifying what the agent must remember at each step. Each category of information is assigned a storage strategy based on its lifespan and query pattern. For every external store, the designer must define both a write trigger and a read trigger. The design is completed by planning for failure cases such as empty retrieval results or store unavailability.

Memory Architecture Document

  1. You are building a memory system for an AI job-application assistant that helps users track job applications, tailor resumes, and prepare for interviews — across many sessions over several weeks.
  2. Step 1: List at least six distinct categories of information this agent must track. For each, specify its lifespan (ephemeral, session, persistent).
  3. Step 2: For each category, assign a storage strategy from: context window, relational database, vector database, key-value store, or compressed summary.
  4. Step 3: For every category stored outside the context window, write one sentence each for the write trigger and the read trigger.
  5. Step 4: Identify two realistic failure cases (e.g., what if a user changes their target job role mid-session?) and describe how your memory system handles each.
  6. Step 5: Draw a simple diagram showing the flow of information: where each category lives, and the arrows showing when it is written and when it is injected into the prompt.