Perceive: Gathering Information
A surgeon cannot operate on a patient they cannot see. A navigator cannot plot a course without knowing their current position. An AI agent faces exactly the same constraint: before it can think or act, it must perceive. Perception is the foundation of the entire agent loop, because every decision and every action is only as good as the information it is based on.
What Does Perception Mean for an Agent?
For a human, perception means using senses — sight, sound, touch, smell. For an AI agent, perception means receiving data from whatever inputs the agent has access to. The specific inputs depend entirely on what kind of agent it is and what environment it operates in. A text-based assistant perceives the words a user types. A web browsing agent perceives the HTML content of a web page. A robotic arm perceives depth measurements from a camera. A financial trading agent perceives live price feeds. A smart home agent perceives temperature readings, door sensor states, and calendar events. In every case, perception is the act of converting raw external data into something the agent can reason about.
Perception is the process by which an agent collects information about its environment. The inputs can be text, images, sensor readings, database records, API responses — anything the agent is designed to receive.
The Context Window: Working Memory for Language Agents
For language-based AI agents — the kind that process and produce text — perception flows into what is called the context window. The context window is the block of text the agent can currently see: the user's message, the conversation history, tool outputs, and any other information that has been loaded into it. Think of the context window as a whiteboard. Everything written on the whiteboard is available to the agent's reasoning. Information not on the whiteboard is invisible to the agent, even if it exists somewhere in the world. This is why agents must be careful about what they perceive and what they include in their context — the whiteboard has a size limit. When an agent searches the web and gets back a list of results, it perceives those results by adding them to the context window. When it finishes a task and starts fresh, the whiteboard is wiped. Perception is therefore not just passive sensing — it is an active process of choosing what to pay attention to.
The context window is the total information currently visible to a language-based agent. It functions like working memory: rich and detailed in the moment, but bounded in size and lost when the session ends.
Garbage In, Garbage Out
There is a classic phrase in computing: garbage in, garbage out. It means that if you feed a system bad input, you will get bad output no matter how smart the system is. For agents, this principle applies directly to perception. If an agent perceives incomplete data, its reasoning will have blind spots. If it perceives outdated information, its actions will be based on facts that are no longer true. If it perceives misleading or manipulated data — a real attack called prompt injection, where malicious text tries to hijack the agent's behavior — it may take harmful actions. The quality of what an agent perceives sets the ceiling for everything it can achieve.
Prompt injection is a security risk where malicious text embedded in data (like a web page or a document) tries to trick an agent into following the attacker's instructions instead of the user's. Designers must validate what agents perceive.
Flashcards — click each card to reveal the answer
A language agent is asked to summarize a document, but the document was not loaded into the context window. What will happen?
Which of the following best describes a prompt injection attack?
Design an Agent's Senses
- Step 1: Choose one of these agent scenarios: (A) a homework-helper agent, (B) a weather-alert robot for farmers, or (C) a hospital room monitor.
- Step 2: List at least five distinct types of information the agent would need to perceive at the start of each loop cycle.
- Step 3: For each input, identify its source — where does this data come from? (user message, sensor, database, API, etc.)
- Step 4: Identify one way each input could be wrong, missing, or manipulated — and what bad outcome that might cause.
- Step 5: Write a one-paragraph 'perception design statement' for your agent, explaining what it senses and why each input matters.