Agents and Long-Horizon Tasks
Most interactions with AI systems follow a simple pattern: you give a prompt, the model responds, the interaction ends. The model produces a fixed output in one step. This is powerful but limited. A model that can only respond to a single prompt cannot book your flight while also checking your calendar, emailing your hotel, and updating a shared spreadsheet — at least not without you manually carrying each output into the next step. AI agents break this pattern. An agent is an AI system that perceives its environment, makes decisions, and takes sequences of actions over time in pursuit of a goal. Rather than producing one output and stopping, an agent plans, acts, observes what happened, and decides what to do next. This capacity for sequential, goal-directed behavior — agency — is what makes frontier AI systems capable of long-horizon tasks: jobs that unfold over many steps, minutes, or even hours.
The Agent Loop
An AI agent operates in a loop: Perceive, Plan, Act, Observe, repeat. In concrete terms:
- Perceive: the agent receives input, such as a user instruction, the current state of a web page, the output of a tool call, or an error message from a code execution environment. This input is added to its context window.
- Plan: the agent uses its language model to decide what action to take next. For complex tasks, planning may involve decomposing the goal into subgoals, considering alternatives, and prioritizing steps.
- Act: the agent executes an action by calling a tool: sending an HTTP request, running code, querying a database, clicking a button on a web page, or writing a file.
- Observe: the result of the action (a web page's content, the output of the code, the response from an API) is returned to the agent and added to its context. The cycle then repeats.
This loop can iterate dozens or hundreds of times for a single high-level task. The agent that books your flight might browse three airline sites, parse fare tables, check your calendar for conflicts, fill out a booking form, and send a confirmation email, with each step producing output that informs the next.
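The loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the design of any particular framework: the tools (`search_flights`, `book_flight`) are hypothetical stubs returning canned data, and `plan` is a hand-written rule standing in for the language model's decision. To keep the stub simple, `plan` reads a small structured `state` dict instead of the text history, which a real agent would pass to the model.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


# Hypothetical tools: stubs standing in for real tool calls (HTTP requests,
# code execution, database queries). They return canned data.
def search_flights(destination: str) -> list:
    return [
        {"flight": "XY100", "price": 420},
        {"flight": "XY200", "price": 310},
        {"flight": "XY300", "price": 515},
    ]


def book_flight(flight: str) -> str:
    return f"booked {flight}"


TOOLS: dict[str, Callable[..., object]] = {
    "search_flights": search_flights,
    "book_flight": book_flight,
}


@dataclass
class Action:
    tool: Optional[str]  # None means "goal reached, stop"
    args: dict = field(default_factory=dict)


def plan(context: list, state: dict) -> Action:
    """Hypothetical stand-in for the model's planning step.

    A real agent would send `context` to a language model and parse the
    action it chooses; this rule-based stub ignores `context` and reads `state`.
    """
    if "results" not in state:
        return Action("search_flights", {"destination": state["goal"]})
    if "booking" not in state:
        cheapest = min(state["results"], key=lambda r: r["price"])
        return Action("book_flight", {"flight": cheapest["flight"]})
    return Action(None)


def run_agent(goal: str, max_steps: int = 10) -> list:
    # Perceive: the user's instruction is the first entry in the context.
    context = [f"USER: book the cheapest flight to {goal}"]
    state = {"goal": goal}
    for _ in range(max_steps):  # iteration cap guards against endless loops
        action = plan(context, state)  # Plan
        if action.tool is None:
            break
        result = TOOLS[action.tool](**action.args)  # Act
        # Observe: the action and its result are appended to the context.
        context.append(f"ACTION: {action.tool}({action.args})")
        context.append(f"OBSERVATION: {result}")
        key = "results" if action.tool == "search_flights" else "booking"
        state[key] = result
    return context
```

Calling `run_agent("Lisbon")` returns a five-entry history: the instruction, then an ACTION/OBSERVATION pair for the search and another for the booking of the cheapest stub flight. The `max_steps` cap is a simple safeguard against an agent that never decides it is done.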
An agent's context window, the sequence of tokens the model can attend to at once, functions as its working memory. As the agent acts and observes, the accumulating history of actions, observations, and plans fills this window. Long tasks can exceed the context limit, so the agent (or the framework running it) must summarize, truncate, or move earlier history into external storage it can retrieve from later. Managing context across long-horizon tasks is one of the key engineering challenges in building reliable agents.
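One of these strategies, summarizing older history, might look like the sketch below. It rests on loud simplifications: word counts stand in for real tokenizer counts, and the hypothetical `summarize` helper merely keeps the first few words of each entry where a real agent would ask the model to write the summary.

```python
def count_tokens(text: str) -> int:
    # Assumption: words approximate tokens; real limits use the model's tokenizer.
    return len(text.split())


def summarize(entries: list) -> str:
    # Hypothetical summarizer: a real agent would ask the model to write the
    # summary. This stub keeps only the first four words of each entry.
    return "SUMMARY: " + " | ".join(" ".join(e.split()[:4]) for e in entries)


def fit_context(history: list, budget: int, keep_recent: int = 2) -> list:
    """Compress the middle of the history into one summary entry when the
    whole history exceeds `budget`. The first entry (the user's instruction)
    and the `keep_recent` newest entries are kept verbatim."""
    total = sum(count_tokens(e) for e in history)
    if total <= budget or len(history) <= keep_recent + 1:
        return history  # fits already, or nothing in the middle to compress
    split = len(history) - keep_recent
    return history[:1] + [summarize(history[1:split])] + history[split:]
```

Keeping the original instruction verbatim is a deliberate choice: if compression drops the goal, later planning steps can drift off task. The compressed history is also not guaranteed to fit the budget; a fuller version would compress again or fall back to dropping old entries.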
What Long-Horizon Tasks Look Like
- Research and synthesis: an agent is given a question, such as 'What were the five largest lithium producers by output in 2023, and what were their year-over-year changes?', and autonomously searches multiple databases, cross-references sources for accuracy, resolves contradictions, and produces a cited report.
- Software engineering: an agent is given a GitHub repository and a bug report. It reads relevant files, identifies the root cause, writes a fix, runs the test suite, iterates until tests pass, and opens a pull request.
- Personal task completion: an agent manages a complex multi-step workflow (researching vendors, drafting emails, scheduling meetings, updating a shared project management tool) based on a single high-level instruction.
- Scientific experimentation: an agent in a computational biology lab reads a hypothesis, identifies relevant datasets, writes analysis code, runs it, interprets results, and suggests follow-up experiments, in an autonomous loop that might run overnight.
These are not hypothetical: versions of each of these workflows have been demonstrated with current agent systems, including Anthropic's computer use capability for Claude, OpenAI's Operator, and open-source frameworks such as AutoGPT and LangChain agents. Reliability on long, open-ended tasks still varies considerably, however.
Long-horizon agentic capability introduces risks that single-turn models do not face, or face only in milder form. An error early in the task can compound: if the agent misidentifies a user's goal in step one, it may execute dozens of actions in the wrong direction before the user notices. Actions may be irreversible: a deleted file, a sent email, a financial transaction, or a deployed update cannot always be undone. The agent may also encounter adversarial content, such as a malicious web page containing text designed to hijack the agent's behavior. This technique is called prompt injection (more precisely, indirect prompt injection, because the instructions arrive through content the agent reads rather than from the user). The field is actively developing mitigations: human approval checkpoints for irreversible actions, sandboxed execution environments, careful scope-limiting of agent permissions, and monitoring systems that flag anomalous agent behavior. Agent safety nonetheless remains one of the most active, and largely unsolved, research areas in frontier AI.
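Two of these mitigations, scope-limited permissions and human approval for irreversible actions, can be combined in a single gate that every tool call passes through. The sketch below assumes a hypothetical policy (`ALLOWED_TOOLS`, `IRREVERSIBLE_TOOLS`) and a caller-supplied `approve` function; which actions count as irreversible is a judgment each deployment has to make for itself.

```python
from typing import Callable

# Hypothetical policy: tools this agent may use at all (scope-limiting), and
# tools whose effects cannot be undone, which need a human's sign-off first.
ALLOWED_TOOLS = {"search_web", "read_file", "send_email"}
IRREVERSIBLE_TOOLS = {"send_email", "delete_file", "make_payment"}


class ActionBlocked(Exception):
    """Raised when a tool call is out of scope or a human declines it."""


def gated_call(
    tool: str,
    args: dict,
    tools: dict[str, Callable[..., str]],
    approve: Callable[[str, dict], bool],
) -> str:
    # Scope check first: out-of-scope tools are refused even if a human would approve.
    if tool not in ALLOWED_TOOLS:
        raise ActionBlocked(f"{tool} is outside this agent's permissions")
    # Irreversible actions pause for an explicit human decision.
    if tool in IRREVERSIBLE_TOOLS and not approve(tool, args):
        raise ActionBlocked(f"{tool} was not approved")
    return tools[tool](**args)
```

In an interactive setting, `approve` might show the proposed call and wait for a person to confirm, for example `lambda tool, args: input(f"Run {tool}({args})? [y/N] ") == "y"`. A gate like this does not catch an allowed, reversible action taken for the wrong reason, so it complements monitoring rather than replacing it.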
An agent acting autonomously over many steps can cause harm that would not result from any single step in isolation. A mistake in understanding the user's intent, multiplied across fifty autonomous actions, can produce an outcome far from what was intended — and some of those actions may be impossible to reverse. The degree of autonomy granted to an agent should be proportional to the reliability with which its behavior can be verified.
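A back-of-the-envelope model shows why long tasks amplify small error rates. Assume, as a simplification, that each of n steps succeeds independently with the same probability p. The chance that every step is correct is then:

```latex
P(\text{all } n \text{ steps correct}) = p^{n}

p = 0.99,\; n = 50:\quad 0.99^{50} = e^{50 \ln 0.99} \approx e^{-0.503} \approx 0.605

p = 0.95,\; n = 50:\quad 0.95^{50} = e^{50 \ln 0.95} \approx e^{-2.565} \approx 0.077
```

Under this model, an agent that gets 99% of individual steps right would complete a flawless 50-step run only about 60% of the time. Real errors are not independent: a single early misunderstanding of the user's intent, the case described above, can make every subsequent step wrong at once, which this formula does not capture.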
Flashcards
An AI agent is asked to book the cheapest available flight to a given destination. At step 12 of its 50-step task, it misreads a price table and identifies a more expensive flight as cheapest. What is the most concerning property of this scenario?
What distinguishes a prompt injection attack from a standard prompt input to an agent?
Design an Agent for a Real Task
- Design a hypothetical AI agent system for a task of your choosing.
- Step 1: Choose a long-horizon task — something that requires at least 10 distinct steps. Examples: planning and booking a team trip, conducting a literature review, managing a small e-commerce inventory.
- Step 2: Write out the complete agent loop for your task: list at least 10 Perceive-Plan-Act-Observe cycles in sequence, specifying what the agent perceives, what it plans, what action it takes, and what it observes.
- Step 3: Identify the three highest-risk steps — where an error would be most damaging or irreversible — and propose a specific safeguard for each (e.g., human approval checkpoint, sandboxing, confirmation email).
- Step 4: Identify one point where a prompt injection attack could occur — where external content the agent reads might contain malicious instructions. How would you defend against it?
- Present your design to the class and evaluate each other's risk assessments.