AI Agents & Automation


Failure Modes: How Agents Break

Understanding that agents fail is the beginning of reliability engineering, not the end. To build agents that fail gracefully — or do not fail at all — you need a precise vocabulary of failure modes: named, recognizable patterns of breakdown that occur again and again across different agent implementations. Four failure modes appear so frequently in production deployments that every agent engineer should be able to recognize them on sight: infinite loops, hallucinated tools, runaway cost, and goal drift. Each has a distinct cause, a distinct signature, and a distinct set of countermeasures.

Failure Mode 1: Infinite Loops

An infinite loop occurs when an agent cycles through the same sequence of actions repeatedly without making meaningful progress toward its goal. This happens because the agent's exit condition is never satisfied: it checks whether a task is done, concludes it is not, takes the same action again, checks again, and concludes it is still not done. The cycle continues forever, or until a timeout or resource limit forces a stop.

Infinite loops arise from several distinct causes. The agent may have an incorrect model of what 'done' looks like: it expects a specific response format from a tool, receives a slightly different format, does not recognize it as success, and retries. The agent may be in a state where no action available to it can actually satisfy the goal; it is stuck, and retrying only confirms that it is stuck. Or the agent may be responding to a side effect of its own previous action: it writes a file, checks the directory, sees the new file and misinterprets it as a problem, tries to delete it, then tries to write it again.

The signature of an infinite loop is a high and increasing step count with no progress on the measurable indicators of the underlying goal.
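One way to spot the signature described above is to count how often the agent issues the exact same action. The following is a minimal sketch, not a prescribed implementation: the class name `RepeatDetector`, the `max_repeats` threshold, and the `poll_health` tool are all hypothetical names chosen for illustration.

```python
from collections import Counter
import json


class RepeatDetector:
    """Flags an action that has been attempted too many times with identical arguments."""

    def __init__(self, max_repeats: int = 3):
        self.max_repeats = max_repeats
        self._seen = Counter()

    def record(self, tool_name: str, arguments: dict) -> bool:
        """Record one action; return True once it has occurred more than max_repeats times."""
        # Serialize with sorted keys so argument order does not change the signature.
        signature = (tool_name, json.dumps(arguments, sort_keys=True))
        self._seen[signature] += 1
        return self._seen[signature] > self.max_repeats


detector = RepeatDetector(max_repeats=3)
flags = [detector.record("poll_health", {"url": "/health"}) for _ in range(5)]
print(flags)  # [False, False, False, True, True]
```

A detector like this only catches exact repetition. A write, delete, write cycle alternates between different actions, so catching it would probably require comparing short sequences of recent actions rather than single calls.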

Loops Are Expensive

Each iteration of an agent loop calls the language model at least once, and each call adds API cost, latency, and token consumption. An agent that runs 200 iterations before a timeout stops it has paid for at least 200 model calls and produced nothing useful. Hard step-count limits are essential for any production agent.
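A hard step-count limit can be as simple as a bounded loop around the agent's step function. The sketch below assumes a hypothetical `step` callable that returns `True` when the task is done; `run_agent`, `StepLimitExceeded`, and the default of 25 steps are illustrative choices, not values from any particular framework.

```python
from typing import Callable


class StepLimitExceeded(RuntimeError):
    """Raised when the agent uses up its step allowance without finishing."""


def run_agent(step: Callable[[int], bool], max_steps: int = 25) -> int:
    """Call step(i) until it returns True (done) or max_steps is reached.

    Returns the number of steps taken on success.
    """
    for i in range(max_steps):
        if step(i):
            return i + 1
    raise StepLimitExceeded(f"stopped after {max_steps} steps without finishing")


# A stand-in step that never reports completion, like a stuck agent.
def stuck_step(i: int) -> bool:
    return False


try:
    run_agent(stuck_step, max_steps=25)
except StepLimitExceeded as err:
    print(err)  # stopped after 25 steps without finishing
```

Raising an exception rather than returning silently makes the stop visible to whatever supervises the agent, so a stuck run is reported as a failure instead of being mistaken for a finished one.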

Failure Mode 2: Hallucinated Tools

A hallucinated tool failure occurs when an agent attempts to call a tool that does not exist in its tool registry, or calls a real tool with arguments that do not match the tool's actual interface. This is a direct manifestation of the language model's tendency to generate plausible-looking but factually incorrect content, applied to tool use rather than to prose. A model that has been trained on code and API documentation may have internalized general patterns for tool calling, and it may confidently generate a call to 'search_knowledge_base(query, filter=date)' when the actual tool is 'knowledge_search(query_string)' with no filter parameter.

The runtime error that results may cascade: the agent receives an error message, misinterprets it as a temporary failure, retries with minor variations, fails again, and eventually either gives up or falls into a loop.

Hallucinated tool failures are particularly common in three situations: the agent's tool descriptions are vague or inconsistent, the model was fine-tuned on data that included similar but different tools, or the agent is asked to perform a task that genuinely requires a tool that was not provided.
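One possible countermeasure is to validate every proposed call against the registry before executing it, and to return an error that names the real tools so the model does not treat the failure as temporary. The sketch below is an assumption-laden illustration: `TOOL_REGISTRY` and `validate_call` are hypothetical names, and `knowledge_search` is a stand-in built from the signature mentioned in the text.

```python
import inspect
from typing import Optional


def knowledge_search(query_string: str) -> str:
    """Hypothetical tool matching the signature named in the text."""
    return f"results for {query_string!r}"


# Hypothetical registry: maps each tool name to its callable.
TOOL_REGISTRY = {"knowledge_search": knowledge_search}


def validate_call(name: str, arguments: dict) -> Optional[str]:
    """Return None if the call is valid, otherwise an error message for the model."""
    tool = TOOL_REGISTRY.get(name)
    if tool is None:
        available = ", ".join(sorted(TOOL_REGISTRY))
        return f"Unknown tool {name!r}. Available tools: {available}. Do not retry this name."
    try:
        # Signature.bind raises TypeError on missing or unexpected arguments.
        inspect.signature(tool).bind(**arguments)
    except TypeError as err:
        return f"Bad arguments for {name!r}: {err}. Expected {inspect.signature(tool)}."
    return None


print(validate_call("search_knowledge_base", {"query": "q", "filter": "date"}))
print(validate_call("knowledge_search", {"query_string": "q", "filter": "date"}))
print(validate_call("knowledge_search", {"query_string": "q"}))  # None
```

Checking the call before execution separates "this tool does not exist" from "this tool failed", which is the distinction the agent in the cascade scenario fails to make.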

Failure Mode 3: Runaway Cost

Runaway cost is a failure mode where an agent consumes far more resources than intended (typically API calls, tokens, money, time, or external service credits) without producing commensurate value. It is often the result of another failure mode, especially infinite loops, but it can also occur from correct agent behavior applied to an unexpectedly large problem.

Consider an agent tasked with 'summarize all recent discussions in this Slack workspace.' The agent correctly identifies that 'all recent discussions' means every channel. The workspace turns out to have 2,400 channels with an average of 800 messages each, or nearly 2 million messages. The agent faithfully processes every one of them, calling the language model thousands of times, before producing a summary that costs $340 in API fees. The agent did exactly what it was told; the specification was simply too broad for the available budget.

Runaway cost failures highlight why resource limits (maximum step counts, maximum total tokens, maximum elapsed time, maximum dollar spend) are not optional features. They are fundamental safety mechanisms for any deployed agent.
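The resource limits named above can be enforced by a small budget object that is charged on every model call, combined with a scope estimate made before the run starts. This is a rough sketch under stated assumptions: the `Budget` class, its limits, and the per-call token and dollar figures are all hypothetical; only the channel and message counts come from the Slack example.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised when a run crosses any of its resource limits."""


@dataclass
class Budget:
    max_tokens: int
    max_dollars: float
    max_seconds: float
    tokens_used: int = 0
    dollars_spent: float = 0.0
    started_at: float = field(default_factory=time.monotonic)

    def charge(self, tokens: int, dollars: float) -> None:
        """Record one model call and raise if any limit is now exceeded."""
        self.tokens_used += tokens
        self.dollars_spent += dollars
        if self.tokens_used > self.max_tokens:
            raise BudgetExceeded("token limit exceeded")
        if self.dollars_spent > self.max_dollars:
            raise BudgetExceeded("dollar limit exceeded")
        if time.monotonic() - self.started_at > self.max_seconds:
            raise BudgetExceeded("time limit exceeded")


# Pre-flight scope estimate using the figures from the Slack example.
channels, avg_messages = 2_400, 800
total_messages = channels * avg_messages
print(total_messages)  # 1920000

# Hypothetical limits and per-call costs, chosen only to show the guard firing.
budget = Budget(max_tokens=1_000_000, max_dollars=5.00, max_seconds=600)
try:
    for _ in range(total_messages):
        budget.charge(tokens=500, dollars=0.01)
except BudgetExceeded as err:
    print(err)  # dollar limit exceeded
```

The pre-flight estimate is the cheaper of the two checks: knowing that the task covers 1,920,000 messages before the first model call lets the agent ask for a narrower scope instead of discovering the problem partway through.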

Failure Mode 4: Goal Drift

Goal drift is the most subtle of the four failure modes. It occurs when an agent gradually shifts from pursuing its original objective to pursuing something that merely correlates with, or once served, that objective. The agent has not crashed; it has not looped; it has not hallucinated a tool. It is actively doing things — but the things it is doing are no longer the right things. Goal drift emerges from the compounding of small misinterpretations across a long task. An agent asked to 'help the user be productive' might begin by scheduling meetings, then shift toward clearing the calendar of all meetings (fewer meetings = more productive time), then toward marking all emails as read (inbox zero is often associated with productivity), without any single step being obviously wrong. Each decision follows from the previous context in a locally plausible way, but the trajectory drifts further from what 'productive' actually means to the user. Goal drift is especially dangerous in long-horizon tasks where human review is infrequent, because it compounds quietly over many steps before the divergence from the original intent becomes obvious.
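The text does not prescribe a specific countermeasure for goal drift, so the following is only one possible sketch: declare the approved action types up front, and require human review whenever the agent proposes something outside that set or has gone a fixed number of steps without review. The function `review_needed`, the `review_every` interval, and the action names are all hypothetical.

```python
def review_needed(
    step_index: int,
    action: str,
    approved_actions: set,
    review_every: int = 10,
) -> bool:
    """Return True when a human should check the run before the action executes."""
    # An action type that was never approved suggests the agent's goal has shifted.
    out_of_scope = action not in approved_actions
    # A periodic checkpoint bounds how long drift can compound unnoticed.
    periodic = step_index > 0 and step_index % review_every == 0
    return out_of_scope or periodic


# Action types approved for the 'help the user be productive' task (illustrative).
approved = {"schedule_meeting", "draft_agenda"}
trajectory = ["schedule_meeting", "cancel_meeting", "mark_all_email_read"]
for i, action in enumerate(trajectory):
    print(action, review_needed(i, action, approved))
# schedule_meeting False
# cancel_meeting True
# mark_all_email_read True
```

A check like this does not understand intent; it only narrows the window in which locally plausible steps can accumulate before someone compares the trajectory with the original objective.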


Match each agent behavior to the specific failure mode it exemplifies.

Terms

Agent retries the same API call 300 times after receiving a 404 error, never recognizing the resource is gone
Agent calls fetch_user_profile(id=42) but the actual tool name is get_profile and it takes a user_id parameter
Agent processing a code repository makes 50,000 LLM calls analyzing every commit in a 10-year-old project
Agent asked to improve team morale begins canceling all meetings, reasoning that fewer obligations reduce stress
Agent writes a file, then detects the file in the directory, misinterprets it as an error, deletes it, and writes it again

Definitions

Infinite loop — stuck retry on permanent error
Runaway cost — unbounded scope on large input
Goal drift — proxy metric replacing original intent
Infinite loop — self-caused state misinterpretation
Hallucinated tool — wrong name and argument schema


A deployed agent is tasked with 'monitor the data pipeline and fix any errors.' After three days, the team notices the agent has made 15,000 tool calls and spent $220 in API costs, but the pipeline shows only minor improvements. No single action caused an obvious problem. Which failure mode best explains this?

Why is goal drift harder to detect than an infinite loop?

Failure Mode Identification Challenge

  1. Your instructor (or you, if working independently) will describe four agent runs. For each one, identify which failure mode occurred, explain the evidence that supports your identification, and propose one specific countermeasure that would have prevented or detected the failure before it caused harm.
  2. Run A: An agent is given the task 'send a follow-up email to every lead who has not responded in 30 days.' The company's CRM has 80,000 leads. The agent begins sending emails and is not stopped until 22,000 emails have been dispatched, many to leads who had been intentionally archived.
  3. Run B: An agent tasked with optimizing database query performance reads the schema, attempts to call analyze_query_plan(query_id=15), receives a 'tool not found' error, tries analyze_slow_query(15), fails again, then tries query_analyzer({'id': 15}), and continues cycling through variations for 40 minutes.
  4. Run C: An agent helping a student 'learn faster' initially suggests spaced repetition schedules, then begins systematically removing all non-study events from the student's calendar, then starts declining meeting invitations on the student's behalf, reasoning that every freed hour is a learning hour.
  5. Run D: An agent checking whether a deployment succeeded polls a health endpoint, receives a 200 OK response, still marks the check as uncertain, polls again, receives 200 OK again, marks it uncertain again, and repeats 400 times over 20 minutes before timing out.