Error Handling in Tool Calls
Every tool call is a bet that the real world will cooperate. Sometimes it does not. The external API is down. The record you are looking for does not exist. The model produced an argument value that passes schema validation but makes no real-world sense. The tool runs successfully but returns data that contradicts what the model expected. In each case, the agent must do something intelligent — and 'intelligent' almost never means 'crash silently' or 'hallucinate a result.' Error handling in tool-using agents is a first-class design concern, not an afterthought.
A Taxonomy of Tool Failures
Tool failures fall into four distinct categories, each calling for a different response strategy.
Schema validation failures occur before the tool even runs. The model emits a tool call with a missing required argument, a value outside an allowed enum, or a type mismatch. Some model APIs can reject or prevent these at the API layer (for example, when strict schema enforcement is enabled); where that is not the case, the executor should validate the arguments itself and return an error without invoking the tool. The fix: improve schema definitions and parameter descriptions so the model produces valid calls. If these errors occur in production, they indicate schema quality problems.
Execution failures occur when the tool runs but encounters an error: network timeout, database unavailable, rate limit hit, permission denied, external service returns a 500 error. Most of these are transient or environmental, and the appropriate response is often a retry with exponential backoff. Errors that will not change on a retry, such as permission denied, should be returned to the model immediately. After a configurable number of retries, the executor should return a structured error to the model rather than continuing to retry indefinitely.
Semantic failures are the trickiest. The tool executes successfully and returns a syntactically valid result, but the result is wrong in ways the model may not detect. A geocoding tool returns coordinates for London, UK when the user meant London, Ontario. A product search returns results for 'Apple' the technology company when the user was asking about Apple the fruit. Semantic failures require validation logic that goes beyond type checking.
Logic failures happen when the model's tool-call plan is wrong: it calls tools in the wrong order, uses a tool for a purpose it was not designed for, or constructs arguments from previous results incorrectly. These require examining the model's reasoning and often improving the system prompt or tool descriptions.
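The retry guidance for execution failures can be sketched as follows. This is a minimal illustration, not a definitive implementation: `TransientToolError` and `call_with_backoff` are hypothetical names, the default retry count and delays are arbitrary, and the `sleep` parameter exists only so the wait can be swapped out in tests.

```python
import random
import time


class TransientToolError(Exception):
    """Hypothetical exception for failures worth retrying (timeouts, 5xx, rate limits)."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


def call_with_backoff(tool, args, max_retries=3, base_delay=0.5, sleep=time.sleep):
    """Run tool(**args), retrying transient failures with exponential backoff.

    Returns {"ok": True, "result": ...} on success, or a structured error
    dict once the retry budget is exhausted. Other exceptions propagate.
    """
    for attempt in range(max_retries + 1):
        try:
            return {"ok": True, "result": tool(**args)}
        except TransientToolError as exc:
            if attempt == max_retries:
                # Give up and tell the model what happened instead of looping forever.
                return {
                    "ok": False,
                    "error_code": exc.code,
                    "message": f"{exc} (gave up after {max_retries} retries)",
                    "recoverable": True,
                }
            # Delay doubles on each attempt; random jitter spreads out retries.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

Only the exception type marked as transient is retried here. Anything else is left to propagate so that a non-retryable failure is not hidden behind a wait.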
When a tool fails, the executor should return a structured error message to the model — not just log the error and return null. A model that receives {error: 'RATE_LIMITED', retry_after_seconds: 30, message: 'Too many requests to the weather API.'} can reason about it: tell the user there is a temporary delay, try a fallback tool, or ask the user if they want to wait. A model that receives null has nothing to reason about and will often hallucinate a result.
Designing the error response the tool returns is as important as designing the success response. A minimal but well-formed error object contains:
- error_code: A short, machine-readable identifier (USER_NOT_FOUND, RATE_LIMITED, PERMISSION_DENIED, TIMEOUT, INVALID_ARGUMENT). This lets the model and executor make programmatic decisions without parsing prose.
- message: A human-readable explanation of what went wrong. Detailed enough to be useful; brief enough not to clutter the model's context.
- recoverable: A boolean indicating whether the same call might succeed if retried. TIMEOUT is often recoverable; USER_NOT_FOUND is not.
- suggested_action: Optional guidance for the model, such as 'Try searching by email instead of ID' or 'Use the fallback_search tool for approximate matches.'
This structure lets the model make an informed decision about its next action. Without it, the model must guess.
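One way to keep these four fields consistent across tools is a small shared type. The sketch below is illustrative only: the `ToolError` class, its `to_payload` method, the `{"error": ...}` envelope, and the example user ID are assumptions, not part of any particular API.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class ToolError:
    """Hypothetical container for the structured error fields described above."""

    error_code: str                         # machine-readable identifier
    message: str                            # short human-readable explanation
    recoverable: bool                       # might the same call succeed if retried?
    suggested_action: Optional[str] = None  # optional hint for the model

    def to_payload(self) -> str:
        """Serialize to a JSON string for the model, omitting unset optional fields."""
        body = {k: v for k, v in asdict(self).items() if v is not None}
        return json.dumps({"error": body})


not_found = ToolError(
    error_code="USER_NOT_FOUND",
    message="No user exists with ID 48213.",
    recoverable=False,
    suggested_action="Try searching by email instead of ID.",
)
timed_out = ToolError("TIMEOUT", "The user service did not respond in time.", True)
```

Calling `not_found.to_payload()` yields the string the executor would place in the tool result, so every tool reports failures in the same shape.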
Input Validation and Graceful Degradation
Input validation in tool-using agents has two layers. The first layer is schema validation, enforced by the API or by the executor before the tool runs: required fields are present, types match, enum values are from the allowed set. This catches structural errors but not semantic ones.
The second layer is business logic validation inside the tool itself. A date_range parameter might be structurally valid (two ISO date strings) but semantically nonsensical (the end date is before the start date). A user_id might be correctly typed as an integer but fall outside the valid range. A search query might be syntactically valid but contain characters that would cause a SQL injection if concatenated directly into a database query. Business logic validation catches the first two cases; for the third, parameterized queries are the primary defense, with validation as a second line.
Graceful degradation means the agent produces a useful response even when some tools fail. This requires designing fallback paths. If the primary product database is unavailable, perhaps a cached copy can serve approximate results. If real-time stock prices cannot be fetched, perhaps the agent can tell the user 'I cannot retrieve live prices right now, but as of yesterday's close, the price was...' rather than failing entirely.
The distinction between a brittle agent and a robust one is almost always error handling quality. Brittle agents work perfectly in testing (when external services are up and inputs are clean) and fail unpredictably in production. Robust agents are designed around failure as the expected condition.
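Both ideas can be sketched briefly. The first function is a business logic check for the date_range example; the second is a fallback path for the stock price example. All names here (`validate_date_range`, `get_price`, `fetch_live`, the `cache` layout, and the PRICE_UNAVAILABLE code) are hypothetical, and the sketch assumes the live fetcher signals an outage by raising `ConnectionError`.

```python
from datetime import date


def validate_date_range(start: str, end: str):
    """Return None if the range is usable, otherwise a structured error dict."""
    try:
        start_d, end_d = date.fromisoformat(start), date.fromisoformat(end)
    except ValueError:
        return {
            "error_code": "INVALID_ARGUMENT",
            "message": "Dates must be ISO formatted (YYYY-MM-DD).",
            "recoverable": False,
        }
    if end_d < start_d:
        # Structurally valid, semantically nonsensical.
        return {
            "error_code": "INVALID_ARGUMENT",
            "message": f"End date {end} is before start date {start}.",
            "recoverable": False,
            "suggested_action": "Swap the dates or ask the user to confirm the range.",
        }
    return None


def get_price(symbol, fetch_live, cache):
    """Prefer live data; fall back to a cached value and label it as stale."""
    try:
        return {"symbol": symbol, "price": fetch_live(symbol), "stale": False}
    except ConnectionError:
        if symbol in cache:
            price, as_of = cache[symbol]  # assumed layout: symbol -> (price, date string)
            return {"symbol": symbol, "price": price, "stale": True, "as_of": as_of}
        return {
            "error_code": "PRICE_UNAVAILABLE",
            "message": f"Live prices are down and no cached price exists for {symbol}.",
            "recoverable": True,
        }
```

The `stale` and `as_of` fields matter as much as the price itself: they give the model what it needs to tell the user the figure is from an earlier close rather than presenting it as live.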
The most dangerous error-handling pattern is catch-all exception handling that returns an empty result or null when anything goes wrong. The model receives nothing, cannot reason about the failure, and may fabricate a plausible-sounding answer based on its training knowledge. In a medical, financial, or legal context, a confidently hallucinated answer that should have been an error acknowledgment can cause serious harm. Always surface errors explicitly.
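The contrast can be made concrete with two executor wrappers. This is a sketch under stated assumptions: `run_tool_silently` and `run_tool` are hypothetical names, the mapping from Python's built-in exceptions to error codes is one possible choice, and INTERNAL_ERROR is an illustrative code for unknown failures.

```python
def run_tool_silently(tool, **kwargs):
    """Anti-pattern: every failure collapses to None, so the model learns nothing."""
    try:
        return tool(**kwargs)
    except Exception:
        return None


def run_tool(tool, **kwargs):
    """Surface every failure as a structured error the model can reason about."""
    try:
        return {"ok": True, "result": tool(**kwargs)}
    except TimeoutError as exc:
        return {
            "ok": False,
            "error_code": "TIMEOUT",
            "message": str(exc) or "The tool timed out.",
            "recoverable": True,
        }
    except PermissionError as exc:
        return {
            "ok": False,
            "error_code": "PERMISSION_DENIED",
            "message": str(exc) or "The tool is not allowed to do this.",
            "recoverable": False,
        }
    except Exception as exc:
        # Unknown failures are still reported explicitly, never swallowed.
        return {
            "ok": False,
            "error_code": "INTERNAL_ERROR",
            "message": f"{type(exc).__name__}: {exc}",
            "recoverable": False,
        }
```

Both wrappers catch broadly; the difference is what they return. The second still has a catch-all branch, but it turns the unknown failure into an explicit error rather than an empty result.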
Fill in the four categories of tool-call failures.
An agent calls a send_sms tool with a phone number that passes JSON schema validation (it is a string) but is formatted as '555-1234' instead of '+15551234'. The tool runs and silently fails to deliver the message. Which failure type is this, and what is the correct fix?
After a tool call fails with a RATE_LIMITED error, what is the correct executor behavior?
Design an Error Handling Strategy
- You are building an AI-powered travel booking agent with these tools: search_flights, book_flight, search_hotels, book_hotel, and process_payment.
- Step 1: For each tool, list the three most likely failure modes (not just 'it fails' — be specific: what type of failure, what causes it?).
- Step 2: For each failure mode, specify: (a) is it recoverable? (b) what should the executor do? (c) what structured error should be returned to the model?
- Step 3: process_payment is critically non-idempotent. Describe a concrete idempotency mechanism: what unique key would you use, where would you store it, and how would the tool check it before charging?
- Step 4: Describe a graceful degradation path for the scenario where search_flights is down entirely. What can the agent still do? What should it tell the user?
- Step 5: Write the error-response object for a payment failure where the card was declined. Include all fields from the structured error format described in this lesson.
- Goal: practice designing error handling as a first-class feature, not an afterthought.