AI Agents & Automation

Structured Outputs

Language models are trained to produce fluent natural language, which is exactly what makes them powerful for conversation — and exactly what makes them unreliable for machine-to-machine communication. When another piece of software needs to parse a model's output — to extract a list of entities, read a confidence score, or pass data to the next step in a pipeline — natural language is the wrong format. A model that sometimes writes '42%' and sometimes 'about forty-two percent' and sometimes 'approximately 42 percent, give or take' is useless to a parser expecting a numeric field. Structured outputs solve this by constraining the model to produce output in a predictable, machine-parseable format, almost always JSON.

JSON Mode and Response Schemas

The simplest form of structured output is JSON mode: an API parameter that makes the model emit syntactically valid JSON rather than prose or markdown. JSON mode does not constrain which keys or values appear. The model might produce {"result": 42} or {"answer": "42", "unit": "percent"} depending on context, and on some providers a response cut off by the token limit can still arrive as incomplete JSON. For many use cases this is insufficient.

Response schemas go further: you provide a JSON Schema definition describing exactly the structure you expect, and the model's output is constrained to match it, with every required key present, every type correct, and every enum value drawn from the allowed set. This is called constrained decoding or structured generation. Under the hood, the inference engine masks out any token that would make the output non-conformant, so the model can only sample tokens that lead to schema-conformant JSON.

OpenAI calls this feature Structured Outputs (capital S, the product name). Anthropic's API accepts a JSON Schema as the input_schema of a tool definition, so the model's tool call carries the structured data. Google's Gemini API supports response_schema in its generation config. The mechanism and the strength of the guarantee differ by provider, so check each provider's documentation for which JSON Schema features are supported and whether conformance is strictly enforced.
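The gap between "valid JSON" and "JSON of the declared shape" can be sketched in a few lines of Python. This is a minimal illustration, not a full JSON Schema validator: the schema, the `conforms` helper, and the sample strings are all hypothetical, and the check covers only the small subset of JSON Schema used here.

```python
import json

# Hypothetical schema for illustration: exactly one required numeric field.
SCHEMA = {
    "type": "object",
    "properties": {"result": {"type": "number"}},
    "required": ["result"],
}

# Maps JSON Schema type names to Python types (only the ones this sketch needs).
TYPE_MAP = {"number": (int, float), "string": str, "boolean": bool}


def conforms(raw: str, schema: dict) -> bool:
    """Return True if raw parses as JSON and matches the tiny schema subset above."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False  # not even syntactically valid JSON
    if not isinstance(data, dict):
        return False
    props = schema["properties"]
    if any(key not in data for key in schema["required"]):
        return False  # a required key is missing
    if any(key not in props for key in data):
        return False  # this sketch always rejects undeclared keys
    for key, value in data.items():
        type_name = props[key]["type"]
        # bool is a subclass of int in Python, so reject it for non-boolean fields.
        if isinstance(value, bool) and type_name != "boolean":
            return False
        if not isinstance(value, TYPE_MAP[type_name]):
            return False
    return True


# Both strings are valid JSON, so JSON mode alone would accept either one.
print(conforms('{"result": 42}', SCHEMA))                       # True
print(conforms('{"answer": "42", "unit": "percent"}', SCHEMA))  # False
```

With JSON mode only, a check like this has to run after every call. A response schema moves the same check into generation itself.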

Constrained Decoding Is a Hard Guarantee

Response schemas enforced by constrained decoding are not probabilistic: a completed response is guaranteed to be schema-conformant, not just likely to be. This is fundamentally different from instructing a model in a prompt to 'please respond in JSON.' Prompt-based instructions can fail; constrained decoding cannot emit a non-conformant token, because invalid tokens are removed from the sampling distribution entirely. The guarantee covers structure, not truth: a response can still be cut short by a token limit, and a conformant value can still be wrong.
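Token masking can be illustrated with a toy decoder. The sketch below is an assumption-heavy simplification: the six-token vocabulary, the `fake_logits` stand-in for a model, and the three allowed outputs are all invented for illustration, and real inference engines work on tokenizer vocabularies with grammar or automaton-based masks rather than string prefix checks.

```python
import math

# Toy setup: the "schema" is an enum of three complete outputs.
ALLOWED = ['"SAFE"', '"WARN"', '"BLOCK"']


def fake_logits(prefix: str) -> dict:
    """Stand-in for a model. It always prefers the invalid token 'maybe'."""
    return {"maybe": 5.0, " ": 4.0, "WARN": 3.0, '"': 2.0, "SAFE": 1.0, "BLOCK": 0.5}


def is_valid_prefix(text: str) -> bool:
    """True if text could still grow into one of the allowed outputs."""
    return any(option.startswith(text) for option in ALLOWED)


def constrained_greedy_decode() -> str:
    output = ""
    while output not in ALLOWED:
        logits = fake_logits(output)
        # Mask step: any token that would break conformance gets -inf,
        # so it can never be selected regardless of how much the model likes it.
        masked = {
            token: (score if is_valid_prefix(output + token) else -math.inf)
            for token, score in logits.items()
        }
        best = max(masked, key=masked.get)
        if masked[best] == -math.inf:
            raise RuntimeError("no valid continuation in the toy vocabulary")
        output += best
    return output


# Unconstrained, the top-scoring first token is the invalid 'maybe'.
unconstrained_first = max(fake_logits(""), key=fake_logits("").get)
print(unconstrained_first)          # maybe
print(constrained_greedy_decode())  # "WARN"
```

The model's preferences still decide among the valid options (here WARN outranks SAFE and BLOCK); the mask only removes continuations that could never conform.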

Here is a concrete example. Suppose you are building a content moderation pipeline that classifies user-submitted text, and downstream code must act on the classification result. Compare two approaches.

Prompt-only approach: 'Classify the following text as SAFE, WARN, or BLOCK, and explain your reasoning.' The model might output: 'This text appears to be safe. Classification: SAFE.' Or it might write: 'I would classify this as WARN because...' Or: 'The appropriate label here is BLOCK.' All three are reasonable answers but structurally incompatible, so your parser must cope with open-ended variation in phrasing.

Structured output approach: You define a response schema with two required fields: label (enum: SAFE, WARN, BLOCK) and reason (a string, capped at 100 characters where the provider supports length constraints). The model always outputs an object of the form {"label": "WARN", "reason": "Contains mildly aggressive language toward a named individual."}. After parsing the JSON, your downstream code reads result['label']: one line, no text-scraping logic, and the key is always present. The structured approach does not reduce the model's analytical capability. It channels the same reasoning into a reliable container.
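The moderation example can be sketched as a schema plus the downstream code that consumes it. The schema mirrors the two fields described above; the `route` function and the action names (publish, queue_for_review, reject) are hypothetical, and the response string is hard-coded rather than fetched from any real API.

```python
import json

# JSON Schema for the moderation result described in the text.
# maxLength is included for illustration; not every provider enforces it.
MODERATION_SCHEMA = {
    "type": "object",
    "properties": {
        "label": {"type": "string", "enum": ["SAFE", "WARN", "BLOCK"]},
        "reason": {"type": "string", "maxLength": 100},
    },
    "required": ["label", "reason"],
    "additionalProperties": False,
}

# Hypothetical mapping from label to pipeline action.
ACTIONS = {"SAFE": "publish", "WARN": "queue_for_review", "BLOCK": "reject"}


def route(raw_response: str) -> str:
    """Parse a schema-conformant response and pick the pipeline action."""
    result = json.loads(raw_response)
    label = result["label"]
    # Defensive check: cheap to keep even when the provider enforces the enum.
    if label not in ACTIONS:
        raise ValueError(f"unexpected label: {label!r}")
    return ACTIONS[label]


# A hard-coded response of the shape shown in the text.
response = (
    '{"label": "WARN", '
    '"reason": "Contains mildly aggressive language toward a named individual."}'
)
print(route(response))  # queue_for_review
```

Compare this with the prompt-only approach, where `route` would need regular expressions or string searches to find the label inside free text.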

Match each structured output concept to its precise definition.

Terms

JSON mode
Response schema
Constrained decoding
Enum constraint
Prompt-only JSON instruction

Definitions

A JSON Schema definition the model's output is guaranteed to conform to exactly
A probabilistic approach that often works but can silently produce malformed or inconsistently structured output
The inference technique that masks invalid tokens to force schema-conformant generation
Limits a field's value to a fixed set of allowed strings, preventing invented variations
API setting that guarantees syntactically valid JSON output but does not constrain its structure

When and Why Structured Outputs Matter

Structured outputs matter most in three situations. First, when model output feeds directly into code: if a function receives a model's response and calls result['score'] on it, that field must always be present and always be a number. A single malformed response can crash the pipeline. Second, when building multi-step agent systems: the output of one model call often becomes the input to the next. Structural inconsistency compounds, so a slightly wrong format at step two can cause a catastrophic parse failure at step five. Third, when you need to audit or store model decisions: a standardized structured record is far easier to log, search, and analyze than freeform text.

Structured outputs are also valuable for extraction tasks such as pulling entities, facts, or data fields from unstructured text. If you ask a model to extract all dates and dollar amounts from a contract, a schema shaped like {"dates": [string], "amounts": [number]} gives you clean data in one step. Without a schema, you get prose that requires another round of parsing.

The tradeoff: response schemas constrain the output space. Some nuance can be lost if the schema is too rigid. For genuinely open-ended conversation, natural-language output is the right choice. For machine integration, structured output is almost always the right choice.
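The extraction case can be sketched as follows. The shorthand {"dates": [string], "amounts": [number]} is written out as a JSON Schema, and a hypothetical `summarize_contract` function stands in for the next pipeline step. The sample response is invented for illustration and does not come from a real contract or model call.

```python
import json

# The shorthand from the text, expanded into JSON Schema.
EXTRACTION_SCHEMA = {
    "type": "object",
    "properties": {
        "dates": {"type": "array", "items": {"type": "string"}},
        "amounts": {"type": "array", "items": {"type": "number"}},
    },
    "required": ["dates", "amounts"],
    "additionalProperties": False,
}


def summarize_contract(raw_response: str) -> dict:
    """Hypothetical next pipeline step: consume extracted fields directly."""
    data = json.loads(raw_response)
    dates = data["dates"]
    amounts = data["amounts"]
    # Boundary checks between steps; they stop a bad record before it propagates.
    if not all(isinstance(d, str) for d in dates):
        raise TypeError("dates must be strings")
    if not all(isinstance(a, (int, float)) and not isinstance(a, bool) for a in amounts):
        raise TypeError("amounts must be numbers")
    return {"date_count": len(dates), "total_amount": sum(amounts)}


# Invented sample response that conforms to the schema.
sample = '{"dates": ["2024-01-15", "2024-06-30"], "amounts": [12000, 3250.5]}'
print(summarize_contract(sample))  # {'date_count': 2, 'total_amount': 15250.5}
```

Validating at the boundary between steps is one way to keep a format error at step two from surfacing as a confusing failure at step five.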

Always Add a Reason Field

When using structured outputs for classification or decision-making, include a reason or explanation field in your schema. A model that classifies a loan application as REJECT but also outputs 'Primary income below required threshold, debt-to-income ratio 0.62' gives you auditable, human-readable justification alongside the machine-readable decision. This is a best practice for any system where decisions affect people.
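A schema with a reason field might look like the sketch below. REJECT and the sample reason come from the text; the APPROVE and REVIEW values, the `LoanDecision` class, and the tab-separated audit format are assumptions made for illustration.

```python
import json
from dataclasses import dataclass

# Hypothetical schema: REJECT is from the text, APPROVE and REVIEW are assumed.
LOAN_DECISION_SCHEMA = {
    "type": "object",
    "properties": {
        "decision": {"type": "string", "enum": ["APPROVE", "REVIEW", "REJECT"]},
        "reason": {"type": "string"},
    },
    "required": ["decision", "reason"],
    "additionalProperties": False,
}


@dataclass(frozen=True)
class LoanDecision:
    decision: str  # machine-readable outcome
    reason: str    # human-readable justification for auditors


def parse_decision(raw_response: str) -> LoanDecision:
    """Parse a response and insist on a usable reason."""
    data = json.loads(raw_response)
    record = LoanDecision(decision=data["decision"], reason=data["reason"])
    allowed = LOAN_DECISION_SCHEMA["properties"]["decision"]["enum"]
    if record.decision not in allowed:
        raise ValueError(f"unexpected decision: {record.decision!r}")
    # A schema can require the field, but an empty string would still conform.
    if not record.reason.strip():
        raise ValueError("reason must not be empty")
    return record


def audit_line(application_id: str, record: LoanDecision) -> str:
    """Format one tab-separated audit log entry."""
    return f"{application_id}\t{record.decision}\t{record.reason}"


raw = (
    '{"decision": "REJECT", '
    '"reason": "Primary income below required threshold, debt-to-income ratio 0.62"}'
)
record = parse_decision(raw)
print(audit_line("app-001", record))
```

The reason is the model's stated justification, stored next to the decision so a human reviewer can read both in one record.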

Complete the key statements about structured output mechanics.

Structured outputs constrain a model to produce ____ that conforms to a ____. The technique that makes this a hard guarantee by masking invalid tokens during generation is called ____.

A developer instructs their model in the system prompt: 'Always respond in valid JSON.' They later find that 3% of responses contain prose wrapped around a JSON block, breaking their parser. What is the correct fix?

You are building a pipeline that extracts key information from medical reports and passes it to a billing system. Which approach is most appropriate?

Design a Response Schema

You are building an AI system that reads customer support tickets and produces a structured triage record for a support team dashboard.

  1. Identify at least 5 fields that would be useful to extract from a support ticket (consider: urgency level, issue category, sentiment, customer tier, recommended action, summary).
  2. For each field, specify the JSON type (string, number, boolean, array of strings) and, where appropriate, an enum of allowed values.
  3. Write out your schema as a JSON object with a 'properties' section and a 'required' list.
  4. Write one example ticket and the structured output your schema would produce for it.
  5. Review your schema critically: are any fields too rigid (an enum that is too narrow)? Too loose (a string that should be a number)? Make two improvements and explain why.

Goal: develop intuition for the tradeoffs between schema rigidity (reliability) and flexibility (expressiveness).