The LLM Reasoning Core
At the center of every modern AI agent is a large language model — a neural network trained to predict and generate text. In the agent context this model performs a specific role: given a context window containing the agent's current state (its instructions, its memory, the results of recent tool calls, and the user's goal), the model reasons about what action to take next and produces a structured response that the orchestration loop can parse and execute. Understanding what the LLM core does well, and what it cannot do without help, is essential to understanding why all the other components exist.
What the LLM Core Provides
A large language model trained on broad internet-scale data brings three capabilities that are hard to replicate with traditional software. First, natural language understanding: the model can parse ambiguous, imprecise, and conversational instructions that no rule-based parser could handle. When a user says 'pull together everything we know about the Q3 anomaly and write it up for the exec team,' the model understands what 'everything we know' and 'write it up' mean in context. Second, in-context reasoning: given a description of a situation, a set of facts, and a goal, the model can reason through intermediate steps, often following a chain-of-thought, to arrive at a plan or decision without being explicitly programmed for that specific situation. Third, structured output generation: modern LLMs can usually be prompted to produce JSON, function-call specifications, code, or other structured formats that downstream systems can parse. This is what makes tool selection and parameter filling tractable.
When an agent 'calls a tool,' what actually happens is: the LLM generates a text string that names the tool and specifies its arguments in a structured format (often JSON). The orchestration loop parses that string and executes the real call. Tool use is not magic — it is structured text generation, and it can fail exactly as text generation can fail.
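The parse-and-execute step described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular framework's API: the `get_weather` tool, the `TOOLS` registry, and the `{"tool": ..., "arguments": ...}` message shape are all hypothetical choices made for the example.

```python
import json

# Hypothetical tool and registry; the names are illustrative only.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def execute_tool_call(model_output: str) -> dict:
    """Parse the model's text as a tool call and run it, reporting failures."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError as exc:
        # The model produced text that is not valid JSON.
        return {"ok": False, "error": f"malformed JSON: {exc.msg}"}
    if not isinstance(call, dict):
        return {"ok": False, "error": "tool call must be a JSON object"}
    name = call.get("tool")
    args = call.get("arguments", {})
    if not isinstance(name, str) or name not in TOOLS:
        # The model named a tool that does not exist.
        return {"ok": False, "error": f"unknown tool: {name}"}
    if not isinstance(args, dict):
        return {"ok": False, "error": "arguments must be a JSON object"}
    try:
        return {"ok": True, "result": TOOLS[name](**args)}
    except TypeError as exc:
        # The model supplied missing or unexpected parameters.
        return {"ok": False, "error": f"bad arguments: {exc}"}

print(execute_tool_call('{"tool": "get_weather", "arguments": {"city": "Oslo"}}'))
print(execute_tool_call('{"tool": "get_weather", "arguments": {"city": "Oslo"'))
```

The first call succeeds; the second is truncated mid-string, the kind of failure plain text generation can produce, and it is reported as an error rather than raising an exception.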
The Hard Limits of the LLM Core
The LLM core has four fundamental constraints that the rest of the agent architecture exists to address. The first is the context window: an LLM can only process the text that fits inside its context window, measured in tokens (roughly words or word-pieces). GPT-4 Turbo supports up to 128,000 tokens and Claude 3 Opus up to 200,000, but a complex multi-step task can easily generate more state than fits in any window. The memory system exists to solve this. The second constraint is knowledge cutoff: the model's parametric knowledge was frozen at training time. It does not know about events after its cutoff date, current prices, live weather, or your private data. Tools that retrieve live information exist to solve this. The third constraint is no persistent side-effects: the model only generates text, so by itself it cannot write to a database, send an email, or execute code. The tool layer provides these effector capabilities. The fourth constraint is stochastic output: LLM outputs are typically sampled from probability distributions and can vary from run to run, so critical decisions may need verification, retry logic, or fallback heuristics in the orchestration loop.
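The fourth constraint is easier to see with a toy sampler. The sketch below is not how any specific model decodes; it assumes a made-up three-token vocabulary with invented scores and uses a standard temperature-scaled softmax to show why identical inputs can yield different outputs.

```python
import math
import random

def sample_token(logits: dict, temperature: float, rng: random.Random) -> str:
    """Pick one token: greedy at temperature 0, otherwise sample from a softmax."""
    if temperature == 0:
        return max(logits, key=logits.get)
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled.values()]
    return rng.choices(list(scaled), weights=weights, k=1)[0]

# Invented scores for the next token; a real model has a far larger vocabulary.
logits = {"approve": 2.0, "escalate": 1.5, "reject": 0.5}

# Same input, 20 different random seeds standing in for 20 separate runs.
greedy = {sample_token(logits, 0, random.Random(seed)) for seed in range(20)}
sampled = {sample_token(logits, 1.0, random.Random(seed)) for seed in range(20)}

print(greedy)   # {'approve'}
print(sampled)  # more than one distinct token
```

Greedy decoding always returns the top-scoring token here, while sampling at temperature 1.0 produces different decisions across runs. In real deployments, lowering the temperature reduces this variation but may not remove it entirely, which is why the orchestration loop still needs verification and retries.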
The relationship between the LLM core and the rest of the agent stack is therefore compensatory: each other component exists precisely because the LLM core alone cannot handle the corresponding class of challenge. The memory system extends effective context. The tool layer grants effector capability and live information access. The orchestration loop handles iteration and error recovery. The planner converts a fuzzy goal into a specific, executable task sequence the LLM can reason about one step at a time. Remove any of these components and the agent degrades in a predictable, analyzable way.
Match each LLM limitation to the agent component specifically designed to compensate for it.
An agent is asked to draft a summary of all customer support tickets from the past week. The agent has access to a database tool. Its LLM core has a 128,000-token context window. There are 4,000 tickets averaging 500 tokens each, totaling 2 million tokens of raw text. What must the agent do?
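One way to work through the numbers in this scenario is sketched below. The ticket counts and window size come from the question; the 8,000-token reserve for instructions and generated output is an assumption added for illustration, and real tickets would vary around the 500-token average.

```python
import math

TOTAL_TICKETS = 4_000
TOKENS_PER_TICKET = 500      # average given in the scenario
CONTEXT_WINDOW = 128_000
RESERVED_TOKENS = 8_000      # assumption: room for instructions and the summary itself

total_tokens = TOTAL_TICKETS * TOKENS_PER_TICKET
windows_needed = total_tokens / CONTEXT_WINDOW

usable = CONTEXT_WINDOW - RESERVED_TOKENS
tickets_per_batch = usable // TOKENS_PER_TICKET
batches = math.ceil(TOTAL_TICKETS / tickets_per_batch)

print(total_tokens)       # 2000000
print(windows_needed)     # 15.625
print(tickets_per_batch)  # 240
print(batches)            # 17
```

The raw text is more than fifteen times larger than the window, so the tickets cannot be read in one pass. Under these assumptions the agent would need to retrieve them through the database tool in roughly 17 batches, summarize each batch, and then combine the partial summaries.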
A developer notices that their agent gives different answers to the identical question on different runs. This is most directly a consequence of which LLM core property?
Many benchmarks measure the LLM core in isolation: answering questions, passing exams, completing coding challenges with no tools. An agent's real-world capability is determined by its full stack: model quality, tool design, memory architecture, and orchestration logic. A weaker model with a well-designed agent stack often outperforms a stronger model used naively.
Limitation Audit
- Choose any AI agent application you know about or can imagine — a coding assistant, a customer service agent, a research helper.
- Step 1: List the four LLM core limitations from this lesson: context window, knowledge cutoff, no persistent side-effects, stochastic output.
- Step 2: For each limitation, write one concrete sentence describing how that limitation would manifest as a visible problem for your chosen agent.
- Step 3: For each problem, write one sentence describing which component (memory, tool layer, orchestration loop, planner) would address it and how.
- Step 4: Are there any limitations that your chosen agent's use-case is especially sensitive to? Which ones matter most, and why?
- Goal: internalize that every agent design decision traces back to a specific LLM limitation.