Shared State and Coordination
Parallel agents do not operate in isolation. Most real-world multi-agent systems require agents to share information: a common task queue that multiple workers pull from, a shared document that several agents are collectively drafting, a global state object that tracks which parts of a long-running task are complete. Shared state is what makes parallel agents a team rather than a collection of individuals working in separate rooms. But shared state introduces coordination problems that do not exist when agents are fully independent. Understanding these problems and the patterns used to solve them is fundamental to building reliable orchestration systems.
The Core Problem: Concurrent Modification
The most fundamental shared-state problem is concurrent modification: what happens when two agents try to modify the same data at the same time. Imagine a shared task queue. Agent A reads the queue and sees that task T42 is available. Before Agent A marks T42 as claimed, Agent B also reads the queue and also sees T42 as available. Both agents now begin working on T42 simultaneously. This is called a race condition, and the result is duplicate work, conflicting writes, and potentially corrupted state.
This is not a hypothetical edge case. In any system where multiple agents share state without coordination, race conditions should be expected under normal operating conditions, and they become more frequent as the number of parallel agents and the rate of state changes increase. Production systems must be designed with concurrency in mind from the start, because retrofitting coordination onto a concurrent system after the fact is significantly harder.
Race conditions are particularly insidious in LLM-based agent systems because the agents themselves do not raise exceptions when they encounter inconsistent state. A human might notice that something seems wrong; an agent will not.
A traditional software process will often crash or throw an error when it reads inconsistent data. An LLM-based agent will reason from whatever state it observes — coherently, confidently, and incorrectly. This makes undetected concurrent modification especially dangerous in AI systems. Defensive data architecture is not optional.
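The T42 interleaving described above can be replayed step by step. The following is a minimal sketch, assuming a hypothetical in-memory `TaskQueue` class (the class and its method names are illustrative and not part of any specific framework). It contrasts a separate check-then-claim sequence, which allows the race, with a single atomic claim guarded by a lock.

```python
import threading


class TaskQueue:
    """Hypothetical in-memory task queue shared by several agents."""

    def __init__(self, task_ids):
        self._available = set(task_ids)
        self._lock = threading.Lock()

    # Unsafe API: checking and claiming are two separate steps.
    def is_available(self, task_id):
        return task_id in self._available

    def mark_claimed(self, task_id):
        self._available.discard(task_id)

    # Safe API: check and claim happen as one atomic step under a lock.
    def try_claim(self, task_id):
        with self._lock:
            if task_id in self._available:
                self._available.remove(task_id)
                return True
            return False


def unsafe_interleaving():
    """Replay the T42 race deterministically, without real threads."""
    queue = TaskQueue(["T42"])
    a_sees = queue.is_available("T42")  # Agent A reads: available
    b_sees = queue.is_available("T42")  # Agent B reads before A claims
    workers = []
    if a_sees:
        queue.mark_claimed("T42")
        workers.append("A")
    if b_sees:
        queue.mark_claimed("T42")  # silently succeeds: duplicate work
        workers.append("B")
    return workers


def safe_claims(agent_names):
    """Let several threads compete for T42 through the atomic claim."""
    queue = TaskQueue(["T42"])
    claimed = [False] * len(agent_names)  # each thread writes only its own slot

    def run(index):
        claimed[index] = queue.try_claim("T42")

    threads = [threading.Thread(target=run, args=(i,)) for i in range(len(agent_names))]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return [name for name, won in zip(agent_names, claimed) if won]
```

In the unsafe replay both agents end up working on T42, and nothing raises an error. With `try_claim`, exactly one agent wins regardless of how the threads are scheduled.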
Coordination Patterns
Software engineering has developed several patterns for coordinating concurrent access to shared state. Each involves a tradeoff between consistency and performance.
Locking (mutual exclusion): Only one agent may hold a lock on a piece of state at a time. When Agent A wants to modify the task queue, it acquires a lock, makes its change, and releases the lock. Agent B must wait if the lock is held. Locking guarantees consistency but reduces parallelism, because locked state is unavailable to other agents while the lock is held. Distributed locks (implemented via Redis, ZooKeeper, or similar systems) extend this to agent networks across multiple machines.
Optimistic concurrency control: Agents read state freely but include a version number in their write. If the state's version number has changed since the agent last read it (meaning another agent modified it in between), the write is rejected and the agent must re-read and retry. This approach assumes conflicts are rare and avoids locking overhead in the common case. It is well-suited to agent systems where state reads are frequent but writes are occasional.
Event sourcing: Rather than agents reading and writing a shared mutable state object, all state changes are recorded as an immutable log of events. Any agent can reconstruct current state by replaying the log. Conflicts are handled at the event level rather than the state level, and the full history of every change is preserved for auditing. Event sourcing adds complexity but dramatically improves debuggability in distributed systems.
Immutable messages with idempotent processing: Agents never modify shared state directly. Instead, they produce immutable output messages that are stored. The coordination logic lives in how those messages are aggregated and consumed. Each agent operation is idempotent (running it twice produces the same result as running it once), so safe retry is possible without risk of double-writing.
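Of the patterns above, optimistic concurrency control is perhaps the easiest to misread, so a small sketch may help. It assumes a hypothetical in-memory `VersionedDocument`; the names `Snapshot`, `VersionConflict`, and `append_with_retry` are illustrative only. The internal lock covers just the brief compare-and-write step, standing in for what a real data store would typically offer as a conditional update. Agents never hold a lock while they read and prepare an edit.

```python
import threading
from dataclasses import dataclass


class VersionConflict(Exception):
    """Raised when a write is based on a stale version."""


@dataclass(frozen=True)
class Snapshot:
    version: int
    text: str


class VersionedDocument:
    """Hypothetical shared document with version-checked writes."""

    def __init__(self, text=""):
        self._snapshot = Snapshot(version=0, text=text)
        self._lock = threading.Lock()  # guards only the compare-and-write step

    def read(self):
        # Reads are free: no lock is held while the agent works on its edit.
        return self._snapshot

    def write(self, expected_version, new_text):
        with self._lock:
            current = self._snapshot
            if current.version != expected_version:
                raise VersionConflict(
                    f"expected v{expected_version}, found v{current.version}"
                )
            self._snapshot = Snapshot(current.version + 1, new_text)
            return self._snapshot


def append_with_retry(doc, line, max_attempts=5):
    """Re-read and retry whenever another writer got there first."""
    for _ in range(max_attempts):
        snapshot = doc.read()
        try:
            return doc.write(snapshot.version, snapshot.text + line + "\n")
        except VersionConflict:
            continue  # state changed underneath us: loop re-reads it
    raise VersionConflict("gave up after repeated conflicts")
```

If two agents read the same version and one writes first, the second agent's write raises `VersionConflict`. Calling `append_with_retry` then rebuilds the edit on top of the newer version rather than overwriting it.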
Blackboard Architecture
A classical pattern for shared state in multi-agent systems is the blackboard architecture, originally developed in AI research in the 1970s and still highly relevant. In a blackboard system, a shared data structure (the blackboard) is the single source of truth for the current state of the problem being solved. Specialist agents monitor the blackboard, identify opportunities where their expertise applies to the current state, and write their contributions back to the blackboard. A control component coordinates which agent acts next.
Modern multi-agent AI systems often implement a software equivalent: a shared context object (stored in a database, Redis, or an in-memory store) that all agents can read and write with appropriate locking. The supervisor agent acts as the control component, deciding which worker agent should act next based on the current blackboard state. This architecture centralizes state visibility, since any agent or the supervisor can inspect the complete current state of the task, and it makes coordination explicit and auditable.
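The control loop described above can be sketched in a few lines. This is an illustrative sketch only: `Blackboard`, `Specialist`, and `run_supervisor` are hypothetical names, the specialists are plain functions standing in for agent calls, and the scheduling policy (pick the first eligible specialist) is deliberately simplistic.

```python
import threading


class Blackboard:
    """Hypothetical shared context object with locked reads and writes."""

    def __init__(self, initial):
        self._state = dict(initial)
        self._lock = threading.Lock()
        self.history = []  # (agent_name, key) pairs, kept for auditing

    def snapshot(self):
        with self._lock:
            return dict(self._state)  # copy, so callers cannot mutate shared state

    def post(self, agent_name, key, value):
        with self._lock:
            self._state[key] = value
            self.history.append((agent_name, key))


class Specialist:
    """A worker that contributes one key once its required keys are present."""

    def __init__(self, name, needs, produces, work):
        self.name = name
        self.needs = needs
        self.produces = produces
        self.work = work  # stand-in for an agent call: state dict -> value

    def can_contribute(self, state):
        return self.produces not in state and all(k in state for k in self.needs)

    def contribute(self, board):
        board.post(self.name, self.produces, self.work(board.snapshot()))


def run_supervisor(board, specialists, max_steps=10):
    """Control component: inspect the board, pick who acts next, repeat."""
    for _ in range(max_steps):
        state = board.snapshot()
        ready = [s for s in specialists if s.can_contribute(state)]
        if not ready:
            break  # nothing left that anyone can add
        ready[0].contribute(board)  # simplistic policy: first eligible specialist
    return board.snapshot()


specialists = [
    Specialist("summarizer", ("outline",), "summary",
               lambda s: "summary of " + s["outline"]),
    Specialist("outliner", ("topic",), "outline",
               lambda s: "outline for " + s["topic"]),
]
board = Blackboard({"topic": "caching"})
final_state = run_supervisor(board, specialists)
```

Although the summarizer is listed first, it cannot act until an outline exists, so the supervisor runs the outliner first. `board.history` records who wrote what, which is the auditability the pattern is meant to provide.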
The most reliable coordination strategy is minimizing how much state must be shared. Design agents to be as self-contained as possible: receive all needed context in their input message, produce all needed results in their output message, and avoid reading from or writing to shared state except where genuinely necessary. Every shared state write is a potential coordination failure point.
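One way to picture this advice, as a sketch under assumed names: each agent is a function from an immutable input message to an immutable output message, and a single `ResultStore` is the only component that writes shared state. `SectionRequest`, `SectionResult`, `write_section`, and `ResultStore` are all hypothetical. Keying results by task ID makes recording idempotent, so a retried or redelivered result does no harm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SectionRequest:
    """Everything the agent needs arrives in its input message."""
    task_id: str
    title: str
    notes: str


@dataclass(frozen=True)
class SectionResult:
    """Everything the agent produces leaves in its output message."""
    task_id: str
    title: str
    body: str


def write_section(request):
    """Stand-in for an agent call: reads only its input, touches no shared state."""
    return SectionResult(
        task_id=request.task_id,
        title=request.title,
        body=f"{request.title}: {request.notes}",
    )


class ResultStore:
    """The single writer to shared state, keyed by task_id."""

    def __init__(self):
        self._results = {}

    def record(self, result):
        # Idempotent: recording the same task twice leaves the store unchanged.
        self._results.setdefault(result.task_id, result)

    def count(self):
        return len(self._results)

    def assemble(self, order):
        return "\n".join(self._results[task_id].body for task_id in order)


requests = [
    SectionRequest("s1", "Intro", "why caching"),
    SectionRequest("s2", "Design", "write-through"),
]
store = ResultStore()
for request in requests:
    store.record(write_section(request))
store.record(write_section(requests[0]))  # a retry of s1 changes nothing
report = store.assemble(["s1", "s2"])
```

Because the agents never read or write the store themselves, the only coordination point left is `record`, and it tolerates duplicates by construction.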
Agent A reads from a shared task queue and sees task T77 is available. Before Agent A claims T77, Agent B reads the same queue and also sees T77 available. Both agents begin working on T77. What is this problem called and what is its direct consequence?
A team of agents is updating a shared research document. The system uses optimistic concurrency control with version numbers. Agent A reads the document at version 5 and prepares an edit. Meanwhile, Agent B also reads at version 5 and submits its edit first, advancing the document to version 6. When Agent A tries to submit, what happens?
Coordination Pattern Audit
- Consider the following multi-agent scenario: A team of 8 agents is collectively writing a technical report. Each agent is responsible for one section. All agents have read and write access to a single shared document object. There is no coordination mechanism.
- Step 1: List every coordination problem that could occur in this system. Be specific about what each agent might do that conflicts with another.
- Step 2: For each problem, propose the coordination pattern (locking, optimistic concurrency, event sourcing, immutable messages, or blackboard) that would best address it. Justify your choice.
- Step 3: Redesign the system architecture to minimize shared state. What can agents do in isolation before they need to touch shared state? How does this change the coordination problems?
- Step 4: The client adds a requirement: they want to see the full history of every change to the document, including which agent made it and when. Which coordination pattern makes this trivial to implement? Add it to your design.
- Write a one-paragraph 'coordination architecture statement' for your redesigned system.