Communication Protocols
Natural language is the native medium of LLM-based agents, but agents that exchange free-form text without an agreed structure quickly produce unreliable systems. When one agent sends a message and another must act on it, both sides need to agree on what fields exist, what each field means, how errors are signaled, and how the conversation is sequenced. That agreement is a communication protocol. Without it, you do not have an orchestrated system; you have a collection of agents hoping they understand each other. Designing, implementing, and enforcing protocols is where multi-agent engineering meets software engineering.
The Anatomy of an Agent Message
Every message in a well-designed multi-agent system carries three core elements, plus a fourth in systems that require message authentication.
- Header: identifies the message, with a unique message ID, the sender agent ID, the recipient agent ID, a timestamp, and a message type (e.g., 'task_request', 'task_result', 'error', 'status_update'). The message type tells the recipient immediately how to parse the rest of the message.
- Payload: carries the actual content. For a task request, this includes the goal, the relevant context, and any input data. For a task result, it includes the output data and a status code (success, partial, or failure). For an error, it includes an error code, a human-readable description, and enough diagnostic information for the sender to decide on a recovery action.
- Metadata: routing and tracking information, namely which top-level task the message belongs to, its position in the execution sequence, how many times it has been retried, and a correlation ID linking each request to its response. Good metadata is what makes a distributed multi-agent pipeline debuggable, because it lets you reconstruct which message triggered which across many agents.
- Signature or validation token (where required): lets the recipient verify that the message was not tampered with in transit and genuinely originated from the claimed sender.
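The four parts can be sketched as a plain Python dictionary. This is an illustrative sketch, not a standard format: the field names, the helpers `build_message`, `sign`, and `verify`, the `protocol_version` header field, and the hard-coded `SECRET_KEY` are all hypothetical. The signature is an HMAC-SHA256 over a canonical JSON encoding of the rest of the message, which detects tampering. With one key shared by every agent, though, it only proves the sender holds the key; tying a message to one specific sender needs per-agent keys or public-key signatures.

```python
import hashlib
import hmac
import json
import uuid
from datetime import datetime, timezone
from typing import Optional

# Hypothetical shared secret; a real deployment would load this from a secrets manager.
SECRET_KEY = b"example-shared-secret"


def build_message(sender: str, recipient: str, msg_type: str, payload: dict,
                  task_id: str, correlation_id: Optional[str] = None,
                  sequence: int = 0, retry_count: int = 0) -> dict:
    """Assemble a message with header, payload, metadata, and an HMAC signature."""
    message = {
        "header": {
            "message_id": str(uuid.uuid4()),
            "sender_id": sender,
            "recipient_id": recipient,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "message_type": msg_type,
            "protocol_version": "1",  # assumption: versioned from day one
        },
        "payload": payload,
        "metadata": {
            "task_id": task_id,
            "sequence": sequence,
            "retry_count": retry_count,
            # A request starts a new correlation ID; a response reuses the request's.
            "correlation_id": correlation_id or str(uuid.uuid4()),
        },
    }
    message["signature"] = sign(message)
    return message


def _canonical(message: dict) -> bytes:
    # Everything except the signature, with sorted keys so the bytes are stable.
    body = {k: v for k, v in message.items() if k != "signature"}
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()


def sign(message: dict) -> str:
    return hmac.new(SECRET_KEY, _canonical(message), hashlib.sha256).hexdigest()


def verify(message: dict) -> bool:
    # compare_digest avoids leaking timing information during the comparison.
    return hmac.compare_digest(message.get("signature", ""), sign(message))
```

A response built with `correlation_id=request["metadata"]["correlation_id"]` carries the same ID as its request, which is what lets the sender match them up later; any edit to a signed message makes `verify` return `False`.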
Defining a finite set of valid message types — and rejecting any message that does not conform to one — is one of the highest-leverage reliability improvements you can make to a multi-agent system. Type enforcement catches design errors at the protocol boundary rather than deep inside agent reasoning.
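Type enforcement at the boundary can be as simple as an enumeration plus a table of required payload fields. In this sketch the `MessageType` values are the examples from the message anatomy above, while the required-field sets, `ProtocolError`, and `validate` are illustrative assumptions; a production system would more likely express the same rules in a schema language such as JSON Schema.

```python
from enum import Enum


class MessageType(str, Enum):
    TASK_REQUEST = "task_request"
    TASK_RESULT = "task_result"
    ERROR = "error"
    STATUS_UPDATE = "status_update"


# Hypothetical minimal payload requirements for each message type.
REQUIRED_PAYLOAD_FIELDS = {
    MessageType.TASK_REQUEST: {"goal", "context"},
    MessageType.TASK_RESULT: {"status", "output"},
    MessageType.ERROR: {"error_code", "description"},
    MessageType.STATUS_UPDATE: {"state"},
}

VALID_RESULT_STATUSES = {"success", "partial", "failure"}


class ProtocolError(ValueError):
    """Raised when a message does not conform to the protocol."""


def validate(message: dict) -> MessageType:
    """Reject unknown types and malformed payloads before any agent sees them."""
    raw_type = message.get("header", {}).get("message_type")
    try:
        msg_type = MessageType(raw_type)
    except ValueError:
        raise ProtocolError(f"unknown message_type: {raw_type!r}") from None

    payload = message.get("payload", {})
    missing = REQUIRED_PAYLOAD_FIELDS[msg_type] - set(payload)
    if missing:
        raise ProtocolError(f"{msg_type.value} missing fields: {sorted(missing)}")

    if msg_type is MessageType.TASK_RESULT and payload["status"] not in VALID_RESULT_STATUSES:
        raise ProtocolError(f"invalid status: {payload['status']!r}")
    return msg_type
```

A malformed message now fails once, with a specific error, at the receiver's edge, instead of surfacing later as a confusing failure inside an agent's reasoning.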
Synchronous vs. Asynchronous Messaging
Agent communication can be synchronous or asynchronous. In synchronous (request-response) messaging, the sender waits for a reply before proceeding. This is simpler to reason about, since the sender knows the result before moving on, but it couples the two agents: if the receiver is slow or unavailable, the sender blocks and the pipeline stalls. For chains of dependent tasks, where each step needs the previous result before it can begin, synchronous messaging is the natural fit.
In asynchronous messaging, the sender dispatches a message and continues without blocking; any reply arrives later via a callback or a message queue. (Fire-and-forget messaging is the special case where no reply is expected at all.) Asynchronous messaging is more complex but more resilient and scalable. Agents publish messages to a queue (RabbitMQ, Kafka, and AWS SQS are common choices), and receiver agents consume from it at their own pace. This decouples producers from consumers: senders do not wait, and if a receiver is temporarily unavailable, messages accumulate in the queue and are processed when it recovers. The cost is extra bookkeeping. Asynchronous systems need explicit correlation between requests and responses (via the correlation ID), and they need careful thought about ordering guarantees, because not every queue preserves order: SQS standard queues provide only best-effort ordering, for example, and Kafka guarantees order only within a partition.
Most production multi-agent systems mix the two: synchronous messaging for low-latency, tightly coupled steps and asynchronous messaging for high-throughput parallel workloads.
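The asynchronous pattern, with correlation IDs matching replies to requests, can be sketched with Python's `asyncio`. Here an in-process `asyncio.Queue` stands in for a broker such as RabbitMQ, Kafka, or SQS; unlike those, it is not durable, so the sketch shows decoupling and out-of-order replies but not surviving a crash. The `worker` and `supervisor` functions and the message fields are hypothetical.

```python
import asyncio
import uuid


async def worker(inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Consume task requests at this worker's own pace and publish results."""
    while True:
        request = await inbox.get()
        await asyncio.sleep(request["delay"])  # simulate uneven processing time
        await outbox.put({
            "correlation_id": request["correlation_id"],  # echo the ID back
            "output": request["goal"].upper(),            # stand-in for real work
        })
        inbox.task_done()


async def supervisor(goals: list, delays: list) -> dict:
    inbox: asyncio.Queue = asyncio.Queue()
    outbox: asyncio.Queue = asyncio.Queue()
    workers = [asyncio.create_task(worker(inbox, outbox)) for _ in range(3)]

    # Dispatch every task without waiting for any reply.
    pending = {}  # correlation_id -> goal
    for goal, delay in zip(goals, delays):
        cid = str(uuid.uuid4())
        pending[cid] = goal
        await inbox.put({"correlation_id": cid, "goal": goal, "delay": delay})

    # Replies arrive in completion order, not dispatch order;
    # the correlation ID says which request each one answers.
    results = {}
    while pending:
        reply = await outbox.get()
        goal = pending.pop(reply["correlation_id"])
        results[goal] = reply["output"]

    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results


if __name__ == "__main__":
    print(asyncio.run(supervisor(["draft", "review", "summarize"], [0.3, 0.1, 0.2])))
```

A synchronous version would instead await each reply before sending the next request, so total time would be the sum of the delays rather than roughly the longest one.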
Emerging Standards: MCP and Agent-to-Agent Protocols
As multi-agent AI systems have moved from research projects to production infrastructure, the industry has begun converging on shared protocols. The Model Context Protocol (MCP), introduced by Anthropic, is an open standard that defines how AI applications (through MCP clients) connect to external tools and data sources (MCP servers), using structured messages based on JSON-RPC 2.0. MCP standardizes tool discovery, tool invocation, resource access, and response formatting, so any MCP-compatible client can use any MCP-compatible server without custom glue code. Google's Agent2Agent (A2A) protocol, announced in 2025, applies the same idea to agent-to-agent communication: it defines how one AI agent discovers another agent's capabilities, how it requests work from that agent, and how results are returned.
These standards matter because they address the interoperability problem. Without them, connecting agents from different vendors or frameworks requires a custom adapter for each pair; with them, a system built on one orchestration framework can incorporate agents from another with far less integration work. Even if you build on proprietary or custom protocols, the design lessons from MCP and A2A apply: standardize message types, use structured schemas, version your protocol, and publish a capability discovery mechanism so new agents can join the system without the supervisor having to be hard-coded with knowledge of every worker.
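A capability discovery mechanism can be as small as a registry: each agent publishes a self-description when it joins, and the supervisor looks workers up by skill instead of hard-coding them. The sketch is loosely inspired by the idea behind A2A's Agent Card but does not follow that schema; `CapabilityCard`, `Registry`, the `queue://` endpoints, and the "same major version means compatible" rule are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CapabilityCard:
    """Hypothetical self-description an agent publishes when it joins."""
    agent_id: str
    protocol_version: str   # e.g. "1.2"; assumption: same major version = compatible
    skills: frozenset
    endpoint: str           # where to send task_request messages


@dataclass
class Registry:
    cards: dict = field(default_factory=dict)

    def register(self, card: CapabilityCard) -> None:
        # Re-registering with the same agent_id replaces the old card.
        self.cards[card.agent_id] = card

    def find(self, skill: str, protocol_version: str) -> list:
        """Agents that advertise `skill` and share the caller's major version."""
        major = protocol_version.split(".")[0]
        return sorted(
            (c for c in self.cards.values()
             if skill in c.skills and c.protocol_version.split(".")[0] == major),
            key=lambda c: c.agent_id,
        )


registry = Registry()
registry.register(CapabilityCard("summarizer-1", "1.0", frozenset({"summarize"}), "queue://summarizer-1"))
registry.register(CapabilityCard("reviewer-1", "1.3", frozenset({"review", "summarize"}), "queue://reviewer-1"))
registry.register(CapabilityCard("legacy-1", "0.9", frozenset({"summarize"}), "queue://legacy-1"))
print([c.agent_id for c in registry.find("summarize", "1.1")])  # ['reviewer-1', 'summarizer-1']
```

Adding a new worker then means registering one more card; the supervisor's routing code does not change.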
Any protocol will evolve as your system grows. Include a protocol version field in every message from day one. When you change the schema, each receiving agent reads the version field and parses the message accordingly, or rejects a version it does not support with an explicit error, so old and new agents can coexist while the change rolls out. Without versioning, any schema change breaks every agent simultaneously.
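Version-aware parsing can be sketched as a table from version to parser. The two schema versions, `parse_v1`, `parse_v2`, `parse_task_request`, and `UnsupportedVersion` are hypothetical; the point is that a receiver handles every version it knows and rejects anything else explicitly rather than misreading it.

```python
from typing import Callable, Dict


class UnsupportedVersion(ValueError):
    """Raised for a protocol version this agent does not understand."""


def parse_v1(msg: dict) -> dict:
    # Hypothetical v1: the goal lived under "task" and there was no deadline.
    return {"goal": msg["payload"]["task"], "deadline": None}


def parse_v2(msg: dict) -> dict:
    # Hypothetical v2: "task" renamed to "goal", optional "deadline" added.
    return {"goal": msg["payload"]["goal"], "deadline": msg["payload"].get("deadline")}


PARSERS: Dict[str, Callable[[dict], dict]] = {"1": parse_v1, "2": parse_v2}


def parse_task_request(msg: dict) -> dict:
    """Normalize any supported version into one internal shape."""
    version = str(msg.get("header", {}).get("protocol_version", ""))
    parser = PARSERS.get(version)
    if parser is None:
        raise UnsupportedVersion(
            f"protocol_version {version!r} not supported; accepts {sorted(PARSERS)}"
        )
    return parser(msg)
```

Because both versions normalize to the same internal shape, the rest of the agent never needs to know which version a sender used.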
An orchestration system sends 500 research tasks to worker agents simultaneously. The supervisor does not wait for responses and continues dispatching new tasks as results arrive asynchronously. A worker agent crashes mid-batch and comes back online 2 minutes later. What happens to the tasks that were queued while the worker was down?
Why does the Model Context Protocol (MCP) matter for multi-agent system builders?
Design a Message Protocol
- You are building a multi-agent system for automated code review. The system has three agent types: a Submission Agent (receives code submissions from developers), a Review Agent (analyzes code for bugs and style violations), and a Report Agent (formats and delivers the review to the developer).
- Step 1: Define the message types your system needs. Give each type a name and a one-sentence description of when it is sent.
- Step 2: Design the full JSON schema for one message type of your choice. Include header, payload, and metadata sections. Label every field with its type and whether it is required.
- Step 3: Choose synchronous or asynchronous messaging for each agent-to-agent link in the system. Justify each choice.
- Step 4: What happens if the Review Agent crashes while processing a submission? How does your protocol design help recover? What, if anything, does it not protect against?
- Swap designs with another student. Try to find one message type they missed and one field they omitted.