Module Check: Coordinated Intelligence
You have studied why multi-agent systems exist and when to use them, how supervisors coordinate workers, how handoffs carry context cleanly, how agents communicate through structured protocols, how to reason about parallel and sequential execution, how to protect shared state from concurrency failures, how emergent behaviors arise and how to mitigate them, and how real orchestration frameworks encode architectural opinions. This Module Check tests whether those ideas have become part of how you think about AI systems: not just as a list of terms you can define, but as a set of analytical lenses you can apply to novel problems.
Module Check Questions
A solo agent is tasked with analyzing the annual reports of 200 publicly traded companies and producing a synthesis report. Which two limitations of solo agents make this a strong candidate for multi-agent treatment?
In the supervisor-worker pattern, the supervisor receives back the following from a worker: {'status': 'partial', 'completed_items': 47, 'total_items': 100, 'error': 'TIMEOUT', 'retry_safe': true}. What should a well-designed supervisor do?
A multi-agent pipeline produces an output that is internally coherent and confidently stated, but is based on a false premise introduced by Agent 2 in a 7-agent chain. Every subsequent agent built on Agent 2's output without independently verifying it. Which architectural intervention would most directly prevent this failure mode in future runs?
A developer argues: 'For our multi-agent system, we should use synchronous messaging everywhere because it is simpler to reason about.' A second developer responds: 'But we have a stage where 500 tasks need to be dispatched at once. With synchronous messaging, the supervisor blocks until each task returns before dispatching the next.' Who is correct, and why?
You are comparing LangGraph and CrewAI for a new project. The workflow is a well-defined five-step pipeline with clear dependencies, and you need to add checkpointing so long runs can be resumed after server restarts. Which framework better fits this requirement, and why?
An orchestration system has agents A, B, C, D, and E. B and C both depend on A. D depends on B. E depends on both C and D. What is the critical path?
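After you have worked out the critical-path question by hand, a small script can help you check your reasoning. The sketch below encodes the dependencies stated in the question and finds the longest dependency chain. It assumes every agent takes one unit of time, because the question gives no durations; the names `DEPENDS_ON`, `DURATION`, and `critical_path` are illustrative and do not come from any framework.

```python
from functools import lru_cache

# Dependency map from the question: each agent lists the agents it waits on.
DEPENDS_ON = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "D": ["B"],
    "E": ["C", "D"],
}

# Assumption: every agent takes one unit of time (the question gives no durations).
DURATION = {agent: 1 for agent in DEPENDS_ON}


def critical_path(depends_on, duration):
    """Return (total_duration, path) for the longest dependency chain in a DAG."""

    @lru_cache(maxsize=None)
    def longest_to(agent):
        # Longest chain that ends at `agent`, found by extending the
        # longest chain among its prerequisites.
        best_cost, best_path = 0, ()
        for dep in depends_on[agent]:
            cost, path = longest_to(dep)
            if cost > best_cost:
                best_cost, best_path = cost, path
        return best_cost + duration[agent], best_path + (agent,)

    # The critical path is the most expensive chain ending at any agent.
    return max((longest_to(agent) for agent in depends_on), key=lambda r: r[0])


if __name__ == "__main__":
    total, path = critical_path(DEPENDS_ON, DURATION)
    print(total, " -> ".join(path))
```

With real durations in `DURATION`, the same function reports which chain bounds the total run time, and therefore which agents are worth optimizing first.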
Capstone Synthesis: Evaluate a Broken Orchestration
- Below is a description of a multi-agent system that was deployed to production and failed. Read the description carefully and apply what you have learned in this module to diagnose and redesign it.
- The System: A startup deployed an AI system to automate customer refund approvals. The system had four agents: (1) a Complaint Parser that read incoming customer complaints and extracted key facts; (2) a Policy Checker that looked up company refund policy; (3) a Decision Agent that decided whether to approve or deny; (4) an Email Sender that sent the decision to the customer. All agents ran in parallel from the moment a complaint arrived. Agents communicated by writing to and reading from a single shared JSON object with no locking. There were no error checks or human-in-the-loop steps. The system ran for 48 hours before being shut down.
- Problems observed in production: Customers received emails before their complaints had been fully parsed, and some received emails about a different customer's complaint. The Decision Agent sometimes made decisions based on a blank policy document because it read the shared object before the Policy Checker had written to it. Multiple customers received contradictory emails about the same complaint. The system approved $140,000 in refunds that should have been denied, and denied $22,000 in refunds that should have been approved. Some customers received their decision email before they had even finished submitting their complaint.
- Your task:
- Step 1: Identify every architectural failure in this system. Name the specific concept from this module that each failure violates.
- Step 2: Draw the corrected execution DAG for this system. Which agents must run sequentially and why? Which (if any) can run in parallel?
- Step 3: Redesign the shared state architecture. What coordination mechanism would prevent the race conditions observed?
- Step 4: Add the missing safety mechanisms. Where should human-in-the-loop checkpoints be? What circuit breaker thresholds would have caught the $140,000 over-approval before it reached that level?
- Step 5: Write a one-paragraph post-mortem summary in the style of an engineering incident report: what went wrong, why, and what architectural changes prevent recurrence.
- This synthesis activity covers every major concept in the module. Work through it methodically — each failure maps to at least one lesson.
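As a starting point for Step 4, a circuit breaker for auto-approved refunds might look like the minimal sketch below. It is one possible shape, not a reference answer: the class name `RefundCircuitBreaker`, its fields, and any threshold values you pass in are hypothetical, and choosing and justifying real thresholds is part of the exercise.

```python
from dataclasses import dataclass


@dataclass
class RefundCircuitBreaker:
    """Stops auto-approval once refunds exceed configured limits (hypothetical design)."""

    max_total: float   # assumed cap on cumulative auto-approved refunds per window
    max_single: float  # assumed cap on any single auto-approved refund
    approved_total: float = 0.0
    tripped: bool = False

    def allow(self, amount: float) -> bool:
        """Return True if the refund may be auto-approved; False means escalate to a human."""
        if self.tripped or amount > self.max_single:
            return False
        if self.approved_total + amount > self.max_total:
            # Trip the breaker: every later refund is escalated until a human resets it.
            self.tripped = True
            return False
        self.approved_total += amount
        return True

    def reset(self) -> None:
        """Called by a human reviewer after investigating why the breaker tripped."""
        self.approved_total = 0.0
        self.tripped = False


if __name__ == "__main__":
    # Threshold values here are placeholders chosen only for illustration.
    breaker = RefundCircuitBreaker(max_total=1000.0, max_single=500.0)
    for amount in (400.0, 600.0, 500.0, 200.0):
        decision = "auto-approve" if breaker.allow(amount) else "escalate to human"
        print(amount, decision)
```

In this sketch the Decision Agent would call `allow` before any approval reaches the Email Sender, so a tripped breaker doubles as a human-in-the-loop checkpoint. Access to the breaker would itself need the same protection you design for shared state in Step 3.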