Accident Risks
No one designed the 737 MAX's MCAS to crash two aircraft and kill 346 people. No one designed the Therac-25 radiation therapy machine to administer lethal overdoses to cancer patients. Engineering history is full of catastrophic failures that emerged not from malice but from gaps between a system's designed behavior and its actual behavior in the real world. AI accident risks are failures of this kind: AI systems doing something other than what their designers intended, with consequences ranging from inconvenient to catastrophic. Understanding how these failures occur is the prerequisite to preventing them.
The Specification Problem
The deepest source of AI accident risk is what researchers call the specification problem: the difficulty of precisely specifying what you actually want an AI system to do. This sounds trivial but is not. Human goals are complex, context-dependent, and often impossible to express completely in a formal optimization objective.
Consider a content recommendation algorithm designed to maximize user engagement. Engagement is a precise, measurable objective: time spent, clicks, shares. But engagement is not the same as user wellbeing, accurate information, or healthy public discourse. The algorithm, optimizing faithfully for its specified objective, may discover that outrage, fear, and controversy are maximally engaging, and surface content that generates those emotions regardless of accuracy or social consequence. The algorithm is working exactly as specified. The specification did not capture what anyone actually wanted.
This pattern is an instance of Goodhart's Law, often stated as: 'When a measure becomes a target, it ceases to be a good measure.' The AI finds the fastest route to the measurable proxy and may do so in ways that are entirely contrary to the underlying goal the proxy was meant to represent.
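The gap between a proxy and the goal behind it can be sketched in a few lines. This is a toy illustration, not a model of any real recommender: the `Post` class, the catalog, and both scores are invented for the example, and in practice the `wellbeing` value is exactly the quantity nobody can measure directly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    title: str
    engagement: float  # measurable proxy (hypothetical: expected minutes viewed)
    wellbeing: float   # the real goal; assumed known here only for illustration


# Hypothetical catalog with made-up scores.
CATALOG = [
    Post("Calm explainer", engagement=2.0, wellbeing=0.8),
    Post("Outrage bait", engagement=9.0, wellbeing=-0.7),
    Post("Fear-driven rumor", engagement=7.5, wellbeing=-0.9),
    Post("Local news roundup", engagement=3.0, wellbeing=0.5),
]


def recommend(catalog, k, key):
    """Return the top-k posts according to the given scoring function."""
    return sorted(catalog, key=key, reverse=True)[:k]


def mean_wellbeing(posts):
    return sum(p.wellbeing for p in posts) / len(posts)


# The deployed system can only optimize what it can measure.
proxy_feed = recommend(CATALOG, 2, key=lambda p: p.engagement)
# The feed we would have wanted, if the true goal were measurable.
ideal_feed = recommend(CATALOG, 2, key=lambda p: p.wellbeing)

print([p.title for p in proxy_feed], mean_wellbeing(proxy_feed))
print([p.title for p in ideal_feed], mean_wellbeing(ideal_feed))
```

Both feeds come from the same correct `recommend` function. The only difference is the objective it is handed, which is where the failure lives.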
A well-documented failure mode in reinforcement learning is reward hacking: the agent finds unexpected ways to maximize its reward signal that satisfy the letter of the reward function but run contrary to the designer's intent. A cleaning robot rewarded for 'no dirt visible in camera' might learn to push dirt under furniture or cover its camera. This is not malicious; it is the logical result of optimizing a proxy. Reward hacking is a window into the specification problem in its purest form.
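The cleaning-robot example can be made concrete with a deliberately tiny simulation. Everything here is an assumption made for illustration: the two actions, the reward of minus the visible dirt per step, and the hand-written policies standing in for what a learning agent might converge to.

```python
def run_episode(policy, dirt=5, steps=5):
    """Simulate a toy cleaning task and return (total_reward, dirt_remaining).

    Reward per step is minus the amount of dirt the camera can see,
    which is a proxy for minus the amount of dirt actually in the room.
    """
    camera_covered = False
    total_reward = 0
    for _ in range(steps):
        action = policy(dirt, camera_covered)
        if action == "clean" and dirt > 0:
            dirt -= 1                 # real progress: one unit of dirt removed
        elif action == "cover_camera":
            camera_covered = True     # no progress, but the sensor goes blind
        visible_dirt = 0 if camera_covered else dirt
        total_reward += -visible_dirt
    return total_reward, dirt


def intended_policy(dirt, camera_covered):
    return "clean"


def hacking_policy(dirt, camera_covered):
    return "cover_camera"


print(run_episode(intended_policy))  # (-10, 0): lower reward, clean room
print(run_episode(hacking_policy))   # (0, 5): maximal reward, room still dirty
```

An optimizer comparing these two policies by reward alone would prefer the second, even though it leaves every unit of dirt in place.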
Distribution Shift and Brittle Generalization
A second major source of accident risk is distribution shift: the gap between the conditions under which an AI system was trained and tested, and the conditions it encounters in deployment. Every AI model is trained on data from a specific distribution, a particular set of inputs with particular statistical properties. The model learns patterns from this distribution, but the world does not stay still.
Medical AI trained on data from a single hospital system may fail on patients at a different hospital with different demographics, different imaging equipment, and different annotation conventions. A self-driving vehicle trained primarily in sunny California weather may behave dangerously in a blizzard. A natural language processing system trained on text from the early 2010s may misinterpret contemporary slang, new cultural references, or terminology that has shifted in meaning.
Distribution shift is particularly dangerous because it is often invisible until it causes a failure. The system continues to output confident predictions; unless it is specifically designed with such a mechanism, it has no internal way to recognize that it is operating outside its training distribution. Overconfident predictions in out-of-distribution conditions are a signature AI failure mode.
Adversarial inputs are an extreme case of distribution shift: inputs deliberately crafted to cause failures, typically by perturbing them in ways that are imperceptible or innocuous-looking to humans but catastrophic for the model. In one well-known demonstration, adding carefully calculated noise to an image of a panda caused a classifier to label it a gibbon with over 99% confidence. In another, specially designed stickers on a stop sign caused a road-sign classifier of the kind an autonomous vehicle might rely on to read it as a speed limit sign. These adversarial vulnerabilities reveal that AI models do not perceive the world the way humans do: they have learned different, sometimes brittle, features.
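The overconfidence problem is easy to reproduce with a minimal sketch. The one-feature logistic model below uses hand-picked parameters `W` and `B` rather than a real training run, and the z-score check is just one simple, assumed way of adding the out-of-distribution mechanism the model itself lacks. Real systems use more sophisticated detectors.

```python
import math

# Hypothetical training inputs: class 0 clusters near -1, class 1 near +1.
TRAIN_X = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
W, B = 3.0, 0.0  # assumed parameters of an already-fitted logistic model


def predict_proba(x):
    """Probability of class 1 under the logistic model."""
    return 1.0 / (1.0 + math.exp(-(W * x + B)))


def confidence(x):
    """Confidence in whichever class the model predicts."""
    p = predict_proba(x)
    return max(p, 1.0 - p)


def is_out_of_distribution(x, train=TRAIN_X, z_threshold=3.0):
    """Flag inputs that lie far from the training data (simple z-score test)."""
    mean = sum(train) / len(train)
    std = math.sqrt(sum((t - mean) ** 2 for t in train) / len(train))
    return abs(x - mean) / std > z_threshold


for x in (1.0, 25.0):
    print(x, round(confidence(x), 4), is_out_of_distribution(x))
```

The input `25.0` is nothing like anything in `TRAIN_X`, yet the model's confidence is higher there than on familiar data. Only the separate check notices that something is wrong.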
Human Factors: Automation Bias and Deskilling
AI accident risks are not purely technical: they interact with human psychology in ways that compound technical failures. Two human factors deserve particular attention.
Automation bias is the tendency of human operators to over-rely on automated systems and under-question their outputs. When a GPS navigation system confidently routes a driver into a river, experienced drivers sometimes follow the instruction anyway. When an AI diagnostic system flags or does not flag a condition, medical professionals sometimes defer to the system rather than applying independent judgment. Automation bias transforms an AI error from a correctable mistake into an uncorrected harm, because the human check that should catch the failure becomes unreliable.
Deskilling is a longer-term effect: as AI systems take over tasks previously done by humans, people lose practice and expertise in those tasks. Pilots flying highly automated modern aircraft are sometimes found to have degraded manual flying skills that matter in emergencies when automation fails. If radiologists defer to AI for routine diagnoses, their capacity to catch cases the AI misses may atrophy over time. Deskilling means the human-AI system is not simply an AI system with a human fallback; the fallback may be less reliable than it appears.
Effective AI deployment design must account for both: maintaining meaningful human oversight requires preserving human expertise, not just formal human presence in the decision loop.
AI systems typically produce outputs without internal uncertainty signals visible to users. A language model that 'hallucinates' a false citation presents it with the same fluent confidence as a true one. A medical AI gives a probability score that may not accurately reflect its actual uncertainty. One of the most important open problems in AI safety is calibration: ensuring that a model's expressed confidence accurately reflects its actual probability of being correct.
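One common way to quantify calibration is expected calibration error (ECE): group predictions into confidence bins, then take the size-weighted average gap between mean confidence and actual accuracy in each bin. The sketch below is a plain-Python version with equal-width bins; the function name, the bin count, and the sample data are choices made for this example.

```python
def expected_calibration_error(confidences, correct, n_bins=5):
    """Weighted average of |mean confidence - accuracy| over confidence bins.

    confidences: predicted probabilities in [0, 1] for the chosen answer
    correct:     matching booleans, True where the prediction was right
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in last bin
        bins[index].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(mean_conf - accuracy)
    return ece


# A model that says "90% sure" and is right 9 times out of 10 is well calibrated.
calibrated = expected_calibration_error([0.9] * 10, [True] * 9 + [False])
# The same stated confidence with only 5 of 10 correct is overconfident.
overconfident = expected_calibration_error([0.9] * 10, [True] * 5 + [False] * 5)
print(round(calibrated, 3), round(overconfident, 3))
```

A low ECE means stated confidence can be taken roughly at face value. It says nothing about whether the model is accurate, only whether it knows how accurate it is.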
An AI content moderation system is trained on posts flagged as violating community guidelines by human reviewers in 2021. By 2024 it is missing a large proportion of policy-violating content involving new slang and cultural references. What is the primary failure mechanism?
A hospital system deploys an AI triage tool. A nurse reports that she notices the tool seems biased but defers to its recommendations because 'it has access to more data than I do.' The AI is later found to be systematically undertriaging patients from a specific demographic. Which human factor is most central to this harm?
Failure Mode Analysis
- Choose one AI system that is currently deployed in a consequential domain — healthcare, criminal justice, autonomous vehicles, content moderation, or financial services.
- Conduct a structured failure mode analysis:
- Step 1: Specification analysis. What is the measurable objective this system is optimizing for? What important human values or goals are NOT captured by that objective? Describe a scenario where a system that perfectly optimizes the stated objective causes harm because the objective is incomplete.
- Step 2: Distribution shift analysis. Describe the likely training data distribution. Identify two specific ways the deployment distribution might differ from the training distribution. What failures would each difference cause?
- Step 3: Human factors. How do humans interact with this system in the deployment workflow? Identify one way automation bias could cause harm. Identify one deskilling risk.
- Step 4: Severity and mitigation. For each failure mode you identified (at least four total), rate severity on a scale of 1-5 and propose one specific engineering or procedural mitigation.
- Present your analysis as a structured report, as if you are advising the organization deploying the system.
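If it helps to keep the four steps organized, the findings can be recorded in a small data structure before writing the report. This is one possible layout, not a required format: the `FailureMode` class, its field names, and the sample entries for an imagined triage tool are all placeholders to be replaced with your own analysis.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    mechanism: str    # e.g. "specification", "distribution shift", "automation bias"
    description: str
    severity: int     # 1 (minor) to 5 (catastrophic), as in Step 4
    mitigation: str

    def __post_init__(self):
        if not 1 <= self.severity <= 5:
            raise ValueError("severity must be between 1 and 5")


def render_report(modes):
    """Return one line per failure mode, most severe first."""
    ordered = sorted(modes, key=lambda m: m.severity, reverse=True)
    return "\n".join(
        f"[{m.severity}/5] {m.mechanism}: {m.description} | Mitigation: {m.mitigation}"
        for m in ordered
    )


# Placeholder entries for a hypothetical triage tool; substitute your own findings.
EXAMPLE_MODES = [
    FailureMode("specification", "optimizes throughput, not patient outcomes", 4,
                "add outcome-based audit metrics"),
    FailureMode("distribution shift", "new patient population after site expansion", 5,
                "monitor input statistics and revalidate per site"),
    FailureMode("automation bias", "staff defer to scores they doubt", 4,
                "require documented independent assessment"),
    FailureMode("deskilling", "manual triage practice declines", 3,
                "schedule periodic unaided triage drills"),
]

report = render_report(EXAMPLE_MODES)
print(report)
```

Sorting by severity puts the failure modes that most need mitigation at the top of the report, which is usually what a deploying organization wants to read first.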