Reasoning About Uncertainty and Stakes
AI safety is a field of deep uncertainty. We do not know the precise probability that a specific AI system will fail in a harmful way, the exact timeline for transformative AI capabilities, or the likelihood that particular governance interventions will succeed. This uncertainty is sometimes used as an argument for inaction — if we do not know the probabilities, why act? This is the wrong lesson to draw. Deep uncertainty about high-stakes outcomes requires more careful reasoning, not less engagement. This lesson develops the tools for thinking clearly under uncertainty.
Expected Value and Its Limits
The expected value of an outcome is its probability multiplied by its magnitude: a 10% chance of a $100 harm has the same expected value ($10) as a 1% chance of a $1,000 harm. Expected value reasoning is the standard tool of risk analysis and works well in domains where probabilities can be estimated with reasonable confidence and outcomes are comparable in kind. In AI safety contexts, it faces three challenges.
First, we often cannot estimate probabilities with confidence. What is the probability that AI-driven power concentration leads to an irreversible authoritarian lock-in within 30 years? Experts hold views ranging from below 1% to above 20%. Using any single number as if it were precise would be misleading. This is a domain of Knightian uncertainty: uncertainty so deep that even the probability distribution itself is unknown.
Second, some outcomes are not easily commensurable. The harm of a wrongful arrest caused by facial recognition is serious and quantifiable in some respects; a civilizational-scale catastrophe is not simply a more serious version of the same kind of harm. Standard expected-value aggregation across incommensurable magnitudes is philosophically contested.
Third, expected value reasoning can produce counterintuitive results when tail risks are involved. A small probability of catastrophic harm multiplied by an enormous magnitude can dominate a risk analysis, even if the probability is very hard to estimate precisely. Whether this means we should give very heavy weight to worst-case scenarios, as some AI safety researchers argue, or be skeptical of arguments that rely heavily on multiplying tiny probabilities by enormous magnitudes is an active methodological debate.
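The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not a risk model: the function name `expected_value` and the constant `MAGNITUDE` are hypothetical, and `MAGNITUDE` is an arbitrary placeholder rather than an estimate of any real harm. Only the 10%/$100 and 1%/$1,000 figures and the 1% to 20% range come from the text.

```python
import math


def expected_value(probability: float, magnitude: float) -> float:
    """Expected value of a single outcome: probability times magnitude."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return probability * magnitude


# The two harms from the text have the same expected value.
ev_likely_small = expected_value(0.10, 100)     # 10% chance of a $100 harm
ev_unlikely_large = expected_value(0.01, 1000)  # 1% chance of a $1,000 harm
same_expected_value = math.isclose(ev_likely_small, ev_unlikely_large)

# Sensitivity to the probability estimate. The 1% and 20% endpoints are the
# range of expert views quoted in the text; MAGNITUDE is a hypothetical
# placeholder in arbitrary units.
MAGNITUDE = 1_000_000
ev_low = expected_value(0.01, MAGNITUDE)
ev_high = expected_value(0.20, MAGNITUDE)
spread = ev_high / ev_low  # how many times larger the high estimate is

print(f"Equal expected values: {same_expected_value}")
print(f"Expected value range: {ev_low:,.0f} to {ev_high:,.0f} ({spread:.0f}x spread)")
```

Whatever magnitude is plugged in, the spread between the low and high expert estimates carries straight through to the result, which is why a single point estimate can give a false sense of precision.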
Economist Frank Knight distinguished risk (unknown outcomes with known probabilities) from uncertainty (unknown outcomes with unknown probabilities). Much of AI safety analysis operates in the domain of Knightian uncertainty: we do not know the probabilities, and we may not even know how to estimate them reliably. Decision-making under Knightian uncertainty requires different tools than standard expected-value calculation.
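One simple way to see what "different tools" might look like is to stop asking which option is best at a single probability and ask instead whether a choice holds up across a whole interval of plausible probabilities. The sketch below is an illustration under stated assumptions, not a method the lesson prescribes: the two options, the function names, and every number (`harm`, `safeguard_cost`, `risk_reduction`) are hypothetical.

```python
def losses(p_harm, harm=1000.0, safeguard_cost=20.0, risk_reduction=0.8):
    """Expected loss of each option at a given probability of harm.

    All default numbers are hypothetical illustration values.
    """
    return {
        "no_safeguards": p_harm * harm,
        "with_safeguards": safeguard_cost + p_harm * harm * (1 - risk_reduction),
    }


def best_option(p_harm):
    """Option with the lowest expected loss at one specific probability."""
    option_losses = losses(p_harm)
    return min(option_losses, key=option_losses.get)


def robust_choice(p_low, p_high):
    """Return the option that is best at both ends of the interval, else None.

    Both expected losses are linear in p_harm, so if the same option wins at
    both endpoints it also wins everywhere in between.
    """
    low_pick, high_pick = best_option(p_low), best_option(p_high)
    return low_pick if low_pick == high_pick else None


def minimax_choice(p_low, p_high):
    """Option whose worst-case expected loss over the interval is smallest."""
    at_low, at_high = losses(p_low), losses(p_high)
    worst_case = {name: max(at_low[name], at_high[name]) for name in at_low}
    return min(worst_case, key=worst_case.get)


# With a wide interval the point-estimate answer flips, so no option is robust.
print(robust_choice(0.01, 0.20))
# A worst-case rule still yields a decision without a single probability.
print(minimax_choice(0.01, 0.20))
```

With these made-up numbers the preferred option changes somewhere inside the 1% to 20% interval, so the expected-value answer depends entirely on a probability nobody knows; a worst-case rule gives an answer anyway, at the price of being deliberately pessimistic.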
Reversibility, Scale, and the Precautionary Principle
When outcomes are uncertain, two properties of potential harms should substantially influence how much precautionary attention they receive: reversibility and scale.
Reversible harms allow for correction. If an AI system produces biased loan decisions that can be audited, reversed, and compensated, the harm, while real, is bounded and correctable. Irreversible harms do not allow for correction. If an AI system enables the development of a novel pathogen that causes a pandemic, no policy response after the fact undoes the harm. If AI-enabled power concentration produces a stable authoritarian equilibrium, reversing that equilibrium may be extremely difficult. Irreversibility therefore justifies extra precautionary weight, in proportion to the probability of ending up in a state with no good options.
Scale matters because harm that affects many people, or that affects the fundamental conditions of human flourishing for future generations, is categorically different in its moral significance from harm to a smaller number of people. This does not mean small-scale harms are not worth preventing (they are), but it does mean that high-scale risks warrant disproportionate attention relative to their raw probability.
The precautionary principle holds that when an action risks harm to human welfare and there is scientific uncertainty about the risk, the burden of proof is on those taking the action to show it is not harmful, rather than on those opposing the action to show that it is. The precautionary principle is not a blanket prohibition on new technology; it is a statement about where the burden of justification lies when we are uncertain and the potential harm is large. Applied to AI, it argues for robust testing and oversight before deploying AI systems in high-stakes domains, not necessarily for halting AI development.
Cognitive Traps in Risk Reasoning
Human intuitions about risk are systematically biased in ways that can distort AI safety reasoning. Several cognitive traps deserve explicit attention.
Availability bias: we judge probability by how easily examples come to mind. Dramatic, concrete AI risk scenarios (killer robots, rogue superintelligence) are memorable and therefore feel probable. Diffuse, statistical harms (systematic bias in credit scoring, gradual erosion of privacy) are hard to visualize and feel less probable even when they are more certain. Good risk analysis compensates by grounding probability estimates in evidence, not imaginative vividness.
Scope insensitivity: humans do not intuitively distinguish between harms affecting 100 people, 10,000 people, and 1 million people with appropriate proportionality. The emotional response scales much less than the actual magnitude. This is dangerous for AI risk analysis, where the scale of potential harms can differ by orders of magnitude. Deliberate numerical discipline, such as counting actual people or dollars rather than relying on emotional response, partially compensates.
Status quo bias: we systematically underweight the risks of the current situation relative to proposed changes. AI-driven change is evaluated relative to an imagined stable baseline, but the status quo also involves harms (preventable deaths from inadequate healthcare, economic inefficiency, poor educational access). Fair risk analysis evaluates AI risks against the risks of not deploying AI, not against a risk-free baseline.
Overconfidence in complex systems: complex systems have emergent properties that are not predictable from their components. AI systems interacting with each other, with markets, and with human behavior produce emergent dynamics that resist confident prediction in either direction. Both 'this will be fine' and 'this will be catastrophic' may be overconfident claims.
A policymaker says: 'The probability of transformative AI causing civilizational-scale harm within 30 years is unknown, so we should assume it is low and not prioritize governance investment.' Which error in risk reasoning does this most directly illustrate?
A student calculates that because a catastrophic AI failure has a probability of only 1% but would affect 8 billion people, the expected harm is 80 million person-harms, which is larger than many near-term risks. A critic argues this calculation is unreliable. What is the most substantive criticism of this reasoning?
Precautionary Argument Analysis
For each of the following AI deployment scenarios, write a structured risk analysis that applies the concepts from this lesson:
- Scenario A: An AI system that makes parole recommendations in the criminal justice system, trained on historical data.
- Scenario B: An AI system that assists in identifying potential pandemic pathogens from sequencing data for early-warning surveillance.
- Scenario C: An AI system that autonomously manages electricity grid load balancing across a region.
For each scenario, address:
1. What are the potential harms? Rate each on reversibility (low/medium/high) and scale (individual/community/societal).
2. What is the degree of uncertainty about probability: is this more like risk (estimable probability) or Knightian uncertainty?
3. Apply the precautionary principle: where does the burden of justification lie? What would it take to satisfy that burden?
4. Identify one cognitive trap (availability bias, scope insensitivity, status quo bias, overconfidence) that might distort how people think about this scenario.
5. State your recommendation (deploy as-is, deploy with specific safeguards, or do not deploy) and justify it using the risk-reasoning framework from this lesson.
Compare your conclusions with a partner. Where do you differ, and why?
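If it helps to keep the analyses comparable between partners, the five questions can be recorded in a small data structure. The sketch below is one possible template, not part of the exercise: all class, field, and function names are hypothetical, and the two example harms are taken from the lesson's discussion of reversibility rather than from Scenarios A to C. The `Scale.INDIVIDUAL` rating for the loan example is an assumption made only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Reversibility(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


class Scale(Enum):
    INDIVIDUAL = "individual"
    COMMUNITY = "community"
    SOCIETAL = "societal"


@dataclass
class Harm:
    description: str
    reversibility: Reversibility
    scale: Scale


@dataclass
class RiskAnalysis:
    """One record per scenario; fields follow the five questions in order."""
    scenario: str
    harms: List[Harm]             # 1. harms with reversibility and scale
    uncertainty: str              # 2. "risk" or "Knightian uncertainty"
    burden_of_justification: str  # 3. who must show what
    cognitive_trap: str           # 4. one trap that may distort judgment
    recommendation: str           # 5. deploy as-is / with safeguards / do not deploy
    justification: str = ""


def flag_for_extra_precaution(harms: List[Harm]) -> List[Harm]:
    """Return harms that are hard to reverse or societal in scale."""
    return [
        h for h in harms
        if h.reversibility is Reversibility.LOW or h.scale is Scale.SOCIETAL
    ]


# Example harms drawn from the lesson text (ratings are illustrative).
loan_bias = Harm(
    "Biased loan decisions that can be audited, reversed, and compensated",
    Reversibility.HIGH,
    Scale.INDIVIDUAL,  # assumption: the lesson does not rate the scale
)
pandemic = Harm(
    "Novel pathogen that causes a pandemic",
    Reversibility.LOW,
    Scale.SOCIETAL,
)

flagged = flag_for_extra_precaution([loan_bias, pandemic])
print([h.description for h in flagged])
```

The filter is deliberately crude. It only surfaces the harms that the lesson says deserve extra precautionary weight; the judgment about what to do with them still belongs in the written justification.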