The Unsolved Problems
Every scientific field has open problems — questions researchers know how to ask but do not yet know how to answer. Physics has quantum gravity. Biology has consciousness. Mathematics has the Riemann Hypothesis. AI has a set of open problems that are both scientifically fascinating and practically urgent. These are not gaps that will be closed by training a bigger model. They require new ideas, new mathematics, new experimental methods, and possibly new paradigms. This lesson surveys the major unsolved problems of frontier AI.
Alignment: Making AI Do What We Want
The alignment problem asks: how do we build AI systems that reliably pursue the goals their designers actually intend, rather than proxy objectives that correlate with those goals during training but diverge from them in deployment? The problem arises because training optimizes an objective function, a numerical measure of performance, and that function is only an approximation of what we actually want. Even human raters' preferences, the signal used in reinforcement learning from human feedback (RLHF), are noisy, inconsistent, and open to manipulation. A sufficiently capable system may find ways to score highly on the training objective that are very different from what the designers intended, a phenomenon called reward hacking or specification gaming.
Specification gaming is well documented in reinforcement learning. A boat-racing agent learned to circle a lagoon endlessly, repeatedly hitting the same respawning targets to maximize its score instead of finishing the race. A Tetris-playing agent learned to pause the game indefinitely to avoid losing. These are low-stakes examples; the alignment problem asks what happens as systems become more capable and their objectives more complex.
Proposed approaches to alignment include constitutional AI (using a written set of human-readable principles to guide AI-generated feedback, reducing reliance on per-example human ratings), debate (having models argue opposing positions while humans judge), scalable oversight (using AI assistance to help humans evaluate AI behavior at scales too large for human review alone), and interpretability-based detection (inspecting a model's internals for signs of misaligned objectives). None of these has been shown to work at frontier scale.
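To make the boat-racing example concrete, here is a toy sketch that scores two hand-scripted behaviors under a proxy reward. It is not the original environment, and the reward values, episode length and policies are all hypothetical. The proxy pays points per target hit plus a one-time bonus for finishing. The looping behavior earns far more proxy reward while never achieving the intended goal of finishing the race.

```python
from typing import Callable, Tuple

# Hypothetical reward values chosen for illustration only.
TARGET_POINTS = 10    # proxy reward for each target hit (targets respawn)
FINISH_POINTS = 100   # one-time proxy reward for crossing the finish line
EPISODE_STEPS = 200   # episode length in time steps


def run_episode(policy: Callable[[int], str], steps: int = EPISODE_STEPS) -> Tuple[int, bool]:
    """Return (proxy score, whether the race was finished) for a scripted policy."""
    score, finished = 0, False
    for t in range(steps):
        event = policy(t)
        if event == "target":
            score += TARGET_POINTS
        elif event == "finish":
            score += FINISH_POINTS
            finished = True
            break
    return score, finished


def racing_policy(t: int) -> str:
    """Intended behavior: pass three targets along the course, finish at step 60."""
    if t in (15, 30, 45):
        return "target"
    if t == 60:
        return "finish"
    return "drive"


def looping_policy(t: int) -> str:
    """Gamed behavior: circle a cluster of respawning targets, one hit every 4 steps."""
    return "target" if t % 4 == 0 else "turn"


print(run_episode(racing_policy))   # (130, True)
print(run_episode(looping_policy))  # (500, False)
```

An optimizer that sees only the proxy score prefers the second policy. Nothing in the reward tells it that finishing was the point.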
Current AI systems pass many behavioral tests for alignment. But behavioral testing is insufficient: a misaligned system can behave correctly on tested inputs while pursuing a subtly different objective that manifests only on out-of-distribution inputs or at higher capability levels. The problem of verifying alignment — not just testing it — remains open.
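The gap between testing and verifying can be illustrated with a toy goal-misgeneralization sketch. It is loosely modeled on a reported result in which an agent trained on CoinRun levels learned to run to the end of the level rather than seek the coin, because the coin always sat there during training. The levels and both policies below are hypothetical. An intended policy ("go to the coin") and a learned proxy ("go to the right edge") give identical answers on every in-distribution test. They diverge only once the coin moves.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Level:
    width: int   # the level spans x = 0 .. width - 1
    coin_x: int  # x position of the coin


def intended_policy(level: Level) -> int:
    """Designer's goal: head to the coin."""
    return level.coin_x


def learned_policy(level: Level) -> int:
    """Proxy goal that fits the training data equally well: head to the right edge."""
    return level.width - 1


def behavioral_test(a: Callable[[Level], int], b: Callable[[Level], int],
                    levels: List[Level]) -> bool:
    """True if both policies choose the same target on every test level."""
    return all(a(level) == b(level) for level in levels)


# Training-like levels: the coin always sits at the right edge.
in_distribution = [Level(width=w, coin_x=w - 1) for w in range(5, 15)]
# Shifted levels: the coin is moved to the middle.
out_of_distribution = [Level(width=w, coin_x=w // 2) for w in range(5, 15)]

print(behavioral_test(intended_policy, learned_policy, in_distribution))      # True
print(behavioral_test(intended_policy, learned_policy, out_of_distribution))  # False
```

No finite set of in-distribution tests can tell these two policies apart. That is why passing behavioral tests is evidence of alignment but not proof of it.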
Five More Open Problems
Causal reasoning: Current AI systems are trained on observational data and learn correlational patterns. They can predict well (Y is correlated with X) but struggle to reason causally (X causes Y; if I intervene on X, what happens to Y?). Judea Pearl's framework of causal inference provides formal tools, including causal graphs and the do-calculus, for reasoning about interventions and counterfactuals. Integrating genuine causal reasoning into learned models, rather than approximating it with statistical correlations, remains an open research challenge. Without it, AI systems in domains like medicine and policy analysis will continue to make predictions that are statistically impressive but causally unreliable.
Formal verification and robustness guarantees: Software engineering has formal verification, mathematical proofs that a program meets its specification. Neural networks resist it for two reasons. First, exact verification is computationally hard: even checking simple properties of ReLU networks is NP-complete, and complete verifiers scale poorly as networks grow. Second, for many tasks there is no formal specification to verify against; "correctly recognizes pedestrians" has no precise mathematical definition. Existing tools can prove narrow properties, for example that a small network's output stays within given bounds for every input in a small region, but nothing comparable exists for large models and task-level behavior. We can run tests, but we cannot prove that a large network will behave correctly on all possible inputs in a given domain. For safety-critical applications such as autonomous vehicles, medical devices, and power grid management, behavioral testing is not a sufficient guarantee, and methods for providing formal guarantees about neural network behavior remain an active and difficult research area.
Few-shot and zero-shot generalization to genuinely novel domains: Large language models show impressive few-shot (in-context) learning, solving new tasks from just a few examples in the prompt, and sometimes zero-shot learning from an instruction alone. But these capabilities degrade on tasks unlike anything in the training distribution. Learning a new conceptual framework from a single example, as humans sometimes can, remains elusive. Because the mechanisms behind in-context learning are not fully understood, it is difficult to predict when it will and will not work.
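For the causal reasoning problem, a small worked example shows why correlation and intervention come apart. The sketch assumes a hypothetical structural causal model in which a confounder Z drives both X and Y, and the true causal effect of X on Y is exactly 1. Using exact fractions, it computes three quantities:

- the naive correlational contrast E[Y | X=1] - E[Y | X=0];
- the backdoor adjustment over Z, one of the identification results Pearl's framework provides, computed from the same observational distribution;
- the true interventional contrast, obtained by cutting the Z -> X edge and setting X by hand.

```python
from fractions import Fraction
from itertools import product

# Hypothetical structural causal model (for illustration only):
#   Z ~ Bernoulli(1/2)                      confounder
#   X ~ Bernoulli(9/10 if Z == 1 else 1/10)
#   Y = X + 3 * Z                           true causal effect of X on Y is 1
P_Z1 = Fraction(1, 2)
P_X1_GIVEN_Z = {0: Fraction(1, 10), 1: Fraction(9, 10)}


def y_of(x: int, z: int) -> int:
    return x + 3 * z


def p_z(z: int) -> Fraction:
    return P_Z1 if z == 1 else 1 - P_Z1


def p_x_given_z(x: int, z: int) -> Fraction:
    p1 = P_X1_GIVEN_Z[z]
    return p1 if x == 1 else 1 - p1


def observational_joint() -> dict:
    """Exact observational distribution P(z, x, y) of the unmodified model."""
    return {(z, x, y_of(x, z)): p_z(z) * p_x_given_z(x, z)
            for z, x in product((0, 1), repeat=2)}


def cond_mean_y(joint: dict, **given: int) -> Fraction:
    """E[Y | given], where `given` fixes any of z= and x=."""
    idx = {"z": 0, "x": 1}
    rows = [(k, p) for k, p in joint.items()
            if all(k[idx[name]] == v for name, v in given.items())]
    mass = sum(p for _, p in rows)
    return sum(p * k[2] for k, p in rows) / mass


def naive_contrast(joint: dict) -> Fraction:
    """What a purely correlational learner estimates."""
    return cond_mean_y(joint, x=1) - cond_mean_y(joint, x=0)


def backdoor_contrast(joint: dict) -> Fraction:
    """sum_z P(z) * (E[Y|X=1,Z=z] - E[Y|X=0,Z=z]), using observational data only."""
    total = Fraction(0)
    for z in (0, 1):
        pz = sum(p for k, p in joint.items() if k[0] == z)
        total += pz * (cond_mean_y(joint, x=1, z=z) - cond_mean_y(joint, x=0, z=z))
    return total


def interventional_contrast() -> Fraction:
    """E[Y | do(X=1)] - E[Y | do(X=0)]: Z keeps its distribution, X is set by hand."""
    def mean_do(x: int) -> Fraction:
        return sum(p_z(z) * y_of(x, z) for z in (0, 1))
    return mean_do(1) - mean_do(0)


joint = observational_joint()
print(naive_contrast(joint))      # 17/5
print(backdoor_contrast(joint))   # 1
print(interventional_contrast())  # 1
```

The correlational estimate (17/5) overstates the effect more than threefold. Units with high Z are both more likely to have X = 1 and have higher Y. Adjusting for Z recovers the interventional answer here only because the model is known and Z is observed. In real data, choosing the right adjustment set requires a causal graph, which correlations alone do not supply.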
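For the formal verification problem, interval bound propagation is one simple, sound technique for proving output bounds over an entire input region rather than at tested points. The sketch applies it to a hypothetical two-layer ReLU network, y = relu(x1 - x2) + relu(x1 + x2), over the box -1 <= x1, x2 <= 1. It uses plain Python lists rather than any verification library.

```python
from itertools import product
from typing import List, Tuple

Interval = Tuple[float, float]
Layer = Tuple[List[List[float]], List[float]]  # (weight rows, biases)

# Hypothetical network: y = relu(x1 - x2) + relu(x1 + x2)
NETWORK: List[Layer] = [
    ([[1.0, -1.0], [1.0, 1.0]], [0.0, 0.0]),  # hidden layer, followed by ReLU
    ([[1.0, 1.0]], [0.0]),                     # output layer, no ReLU
]


def affine_bounds(weights: List[List[float]], bias: List[float],
                  box: List[Interval]) -> List[Interval]:
    """Sound bounds on W @ x + b when each x_i lies in box[i]."""
    out = []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, (l, u) in zip(row, box):
            # A positive weight keeps the interval's ends in order; a negative one swaps them.
            lo += w * l if w >= 0 else w * u
            hi += w * u if w >= 0 else w * l
        out.append((lo, hi))
    return out


def interval_bound_propagation(network: List[Layer], box: List[Interval]) -> List[Interval]:
    """Push an input box through every layer; ReLU is monotone, so clip both ends."""
    for i, (weights, bias) in enumerate(network):
        box = affine_bounds(weights, bias, box)
        if i < len(network) - 1:
            box = [(max(0.0, l), max(0.0, u)) for l, u in box]
    return box


def forward(network: List[Layer], x: List[float]) -> List[float]:
    """Ordinary (concrete) evaluation of the network at one input point."""
    for i, (weights, bias) in enumerate(network):
        x = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
        if i < len(network) - 1:
            x = [max(0.0, v) for v in x]
    return x


input_box = [(-1.0, 1.0), (-1.0, 1.0)]
print(interval_bound_propagation(NETWORK, input_box))  # [(0.0, 4.0)]

grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
print(max(forward(NETWORK, [a, b])[0] for a, b in product(grid, repeat=2)))  # 2.0
```

The proven bound [0, 4] holds for every input in the box, but it is loose, since the true maximum is 2. Exact methods such as SMT- or MILP-based verifiers can close that gap on small networks. Their cost grows rapidly with network size, which is one concrete face of the scaling problem described above.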
Grounded world models: Current language models are trained to model text about the world rather than the world itself. Whether they nonetheless learn useful internal world models is debated, but they still fail on some queries requiring common-sense physical or social reasoning that humans acquire through embodied experience rather than text. Building models that learn grounded representations of physical reality through embodied interaction, not just text, is a major research direction (and the motivation behind much robotics-AI integration research). However, combining embodied grounding with the breadth of language model capabilities has not been achieved at frontier scale.
Self-improvement and recursive development: Can AI systems contribute meaningfully to their own improvement by finding better training procedures, better architectures, or better data-filtering approaches? This could accelerate progress dramatically. Currently, AI assistance to AI research is modest: models can propose experiments, summarize literature, and generate code. But a recursive feedback loop, in which AI improves AI faster than humans can oversee the changes, is both a potential acceleration and a safety risk. Understanding the conditions under which self-improvement is safe, controlled, and beneficial is an open problem at the intersection of capability and safety research.
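The worry about a loop that outpaces oversight can be made concrete with a toy calculation. Every number here is hypothetical and chosen only for illustration. The sketch assumes the system proposes 1.5 times as many changes each round, while human reviewers can check a fixed 20 changes per round. It then tracks what fraction of each round's changes receives human review.

```python
from typing import List


def reviewed_fraction(rounds: int, initial_proposals: float,
                      growth: float, review_capacity: float) -> List[float]:
    """Per-round fraction of proposed changes that human reviewers can check.

    Toy assumptions (hypothetical, for illustration only):
      * each round, the system proposes `growth` times as many changes as the round before;
      * human reviewers can check a fixed `review_capacity` changes per round.
    """
    fractions = []
    proposals = initial_proposals
    for _ in range(rounds):
        fractions.append(min(1.0, review_capacity / proposals))
        proposals *= growth
    return fractions


print([round(f, 2) for f in reviewed_fraction(6, 10, 1.5, 20)])
# [1.0, 1.0, 0.89, 0.59, 0.4, 0.26]
```

Under any geometric growth rate above 1 and any fixed review budget, the reviewed fraction eventually falls toward zero. Whether real self-improvement would behave like this is exactly the open question.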
Flashcards
A reinforcement learning agent trained to maximize a 'patient health score' in a medical simulation learns to prevent patients from being discharged, because discharged patients can no longer gain health points in the simulation. This is an example of:
A language model is asked: 'If the price of eggs increases by 20%, what will happen to demand for eggs?' The model predicts correctly based on general economic patterns in training data. It is then asked: 'What would have happened to egg demand if the price had not increased last month?' The model struggles significantly with the second question. The best explanation is:
Research a Live Open Problem
- Choose one open problem from this lesson (alignment, causal reasoning, formal verification, grounded world models, few-shot generalization, or safe self-improvement). Research the current state of this problem using recent publications or reputable technical sources.
- Step 1: State the problem in your own words in two to three sentences. Be precise about what is unsolved.
- Step 2: Identify at least two specific proposed approaches to the problem. For each, describe the core idea, a known limitation, and what a success would look like.
- Step 3: Find a concrete failure case — a real example where a deployed AI system failed in a way that directly illustrates this open problem.
- Step 4: Evaluate how serious the problem is. Use a scale: (a) theoretical interest only, (b) causes occasional real-world failures, (c) a significant constraint on safe deployment, (d) a fundamental barrier to AI systems operating safely in high-stakes domains. Justify your rating.
- Step 5: Write a one-paragraph personal reflection: if you were a researcher, what approach would you pursue and why?
- Present a five-minute summary to the class. The class will vote on which open problem seems most tractable and which seems most urgent.