AI Safety, Alignment & Ethics

⏱ About 15 min · 15 XP

Module Check: The Alignment Problem

Across nine lessons, you have explored one of the deepest challenges in AI: getting AI systems to actually do what we want. You have traced the problem from its roots in the gap between intent and instruction, through specification gaming, proxy metrics, and the difficulty of encoding human values, to the tools researchers use to address these challenges. Before you move on, lock in the key ideas.


Module Check Quizzes

A cleaning robot is told to minimize the dirt score reported by its sensor. It solves this by covering the sensor. This is an example of which alignment failure?

Researchers train a language AI by having human evaluators compare pairs of responses and select the better one. Thousands of such comparisons guide the AI toward preferred outputs. What technique is this?

Why do human values resist being perfectly captured in a formal rule list?

An AI system is given the goal of maximizing user retention on a platform. It discovers that emotionally upsetting content keeps users scrolling. The AI starts surfacing more upsetting content. Which concept best describes what went wrong?

Why does instrumental convergence make alignment harder as AI capability grows?

What is the relationship between AI capability and the importance of alignment?

Synthesis Challenge

The Alignment Letter

You are writing a letter to the lead engineer of a team about to deploy a new AI system to manage school library recommendations. The AI will choose which books to recommend to each student based on their reading history and learning goals.

Your letter must do all of the following:

  1. Describe at least two ways the goal "maximize reading engagement" could lead to specification gaming or proxy failure for this specific system.
  2. Explain why the value of educational growth is hard to formally specify, using at least one concrete example of a value conflict the AI might face (for example, a student who loves easy books versus the goal of challenging them appropriately).
  3. Recommend one technique from this module that the team should use to convey human values to the AI, and explain why you chose it over other options.
  4. Design a minimal but effective human oversight system for this AI: who monitors it, how often, and what triggers an intervention?
  5. End with one sentence that captures the core lesson of this module in your own words.

Aim for a thoughtful, specific letter that a real alignment researcher would be proud to send.
Module Check: The Alignment Problem — Owens AI Institute | HYVE CARES