AI Safety, Alignment & Ethics

Formal Fairness Definitions

The previous lesson surveyed where bias enters ML systems. This lesson asks a more precise question: once we want to measure fairness, what exactly do we measure? Researchers have formalized multiple mathematical definitions of fairness. These definitions are not equivalent: each captures a different moral intuition about what equal treatment means. Understanding them precisely is a prerequisite for evaluating any claim that a system is 'fair.'

Setting Up the Framework

We will use the following notation throughout. A binary classifier outputs a prediction Y-hat (either 1 or 0) for each individual. The true outcome is Y (either 1 or 0). A protected attribute A divides the population into two groups: A=0 (the reference group) and A=1 (the protected group). For a loan decision, Y-hat=1 might mean 'loan approved,' Y=1 might mean 'would repay,' and A might encode gender or race.

Four statistics about a classifier matter for fairness:

True positive rate (TPR): P(Y-hat=1 | Y=1). Among those who truly merit a positive outcome, what fraction does the model classify positively?
False positive rate (FPR): P(Y-hat=1 | Y=0). Among those who truly merit a negative outcome, what fraction does the model incorrectly classify positively?
Positive predictive value (PPV): P(Y=1 | Y-hat=1). Among those the model classifies positively, what fraction truly merit it?
Selection rate: P(Y-hat=1). What fraction of all individuals does the model classify positively?

Fairness definitions state conditions on how these statistics should relate across the groups A=0 and A=1.
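
As a minimal sketch, the four statistics can be computed from one group's confusion-matrix counts. The function name `group_rates`, the dictionary keys, and the example counts are illustrative assumptions, not part of the lesson's notation. The sketch also assumes every denominator is nonzero.

```python
def group_rates(tp, fp, tn, fn):
    """Return the four fairness-relevant statistics for one group.

    tp, fp, tn, fn are confusion-matrix counts (hypothetical inputs).
    Assumes each denominator below is nonzero.
    """
    total = tp + fp + tn + fn
    return {
        "selection_rate": (tp + fp) / total,  # P(Y-hat=1)
        "tpr": tp / (tp + fn),                # P(Y-hat=1 | Y=1)
        "fpr": fp / (fp + tn),                # P(Y-hat=1 | Y=0)
        "ppv": tp / (tp + fp),                # P(Y=1 | Y-hat=1)
    }


# Made-up counts for a single group of 100 individuals.
rates = group_rates(tp=40, fp=10, tn=40, fn=10)
print(rates)
```

Computing the same dictionary once per group gives the quantities that the fairness definitions compare.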

Fairness Is Always a Comparison

Every formal fairness definition compares a statistical quantity across groups defined by a protected attribute. Fairness is not a property of a model in isolation — it is a property of a model's outputs in relation to a population structure. A model that is 'fair' according to demographic parity may be deeply unfair according to equalized odds, depending on the base rates in the population.
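
A small numeric illustration of this point, using made-up counts: two groups of 100 people with different base rates (60% versus 20% have Y=1) are both selected at a 50% rate, yet their true positive and false positive rates differ. The group labels and counts are assumptions chosen only to make the arithmetic easy to follow.

```python
# Hypothetical confusion counts for two groups with different base rates.
groups = {
    "A=0": {"tp": 45, "fp": 5, "fn": 15, "tn": 35},   # 60 of 100 have Y=1
    "A=1": {"tp": 18, "fp": 32, "fn": 2, "tn": 48},   # 20 of 100 have Y=1
}

summary = {}
for name, c in groups.items():
    total = c["tp"] + c["fp"] + c["fn"] + c["tn"]
    summary[name] = {
        "selection_rate": (c["tp"] + c["fp"]) / total,
        "tpr": c["tp"] / (c["tp"] + c["fn"]),
        "fpr": c["fp"] / (c["fp"] + c["tn"]),
    }
    print(name, summary[name])
```

The selection rates match, so demographic parity holds, while the unequal TPR and FPR values mean equalized odds does not.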

The Four Core Definitions

Demographic parity (also called statistical parity): The classifier satisfies demographic parity if the selection rate is equal across groups. Formally: P(Y-hat=1 | A=0) = P(Y-hat=1 | A=1). In the hiring context, this means men and women are hired at the same rate overall. Demographic parity makes no reference to the true outcome Y: it does not ask whether the selected individuals would actually succeed in the job. It is a criterion on decisions alone, with equal treatment defined as equal selection rates.

Equalized odds: The classifier satisfies equalized odds if both the true positive rate and the false positive rate are equal across groups. Formally: P(Y-hat=1 | Y=y, A=0) = P(Y-hat=1 | Y=y, A=1) for both y=0 and y=1. In the criminal risk context, with Y-hat=1 meaning 'flagged as high risk' and Y=1 meaning 'will reoffend,' this means people who will not reoffend are wrongly flagged at the same rate in both groups (equal false positive rates), and people who will reoffend are correctly flagged at the same rate in both groups (equal true positive rates). Equalized odds is a substantive fairness criterion: it conditions on the true outcome, asking the model to have the same error rates for both groups.

Equal opportunity: A relaxation of equalized odds that requires only that the true positive rate be equal across groups: P(Y-hat=1 | Y=1, A=0) = P(Y-hat=1 | Y=1, A=1). This focuses on ensuring that people who genuinely qualify for a positive outcome (Y=1) are identified at equal rates across groups, without constraining the false positive rate. The intuition is that false positives are less harmful than false negatives in contexts where the negative outcome is a denial of opportunity.

Predictive parity (closely related to calibration within groups): The classifier satisfies predictive parity if the positive predictive value is equal across groups. Formally: P(Y=1 | Y-hat=1, A=0) = P(Y=1 | Y-hat=1, A=1). This means that when the model says 'positive,' it is right at the same rate for both groups. In the loan context: if the model approves a loan, the probability that the person actually repays is the same regardless of group membership. Predictive parity is favored by those who argue that a model's output should mean the same thing (the same risk level) for everyone.
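
The four definitions can be sketched as simple equality checks on per-group statistics. This is an illustrative sketch, not a standard API: the function name `fairness_report`, the dictionary keys, and the tolerance `tol` are assumptions. In practice exact equality rarely holds on finite data, so some tolerance has to be chosen, and the 0.01 used here is arbitrary.

```python
import math


def fairness_report(rates_0, rates_1, tol=0.01):
    """Check the four criteria given per-group statistics.

    rates_0, rates_1: dicts with keys selection_rate, tpr, fpr, ppv
                      for groups A=0 and A=1 (hypothetical layout).
    tol: absolute difference treated as "equal" (an arbitrary choice).
    """
    def close(key):
        return math.isclose(rates_0[key], rates_1[key], abs_tol=tol)

    return {
        "demographic_parity": close("selection_rate"),
        "equalized_odds": close("tpr") and close("fpr"),
        "equal_opportunity": close("tpr"),
        "predictive_parity": close("ppv"),
    }


# Made-up statistics: same selection rate and TPR, different FPR and PPV.
group_0 = {"selection_rate": 0.5, "tpr": 0.8, "fpr": 0.20, "ppv": 0.8}
group_1 = {"selection_rate": 0.5, "tpr": 0.8, "fpr": 0.35, "ppv": 0.6}
report = fairness_report(group_0, group_1)
print(report)
```

In this made-up case equal opportunity holds while equalized odds fails, which shows concretely why equal opportunity is the weaker requirement.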

What Each Definition Values

Each definition reflects a different moral priority.

Demographic parity prioritizes equal representation in outcomes. It is the closest formal analog to affirmative action in the legal sense. It is appropriate when the underlying true outcome Y is itself influenced by historical discrimination (because conditioning on Y would perpetuate the discrimination) or when diversity in selection is intrinsically valued.

Equalized odds prioritizes equal error rates: the model should work equally well for all groups. It is appropriate when false positives and false negatives are both costly, and when it is important that the model not be more error-prone for any particular group. Criminal risk assessment is one context where this matters: a model that generates more false positives for one racial group means more people from that group who would not reoffend are treated as high-risk.

Equal opportunity prioritizes access for qualified individuals: those who would succeed should be identified at equal rates. This makes intuitive sense in contexts where the main concern is equal access to opportunity (hiring, college admissions) and where false positives are relatively low-cost.

Predictive parity prioritizes the interpretive consistency of the model's outputs: a score of '70% risk' should mean '70% risk' for everyone. This is the definition most naturally aligned with using the model as an instrument that conveys the same information about every group. It is appropriate when decision-makers rely on the model's output being calibrated.

A college admissions algorithm accepts 30% of applicants from Group A and 30% of applicants from Group B. Among applicants who would have graduated in four years had they been admitted, the algorithm accepts 60% from Group A and 40% from Group B. Which fairness criteria are satisfied and which are violated?

A loan approval model reports a 20% default rate among approved applicants from Group X and a 20% default rate among approved applicants from Group Y. The model approves 55% of Group X applicants and 40% of Group Y applicants. Which statements are true?

Compute Fairness Metrics by Hand

A parole decision algorithm has been applied to 200 cases. The data below summarizes outcomes by group. Here a positive prediction means 'predicted safe' and a positive outcome means 'did not reoffend.' Use this data to compute each fairness metric and assess which criteria the algorithm satisfies.

Group A (100 individuals):
  True positives (predicted safe, actually did not reoffend): 35
  False positives (predicted safe, actually did reoffend): 15
  True negatives (predicted unsafe, actually did reoffend): 30
  False negatives (predicted unsafe, actually did not reoffend): 20

Group B (100 individuals):
  True positives: 20
  False positives: 30
  True negatives: 40
  False negatives: 10

Compute for each group:
  1. Selection rate: (TP + FP) / total
  2. True positive rate: TP / (TP + FN)
  3. False positive rate: FP / (FP + TN)
  4. Positive predictive value: TP / (TP + FP)

Then determine: Does the algorithm satisfy demographic parity? Equalized odds? Equal opportunity? Predictive parity?

Finally, answer: Which group suffers more from false negatives (being held when they would not have reoffended)? What does this mean in human terms for members of that group?
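
To check a hand computation afterwards, the short sketch below applies the four formulas to the counts given in the exercise, with 'positive' meaning 'predicted safe.' The variable and function names are illustrative assumptions. Running it reveals the metric values, so it is best used only after working through the exercise by hand.

```python
# Counts copied from the exercise; "positive" means "predicted safe".
counts = {
    "Group A": {"tp": 35, "fp": 15, "tn": 30, "fn": 20},
    "Group B": {"tp": 20, "fp": 30, "tn": 40, "fn": 10},
}


def metrics(c):
    """Apply the exercise's four formulas to one group's counts."""
    total = c["tp"] + c["fp"] + c["tn"] + c["fn"]
    return {
        "selection_rate": (c["tp"] + c["fp"]) / total,
        "tpr": c["tp"] / (c["tp"] + c["fn"]),
        "fpr": c["fp"] / (c["fp"] + c["tn"]),
        "ppv": c["tp"] / (c["tp"] + c["fp"]),
    }


results = {group: metrics(c) for group, c in counts.items()}
for group, m in results.items():
    # Round only for display; the stored values keep full precision.
    print(group, {name: round(value, 3) for name, value in m.items()})
```
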
Formal Fairness Definitions — Owens AI Institute | HYVE CARES