Audit a System for Fairness
This lesson centers entirely on practice. You now have the conceptual and mathematical vocabulary to conduct a real fairness audit: formal fairness definitions, pipeline bias sources, the impossibility results, documented real-world cases, justice frameworks, and mitigation techniques. Today you will apply all of that knowledge to audit a hypothetical but realistic AI system, using a structured protocol, and produce a written audit report.
System Description: PredictiveScholar
PredictiveScholar is a machine learning system used by a fictional state university system to make scholarship award decisions. It processes applications from incoming freshmen and produces a recommendation: Award (scholarship offered), Review (case referred to a human committee), or Decline (scholarship not offered). The system was developed by an external vendor and licensed to the university system. The university does not have access to the model's weights or training code.

It has access to:
- the input features used
- the outputs for all applications processed in the past two years
- the demographic data collected from applicants
- ground-truth scholarship success data (did the student who received a scholarship graduate within four years with good academic standing?)

Input features used by the system:
- High school GPA (unweighted)
- SAT or ACT score (converted to a common scale)
- Number of Advanced Placement (AP) courses taken
- First-generation college student indicator (yes/no)
- Zip code of home address
- Distance from home to nearest four-year university (in miles)
- Number of extracurricular activities listed
- Essay length in words (not content)
- Application submitted early (yes/no)

The university has collected two years of application data: 12,000 applications from Year 1 (used to train the vendor's model) and 8,000 applications from Year 2 (the most recent cycle, available for auditing).

Protected attributes available in Year 2 data: race/ethnicity, gender, first-generation status, Pell Grant eligibility (a proxy for income).
The university cannot access the model's weights or training data from Year 1. All analysis must be based on the input-output relationships observed in Year 2 application data. This means the audit can characterize the system's behavior but cannot definitively explain why it behaves as it does. This is a limitation that must be acknowledged in the final report.
Year 2 Data Summary (8,000 applications)

Decision rates by race/ethnicity:
- White applicants (3,200): 42% Award, 35% Review, 23% Decline
- Hispanic applicants (2,100): 31% Award, 38% Review, 31% Decline
- Black applicants (1,400): 28% Award, 33% Review, 39% Decline
- Asian applicants (900): 48% Award, 31% Review, 21% Decline
- Other/Unknown (400): 35% Award, 35% Review, 30% Decline

Decision rates by gender:
- Male applicants (3,900): 39% Award, 34% Review, 27% Decline
- Female applicants (3,800): 37% Award, 35% Review, 28% Decline
- Non-binary/Other (300): 33% Award, 36% Review, 31% Decline

Successful completion of the four-year scholarship term among students who received scholarships (Award):
- White scholarship recipients: 84%
- Hispanic scholarship recipients: 81%
- Black scholarship recipients: 82%
- Asian scholarship recipients: 87%

Four-year graduation among applicants who received Decline and enrolled anyway (on their own funding):
- White declined applicants who enrolled: 79%
- Hispanic declined applicants who enrolled: 75%
- Black declined applicants who enrolled: 76%
- Asian declined applicants who enrolled: 82%

Suspected proxy variables:
- Zip code correlates with race (R-squared = 0.41 in the Year 2 data).
- AP course count correlates with school district wealth (R-squared = 0.53).
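As a starting point for the calculations the audit asks for, the summary figures can be encoded and compared programmatically. The sketch below is one possible setup, not a prescribed method: the names `YEAR2_BY_RACE`, `award_counts`, `parity_ratios`, and `flag_below` are illustrative, and the 0.8 threshold is an assumption borrowed from the "four-fifths" rule of thumb used in US employment guidance, which has no official standing for scholarship decisions.

```python
# Year 2 decision rates by race/ethnicity, copied from the data summary.
# Tuple layout: (applicants, award_rate, review_rate, decline_rate)
YEAR2_BY_RACE = {
    "White": (3200, 0.42, 0.35, 0.23),
    "Hispanic": (2100, 0.31, 0.38, 0.31),
    "Black": (1400, 0.28, 0.33, 0.39),
    "Asian": (900, 0.48, 0.31, 0.21),
    "Other/Unknown": (400, 0.35, 0.35, 0.30),
}

# ASSUMPTION: screening threshold taken from the four-fifths rule of thumb.
# It is only a convenient flag here, not a legal or institutional standard.
PARITY_RATIO_THRESHOLD = 0.8


def award_counts(data):
    """Approximate number of Award decisions per group (rate x applicants)."""
    return {group: round(n * award) for group, (n, award, _, _) in data.items()}


def parity_ratios(data, reference=None):
    """Each group's Award rate divided by a reference group's Award rate.

    If no reference group is named, the highest Award rate is used.
    """
    rates = {group: award for group, (_, award, _, _) in data.items()}
    ref_rate = rates[reference] if reference else max(rates.values())
    return {group: rate / ref_rate for group, rate in rates.items()}


def flag_below(ratios, threshold=PARITY_RATIO_THRESHOLD):
    """Groups whose parity ratio falls below the screening threshold."""
    return sorted(group for group, ratio in ratios.items() if ratio < threshold)


if __name__ == "__main__":
    ratios = parity_ratios(YEAR2_BY_RACE)
    for group, count in award_counts(YEAR2_BY_RACE).items():
        print(f"{group:14s} awards={count:5d} ratio={ratios[group]:.3f}")
    print("Below threshold:", flag_below(ratios))
```

Demographic parity compares Award rates directly, so the ratio and the percentage-point gap are both worth reporting. The choice of reference group (highest rate, or a named group such as White applicants) changes the ratios and should be stated in the report.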
Full Fairness Audit of PredictiveScholar
- Using the data provided above, conduct a structured fairness audit of PredictiveScholar. Your audit report should address each section below. Write in complete sentences. Present your conclusions clearly and acknowledge uncertainty where the data does not permit definitive conclusions.
- SECTION 1: Demographic Parity Assessment
- Compute the Award rate for each racial/ethnic group. Does the system satisfy demographic parity across racial groups? Show your calculation. Interpret the finding: if parity is violated, which groups are most disadvantaged, and by how much?
- SECTION 2: Predictive Parity Assessment
- The scholarship completion data shows the rate at which awarded students successfully completed the scholarship term (a proxy for the 'true positive' outcome, meaning the student was genuinely scholarship-ready). Does the system satisfy predictive parity across racial groups? Compare the positive predictive value (PPV), here the share of awarded students who completed successfully, for each group. Interpret: are there groups for whom a scholarship award implies a different likelihood of success?
- SECTION 3: False Negative Rate Assessment
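- Using the four-year graduation rates of declined applicants who nevertheless enrolled, estimate the false negative rate by race (the share of students who would have succeeded but were declined). Note that the reported graduation rates are conditional on being declined, so converting them to a false negative rate requires combining them with the Award rates and completion rates; state any assumptions you make. Which group has the highest false negative rate? What does this mean for equal opportunity?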
- SECTION 4: Proxy Variable Analysis
- Zip code correlates with race, and AP course count correlates with school district wealth, which is itself associated with race. Explain specifically how each could function as a proxy for race in this system. For each proxy, describe how a student from a lower-resourced background might receive a different value on that feature because of their circumstances rather than their academic potential.
- SECTION 5: Impossibility Analysis
- You have now computed data relevant to both predictive parity and equal opportunity. Are these two criteria simultaneously satisfied in the PredictiveScholar data? Apply the impossibility theorem: is the pattern you observe consistent with the mathematical prediction that these criteria will conflict when base rates differ?
- SECTION 6: Findings and Recommendations
- Write three to five bullet-point findings, each supported by specific data from your analysis. For each finding, write one recommendation — either a technical mitigation, an institutional change, or both. At least one recommendation must be non-technical. End with a clear statement: based on your audit, should PredictiveScholar continue to be used for scholarship decisions without modification? If not, what specific changes are necessary before it is used again?
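The PPV and false negative rate comparisons in Sections 2, 3, and 5 can be sketched as follows. This is a rough estimate under loudly stated assumptions, not a definitive calculation: (1) declined applicants who enrolled are treated as representative of all declined applicants, (2) graduating in four years is treated as equivalent to "would have completed the scholarship successfully", and (3) Review cases are excluded because no outcome data is given for them. The names `OUTCOMES_BY_RACE`, `ppv_by_group`, `estimated_fnr`, and `fnr_by_group` are illustrative.

```python
# Rates copied from the Year 2 data summary. Tuple layout:
# (award_rate, decline_rate, completion_rate_among_awarded,
#  graduation_rate_among_declined_who_enrolled)
OUTCOMES_BY_RACE = {
    "White": (0.42, 0.23, 0.84, 0.79),
    "Hispanic": (0.31, 0.31, 0.81, 0.75),
    "Black": (0.28, 0.39, 0.82, 0.76),
    "Asian": (0.48, 0.21, 0.87, 0.82),
}


def ppv_by_group(data):
    """PPV per group: share of awarded students who completed successfully."""
    return {group: completion for group, (_, _, completion, _) in data.items()}


def estimated_fnr(award_rate, decline_rate, completion, declined_grad):
    """Estimate FNR = FN / (FN + TP) from group-level rates.

    ASSUMPTIONS (see lead-in): declined enrollees represent all declined
    applicants, four-year graduation stands in for scholarship success,
    and Review cases are left out. Group size cancels, so rates suffice.
    """
    true_pos = award_rate * completion        # awarded and succeeded
    false_neg = decline_rate * declined_grad  # declined but would have succeeded
    return false_neg / (false_neg + true_pos)


def fnr_by_group(data):
    """Apply the FNR estimate to every group in the table."""
    return {group: estimated_fnr(*rates) for group, rates in data.items()}


def spread(values_by_group):
    """Gap between the highest and lowest value across groups."""
    return max(values_by_group.values()) - min(values_by_group.values())


if __name__ == "__main__":
    ppv = ppv_by_group(OUTCOMES_BY_RACE)
    fnr = fnr_by_group(OUTCOMES_BY_RACE)
    for group in OUTCOMES_BY_RACE:
        print(f"{group:9s} PPV={ppv[group]:.2f} estimated FNR={fnr[group]:.3f}")
    print(f"PPV spread={spread(ppv):.2f}, FNR spread={spread(fnr):.3f}")
```

Under these assumptions the PPV values sit within a few points of each other while the estimated false negative rates differ far more, which is the pattern Section 5 asks you to interpret. The estimates are sensitive to assumption (1) in particular: declined applicants who could afford to enroll on their own funding may differ systematically from those who could not.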
A professional audit report must be honest about what it establishes and what it does not. This audit is based on observational data from one application cycle; it cannot prove causation, cannot examine the model's internals, and cannot account for factors the data does not capture (for example, differences in the quality of applications not captured by the listed features). These limitations belong in the report, not as a way to minimize findings, but as part of rigorous, honest analysis.
In the PredictiveScholar audit, Black applicants receive Award decisions at 28% while White applicants receive them at 42%. A university administrator argues this gap is acceptable because scholarship completion rates are similar across racial groups (82% vs. 84%). How should an auditor respond?
The audit finds that zip code is correlated with race in the Year 2 data (R-squared = 0.41). An engineer proposes removing zip code from the model's features to address proxy discrimination. What should the auditor note in response?