Machine Learning & Deep Learning

⏱ About 20 min · 20 XP

Module Check: Supervised Learning in Depth

This lesson is a comprehensive review of everything covered in Module 2. Supervised learning is not a collection of isolated algorithms — it is a coherent framework for learning from labeled examples. The flashcards below consolidate the vocabulary of that framework. The quizzes test both recall and the deeper reasoning that distinguishes a practitioner from a reader. The capstone activity asks you to design a complete supervised learning pipeline from scratch. Approach this check honestly: if a concept feels fuzzy, return to the lesson that introduced it before attempting the questions. The goal is genuine understanding you can apply to new problems — not performance on a test you will forget next week.

Flashcards

A logistic regression model learns weights w₀=-4, w₁=1.2 for a binary classification task. For a new example with x₁=3.5, what is z and approximately what is P(y=1)?
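To check this card by hand, compute the linear score and pass it through the logistic (sigmoid) function. The short sketch below does that arithmetic; the variable names are illustrative and it assumes the usual single-feature form z = w₀ + w₁x₁.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function: maps a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

w0, w1 = -4.0, 1.2   # weights given on the flashcard
x1 = 3.5             # feature value given on the flashcard

z = w0 + w1 * x1     # linear score, i.e. the log-odds of y=1
p = sigmoid(z)       # P(y=1 | x1)

print(f"z = {z:.2f}, P(y=1) = {p:.3f}")  # z = 0.20, P(y=1) = 0.550
```

Because z is only slightly above zero, the predicted probability sits just above 0.5: the example lies close to the decision boundary.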

You train a 1-nearest-neighbor classifier (k=1) on 1,000 examples. What training accuracy do you expect, and does this indicate the model will generalize well?
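One way to build intuition for this card is to run 1-NN on data whose labels are pure noise. The sketch below uses a small synthetic dataset (200 random points, an assumption made only to keep the example fast) and a hand-written nearest-neighbor lookup rather than any particular library.

```python
import random

def predict_1nn(train_X, train_y, query):
    """Return the label of the training point closest to `query` (squared Euclidean distance)."""
    best_index = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], query)),
    )
    return train_y[best_index]

random.seed(0)
# Synthetic data (assumption): random 2-D points with labels that carry no signal at all.
X = [(random.random(), random.random()) for _ in range(200)]
y = [random.randint(0, 1) for _ in range(200)]

# Every training point is its own nearest neighbor (distance 0), so it recovers its own label.
train_accuracy = sum(predict_1nn(X, y, x) == label for x, label in zip(X, y)) / len(X)
print(train_accuracy)  # 1.0
```

The labels here are random, so there is nothing to learn, yet training accuracy is still perfect. With k=1, training accuracy measures memorization, not generalization; only held-out data can tell you how the model will perform. (The perfect score assumes no two identical points carry different labels.)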

A Random Forest of B=100 trees is trained. Each tree has variance σ²=0.16 and the trees have pairwise correlation ρ=0.5. Using the formula Var(ensemble) = ρσ² + (1-ρ)σ²/B, where B is the number of trees, what is the ensemble variance?
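The formula on this card can be evaluated directly. The sketch below plugs in the card's numbers and also varies the number of trees to show which term shrinks; the function name is illustrative.

```python
def ensemble_variance(sigma2: float, rho: float, n_trees: int) -> float:
    """Var(ensemble) = rho * sigma^2 + (1 - rho) * sigma^2 / B for B equally correlated trees."""
    return rho * sigma2 + (1.0 - rho) * sigma2 / n_trees

var_100 = ensemble_variance(sigma2=0.16, rho=0.5, n_trees=100)
print(f"B=100: {var_100:.4f}")  # B=100: 0.0808

# Adding trees only shrinks the second term; the first term, rho * sigma^2 = 0.08, is a floor.
for n_trees in (1, 10, 100, 1000):
    print(n_trees, round(ensemble_variance(0.16, 0.5, n_trees), 5))
```

With ρ=0.5, the ensemble variance can never drop below 0.08 no matter how many trees are added, which is why reducing the correlation between trees matters as much as growing more of them.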

A medical classifier for cancer screening is evaluated. It has 99% specificity (low false positive rate) but only 40% sensitivity (recall). For this domain, which failure mode is most serious and why?
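Translating the two rates into patient counts makes the trade-off concrete. The card gives no prevalence or population size, so the sketch below assumes 10,000 screened patients and a 1% cancer prevalence purely for illustration.

```python
def screening_counts(n_patients, prevalence, sensitivity, specificity):
    """Expected confusion-matrix counts for a screening test applied to n_patients."""
    n_sick = n_patients * prevalence
    n_healthy = n_patients - n_sick
    return {
        "true_positives": sensitivity * n_sick,          # cancers caught
        "false_negatives": (1 - sensitivity) * n_sick,   # cancers missed
        "false_positives": (1 - specificity) * n_healthy,  # healthy patients flagged
        "true_negatives": specificity * n_healthy,
    }

# Assumed population and prevalence (not from the flashcard); rates are from the flashcard.
counts = screening_counts(n_patients=10_000, prevalence=0.01,
                          sensitivity=0.40, specificity=0.99)
for name, value in counts.items():
    print(f"{name}: {value:.0f}")
```

Under these assumptions, the model misses 60 of 100 cancers while raising 99 false alarms. A false alarm usually leads to a follow-up test; a missed cancer goes untreated, so the low sensitivity is the more serious failure in a screening setting.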

What property distinguishes a decision tree's decision boundary from a logistic regression boundary?
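A quick way to see the difference is to draw both boundaries on a coarse grid. The sketch below uses a hand-written two-split tree and a hand-picked linear rule (both hypothetical, chosen only for illustration) and prints `#` where each predicts class 1.

```python
def tree_predict(x1, x2):
    """A hand-written depth-2 tree: each split tests ONE feature against a threshold."""
    if x1 <= 0.5:
        return 0
    return 1 if x2 > 0.5 else 0

def linear_predict(x1, x2, w0=-1.0, w1=1.0, w2=1.0):
    """A logistic-regression-style rule: the boundary is the line w0 + w1*x1 + w2*x2 = 0."""
    return 1 if w0 + w1 * x1 + w2 * x2 > 0 else 0

def render(predict, steps=8):
    """Return text rows (top row = largest x2) marking predicted class 1 with '#'."""
    rows = []
    for row in range(steps - 1, -1, -1):
        x2 = (row + 0.5) / steps
        rows.append("".join(
            "#" if predict((col + 0.5) / steps, x2) else "." for col in range(steps)
        ))
    return rows

print("\n".join(render(tree_predict)))
print()
print("\n".join(render(linear_predict)))
```

The tree's class-1 region is a rectangle whose edges run parallel to the feature axes, because every split looks at a single feature. The linear rule cuts the plane with one straight line at an arbitrary angle, here the diagonal x₁ + x₂ = 1.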

You have a tabular dataset: N=80,000, D=45 features, regression target. No interpretability requirement, good hardware. Which algorithm should you try first after your linear regression baseline?

The Through-Line of This Module

Supervised learning is a framework, not a collection of tricks. Every lesson in this module addresses the same fundamental tension: a model must be complex enough to capture the true pattern in the data, but not so complex that it memorizes noise. That tension appears as the bias-variance trade-off in theory, as the train/test gap in practice, as the choice between interpretable and powerful algorithms in deployment, and as the design of regularization and ensembles in implementation. Keep that tension in mind and you will always know which questions to ask about a new model.

Design a Complete Supervised ML Pipeline

  1. This is a capstone challenge. Work individually, then share your design with the class.
  2. Scenario: A nonprofit organization provides microloans to small business owners in underserved communities. They want to predict whether a new applicant will repay a loan within 12 months (binary target: repaid = 1, defaulted = 0). Historical data: 4,200 labeled loans. Features include: applicant age, years in business, monthly revenue (self-reported), loan amount requested, city population, industry sector (categorical, 12 values), and prior loan history (number of prior loans, repayment rate on those loans). The organization is required by their funding partner to explain every rejection to the applicant.
  3. Step 1 — Task definition: Confirm the task type, target variable encoding, and evaluation metric. Justify your metric choice — what is the cost of a false positive (approving a defaulter) vs a false negative (rejecting a repayer)?
  4. Step 2 — Data preprocessing plan: Which features need scaling? Which need encoding? Are there potential data quality issues? Describe the preprocessing pipeline.
  5. Step 3 — Algorithm choice and justification: Apply the five-dimension framework. Name your algorithm. Explain why it fits this problem better than two alternatives.
  6. Step 4 — Training and validation strategy: With N=4,200, how will you split the data? What cross-validation strategy? How will you tune hyperparameters?
  7. Step 5 — Interpretability plan: For each loan rejection, what information will your model provide to the applicant? How will you extract and present that explanation?
  8. Step 6 — Production monitoring: Name two signals you would monitor after deployment to detect model degradation or data drift.
  9. Write your design as a one-page technical memo. Be precise and honest about limitations.
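For Step 4, one common option with a dataset of this size is stratified k-fold cross-validation, which keeps the repaid/defaulted ratio the same in every fold. The sketch below is a minimal standard-library version, not a prescribed solution. The scenario does not state the repayment rate, so the 80% figure and the function name are assumptions for illustration.

```python
import random
from collections import defaultdict

def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_indices, validation_indices) pairs with class balance preserved per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so each fold gets about the same share of this class.
        for position, index in enumerate(indices):
            folds[position % k].append(index)

    for i in range(k):
        validation = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, validation

# Hypothetical labels: 4,200 loans with an ASSUMED 80% repayment rate.
labels = [1] * 3360 + [0] * 840
splits = list(stratified_k_fold(labels, k=5))
for train, validation in splits:
    repaid_share = sum(labels[i] for i in validation) / len(validation)
    print(len(train), len(validation), round(repaid_share, 3))  # 3360 840 0.8
```

In a real design, any preprocessing (scaling, encoding) and hyperparameter tuning would be fitted inside each training fold only, so that no information from the validation fold leaks into the model.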