Machine Learning & Deep Learning

⏱ About 20 min · 20 XP

Designing a Learning Setup

Three modules have introduced you to three fundamentally different ways a machine can learn: supervised learning (from labeled examples), unsupervised learning (from unlabeled structure), and reinforcement learning (from delayed reward). In practice, the most consequential skill is not implementing any single algorithm — it is correctly diagnosing which paradigm fits a problem, then constructing the learning setup precisely. This lesson is a practice session in that diagnostic process.

A Framework for Paradigm Selection

Work through these questions in order for any machine learning problem:

Question 1: Do I have labeled outputs for my training data?
- YES: consider supervised learning as the starting point.
- NO: move to Question 2.

Question 2: Is the goal to learn a decision-making process across time (sequential actions with delayed feedback)?
- YES: consider reinforcement learning.
- NO: consider unsupervised learning.

Question 3 (for supervised): Is the output categorical (class label) or continuous (number)? This determines classification vs. regression.

Question 4 (for unsupervised): What kind of structure am I looking for?
- Groups of similar items → clustering.
- Compressed representation → dimensionality reduction.
- Rare unusual items → anomaly detection.

Question 5 (for RL): Can I run the environment safely and cheaply enough to collect millions of episodes? If not, the approach may be impractical.

Question 6 (for any paradigm): What data do I actually have? How much? How clean? The right algorithm for a problem with 10 million clean labeled examples is often wrong for the same problem with 500 noisy samples.
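The question sequence above can be sketched as a small decision function. This is only an illustration: the function name `suggest_paradigm`, its parameters, and the returned strings are hypothetical, and Question 6 (how much data, how clean) is deliberately left to human judgment rather than encoded.

```python
from typing import Optional


def suggest_paradigm(
    has_labels: bool,
    sequential_with_delayed_feedback: bool = False,
    output_is_categorical: Optional[bool] = None,
    structure_goal: Optional[str] = None,
    can_simulate_cheaply: Optional[bool] = None,
) -> str:
    """Return a starting-point paradigm following Questions 1-5 in order."""
    # Question 1: labeled outputs point to supervised learning first.
    if has_labels:
        # Question 3: categorical vs. continuous output.
        if output_is_categorical is None:
            return "supervised learning"
        return "supervised classification" if output_is_categorical else "supervised regression"

    # Question 2: sequential decisions with delayed feedback point to RL.
    if sequential_with_delayed_feedback:
        # Question 5: RL needs an environment that is safe and cheap to run.
        if can_simulate_cheaply is False:
            return "reinforcement learning (may be impractical)"
        return "reinforcement learning"

    # Question 4: which kind of structure is wanted?
    goals = {
        "groups": "clustering",
        "compression": "dimensionality reduction",
        "rare items": "anomaly detection",
    }
    return "unsupervised " + goals.get(structure_goal, "learning")


print(suggest_paradigm(has_labels=True, output_is_categorical=True))
print(suggest_paradigm(has_labels=False, structure_goal="groups"))
print(suggest_paradigm(has_labels=False, sequential_with_delayed_feedback=True))
```

The function only returns a starting point. A problem with labels and a sequential structure would still be routed to supervised learning here, so treat the result as the first hypothesis to examine, not a final answer.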

Paradigms Often Combine

Real systems frequently combine paradigms. A self-driving car uses supervised learning to detect objects from cameras, unsupervised clustering to identify rare edge-case scenarios for human review, and reinforcement learning (or imitation learning) for planning and control. The diagnostic framework above applies at each sub-problem level, not just to the overall system.

Worked case studies:

Case A: Email spam detection.
- Data: 500,000 emails, each hand-labeled 'spam' or 'not spam' by human reviewers.
- Output: binary category.
- Diagnosis: supervised classification. Labels exist, output is categorical, single-step prediction.
- Setup: feature extraction (word frequencies, sender reputation, link count), train a classifier (logistic regression or gradient-boosted trees), evaluate on held-out labeled test set with precision and recall.

Case B: Customer segmentation for a streaming service.
- Data: 2 million users, each described by watch history, search queries, and time-of-day usage. No behavioral categories have been defined.
- Output: groups of users with similar tastes.
- Diagnosis: unsupervised clustering. No labels exist; the goal is to discover structure, not predict a defined target.
- Setup: construct user feature vectors (fraction of time spent on each genre, average session length, etc.), standardize features, apply k-means with k selected via elbow method, inspect and name clusters, validate by seeing whether cluster membership predicts different response rates to marketing.

Case C: Adaptive tutoring system.
- Data: student interaction logs (which problems were attempted, correct/incorrect, time spent). No labels for 'optimal teaching sequence.'
- Output: a sequence of decisions (which problem to assign next) that maximizes long-term learning outcomes.
- Diagnosis: reinforcement learning. Sequential decision-making across many time steps; the best next problem depends on the student's current knowledge state; reward (improvement on assessments) is delayed.
- Setup: define state (student's performance profile), action (next problem to assign), reward (score improvement on end-of-unit assessment), run in simulation or with a small pilot group, train an RL policy.
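The Case B setup (standardize features, run k-means for several values of k, look for the elbow) can be sketched with the standard library alone. Everything concrete here is an assumption for illustration: the two features, the synthetic users, and the helper names `standardize` and `kmeans` are hypothetical, and a production system would more likely use an established library implementation.

```python
import random
from statistics import mean, pstdev


def standardize(rows):
    """Rescale every feature column to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # avoid dividing by zero
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns (centers, inertia)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialise from k randomly chosen points
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its members; keep it if the cluster is empty.
        new_centers = [
            [mean(col) for col in zip(*members)] if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    # Inertia: total squared distance from each point to its nearest center.
    inertia = sum(min(squared_distance(p, c) for c in centers) for p in points)
    return centers, inertia


# Synthetic users (assumed): [fraction of time on one genre, avg session minutes].
rng = random.Random(42)
users = []
for genre_fraction, session_minutes in [(0.8, 30.0), (0.2, 90.0), (0.5, 150.0)]:
    for _ in range(30):
        users.append([rng.gauss(genre_fraction, 0.05), rng.gauss(session_minutes, 5.0)])

features = standardize(users)
inertias = {k: kmeans(features, k)[1] for k in range(1, 7)}
for k, inertia in inertias.items():
    print(k, round(inertia, 2))
```

The elbow is the value of k after which inertia stops dropping sharply. Because a single random initialisation can land in a poor local optimum, it is common to rerun k-means with several seeds per k and keep the lowest inertia before reading the curve.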

Case D: Detecting network intrusions in a corporate system.
- Data: 50 million network packet records per day, almost all normal. No labels for which packets belong to intrusions.
- Output: flag anomalous packets for human review.
- Diagnosis: unsupervised anomaly detection. The rarity of anomalies makes labeling impractical (most would-be labels are 'normal'); the goal is to find what deviates from normal.
- Setup: train an Isolation Forest or autoencoder on a period of clean traffic, set a threshold for anomaly scores, alert on flagged packets, tune threshold via precision-recall analysis on a small labeled validation set.

Case E: Drug dosage optimization in an ICU.
- Data: historical patient vital signs and dosing records, with outcomes labeled (survived, deteriorated).
- Output: a dosing policy that maximizes patient survival and recovery metrics.
- Diagnosis: both supervised and RL are candidates. The historical labeled data supports a supervised approach: train a model to predict outcome given current state and action, then use offline RL (batch RL) to derive a policy from the predictions without requiring real-world exploration.
- Setup: offline (batch) RL: build a world model from historical data, train a policy against that model. Validate rigorously in simulation before any clinical deployment. A safety constraint: the policy is not allowed to recommend doses more than X% outside standard clinical ranges without human override.

Key lesson from Case E: the existence of labels does not preclude RL; it enables safer RL. Data from the past can train both a predictive model and a policy.
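Returning to Case D, the step "tune threshold via precision-recall analysis on a small labeled validation set" can be sketched as a threshold sweep. The scores and labels below are made-up stand-ins for the output of an anomaly detector, and the selection rule (highest precision subject to a minimum recall) is one reasonable choice assumed here, not the only one.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when every score >= threshold is flagged."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y)
    fp = sum(1 for f, y in zip(flagged, labels) if f and not y)
    fn = sum(1 for f, y in zip(flagged, labels) if not f and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def pick_threshold(scores, labels, min_recall):
    """Return (threshold, precision, recall) with the highest precision
    among thresholds whose recall is at least min_recall, or None."""
    best = None
    for t in sorted(set(scores)):
        p, r = precision_recall(scores, labels, t)
        if r >= min_recall and (best is None or p > best[1]):
            best = (t, p, r)
    return best


# Hypothetical validation set: anomaly score per packet, 1 = confirmed intrusion.
val_scores = [0.10, 0.15, 0.20, 0.22, 0.30, 0.35, 0.60, 0.70, 0.85, 0.90]
val_labels = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]

print(pick_threshold(val_scores, val_labels, min_recall=0.75))  # (0.7, 1.0, 0.75)
print(pick_threshold(val_scores, val_labels, min_recall=1.0))   # (0.35, 0.8, 1.0)
```

Raising the required recall pushes the threshold down and admits more false alarms, which is the trade-off analysts have to weigh against their review capacity.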

Fill in the correct learning paradigm.

When labeled outputs exist for every training example, ____ learning is the natural starting point, while discovering groups in unlabeled data calls for ____.

Constructing the Learning Setup — Beyond Paradigm Choice

Choosing the paradigm is step one. A complete learning setup also requires:

Data specification:
- How is data collected? (passive logs, sensors, human labeling, simulation)
- How much is needed? (depends on dimensionality, model complexity, and noise level)
- What preprocessing is required? (normalization, handling missing values, deduplication)

Evaluation design:
- Supervised: held-out test set, appropriate metrics (accuracy, F1, AUROC depending on class balance).
- Clustering: silhouette score, downstream task performance, domain expert review.
- RL: evaluation on a separate set of episodes under the learned policy, not the training environment.

Deployment constraints:
- Latency: does the model need to predict in real-time? (milliseconds vs. hours)
- Interpretability: does the decision need to be explainable? (loan approval, medical diagnosis)
- Update cadence: how often does the model need to be retrained as the world changes?

Failure mode analysis:
- What happens when the model is wrong? Are false positives or false negatives more costly?
- Is the deployment environment different from the training distribution? (covariate shift)
- Who is harmed if the model fails, and how?
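One way to keep these four parts from being forgotten is to write the setup down as a structured record and check it for gaps. The sketch below is a hypothetical convenience, not a standard tool: the class `LearningSetup` and its field names are invented here, and the example values are taken from Case A as described earlier.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class LearningSetup:
    """A learning setup: paradigm plus the four parts listed above."""
    paradigm: str
    data_specification: Optional[str] = None
    evaluation_design: Optional[str] = None
    deployment_constraints: Optional[str] = None
    failure_modes: Optional[str] = None

    def missing_parts(self) -> List[str]:
        """Names of the parts that have not been specified yet."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Case A as stated: data and evaluation are given, the rest is not.
spam_setup = LearningSetup(
    paradigm="supervised classification",
    data_specification="500,000 emails hand-labeled 'spam' or 'not spam'",
    evaluation_design="held-out labeled test set, precision and recall",
)

print(spam_setup.missing_parts())  # ['deployment_constraints', 'failure_modes']
```

Run on Case A, the check shows that the case study settles the paradigm, data, and evaluation but says nothing yet about latency, interpretability, or the relative cost of false positives and false negatives.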

The Baseline Principle

Before deploying any ML model, define a simple baseline — a rule-based system, a majority-class classifier, or a linear model. If your sophisticated algorithm does not substantially beat the baseline, the problem may not require ML at all, or the data is insufficient. Baselines prevent over-engineering and reveal whether complexity is paying off.
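A minimal sketch of the baseline check, using a majority-class classifier on a toy spam problem. The data, the candidate model (a single link-count rule standing in for a trained classifier), and the `min_gain` margin of 0.05 are all assumptions for illustration; what counts as "substantially" beating the baseline depends on the application.

```python
from collections import Counter


def majority_class_baseline(train_labels):
    """Return a predictor that always outputs the most frequent training label."""
    most_common_label = Counter(train_labels).most_common(1)[0][0]
    return lambda _example: most_common_label


def accuracy(predict, examples, labels):
    correct = sum(1 for x, y in zip(examples, labels) if predict(x) == y)
    return correct / len(labels)


def beats_baseline(model_acc, baseline_acc, min_gain=0.05):
    """True if the model improves on the baseline by at least min_gain (assumed margin)."""
    return model_acc - baseline_acc >= min_gain


# Toy data (assumed): each test example is the number of links in an email.
train_labels = ["not spam"] * 8 + ["spam"] * 2
test_examples = [0, 1, 5, 0, 7, 2, 1, 0, 6, 1]
test_labels = ["not spam", "not spam", "spam", "not spam", "spam",
               "not spam", "not spam", "not spam", "not spam", "not spam"]


def candidate(link_count):
    """Stand-in for a trained model: flag emails with many links."""
    return "spam" if link_count >= 5 else "not spam"


baseline = majority_class_baseline(train_labels)
baseline_acc = accuracy(baseline, test_examples, test_labels)
candidate_acc = accuracy(candidate, test_examples, test_labels)

print(baseline_acc, candidate_acc)                  # 0.8 0.9
print(beats_baseline(candidate_acc, baseline_acc))  # True
```

Note that the baseline already reaches 0.8 accuracy without detecting a single spam email. On imbalanced data like this, the same comparison should be repeated with a metric such as F1 before concluding that the added complexity pays off.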

A hospital wants to predict which patients will be readmitted within 30 days of discharge, using 3 years of historical patient records that include readmission labels. Which paradigm is most appropriate?

A music streaming service wants to group its 100 million users into segments for targeted playlists, with no predefined genre categories. Which setup is most appropriate?

Design Three Learning Setups

For each of the following problems, produce a complete learning setup specification:

Problem 1: A city wants to forecast the number of ambulance calls per hour across 50 city zones, using 5 years of historical call records with timestamps, locations, and weather data.

Problem 2: A cybersecurity firm has network logs from 1,000 client companies. No attacks have been labeled, but they want a system that flags traffic for human analysts to investigate.

Problem 3: A game developer wants an AI opponent for a strategy game that learns to improve as it plays against human players, starting with no pre-programmed strategies.

For each problem, write:
  a) The paradigm choice and why.
  b) The input features you would use.
  c) The output or objective.
  d) How you would evaluate success.
  e) One major risk or failure mode to watch for.