Classification vs Regression
Supervised learning splits into two major families depending on what kind of output you need. One family predicts a category — is this tumor malignant or benign? Which digit is handwritten here: 0 through 9? The other family predicts a number — how much will this house sell for? What will the temperature be tomorrow? Getting this distinction right is the first decision in any supervised learning project, because it determines which algorithms apply, which loss functions make sense, and how you evaluate success.
Classification: Predicting Discrete Categories
Classification is the supervised task where the target variable y belongs to a finite set of categories, called classes.
Binary classification involves exactly two classes. The canonical encoding is y ∈ {0, 1}, where 0 is the negative class and 1 is the positive class. Examples:
- Spam detection: spam (1) or not spam (0)
- Medical diagnosis: disease present (1) or absent (0)
- Fraud detection: fraudulent transaction (1) or legitimate (0)
Multiclass classification involves three or more classes, encoded as y ∈ {0, 1, 2, ..., K-1} for K classes. Examples:
- Handwritten digit recognition: y ∈ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
- Plant species identification: y ∈ {species_1, species_2, ..., species_200}
- Document topic classification: y ∈ {sports, politics, finance, science}
A worked example: a classifier receives the feature vector x = [age=45, resting_heart_rate=92, BMI=31.2] and must predict y ∈ {low_risk, medium_risk, high_risk} for cardiovascular disease. It outputs one of the three categories, not a number on a spectrum.
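The integer encoding y ∈ {0, 1, ..., K-1} can be sketched in a few lines. This is a minimal illustration using the topic-classification classes above; the helper names `encode` and `decode` and the sample labels are hypothetical, and the order in which classes receive their integers is arbitrary.

```python
# Class names from the document topic-classification example.
class_names = ["sports", "politics", "finance", "science"]
K = len(class_names)

# Map each class name to an integer in {0, ..., K-1} and back again.
# The assignment order is arbitrary; it only has to stay consistent.
class_to_index = {name: i for i, name in enumerate(class_names)}
index_to_class = {i: name for name, i in class_to_index.items()}


def encode(labels):
    """Turn class names into integer targets."""
    return [class_to_index[label] for label in labels]


def decode(indices):
    """Turn integer targets back into class names."""
    return [index_to_class[i] for i in indices]


# Hypothetical labels for four documents.
raw_labels = ["finance", "sports", "science", "sports"]
y = encode(raw_labels)
print(y)          # [2, 0, 3, 0]
print(decode(y))  # ['finance', 'sports', 'science', 'sports']
```

The integers are only identifiers. Class 3 is not "larger" than class 0 in any meaningful sense, which is one reason classification uses different loss functions and metrics than regression.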
The output of a classifier is a discrete category. Internally the model may produce a probability score for each class (for example, 72% chance of spam and 28% chance of legitimate), and the final prediction is typically the class with the highest probability. That vector of scores, which sums to 1, is called a probability distribution over classes.
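Turning class probabilities into a final prediction can be sketched as below. The function name `predict_class` is hypothetical, and the sketch assumes the simplest decision rule (pick the most probable class); real systems sometimes use a different threshold.

```python
def predict_class(probabilities):
    """Return the class label with the highest probability.

    `probabilities` maps each class label to its score and is
    expected to form a probability distribution (sums to 1).
    """
    total = sum(probabilities.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("class probabilities must sum to 1")
    return max(probabilities, key=probabilities.get)


# Scores from the spam example in the text.
scores = {"spam": 0.72, "legitimate": 0.28}
prediction = predict_class(scores)
print(prediction)  # spam
```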
Evaluating classification performance requires metrics suited to discrete outputs. The most common are:
- Accuracy: the proportion of predictions that match the true label. Accuracy = (correct predictions) / (total predictions). Intuitive, but misleading when classes are imbalanced: a model that always predicts 'not fraud' achieves 99.9% accuracy if only 0.1% of transactions are actually fraudulent.
- Precision and Recall: precision measures how often positive predictions are actually positive, Precision = TP / (TP + FP). Recall measures how many actual positives the model caught, Recall = TP / (TP + FN). Here TP, FP, and FN count true positives, false positives, and false negatives. In medical diagnosis, high recall is often critical: you would rather flag too many potential cases than miss real ones.
- F1 Score: the harmonic mean of precision and recall. F1 = 2 × (precision × recall) / (precision + recall). Balances both concerns into a single number.
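The metrics above can be computed directly from counts of true and false positives and negatives. The following is a minimal sketch in plain Python; the function names are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention, not the only possible one. The data reproduces the always-predict-'not fraud' scenario with 1 fraudulent transaction in 1,000.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 as a dict."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    # Assumed convention: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# 1,000 transactions, exactly one of them fraudulent (label 1).
y_true = [1] + [0] * 999
# A model that predicts 'not fraud' (0) for everything.
y_pred = [0] * 1000

metrics = classification_metrics(y_true, y_pred)
print(metrics["accuracy"])  # 0.999
print(metrics["recall"])    # 0.0
```

Accuracy looks excellent, yet recall and F1 are zero because the single fraudulent transaction was missed.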
Regression: Predicting Continuous Values
Regression is the supervised task where the target variable y is a real number: it can take any value along a continuous scale. Examples:
- House price prediction: y might be $347,000 or $891,500; any positive number is possible.
- Temperature forecasting: y = 23.7°C tomorrow afternoon.
- Stock return prediction: y = -2.3% daily return.
- Age estimation from bone density scan: y = 42 years.
A worked example: a regression model receives x = [square_footage=1850, bedrooms=3, distance_to_city_km=12, year_built=1998] and must predict y = sale price in dollars. The model output is a single real number.
Evaluating regression performance relies on metrics that measure how far predictions ŷᵢ fall from the true values yᵢ across N test examples:
- Mean Absolute Error (MAE): the average of |ŷᵢ - yᵢ| across all test examples. MAE = (1/N) Σ |ŷᵢ - yᵢ|. Interpretable, because it is in the same units as the target. A house-price model with MAE = $18,000 is wrong by $18,000 on average.
- Mean Squared Error (MSE): the average of (ŷᵢ - yᵢ)². MSE = (1/N) Σ (ŷᵢ - yᵢ)². Penalizes large errors more heavily because squaring amplifies them.
- Root Mean Squared Error (RMSE): the square root of MSE, which restores the units of the original scale. RMSE = √MSE. Commonly reported for regression because it keeps the emphasis on large errors while remaining in the target's units.
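The three regression metrics can be sketched in plain Python as follows. The function names and the four house prices are hypothetical values chosen for illustration, not data from a real model.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error: average of |prediction - truth|."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean Squared Error: average of (prediction - truth) squared."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of MSE, in target units."""
    return math.sqrt(mse(y_true, y_pred))


# Hypothetical sale prices in dollars and a model's predictions.
y_true = [300_000, 450_000, 250_000, 500_000]
y_pred = [310_000, 440_000, 250_000, 540_000]

print(mae(y_true, y_pred))   # 15000.0
print(mse(y_true, y_pred))   # 450000000.0
print(rmse(y_true, y_pred))  # roughly 21213.2
```

The errors are $10,000, $10,000, $0, and $40,000. RMSE comes out higher than MAE because squaring gives the single $40,000 miss more weight than the smaller errors.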
Some targets look like categories but have a meaningful order, for example a customer satisfaction rating from 1 to 5. These ordinal targets can be treated as regression (predict the number) or classification (predict the level). The right choice depends on whether equal spacing between levels is a reasonable assumption. If 'bad to okay' is a bigger jump than 'okay to good,' classification may be more honest, because it does not treat the gaps between levels as equal.
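One way to see what is at stake is to score the same predictions under both framings. The sketch below uses hypothetical 1-to-5 ratings and two hypothetical sets of predictions; accuracy stands in for the classification view and MAE for the regression view.

```python
def accuracy(y_true, y_pred):
    """Classification view: a prediction is either right or wrong."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def mae(y_true, y_pred):
    """Regression view: distance between levels matters."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical true satisfaction ratings on a 1-to-5 scale.
y_true = [1, 2, 3, 4, 5]
# Every prediction is off by exactly one level.
near_miss = [2, 3, 4, 5, 4]
# Every prediction is off by two to four levels.
far_miss = [5, 5, 1, 1, 1]

print(accuracy(y_true, near_miss), accuracy(y_true, far_miss))  # 0.0 0.0
print(mae(y_true, near_miss), mae(y_true, far_miss))            # 1.0 3.2
```

Accuracy cannot tell the two models apart because it ignores order. MAE separates them, but only by assuming that each step on the scale is worth the same amount.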
Flashcards
A model is trained to predict the exact number of minutes a patient will wait in an emergency room. Which task type is this?
A fraud-detection model is applied to a dataset where 0.05% of transactions are fraudulent. The model predicts 'legitimate' for every single transaction and achieves 99.95% accuracy. What does this reveal about accuracy as a metric here?
Task-Type Audit
- Work in groups of three.
- For each scenario below, decide: classification or regression? If classification, binary or multiclass? Record your answers and your reasoning.
  1. Predict whether a loan application will default.
  2. Predict a student's final GPA at the end of the semester.
  3. Identify which of 50 languages a text document is written in.
  4. Predict tomorrow's rainfall in millimeters.
  5. Classify an incoming support ticket as billing, technical, or general inquiry.
  6. Estimate a person's age from a photograph.
- For one scenario your group finds genuinely ambiguous, write a one-paragraph argument for each possible task type and decide which you would choose and why. Present your ambiguous case to the class.