Inputs, Outputs, and Examples
Before any machine learning can happen, someone has to define what problem is being solved. That definition comes down to three decisions: What information does the system receive? What should it produce? And where do the examples come from that teach it to do this? Getting these three things right is not a technical afterthought — it is the design of the entire learning task.
Inputs: What the Model Sees
The input is the information the model is given when it makes a prediction, and its form depends on the task. For an image classifier, the input is a grid of pixel values: numbers representing the color and brightness of each tiny square in the image. For a spam filter, the input might be a list of numbers representing how often certain words appear in the email. For a music recommendation system, the input might be a user's listening history represented as a sequence of song identifiers. In every case, the real-world information (a photo, an email, a playlist) must be converted into numbers before the model can process it. This conversion is called feature extraction, and the resulting numeric form is called a representation.
A feature is one measurable piece of information used as an input to a model. An email spam classifier might use features like: number of exclamation marks, presence of the word FREE, and whether the sender is in the address book.
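As a minimal sketch, the three spam features named above could be computed as follows. The function name `extract_features`, its parameters, and the sample email are hypothetical, chosen only for illustration; a real spam filter would typically use many more features.

```python
import re


def extract_features(email_text: str, sender: str, address_book: set) -> list:
    """Turn one email into a list of three numbers (hypothetical helper)."""
    # Feature 1: number of exclamation marks in the text.
    exclamation_count = float(email_text.count("!"))
    # Feature 2: 1.0 if the upper-case word FREE appears, else 0.0.
    has_free = 1.0 if re.search(r"\bFREE\b", email_text) else 0.0
    # Feature 3: 1.0 if the sender is in the address book, else 0.0.
    sender_known = 1.0 if sender.lower() in address_book else 0.0
    return [exclamation_count, has_free, sender_known]


# Sample data invented for this sketch.
features = extract_features(
    "You won a FREE cruise!!! Claim now!",
    sender="promo@example.com",
    address_book={"alice@example.com"},
)
print(features)  # [4.0, 1.0, 0.0]
```

Each email, whatever its length, becomes the same fixed-size list of numbers, which is the form a model can process.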
Choosing the right features was long considered one of the most important parts of building an ML system, a practice called feature engineering. Deep learning changed this: deep neural networks can often discover useful features on their own from raw data like pixels or audio waveforms, without a human designing them.
Outputs: What the Model Returns
The output is what the model produces in response to an input. There are two major categories.
- Classification: the model picks from a fixed set of categories. An image classifier outputs one of the categories it was trained on, such as cat, dog, car, or person. A medical model outputs malignant or benign.
- Regression: the model predicts a continuous number. A house-price model outputs a dollar amount. A weather model might output a temperature.
The output type determines the appropriate loss function and the form of the training labels. A classification system needs labeled categories; a regression system needs labeled numbers.
Classification outputs a category (which class does this belong to?). Regression outputs a number (what quantity does this predict?). Both are supervised learning tasks.
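To illustrate how the output type shapes the loss function, the sketch below pairs each task type with one common loss: cross-entropy for classification and squared error for regression. These are typical choices, not the only ones, and the probabilities and prices are made-up values for illustration.

```python
import math


def cross_entropy(predicted_probs: dict, true_label: str) -> float:
    """Classification loss: small when the true class gets high probability."""
    return -math.log(predicted_probs[true_label])


def squared_error(predicted: float, actual: float) -> float:
    """Regression loss: grows with the distance between the two numbers."""
    return (predicted - actual) ** 2


# Classification: the label is a category, the output is one of a fixed set.
class_probs = {"cat": 0.7, "dog": 0.2, "car": 0.1}  # hypothetical model output
predicted_class = max(class_probs, key=class_probs.get)
classification_loss = cross_entropy(class_probs, true_label="cat")

# Regression: the label is a number, the output is a continuous value.
predicted_price = 600_000.0  # hypothetical model output
regression_loss = squared_error(predicted_price, actual=620_000.0)

print(predicted_class)                # cat
print(round(classification_loss, 4))  # 0.3567
print(regression_loss)                # 400000000.0
```

The classification label has to be a category name the model knows, while the regression label has to be a number, which is why the two tasks need differently labeled training data.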
Examples: Where Learning Comes From
Examples — also called training instances or data points — are the individual cases the model learns from. Each example is a pair: an input and the correct output label that goes with it. For an image classifier: one example might be [pixel values of a cat photo, label: cat]. For a price predictor: one example might be [square footage=1400, bedrooms=3, zip code=90210, label: $620,000]. The quality and diversity of examples matter enormously. A model trained only on sunny-day photos of cats will struggle with dark or blurry photos. A price predictor trained only on Manhattan apartments will give wild estimates for rural Vermont homes. The examples define what the model can and cannot learn.
If training examples are biased, mislabeled, or unrepresentative of the real world, the model learns the wrong patterns. Data quality is as important as algorithm quality — arguably more so.
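One way to picture a training set is as a list of (input, label) pairs. In the sketch below, the first example uses the house values from the text above; the other two examples, the `Example` type, and the `is_represented` check are hypothetical additions for illustration. The check is deliberately crude: it only asks whether a new input's zip code appears anywhere in the training data.

```python
from typing import NamedTuple


class Example(NamedTuple):
    """One training example: an input paired with its correct label."""
    features: dict
    label: float


training_set = [
    # Values taken from the text above.
    Example({"square_footage": 1400, "bedrooms": 3, "zip_code": "90210"}, 620_000.0),
    # Hypothetical values, invented for this sketch.
    Example({"square_footage": 2100, "bedrooms": 4, "zip_code": "90210"}, 910_000.0),
    Example({"square_footage": 850, "bedrooms": 1, "zip_code": "90210"}, 400_000.0),
]


def is_represented(new_input: dict, examples: list) -> bool:
    """Return True if the new input's zip code appears in the training data."""
    seen_zip_codes = {ex.features["zip_code"] for ex in examples}
    return new_input["zip_code"] in seen_zip_codes


# A home in a zip code the training set never covers (hypothetical input).
unseen_home = {"square_footage": 1800, "bedrooms": 3, "zip_code": "05401"}
print(is_represented(unseen_home, training_set))  # False
```

A `False` result does not prove the model will fail, but it flags the situation described above: the model is being asked about a kind of input its examples never showed it.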
A model receives the area, age, and location of a house and predicts its sale price. What type of output does this model produce?
Why must real-world inputs like photos or audio be converted into numbers before a model can process them?
Frame Your Own Learning Task
- Step 1: Choose a real prediction problem — for example, predicting whether a student will enjoy a book, or whether a plant needs watering.
- Step 2: Define the inputs. List at least four features you would measure.
- Step 3: Define the output. Is it classification (which category?) or regression (which number?).
- Step 4: Describe three example data points — write out the input values and the correct label for each.
- Step 5: Identify one way your training examples might be unrepresentative and explain what wrong pattern the model might learn as a result.
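The five steps above can be sketched as a small data structure that records a framing and reports which steps are still incomplete. Everything here is hypothetical: the `TaskFrame` class, its field names, and the plant-watering values are invented for illustration and are not the only reasonable answer to the exercise.

```python
from dataclasses import dataclass, field

VALID_OUTPUT_TYPES = ("classification", "regression")


@dataclass
class TaskFrame:
    """Hypothetical container for one answer to the exercise."""
    problem: str                                   # Step 1
    features: list                                 # Step 2
    output_type: str                               # Step 3
    examples: list = field(default_factory=list)   # Step 4: (inputs, label) pairs
    unrepresentative_risk: str = ""                # Step 5

    def missing_steps(self) -> list:
        """List the steps that are not yet filled in as the exercise asks."""
        missing = []
        if not self.problem:
            missing.append("Step 1: choose a prediction problem")
        if len(self.features) < 4:
            missing.append("Step 2: list at least four features")
        if self.output_type not in VALID_OUTPUT_TYPES:
            missing.append("Step 3: pick classification or regression")
        # Each example must give a value for every listed feature.
        complete = all(set(inputs) == set(self.features) for inputs, _ in self.examples)
        if len(self.examples) < 3 or not complete:
            missing.append("Step 4: describe three complete examples")
        if not self.unrepresentative_risk:
            missing.append("Step 5: name one way the examples could mislead")
        return missing


feature_names = ["soil_moisture_percent", "days_since_watering",
                 "room_temperature_c", "pot_diameter_cm"]
frame = TaskFrame(
    problem="Does this plant need watering?",
    features=feature_names,
    output_type="classification",
    examples=[
        (dict(zip(feature_names, [12, 6, 24, 15])), "needs water"),
        (dict(zip(feature_names, [55, 1, 21, 15])), "does not need water"),
        (dict(zip(feature_names, [20, 4, 27, 30])), "needs water"),
    ],
    unrepresentative_risk=(
        "All examples were recorded in summer, so the model might learn "
        "that warm rooms always mean the plant needs water."
    ),
)
print(frame.missing_steps())  # []
```

An empty list means every step has been addressed; it says nothing about whether the chosen features or examples are good ones, which is still a judgment call.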