Testing What It Learned
Imagine your teacher taught you all the spelling words for the week, then gave you a spelling test, but the test only had those EXACT same words on it. Would you really know if you learned spelling? Probably not. You might have just memorized the list. A REAL test uses words you have not seen yet. That way, your teacher knows you truly understand spelling and did not just remember the list. Machines need the same kind of real test.
Training Data vs. Testing Data
When scientists train a machine, they are careful to split their examples into two separate groups:
- Training data: the examples the machine actually learns from. These are the photos, words, or numbers the machine studies during training.
- Testing data: a set of examples the machine has NEVER seen before, kept hidden until training is finished.
When training is done, the scientists show the machine the testing data to see how it does. If the machine does well on the testing data, which it never practiced on, that is real evidence it learned the pattern and did not just memorize the answers.
Training data is what the machine learns from. Testing data is how we find out if it truly learned. Using new examples for the test is the only honest way to check.
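For readers who want to see the split as code, here is a minimal Python sketch. It is only an illustration: the helper name `split_examples`, the made-up photo names, and the `seed` value are all assumptions, not something the scientists in this lesson are known to use.

```python
import random

def split_examples(examples, test_count, seed=0):
    """Shuffle a copy of the examples, then set aside test_count of them for testing."""
    shuffled = list(examples)              # copy, so the original list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the shuffle repeatable
    testing = shuffled[:test_count]        # hidden until training is finished
    training = shuffled[test_count:]       # what the machine gets to study
    return training, testing

# Hypothetical example names, standing in for real photos.
photos = [f"photo_{i}" for i in range(1, 11)]
training, testing = split_examples(photos, test_count=2)

print(len(training), "training examples")
print(len(testing), "testing examples")
```

Shuffling before splitting is one common choice. It helps make sure the test group is a fair mix instead of, say, only the last photos that were taken.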
Here is how this works in real life. A team of scientists trains a machine to recognize different types of clouds: fluffy cumulus clouds, wispy cirrus clouds, and flat stratus clouds. For training, they use 1,000 cloud photos. They hold back another 200 photos and never show those to the machine during training. Those 200 become the test.
After training, they show the machine the 200 hidden photos. The machine guesses the type for each one. The scientists count up how many it gets right. If the machine gets 95 out of every 100 hidden photos right (190 of the 200), great! Its model works well on new clouds it has never seen. If it only gets 50 out of every 100 right (100 of the 200), hmm. Back to training. Maybe it needs more examples, or better-labeled ones.
This is called evaluating the model. Evaluate means to judge how well something works.
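Counting up the right answers can be sketched in a few lines of Python. The function name `score` and the five guesses below are invented for illustration. A real test would use all the hidden photos.

```python
def score(guesses, correct_labels):
    """Return how many guesses match the correct labels, and the fraction correct."""
    right = sum(1 for g, c in zip(guesses, correct_labels) if g == c)
    return right, right / len(correct_labels)

# Made-up answers for five hidden photos.
correct_labels = ["cumulus", "cirrus", "stratus", "cumulus", "cirrus"]
guesses        = ["cumulus", "cirrus", "cumulus", "cumulus", "cirrus"]

right, fraction = score(guesses, correct_labels)
print(f"{right} out of {len(correct_labels)} correct ({fraction:.0%})")
# prints: 4 out of 5 correct (80%)
```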
There is a sneaky mistake called overfitting. It happens when the machine does extremely well on the training examples — almost perfectly — but does badly on the testing examples. Overfitting means the machine memorized the specific training examples instead of truly learning the pattern. It is like a student who memorizes "c-a-t spells cat" but cannot spell any other three-letter word. Scientists watch out for overfitting by always checking both: how well the machine does on training data AND how well it does on fresh testing data.
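To make overfitting concrete, here is a small Python sketch of a pretend machine that only memorizes. Everything in it is hypothetical: the `Memorizer` class, the photo names, and the 0.3 cutoff for "too big a gap" are assumptions chosen for the demonstration, not a standard rule.

```python
class Memorizer:
    """A pretend machine that memorizes its training examples and learns no pattern."""

    def __init__(self):
        self.memory = {}

    def train(self, examples):
        for item, label in examples:
            self.memory[item] = label

    def guess(self, item):
        # For anything it has not memorized, it always makes the same blind guess.
        return self.memory.get(item, "cumulus")

def accuracy(model, examples):
    """Fraction of examples the model labels correctly."""
    right = sum(1 for item, label in examples if model.guess(item) == label)
    return right / len(examples)

# Made-up labeled photos.
training = [("photo_1", "cumulus"), ("photo_2", "cirrus"),
            ("photo_3", "stratus"), ("photo_4", "cirrus")]
testing = [("photo_5", "cirrus"), ("photo_6", "stratus"),
           ("photo_7", "cumulus"), ("photo_8", "stratus")]

model = Memorizer()
model.train(training)

train_score = accuracy(model, training)  # perfect: it memorized every one of these
test_score = accuracy(model, testing)    # poor: it never saw these photos
gap = train_score - test_score
looks_overfit = gap > 0.3                # assumed cutoff, for illustration only

print(f"training: {train_score:.0%}, testing: {test_score:.0%}, overfit? {looks_overfit}")
```

The big gap between the two scores is the warning sign. Checking only the training score would make this machine look perfect.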
The testing examples must stay hidden until training is completely done. If you accidentally show the machine its test examples during training, the test becomes meaningless — just like peeking at a quiz before you take it.
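One simple way to catch an accidental peek is to check that no example appears in both groups. The sketch below assumes each example has a unique name. The helper `find_leaks` and the photo names are made up for this illustration.

```python
def find_leaks(training, testing):
    """Return the examples that appear in BOTH groups (there should be none)."""
    return sorted(set(training) & set(testing))

# Made-up example names; photo_4 has slipped into both groups by mistake.
training = ["photo_1", "photo_2", "photo_3", "photo_4"]
testing = ["photo_4", "photo_5"]

leaks = find_leaks(training, testing)
if leaks:
    print("Peeking problem! Shared examples:", leaks)
else:
    print("No overlap. The test is still fair.")
```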
Why do scientists keep testing data hidden during training?
A machine scores 99% on training data but only 40% on testing data. What is likely happening?
Give Your Own Spelling Test
- Ask a family member or friend to teach you five words you have NEVER seen before — really unusual or made-up words are fine!
- Listen carefully but do NOT write them down.
- Now have them teach you five more words AND write those down.
- After ten minutes of doing something else, take a spelling test on ALL ten words.
- Which words did you do better on — the written ones or the listened-only ones?
- Talk about it: which group of words was more like "training data" and which was more like "testing data"?