Iterations and Improvement
One round of 'make a prediction, measure error, adjust weights' improves a model by only a tiny amount. A model trained for a single round would still be nearly useless. The secret ingredient is repetition: running that loop thousands or even millions of times. This lesson is about how those repetitions are organized, what they look like on a chart, and how to tell whether training is actually working.
Epochs and Batches
Training data is usually too large to process all at once, so it is split into smaller chunks called batches. A batch is a small group of training examples (often 32, 64, or 256 at a time). The model processes one batch, computes the loss, computes the gradients of that loss, and takes one gradient descent step to update its weights. Then it moves on to the next batch.
An epoch is one complete pass through the entire training dataset: all batches processed once. After one epoch, every training example has been seen exactly once. Training usually runs for many epochs. A small image-recognition model might train for 20 epochs. A large language model might train for 1–3 epochs over a dataset so vast that even one pass takes weeks.
Why batches instead of processing all data at once? Two reasons: (1) the full dataset often does not fit in memory, and (2) smaller batches give noisier but more frequent weight updates, and that noise can help the model escape points in the loss landscape where it would otherwise get stuck.
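The batch, iteration, and epoch structure described above can be sketched in a few lines of plain Python. This is a minimal illustration, not how a real framework is implemented: the toy dataset (points on the line y = 3x + 1), the function name train, and the learning rate of 0.1 are all assumptions chosen for the example.

```python
import random

def train(examples, batch_size, epochs, learning_rate=0.1, seed=0):
    """Fit y = w * x + b with mini-batch gradient descent (toy sketch)."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    epoch_losses = []
    iterations = 0
    for epoch in range(epochs):
        data = examples[:]          # copy so the caller's list keeps its order
        rng.shuffle(data)           # new batch grouping each epoch
        total_loss = 0.0
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Prediction error for every example in this batch
            errors = [(w * x + b) - y for x, y in batch]
            # Mean squared error over the batch
            loss = sum(e * e for e in errors) / len(batch)
            # Gradients of the batch loss with respect to w and b
            grad_w = sum(2 * e * x for e, (x, _) in zip(errors, batch)) / len(batch)
            grad_b = sum(2 * e for e in errors) / len(batch)
            # One weight update = one iteration
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
            iterations += 1
            total_loss += loss * len(batch)
        # Average loss over the whole epoch: one point on the loss curve
        epoch_losses.append(total_loss / len(data))
    return w, b, epoch_losses, iterations

# Hypothetical toy data: 100 points on the line y = 3x + 1
examples = [(i / 100, 3 * (i / 100) + 1) for i in range(100)]
w, b, losses, iterations = train(examples, batch_size=10, epochs=50)
print(f"iterations: {iterations}")
print(f"first epoch loss: {losses[0]:.4f}, last epoch loss: {losses[-1]:.6f}")
```

With 100 examples and a batch size of 10, each epoch contains 10 iterations, so 50 epochs give 500 weight updates. The list of per-epoch losses is exactly what gets plotted as a loss curve.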
- Batch: a small group of examples processed together before one weight update.
- Iteration: one batch processed = one weight update.
- Epoch: one full pass through all the training data = many iterations.
If you have 1,000 examples and a batch size of 100, one epoch = 1,000 ÷ 100 = 10 iterations.
Here is a worked example to make the numbers concrete. Suppose you have 5,000 training photos labeled cat or dog. You set batch size = 50. That means 5,000 ÷ 50 = 100 iterations per epoch. If you train for 30 epochs, total iterations = 100 × 30 = 3,000. In each iteration the model sees 50 photos, computes the loss, and updates all of its weights once. After 3,000 updates, the model has learned from every photo 30 times.
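The arithmetic in the worked example can be checked with a short script. The function names are hypothetical, and rounding up assumes that a final, smaller batch still counts as one weight update when the dataset does not divide evenly (some setups drop it instead).

```python
import math

def iterations_per_epoch(num_examples, batch_size):
    # Round up: a leftover partial batch is assumed to trigger its own update
    return math.ceil(num_examples / batch_size)

def total_iterations(num_examples, batch_size, epochs):
    return iterations_per_epoch(num_examples, batch_size) * epochs

print(iterations_per_epoch(1000, 100))   # 10
print(iterations_per_epoch(5000, 50))    # 100
print(total_iterations(5000, 50, 30))    # 3000
```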
Reading the Loss Curve
As training runs, practitioners record the training loss after each epoch and plot it against the epoch number. This chart is called the loss curve (or learning curve). A healthy loss curve slopes downward and gradually flattens: the model improves quickly at first, then squeezes out smaller gains. If the curve is flat from the start, the learning rate may be too small. If it bounces wildly up and down, the learning rate may be too large. If the loss measured on new data the model was not trained on drops and then rises back up while the training loss keeps falling, the model has started overfitting (more on that in Lesson 6). High-level training APIs such as Keras's fit() in TensorFlow print the loss after each epoch automatically; in PyTorch you typically log it yourself inside the training loop. Reading loss curves is one of the most practical skills a machine learning practitioner has.
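The three single-curve patterns above (falling then flattening, flat from the start, bouncing) can be turned into a rough rule of thumb. The sketch below is only an illustration: the function name, the two thresholds, and the example loss values are all made up for this lesson, and a real curve still deserves a look by eye.

```python
def describe_loss_curve(losses, flat_tolerance=0.05, bounce_fraction=0.3):
    """Give a rough label to a list of per-epoch losses (heuristic sketch)."""
    if len(losses) < 3:
        return "too short to judge"
    # Change in loss from each epoch to the next
    steps = [later - earlier for earlier, later in zip(losses, losses[1:])]
    # Fraction of epochs where the loss went up instead of down
    rises = sum(1 for step in steps if step > 0)
    if rises / len(steps) >= bounce_fraction:
        return "bouncing: learning rate may be too large"
    # Total improvement compared with the starting loss
    if losses[0] - losses[-1] <= flat_tolerance * abs(losses[0]):
        return "flat: learning rate may be too small"
    return "falling: looks healthy"

# Made-up example curves, one loss value per epoch
healthy = [2.0, 1.2, 0.8, 0.6, 0.5, 0.45, 0.42, 0.41]
flat = [2.0, 1.99, 1.99, 1.98, 1.98, 1.97]
bouncy = [2.0, 0.9, 1.8, 0.7, 2.1, 1.0, 1.9]

for name, curve in [("healthy", healthy), ("flat", flat), ("bouncy", bouncy)]:
    print(name, "->", describe_loss_curve(curve))
```

The thresholds are arbitrary: a curve that is both nearly flat and slightly noisy could be labeled either way, which is one reason practitioners plot the curve rather than rely on a single number.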
After enough epochs the training loss may keep falling while the model gets worse on new data — it has started memorizing instead of learning. This is overfitting. You will learn to spot and fix it in Lessons 6 and 7.
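One simple way to spot this pattern is to track two losses per epoch: the training loss and the loss on data held back from training (often called validation data, a term assumed here). The sketch below uses made-up numbers and a hypothetical function name; it reports the first epoch where training loss fell but held-out loss rose.

```python
def first_overfitting_epoch(train_losses, val_losses):
    """Return the first 1-based epoch where training loss fell but
    held-out loss rose, or None if that never happens."""
    num_epochs = min(len(train_losses), len(val_losses))
    for i in range(1, num_epochs):
        train_fell = train_losses[i] < train_losses[i - 1]
        val_rose = val_losses[i] > val_losses[i - 1]
        if train_fell and val_rose:
            return i + 1  # convert list index to epoch number
    return None

# Made-up losses for 8 epochs
train_losses = [1.9, 1.2, 0.8, 0.6, 0.45, 0.35, 0.28, 0.22]
val_losses = [2.0, 1.4, 1.0, 0.85, 0.80, 0.83, 0.90, 1.00]

print(first_overfitting_epoch(train_losses, val_losses))  # 6
```

A single noisy uptick can trigger this check, so in practice people usually wait for the held-out loss to rise for several epochs in a row before concluding the model is overfitting.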
Prompt Challenge
Write a prompt to ask an AI assistant to explain the loss curve from a training run you describe.
Your prompt should…
- Describe what you want the AI to explain, using the term 'loss curve'
- Specify the audience as a middle-school student
- Ask for a concrete example of what a healthy vs. unhealthy loss curve looks like
A dataset has 800 training examples and a batch size of 40. How many iterations (weight updates) happen in one epoch?
A loss curve measured on new (held-out) data drops steeply for 10 epochs, then suddenly rises back up. What is it most likely showing?
Sketch a Loss Curve
- Step 1: On graph paper or a blank page, draw axes: x-axis = Epoch (1–20), y-axis = Loss (start high, e.g., 2.0, and go down to near 0).
- Step 2: Draw a 'healthy' loss curve — falling quickly at first, then flattening around epoch 10–15.
- Step 3: On the same axes (use a different color or dashed line), draw an 'overfitting' curve showing loss on new data the model was not trained on: falling until about epoch 10, then slowly rising.
- Step 4: Add a third line for 'too large learning rate' — bouncing up and down wildly.
- Step 5: Label each curve and write one sentence explaining what a practitioner should do if they see each pattern.