High School Lab

Gradient Descent

Roll a model's error downhill step by step, and discover why the learning rate makes or breaks training.

Training a model means rolling its error downhill to the lowest point. Each step moves the weight against the slope. Pick a learning rate and find out: too small and you crawl, too large and you fly off the mountain.

[Interactive plot: loss versus weight, with the lowest loss marked. Starting readout: weight (w) = 8.00, loss = 26.00, slope = 10.00, step 0 of 24. Control: learning rate.]

How does it actually work?

Every step applies one rule: w ← w − (learning rate × slope). The slope points uphill, in the direction of rising loss; subtracting it moves the weight downhill toward lower loss. Real neural networks train with the same basic rule, applied to millions of weights instead of one.
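To make the rule concrete, here is a minimal Python sketch of a single update. The lab does not state its loss function, so the curve below, loss = (w − 3)² + 1, is an assumption chosen only because it reproduces the starting readout (weight 8.00, loss 26.00, slope 10.00). The function names `loss`, `slope`, and `step` are illustrative, not taken from the lab.

```python
def loss(w):
    # Assumed loss curve (not stated by the lab): a parabola with its
    # lowest point at w = 3, picked to match the starting readout.
    return (w - 3.0) ** 2 + 1.0


def slope(w):
    # Derivative of the assumed loss with respect to w.
    return 2.0 * (w - 3.0)


def step(w, learning_rate):
    # The update rule: move the weight against the slope.
    return w - learning_rate * slope(w)


w = 8.0
print("start:", w, loss(w), slope(w))

w = step(w, learning_rate=0.05)
print("after one step:", w, loss(w))
```

Under this assumed curve, one step at a learning rate of 0.05 moves the weight from 8 to about 7.5, a small move toward the minimum at w = 3.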

The learning rate is the make-or-break dial. At 0.05, progress is safe but painfully slow. At 0.5, the first step happens to land exactly on the minimum of this particular curve. At 1.1, each step overshoots farther than the last and the loss explodes: the model diverges. Finding a good learning rate is one of the central crafts of machine learning.
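The three behaviours can be compared with a short Python sketch. It assumes the loss curve is (w − 3)² + 1, which the lab does not state but which matches its starting readout, and it runs 24 steps from w = 8 to mirror the step counter. The helper name `run` is hypothetical.

```python
def run(learning_rate, w=8.0, steps=24):
    """Return every weight visited, under the assumed loss (w - 3)**2 + 1."""
    history = [w]
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)        # slope of the assumed loss at w
        w = w - learning_rate * grad  # gradient descent update
        history.append(w)
    return history


for lr in (0.05, 0.5, 1.1):
    final_w = run(lr)[-1]
    final_loss = (final_w - 3.0) ** 2 + 1.0
    print(f"learning rate {lr}: final w = {final_w:.3f}, final loss = {final_loss:.3f}")
```

For this assumed parabola, each step multiplies the distance from the minimum by (1 − 2 × learning rate). That factor is 0.9 at a rate of 0.05 (slow shrinkage), 0 at 0.5 (lands on the minimum in one step), and −1.2 at 1.1 (the distance grows and flips sides every step). Other loss curves would have different thresholds.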