Build-a-Network
Reading about neural networks and truly understanding them are two different things. This lesson puts a pencil in your hand. You will trace a complete tiny network — computing the forward pass, measuring the loss, understanding the backward nudge — using only arithmetic. No programming required. When you finish, you will have done by hand exactly what a training step in a real neural network does.
The Network We Will Trace
Here is the setup. We have a network with:
- 2 inputs: x1 and x2
- 1 hidden layer with 2 neurons: H1 and H2 (using ReLU activation)
- 1 output neuron: O (using sigmoid activation)
The task: predict 1 if x1 + x2 is greater than 1, otherwise predict 0. We are training on one example: x1 = 0.6, x2 = 0.7. Correct answer = 1 (since 0.6 + 0.7 = 1.3 > 1).
The initial weights (random, not yet trained):
- H1 weights: w11 = 0.4 (for x1), w12 = -0.3 (for x2)
- H2 weights: w21 = -0.5 (for x1), w22 = 0.8 (for x2)
- Output weights: wO1 = 0.6 (for H1), wO2 = 0.9 (for H2)
All biases are 0 for simplicity.
Your job: compute the forward pass, find the loss, understand what the gradient tells us, and determine whether the weights need to increase or decrease.
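For readers who want to check their arithmetic on a computer afterwards, the setup above can be written down in plain Python. This is an optional sketch and not part of the pencil-and-paper exercise; the names `weights_hidden`, `weights_output`, `relu`, and `sigmoid` are illustrative choices, not names used by the lesson.

```python
import math

# The single training example and its correct answer.
x = [0.6, 0.7]          # x1, x2
target = 1.0            # because 0.6 + 0.7 = 1.3 > 1

# Initial weights from the lesson. All biases are 0, so they are omitted.
weights_hidden = [
    [0.4, -0.3],        # H1: w11 (for x1), w12 (for x2)
    [-0.5, 0.8],        # H2: w21 (for x1), w22 (for x2)
]
weights_output = [0.6, 0.9]   # wO1 (for H1), wO2 (for H2)

def relu(z):
    """Pass positive values through unchanged; replace negatives with 0."""
    return max(0.0, z)

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

print(relu(-2.0), relu(0.26))   # 0.0 0.26
print(sigmoid(0.0))             # 0.5
```

The two print lines are quick sanity checks on the activation functions before any weights are involved.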
Professional ML engineers rarely compute forward passes manually. But tracing one completely is one of the best ways to make the abstract concrete. Once you have done it by hand, the diagrams and equations you meet later will be much easier to read.
FORWARD PASS: trace it step by step.
Hidden Neuron H1:
- Weighted sum = (0.6 × 0.4) + (0.7 × -0.3) = 0.24 + (-0.21) = 0.03
- ReLU(0.03) = 0.03 (positive, unchanged)
- H1 activation = 0.03
Hidden Neuron H2:
- Weighted sum = (0.6 × -0.5) + (0.7 × 0.8) = -0.30 + 0.56 = 0.26
- ReLU(0.26) = 0.26 (positive, unchanged)
- H2 activation = 0.26
Output Neuron O:
- Weighted sum = (0.03 × 0.6) + (0.26 × 0.9) = 0.018 + 0.234 = 0.252
- Sigmoid(0.252) ≈ 0.563
- Network prediction: 0.563
The correct answer was 1.0. The network predicted 0.563: not catastrophically wrong, but wrong.
Loss = (1.0 - 0.563)^2 = (0.437)^2 ≈ 0.191.
The gradient tells us: to reduce the loss, the output weight wO2 (connecting H2 to O) should increase. H2 had a reasonable activation of 0.26 and the output was too low, so multiplying by a larger weight would push the output up toward 1. After one training step, wO2 might move from 0.9 to something like 0.918; the exact size of the step depends on the learning rate. The next forward pass would give a slightly higher output and a slightly lower loss. Repeat millions of times: the network learns.
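The trace above can be replayed in a few lines of Python as an optional check. The sketch below also works out the gradient for wO2 with the chain rule and applies one gradient-descent step. The learning rate of 0.1 is an assumption made here for illustration; the lesson does not state one, so the updated weight will not match the lesson's illustrative 0.918.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Input, target, and initial weights from the lesson (biases are 0).
x1, x2, target = 0.6, 0.7, 1.0
w11, w12 = 0.4, -0.3
w21, w22 = -0.5, 0.8
wO1, wO2 = 0.6, 0.9

# Forward pass: weighted sum, then activation, layer by layer.
h1 = max(0.0, x1 * w11 + x2 * w12)      # ReLU
h2 = max(0.0, x1 * w21 + x2 * w22)      # ReLU
z_out = h1 * wO1 + h2 * wO2
prediction = sigmoid(z_out)
loss = (target - prediction) ** 2

# Backward nudge for wO2, using the chain rule:
#   dLoss/dprediction = -2 * (target - prediction)
#   dprediction/dz_out = prediction * (1 - prediction)   (sigmoid derivative)
#   dz_out/dwO2 = h2
grad_wO2 = -2.0 * (target - prediction) * prediction * (1.0 - prediction) * h2

learning_rate = 0.1   # assumption: not specified in the lesson
new_wO2 = wO2 - learning_rate * grad_wO2

print(round(h1, 3), round(h2, 3), round(z_out, 3))   # 0.03 0.26 0.252
print(round(prediction, 3), round(loss, 3))          # 0.563 0.191
print("gradient for wO2 is negative:", grad_wO2 < 0)
print("wO2 increases after the step:", new_wO2 > wO2)
```

A negative gradient means the loss falls as wO2 rises, which is why the update moves wO2 upward. A larger learning rate would give a larger step in the same direction.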
Reading the Results Like a Network Engineer
Notice several things from the trace above:
- H1's activation (0.03) is very small. That means its contribution to the output is tiny, so it barely matters in this pass. Its weights (0.4 and -0.3) partially cancel each other. If the correct answer is consistently 1, backpropagation will push H1's weights to be more positive so H1 activates more strongly on future inputs.
- H2's activation (0.26) is much larger and carries most of the information to the output. The output weight wO2 = 0.9 amplifies H2's contribution. This neuron is doing more work in this example.
- The sigmoid output of 0.563 means the network is leaning toward class 1 but with only moderate confidence. After training, for inputs like (0.6, 0.7) where the answer is definitely 1, the network should output something like 0.93 or higher.
- One training step moves the weights by tiny amounts. But across thousands of examples and many passes through the dataset, the weights find values that make the output close to 1 whenever x1 + x2 > 1 and close to 0 otherwise. The math is simple; the scale is what makes it powerful.
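The comparison between H1 and H2 can be put into numbers by splitting the output neuron's weighted sum into its two terms. The short sketch below is an optional illustration; the variable names are hypothetical, and the "share" calculation is only meaningful here because both terms are positive.

```python
# Activations and output weights from the worked example.
h1, h2 = 0.03, 0.26
wO1, wO2 = 0.6, 0.9

# Each hidden neuron's term in the output neuron's weighted sum.
contrib_h1 = h1 * wO1               # 0.018
contrib_h2 = h2 * wO2               # 0.234
total = contrib_h1 + contrib_h2     # 0.252

share_h2 = contrib_h2 / total
print(f"H1 contributes {contrib_h1:.3f}, H2 contributes {contrib_h2:.3f}")
print(f"H2 share of the output sum: {share_h2:.0%}")   # 93%
```

H2 supplies roughly 93% of the weighted sum that reaches the sigmoid, which matches the observation that it is doing most of the work on this input.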
Fill in the missing values from the worked example above.
If your hand-traced numbers differ from the worked example, check your order of operations: multiply first, then add. Also double-check signs on negative weights — it is easy to drop a minus sign. Getting clean arithmetic in a trace is a real skill.
In the worked example, why did H2 contribute more to the output than H1?
The loss for our example was approximately 0.191. What does this number mean?
Trace a New Example
- Step 1: Use the exact same network from this lesson. Same weights: H1(0.4, -0.3), H2(-0.5, 0.8), Output(0.6, 0.9).
- Step 2: Run a new input: x1 = 0.2, x2 = 0.1. The correct answer is 0 (since 0.2 + 0.1 = 0.3, which is less than 1).
- Step 3: Compute H1's weighted sum and ReLU activation.
- Step 4: Compute H2's weighted sum and ReLU activation.
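- Step 5: Compute the output neuron's weighted sum and sigmoid output. (Sigmoid rule of thumb: sigmoid(0) = 0.5; inputs below 0 give outputs below 0.5, and inputs above 0 give outputs above 0.5. For an input z close to 0, sigmoid(z) is approximately 0.5 + z/4.)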
- Step 6: Compute the squared loss: (0 - prediction)^2.
- Step 7: Interpret: is the loss high or low? Is the network predicting in the right direction for this example?
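Once you have finished the trace by hand, you can compare your numbers against a small script. The sketch below is optional and is not how the lesson expects you to solve the exercise; `forward` and `squared_loss` are hypothetical helper names, and the weights are hard-coded from Step 1. It first reproduces the worked example as a sanity check, then prints the values for the new input.

```python
import math

def forward(x1, x2):
    """Forward pass through the lesson's network; returns every intermediate value."""
    h1 = max(0.0, x1 * 0.4 + x2 * -0.3)     # H1: weighted sum, then ReLU
    h2 = max(0.0, x1 * -0.5 + x2 * 0.8)     # H2: weighted sum, then ReLU
    z_out = h1 * 0.6 + h2 * 0.9             # output neuron's weighted sum
    prediction = 1.0 / (1.0 + math.exp(-z_out))   # sigmoid
    return h1, h2, z_out, prediction

def squared_loss(target, prediction):
    return (target - prediction) ** 2

# Sanity check: the worked example from the lesson.
_, _, _, p = forward(0.6, 0.7)
print(round(p, 3), round(squared_loss(1.0, p), 3))   # 0.563 0.191

# New example from Step 2: run this only after finishing your own trace.
h1, h2, z_out, p = forward(0.2, 0.1)
print("H1 activation:", round(h1, 3))
print("H2 activation:", round(h2, 3))
print("Output sum:   ", round(z_out, 3))
print("Prediction:   ", round(p, 3))
print("Squared loss: ", round(squared_loss(0.0, p), 3))
```

If a printed value differs from your hand trace, recheck the sign on each negative weight and whether ReLU should have clamped a negative sum to 0.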