Machine Learning & Deep Learning


Neurons, Weights, and Biases — The Math

The artificial neuron is the atom of every neural network. It looks simple: multiply some inputs by some numbers, add them together, add one more number, then pass the result through a function. But understanding this operation precisely — knowing exactly what each parameter does and why — is the foundation for understanding everything that comes later: the forward pass, loss, and backpropagation. In this lesson we develop the neuron mathematically, step by step, with real numbers.

The Weighted Sum

Suppose a neuron receives three inputs: x1 = 0.5, x2 = -1.2, x3 = 0.8. Each input has a corresponding weight: w1, w2, w3. The weights are learned parameters, numbers the network adjusts during training. The neuron first computes the weighted sum:

z = w1*x1 + w2*x2 + w3*x3

Let us say w1 = 2.0, w2 = -0.5, w3 = 1.5. Then:

z = (2.0)(0.5) + (-0.5)(-1.2) + (1.5)(0.8) = 1.0 + 0.6 + 1.2 = 2.8

The weight w_i controls how much input x_i influences the neuron. A large positive weight means 'if x_i is large and positive, drive z upward.' A large negative weight means 'if x_i is large and positive, drive z downward.' A weight near zero means 'x_i barely matters to this neuron.'
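The weighted sum worked through above can be reproduced in a few lines of plain Python. This is a minimal sketch, not a library implementation; the variable names (inputs, weights, terms) are our own choices for illustration.

```python
# Inputs and weights from the worked example above.
inputs = [0.5, -1.2, 0.8]
weights = [2.0, -0.5, 1.5]

# Each term w_i * x_i is one input's contribution to the weighted sum.
terms = [w * x for w, x in zip(weights, inputs)]
z = sum(terms)

for i, (w, x, t) in enumerate(zip(weights, inputs, terms), start=1):
    print(f"w{i}*x{i} = ({w})({x}) = {t:.1f}")
print(f"z = {z:.1f}")  # z = 2.8
```

Printing the individual terms makes it easy to see which input pushes z up and by how much. Here all three contributions happen to be positive, because the negative weight w2 multiplies a negative input x2.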

What a Weight Encodes

A weight can be read as a learned relevance score. After training, a large-magnitude weight on an input means that input strongly influences the neuron's output, and the sign of the weight says in which direction. A near-zero weight means the neuron has learned to mostly ignore that input. Together with the biases, the weights hold everything the network has learned.

Now add the bias term b. The full pre-activation value is:

z = w1*x1 + w2*x2 + w3*x3 + b

If b = -1.0, then z = 2.8 + (-1.0) = 1.8. The bias shifts z up or down, independently of the inputs. Without a bias, the neuron's pre-activation is always zero when all inputs are zero (z = 0 no matter the weights). The bias lets the neuron fire (produce a large output) even when the inputs are zero, or stay silent even when inputs are nonzero. It gives the neuron a tunable baseline.

In compact vector notation, if x = [x1, x2, x3]^T and w = [w1, w2, w3]^T, then:

z = w^T x + b

This is a dot product plus a scalar. Strictly speaking it is an affine function of the input: a linear function plus a constant offset.
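To see the bias acting as a pure shift, here is a small sketch of z = w^T x + b in plain Python. The helper names dot and pre_activation are hypothetical, chosen for this lesson only.

```python
def dot(w, x):
    """Dot product w^T x of two equal-length sequences."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    return sum(wi * xi for wi, xi in zip(w, x))


def pre_activation(w, x, b):
    """Return z = w^T x + b."""
    return dot(w, x) + b


x = [0.5, -1.2, 0.8]
w = [2.0, -0.5, 1.5]

z_no_bias = pre_activation(w, x, b=0.0)
z_with_bias = pre_activation(w, x, b=-1.0)
# With all inputs at zero, only the bias is left.
z_at_origin = pre_activation(w, [0.0, 0.0, 0.0], b=-1.0)

print(f"{z_no_bias:.1f}")    # 2.8
print(f"{z_with_bias:.1f}")  # 1.8
print(f"{z_at_origin:.1f}")  # -1.0
```

Changing b moves z by exactly the same amount for every input, which is what "independently of the inputs" means in the text above.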

From z to Activation: The Role of the Activation Function

The value z is called the pre-activation. It is a raw linear combination of the inputs, plus the bias. A linear combination of linear combinations is still linear, so no matter how many layers you stack, a network built only from such sums would collapse to a single linear function. This is why every neuron passes z through a nonlinear activation function sigma:

a = sigma(z)

The output a is called the neuron's activation. Common choices for sigma are ReLU, sigmoid, and tanh (covered in detail in Lesson 3). For now, imagine sigma is the ReLU function: sigma(z) = max(0, z). Then a = max(0, 1.8) = 1.8.

To summarize the full neuron computation:

  1. Compute weighted sum plus bias: z = w^T x + b
  2. Apply activation: a = sigma(z)

That is the entire forward computation of one neuron. A network consists of thousands or millions of these, chained together.
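The two-step computation above fits in one short function. The sketch below assumes ReLU as the activation, as in the running example; the names relu and neuron are hypothetical.

```python
def relu(z):
    """ReLU activation: sigma(z) = max(0, z)."""
    return max(0.0, z)


def neuron(w, x, b, activation=relu):
    """Forward computation of one neuron; returns (z, a)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b  # step 1: weighted sum plus bias
    a = activation(z)                             # step 2: apply activation
    return z, a


w = [2.0, -0.5, 1.5]
x = [0.5, -1.2, 0.8]

z, a = neuron(w, x, b=-1.0)
print(f"z = {z:.1f}, a = {a:.1f}")  # z = 1.8, a = 1.8

# A much more negative bias pushes z below zero, and ReLU clips the output.
z_neg, a_neg = neuron(w, x, b=-4.0)
print(f"z = {z_neg:.1f}, a = {a_neg:.1f}")  # z = -1.2, a = 0.0
```

Returning both z and a is a design choice for inspection. It also mirrors the fact that backpropagation later needs the pre-activation as well as the activation.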

Match each term to its precise definition.

Terms

Weight (w_i)
Bias (b)
Pre-activation (z)
Activation (a)
Dot product (w^T x)

Definitions

Sum of element-wise products of two vectors; compact notation for the weighted sum
The raw weighted sum plus bias, before the activation function is applied
Learned scalar multiplying one input; controls that input's influence
Learned scalar added after the weighted sum; shifts the neuron's baseline activation
The neuron's output after z passes through the nonlinear activation function


Without the Bias, the Model Is Constrained

If every neuron lacked a bias, every neuron's pre-activation would be zero whenever all of its inputs are zero. For a single neuron, this forces the decision boundary z = 0 to pass through the origin of the input space, a severe restriction. In practice, biases are almost always included precisely because they remove this constraint and allow the network to fit a much broader class of functions.
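A one-input neuron makes the constraint concrete. In the sketch below, the task (separating x = 2 from x = 4) and all numeric values are hypothetical, picked only to illustrate the point.

```python
def pre_activation(w, x, b=0.0):
    """z = w*x + b for a neuron with a single input."""
    return w * x + b


# Without a bias, z is zero at x = 0 for every weight, so the sign of z
# can only flip at the origin.
weights_tried = [0.5, 2.0, -4.0]
z_at_origin = [pre_activation(w, 0.0) for w in weights_tried]
print(z_at_origin)  # [0.0, 0.0, 0.0]

# Consequence: no bias-free weight puts x = 2 and x = 4 on opposite sides.
same_side = all(
    (pre_activation(w, 2.0) > 0) == (pre_activation(w, 4.0) > 0)
    for w in weights_tried
)
print(same_side)  # True

# With a bias, the boundary z = 0 moves to x = -b / w.
w, b = 1.0, -3.0
boundary = -b / w
print(boundary)                   # 3.0
print(pre_activation(w, 2.0, b))  # -1.0
print(pre_activation(w, 4.0, b))  # 1.0
```

With the bias in place, the two points fall on opposite sides of the boundary at x = 3, which no choice of weight alone could achieve.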

A neuron has inputs x = [1.0, 0.0, -2.0], weights w = [0.5, 3.0, -1.0], and bias b = 0.5. What is the pre-activation z?

Why would stacking many neurons without any activation function fail to gain expressive power?
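One way to explore this question numerically is to stack two layers with no activation and compare them with a single layer whose parameters are W = W2 W1 and b = W2 b1 + b2. The sketch below uses plain Python lists; the helper names and all matrix values are hypothetical.

```python
def matvec(W, x):
    """Matrix-vector product W x, with W given as a list of rows."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]


def matmul(A, B):
    """Matrix product A B, both given as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def vec_add(u, v):
    return [ui + vi for ui, vi in zip(u, v)]


# Hypothetical parameters: layer 1 maps 2 inputs to 2 units, layer 2 maps 2 to 1.
W1, b1 = [[1.0, -2.0], [0.5, 3.0]], [0.5, -1.0]
W2, b2 = [[2.0, 1.0]], [0.25]


def stacked(x, activation=None):
    """Two layers in sequence, optionally with an activation in between."""
    h = vec_add(matvec(W1, x), b1)
    if activation is not None:
        h = [activation(hi) for hi in h]
    return vec_add(matvec(W2, h), b2)


# The single layer that the activation-free stack is equivalent to.
W = matmul(W2, W1)
b = vec_add(matvec(W2, b1), b2)


def collapsed(x):
    return vec_add(matvec(W, x), b)


x = [1.0, 2.0]
print(stacked(x))    # [0.75]
print(collapsed(x))  # [0.75]

# With ReLU between the layers, the stack no longer matches the single layer.
print(stacked(x, activation=lambda z: max(0.0, z)))  # [5.75]
```

The first two outputs agree for any input, not just this one, because the composition of the two layers expands algebraically into the single layer. Inserting the ReLU breaks that identity.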

Hand-Compute a Neuron

  1. Write down a neuron with two inputs. Choose your own values: x1, x2, w1, w2, b. Make at least one weight negative.
  2. Compute z = w1*x1 + w2*x2 + b by hand. Show every multiplication and addition.
  3. Apply ReLU: a = max(0, z). What is the activation?
  4. Now change the bias so that the activation changes from positive to zero (or zero to positive). How much did you have to change b?
  5. Interpret: what does the bias adjustment tell you about what the bias controls?
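Once you have worked the exercise by hand, a short script like the one below can check your arithmetic. The values here are hypothetical placeholders; substitute your own. The sketch relies on the fact that ReLU switches between zero and positive exactly at z = 0, that is, when b equals the negative of the weighted sum.

```python
def relu(z):
    return max(0.0, z)


# Hypothetical values; replace them with the ones you chose.
x1, x2 = 1.0, 2.0
w1, w2 = 0.5, -1.0
b = 2.0

weighted_sum = w1 * x1 + w2 * x2  # contribution of the inputs alone
z = weighted_sum + b
a = relu(z)
print(f"weighted sum = {weighted_sum}, z = {z}, a = {a}")

# The activation flips where z = 0, i.e. at b = -weighted_sum.
b_flip = -weighted_sum
change_needed = b_flip - b
print(f"activation flips at b = {b_flip} (change b by {change_needed})")
```

With these placeholder values the weighted sum is -1.5, so z = 0.5 and a = 0.5. Lowering b by 0.5, to 1.5, brings the activation down to zero.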