Neurons, Weights, and Biases
In the previous lesson you saw that a perceptron applies a threshold to a weighted sum of its inputs. Modern neural networks refine that picture slightly — they replace the hard threshold with a smooth activation function and add a bias term to the weighted sum. These may sound like minor tweaks, but the bias especially changes what a neuron can express. This lesson focuses entirely on one neuron's arithmetic: what it computes, what its parameters mean, and why the bias term matters. Nail this, and every subsequent lesson becomes cleaner.
The Neuron Equation
An artificial neuron receives n inputs x₁, x₂, …, xₙ and has n corresponding weights w₁, w₂, …, wₙ plus a single bias b. Its pre-activation value, sometimes called z or the net input, is:

z = w₁x₁ + w₂x₂ + … + wₙxₙ + b

In compact vector notation this is z = w · x + b, where the dot (·) denotes the dot product. The neuron then passes z through an activation function f to produce its output:

a = f(z)

You will study activation functions in the next lesson. For now, imagine f is the identity (f(z) = z), so that a = z: a plain weighted sum plus a bias.

Concrete example. A neuron has three inputs:

x₁ = 2.0, w₁ = 0.4
x₂ = –1.5, w₂ = 1.0
x₃ = 3.0, w₃ = –0.2
bias b = 0.5

Step 1: multiply and sum.

w₁x₁ = 0.4 × 2.0 = 0.80
w₂x₂ = 1.0 × (–1.5) = –1.50
w₃x₃ = –0.2 × 3.0 = –0.60
Sum of weighted inputs: 0.80 – 1.50 – 0.60 = –1.30

Step 2: add the bias.

z = –1.30 + 0.50 = –0.80

The neuron's pre-activation value is –0.80. The activation function then maps this to the final output.
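The same two-step arithmetic can be sketched in a few lines of Python. This is a minimal illustration, assuming inputs and weights are plain lists of floats; the function names `pre_activation` and `identity` are hypothetical, not taken from any library.

```python
from typing import Sequence


def pre_activation(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Hypothetical helper: return z = w · x + b for a single neuron."""
    if len(x) != len(w):
        raise ValueError("need exactly one weight per input")
    # Step 1: multiply each weight by its input and sum the products.
    weighted_sum = sum(wi * xi for wi, xi in zip(w, x))
    # Step 2: add the bias once; it does not multiply any input.
    return weighted_sum + b


def identity(z: float) -> float:
    """Hypothetical placeholder activation f(z) = z, as assumed in this lesson."""
    return z


x = [2.0, -1.5, 3.0]
w = [0.4, 1.0, -0.2]
b = 0.5

z = pre_activation(x, w, b)
a = identity(z)
print(f"z = {z:.2f}, a = {a:.2f}")  # z = -0.80, a = -0.80
```

Floating-point rounding means z is stored as a value extremely close to –0.8 rather than exactly –0.8, which is why the print formats it to two decimals.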
A weight wᵢ controls how much the neuron's output depends on input xᵢ. A large positive weight means the neuron is strongly activated when xᵢ is large and positive. A large negative weight means the neuron is suppressed by large positive values of xᵢ. A weight near zero means the neuron barely notices that input. During training, adjusting weights is exactly how the network 'learns' which inputs matter and in what direction.
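One way to see what a weight measures: raise a single input by 1 while holding the others fixed, and z changes by exactly that input's weight. The sketch below assumes the identity activation and reuses the example neuron; `pre_activation` is a hypothetical helper defined inline.

```python
def pre_activation(x, w, b):
    # Hypothetical helper: z = w · x + b
    return sum(wi * xi for wi, xi in zip(w, x)) + b


x = [2.0, -1.5, 3.0]
w = [0.4, 1.0, -0.2]
b = 0.5
base = pre_activation(x, w, b)

deltas = []
for i in range(len(x)):
    nudged = list(x)
    nudged[i] += 1.0  # increase only input i by one unit
    delta = pre_activation(nudged, w, b) - base
    deltas.append(delta)
    # The change equals w_i: its sign says which direction z moves,
    # and its magnitude says how strongly z depends on input i.
    print(f"x{i + 1} += 1 -> z changes by {delta:+.2f} (w{i + 1} = {w[i]})")
```

Because z is linear in each input, this holds for any starting values of x, not just the ones chosen here.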
The bias b is a learnable constant. It does not multiply any input; it simply shifts the neuron's pre-activation value up or down. Why does that matter? Without a bias, an all-zero input always produces z = 0 whatever the weights are, and a neuron whose weights are all zero outputs exactly zero for every input. The bias gives the neuron a 'resting activation': a baseline value that the weighted inputs push above or below.

More formally: without a bias, the neuron's decision boundary (the set of inputs that produce z = 0) always passes through the origin. With a bias, the boundary can be positioned anywhere in input space. This extra degree of freedom makes networks more expressive and easier to train.

Extending the example. Suppose you want the neuron above (z = –0.80 with b = 0.5) to produce a positive pre-activation on the same inputs. Increasing b from 0.5 to 2.0 gives:

z = –1.30 + 2.0 = 0.70

Same weights, same inputs: the bias alone shifted z from negative to positive.
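The bias effects described above can be checked numerically. This is a minimal sketch assuming the identity activation and the example weights; `pre_activation` is a hypothetical helper, and the all-zero input stands in for "the origin" of input space.

```python
def pre_activation(x, w, b):
    # Hypothetical helper: z = w · x + b
    return sum(wi * xi for wi, xi in zip(w, x)) + b


x = [2.0, -1.5, 3.0]
w = [0.4, 1.0, -0.2]

# Same weights, same inputs; only the bias changes.
z_small_bias = pre_activation(x, w, b=0.5)
z_large_bias = pre_activation(x, w, b=2.0)
print(f"b = 0.5 -> z = {z_small_bias:.2f}")  # b = 0.5 -> z = -0.80
print(f"b = 2.0 -> z = {z_large_bias:.2f}")  # b = 2.0 -> z = 0.70

# With no bias, the origin always lands on the boundary z = 0.
origin = [0.0, 0.0, 0.0]
z_origin_no_bias = pre_activation(origin, w, b=0.0)
# With a bias, z at the origin equals b, so the boundary no longer passes through it.
z_origin_with_bias = pre_activation(origin, w, b=2.0)

# With all weights at zero, the neuron outputs its bias for any input.
z_zero_weights = pre_activation(x, [0.0, 0.0, 0.0], b=2.0)
print(z_origin_no_bias, z_origin_with_bias, z_zero_weights)  # 0.0 2.0 2.0
```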
Counting Parameters and Why It Matters
Each neuron in a layer has its own set of parameters: n weights (one per input) plus one bias, for a total of n + 1 learnable values per neuron. If a layer has m neurons, each receiving n inputs, the layer has m × (n + 1) parameters.

Example: a layer with 512 neurons, each receiving 784 inputs (a 28×28 image flattened to a vector), has:

512 × (784 + 1) = 512 × 785 = 401,920 parameters

Modern deep learning models often have millions or even billions of such parameters. Training is the process of finding values for all of them such that the network performs well. This scale is why efficient hardware and careful optimization algorithms are essential: you cannot tune 10⁹ parameters by hand.

Every parameter is a degree of freedom in the model. More parameters can represent more complex functions, but they also raise the risk of overfitting, that is, memorizing the training data rather than learning generalizable patterns. This tension between capacity and generalization runs through all of machine learning.
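The counting rule m × (n + 1) is easy to turn into a helper. The sketch below assumes fully connected layers where every neuron sees every input; the function name `dense_layer_params` and the two-layer 784 → 512 → 10 stack in the usage lines are hypothetical examples, not an architecture from this lesson.

```python
def dense_layer_params(n_inputs: int, n_neurons: int) -> int:
    """Hypothetical helper: learnable values in a fully connected layer, m × (n + 1)."""
    weights = n_neurons * n_inputs  # one weight per (neuron, input) pair
    biases = n_neurons              # one bias per neuron
    return weights + biases


layer = dense_layer_params(784, 512)
print(f"{layer:,}")  # 401,920

# Hypothetical two-layer stack: 784 inputs -> 512 neurons -> 10 neurons.
total = dense_layer_params(784, 512) + dense_layer_params(512, 10)
print(f"{total:,}")  # 407,050
```

If PyTorch is installed, `sum(p.numel() for p in torch.nn.Linear(784, 512).parameters())` should report the same 401,920, since that layer stores a 512 × 784 weight matrix plus a 512-element bias by default.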
It is tempting to think that a model with more parameters is 'smarter.' This is misleading. Parameters are numbers initialized roughly at random; they become meaningful only after training on data. A poorly designed or poorly trained large model can be dramatically outperformed by a smaller, well-designed one. Scale matters — but architecture and training quality matter just as much.
A neuron receives inputs x₁ = 1.0, x₂ = 2.0, with weights w₁ = –0.5, w₂ = 0.3, and bias b = 1.0. What is z?
What is the primary role of the bias term in a neuron?
Build a Neuron by Hand
- Choose your own inputs and weights for a three-input neuron.
- Write down your inputs x₁, x₂, x₃ (any numbers you like, mix of positive and negative).
- Choose weights w₁, w₂, w₃ and a bias b.
- Compute z step by step: multiply each weight by its input, sum the results, then add the bias. Show every multiplication and the running total.
- Now double the bias. Recompute z. How much did z change?
- Finally, set all weights to zero and see what happens to z. What does this reveal about the role of the bias when weights are zero?
- Write two sentences interpreting your results.
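After working the exercise by hand, you could check each step with a short Python sketch like the one below. It assumes the identity activation; the function name `explain_neuron` is hypothetical, and the input, weight, and bias values are arbitrary placeholders to replace with your own.

```python
def explain_neuron(x, w, b):
    """Hypothetical helper: print each product and running total, then return z."""
    running = 0.0
    for i, (xi, wi) in enumerate(zip(x, w), start=1):
        product = wi * xi
        running += product
        print(f"w{i}x{i} = {wi} × {xi} = {product:.2f}   running total = {running:.2f}")
    z = running + b
    print(f"add bias {b}: z = {z:.2f}")
    return z


# Placeholder values: substitute the numbers you chose.
x = [1.0, -2.0, 0.5]
w = [0.3, 0.8, -1.2]
b = 0.4

z = explain_neuron(x, w, b)                           # original neuron
z_double_bias = explain_neuron(x, w, 2 * b)           # bias doubled
z_zero_weights = explain_neuron(x, [0.0, 0.0, 0.0], b)  # all weights zeroed
print(f"change from doubling b: {z_double_bias - z:.2f}")
```

Comparing the printed running totals with your handwritten ones is a quick way to catch sign errors in the multiplications.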