Lesson 5 of 13

Neural networks: stacking simple parts

Explain how simple units in layers learn complex patterns.

01 · Learn · the idea

One neuron is one small decision

You already built a model that was a formula with adjustable numbers: price = size × A + B. It took an input, multiplied it by a weight, added a bias, and out came a number. Simple, and it drew a straight line.

An artificial neuron is that exact idea, plus one new part: a switch. Take the inputs, multiply each by its own weight, add them up, add a bias, and then feed the total through the switch. If the total clears a threshold, the neuron fires: it sends out a strong signal. If it falls short, the neuron stays quiet. That’s it. A weighted sum, then a yes-or-no.

The switch is the whole difference. price = size × A + B was a neuron with no switch — it just passed its total straight out. Add a switch and the unit stops giving a smooth number and starts making a decision: is this thing here or not?
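The weighted-sum-then-switch idea fits in a few lines of Python. This is a minimal sketch, not a library API: the function names `weighted_total` and `neuron`, the threshold of 0, and the example numbers are all assumptions made for illustration.

```python
def weighted_total(inputs, weights, bias):
    """Multiply each input by its own weight, add them up, add the bias."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias


def neuron(inputs, weights, bias):
    """Weighted sum, then a switch: 1 (fire) if the total is above 0, else 0."""
    return 1 if weighted_total(inputs, weights, bias) > 0 else 0


# Illustrative numbers only: size = 50, A = 2.0, B = 10.0.
# With no switch you get a smooth number, as in price = size * A + B.
price = weighted_total([50], [2.0], 10.0)

# With the switch, the same kind of total becomes a yes-or-no.
# Here the total is 50 * 2.0 - 120.0 = -20.0, so the neuron stays quiet.
decision = neuron([50], [2.0], -120.0)

print(price, decision)
```

The only difference between the two functions is the last step: `weighted_total` hands its number straight out, while `neuron` turns it into fire or quiet.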

A worked example: a corner detector

Let’s build one neuron whose job is to answer a tiny question: is there a corner here?

A corner is where a vertical edge and a horizontal edge meet. So we give our neuron two inputs:

x1 — is there a vertical edge here? (1 = yes, 0 = no)
x2 — is there a horizontal edge here? (1 = yes, 0 = no)

We set its weights and bias: w1 = 0.7, w2 = 0.7, bias = −1. The switch rule: fire if the total is above 0.

Now walk three cases through it.

Both edges present (vertical and horizontal): total = 0.7 × 1 + 0.7 × 1 − 1 = 1.4 − 1 = 0.4. That’s above 0 → the neuron fires → “a corner is here.” Correct — two edges meeting is exactly a corner.

Only a vertical edge (no horizontal): total = 0.7 × 1 + 0.7 × 0 − 1 = 0.7 − 1 = −0.3. Below 0 → the neuron stays quiet → “no corner.” Correct: a single edge running down the page is not a corner. Because the two weights are equal, a horizontal edge on its own gives the same −0.3.

Neither edge present: total = 0.7 × 0 + 0.7 × 0 − 1 = −1. Well below 0 → quiet → “no corner.” Correct again.

Look at what happened. With nothing more than its two weights and its bias, this one neuron encodes the rule corner = vertical AND horizontal. Nobody wrote that rule in words. It’s baked into three numbers. Change the bias to −0.5 and a single edge would give 0.7 − 0.5 = 0.2, which is above 0, so the neuron would fire on one edge too: a different, worse detector. The behaviour lives in the weights and the bias.
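You can check the three cases yourself with a short Python sketch. The function name `corner_neuron` is made up for this example; the weights, bias, and firing rule are the ones used above.

```python
def corner_neuron(x1, x2, w1=0.7, w2=0.7, bias=-1.0):
    """Return (total, fires) for vertical-edge input x1 and horizontal-edge input x2."""
    total = w1 * x1 + w2 * x2 + bias
    return total, total > 0  # the switch: fire if the total is above 0


cases = {
    "both edges": (1, 1),
    "vertical only": (1, 0),
    "neither edge": (0, 0),
}

for name, (x1, x2) in cases.items():
    total, fires = corner_neuron(x1, x2)
    print(f"{name}: total = {total:+.1f}, fires = {fires}")

# Same weights, but the bias moved from -1 to -0.5:
# a single edge now clears 0, so the detector is no longer an AND.
loose_total, loose_fires = corner_neuron(1, 0, bias=-0.5)
print(f"vertical only, bias -0.5: total = {loose_total:+.1f}, fires = {loose_fires}")
```

The totals are rounded to one decimal place when printed, because computers store numbers like 0.7 only approximately.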

Stacking: layers on top of layers

One neuron makes one small decision. The power comes from lining them up.

Put several neurons side by side and you have a layer. Now take the outputs of that layer and feed them in as the inputs to the next layer of neurons. Then feed those outputs into the layer after that. That chain of layers is a neural network — nothing more than neurons whose outputs become other neurons’ inputs.
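Here is one way to sketch "outputs become inputs" in Python. The names `neuron`, `layer`, `network`, and `EDGE_LAYERS` are hypothetical, and the weights are hand-set for illustration rather than learned. The tiny network answers a question one corner-style neuron cannot: is there an edge here that is not a corner?

```python
def neuron(inputs, weights, bias):
    """Weighted sum, then the switch: 1 if the total is above 0, else 0."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0


def layer(inputs, neurons):
    """A layer is several neurons side by side, all reading the same inputs."""
    return [neuron(inputs, weights, bias) for weights, bias in neurons]


def network(inputs, layers):
    """Feed each layer's outputs in as the next layer's inputs."""
    signal = inputs
    for neurons in layers:
        signal = layer(signal, neurons)
    return signal


# Hand-set (weights, bias) pairs, chosen for this example only.
EDGE_LAYERS = [
    # Layer 1: two neurons looking at the same two edge inputs.
    [([0.7, 0.7], -0.5),    # fires if at least one edge is present
     ([0.7, 0.7], -1.0)],   # fires only if both are present (the corner detector)
    # Layer 2: one neuron whose inputs are the two outputs above.
    [([1.0, -1.0], -0.5)],  # fires for "some edge, but not a corner"
]

for edges in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(edges, network(list(edges), EDGE_LAYERS))
```

The second layer never sees the raw edges. It only sees what the first layer decided, which is the whole idea of stacking.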

Depth is what makes this worth doing, because each layer builds on the one below. Picture the network looking at a photo:

Layer 1 neurons detect the simplest things — a short edge here, a light-to-dark boundary there.
The next layer takes those edges as inputs and combines them into corners and simple shapes — like our corner detector, but learned, not hand-set.
The next combines shapes into parts: an eye, a wheel, a leaf.
The last combines parts into whole objects: a face, a car, a tree.

No layer is told what to detect. The training loop from earlier — guess, check the error, nudge every weight downhill, repeat — tunes all the weights across all the layers at once. Useful features emerge on their own, because features that help lower the error survive, and ones that don’t get nudged away.
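The guess, check, nudge loop can be sketched on the simplest case, the straight-line model price = size × A + B, where there are only two numbers to tune. Everything specific here is an assumption for illustration: the four data points are made up to follow price = size × 2 + 1 exactly, and the step size and number of rounds are arbitrary choices. A network runs the same kind of loop over every weight in every layer.

```python
# Made-up training data that follows price = size * 2 + 1 exactly.
sizes = [1.0, 2.0, 3.0, 4.0]
prices = [3.0, 5.0, 7.0, 9.0]

A, B = 0.0, 0.0   # start from arbitrary values
step = 0.05       # how big each nudge is (an assumed value)


def mean_squared_error(a, b):
    """Average squared gap between the model's guesses and the true prices."""
    return sum((s * a + b - p) ** 2 for s, p in zip(sizes, prices)) / len(sizes)


start_error = mean_squared_error(A, B)

for _ in range(2000):
    # Guess, then check the error on every example.
    errors = [(s * A + B) - p for s, p in zip(sizes, prices)]
    # Slope of the mean squared error with respect to A and to B.
    grad_A = 2 * sum(e * s for e, s in zip(errors, sizes)) / len(sizes)
    grad_B = 2 * sum(errors) / len(sizes)
    # Nudge both numbers a little way downhill.
    A -= step * grad_A
    B -= step * grad_B

end_error = mean_squared_error(A, B)
print(f"A = {A:.3f}, B = {B:.3f}, error {start_error:.3f} -> {end_error:.6f}")
```

Nothing in the loop is told that A should end near 2 and B near 1. Those values are simply where the error is lowest.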

Why depth buys you a shape a line can’t

Here’s the punchline that connects back to your straight-line model.

A single straight-line model can only draw a straight boundary. Ask it to separate two groups and it puts down one clean cut. That’s fine when the groups sit on either side of a line. But real patterns curve and tangle — a cluster of one thing wrapped inside a ring of another. No straight line can wrap around a centre. Draw any line you like across a dartboard: it never separates the bullseye from the rim.

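Stacking layers breaks that limit, and the switch is what makes it work: chain plain straight-line formulas with no switch between them and they collapse back into one straight-line formula. With switches in place, the layers combine the inputs into new quantities the raw inputs never named: “how far from the middle?”, “which corner?”, “how bright, and where?”. With those, the network can bend and carve a boundary that curves, closes, wraps. That’s the payoff of depth: not one straight cut, but a shape.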
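To see a closed boundary come out of straight cuts, here is a hand-set sketch in Python. All names, weights, and sample points are hypothetical, chosen only for this example. Four hidden neurons each make one straight cut, and one output neuron fires only when all four agree. The result is a square around the centre rather than a smooth curve, and a trained network would find its own cuts, but the principle is the same.

```python
def fires(total):
    """The switch: 1 if the total is above 0, else 0."""
    return 1 if total > 0 else 0


def hidden_layer(x, y):
    """Four neurons, each one straight cut across the board."""
    return [
        fires(-1.0 * x + 1.0),  # fires when x is below 1
        fires(1.0 * x + 1.0),   # fires when x is above -1
        fires(-1.0 * y + 1.0),  # fires when y is below 1
        fires(1.0 * y + 1.0),   # fires when y is above -1
    ]


def inside_centre(x, y):
    """Output neuron: weights of 1 on each hidden output, bias -3.5,
    so it fires only when all four hidden neurons fire."""
    return fires(sum(hidden_layer(x, y)) - 3.5)


# Made-up sample points: a centre cluster and a ring around it.
centre_points = [(0.0, 0.0), (0.5, -0.5), (-0.8, 0.3)]
ring_points = [(3.0, 0.0), (0.0, 3.0), (-3.0, 0.0), (0.0, -3.0), (2.1, 2.1)]

print([inside_centre(x, y) for x, y in centre_points])
print([inside_centre(x, y) for x, y in ring_points])
```

No single one of the four cuts separates the centre from the ring. The closed shape exists only once the second layer combines them.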

And it’s worth keeping the word “neuron” at arm’s length. It’s a loose analogy. This is a math pipeline (multiply, add, switch, pass along), not a brain, not a spark of understanding. It doesn’t know what a corner or a face is. It has weights that make its neurons fire on the right patterns.

On the whole

A face-recogniser is a stack of switches. Each one asks a dumb yes-or-no question about weighted inputs. Alone, each is almost nothing. Layered, they turn edges into corners, corners into eyes, eyes into faces — and the only thing that made the useful ones useful was a blind loop nudging numbers to lower one error.

That’s the machine you’re standing inside every time something recognises you. Not a mind reading meaning, but layers of small decisions stacked until a shape appears. Seeing it as switches and weights, rather than as a knowing thing, is the honest picture — and it makes it easier to guess where such a system will be sharp and where it will be strangely, confidently wrong. We’re pattern-stacks too, in our own way; a little humility about both is the right place to stand.

02 · Try · the lab

03 · Check · quick quiz

1. A single artificial neuron has two inputs. It multiplies each input by a weight, adds them, adds a bias — and then does one more thing that a plain price = size × A + B formula does not. What is it?

It passes the total through a switch: fire if the total clears a threshold, stay quiet if not
It looks the answer up in a stored table of past inputs
It averages the two inputs to smooth out noise
It asks a nearby neuron what the answer should be

Answer

It passes the total through a switch: fire if the total clears a threshold, stay quiet if not — The neuron is the same weighted sum plus a bias, with a switch (an activation) on the end. If the total clears a threshold it fires; if not it stays quiet. That yes-or-no step is exactly what a plain straight-line formula lacks — the formula just passes its total straight out.

2. A corner detector neuron uses weights w1 = 0.7 (vertical edge) and w2 = 0.7 (horizontal edge) and bias = −1, firing when the total is above 0. A spot has a vertical edge but no horizontal edge. Does it fire?

Yes — any edge at all pushes it over the threshold
No — the total is 0.7 × 1 + 0.7 × 0 − 1 = −0.3, which is below 0, so it stays quiet
Yes — the total is 0.4, which is above 0
It cannot be decided without knowing the other neurons

Answer

No — the total is 0.7 × 1 + 0.7 × 0 − 1 = −0.3, which is below 0, so it stays quiet — 0.7 × 1 + 0.7 × 0 − 1 = 0.7 − 1 = −0.3. That is below the threshold of 0, so the neuron stays quiet: one edge alone is not a corner. Only when both edges are present (0.7 + 0.7 − 1 = 0.4, above 0) does it fire. The rule 'corner = vertical AND horizontal' lives entirely in those three numbers.

3. In a deep network looking at photos, the first layer detects short edges and light-to-dark boundaries. How do the later layers come to detect whole faces, when no one told any layer what to look for?

A programmer hand-writes the face rule into the final layer
The last layer downloads a library of face templates to match against
Each layer takes the layer below as its inputs, so edges get combined into shapes, shapes into parts, parts into faces — and the training loop tunes every weight until useful features emerge
Later layers are copies of the first layer, so they detect the same edges more times

Answer

Each layer takes the layer below as its inputs, so edges get combined into shapes, shapes into parts, parts into faces — and the training loop tunes every weight until useful features emerge — The outputs of one layer become the inputs of the next, so features build up: edges → corners and shapes → parts → whole objects. No layer is told its job. The guess-check-nudge loop from earlier tunes all the weights at once, and features that help lower the error survive while useless ones fade.

4. Two groups of dots form a target: one class clustered at the centre, the other class in a ring around it. A single straight-line model tops out near 55% accuracy, but adding a hidden layer reaches ~100%. Why?

The hidden layer secretly memorises each dot's correct label
Straight lines are slower, so they run out of time before finishing
The hidden layer adds more dots so the groups become easier to split
A straight line can only make one flat cut and can't wrap a centre, but stacked layers combine the inputs into a curved boundary that can close around the middle

Answer

A straight line can only make one flat cut and can't wrap a centre, but stacked layers combine the inputs into a curved boundary that can close around the middle — No straight line can separate a centre cluster from a ring around it — one cut always leaves part of the ring on the same side as the centre. Stacking layers lets the network combine the inputs into a new quantity, like distance from the middle, and carve a curved boundary that wraps the centre. That is the payoff of depth: a shape a line can't draw.