Lesson 11 of 13
Confidence and the black box
Read a model's confidence correctly, and explain why its decisions are hard to explain.
01 · Learn · the idea
A classifier looks at a photo and prints one line: cat 91%. That number feels like a promise. Ninety-one out of a hundred — surely it’s almost certainly a cat, and surely the machine has some reason for saying so. Both feelings are wrong. The 91% is not a probability of truth, and there is no reason underneath it that anyone can put into a sentence. Two things worth understanding sit inside that one small number.
What a confidence number actually is
When the model shows “cat 91%”, it is not reporting the odds that the animal is really a cat. It is reporting how strongly the input lit up the cat pattern it built during training, compared to every other pattern it knows. It’s a match score, not a measurement of reality.
Here’s a detail that gives the game away. The scores across the options are forced to add up to 100%. Cat, dog, fox — the machine spreads a fixed hundred points across whatever labels it was trained on, and hands the most to the closest match. So it always names a most-likely option. Show it a photo of a shoe and it will still answer with a cat-or-dog-or-fox split summing to 100, because it has no box for “none of these” and no way to say “I’ve never seen anything like this.” The number is a ranking of its own patterns, dressed up to look like certainty.
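The "fixed hundred points" behaviour can be sketched in a few lines. This is a minimal illustration, assuming the percentages come from a softmax over raw per-label scores, which is a common choice but not something this lesson states about any particular model. The function name `softmax_percent` and all the raw numbers are made up for the example.

```python
import math

def softmax_percent(raw_scores):
    """Turn raw per-label scores into percentages that sum to 100."""
    # Subtracting the largest score first avoids overflow and does not change the result.
    top = max(raw_scores.values())
    exps = {label: math.exp(s - top) for label, s in raw_scores.items()}
    total = sum(exps.values())
    return {label: 100.0 * e / total for label, e in exps.items()}

# Hypothetical raw scores for a clear cat photo.
cat_photo = softmax_percent({"cat": 4.0, "dog": 1.3, "fox": 0.6})

# Hypothetical raw scores for a shoe: every pattern matches weakly,
# but the hundred points still have to be shared out.
shoe_photo = softmax_percent({"cat": 0.4, "dog": 0.1, "fox": -0.2})

for name, scores in [("cat photo", cat_photo), ("shoe photo", shoe_photo)]:
    print(name, {label: round(p, 1) for label, p in scores.items()})
```

With these made-up inputs the cat photo lands at roughly 91, 6 and 3. The shoe still gets a full cat-dog-fox split that sums to 100, with "cat" on top simply because it was the least weak match.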
And because the score is only a pattern match, it carries none of the caution a person would. Back in the item on confident errors, we saw fluency and truth come apart. This is the same split, wearing a percentage. A high score can sit squarely on a wrong answer. Ninety-one percent means “strongly matches what I saw before” and nothing more.
Why the number is brittle
If the score is a pattern match, then it moves whenever the input drifts away from the patterns the model was trained on. And it doesn’t move gently. Blur the photo. Cover part of it. Sprinkle in specks of noise a person would barely register. Small nudges like these can swing the score hard, because the machine was never shown that exact roughed-up version and has no steady sense of the animal underneath. It has only the surface pattern, which you just disturbed.
A worked example. An image classifier is shown a clear photo of a cat. It outputs:
- cat 91% · dog 6% · fox 3% (they sum to 100)
It looks certain. Now add a light blur and a small patch over one ear — a change most people would shrug at. The very same model now outputs:
- dog 74% · cat 21% · fox 5%
Nothing about the animal changed. A tiny push outside its training patterns flipped a confident “91% cat” into a confident “74% dog.” The number stayed high; the answer became wrong. That is the whole warning: read confidence as “how strongly this matches what I saw before,” not “how sure I am of the truth.” A brittle score can be loud and still be lost.
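The flip can be reproduced in a toy setting. The sketch below is not the classifier from the worked example: it is a hand-built model with invented weights, and the nudge is chosen deliberately to push every pixel in the unlucky direction rather than being a real blur or patch. It assumes the percentages come from a softmax over weighted sums. The point it illustrates is only that many small per-pixel changes can add up to a large change in the score.

```python
import math

LABELS = ["cat", "dog", "fox"]
N_PIXELS = 1000

# Hypothetical weights: "cat" likes bright even pixels, "dog" likes bright
# odd pixels, "fox" is indifferent. Real models learn far messier patterns.
WEIGHTS = {
    "cat": [0.1 if i % 2 == 0 else -0.1 for i in range(N_PIXELS)],
    "dog": [-0.1 if i % 2 == 0 else 0.1 for i in range(N_PIXELS)],
    "fox": [0.0] * N_PIXELS,
}

def classify(pixels):
    """Return a percentage per label: weighted sums pushed through a softmax."""
    raw = {lab: sum(w * p for w, p in zip(WEIGHTS[lab], pixels)) for lab in LABELS}
    top = max(raw.values())
    exps = {lab: math.exp(s - top) for lab, s in raw.items()}
    total = sum(exps.values())
    return {lab: 100.0 * e / total for lab, e in exps.items()}

# "Clear photo": even pixels slightly brighter than odd ones (0-to-1 scale).
clear = [0.54 if i % 2 == 0 else 0.46 for i in range(N_PIXELS)]

# "Nudged photo": each pixel moves by only 0.06, but all against the cat pattern.
nudged = [p - 0.06 if i % 2 == 0 else p + 0.06 for i, p in enumerate(clear)]

before = classify(clear)
after = classify(nudged)
print("clear: ", {lab: round(p, 1) for lab, p in before.items()})
print("nudged:", {lab: round(p, 1) for lab, p in after.items()})
```

In this toy the top label moves from "cat" to "dog" while the top score stays well above 80 in both cases, which is the same loud-but-lost pattern as in the worked example.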
The black box: no reason underneath
Now ask the obvious question. Why did you decide dog? And there is no plain answer to give.
The decision is the end of an enormous chain of tiny multiplications and additions — millions of tuned numbers, passed through layer after layer, as we saw when we built up the training loop and the layered net. No single number in there is “the reason.” There is no rule, no clause, no because. The output is simply what all those weighted sums happened to total on this particular input.
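To make "weighted sums, layer after layer" concrete, here is a minimal sketch of a two-layer pass with a handful of invented numbers standing in for the millions in a real model. The helper `layer`, the weights, and the choice to zero out negative values between layers (a common step often called ReLU) are all assumptions for illustration, not details taken from this course's earlier lessons.

```python
def layer(inputs, weights, biases, activate=True):
    """One layer: each output is a bias plus a weighted sum of the inputs."""
    outputs = []
    for row, bias in zip(weights, biases):
        total = bias + sum(w * x for w, x in zip(row, inputs))
        # Optionally zero out negative totals before passing them on.
        outputs.append(max(0.0, total) if activate else total)
    return outputs

# Hypothetical tuned numbers; a real model has millions of these.
W1 = [[0.5, -1.0, 0.25], [-0.75, 0.5, 1.0]]
B1 = [0.1, -0.2]
W2 = [[1.5, -0.5], [-1.0, 2.0]]
B2 = [0.0, 0.3]

pixels = [0.8, 0.2, 0.6]                        # a three-"pixel" input
hidden = layer(pixels, W1, B1)                  # first layer of sums
scores = layer(hidden, W2, B2, activate=False)  # second layer: raw label scores

print("hidden:", hidden)
print("scores:", scores)
```

Every value printed here is just the total of some multiplications and additions. Nothing in `hidden` or `scores` reads as a rule or a "because", and scaling the lists up to millions of entries does not change that.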
This is why people call it a black box. You can see the input. You can see the output. The middle is opaque — not because it’s hidden from you on purpose, but because there is no readable sentence in there to hide. Even the people who built it can open up the weights, stare at every one, and still not tell you, in words, why this photo got this answer. There are tools that can hint at which parts of an input mattered most — which pixels tugged the score toward “dog.” But a hint about what mattered is not the same as a clean reason you could read out and check. Usually, that reason simply isn’t available.
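One simple family of such hinting tools covers up part of the input and watches how far the score moves, an approach often called occlusion sensitivity. The sketch below assumes a stand-in scoring function, `dog_score`, with invented weights, and a hypothetical helper, `occlusion_hints`. It is not the method of any particular library.

```python
def dog_score(pixels):
    """Stand-in for a model: returns one raw 'dog' score for a list of pixels."""
    weights = [0.2, -0.1, 1.5, 0.05, -0.8]  # hypothetical tuned numbers
    return sum(w * p for w, p in zip(weights, pixels))

def occlusion_hints(score_fn, pixels, fill=0.0):
    """For each pixel, report how much the score drops when that pixel is blanked."""
    baseline = score_fn(pixels)
    hints = []
    for i in range(len(pixels)):
        covered = list(pixels)
        covered[i] = fill          # cover one pixel at a time
        hints.append(baseline - score_fn(covered))
    return hints

photo = [0.9, 0.4, 0.7, 0.5, 0.3]
hints = occlusion_hints(dog_score, photo)

# The pixel whose removal changes the score the most, in either direction.
most_influential = max(range(len(hints)), key=lambda i: abs(hints[i]))
print("hints:", [round(h, 3) for h in hints])
print("most influential pixel:", most_influential)
```

The output says which pixel tugged hardest on the score. It does not say why that pixel should mean "dog", which is the gap between a hint and a reason.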
Why this matters
Put the two together and you have the shape of the risk. A confidence number is a brittle pattern-match score, not a promise of truth. And the decision behind it comes with no explanation anyone can give. For a low-stakes guess — sorting holiday snaps, suggesting the next word — that’s fine. Nobody gets hurt if “cat 91%” was really a small dog.
But the danger arrives when a confident score with no readable reason is handed to a decision that people take on trust: who gets the loan, which scan gets a closer look, whether a car brakes. There the missing “because” stops being a curiosity and becomes the whole problem. You can’t argue with a number that won’t explain itself, and you can’t easily catch it when it is confidently wrong in the brittle way described above.
On the whole, the honest way to read a machine’s confidence is to shrink it back to what it is: a score saying this input resembles things it saw before, printed by a system that cannot tell you why. That’s a useful thing and a limited thing at the same time. Knowing the difference — trusting the loud number a little less, and asking for the reason a little more — is most of what it takes to use these machines without being fooled by them.
02 · Try · the lab
03 · Check · quick quiz
1. A classifier prints "cat 91%" for a photo. What does that 91% actually mean?
- There is a guaranteed 91% chance the animal is really a cat
- The input strongly matches the model's cat pattern, more than any other pattern it knows
- The model checked 91 out of 100 known cats and found a match
- The model is 91% finished analysing the image
Answer
The input strongly matches the model's cat pattern, more than any other pattern it knows — The number is a pattern-match score: how strongly the input lit up the cat pattern versus the others. It is not a promise about reality, and it is not understanding. The scores are made to add up to 100%, so the machine always names a most-likely option — even for an input that is really none of them.
2. A clear cat photo scores "cat 91%". You add a light blur and cover one ear — changes a person barely notices — and the same model now says "dog 74%". What does this show?
- The model improved and correctly spotted a dog it missed before
- The animal in the photo must have changed
- The score is a brittle pattern match: a small nudge outside its training patterns can flip the answer while keeping confidence high
- The model ran low on power and guessed randomly
Answer
The score is a brittle pattern match: a small nudge outside its training patterns can flip the answer while keeping confidence high — Nothing about the animal changed. Because the score only measures how well the input matches patterns it saw before, a small push outside those patterns can swing it hard — flipping a confident 91% cat into a confident, wrong 74% dog. A loud number can still be lost.
3. You ask the model, "Why did you decide dog?" What is the honest answer?
- It weighed the evidence and concluded the snout shape ruled out a cat
- There is no plain reason to give — the decision is the total of millions of tuned numbers across many layers, with no readable sentence underneath
- It followed a written rule that says covered ears mean dog
- It compared the photo to a stored database of dog pictures and found the closest one
Answer
There is no plain reason to give — the decision is the total of millions of tuned numbers across many layers, with no readable sentence underneath — The decision is the end of an enormous chain of weighted sums; no single number is 'the reason' and there is no rule or clause inside. Even the builders can inspect every weight and still not say, in words, why this input got this answer. That is what 'black box' means.
4. Given both ideas — confidence is a brittle score, and there is no readable reason — where is this combination most dangerous?
- Sorting your holiday photos into rough albums
- Suggesting the next word as you type a message
- A high-stakes decision taken on trust, like a loan or a medical scan, where you can't argue with a confident number that won't explain itself
- Any task, equally — the risk is the same everywhere
Answer
A high-stakes decision taken on trust, like a loan or a medical scan, where you can't argue with a confident number that won't explain itself — For low-stakes guesses, a confident-but-wrong answer harms no one. The danger arrives when a loud score with no readable 'because' drives a decision people act on — a loan, a scan, a brake — because you cannot catch it being confidently, brittlely wrong, and cannot argue with a number that won't explain itself.