Lesson 8 of 13

It predicts, it doesn't know

Explain that a model generates likely output from patterns, not looked-up facts.

01 · Learn · the idea

Type “The capital of France is” into a text box that finishes your sentences, and it offers “Paris”. It looks like the machine reached into a drawer, pulled out a fact, and handed it to you. It didn’t. There is no drawer. Back at item 2 you saw that a model computes a fresh answer instead of looking one up. A model that writes text is the same idea, pointed at words: it never retrieved “Paris” — it predicted it. This item is where that difference stops being a technicality and starts to matter for everything the machine says.

A machine that guesses the next word

Strip a text-writing model down to its one move and here is what’s left: given the words so far, produce the single most likely next word. That’s it. Autocomplete on your phone does a small version — you type “see you”, it offers “tomorrow”. A large model does the same thing, trained not on your texts but on a mountain of writing, so its sense of “what usually comes next” is enormous.

The trick is that it does this over and over. It predicts a word, adds it to the end, and then treats the whole thing — original plus new word — as a fresh prompt and predicts again. Word by word, it grows a sentence, each step just picking the likeliest continuation of what’s there so far. A paragraph that reads like careful thought is a long chain of next-word guesses, each one blind to where the sentence is ultimately going.
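The predict, add, predict-again loop can be sketched in a few lines. This is a toy under stated assumptions, not how a real model is built: `TOY_SCORES`, `predict_next` and `generate` are made-up names, the scores are invented by hand, and the toy only looks at the last word where a real model weighs everything written so far.

```python
# Hypothetical, hand-written scores: for each last word,
# how likely each possible next word is. A real model computes
# these from its trained numbers; nothing here is real data.
TOY_SCORES = {
    "see": {"you": 0.9, "the": 0.1},
    "you": {"tomorrow": 0.6, "soon": 0.3, "later": 0.1},
    "tomorrow": {"morning": 0.5, "night": 0.3, "then": 0.2},
}


def predict_next(words):
    """Return the likeliest next word, or None if the toy has no scores."""
    scores = TOY_SCORES.get(words[-1]) if words else None
    if scores is None:
        return None
    return max(scores, key=scores.get)


def generate(prompt, max_new_words):
    """Grow the text one word at a time by repeating the single move."""
    words = prompt.split()
    for _ in range(max_new_words):
        next_word = predict_next(words)
        if next_word is None:
            break
        # The longer text becomes the fresh prompt for the next step.
        words.append(next_word)
    return " ".join(words)


print(generate("see you", 2))  # see you tomorrow morning
```

Notice that `generate` never plans ahead. Each pass through the loop only asks what comes next, given what is there so far.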

There is no shelf of facts inside

Here is the part that trips everyone up. Nowhere inside the model is the sentence “The capital of France is Paris” stored as a fact you could go and read. It was never filed away. What the training left behind is a pattern: across billions of lines of text, the words “The capital of France is” were followed by “Paris” almost every single time. That pattern is baked into the model’s numbers. So when you type that phrase, “Paris” comes out — not because the machine knows a fact about France, but because “Paris” is the overwhelmingly likely next word.

The answer is right. But it’s right the way a well-worn path is right, not the way a lookup is right. The machine didn’t consult France. It followed the groove that all that text wore into it.
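To get a feel for what "a pattern in the text" means, here is a small sketch that counts which word follows a phrase in a handful of made-up lines. The lines in `TRAINING_LINES` and the function name `continuation_shares` are assumptions for illustration only. A real model does not keep a table of counts like this; training squeezes the patterns into its numbers. Counting is just the simplest way to see where a figure like "almost every single time" could come from.

```python
from collections import Counter

# Made-up stand-in for training text; a real model sees vastly more.
TRAINING_LINES = [
    "The capital of France is Paris",
    "The capital of France is Paris , a city on the Seine",
    "Everyone knows The capital of France is Paris",
    "The capital of France is a popular quiz question",
]


def continuation_shares(lines, phrase):
    """Return the share of times each word followed `phrase` in `lines`."""
    phrase_words = phrase.split()
    n = len(phrase_words)
    counts = Counter()
    for line in lines:
        words = line.split()
        # Slide along the line, looking for the phrase with a word after it.
        for i in range(len(words) - n):
            if words[i:i + n] == phrase_words:
                counts[words[i + n]] += 1
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


print(continuation_shares(TRAINING_LINES, "The capital of France is"))
# {'Paris': 0.75, 'a': 0.25}
```

Nothing in the result says "Paris is the capital". It only says which word tended to come next.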

The same move, whether or not there’s a truth

Now the sharp edge. The predictor runs the exact same way no matter what you ask. It has one move — pick the likely next word — and it makes that move every single time. When there’s a strong true pattern behind your question, the likely word is the correct one, and it looks like knowledge. When there’s no fact behind your question at all, the machine doesn’t stop, shrug, or flag it. It still picks a likely-sounding word, just as smoothly, because likely is the only thing it can do.

A worked example: two prompts, one machine

Prompt one: “The capital of France is ___”

The predictor scores possible next words by how often each followed that phrase in its training text (the figures here are illustrative):

Paris — about 97%
France — about 1%
a — about 1%
something else — about 1%

It picks “Paris”. This looks exactly like a mind recalling a fact. It isn’t. It’s an overwhelming pattern collapsing onto one word.

Prompt two: “My neighbour’s favourite colour is ___”

There is no fact here for anyone to know. The machine has never met your neighbour. But it has seen plenty of sentences about favourite colours, so it scores the likely words anyway:

blue — about 35%
green — about 20%
red — about 18%
purple — about 12%
something else — about 15%

It confidently picks “blue”. Not because it’s true (the machine has no way to check, since there’s no neighbour behind the words), but because “blue” is the most likely word to follow that phrase. Same machine. Same move. In the first case the likeliest word happened to match reality; in the second there was no reality to match, and the machine produced an answer of exactly the same shape. “Blue” led with only about 35%, against about 97% for “Paris”, yet nothing in the finished sentence shows the difference.

That’s the whole point. From the outside, “Paris” and “blue” arrive looking identical — fluent, quick, sure of themselves. Inside, they came from one process that has no idea which of the two is grounded in anything. The machine cannot tell a fact from a good guess, because to the machine they aren’t different things. They’re both just likely next words.
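The two prompts can be run through one small function to make the point concrete. This is a sketch using the rough percentages from the worked example; the table `SCORES` and the function `complete` are hypothetical names, and a real model would compute its scores rather than read them from a table.

```python
# Scores copied from the worked example (approximate, for illustration).
SCORES = {
    "The capital of France is": {
        "Paris": 0.97, "France": 0.01, "a": 0.01, "<something else>": 0.01,
    },
    "My neighbour's favourite colour is": {
        "blue": 0.35, "green": 0.20, "red": 0.18, "purple": 0.12,
        "<something else>": 0.15,
    },
}


def complete(prompt):
    """The one move: return the highest-scoring next word."""
    scores = SCORES[prompt]
    return max(scores, key=scores.get)


for prompt in SCORES:
    # Same function, same code path, whether or not a fact exists.
    print(f"{prompt} {complete(prompt)}")
# The capital of France is Paris
# My neighbour's favourite colour is blue
```

`complete` returns a bare word in both cases. There is no second return value saying "this one is grounded" and no branch that could refuse, which is why the two answers look the same from the outside.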

On the whole

A text model does not know things. It predicts things. Right answers and wrong answers come out of the identical process — scoring the next word by how likely it is, given everything that came before. “Knowing” never enters the machine at any point; there’s no place for it to live.

Hold onto how unsettling that is, because it explains a great deal that’s coming. The fluency is real and the usefulness is real, but neither is evidence of understanding underneath. When the machine is right, it’s because the likely word matched the truth. When it’s wrong, the machine was doing the same thing it always does, just as calmly. We tend to hear confidence and assume knowledge sits behind it. With a next-word predictor, that assumption is the trap — and it’s ours, not the machine’s. Seeing the one move clearly is what lets you judge, sentence by sentence, whether there was ever a fact behind the fluent words at all.

02 · Try · the lab

03 · Check · quick quiz

1. You type "The capital of France is" into a text model and it writes "Paris". Which best describes what happened inside?

It looked up a stored fact that says the capital of France is Paris
It predicted "Paris" as the most likely next word, because that word almost always followed those words in its training text
It searched an encyclopedia and copied the matching entry
It reasoned about France's geography and concluded the answer

Answer

It predicted "Paris" as the most likely next word, because that word almost always followed those words in its training text — There is no filed-away fact inside. "Paris" comes out because, across the training text, it was the overwhelmingly likely word to follow that phrase. Right answer, but produced by prediction, not lookup.

2. You ask a text model "My neighbour's favourite colour is ___" and it confidently answers "blue". Why is it wrong to say it 'knew' the answer?

It should have answered 'green' instead — that's the real most-likely colour
It refused to answer, so nothing was known
There is no fact behind the words for anyone to know; it still picked a likely-sounding word, exactly as it does when a real fact exists
It knew the colour but chose to hide the true one

Answer

There is no fact behind the words for anyone to know; it still picked a likely-sounding word, exactly as it does when a real fact exists — The machine has never met your neighbour — there's nothing to know. It runs its one move anyway: pick a likely next word. The confident "blue" is a guess dressed as knowledge.

3. A friend says: 'When the model is right it's recalling a fact, and when it's wrong it's just guessing.' What's the flaw in that?

Right and wrong answers come out of the identical process — scoring the next word by likelihood; the machine never switches into a 'recall' mode
There's no flaw; recall and guessing really are two separate modes
The model only guesses when it's tired or overloaded
The model recalls facts for short questions and guesses for long ones

Answer

Right and wrong answers come out of the identical process — scoring the next word by likelihood; the machine never switches into a 'recall' mode — The predictor has one move and makes it every time. Correct answers happen when the likely word matches the truth; wrong answers happen when it doesn't. Same machinery — 'knowing' never enters it.

4. How does a text model build a whole sentence, rather than just one word?

It plans the full sentence first, then writes it out
It retrieves a complete matching sentence from storage
It predicts one likely next word, adds it, then treats the longer text as a fresh prompt and predicts again — repeating word by word
It writes the last word first and works backwards to the start

Answer

It predicts one likely next word, adds it, then treats the longer text as a fresh prompt and predicts again — repeating word by word — Each step is just 'what's the likeliest next word here?'. It adds that word and predicts again from the new, longer text. A whole paragraph is a chain of next-word guesses, each blind to where the sentence is finally headed.