Companies are making their AI "talk like cavemen" to cut the bill

Information Technology · 4 min read · 80 sources

The cost of running AI is now large enough that engineers at OpenAI, Nvidia, and GitHub are deliberately shortening what models say, and labs are selling cheaper models for the same reason.

Key takeaways

The cost of running AI has grown so large that engineers are deliberately making models "talk like cavemen" — cutting words to cut the token bill.
Anthropic's new Claude Sonnet 5 sells the same idea as a product: a cheaper model that trades a few points of accuracy for a much smaller bill.
Underneath it all is a memory-chip shortage keeping hardware scarce, which is why cost, not capability, is now the industry's main question.

The bill came due

For two years the story of AI was capability. This week the story is the invoice. Engineers at OpenAI, Nvidia, and GitHub have been running a tool nicknamed “caveman” that strips the politeness and padding out of what models like Claude, Codex, and Gemini produce, turning a paragraph of hedged explanation into something closer to “Hulk smash” [74]. The point isn’t style. Everything a model reads or writes is billed in tokens, the small chunks of text (often a word or part of one) that AI companies charge by. Terse output means fewer tokens, which means a smaller bill [74].

Why now? Because the spending has become “skyrocketing and unpredictable,” in the reporting’s words, and a lot of it is waste: the consultancy Accenture pointed to routine jobs, such as turning PDFs into slide decks, as places where the money leaks [74]. A senior OpenAI employee even contributed code to make the trick work with Codex, the company’s own coding tool [74]. When people inside the labs are quietly filing down their own product to save money, the cost pressure is real, not a rumour.

The angle for anyone who ships on these APIs: your bill scales with verbosity, and many teams have never looked at where their tokens go. The “caveman” approach is a blunt version of a real lever: shorter prompts and terser output are savings hiding in plain sight.

A cheaper model, on purpose

The labs are pulling the same lever from the other side. On Tuesday, Anthropic released Claude Sonnet 5 and led with the price: $2 per million input tokens and $10 per million output tokens during an introductory window, rising to $3 and $15 after August 31 [1][8]. That undercuts its own flagship, Opus 4.8, while promising performance “close” to it [8].
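To make those rates concrete, here is a minimal Python sketch that prices a month of traffic. Only the per-million-token rates come from the announcement [1][8]; the token volumes are hypothetical placeholders, and the sketch assumes the bill is simply tokens times rate, with nothing else on it.

```python
# Sonnet 5 rates in US dollars per million tokens, as reported [1][8].
INTRO_RATES = {"input": 2.00, "output": 10.00}     # introductory window
STANDARD_RATES = {"input": 3.00, "output": 15.00}  # after August 31


def monthly_cost(input_tokens: int, output_tokens: int, rates: dict[str, float]) -> float:
    """Dollar cost of a month's traffic, assuming the bill is simply tokens x rate."""
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000


# Hypothetical workload: 500 million input tokens and 100 million output tokens a month.
IN_TOKENS, OUT_TOKENS = 500_000_000, 100_000_000

print(f"intro:    ${monthly_cost(IN_TOKENS, OUT_TOKENS, INTRO_RATES):,.2f}")
print(f"standard: ${monthly_cost(IN_TOKENS, OUT_TOKENS, STANDARD_RATES):,.2f}")
# The "caveman" lever: same prompts, half as many output tokens.
print(f"terser:   ${monthly_cost(IN_TOKENS, OUT_TOKENS // 2, INTRO_RATES):,.2f}")
```

With these made-up volumes it prints $2,000.00, $3,000.00 and $1,500.00. Because output is billed at five times the input rate, trimming what the model writes moves the bill faster than trimming what you send it.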

The catch is in the benchmarks Anthropic itself published. On agentic coding — tasks where the model plans and executes steps on its own — Sonnet 5 scores 63.2% against Opus 4.8’s 69.2% [8]. On knowledge work it edges slightly ahead [8]. Anthropic’s own line: Opus is still “the model of choice for higher accuracy,” while Sonnet 5 offers “lower-priced options” [8]. Read plainly: you trade a few points of accuracy for a much smaller bill. Sonnet 5 is now the default for free and Pro users [8].

It fits the “caveman” story exactly. Whether a company shortens the output or swaps to a cheaper model, the move is the same — buy less intelligence per task because full intelligence costs more than the task is worth.

Why the cost keeps climbing

Underneath both stories is hardware. Memory chips — the components that hold data next to the processor — are in short supply, and Micron’s chief executive said this week that customers driving “a hard bargain on price” helped create the squeeze, with the company now “investing globally to bring supply as fast as possible” [3][28]. Samsung and SK Hynix are making a roughly trillion-dollar bet on new South Korean plants to catch up, a wager Reuters framed as a test of “the optimism of the AI cycle” [10].

The cloud giants are spending to defend their turf, not just on hardware. Amazon’s AWS committed $1 billion to a new unit that embeds its own engineers inside customers’ teams, following similar $1 billion moves by OpenAI and Anthropic [9][13][46]. The pitch is that AI is now hard enough to deploy that the vendor has to move in with you. That’s a cost too; it just lands on the seller.

For practitioners, the number to track is your cost per completed task, not your cost per token or per seat. The “caveman” tool, the cheaper model, and the embedded-engineer unit are all answers to the same question the whole industry is now asking out loud: is the work worth what the compute costs?
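One simple way to compute cost per completed task, sketched in Python under loud assumptions: a failed attempt is just retried, each attempt succeeds independently with the same probability, and the per-attempt costs and success rates are made-up numbers, not the benchmark scores reported above. Under those assumptions the expected number of attempts is 1 / success rate, so the option that is cheaper per attempt can still cost more per finished job. `ModelOption` and `cost_per_completed_task` are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class ModelOption:
    """Hypothetical option; the numbers below are illustrative, not measured."""
    name: str
    cost_per_attempt: float  # dollars of tokens spent per attempt
    success_rate: float      # fraction of attempts that finish the task


def cost_per_completed_task(option: ModelOption) -> float:
    """Expected spend per finished task, assuming failed attempts are retried and
    each attempt succeeds independently, so expected attempts = 1 / success_rate."""
    if not 0 < option.success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return option.cost_per_attempt / option.success_rate


options = [
    ModelOption("cheaper model", cost_per_attempt=0.60, success_rate=0.40),
    ModelOption("flagship model", cost_per_attempt=1.00, success_rate=0.80),
]
for opt in options:
    print(f"{opt.name}: ${cost_per_completed_task(opt):.2f} per completed task")
```

Here the flagship costs more per attempt but less per finished job ($1.25 against $1.50). The sketch still counts only token spend: a “completed” task that is quietly worse, or the hour someone spends catching a bad answer, never enters the number.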

Elsewhere

A regulator and a court leaned on the app stores. The UK’s competition watchdog said it plans to break Apple and Google’s “effective duopoly” on mobile app stores [27], and the US Supreme Court agreed to hear Apple’s appeal over a contempt finding in its long-running fight with Epic Games [11]. And the Trump administration lifted export controls on two of Anthropic’s models, Fable 5 and Mythos 5, reversing a restriction on where the most capable AI can be sold [39][50][65]: the release valve from earlier this week, now easing.

02 · Lesson · why it matters

You cut the cost you can count, and the rest pays quietly

When one number sits on the invoice and another doesn't, every decision drifts toward shrinking the number you can see — and the thing you can't measure is where the cost goes to hide.

A polite machine gets a haircut

There is a small, funny detail in today’s news that opens onto something large. Engineers at some of the biggest names in software have been running a tool that makes their AI assistants stop being polite. Instead of “you’re right to push back, I was wrong, here’s a fuller explanation,” the model now says something closer to “Hulk smash.” They call it caveman.

It sounds like a joke, and partly it is. But the reason is not funny at all. Every word these models read and write is billed, token by token. Fewer words, smaller bill. So the words got cut. Not the wrong words, not the useless words: just words, because words are what the meter counts.

That is the whole lesson in miniature. Watch what happens when a cost sits on an invoice and a benefit doesn’t.

The meter decides what’s real

A company can measure its token bill to the penny. It arrives monthly, it has a number, someone in finance owns it. What the company cannot easily measure is whether the terser answer was as good — whether the explanation that got cut was the one a junior engineer needed, whether “Hulk smash” left out the caveat that would have saved an afternoon.

One of these is a hard number. The other is a soft, delayed, hard-to-attribute maybe. And when you set a hard number against a soft one, the hard one almost always wins the argument, because it can show up to the meeting and the soft one can’t. This isn’t a failure of character. It’s what happens when decisions drift toward whatever can be counted.

You have felt this from the other side. A school raises its test scores and quietly stops teaching the things the test doesn’t ask. A hospital cuts the average wait time and the patients who take longer than average get moved somewhere the clock isn’t running. A call centre hits its “calls handled per hour” target by ending the hard calls faster. In each case, nobody set out to make things worse. They set out to move the number they were being judged on, and the quality that never appeared on any dashboard paid the difference.

The same move, dressed as a product

Look again at today’s other story and you’ll see the identical logic, just sold instead of hidden. Anthropic released a cheaper model, Sonnet 5, and its own benchmarks say it scores a few points lower than the flagship on the harder tasks. The pitch is honest: pay less, get slightly less. That is caveman as a business model — buy a bit less intelligence per task because full intelligence costs more than the task seems to be worth.

Sometimes that trade is exactly right. Most tasks don’t need the best model, the same way most drives don’t need a sports car. The danger isn’t in trading down once, knowingly. It’s that the meter keeps pushing. The bill is loud and monthly; the quality loss is quiet and spread across a thousand slightly worse answers nobody attributes back to the choice. So the trade happens again, and again, each step small and reasonable, and the drift only becomes visible when someone downstream is holding a mess and can’t say why.

Who’s downstream

Here’s the part that reaches past the engineers making the choice. You are downstream of a hundred meters you never see. The customer-service reply that felt oddly clipped. The summary that missed the thing you actually needed. The doctor with eight minutes instead of fifteen. Somewhere upstream, a real person weighed a cost they could count against a quality they couldn’t, and the arithmetic tilted the way it always tilts. The clipped reply is your share of a bill someone else was trying to shrink.

And you do it too. You answer the quick emails and let the hard one sit, because “emails answered” is a number you can feel and “the one conversation that mattered” is not. You buff the part of your work that has a visible metric and let the rest thin out. The token bill is just this instinct wearing a corporate coat.

The shape underneath

None of this is a villain. The engineers cutting tokens are being responsible with money that is genuinely running away from them. The lab selling a cheaper model is giving people a real, useful choice. Every meter in the story was built by someone trying to manage something they couldn’t otherwise hold in their hands. Measuring is how we cope with scale; without a number, a big system is just fog.

But a meter is a choice about what counts, and that choice is never neutral. It picks a winner before any decision is made — the countable thing over the uncountable one — and then poses as plain arithmetic. The question that keeps you humble is not “is this cheaper,” which the invoice already answered. It’s “what is falling out of the frame the invoice can’t see, and who is standing under it.” You will rarely get a clean answer. That’s the point. The number you can see is confident and complete; the world it’s measuring is neither. Hold the number a little more loosely than it asks you to.

03 · Lab · your turn

The Cost You Can't Count

Rehearse cutting a bill you can meter and watching the quality you can't meter quietly pay for it.

04 · Hope · carry this

The moment a tool stops being magic and starts getting a price tag is the moment we begin using it wisely. Learning to ask what a thing is really worth — instead of what it can do — is how every powerful invention finally grows up.