An AI found 21 hidden flaws in software you already use — for about $1,000

Cybersecurity · 5 min · 53 sources

Cheap AI bug-hunting is flooding software makers with vulnerabilities faster than humans can fix them. The same week, OpenAI gave ChatGPT a "Lockdown Mode" to blunt a different AI risk — and a startup raised $23M to police what AI agents are allowed to touch.

Key takeaways

An AI agent found 21 long-hidden flaws in FFmpeg — software inside most apps that play video — for about $1,000; some had been latent for two decades.
AI is now on both sides of security: finding flaws cheaply, and being locked down (ChatGPT's new "Lockdown Mode") to stop attackers hijacking it to steal your data.
Finding flaws got cheap; fixing them didn't — so the bottleneck is now too few people to patch the flood, which is why defenders are shrinking what attackers can reach.

This week made the same point from three directions: artificial intelligence is now sitting on both sides of the security fight, and the people who have to clean up are struggling to keep pace.

A $1,000 scan, 21 hidden flaws

A security startup called depthfirst pointed an autonomous AI agent — software that scans code on its own — at FFmpeg, the free media library buried inside almost everything that plays video: browsers, phones, streaming apps, smart TVs. The agent read roughly 1.5 million lines of code and surfaced 21 previously unknown flaws, each with a working example proving the flaw is real [1].

The company says the whole run cost about $1,000 [1]. Some of these flaws had been sitting in the code for 15 to 20 years. One dates to 2003 and went unnoticed for 23 years [1].
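As a rough back-of-envelope check, the reported figures work out to under $50 per flaw. The sketch below uses only the round numbers given above (about $1,000, 21 flaws, roughly 1.5 million lines); the variable names are illustrative, and because the inputs are approximate the results are too.

```python
# Back-of-envelope arithmetic using the round figures reported above.
# All three inputs are approximations ("about", "roughly"), so the outputs are too.
run_cost_usd = 1_000        # reported cost of the whole run
flaws_found = 21            # previously unknown flaws surfaced
lines_read = 1_500_000      # approximate lines of code the agent read

cost_per_flaw = run_cost_usd / flaws_found
cost_per_1k_lines = run_cost_usd / (lines_read / 1_000)

print(f"Cost per flaw: ${cost_per_flaw:.2f}")                  # $47.62
print(f"Cost per 1,000 lines read: ${cost_per_1k_lines:.2f}")  # $0.67
```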

These are “zero-days”: flaws the software’s makers don’t yet know about, so no fix exists and whoever finds one first has a clear run [1]. Most of the 21 are the kind of memory-handling mistake that, in the wrong spot, lets a booby-trapped video file take control of the program reading it [1]. Nine already have official tracking numbers (CVEs, entries in the industry’s shared catalogue of named flaws), and the rest are being fixed [1].

depthfirst isn’t alone. Google’s own bug-hunting agent and a model from Anthropic have both pulled long-buried flaws out of FFmpeg in recent months, and another tool recently found a two-year-old flaw in Redis, a widely used database tool [1].

The same week, a record patch

Days earlier, Google shipped Chrome 149 with fixes for 429 security bugs — the most it has ever patched in a single release [1]. More than 100 were serious. The worst, valued by Google at $97,000, let a crafted web page break out of Chrome’s safety box — the “sandbox” that’s meant to keep a web page from touching the rest of your computer — and run code on the machine [1].

Google hasn’t blamed AI for the sheer count [1]. But it did recently overhaul how it pays outside bug-finders, after a flood of AI-written reports, now asking for one clear reproducer instead of the long write-ups AI tends to produce [1]. Different mechanism, same pressure: more flaws arriving, faster.

If you use Chrome: confirm it has updated to version 149 (check the menu under Help → About Google Chrome) [1]. For most people that happens automatically. The harder fixes are FFmpeg’s: because the library is bundled invisibly inside so many apps, its patches will arrive quietly through each app’s own updates over the coming weeks [1].

When the AI is the thing being attacked

The flip side showed up the same week. OpenAI began rolling out a “Lockdown Mode” for ChatGPT, aimed at people who handle sensitive information [2].

It defends against “prompt injection” — where an attacker hides instructions inside a web page or a file the AI reads, tricking the assistant into doing something the user never asked for [2]. The danger isn’t the trick itself but where it can lead: the hidden instruction tells the AI to quietly send your private data out to the attacker [2].

Lockdown Mode doesn’t stop the trick from happening. It closes the exits. It blocks live web browsing, file downloads, agent mode, and other features that could carry data off your device, accepting that you lose some convenience to gain that protection [2]. OpenAI is blunt that it isn’t a guarantee — new tricks may still get through [2]. It’s available on free and paid plans, off by default [2].
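The "close the exits" idea can be sketched as a simple capability check. This is a minimal illustration, not OpenAI's implementation: the capability names, the `Session` class, and the `is_allowed` function are all hypothetical. The point it shows is that the injected instruction may still be read, but the features that could carry data out are refused.

```python
from dataclasses import dataclass

# Hypothetical capability names, chosen for illustration only.
# They are not OpenAI's feature flags or API.
EGRESS_CAPABLE = {"web_browsing", "file_download", "agent_mode"}


@dataclass(frozen=True)
class Session:
    lockdown: bool


def is_allowed(session: Session, capability: str) -> bool:
    """Allow a capability unless lockdown is on and it could carry data out."""
    if session.lockdown and capability in EGRESS_CAPABLE:
        return False
    return True


locked = Session(lockdown=True)
normal = Session(lockdown=False)

# "summarize_text" stands in for any feature with no path off the device.
for cap in ("web_browsing", "file_download", "summarize_text"):
    print(cap, "| normal:", is_allowed(normal, cap), "| lockdown:", is_allowed(locked, cap))
```

The trade-off is visible in the rule itself: the blocked set is exactly the set of convenient features, which is why the mode costs some usefulness.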

OpenAI also added a way to see every device currently logged into your account and kick out any you don’t recognise [2] — a setting worth checking on any account that offers it.

Policing what the AI is allowed to touch

The third signal was financial. Opal Security, a San Francisco startup, raised $23 million (bringing its total to date to $59 million) to manage who and what is allowed access inside a company [3].

Its pitch names the new problem directly: companies now run swarms of AI agents that need access to systems, and those agents request access faster and in greater numbers than any human team can review by hand [3]. Opal’s answer is to grant access only when needed, take it away when it goes stale, and pull a human in only for the risky calls [3].

That principle — give the least access needed, for the shortest time — isn’t new. What’s new is the scale forcing it: when AI agents become users too, hand-checking who can reach what stops working.
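The grant-when-needed, expire-when-stale pattern can be sketched in a few lines. This is a toy model under stated assumptions, not Opal's product or API: the resource names, the 30-minute lifetime, and the `request_access` function are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical resource names and policy values, for illustration only.
RISKY_RESOURCES = {"prod-database", "payroll"}
DEFAULT_TTL = timedelta(minutes=30)


@dataclass
class Grant:
    agent: str
    resource: str
    expires_at: datetime

    def is_active(self, now: datetime) -> bool:
        return now < self.expires_at


def request_access(agent: str, resource: str, now: datetime,
                   human_approved: bool = False) -> Optional[Grant]:
    """Grant short-lived access; risky resources need a human decision first."""
    if resource in RISKY_RESOURCES and not human_approved:
        return None  # escalate to a human reviewer instead of auto-granting
    return Grant(agent, resource, expires_at=now + DEFAULT_TTL)


now = datetime(2025, 1, 1, 9, 0)  # arbitrary timestamp for the example
grant = request_access("report-bot", "wiki", now)
print(grant.is_active(now + timedelta(minutes=10)))   # True: within the window
print(grant.is_active(now + timedelta(hours=2)))      # False: the grant went stale
print(request_access("report-bot", "payroll", now))   # None: needs a human first
```

Routine requests never reach a person, so human attention is spent only on the small set marked risky.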

The thread

Finding flaws used to be slow, skilled, expensive work. AI made it cheap [1]. But fixing a flaw — triaging the report, writing the patch, shipping it, getting people to install it — costs about what it always did, and much of that work still falls to a thin layer of volunteers and human reviewers [1].

So the bottleneck moved. The constraint is no longer finding the problems; it’s having enough trustworthy hands to deal with them. That’s the quiet story under all three headlines — and it’s why “Lockdown Mode” and “just-in-time access” are showing up now: when you can’t fix everything fast enough, you shrink what an attacker can reach in the meantime.

02 · Lesson · why it matters

When finding gets cheap, the bottleneck moves to fixing

Make one half of a job a hundred times cheaper and you don't get a hundred times more done — you just expose how slow the other half always was.

The cost of finding a flaw just collapsed

For thirty years, finding a hidden flaw in software was slow, skilled work. A researcher had to know the code, guess where it might break, and prove it. That scarcity shaped everything around it.

This week an AI agent read 1.5 million lines of FFmpeg and surfaced 21 real flaws for about a thousand dollars. Some had been hiding for two decades. The skill that used to gate this work is now something a machine does on its own, cheaply, overnight.

That is a genuine shift. But notice what it did not change.

The other half didn’t get cheaper

Finding a flaw is one step. Then someone has to judge whether it’s serious, write a fix without breaking everything else, ship that fix, and get millions of people and machines to actually install it.

None of that got cheaper this week. Writing a careful patch still takes a careful human. Testing it still takes time. And much of FFmpeg — bundled invisibly inside browsers, phones, and appliances — is maintained by volunteers and a thin layer of human reviewers.

So now the flaws arrive faster than the fixes can. The work didn’t speed up. The imbalance just became visible.

The bottleneck was always the slow half

Here is the pattern worth carrying. Any task is a chain of steps. Speed up one step dramatically and the total doesn’t speed up to match — it speeds up only until it hits the next step, which now becomes the wall.

The slow step was always the wall. You just couldn’t see it, because an earlier step was hiding it. Make finding a hundred times cheaper, and “we don’t have enough people to fix things” stops being a quiet background fact and becomes the whole problem.

This is why Chrome could patch a record 429 bugs in one release and the security press still wrote it up as pressure, not triumph. More found is not more fixed. It’s more waiting in the queue.
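A toy queue model makes the arithmetic concrete. The rates below are made up for illustration and are not drawn from the reporting; the only assumption that matters is that finding gets a hundred times faster while fixing capacity stays flat.

```python
def simulate_backlog(found_per_week: int, fixed_per_week: int, weeks: int) -> list[int]:
    """Return the unfixed-flaw backlog at the end of each week."""
    backlog = 0
    history = []
    for _ in range(weeks):
        backlog += found_per_week
        backlog -= min(backlog, fixed_per_week)  # cannot fix more than is waiting
        history.append(backlog)
    return history


# Made-up rates: finding gets 100x cheaper, fixing capacity stays the same.
before = simulate_backlog(found_per_week=2, fixed_per_week=5, weeks=4)
after = simulate_backlog(found_per_week=200, fixed_per_week=5, weeks=4)

print("Backlog before:", before)  # [0, 0, 0, 0]
print("Backlog after: ", after)   # [195, 390, 585, 780]
```

In this model the number of fixes shipped per week rises only from 2 to 5, the fixing capacity, while the queue grows by 195 every week.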

Why the defenders changed tactics, not just speed

Watch what the rest of the week did about it. OpenAI didn’t try to make ChatGPT immune to every trick; that would be the fantasy of fixing faster than attackers can invent. It shipped “Lockdown Mode,” which simply closes the exits an attacker could use to carry data out.

Opal Security raised $23 million to grant access only when needed and take it away when it goes stale. Again: not “stop every attack,” but “shrink what an attacker can reach.”

Both are the same move. When you can’t fix the flood fast enough, you stop trying to win the race on speed and instead reduce how much a single failure can cost you. You manage the bottleneck instead of pretending it isn’t there.

The same shape, far from software

This isn’t really about code. It’s about what happens whenever a new tool makes one part of a job nearly free.

A faster checkout line doesn’t clear a store if the bottleneck is one person stocking shelves. Cheaper diagnostic tests don’t cure more people if there aren’t enough doctors to act on the results. A team that can generate ideas in an afternoon still ships at the speed of whoever has to build them.

The mistake is to look at the part that got cheap and assume the whole thing sped up. The honest question is the opposite one: now that this step is nearly free, which slow step is it about to overwhelm — and have we staffed for that, or just admired the speed?

03 · Lab · your turn

The Triage Desk

Rehearse choosing which few flaws to patch when reports outnumber fixes — reachable-and-soon beats scary-on-paper.
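One way to picture that rule is as a sort order. The sketch below is an assumption chosen to mirror the slogan, not a standard scoring method: the `Report` fields, the ranking key, and the sample reports are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Report:
    name: str
    severity: float                  # 0-10 "scary on paper" score
    reachable: bool                  # can an attacker actually hit this code path?
    days_until_likely_exploit: int   # rough guess at how soon it could be abused


def triage(reports: list[Report], capacity: int) -> list[Report]:
    """Pick the reports to patch first: reachable, then soonest, then severity."""
    ranked = sorted(
        reports,
        key=lambda r: (not r.reachable, r.days_until_likely_exploit, -r.severity),
    )
    return ranked[:capacity]


# Invented reports, for illustration only.
queue = [
    Report("parser overflow in unused codec", 9.8, False, 90),
    Report("crash on malformed thumbnail", 6.5, True, 7),
    Report("leak in default playback path", 7.2, True, 30),
]

for report in triage(queue, capacity=2):
    print(report.name)
```

With room for only two patches, the highest-severity report is the one left waiting, because nothing can reach it yet.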

04 · Hope · carry this

The same cheap tools finding flaws by the dozen can be turned on our own software first — and they already are, surfacing twenty-year-old bugs before anyone with bad intent gets to them. The defenders aren't only racing the flood; they've started handing out the same machine that makes it.