Your AI assistant will do what it's told — even when the order comes from a stranger

Cybersecurity 4 min 13 sources

Researchers showed how a digital assistant could be steered by instructions hidden in an ordinary notification, not by its owner. It's already fixed — but it points at the security story of the moment: AI is now both a tool for attackers and a target itself. Plus fake job offers used as bait, a seventh zero-day, and the calm basics that still matter.

Key takeaways

Researchers showed an AI assistant could be steered by instructions hidden in an ordinary notification rather than from its owner — a flaw now patched, but a reminder that assistants can't reliably tell your commands from commands smuggled into the content they read.
AI is now both weapon and target: attackers use it for convincing social engineering and exploit flaws in AI-written code, while defenders rush out new frameworks to govern the AI agents they've deployed faster than they can secure.
The durable defences haven't changed: a unique password per account (use a password manager), a second check (MFA) turned on everywhere, software kept updated, and patience with any unsolicited, urgent message — including too-good job offers used as bait.

An assistant that listened to the wrong voice

Security researchers at SafeBreach revealed this week how Google’s Gemini voice assistant could have been hijacked — not by breaking in, but by talking to it through the side door [11]. Their technique, which they call “Fake Context Alignment,” used ordinary app notifications — a WhatsApp, Slack or SMS alert — to slip instructions into the assistant’s running conversation without the user ever knowing [11].

In a controlled demonstration, those smuggled instructions could have made the assistant take real actions: controlling smart-home devices through Google Home, or starting a Zoom call [11]. The flaw was reported to Google in August 2025 and patched in November; the researchers went public now to warn that the underlying problem is far from solved [11].

The mechanism is the thing to carry. An AI assistant reads incoming text as potential commands — and it can’t reliably tell the difference between an instruction from you and an instruction hidden inside content it happens to be processing. That class of trick is called prompt injection: a malicious order disguised as ordinary data. This particular hole is closed, but the shape of the problem isn’t, and it’s worth understanding before you let any assistant act on your behalf.

AI is now both the weapon and the target

That story sits inside a bigger shift, on full display at the Infosecurity Europe conference this week. Microsoft’s incident-response team warned that the same AI helping defenders is being turned by attackers — especially for social engineering, the art of fooling people rather than machines [8]. They also flagged a quieter risk: AI now writes a lot of software, and by their account nearly half of AI-generated code contains flaws that attackers can exploit [8]. That figure is one firm’s assessment, not a settled number — but the direction is clear.

Defenders are scrambling to keep up. OWASP — a long-running open security project — published a new framework this week to help organisations govern the AI “agents” they’re rushing to deploy [12]. “Most organizations are deploying agents faster than they can govern them,” one of its authors said [12]. The pattern underneath all of it: every helpful AI you handed a task to is also a new door, and the people building those doors are still learning where the locks go.

The job offer that’s really an attack

The oldest trick adapts fastest. The Five Eyes intelligence alliance — the US, UK, Canada, Australia and New Zealand — warned this week that Chinese state-linked operators are targeting government and military staff with fake job opportunities [10]. Separately, researchers tracked a Chinese-speaking cybercrime group, TA4922, widening its reach across more countries [5][6].

The method is social, not technical. A flattering, unsolicited approach — a recruiter, a too-good opening, a contact who seems to know your work — lowers your guard in a way no software flaw can. Then a document or a link does the rest. For an ordinary person, the defence is the same whether you’re a defence official or a job-seeker: treat unsolicited offers and urgent messages with patience. If something asks you to act, verify it through a channel you already trust — an official website or a number you look up yourself — not the one the message handed you.

The plumbing: a seventh zero-day, and a worm in the code supply

Two stories from the infrastructure layer that quietly carries everything else. Cisco warned of a zero-day in its SD-WAN networking gear — software many businesses use to connect their offices over the internet — being actively exploited [1]. A zero-day is a flaw the maker didn’t know about, so there’s no patch until it’s discovered; this is the seventh such case in Cisco’s gear this year alone [1].

Meanwhile, researchers at JFrog found a self-spreading malware campaign they named “IronWorm” loose in the npm ecosystem — the vast library of shared open-source building blocks that modern apps are assembled from [3]. It steals developers’ keys and passwords and reuses them to jump from one package to the next [3]. You don’t have to write code to be affected: the apps you use are built from these blocks, so a poisoned block upstream can reach far downstream. It’s why a single flaw in shared infrastructure is such a prized target — one break reaches many.

The steady drumbeat, and the basics that still hold

End where most readers actually live: the ordinary breach. The nightclub and hospitality company RCI disclosed a data breach affecting about 40,000 people [7]. Online-store software took hits too, with flaws in widely used WordPress and Magento add-ons letting attackers run code on small-business websites [9][4].

None of this calls for alarm — it calls for the same calm habits that defuse most of it. Use a different password for every account, so one leak can’t open the rest; a password manager makes that painless. Turn on a second check (often called MFA) wherever it’s offered, so a stolen password isn’t the whole key. Keep your phone, computer and any website plugins updated, since most attacks ride flaws that already have a fix. And slow down on anything unsolicited and urgent — that pause is, more often than not, the whole defence.

02 · Lesson · why it matters

The helper that can't tell whose order it's following

Give something real power and a habit of obeying instructions, but no way to check who the instruction is really from, and anyone who can speak into its ear borrows its authority.

A voice through the side door

The unsettling part of this week’s AI story isn’t that someone broke into the assistant. Nobody did. The assistant simply heard an instruction and followed it — without checking who was speaking.

A notification arrived, the kind that pings on any phone. Hidden inside it was a command. The assistant, busy being helpful, treated that command exactly as it would treat one from its owner. It couldn’t tell the two apart. The order didn’t come from its master; it came from a stranger who knew how to slip a few words into the stream of things the assistant was already reading. And that was enough.

The flaw is patched. The shape of it is everywhere.

The confused deputy

Security people have a name for this, and it’s older than AI: the confused deputy. A deputy is anything you’ve handed real power and told to act on instructions — an assistant, a clerk, a gatekeeper, a system. It’s “confused” when it can’t tell whose instruction it’s carrying out.

The danger isn’t that the deputy is weak. It’s that the deputy is strong and obedient and trusting all at once. It has the keys, it does what it’s told, and it doesn’t verify the source. So whoever can get an instruction in front of it — by impersonation, by smuggling it inside something the deputy already trusts — doesn’t need to steal the keys. They just borrow the deputy’s hand. The power was never taken by force. It was lent out by a helper who never asked, “wait, who is actually telling me this?”

The words are always confident

Here’s the trap that makes it work. We try to judge an instruction by how it sounds — and an attacker controls exactly that.

The fake order is always fluent. It sounds urgent, authoritative, plausible. The email that says “it’s the CEO, wire the deposit today” is written to sound like the CEO. The message that says “ignore your previous rules” is phrased with total confidence. You cannot tell a genuine instruction from a forged one by reading it, because the forger writes the genuine-sounding version on purpose. Sincerity is not a signal. Confidence is not a signal. The content is precisely the part the impersonator gets to author.

This is why the confused deputy keeps falling for it, human or machine. It’s checking the message, and the message is exactly what the attacker made convincing.

The same shape, all the way down

Once you see it, you find it everywhere people act on instructions.

The finance clerk who pays a fraudulent invoice because the request looked like it came from the boss. The receptionist who buzzes someone in because they wore a uniform and sounded sure. The employee who clicks because the email “from IT” knew just enough to seem real. The official this week who might open a document because a flattering recruiter sent it. In every case the deputy was capable and willing, the order arrived dressed in borrowed authority, and no one stopped to verify the source through a door the impersonator couldn’t reach.

It is the same failure as the assistant and the notification, scaled up to organisations and down to a single trusting moment.

Check the door, not the words

The fix is not to be smarter about reading instructions. The forger always wins that contest. The fix is to stop judging the message and start verifying the source.

Before acting on any instruction that carries real consequence, ask two plain questions: who is actually telling me this, and through what door did it arrive? Then confirm it through a channel the impersonator doesn’t control — a number you look up yourself, a person you call back, a system that proves who it is rather than just claiming. Treat instructions that arrive buried inside data, from a source you can’t independently check, as unverified by default, however urgent they sound.

The helpful and the gullible are the same trait until you add that one check. We are surrounded now by deputies acting on our behalf — assistants, services, and the people we trust with the keys. The thing that keeps them safe isn’t suspicion of everyone. It’s the small, unskippable habit of verifying who is really giving the order, before the hand that obeys turns out to be ours.

03 · Lab · your turn

Whose Order Is It

Rehearse obeying or verifying urgent-sounding orders, and feel why the tell is the door an instruction came through, not how confident the words are.

04 · Hope · carry this

The safeguard against a confident lie isn't being clever enough to spot it — it's one small habit anyone can keep: ask who is really telling you this, through a door the impersonator can't reach. The fix was never something only experts could manage; it was always within reach of an ordinary careful person.