Even as OpenAI works to harden the Atlas AI browser against cyberattacks, the company has acknowledged that instant injection, a type of attack that manipulates AI agents into following malicious instructions often hidden in web pages or emails, is a threat that isn’t going away anytime soon — raising questions about how AI agents can operate on the open web.
“Web scams and social engineering are unlikely to ever be fully ‘solved,'” Oppany wrote Monday. Blog post It details how the firm is developing Atlas’s armor to counter unsophisticated attacks. The company admitted that the “agent mode” in ChatGPT Atlas “increases the level of security risk.”
OpenAI launched its ChatGPT Atlas browser in October, and security researchers rushed to publish their demo, which showed that it was possible to write a few words into Google Docs that were able to change the basic browser’s behavior. Brave on the same day Published a blog post Stating that indirect instant injection is a systematic challenge for AI-powered browsers, including the Comet of Trouble.
Openi is not alone in admitting that immediacy-based injections aren’t going away. Britain’s National Cyber Security Center warned earlier this month Quick injection attacks against generative AI applications “can never be fully mitigated,” leaving websites vulnerable to data breaches. The UK government agency advised cyber professionals to immediately mitigate the risk and impact of injections, rather than thinking that attacks can be “prevented”.
For Openai’s part, the company said: “We see instant injection as a long-term AI security challenge, and we will need to continually strengthen our defenses against it.”
The company’s response to this Sisyphean task? A proactive, rapid response cycle is showing early promise in discovering novel attack strategies internally before exploitation “in the wild,” the firm says.
That’s not all that different from what competitors like Anthropic and Google have said: that to fight against the constant threat of immediacy-based attacks, defenses must be layered and constantly stress-tested. Google’s recent workFor example, the focus is on architectural and policy-level control for agentic systems.
But where Openai is taking a different tack is with its “LLM-based automated attacker.” This attacker is essentially a bot that Openai trained, using reinforcement learning, to play the role of a hacker looking for ways to hide malicious instructions to an AI agent.
A bot can test an attack in simulation before using it for real, and the simulator shows how an AI target would think and what actions it would take if it saw an attack. The bot can then study this response, adapt the attack, and try again. This insight into an AI target’s internal reasoning is something that outsiders don’t have access to, so, in theory, Openei’s bot should be able to find flaws faster than a real-world attacker.
This is a common tactic in AI safety testing: create an agent to find edge cases and quickly test against them in simulation.
“Ours [reinforcement learning]”A trained attacker can enable an agent to execute sophisticated, long-range malicious workflows that span tens (or even hundreds) of steps,” Openai wrote.

In a demo (pictured above), OpenEye showed how its automated attacker slipped a malicious email into a user’s inbox. When the AI agent later scanned the inbox, it followed the instructions hidden in the email and sent a resignation message instead of replying out of office. But after the security update, “agent mode” was able to successfully detect the immediate injection attempt and flag the user, according to the company.
The company says that while instant injection is difficult to foolproof, it is leaning on extensive testing and rapid patch cycles to test its systems in real-world attacks before hardening them.
An OpenAI spokesperson declined to say whether Atlas’s security update has resulted in a measurable drop in successful injections, but said the firm is working with a third party to harden Atlas against injections immediately before launch.
Reinforcement learning is one way to permanently adapt an attacker’s behavior, but it’s only part of the picture, says Rami McCarthy, principal security researcher at cybersecurity firm Ways.
“A useful way to reason about vulnerability in AI systems is through autonomy from access,” McCarthy told TechCrunch.
“Agent browsers sit in a difficult part of the space: moderate autonomy with a lot of reach,” McCarthy said. “Many current recommendations reflect this trade-off. Limiting login access essentially reduces exposure, while requiring revisions to authentication requests.
Those are two of Openei’s recommendations to reduce its risk to users, and a spokeswoman said Atlas is also trained to get user confirmation before sending messages or making payments. OpenAI also recommends that users give specific instructions to agents instead of giving them access to their inbox and letting them “take whatever action they need.”
“Wide latitude makes it easy for hidden or malicious content to affect an agent, even when security measures are in place.”
While OpenEye says protecting Atlas users from immediate injections is a top priority, McCarthy invites some skepticism when it comes to the return on investment for vulnerable browsers.
“For most everyday use cases, agent browsers don’t yet provide enough value to justify their current risk profile,” McCarthy told TechCrunch. “The risk is greater given their access to sensitive data like email and payment information, although that access also makes them powerful. That balance will evolve, but the trade-off is still very real today.”





