In brief
- Researchers found AI agents powered by GPT-5 and Gemini could not resist prompt injection attacks.
- Direct attacks succeeded more than 79% of the time, while hidden attacks embedded in web content frequently manipulated agent behavior.
- The findings suggest prompt injection remains a broader security problem as AI agents become more mainstream.
As developers race to deploy AI agents capable of browsing the internet, conducting research, shopping online, and trading cryptocurrency autonomously, new research suggests the systems remain highly vulnerable to prompt injection attacks.
In a new study published on Thursday, researchers from Nanyang Technological University, ST Engineering, IBM Research, and the University of Illinois Urbana-Champaign found that none of the AI agents they tested consistently resisted prompt injection attacks.
“Existing security benchmarks adopt an attack-centric perspective, focusing on the technical feasibility of injections while overlooking the nuanced distribution of resulting harms,” the researchers wrote. “In practice, however, prompt-injection risk is victim-dependent: a single exploit can produce asymmetric consequences for different stakeholders, and the same attack pattern may exhibit substantially different effectiveness depending on whom it targets.”
Prompt injection occurs when attackers embed hidden instructions in content that an AI agent encounters, causing it to follow the attacker's directions instead of the user's. To address gaps in existing AI agent evaluations, the researchers developed StakeBench, a benchmark that tests how AI agents respond to prompt injection attacks in realistic online environments.
“We now use StakeBench to characterize the conditions under which this vulnerability is amplified or suppressed, focusing on [Indirect Prompt Injection] as the primary deployment-relevant channel,” the researchers wrote. “StakeBench probes three such factors: the semantic distance between the injected objective and the user’s original intent, the consistency of surrounding environmental cues, and the position along the agent’s execution trajectory at which the benchmark first exposes it to the injected content.”
The team conducted 3,168 attack simulations using NanoBrowser and BrowserUse with GPT-5 and Gemini 2.5-Flash. Researchers found direct prompt injection attacks succeeded more than 79% of the time across all tested configurations, and indirect attacks achieved success rates of 41.67% to 68.16%.
The study comes as prompt injection attacks become increasingly common and AI agents proliferate.
In February, Microsoft researchers warned that hidden instructions embedded in AI summary links could influence chatbot behavior. In April, Google documented prompt injection attacks hidden in web pages that attempted to manipulate AI agents into leaking credentials or sending payments. More recently, Microsoft disclosed a prompt injection flaw in Anthropic's Claude Code GitHub Action that could have exposed user credentials.
The study also identified what researchers called "stealthy parasitism," where an AI agent completes a user's task while simultaneously advancing an attacker's objective. For example, stealthy parasitism caused by a prompt injection attack could subtly influence product recommendations, steering users toward a particular item without any obvious signs that the system had been compromised.
“These results indicate that prompt-injection security in deployable web agents is not a scalar property of the backbone model but a distribution of harm whose realization is jointly determined by the affected stakeholder, the semantic alignment between the injected objective and the user’s task, and the architectural context in which the backbone is deployed,” they wrote.

