Anthropic’s recent testing of its autonomous Chrome extension revealed notable security weaknesses. In 123 test cases spanning 29 simulated attack scenarios, the company found a 23.6 percent success rate for malicious actions when the browser agent operated without any built-in safety protections.
In one test, attackers used a fraudulent email that instructed Claude to delete all of a user’s messages under the guise of improving “mailbox hygiene.” Without safeguards, Claude complied and erased the emails—no confirmation prompt required.
To address these risks, Anthropic says it has introduced multiple defensive layers. Users can now explicitly allow or block Claude’s access to specific sites via granular permissions. The extension also requires user approval before triggering high-risk actions such as purchases, publishing content, or sharing sensitive information. Furthermore, the AI is automatically prevented from accessing financial platforms, adult sites, and piracy-related domains.
These mitigations significantly improved security outcomes: the autonomous attack success rate dropped from 23.6 percent to 11.2 percent. In a focused assessment of four browser-specific attack categories, the enhanced defenses reduced successful exploits from 35.7 percent to zero.
Still, independent AI researcher Simon Willison—known for coining the term “prompt injection” in 2022—warned that an 11.2 percent breach rate remains dangerously high. On his blog, he wrote, “In the absence of 100% reliable protection I have trouble imagining a world in which it’s a good idea to unleash this pattern.”
Willison argues that the broader design philosophy behind “agentic” browser extensions may be fundamentally flawed. In another post discussing similar issues found in Perplexity Comet, he said he doubts such systems can ever be made truly safe.
Recent incidents lend weight to these concerns. Brave’s security team revealed that Perplexity’s Comet browser agent could be manipulated into accessing users’ Gmail accounts and initiating password recovery attempts. Attackers embedded hidden malicious instructions inside Reddit posts, which the AI executed when a user asked it to summarize a thread. Although Perplexity deployed a fix, Brave later confirmed the patch had been bypassed and the vulnerability persisted.
For now, Anthropic is treating its Chrome extension as a research preview, hoping to uncover and mitigate real-world attack patterns before broader release. But with the current state of agentic AI browsing tools, security experts caution that end users bear significant risk. As Willison noted, “I don’t think it’s reasonable to expect end users to make good decisions about the security risks.”
