Dev.to | 3d ago | 1 min read

We Ran a $5,000 AI Agent Adversarial Testbed...

I published a research paper this week. The number that surprised me most was not the one I expected. I expected the 0%: under a restrictive pre-action authorization policy, a population of 879 adversarial attempts achieved zero successful unauthorized actions. That part worked as designed. The number that stopped me was 74.6%. That's how often social engineering succeeded against the model alone, with no authorization layer, across a live adversarial testbed with a $5,000 bounty to anyone who c
