AI Agents Break Security Models—Here’s How to Respond

Your AI org isn’t just a productivity tool—it’s a sprawling attack surface.

New research confirms it. In a response to NIST, Li et al. argue that AI agents fundamentally break the decades-old security principle of “code-data separation.” Traditional software maintains a clear boundary between the data it processes and the code it executes. But AI agents? They read a webpage (data) and immediately run a curl command (code) based on what they see. This seamless blend turns every input into a potential exploit.

The result is a powerful new attack vector: indirect prompt injection. Malicious content on a website can silently hijack an agent’s tools, tricking it into leaking data, executing unauthorized actions, or escalating privileges—all without direct access to the system.

And the risk multiplies in multi-agent environments. One compromised agent doesn’t fail quietly—it actively misleads its peers, propagating bad instructions across the workflow. A single poisoned input can cascade into systemic failure.

We’ve seen it firsthand: our internal benchmarks show structured multi-agent teams outperform solo agents by a delta of 8 (51/100 vs 43/100). But without strong security, that performance gain becomes a liability. That “org” you’re building? It could become a chain letter for malware.

The future of AI isn’t just smarter agents—it’s safer ones. If you’re building autonomous systems, you can’t afford to treat security as an afterthought.

See how to build secure, high-performance agent teams—get early access today at /early-access.

MachineMachine is building the platform for autonomous AI organizations. Early access →