Most AI security research is obsessed with jailbreaking—forcing a model to say something it shouldn’t. But for founders building real AI organizations, that’s a distraction. The real danger lies in the “confused deputy” problem emerging in multi-agent systems.
A new paper by Li, Zhang, and Polley dives into security vulnerabilities in general-purpose AI agents, and the findings are alarming. They reframe the threat model: it’s not just about malicious inputs or harmful outputs. The true attack surface is the connective tissue between agents—especially during coordination.
One agent generates a task. Another executes it, trusting the input without verification. The executor becomes the confused deputy: it wields its own permissions and tool access on behalf of whoever planted the instruction. That trust is the crack attackers exploit. The paper maps how this leads to cascading failures in long-running workflows, where a single poisoned signal can derail entire operational chains.
This isn’t theoretical. As enterprises deploy agentic workflows for planning, code generation, and decision support, they’re embedding systemic risk into core processes. And you can’t patch this with better prompt filtering or output guards.
Securing AI agents means securing the handoffs—the delegation, the shared tools, the inherited goals. It’s a systems-level problem, not a model-level one.
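To make that concrete, here is a minimal sketch of securing one such seam, assuming a simple orchestrator-to-worker delegation: the delegating agent signs every task it hands off, and the executing agent verifies provenance before touching any tool. The names here (sign_task, verify_task, SIGNING_KEY) are illustrative, not from the paper or any particular agent framework.

```python
# A minimal sketch of one "secured handoff": the orchestrating agent signs the
# tasks it delegates, and the executing agent refuses anything whose provenance
# it can't verify. Names and keys are illustrative, not from the paper or any
# specific framework.
import hashlib
import hmac
import json

SIGNING_KEY = b"per-deployment-secret"  # in practice: per-agent keys from a secrets manager


def sign_task(task: dict) -> dict:
    """Delegating agent: attach a provenance signature before handing off."""
    payload = json.dumps(task, sort_keys=True).encode()
    signed = dict(task)
    signed["signature"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return signed


def verify_task(signed: dict) -> dict:
    """Executing agent: reject any task that didn't come through the trusted path."""
    task = {k: v for k, v in signed.items() if k != "signature"}
    payload = json.dumps(task, sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signed.get("signature", "")):
        raise PermissionError("task provenance could not be verified; refusing to execute")
    return task


# The handoff: the worker only acts on tasks it can trace back to the orchestrator.
delegated = sign_task({"tool": "deploy", "args": {"env": "staging"}})
task = verify_task(delegated)  # raises if an attacker injected or altered the task
```

Signed provenance alone won't stop every attack, but it closes the specific gap above: the executing agent's permissions only fire on tasks it can trace back to a trusted delegator.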
If you’re building with AI agents, start auditing the seams. And if you’re serious about securing your AI stack from day one, see how MachineMachine is tackling the confused deputy problem—at scale.
Get early access to our agent security framework at /early-access.
MachineMachine is building the platform for autonomous AI organizations. Early access →