๐Ÿ“Š BenchmarkSuite v2 ยท Real Data

Does multi-agent actually work?

We ran a blinded benchmark: three enterprise tasks, three organisational topologies, multiple iterations. Single-agent baseline vs multi-agent coordination. Here's what we found.

๐Ÿ”ฌ Blinded evaluator (Claude Haiku)
๐Ÿ“‹ 45 conditions total
๐Ÿ’ฐ Real API cost: $1.94
๐Ÿ“… February 2026
+20%
improvement on contract review
with the right multi-agent topology
๐Ÿ“œ

Contract Review

Legal & Compliance โ€” identify risks, recommend amendments

Star โœ“ Best
SA 81.3
MA 97.7
+16.4 pts
Self-Decompose
SA 92.0
MA 93.6
+1.6 pts
HRM โš  Wrong choice
SA 87.9
MA 51.7
โˆ’36.2 pts
Insight: Star topology (parallel specialists โ†’ synthesis) dominates here. HRM (hierarchical decomposition) catastrophically fails โ€” the task needs broad coverage, not deep hierarchy. Topology selection is the product.
๐Ÿ”

Code Review Protocol

Software Engineering โ€” design a complete CI gate for a 3x-daily shipping team

HRM โœ“ Best
SA 87.1
MA 91.7
+4.6 pts
Star
SA 87.0
MA 81.1
โˆ’5.9 pts
Self-Decompose
SA 87.1
MA 79.0
โˆ’8.1 pts
Insight: Hierarchical review (sub-tasks โ†’ integration) wins here. Flat star parallelism actually hurts on tasks that need sequential reasoning chains.
๐ŸŽซ

Support Triage System

Customer Support โ€” design a full triage system for a B2B SaaS product

HRM โœ“ Best
SA 87.5
MA 92.2
+4.7 pts
Self-Decompose
SA 88.4
MA 88.7
+0.2 pts
Star
SA 86.4
MA 87.6
+1.1 pts
Insight: Consistent improvements across all topologies โ€” this task benefits from any multi-agent approach.

What this actually means

๐Ÿ“

Topology is the variable

Multi-agent isn't universally better. The right topology can add 20 points. The wrong one can lose 36. Structure determines outcome โ€” not model size.

โšก

Our thesis is validated

MachineMachine doesn't deploy agents. It deploys organisations โ€” with dynamic topology selection matched to the task. That's the edge that compounds.

๐Ÿ”ฌ

Reproducible

Source: machine.machine/agent-org-simulator. Blinded evaluation by Claude Haiku. Raw JSON results available on request.