MachineMachine — Benchmark Results

+20% improvement on contract review with the right multi-agent topology

📜

Contract Review

Legal & Compliance — identify risks, recommend amendments

Star ✓ Best: SA 81.3, MA 97.7 (+16.4 pts)

Self-Decompose: SA 92.0, MA 93.6 (+1.6 pts)

HRM ⚠ Wrong choice: SA 87.9, MA 51.7 (−36.2 pts)

Insight: Star topology (parallel specialists → synthesis) dominates here, lifting the score from 81.3 to 97.7. HRM (hierarchical decomposition) fails catastrophically, falling 36.2 points below its SA baseline: the task needs broad coverage, not deep hierarchy. Topology selection is the product.
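As a rough illustration of what "parallel specialists → synthesis" could look like, here is a minimal Python sketch. The `specialist` and `synthesize` functions, the role names, and the sample contract text are all hypothetical stand-ins, not MachineMachine's implementation; a real specialist would call a model instead of counting words.

```python
import asyncio

# Hypothetical specialist: a real one would call a model with a role-specific prompt.
async def specialist(role: str, contract: str) -> str:
    await asyncio.sleep(0)  # stand-in for an asynchronous model call
    return f"[{role}] reviewed {len(contract.split())} words"

def synthesize(findings: list[str]) -> str:
    # Hub step: merge every specialist's findings into a single report.
    return "\n".join(findings)

async def star_review(contract: str, roles: list[str]) -> str:
    # Every specialist sees the same input and runs concurrently (the spokes).
    # asyncio.gather returns results in the order the coroutines were passed.
    findings = await asyncio.gather(*(specialist(r, contract) for r in roles))
    return synthesize(list(findings))

report = asyncio.run(
    star_review("Supplier shall indemnify Buyer", ["liability", "termination", "ip"])
)
print(report)
```

The point of the shape is coverage: each spoke looks at the whole document from one angle, and only the hub combines them.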

🔍

Code Review Protocol

Software Engineering — design a complete CI gate for a 3x-daily shipping team

HRM ✓ Best: SA 87.1, MA 91.7 (+4.6 pts)

Star: SA 87.0, MA 81.1 (−5.9 pts)

Self-Decompose: SA 87.1, MA 79.0 (−8.1 pts)

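Insight: Hierarchical review (sub-tasks → integration) wins here and is the only topology that improves on its SA baseline (+4.6 pts). Flat Star parallelism hurts on tasks that need sequential reasoning chains (−5.9 pts), and Self-Decompose does worse still (−8.1 pts).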
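The "sub-tasks → integration" pattern can be sketched as a recursive walk over a task tree. This is only an illustration under assumptions: the `Task` type, `solve_leaf`, `integrate`, and the example CI-gate breakdown are hypothetical and not taken from the benchmark.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    subtasks: list["Task"] = field(default_factory=list)

def solve_leaf(task: Task) -> str:
    # Hypothetical worker; a real one would call a model on the leaf task.
    return f"draft({task.name})"

def integrate(task: Task, parts: list[str]) -> str:
    # Parent step: combine the children's results into one answer.
    return f"{task.name}[" + " + ".join(parts) + "]"

def solve(task: Task) -> str:
    if not task.subtasks:
        return solve_leaf(task)
    # Children are solved in order, then merged by their parent.
    parts = [solve(sub) for sub in task.subtasks]
    return integrate(task, parts)

# Hypothetical decomposition of a CI gate design task.
ci_gate = Task("ci-gate", [
    Task("lint"),
    Task("tests", [Task("unit"), Task("integration")]),
    Task("approval"),
])
plan = solve(ci_gate)
print(plan)
```

Unlike the flat star shape, each level here only integrates the output of the level below it, which is one way to keep a sequential chain of reasoning intact.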

🎫

Support Triage System

Customer Support — design a full triage system for a B2B SaaS product

HRM ✓ Best: SA 87.5, MA 92.2 (+4.7 pts)

Self-Decompose: SA 88.4, MA 88.7 (+0.2 pts)

Star: SA 86.4, MA 87.6 (+1.1 pts)

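Insight: Every topology improves on its SA baseline, so this task benefits from any multi-agent approach. The size of the gain still depends on topology: HRM adds +4.7 pts, while Star and Self-Decompose add about a point or less.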

What this actually means

📐

Topology is the variable

Multi-agent isn't universally better. On contract review, the right topology adds 16.4 points (about 20% over its SA baseline); the wrong one loses 36.2. Structure determines outcome, not model size.
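Point deltas and percentage gains measure different things, so it helps to compute both from the contract review scores. The sketch below reads SA and MA as the single-agent and multi-agent scores, which is an assumption since the page does not expand the abbreviations; the function names are illustrative.

```python
def point_delta(sa: float, ma: float) -> float:
    # Absolute change in score, in points.
    return round(ma - sa, 1)

def relative_gain_pct(sa: float, ma: float) -> float:
    # Change relative to the SA baseline, in percent.
    return round(100 * (ma - sa) / sa, 1)

star = (81.3, 97.7)  # contract review, Star topology
hrm = (87.9, 51.7)   # contract review, HRM topology

print(point_delta(*star), relative_gain_pct(*star))  # 16.4 20.2
print(point_delta(*hrm), relative_gain_pct(*hrm))    # -36.2 -41.2
```

So the headline +20% corresponds to the +16.4 point Star result, and the 36.2 point HRM loss is roughly a 41% drop relative to its baseline.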

⚡

Our thesis is validated

MachineMachine doesn't deploy agents. It deploys organisations — with dynamic topology selection matched to the task. That's the edge that compounds.
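One simple way to picture topology selection is a lookup over the benchmark scores above. This is a static sketch, not the dynamic selector the product describes: the `RESULTS` table copies the published (SA, MA) pairs, while the task keys, the `select_topology` function, and the `single_agent` fallback are assumptions made for illustration.

```python
# (SA, MA) scores per task and topology, copied from the result cards.
RESULTS = {
    "contract_review": {
        "star": (81.3, 97.7), "self_decompose": (92.0, 93.6), "hrm": (87.9, 51.7),
    },
    "code_review_protocol": {
        "hrm": (87.1, 91.7), "star": (87.0, 81.1), "self_decompose": (87.1, 79.0),
    },
    "support_triage": {
        "hrm": (87.5, 92.2), "self_decompose": (88.4, 88.7), "star": (86.4, 87.6),
    },
}

def select_topology(task: str) -> str:
    scores = RESULTS[task]
    # Pick the topology with the highest MA score for this task.
    best, (sa, ma) = max(scores.items(), key=lambda kv: kv[1][1])
    # Assumed fallback: stay single-agent if the winner does not beat its own baseline.
    return best if ma > sa else "single_agent"

for task in RESULTS:
    print(task, "->", select_topology(task))
```

With the published numbers this picks Star for contract review and HRM for the other two tasks, matching the "Best" labels on the cards.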

🔬

Reproducible

Source: machine.machine/agent-org-simulator. Blinded evaluation by Claude Haiku. Raw JSON results available on request.

