Frontier-level reasoning doesn't require 400 billion parameters. Nemotron-Cascade 2 achieves Gold Medal performance on the International Math Olympiad (IMO), the International Olympiad in Informatics (IOI), and the ICPC using just 3B active parameters, a 20x efficiency gain over today's largest models.
At its core is a 30B-parameter Mixture-of-Experts (MoE) architecture, with only 3B parameters activated per token. This isn't compression; it's dense intelligence. The team extended Cascade RL across broader reasoning and agentic tasks, but the real breakthrough lies in "Multi-Domain On-Policy Distillation": by distilling knowledge from intermediate teacher models during reinforcement learning, they prevent the catastrophic forgetting that typically derails multi-domain training. The result? The model doesn't just retain past performance, it improves, earning Gold Medals across IMO, IOI, and ICPC.
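To make the mechanism concrete: Nemotron's exact training objective isn't public, so the following is a minimal sketch of the general on-policy distillation idea, with every name (`distillation_loss`, `kl_weight`) our own. The student is penalized for drifting from an intermediate teacher's token distribution on the student's *own* rollouts, which is the lever the paper credits with preventing catastrophic forgetting.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete probability distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(rl_loss, student_probs, teacher_probs, kl_weight=0.1):
    """Hypothetical combined objective for on-policy distillation.

    rl_loss       -- the usual RL objective on the student's own rollout
    student_probs -- student's next-token distribution on that rollout
    teacher_probs -- intermediate teacher's distribution on the same tokens

    The KL term anchors the student to the teacher on states the student
    actually visits (on-policy), rather than on a fixed offline dataset.
    """
    kl = kl_divergence(student_probs, teacher_probs)
    return rl_loss + kl_weight * kl
```

When student and teacher agree, the KL term vanishes and only the RL loss remains; as the student drifts onto new domains, the penalty grows, pulling it back toward behavior the teacher already mastered.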
For builders of autonomous organizations, this shifts the game. You no longer need to deploy massive generalist models for every role. With a 3B-active footprint, you can run dozens of specialized agents for roughly the cost of a single frontier-model rollout. More importantly, the distillation mechanism offers a solution to the synthesis bottleneck. Our data shows that when agents merely summarize outputs for central planners, we lose critical signal, a failure mode we call "synthesis truncation." Nemotron's approach suggests a better protocol: specialists shouldn't just report results. They should continuously distill their operational insights back into the organization's core context, preserving hard-won intelligence as the system learns new domains.
That said, Olympiad performance is a simulation, not the real world. Solving a coding puzzle in a contest differs starkly from navigating undocumented APIs or resolving conflicting human schedules. Cascade RL optimizes for the correct answer, but true agency demands resilience in ambiguous, failure-prone environments where no single right answer exists, something most benchmarks and reward models still overlook.
We’re now integrating on-policy distillation into our multi-agent synthesis layer. Instead of relying on summaries, our “Manager” agent actively distills state updates from specialists to lock in learnings without context bloat. We’re restructuring training loops around cascaded protocols, prioritizing the architecture of learning over raw model size.
Intelligence density is the new moat. See how it can power your agent system: get early access today at /early-access.
MachineMachine is building the platform for autonomous AI organizations. Early access →