SAGE: Self-play Adversarial Games Enhance Large Language Model Reasoning Capabilities
Abstract
We introduce SAGE (Self-play Adversarial Games for Enhancement), a framework for improving LLM reasoning capabilities through adversarial self-play without human-curated data. SAGE places two model instances in an asymmetric game: a Setter generates a problem and predicts its solution, while a Solver attempts to solve the problem independently. The Setter receives positive reward only when it answers correctly and the Solver fails, which incentivizes problems that are solvable yet challenging and naturally targets the frontier of the model's capabilities. We instantiate SAGE in two domains: Code-Game, where problems are Python programs verified by execution, and Math-Game, where math problems are graded by an external LLM judge as a proxy for a verifiable environment. Training models from 1B to 4B parameters across two architectures (Qwen, Llama), SAGE consistently outperforms baselines: up to +10% on MATH, +8% on MBPP, and +6% on ARC-Challenge. Notably, we find cross-domain transfer: Code-Game training improves mathematical reasoning and vice versa, suggesting that SAGE strengthens domain-general reasoning skills. Ablations confirm that adversarial pressure, rather than verified rewards alone, drives these gains: removing the opponent while retaining execution-verified rewards reduces the improvement by 40-70%. SAGE offers a scalable path to reasoning improvement that requires only a verifier, not human supervision.
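The Setter's reward rule described above can be illustrated with a minimal sketch. The function name `setter_reward` and the concrete reward values (1.0 and 0.0) are assumptions made for illustration; the abstract states only that the Setter's reward is positive when the Setter is correct and the second player (the Solver) fails, and it does not specify the reward in the other cases or the Solver's own reward.

```python
def setter_reward(setter_correct: bool, solver_correct: bool) -> float:
    """Hypothetical Setter reward for one round of the game.

    Assumption: 1.0 stands in for "positive reward" and 0.0 for every
    other outcome; the actual magnitudes are not given in the abstract.
    """
    # Positive reward only if the Setter's predicted solution is verified
    # as correct AND the independent Solver attempt is verified as wrong.
    if setter_correct and not solver_correct:
        return 1.0
    return 0.0


# Enumerate the four possible outcomes of a round.
outcomes = {
    (setter_ok, solver_ok): setter_reward(setter_ok, solver_ok)
    for setter_ok in (True, False)
    for solver_ok in (True, False)
}
```

Under these assumptions, a problem that the Setter itself cannot solve earns nothing, and neither does a problem the Solver solves, so only solvable-but-hard problems are rewarded.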