Detecting Scaling Factors Beyond the Model: A Reporting Framework for AI Agent Systems
Abstract
As AI agents take on increasingly complex tasks, it becomes difficult to determine whether their progress stems from model scaling or from the design of the environment in which they operate. In agentic systems, performance often arises from improvements to the workspace and to the knowledge the agent can access, not from model inference alone. This effect is particularly evident when agents apply mathematical reasoning to unsolved problems. Recent reports of such solutions highlight the critical role of subgoal decomposition and formalization, which let an agent break a complex claim into verifiable steps and check each step rigorously with a proof assistant such as Lean. These capabilities depend heavily on the specific tools and external information available to the system. To address this confound, we propose the Model, Scaffold, and World Reference (MSW) framework, which decomposes agent performance into three distinct domains. We also introduce a reporting template built on four perspectives, Identity, Policy, Budget, and Trace (IPBT), to improve transparency and reproducibility. By auditing publicly available reports claiming that AI systems have solved unsolved mathematical problems, we show that MSW-IPBT exposes missing information needed to separate model contributions from environmental support. This study thus offers guidance for evaluating such claims by distinguishing gains due to model improvements from gains due to environmental scaling.