
Poster in Workshop: Workshop on Reasoning and Planning for Large Language Models

Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization

Zishun Yu · Tengyu Xu · Di Jin · Karthik Abinav Sankararaman · Yun He · Wenxuan Zhou · Zhouhao Zeng · Eryk Helenowski · Chen Zhu · Sinong Wang · Hao Ma · Han Fang


Abstract: Solving mathematics problems has been an intriguing capability of language models, and many efforts have been made to improve reasoning by extending reasoning length, for example through self-correction and long chains of thought. While promising for problem solving, advanced long-reasoning-chain models exhibit an undesirable uni-modal behavior, in which even trivial questions receive unnecessarily tedious long chains of thought. In this work, we propose a way to make models aware of inference budgets by formulating the problem as utility maximization with respect to an inference budget constraint; hence we name our algorithm Inference Budget-Constrained Policy Optimization (IBPO). In a nutshell, models fine-tuned through IBPO learn to ``understand'' the difficulty of queries and allocate larger inference budgets to harder ones. Under different inference budgets, our best models achieve $4.14$\% and $5.74$\% absolute improvements ($8.08$\% and $11.2$\% relative) on MATH500 using $2.16$x and $4.32$x the inference budget of LLaMA3.1 8B Instruct, respectively. These improvements are approximately $2$x those of self-consistency under the same budgets.
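One plausible reading of the budget-constrained formulation, written here as a sketch with assumed notation (the utility $U$, per-response cost $c$, and budget $B$ are illustrative labels, not necessarily the paper's own symbols):

% Assumed notation: U(x, y) = utility (e.g., answer correctness), c(y) = inference cost (e.g., response length in tokens), B = total inference budget.
$$\max_{\pi} \; \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi(\cdot \mid x)}\big[U(x, y)\big] \quad \text{subject to} \quad \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi(\cdot \mid x)}\big[c(y)\big] \le B.$$

Intuitively, the constraint discourages spending long chains of thought on queries where a short response already attains the utility, which is how a policy fine-tuned under such an objective would come to allocate longer reasoning only to harder queries.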
