Poster in the Workshop on Reasoning and Planning for Large Language Models
When More is Less: Understanding Chain-of-Thought Length in LLMs
Yuyang Wu · Yifei Wang · Tianqi Du · Stefanie Jegelka · Yisen Wang
Chain-of-thought (CoT) reasoning enhances the multi-step reasoning capabilities of large language models (LLMs) by breaking complex tasks into smaller, manageable sub-tasks. Researchers have explored ways to guide models toward more elaborate CoT processes to improve reasoning ability, such as long CoT and test-time scaling. However, for most models and tasks, does increasing CoT length consistently improve reasoning accuracy? In this paper, we observe a nuanced relationship: as the number of reasoning steps increases, performance first improves and then declines. To explain this phenomenon, we present evidence that longer reasoning processes are increasingly susceptible to noise. We theoretically prove the existence of an optimal number of reasoning steps and derive a scaling law for this optimal CoT length as a function of model capability and task difficulty. Inspired by our theory, we propose length-aware majority voting to mitigate the effects of excessively long or short CoTs, and verify it on both synthetic and real-world datasets. Our findings highlight the need to calibrate CoT length to model capabilities and task demands, offering a principled framework for optimizing multi-step reasoning in LLMs.
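The abstract does not spell out the exact weighting used in length-aware majority voting, but a minimal sketch is standard self-consistency voting in which each sampled chain's vote is discounted by how far its length deviates from a target step count. The Gaussian-style weighting and the names `length_weight`, `target_len`, and `sigma` below are illustrative assumptions, not the paper's exact formulation.

```python
import math
from collections import defaultdict


def length_weight(num_steps: int, target_len: float, sigma: float = 2.0) -> float:
    """Down-weight chains whose length deviates from target_len (assumed Gaussian penalty)."""
    return math.exp(-((num_steps - target_len) ** 2) / (2 * sigma ** 2))


def length_aware_vote(samples, target_len: float, sigma: float = 2.0):
    """Weighted majority vote over (answer, num_reasoning_steps) pairs from repeated CoT sampling."""
    scores = defaultdict(float)
    for answer, num_steps in samples:
        scores[answer] += length_weight(num_steps, target_len, sigma)
    # Return the answer with the largest total (length-discounted) vote mass.
    return max(scores, key=scores.get)


# Usage: five sampled chains for the same question; very long chains carry less weight.
samples = [("42", 3), ("42", 4), ("17", 9), ("42", 5), ("17", 10)]
print(length_aware_vote(samples, target_len=4))  # -> "42"
```

Plain majority voting is the special case where every chain gets weight 1; concentrating weight near an estimated optimal length is one way to operationalize the paper's finding that both overly short and overly long CoTs hurt accuracy.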