DiffAdapt: Difficulty-Adaptive Reasoning for Token-Efficient LLM Inference
Abstract
Recent reasoning Large Language Models (LLMs) demonstrate remarkable problem-solving abilities but often generate long thinking traces of unclear utility. We conduct a systematic analysis across models and datasets and discover a U-shaped entropy pattern: high entropy on simple problems despite high accuracy, low entropy on medium-difficulty problems, and high entropy on hard problems, reflecting genuine uncertainty. The 22--25\% entropy reduction from the simple region to the minimum at medium difficulty reveals a fundamental inefficiency: an \emph{overthinking} phenomenon on easy instances. Building on these insights, we introduce \textbf{DiffAdapt}, a lightweight, deployment-ready framework that predicts problem difficulty from hidden states and selects among Easy/Normal/Hard reasoning strategies to allocate computation adaptively. DiffAdapt requires no retraining of the base LLM and is compatible with common inference optimizations. Across five models and eight benchmarks, DiffAdapt achieves comparable or improved accuracy while reducing token usage by up to 22.4\%, establishing a practical path toward compute-efficient reasoning.
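
To make the routing idea concrete, the following is a minimal sketch of difficulty-adaptive inference: a small probe over frozen hidden states picks one of three decoding strategies. All names, dimensions, and token budgets below are illustrative assumptions, not DiffAdapt's actual configuration.

\begin{verbatim}
import torch
import torch.nn as nn

# Hypothetical per-difficulty decoding budgets; the Easy/Normal/Hard
# strategies exist in DiffAdapt, but these numbers are placeholders.
STRATEGIES = {
    0: {"name": "Easy",   "max_new_tokens": 512},
    1: {"name": "Normal", "max_new_tokens": 2048},
    2: {"name": "Hard",   "max_new_tokens": 8192},
}

class DifficultyProbe(nn.Module):
    """Lightweight 3-way classifier over frozen LLM hidden states.

    Only this small head would be trained; the base LLM stays frozen,
    consistent with the abstract's 'no retraining' claim.
    """

    def __init__(self, hidden_dim: int = 4096, num_classes: int = 3):
        super().__init__()
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, hidden_state: torch.Tensor) -> torch.Tensor:
        # hidden_state: (batch, hidden_dim), e.g. the last-layer state
        # of the final prompt token.
        return self.head(hidden_state)

def select_strategy(probe: DifficultyProbe,
                    hidden_state: torch.Tensor) -> dict:
    """Pick a decoding strategy from the probe's predicted difficulty."""
    with torch.no_grad():
        label = probe(hidden_state).argmax(dim=-1).item()
    return STRATEGIES[label]

if __name__ == "__main__":
    probe = DifficultyProbe()
    fake_state = torch.randn(1, 4096)  # stand-in for a real encoding
    print(select_strategy(probe, fake_state)["name"])
\end{verbatim}

Under these assumptions, the probe runs once per prompt on hidden states the model already computes, so its overhead is negligible relative to generation, which is what makes such a scheme deployment-ready.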