Interpreting Chain-of-thought Reasoning via Partial Information Decomposition
Barproda Halder ⋅ Qiuyi (Richard) Zhang ⋅ Sanghamitra Dutta
Abstract
Large reasoning models have attracted interest for solving complex tasks. However, they often generate verbose, repetitive, or incorrect reasoning steps on challenging problems. In this work, we introduce SLIDER, a new interpretability framework for evaluating the quality of the reasoning process by assessing consecutive steps for incorrectness and repetitiveness. SLIDER leverages an emerging body of work from information theory called Partial Information Decomposition (PID) to disentangle the information that two consecutive reasoning steps carry about the target into non-negative components: unique information in one reasoning step, $S_i$ or $S_{i+1}$, that is not in the other; redundant information that is common to both steps; and synergistic information that emerges only when the steps are considered jointly. Given the responses of a large reasoning model, SLIDER slides a window across the steps, projects them onto a meaningful embedding space, and then computes a set of new per-token information-decomposition measures that enable the identification of various failure modes. We demonstrate SLIDER by analyzing incorrectness and repetitiveness in several use cases spanning arithmetic problems and GSM8K word problems.
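For readers unfamiliar with PID, the decomposition described above can be sketched with the standard two-source consistency equations. The notation here is an assumption for illustration and is not taken from the paper: $T$ denotes the target, $I(\cdot\,;\cdot)$ denotes mutual information, and $\mathrm{Uni}$, $\mathrm{Red}$, and $\mathrm{Syn}$ denote the unique, redundant, and synergistic components.

$$
\begin{aligned}
I(T; S_i, S_{i+1}) &= \mathrm{Uni}(T : S_i \setminus S_{i+1}) + \mathrm{Uni}(T : S_{i+1} \setminus S_i) + \mathrm{Red}(T : S_i, S_{i+1}) + \mathrm{Syn}(T : S_i, S_{i+1}), \\
I(T; S_i) &= \mathrm{Uni}(T : S_i \setminus S_{i+1}) + \mathrm{Red}(T : S_i, S_{i+1}), \\
I(T; S_{i+1}) &= \mathrm{Uni}(T : S_{i+1} \setminus S_i) + \mathrm{Red}(T : S_i, S_{i+1}).
\end{aligned}
$$

These three equations constrain four unknowns, so a PID must additionally fix a definition for one of the components. The abstract does not state which definition SLIDER adopts, nor how its per-token measures are derived from these quantities.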