Poster in Workshop: Self-Improving Foundation Models Without Human Supervision

Assessing Diversity Collapse in Reasoning

Xingyu Dang · Christina Baek · Zico Kolter · Aditi Raghunathan

Keywords: [ LLM ] [ reasoning ] [ supervised finetuning ] [ decoding strategy ] [ reinforcement learning ]


Abstract:

We identify a striking phenomenon in large language models finetuned on reasoning tasks: as Pass@1 improves during supervised finetuning, Pass@k rapidly deteriorates and fails to recover under reinforcement learning or self-improvement. We formalize the relationship between expected Pass@k and Pass@1 over the test distribution and attribute the early drop in Pass@k to diversity collapse, where finetuning concentrates the probability mass on a single reasoning path and final answer for each test question. We theoretically prove that the standard finetuning pipeline of SFT followed by RL leads to diversity collapse in reasoning models. We then estimate the optimal Pass@k achievable by an oracle with access to the model's distribution over final answers, marginalized over all rollouts, and reveal a significant gap relative to current token-level diverse decoding methods such as temperature scaling, top-k, nucleus, and min-p sampling. These results highlight the need for better decoding strategies for generating reasoning steps during self-improvement and inference. Finally, we propose a promising remedy based on model weight interpolation.
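
For reference, Pass@k is commonly estimated from n sampled solutions per problem, c of which are correct, via the standard combinatorial estimator 1 - C(n-c, k)/C(n, k), averaged over the test set. The sketch below is a minimal illustration of that estimator and of why answer-level diversity collapse pins Pass@k near Pass@1; the function name and the synthetic numbers are illustrative assumptions, not taken from the paper, and the paper's exact formalization of expected Pass@k may differ in detail.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of Pass@k from n sampled solutions, c of them correct.

    Computes 1 - C(n - c, k) / C(n, k) in a numerically stable product form.
    """
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Synthetic comparison: a "diverse" model that lands on the correct answer in
# 20% of its rollouts benefits from repeated sampling, while a fully collapsed
# model always emits the same answer, so on questions where that answer is
# wrong, Pass@k stays at 0 no matter how many samples are drawn.
n = 100
diverse = pass_at_k(n, c=20, k=8)    # ~0.84: sampling more helps
collapsed = pass_at_k(n, c=0, k=8)   # 0.0: extra samples repeat the same mistake
print(f"diverse Pass@8 ~ {diverse:.2f}, collapsed Pass@8 = {collapsed:.2f}")
```

Averaged over questions, a fully collapsed model therefore has expected Pass@k roughly equal to its Pass@1, which is the deterioration the abstract describes.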
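The abstract does not spell out the weight-interpolation recipe. As a rough sketch of the general idea, the code below linearly interpolates a base (more diverse) checkpoint with a finetuned (higher Pass@1) checkpoint; the function name, the alpha parameter, and the choice of endpoints are assumptions for illustration, not the paper's method.

```python
import copy
import torch

@torch.no_grad()
def interpolate_checkpoints(base_model: torch.nn.Module,
                            finetuned_model: torch.nn.Module,
                            alpha: float = 0.5) -> torch.nn.Module:
    """Return a model whose weights are (1 - alpha) * base + alpha * finetuned.

    alpha = 0.0 recovers the base model, alpha = 1.0 the finetuned one;
    intermediate values trade the finetuned model's Pass@1 against the base
    model's output diversity (hypothetical recipe, for illustration only).
    """
    merged = copy.deepcopy(finetuned_model)
    base_state = base_model.state_dict()
    merged_state = {}
    for name, ft_param in finetuned_model.state_dict().items():
        base_param = base_state[name]
        if ft_param.is_floating_point():
            merged_state[name] = (1.0 - alpha) * base_param + alpha * ft_param
        else:
            # Non-float buffers (e.g. counters) are copied from the finetuned model.
            merged_state[name] = ft_param
    merged.load_state_dict(merged_state)
    return merged
```

In practice, alpha can be swept on a validation split to find a point that recovers diversity (Pass@k) without giving up too much Pass@1.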
