ICLR Poster Progress or Regress? Self-Improvement Reversal in Post-training

Ting Wu · Xuefeng Li · Pengfei Liu

Hall 3 + Hall 2B #260

Thu 24 Apr midnight PDT — 2:30 a.m. PDT

Abstract:

Self-improvement through post-training methods such as iterative preference learning has been acclaimed for enhancing the problem-solving capabilities (e.g., mathematical reasoning) of Large Language Models (LLMs) without human intervention. However, as our exploration deepens, it is crucial to critically assess whether these enhancements indeed signify comprehensive progress or if they could lead to unintended regressions. Through rigorous experimentation and analysis across diverse problem-solving tasks, we uncover nuances in the self-improvement trajectories of LLMs. Our study introduces the concept of "self-improvement reversal", where models showing improved overall accuracy metrics might paradoxically exhibit declines in broader, essential capabilities. We propose a comprehensive evaluative framework to scrutinize the underlying mechanisms and outcomes of post-training self-improvement, aiming to discern between superficial metric improvements and genuine enhancements in model functionality. The findings emphasize the complexity of technological advancements in LLMs, underscoring the need for a nuanced understanding of the "progress or regress" dichotomy in their development.
