Poster in Workshop: 2nd Workshop on Mathematical and Empirical Understanding of Foundation Models
Attributing Mode Collapse in the Fine-Tuning of Large Language Models
Laura O'Mahony · Leo Grinsztajn · Hailey Schoelkopf · Stella R Biderman
Large language models (LLMs) are typically trained in two stages: pre-training on a large, diverse dataset to acquire general-purpose language modeling capabilities, followed by a fine-tuning stage (often called "instruction tuning" or "alignment") on smaller, more curated datasets that adapts them to a specific task or downstream application such as chat or general instruction-following. It is a well-known anecdotal observation that instruction-tuned models exhibit reduced output diversity, i.e., a diminished ability to generate varied outputs, which can be a limitation for many use cases. In this manuscript, we quantify how each step in a typical RLHF or instruction-tuning pipeline changes a model's diversity, for a series of models trained in a controlled fine-tuning setup. We distinguish between two categories of diversity in LLMs: token-level prediction diversity and model output generation diversity. We find that the supervised fine-tuning and reward-based fine-tuning steps have different effects on these distinct diversity types. Our results have implications for better understanding the effects of instruction tuning on language models.
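As a rough illustration of the distinction between the two diversity categories (not the metrics used in the paper), token-level prediction diversity can be probed via the entropy of a model's next-token distribution, while output generation diversity can be probed with a distinct-n style statistic over repeated samples for the same prompt. The sketch below assumes direct access to next-token probabilities and a list of sampled completions; the function names and inputs are hypothetical.

```python
# Illustrative sketch, not the paper's exact measures: two ways one might
# quantify the diversity types contrasted in the abstract.
import math
from collections import Counter

def token_prediction_entropy(next_token_probs):
    """Token-level prediction diversity: Shannon entropy (in nats) of the
    model's next-token distribution. Lower entropy = a more peaked,
    less diverse prediction."""
    return -sum(p * math.log(p) for p in next_token_probs if p > 0)

def distinct_n(generations, n=2):
    """Output generation diversity: fraction of unique n-grams across a set
    of sampled completions for one prompt (a distinct-n style measure).
    Values near 0 indicate the samples collapse onto similar outputs."""
    ngrams = Counter()
    for text in generations:
        tokens = text.split()
        ngrams.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    total = sum(ngrams.values())
    return len(ngrams) / total if total else 0.0

# Example: a peaked next-token distribution and near-identical samples
# both signal mode collapse, but along different axes.
print(token_prediction_entropy([0.97, 0.01, 0.01, 0.01]))   # low entropy
print(distinct_n(["the cat sat on the mat"] * 4 + ["a dog ran home"]))
```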