

Poster in Affinity Workshop: Blog Track Session 6

RLHF without RL

Mischa Panchenko

Halle B #2
Thu 9 May 7:30 a.m. PDT — 9:30 a.m. PDT

Abstract:

Reinforcement learning from human feedback (RLHF) plays an important role in aligning language models to human preferences. However, there has been some discussion about whether RLHF is actually reinforcement learning at all. The environment for RLHF consists of the model itself, and no new data is acquired during the training process. The only way additional data enters the training is through the supervised training of the reward function. Recently, this discussion has been intensified by the publication of the Direct Preference Optimization algorithm, which bypasses reinforcement learning entirely. In this blog post we will discuss related works, highlight the information flow of RLHF, and analyze to what extent alignment requires RL in modern applications of LLMs.
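For context on how an algorithm can bypass the RL loop entirely, the following is a minimal sketch (not from the poster itself) of the DPO objective in PyTorch: preference pairs are consumed directly by a single supervised loss over policy/reference log-probability ratios, with no reward model and no rollouts. The input tensors and `beta` value below are illustrative placeholders.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Sketch of the Direct Preference Optimization loss.

    Each argument is a tensor of summed token log-probabilities for the
    chosen / rejected completion under the trainable policy or the frozen
    reference model. beta scales the implicit KL regularization strength.
    """
    # Log-ratio of policy to reference for each completion
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Implicit reward margin between chosen and rejected completions
    logits = beta * (chosen_logratio - rejected_logratio)
    # Binary classification loss favoring the preferred completion
    return -F.logsigmoid(logits).mean()

# Toy usage with dummy log-probabilities for a batch of 4 preference pairs
policy_chosen = torch.tensor([-12.3, -8.1, -15.0, -9.7])
policy_rejected = torch.tensor([-13.0, -9.5, -14.2, -11.1])
ref_chosen = torch.tensor([-12.8, -8.4, -14.9, -10.0])
ref_rejected = torch.tensor([-12.9, -9.0, -14.5, -10.8])
print(dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected))
```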
