

Poster in Affinity Workshop: Blog Track Session 4

The N Implementation Details of RLHF with PPO

Shengyi Huang · Tianlin Liu · Leandro Von Werra

Halle B #2

Abstract:

Reinforcement Learning from Human Feedback (RLHF) is a pivotal technique for training modern language models such as ChatGPT. In this blog post, we examine OpenAI's first RLHF paper from 2019 and its accompanying open-source codebase, available at https://github.com/openai/lm-human-preferences. Our examination surfaces important implementation details of RLHF that are often overlooked. Moreover, we show how to replicate OpenAI's original TensorFlow 1.x implementation in the contemporary PyTorch and JAX frameworks, offering a minimal reference implementation for RLHF.
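To make the abstract concrete, below is a minimal PyTorch sketch of two of the core pieces such an RLHF-with-PPO pipeline involves: PPO's clipped surrogate objective and per-token rewards shaped by a KL penalty against a frozen reference policy. This is an illustrative sketch, not the authors' code; the function names and the hyperparameter values (clip_ratio=0.2, kl_coef=0.15) are assumptions chosen for the example.

    import torch

    def ppo_clip_loss(new_logprobs, old_logprobs, advantages, clip_ratio=0.2):
        """Clipped PPO surrogate objective, returned as a loss to minimize."""
        # Probability ratio between the current policy and the rollout-time policy.
        ratio = torch.exp(new_logprobs - old_logprobs)
        unclipped = ratio * advantages
        clipped = torch.clamp(ratio, 1.0 - clip_ratio, 1.0 + clip_ratio) * advantages
        # Pessimistic bound: take the smaller objective per token, negate for a loss.
        return -torch.min(unclipped, clipped).mean()

    def kl_shaped_rewards(scores, logprobs, ref_logprobs, kl_coef=0.15):
        """Per-token rewards: a KL penalty against the frozen reference (SFT) policy,
        with the scalar reward-model score added at the final response token.
        kl_coef is an illustrative value, not the paper's exact setting."""
        per_token = -kl_coef * (logprobs - ref_logprobs)
        per_token[:, -1] += scores
        return per_token

    # Toy usage with random tensors: a batch of 4 responses, 16 tokens each.
    B, T = 4, 16
    logprobs = torch.randn(B, T)
    old_logprobs = logprobs.detach() + 0.01 * torch.randn(B, T)
    loss = ppo_clip_loss(logprobs, old_logprobs, advantages=torch.randn(B, T))
    rewards = kl_shaped_rewards(torch.randn(B), logprobs.detach(), torch.randn(B, T))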
