

Poster in Affinity Workshop: Blog Track Session 4

The N Implementation Details of RLHF with PPO

Shengyi Huang · Tianlin Liu · Leandro Von Werra

Halle B #2
Wed 8 May 7:30 a.m. PDT — 9:30 a.m. PDT

Abstract:

Reinforcement Learning from Human Feedback (RLHF) is a pivotal technique for training modern language models such as ChatGPT. In this blog post, we examine OpenAI's first RLHF paper from 2019 and its accompanying open-source codebase, available at https://github.com/openai/lm-human-preferences. Our examination reveals important implementation details of RLHF that are often overlooked. Moreover, we show how to reproduce OpenAI's original TensorFlow 1.x implementation in contemporary PyTorch and JAX frameworks, offering a minimal reference implementation of RLHF.
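One such implementation detail from the original codebase is how the per-token reward for PPO is assembled: a KL penalty against the frozen reference (SFT) policy is applied at every response token, and the scalar reward-model score is added only at the final token. The PyTorch sketch below illustrates this idea; the toy shapes, random logits, and the coefficient `beta` are illustrative assumptions, not the authors' code.

```python
# Minimal sketch (assumed shapes/values) of the KL-penalized per-token
# reward used in RLHF-style PPO, in the spirit of Ziegler et al. (2019).
import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch, resp_len, vocab = 2, 5, 11
beta = 0.05  # KL penalty coefficient; illustrative value

# Stand-ins for model outputs: normally these come from the policy and
# the frozen reference model evaluated on the sampled response tokens.
policy_logits = torch.randn(batch, resp_len, vocab)
ref_logits = torch.randn(batch, resp_len, vocab)
tokens = torch.randint(vocab, (batch, resp_len))  # sampled response tokens

# Log-probs of the sampled tokens under each model.
logp = F.log_softmax(policy_logits, -1).gather(-1, tokens.unsqueeze(-1)).squeeze(-1)
ref_logp = F.log_softmax(ref_logits, -1).gather(-1, tokens.unsqueeze(-1)).squeeze(-1)

# Per-token KL estimate log(pi/pi_ref); the reward-model score is a
# single scalar per sequence, credited only at the last token.
kl = logp - ref_logp
rm_score = torch.randn(batch)  # stand-in for a reward model output
rewards = -beta * kl
rewards[:, -1] += rm_score
print(rewards)  # shape (batch, resp_len): dense KL penalty, sparse score
```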
