
Affinity Workshop: Blog Track Session 4

The N Implementation Details of RLHF with PPO

Shengyi Huang · Tianlin Liu · Leandro Von Werra

Halle B #2
Wed 8 May 7:30 a.m. PDT — 9:30 a.m. PDT


Reinforcement Learning from Human Feedback (RLHF) is a pivotal technique for training modern language models such as ChatGPT. In this blog post, we explore OpenAI's first RLHF paper from 2019 and its accompanying open-source codebase. Our examination highlights important implementation details of RLHF that are often overlooked. Moreover, we illustrate how to replicate OpenAI's original TensorFlow 1.0 implementation using contemporary PyTorch and JAX frameworks, offering a minimal reference implementation for RLHF.
