Poster in Affinity Workshop: Blog Track Session 4
The N Implementation Details of RLHF with PPO
Shengyi Huang · Tianlin Liu · Leandro Von Werra
Halle B #2
Abstract:
Reinforcement Learning from Human Feedback (RLHF) is a pivotal technique for training modern language models such as ChatGPT. In this blog post, we examine OpenAI's first RLHF paper from 2019 and its accompanying open-source codebase, available at https://github.com/openai/lm-human-preferences. Our examination surfaces important implementation details of RLHF that are often overlooked. Moreover, we illustrate how to replicate OpenAI's original TensorFlow 1.0 implementation using contemporary PyTorch and JAX frameworks, offering a minimal reference implementation for RLHF.
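To give a flavor of the kind of detail the blog post covers, here is a minimal, illustrative PyTorch sketch of the core PPO objective typically used in RLHF: a clipped policy-gradient surrogate plus a KL term that keeps the policy close to a frozen reference model. This is not the authors' code; the function name, the choice to add the KL estimate directly into the loss (rather than folding it into the reward, as some implementations do), and the toy shapes are assumptions made here purely for illustration.

```python
# Illustrative sketch only, not OpenAI's or the authors' implementation.
# Shows a clipped PPO surrogate with a simple per-token KL penalty toward
# a frozen reference policy. All names and shapes are hypothetical.
import torch

def ppo_rlhf_loss(logprobs, old_logprobs, ref_logprobs, advantages,
                  clip_ratio=0.2, kl_coef=0.1):
    """Clipped PPO objective plus a KL penalty against the reference policy."""
    # Probability ratio between the current policy and the behavior (old) policy.
    ratio = torch.exp(logprobs - old_logprobs)
    # Standard PPO clipped surrogate (we negate because optimizers minimize).
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_ratio, 1 + clip_ratio) * advantages
    pg_loss = -torch.min(unclipped, clipped).mean()
    # Crude per-token KL estimate to discourage drifting from the reference model.
    kl = (logprobs - ref_logprobs).mean()
    return pg_loss + kl_coef * kl

# Toy usage with random per-token log-probabilities and advantages.
T = 8  # number of response tokens
logprobs = torch.randn(T, requires_grad=True)
old_logprobs = logprobs.detach() + 0.01 * torch.randn(T)
ref_logprobs = logprobs.detach() + 0.05 * torch.randn(T)
advantages = torch.randn(T)
loss = ppo_rlhf_loss(logprobs, old_logprobs, ref_logprobs, advantages)
loss.backward()  # gradients flow into whatever produced `logprobs`
```

The blog post itself walks through the exact choices made in OpenAI's codebase (e.g., how rewards, whitening, and the KL control are actually handled), which often differ from textbook presentations like the one sketched above.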