Principled Reinforcement Learning with Human Feedback from Pairwise or $K$-wise Comparisons

Banghua Zhu · Jiantao Jiao · Michael Jordan
