

Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-constraint

Wei Xiong ⋅ Hanze Dong ⋅ Chenlu Ye ⋅ Ziqi Wang ⋅ Han Zhong ⋅ Heng Ji ⋅ Nan Jiang ⋅ Tong Zhang

Abstract
