PILAF: Optimal Human Preference Sampling for Reward Modeling

Yunzhen Feng ⋅ Ariel Kwiatkowski ⋅ Kunhao Zheng ⋅ Julia Kempe ⋅ Yaqi Duan

Abstract
