Reward-Augmented Data Enhances Direct Preference Alignment of LLMs

Shenao Zhang ⋅ Zhihan Liu ⋅ Boyi Liu ⋅ Yufeng Zhang ⋅ Yingxiang Yang ⋅ Yongfei Liu ⋅ Liyu Chen ⋅ Tao Sun ⋅ Zhaoran Wang
