

Hybrid Preference Optimization for Alignment: Provably Faster Convergence Rates by Combining Offline Preferences with Online Exploration

Avinandan Bose ⋅ Zhihan Xiong ⋅ Aadirupa Saha ⋅ Simon Du ⋅ Maryam Fazel

Abstract
