

(7 events)
Oral
Sat Apr 25 06:30 AM -- 06:40 AM (PDT) @ 202 A/B
Semi-Supervised Preference Optimization with Limited Feedback
Seonggyun Lee ⋅ Sungjun Lim ⋅ Seojin Park ⋅ Soeun Cheon ⋅ Kyungwoo Song
[Slides] [OpenReview]
Oral
Sat Apr 25 06:42 AM -- 06:52 AM (PDT) @ 202 A/B
TROLL: Trust Regions Improve Reinforcement Learning for Large Language Models
Philipp Becker ⋅ Niklas Freymuth ⋅ Serge Thilges ⋅ Fabian Otto ⋅ Gerhard Neumann
[OpenReview]
Oral
Sat Apr 25 06:54 AM -- 07:04 AM (PDT) @ 202 A/B
Multiplayer Nash Preference Optimization
Fang Wu ⋅ Xu Huang ⋅ Weihao Xuan ⋅ Zhiwei Zhang ⋅ Yijia Xiao ⋅ Frank Wan ⋅ Xiaomin Li ⋅ Bing Hu ⋅ Peng Xia ⋅ Jure Leskovec ⋅ Yejin Choi
[OpenReview]
Oral
Sat Apr 25 07:06 AM -- 07:16 AM (PDT) @ 202 A/B
The Art of Scaling Reinforcement Learning Compute for LLMs
Devvrit Khatri ⋅ Lovish Madaan ⋅ Rishabh Tiwari ⋅ Rachit Bansal ⋅ Venkata Sai Surya Subramanyam Duvvuri ⋅ Manzil Zaheer ⋅ Inderjit Dhillon ⋅ David Brandfonbrener ⋅ Rishabh Agarwal
[Slides] [OpenReview]
Oral
Sat Apr 25 07:18 AM -- 07:28 AM (PDT) @ 202 A/B
To Infinity and Beyond: Tool-Use Unlocks Length Generalization in State Space Models
Eran Malach ⋅ Omid Saremi ⋅ Sinead Williamson ⋅ Arwen Bradley ⋅ Aryo Lotfi ⋅ Emmanuel Abbe ⋅ Joshua Susskind ⋅ Etai Littwin
[OpenReview]
Oral
Sat Apr 25 07:30 AM -- 07:40 AM (PDT) @ 202 A/B
SafeDPO: A Simple Approach to Direct Preference Optimization with Enhanced Safety
Geon-Hyeong Kim ⋅ Yu Jin Kim ⋅ Byoungjip Kim ⋅ Honglak Lee ⋅ Kyunghoon Bae ⋅ Youngsoo Jang ⋅ Moontae Lee
[OpenReview]
Oral
Sat Apr 25 07:42 AM -- 07:52 AM (PDT) @ 202 A/B
Why DPO is a Misspecified Estimator and How to Fix It
Aditya Gopalan ⋅ Sayak Ray Chowdhury ⋅ Debangshu Banerjee
[OpenReview]