The 2026 schedule is still incomplete
Oral
Sat Apr 25 06:30 AM -- 06:40 AM (PDT)
Semi-Supervised Preference Optimization with Limited Feedback
Seonggyun Lee · Sungjun Lim · Seojin Park · Soeun Cheon · Kyungwoo Song
Oral
Sat Apr 25 06:42 AM -- 06:52 AM (PDT)
TROLL: Trust Regions Improve Reinforcement Learning for Large Language Models
Philipp Becker · Niklas Freymuth · Serge Thilges · Fabian Otto · Gerhard Neumann
Oral
Sat Apr 25 06:54 AM -- 07:04 AM (PDT)
Multiplayer Nash Preference Optimization
Fang Wu · Xu Huang · Weihao Xuan · Zhiwei Zhang · Yijia Xiao · Guancheng Wan · Xiaomin Li · Bing Hu · Peng Xia · Jure Leskovec · Yejin Choi
Oral
Sat Apr 25 07:06 AM -- 07:16 AM (PDT)
The Art of Scaling Reinforcement Learning Compute for LLMs
Devvrit Khatri · Lovish Madaan · Rishabh Tiwari · Rachit Bansal · Venkata Sai Surya Subramanyam Duvvuri · Manzil Zaheer · Inderjit Dhillon · David Brandfonbrener · Rishabh Agarwal
Oral
Sat Apr 25 07:18 AM -- 07:28 AM (PDT)
To Infinity and Beyond: Tool-Use Unlocks Length Generalization in State Space Models
Eran Malach · Omid Saremi · Sinead Williamson · Arwen Bradley · Aryo Lotfi · Emmanuel Abbe · Joshua Susskind · Etai Littwin
Oral
Sat Apr 25 07:30 AM -- 07:40 AM (PDT)
SafeDPO: A Simple Approach to Direct Preference Optimization with Enhanced Safety
Geon-Hyeong Kim · Youngsoo Jang · Yu Jin Kim · Byoungjip Kim · Honglak Lee · Kyunghoon Bae · Moontae Lee
Oral
Sat Apr 25 07:42 AM -- 07:52 AM (PDT)
Why DPO is a Misspecified Estimator and How to Fix It
Aditya Gopalan · Sayak Ray Chowdhury · Debangshu Banerjee