Oral
Sat Apr 25 06:30 AM -- 06:40 AM (PDT) @ 202 A/B
Semi-Supervised Preference Optimization with Limited Feedback
[Slides] [OpenReview]
Oral
Sat Apr 25 06:42 AM -- 06:52 AM (PDT) @ 202 A/B
TROLL: Trust Regions Improve Reinforcement Learning for Large Language Models
[OpenReview]
Oral
Sat Apr 25 06:54 AM -- 07:04 AM (PDT) @ 202 A/B
Multiplayer Nash Preference Optimization
[OpenReview]
Oral
Sat Apr 25 07:06 AM -- 07:16 AM (PDT) @ 202 A/B
The Art of Scaling Reinforcement Learning Compute for LLMs
[Slides] [OpenReview]
Oral
Sat Apr 25 07:18 AM -- 07:28 AM (PDT) @ 202 A/B
To Infinity and Beyond: Tool-Use Unlocks Length Generalization in State Space Models
[OpenReview]
Oral
Sat Apr 25 07:30 AM -- 07:40 AM (PDT) @ 202 A/B
SafeDPO: A Simple Approach to Direct Preference Optimization with Enhanced Safety
[OpenReview]
Oral
Sat Apr 25 07:42 AM -- 07:52 AM (PDT) @ 202 A/B
Why DPO is a Misspecified Estimator and How to Fix It
[OpenReview]