13 Results
Poster | Thu 1:45 | Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback | Yifu Yuan · Jianye Hao · Yi Ma · Zibin Dong · Hebin Liang · Jinyi Liu · Zhixin Feng · Kai Zhao · Yan Zheng
Workshop | Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-constraint | Wei Xiong · Hanze Dong · Chenlu Ye · Ziqi Wang · Han Zhong · Heng Ji · Nan Jiang · Tong Zhang
Affinity Workshop | Thu 7:30 | RLHF without RL | Mischa Panchenko
Affinity Workshop | Wed 7:30 | The N Implementation Details of RLHF with PPO | Shengyi Huang · Tianlin Liu · Leandro Von Werra
Affinity Workshop | Thu 7:30 | Policy Optimization in RLHF: The Impact of Out-of-preference Data | Ziniu Li · Tian Xu · Yang Yu
Poster | Wed 1:45 | The Trickle-down Impact of Reward Inconsistency on RLHF | Lingfeng Shen · Sihao Chen · Linfeng Song · Lifeng Jin · Baolin Peng · Haitao Mi · Daniel Khashabi · Dong Yu
Poster | Tue 7:30 | Confronting Reward Model Overoptimization with Constrained RLHF | Ted Moskovitz · Aaditya Singh · DJ Strouse · Tuomas Sandholm · Ruslan Salakhutdinov · Anca Dragan · Stephen McAleer
Poster | Wed 1:45 | Safe RLHF: Safe Reinforcement Learning from Human Feedback | Juntao Dai · Xuehai Pan · Ruiyang Sun · Jiaming Ji · Xinbo Xu · Mickel Liu · Yizhou Wang · Yaodong Yang
Poster | Wed 1:45 | Understanding the Effects of RLHF on LLM Generalisation and Diversity | Robert Kirk · Ishita Mediratta · Christoforos Nalmpantis · Jelena Luketina · Eric Hambro · Edward Grefenstette · Roberta Raileanu
Workshop | Peering Through Preferences: Unraveling Feedback Acquisition for Aligning Large Language Models | Hritik Bansal · John Dang · Aditya Grover