266 Results
Poster | Wed 1:45 | Universal Jailbreak Backdoors from Poisoned Human Feedback | Javier Rando · Florian Tramer
Poster | Wed 1:45 | Human Feedback is not Gold Standard | Tom Hosking · Phil Blunsom · Max Bartolo
Poster | Thu 1:45 | Differentially Private SGD Without Clipping Bias: An Error-Feedback Approach | Xinwei Zhang · Zhiqi Bu · Steven Wu · Mingyi Hong
Poster | Wed 7:30 | Contrastive Preference Learning: Learning from Human Feedback without Reinforcement Learning | Joey Hejna · Rafael Rafailov · Harshit Sikchi · Chelsea Finn · Scott Niekum · W. Bradley Knox · Dorsa Sadigh
Poster | Wed 1:45 | Safe RLHF: Safe Reinforcement Learning from Human Feedback | Juntao Dai · Xuehai Pan · Ruiyang Sun · Jiaming Ji · Xinbo Xu · Mickel Liu · Yizhou Wang · Yaodong Yang
Poster | Wed 7:30 | Zeroth-Order Optimization Meets Human Feedback: Provable Learning via Ranking Oracles | Zhiwei Tang · Dmitry Rybin · Tsung-Hui Chang
Poster | Thu 1:45 | Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback | Yifu Yuan · Jianye Hao · Yi Ma · Zibin Dong · Hebin Liang · Jinyi Liu · Zhixin Feng · Kai Zhao · Yan Zheng
Poster | Thu 7:30 | PARL: A Unified Framework for Policy Alignment in Reinforcement Learning from Human Feedback | Souradip Chakraborty · Amrit Bedi · Alec Koppel · Huazheng Wang · Dinesh Manocha · Mengdi Wang · Furong Huang
Poster | Wed 1:45 | Hindsight PRIORs for Reward Learning from Human Preferences | Mudit Verma · Katherine Metcalf
Poster | Thu 7:30 | The Human-AI Substitution Game: Active Learning from a Strategic Labeler | Tom Yan · Chicheng Zhang
Workshop | | Learning to Abstract Visuomotor Mappings using Meta-Reinforcement Learning | Carlos Velazquez-Vargas · Isaac Christian · Jordan Taylor · Sreejan Kumar
Poster | Tue 7:30 | Making RL with Preference-based Feedback Efficient via Randomization | Runzhe Wu · Wen Sun