

Poster Thu, Apr 23, 2026 • 11:15 AM – 1:45 PM PDT Pavilion 3 P3-#2012

R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning

YiFan Zhang ⋅ Xingyu Lu ⋅ Xiao Hu ⋅ Chaoyou Fu ⋅ Bin Wen ⋅ Tianke Zhang ⋅ Changyi Liu ⋅ Kaiyu Jiang ⋅ Kaibing Chen ⋅ Kaiyu Tang ⋅ Haojie Ding ⋅ Jiankang Chen ⋅ Fan Yang ⋅ Zhang Zhang ⋅ Tingting Gao ⋅ Di ZHANG ⋅ Guorui Zhou ⋅ Liang Wang

