

Poster Thu, Apr 23, 2026 • 6:30 AM – 9:00 AM PDT Pavilion 4 P4-#4714

Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models

Zizhuo Zhang ⋅ Jianing ZHU ⋅ Xinmu Ge ⋅ Zihua Zhao ⋅ (Andrew) Zhanke Zhou ⋅ Xuan Li ⋅ Xiao Feng ⋅ Jiangchao Yao ⋅ Bo Han

Abstract
