Poster

From Verifiable Dot to Reward Chain: Harnessing Verifiable Reference-based Rewards for Reinforcement Learning of Open-ended Generation

Yuxin Jiang · Yufei Wang · Qiyuan Zhang · Xingshan Zeng · Liangyou Li · Jierun Chen · Chaofan Tao · Haoli Bai · Lifeng Shang

Abstract
