Poster

Pretrain Value, Not Reward: Decoupled Value Policy Optimization

Chenghua Huang · Lu Wang · Fangkai Yang · Pu Zhao · Qingwei Lin · Dongmei Zhang · Saravan Rajmohan

Abstract