Poster

RiskPO: Risk-based Policy Optimization with Verifiable Reward for LLM Post-Training

Tao Ren · Jinyang Jiang · Hui Yang · Wan Tian · Minhao Zou · Guanghao Li · Zishi Zhang · Qinghao Wang · Shentao Qin · Yanjun Zhao · Rui Tao · Hui Shao · Yijie Peng

Abstract
