Poster

Prosperity before Collapse: How Far Can Off-Policy RL Reach with Stale Data on LLMs?

Haizhong Zheng · Jiawei Zhao · Beidi Chen

Abstract
