

Offline Reinforcement Learning for LLM Multi-Step Reasoning

Huaijie Wang ⋅ Shibo Hao ⋅ Hanze Dong ⋅ Shenao Zhang ⋅ Yilin Bao ⋅ Ziran Yang ⋅ Yi Wu

Abstract
