Poster Thu, Apr 23, 2026 • 11:15 AM – 1:45 PM PDT Pavilion 3 P3-#2013

Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions

Lu Ma ⋅ Hao Liang ⋅ Meiyi Qiang ⋅ Lexiang Tang ⋅ Xiaochen Ma ⋅ Zhen Wong ⋅ Junbo Niu ⋅ Chengyu Shen ⋅ Runming He ⋅ Yanhao Li ⋅ Wentao Zhang ⋅ Bin CUI

Abstract
