JustRL: Scaling a 1.5B LLM with a Simple RL Recipe
Bingxiang He · Zekai Qu · Zeyuan Liu · Yinghao Chen · Yuxin Zuo · Cheng Qian · Kaiyan Zhang · Weize Chen · Chaojun Xiao · Ganqu Cui · Ning Ding · Zhiyuan Liu
Abstract
Training small reasoning models with RL has become a race toward complexity, relying on multi-stage pipelines, dynamic schedules, and curriculum learning. We ask whether this complexity is necessary. We show that JustRL, a simple recipe with fixed hyperparameters, achieves state-of-the-art performance on two different 1.5B base models (54.5% and 64.3% average across 9 math benchmarks) while using 2× less compute than sophisticated approaches. The same hyperparameters transfer across both models without tuning, and training remains stable over thousands of steps without intervention. This suggests the field may be adding complexity to solve problems that disappear with a stable, scaled-up baseline.