SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
Haozhan Li ⋅ Yuxin Zuo ⋅ Jiale Yu ⋅ Yuhao Zhang ⋅ Yang Zhaohui ⋅ Kaiyan Zhang ⋅ Xuekai Zhu ⋅ Yuchen Zhang ⋅ Tianxing Chen ⋅ Ganqu Cui ⋅ Dehui Wang ⋅ Dingxiang Luo ⋅ Yuchen Fan ⋅ Youbang Sun ⋅ Jia Zeng ⋅ Jiangmiao Pang ⋅ Shanghang Zhang ⋅ Yu Wang ⋅ Yao Mu ⋅ Bowen Zhou ⋅ Ning Ding
Abstract
Vision-Language-Action (VLA) models have emerged as a powerful paradigm for robotic manipulation. Despite substantial progress enabled by large-scale pretraining and supervised fine-tuning (SFT), these models face two fundamental challenges: (i) the scarcity and high cost of the large-scale robotic trajectories required to scale SFT, and (ii) limited generalization to tasks under distribution shift. To overcome these limitations, we explore reinforcement learning (RL) as a pathway to scaling VLA training beyond limited datasets. Inspired by LLM breakthroughs where RL with outcome rewards enhances step-by-step reasoning, we ask: Can outcome-driven RL improve the long-horizon, step-by-step action planning of VLA models? In this work, we introduce SimpleVLA-RL, an efficient RL framework tailored for VLA models. Building upon veRL, it adds VLA-specific trajectory sampling, scalable parallelization, multi-environment rendering, and optimized loss computation. Applied to OpenVLA-OFT, SimpleVLA-RL achieves 99% of SoTA performance on LIBERO and an 80% relative improvement on RoboTwin 1.0 & 2.0, outperforming $\pi_0$ with our proposed exploration-enhancing strategies. SimpleVLA-RL reduces dependence on large-scale data, enables robust generalization, and remarkably surpasses SFT in real-world tasks. Moreover, we identify a novel phenomenon, "pushcut", during RL training, wherein the policy discovers patterns never seen in the previous training process.