Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
Abstract
Parallel thinking has emerged as a novel approach for enhancing the reasoning capabilities of large language models (LLMs) by exploring multiple reasoning paths concurrently. However, activating such capabilities through training remains challenging. Existing methods mainly rely on supervised fine-tuning (SFT) over synthetic data, which encourages imitation of teacher-forced trajectories rather than exploration and generalization. To address this issue, we propose Parallel-R1, the first reinforcement learning (RL) framework that instills parallel thinking for complex real-world reasoning tasks. Our framework employs a progressive curriculum that addresses the cold-start problem in training parallel thinking with RL. We first use SFT on prompt-generated trajectories from easier tasks to instill the parallel-thinking behavior, then transition to RL to explore and generalize this skill on harder problems. Experiments on various math benchmarks, including MATH, AMC23, and AIME, show that Parallel-R1 successfully elicits parallel thinking, leading to an 8.4% accuracy improvement over a sequential-thinking model trained directly on difficult tasks with RL. Further analysis reveals a distinct shift in the model's thinking patterns: in the early stage of training, it uses parallel thinking as an exploration strategy, while in the later stage, it uses the same ability for multi-perspective verification. Most significantly, we validate parallel thinking as a mid-training exploration scaffold, where this intermediate phase unlocks a higher performance ceiling after RL, yielding a 42.9% improvement over the sequential RL baseline.
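To make the progressive curriculum concrete, the following is a minimal sketch of how such a two-stage SFT-then-RL pipeline could be wired together. All names here are illustrative assumptions rather than the paper's implementation: the data loaders, the update functions (sft_update, policy_gradient_update), the toy tag format for parallel trajectories, the outcome-based reward check, and the hyperparameters (sft_epochs, rl_steps, group_size) are placeholders, and the model is assumed to expose a generate method.

# Hypothetical sketch of the two-stage curriculum; placeholder helpers and
# toy data for illustration only, not the paper's released code.

def load_easy_parallel_trajectories():
    # Placeholder: (prompt, prompt-generated parallel-thinking trajectory) pairs
    # from easier tasks, using an assumed <Parallel>/<Path> tag format.
    return [("2 + 3 = ?",
             "<Parallel><Path>2 + 3 = 5</Path><Path>3 + 2 = 5</Path></Parallel> Answer: 5")]

def load_hard_problems():
    # Placeholder: (prompt, gold answer) pairs from harder benchmarks.
    return [("12 * 13 = ?", "156")]

def sft_update(model, prompt, trajectory):
    # Placeholder for a teacher-forced cross-entropy update on the trajectory.
    pass

def policy_gradient_update(model, prompt, rollouts, rewards):
    # Placeholder for a policy-gradient update over the sampled rollouts.
    pass

def is_correct(rollout, gold):
    # Outcome-based reward: does the rollout end with the gold answer?
    return rollout.strip().endswith(gold)

def train_parallel_r1(model, sft_epochs=2, rl_steps=1000, group_size=8):
    # Stage 1: SFT cold start on easier tasks to instill the parallel format.
    for _ in range(sft_epochs):
        for prompt, trajectory in load_easy_parallel_trajectories():
            sft_update(model, prompt, trajectory)

    # Stage 2: RL on harder problems to explore and generalize the skill.
    hard = load_hard_problems()
    for step in range(rl_steps):
        prompt, gold = hard[step % len(hard)]
        rollouts = [model.generate(prompt) for _ in range(group_size)]
        rewards = [1.0 if is_correct(r, gold) else 0.0 for r in rollouts]
        policy_gradient_update(model, prompt, rollouts, rewards)
    return model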