Poster Session #2 in Workshop on Scaling Post-training for LLMs (SPOT)
Mon, Apr 27, 2026 • 10:30 AM – 11:10 AM PDT

Sparse Attention for Efficient LLM Reinforcement Learning

Yang Zhou ⋅ Ranajoy Sadhukhan ⋅ Zhaofeng Sun ⋅ Zhuoming Chen ⋅ Souvik Kundu ⋅ Saket Dingliwal ⋅ Sai Muralidhar Jayanthi ⋅ Aram Galstyan ⋅ Haizhong Zheng ⋅ Beidi Chen

Abstract

Reinforcement learning (RL) is a key driver of recent progress in large language model reasoning, but its scalability is increasingly limited by the cost of online rollouts, especially for long chain-of-thought generation and large-batch sampling. Sparse attention is a promising way to reduce per-token attention cost and improve rollout throughput, yet we find that practical sparse rollouts often destabilize training: approximation errors bias likelihood estimates, causing large actor–policy distribution mismatch that compounds over long trajectories and can collapse training. We propose DISTILLSPARSE, a robust sparse-rollout framework that restores distribution alignment while preserving speed. DISTILLSPARSE co-trains a sparse rollout policy via lightweight, LoRA-based on-policy distillation from the dense policy to prevent mismatch from accumulating across RL iterations. For long generations and high sparsity, DISTILLSPARSE further oversamples rollout candidates and applies reward-aware filtering to focus updates on trajectories that are both high-quality and closer to the dense distribution. We evaluate on POLARIS across 4B–8B models and mathematical reasoning benchmarks including AIME24/25, AMC23, and Math500. Across settings where training-free sparse rollouts degrade or collapse, DISTILLSPARSE matches dense-rollout training performance while providing substantial practical acceleration, achieving a 1.72× rollout speedup on NVIDIA H200 at 16K generation length with minimal overhead.
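
The abstract does not state the exact filtering criterion, so the following is only a minimal sketch of what oversampling with reward-aware filtering could look like. It assumes each sparse rollout carries a scalar reward and per-token log-probabilities of the sampled tokens under both the sparse rollout policy and the dense policy. The names `Rollout`, `mean_logprob_gap`, `filter_rollouts`, and `gap_weight`, as well as the scoring rule (reward minus a weighted log-probability gap), are hypothetical illustrations and not the DISTILLSPARSE implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rollout:
    """One sampled trajectory (hypothetical container, not from the paper)."""
    reward: float
    sparse_logprobs: List[float]  # log-probs of sampled tokens under the sparse policy
    dense_logprobs: List[float]   # log-probs of the same tokens under the dense policy


def mean_logprob_gap(rollout: Rollout) -> float:
    """Mean absolute per-token log-prob difference, used as a simple mismatch proxy."""
    if len(rollout.sparse_logprobs) != len(rollout.dense_logprobs):
        raise ValueError("log-prob sequences must have equal length")
    if not rollout.sparse_logprobs:
        return 0.0
    total = sum(abs(s - d) for s, d in zip(rollout.sparse_logprobs, rollout.dense_logprobs))
    return total / len(rollout.sparse_logprobs)


def filter_rollouts(candidates: List[Rollout], keep: int, gap_weight: float = 1.0) -> List[Rollout]:
    """Keep the `keep` candidates with the highest (reward - gap_weight * gap).

    Assumed scoring rule: favor high reward and penalize divergence from the dense policy.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    ranked = sorted(
        candidates,
        key=lambda r: r.reward - gap_weight * mean_logprob_gap(r),
        reverse=True,
    )
    return ranked[:keep]


if __name__ == "__main__":
    # Oversample four candidates, then keep two for the policy update.
    candidates = [
        Rollout(1.0, [-0.25, -0.5], [-0.25, -0.5]),  # high reward, no mismatch
        Rollout(1.0, [-0.5], [-1.0]),                # high reward, moderate mismatch
        Rollout(0.0, [-0.5], [-0.5]),                # low reward, no mismatch
        Rollout(0.0, [-0.5], [-2.5]),                # low reward, large mismatch
    ]
    for kept in filter_rollouts(candidates, keep=2):
        print(kept.reward, mean_logprob_gap(kept))
```

In this sketch, `gap_weight` trades off reward against closeness to the dense distribution; setting it to 0 reduces the rule to plain reward-based selection.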
