Accelerating Diffusion Planners in Offline RL via Reward-Aware Consistency Trajectory Distillation
Xintong Duan · Yutong (Kelly) He · Fahim Tajwar · Russ Salakhutdinov · Zico Kolter · Jeff Schneider
Abstract
Although diffusion models have achieved strong results in decision-making tasks, their slow inference speed remains a key limitation. While consistency models offer a potential solution, existing applications to decision-making either struggle with suboptimal demonstrations under behavior cloning or rely on complex concurrent training of multiple networks under the actor-critic framework. In this work, we propose a novel approach to consistency distillation for offline reinforcement learning that directly incorporates reward optimization into the distillation process. Our method achieves single-step diffusion sampling while generating higher-reward action trajectories through decoupled training and noise-free reward guidance. Empirical evaluations on the Gym MuJoCo, FrankaKitchen, and long-horizon planning benchmarks demonstrate that our approach achieves a $9.7\%$ improvement over the previous state of the art while leveraging CTM (Kim et al., 2023) to offer up to a $142\times$ inference-time speedup over its diffusion counterparts.
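To make the stated idea concrete, the sketch below illustrates one way a reward term could be folded into a consistency-style distillation objective: a frozen diffusion `teacher` provides a one-step target, the `student` learns single-step predictions, and a learned `reward_model` scores the noise-free student output. All names (`student`, `teacher.ode_step`, `reward_model`, `sigma`, `lam`) and the specific weighting are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of a reward-aware consistency distillation loss.
# Assumes: `teacher` is a frozen pretrained diffusion planner with a one-step
# ODE solver, `student` maps (noisy trajectory, noise level) -> clean trajectory,
# and `reward_model` scores clean action trajectories. All hypothetical.
import torch
import torch.nn.functional as F

def reward_aware_distillation_loss(student, teacher, reward_model,
                                   traj, t, s, sigma, lam=0.1):
    """traj: clean offline trajectories (B, H, D); noise levels t > s."""
    noise = torch.randn_like(traj)
    x_t = traj + sigma(t) * noise                  # noised trajectory at level t

    # Teacher takes one ODE solver step from level t toward level s (frozen).
    with torch.no_grad():
        x_s = teacher.ode_step(x_t, t, s)

    # Consistency target: the student's own prediction from the less-noisy
    # point, with gradients stopped (standard consistency-distillation target).
    with torch.no_grad():
        target = student(x_s, s)

    pred = student(x_t, t)                         # single-step prediction from x_t
    consistency = F.mse_loss(pred, target)

    # Reward guidance on the noise-free student output: push distilled
    # single-step samples toward higher predicted returns.
    reward = reward_model(pred).mean()

    return consistency - lam * reward
```

In this sketch the reward term acts on the student's clean output rather than on noisy intermediates, which is one way to read the abstract's "noise-free reward guidance"; the actual method may differ in its loss form, guidance schedule, and training decoupling.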