Poster in Workshop: Deep Generative Model in Machine Learning: Theory, Principle and Efficacy
Efficient Consistency Model Training for Policy Distillation in Reinforcement Learning
Bowen Fang · Xuan Di
Keywords: [ Policy Gradient ] [ Probability Flow-ODE ] [ Consistency Model ] [ Reinforcement Learning ] [ Efficiency ]
This paper proposes an efficient consistency model (CM) training scheme for reinforcement learning (RL). Specifically, we leverage the Probability Flow ODE (PF-ODE) and introduce two novel loss functions that improve CM training for RL policy distillation. We propose Importance Weighting (IW) and Gumbel-Based Sampling (GBS) as strategies to refine policy learning under limited sampling budgets. Our approach enables efficient training by directly incorporating probability estimates, which reduces variance and improves sample efficiency. Numerical experiments demonstrate that our method outperforms conventional CM training, yielding more accurate policy representations from limited samples. These findings highlight the potential of CMs as an efficient alternative for policy optimization in RL.
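To make the idea concrete, below is a minimal PyTorch sketch of importance-weighted consistency distillation for a policy, together with a Gumbel-top-k sampler for drawing candidate actions under a limited budget. This is an illustrative sketch under assumptions, not the authors' implementation: the names ConsistencyPolicy, iw_consistency_loss, and gumbel_topk_sample, the network architecture, and the softmax weighting scheme are all hypothetical choices. The consistency model is trained so that points at two adjacent noise levels on the same PF-ODE trajectory map to the same clean action, with the per-sample loss reweighted by a probability estimate of the corresponding action.

```python
# Illustrative sketch only: the names, architecture, and exact weighting
# scheme below are assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConsistencyPolicy(nn.Module):
    """Consistency model over actions: maps a noised action at noise level
    sigma back toward the clean action, conditioned on the state."""

    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, noised_action, sigma):
        return self.net(torch.cat([state, noised_action, sigma], dim=-1))


def iw_consistency_loss(model, target_model, state, action, log_prob,
                        sigma_hi, sigma_lo):
    """Consistency-distillation step with importance weighting: outputs at
    two adjacent noise levels of the same PF-ODE trajectory are matched, and
    each sample's loss is reweighted by its policy probability estimate."""
    noise = torch.randn_like(action)
    a_hi = action + sigma_hi * noise          # point at the higher noise level
    a_lo = action + sigma_lo * noise          # point at the lower noise level
    s_hi = torch.full((action.size(0), 1), sigma_hi, device=action.device)
    s_lo = torch.full((action.size(0), 1), sigma_lo, device=action.device)

    pred_hi = model(state, a_hi, s_hi)
    with torch.no_grad():                     # frozen EMA / target network
        pred_lo = target_model(state, a_lo, s_lo)

    # Normalized importance weights from the policy's probability estimates.
    weights = torch.softmax(log_prob, dim=0).detach()
    per_sample = F.mse_loss(pred_hi, pred_lo, reduction="none").mean(dim=-1)
    return (weights * per_sample).sum()


def gumbel_topk_sample(log_prob, k):
    """Gumbel-top-k trick: select k distinct candidates with probability
    proportional to their estimated likelihood, under a limited sample budget."""
    gumbel = -torch.log(-torch.log(torch.rand_like(log_prob)))
    return torch.topk(log_prob + gumbel, k).indices
```

In this sketch the importance weights let high-probability actions dominate the distillation loss, which is one plausible way to exploit probability estimates for variance reduction under a small sampling budget; the Gumbel-top-k step is a standard trick for drawing distinct candidates in proportion to their estimated probabilities.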