
Workshop: Reincarnating Reinforcement Learning

MOTO: Offline to Online Fine-tuning for Model-Based Reinforcement Learning

Rafael Rafailov · Kyle Hatch · Victor Kolev · John Martin · Mariano Phielipp · Chelsea Finn


We study the problem of offline-to-online reinforcement learning from high-dimensional pixel observations. While recent model-free approaches successfully use offline pre-training with online fine-tuning to either improve the performance of the data-collection policy or adapt to novel tasks, model-based approaches remain underutilized in this setting. In this work, we argue that existing methods for high-dimensional model-based offline RL are not suitable for offline-to-online fine-tuning due to shifts in the learned representations, off-dynamics data, and non-stationary rewards. We propose a simple on-policy model-based method with adaptive behavior regularization. In our simulation experiments, we find that our approach successfully solves long-horizon robot manipulation tasks entirely from images by using a combination of offline data and online interactions.
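The abstract does not spell out how the adaptive behavior regularization works. As a generic illustration only (not MOTO's actual implementation), one common way to make a behavior-regularization coefficient adaptive is a dual-style update: penalize the policy's divergence from the behavior (data-collection) policy, and raise or lower the penalty weight depending on whether that divergence exceeds a target. All names and the target/learning-rate values below are hypothetical.

```python
import numpy as np

def kl_gaussian(mu_p, std_p, mu_q, std_q):
    # KL(p || q) between diagonal Gaussians; measures how far the
    # current policy p has drifted from the behavior policy q.
    return float(np.sum(
        np.log(std_q / std_p)
        + (std_p**2 + (mu_p - mu_q)**2) / (2.0 * std_q**2)
        - 0.5
    ))

def adapt_coefficient(alpha, kl, target_kl, lr=0.1):
    # Dual-style update in log space: increase the regularization
    # weight when the measured KL exceeds the target, decrease it
    # when the policy stays close to the behavior policy.
    log_alpha = np.log(alpha) + lr * (kl - target_kl)
    return float(np.exp(log_alpha))

# Toy loop: the policy mean drifts away from the behavior policy,
# so the regularization coefficient alpha grows in response.
alpha = 1.0
for step in range(5):
    mu_pi = np.array([0.2 * step])          # drifting policy mean
    kl = kl_gaussian(mu_pi, np.ones(1), np.zeros(1), np.ones(1))
    alpha = adapt_coefficient(alpha, kl, target_kl=0.05)
```

In a full method, `alpha` would weight a behavior-cloning or divergence penalty added to the policy's return objective; the adaptive update lets the constraint relax as online data accumulates and the policy is allowed to move further from the offline dataset.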
