Spotlight
in
Workshop: Reincarnating Reinforcement Learning
Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning
Mitsuhiko Nakamoto · Yuexiang Zhai · Anikait Singh · Yi Ma · Chelsea Finn · Aviral Kumar · Sergey Levine
A compelling use case of offline reinforcement learning (RL) is to obtain an effective policy initialization from existing datasets, which allows efficient fine-tuning with limited amounts of active online interaction in the environment. Many existing offline RL methods tend to exhibit poor fine-tuning performance. On the contrary, while naive online RL methods achieve compelling empirical performance, online methods suffer from a large sample complexity without a good policy initialization from the offline data. Our goal in this paper is to devise an approach for learning an effective offline initialization that also unlocks fast online fine-tuning capabilities. Our approach, calibrated Q-learning (Cal-QL) accomplishes this by learning a conservative value function initialization that underestimates the value of the learned policy from offline data, while also being calibrated, meaning that the learned value estimation still upper-bounds the ground-truth value of some other reference policy (e.g., the behavior policy). Both theoretically and empirically, we show that imposing these conditions speeds up online fine-tuning, and brings in benefits of the offline data. In practice, Cal-QL can be implemented on top of existing offline RL methods without any extra hyperparameter tuning. Empirically, Cal-QL outperforms state-of-the-art methods on a wide range of fine-tuning tasks from both state and visual observations, across several benchmarks.