

Poster
in
Workshop: SCOPE: SCALABLE OPTIMIZATION FOR EFFICIENT AND ADAPTIVE FOUNDATION MODELS

Yes, Q-learning Helps Offline In-Context RL

Denis Tarasov · Alexander Nikulin · Ilya Zisman · Albina Klepach · Andrei Polubarov · Nikita Lyubaykin · Alexander Derevyagin · Igor Kiselev · Vladislav Kurenkov

Keywords: [ offline reinforcement learning ] [ in-context reinforcement learning ] [ reinforcement learning ]


Abstract:

In this preliminary work, we explore the integration of Reinforcement Learning (RL) approaches within a scalable offline In-Context RL (ICRL) framework. To the best of our knowledge, this is the first study to explicitly optimize the RL objective in an offline ICRL setting using a scalable Transformer architecture. Through experiments across 96 datasets derived from GridWorld-based environments, we demonstrate that optimizing RL objectives improves performance by approximately 30% on average compared to the powerful Algorithm Distillation (AD) baseline. Our results reveal that RL-based methods, particularly those from the offline RL family, outperform approaches such as DQN, which is not specifically designed for offline scenarios, across various dataset coverages, expertise levels, and environmental complexities. These findings underscore the importance of aligning the learning objectives with RL's reward-maximization goal and suggest promising directions for applying offline RL in ICRL settings.
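The abstract describes attaching a reward-maximizing, Q-learning-style objective to a causal Transformer that conditions on in-context transition histories, rather than training it purely by supervised action prediction as in Algorithm Distillation. The sketch below illustrates one plausible way such a setup could look; the tokenization, module names (`InContextQNetwork`, `td_loss`), network sizes, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: a causal Transformer over in-context (state, prev-action,
# prev-reward) tokens with a Q-value head, trained with a one-step TD loss.
# All names, shapes, and hyperparameters are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class InContextQNetwork(nn.Module):
    """Causal Transformer mapping an in-context history to per-step Q-values."""

    def __init__(self, state_dim: int, num_actions: int, d_model: int = 128,
                 n_layers: int = 4, n_heads: int = 4, max_len: int = 256):
        super().__init__()
        # Token at step t encodes (s_t, one-hot a_{t-1}, r_{t-1}), AD-style.
        self.token_embed = nn.Linear(state_dim + num_actions + 1, d_model)
        self.pos_embed = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.q_head = nn.Linear(d_model, num_actions)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len, state_dim + num_actions + 1)
        seq_len = tokens.shape[1]
        pos = torch.arange(seq_len, device=tokens.device)
        x = self.token_embed(tokens) + self.pos_embed(pos)
        # Causal mask: each position attends only to earlier context.
        mask = nn.Transformer.generate_square_subsequent_mask(seq_len).to(tokens.device)
        h = self.encoder(x, mask=mask)
        return self.q_head(h)  # (batch, seq_len, num_actions): Q(s_t, .) per step


def td_loss(q_net: InContextQNetwork, target_net: InContextQNetwork,
            tokens: torch.Tensor, actions: torch.Tensor,
            rewards: torch.Tensor, dones: torch.Tensor,
            gamma: float = 0.99) -> torch.Tensor:
    """One-step TD error at every position of the in-context sequence.

    actions/rewards/dones hold a_t, r_t, done_t for the state at position t;
    dones is a float tensor in {0, 1}.
    """
    q_all = q_net(tokens)                                          # (B, T, A)
    q_taken = q_all.gather(-1, actions.unsqueeze(-1)).squeeze(-1)  # (B, T)
    with torch.no_grad():
        # Bootstrap from the next position's Q-values under a target network.
        q_next = target_net(tokens).max(dim=-1).values             # (B, T)
        q_next = torch.roll(q_next, shifts=-1, dims=1)
        q_next[:, -1] = 0.0                                        # no bootstrap past the context
        target = rewards + gamma * (1.0 - dones) * q_next
    return F.mse_loss(q_taken, target)
```

Under this framing, an AD-style baseline would instead train the same backbone with a cross-entropy loss on the next action, while the offline-RL variants the abstract refers to would add reward-driven terms like the TD loss above, possibly with conservative regularization to keep value estimates within the dataset's action support.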
