Poster in Workshop: SCOPE: Scalable Optimization for Efficient and Adaptive Foundation Models
Yes, Q-learning Helps Offline In-Context RL
Denis Tarasov · Alexander Nikulin · Ilya Zisman · Albina Klepach · Andrei Polubarov · Lyubaykin Nikita · Alexander Derevyagin · Igor Kiselev · Vladislav Kurenkov
Keywords: [ offline reinforcement learning ] [ in-context reinforcement learning ] [ reinforcement learning ]
In this preliminary work, we explore the integration of Reinforcement Learning (RL) approaches within a scalable offline In-Context RL (ICRL) framework. To the best of our knowledge, this is the first study to explicitly optimize the RL objective in an offline ICRL setting using a scalable Transformer architecture. Through experiments on 96 datasets derived from GridWorld-based environments, we demonstrate that optimizing RL objectives improves performance by approximately 30% on average over the strong Algorithm Distillation (AD) baseline. Our results reveal that RL-based methods, particularly those from the offline RL family, outperform approaches not specifically designed for offline learning, such as DQN, across a range of dataset coverages, expertise levels, and environment complexities. These findings underscore the importance of aligning the learning objective with RL's reward-maximization goal and suggest promising directions for applying offline RL in ICRL settings.
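To make the contrast between the two training objectives concrete, the sketch below (not the authors' implementation; all module names, architecture choices, and hyperparameters are illustrative assumptions) shows a causal Transformer that produces per-step Q-values over a context of (observation, action, reward) transitions, trained either with Algorithm Distillation's behavior-cloning loss on the logged actions or with an offline one-step TD (Q-learning-style) loss along the same context.

```python
# Minimal sketch contrasting an Algorithm Distillation (behavior cloning)
# objective with an offline Q-learning-style TD objective on the same
# in-context transformer outputs. Architecture and hyperparameters are
# illustrative assumptions, not the paper's configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class InContextQTransformer(nn.Module):
    """Causal transformer mapping a context of (obs, action, reward) tokens
    to per-step Q-values over a discrete action space."""

    def __init__(self, obs_dim: int, num_actions: int, d_model: int = 64,
                 n_layers: int = 2, n_heads: int = 4):
        super().__init__()
        self.embed = nn.Linear(obs_dim + num_actions + 1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.q_head = nn.Linear(d_model, num_actions)
        self.num_actions = num_actions

    def forward(self, obs, actions, rewards):
        # obs: (B, T, obs_dim), actions: (B, T) int64, rewards: (B, T)
        a_onehot = F.one_hot(actions, self.num_actions).float()
        tokens = torch.cat([obs, a_onehot, rewards.unsqueeze(-1)], dim=-1)
        x = self.embed(tokens)
        T = x.size(1)
        # Causal mask so each step only attends to earlier context.
        causal_mask = torch.triu(
            torch.full((T, T), float("-inf"), device=x.device), diagonal=1
        )
        h = self.encoder(x, mask=causal_mask)
        return self.q_head(h)  # (B, T, num_actions)


def ad_loss(q_values, actions):
    """Algorithm Distillation: cross-entropy on logged actions (behavior cloning)."""
    return F.cross_entropy(q_values.reshape(-1, q_values.size(-1)),
                           actions.reshape(-1))


def td_loss(q_values, actions, rewards, dones, gamma: float = 0.99):
    """Offline Q-learning-style objective: one-step TD targets along the context."""
    q_taken = q_values[:, :-1].gather(-1, actions[:, :-1].unsqueeze(-1)).squeeze(-1)
    with torch.no_grad():
        next_q = q_values[:, 1:].max(dim=-1).values
        target = rewards[:, :-1] + gamma * (1.0 - dones[:, :-1]) * next_q
    return F.mse_loss(q_taken, target)


if __name__ == "__main__":
    B, T, obs_dim, num_actions = 4, 32, 8, 5
    model = InContextQTransformer(obs_dim, num_actions)
    obs = torch.randn(B, T, obs_dim)
    actions = torch.randint(0, num_actions, (B, T))
    rewards = torch.randn(B, T)
    dones = torch.zeros(B, T)
    q = model(obs, actions, rewards)
    print("AD loss:", ad_loss(q, actions).item())
    print("TD loss:", td_loss(q, actions, rewards, dones).item())
```

In practice, an offline ICRL method would typically combine the TD term with a mechanism that keeps the learned policy close to the data (as offline RL algorithms do), which is one reason purely online objectives such as vanilla DQN can struggle in this setting.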