
Workshop: Reincarnating Reinforcement Learning

Bootstrapped Representations in Reinforcement Learning

Charline Le Lan · Stephen Tu · Mark Rowland · Anna Harutyunyan · Rishabh Agarwal · Marc G Bellemare · Will Dabney


In reinforcement learning (RL), state representations are key to dealing with large or continuous state spaces. While one of the promises of deep learning algorithms is to automatically construct features well tuned to the task they try to solve, such a representation might not emerge from end-to-end training of deep RL agents. To mitigate this issue, representations are often learnt from auxiliary tasks on offline datasets during an unsupervised pre-training phase, in order to improve the sample efficiency of deep RL agents in a subsequent online phase. Bootstrapping methods are today's method of choice for making these auxiliary predictions, but it is unclear which features they learn. In this paper, we address this gap and provide a theoretical characterization of the representation learnt by pre-training with temporal difference learning (Sutton, 1988). Surprisingly, we find that, for most transition structures of the environment, this representation differs from the features learned by pre-training with Monte Carlo and residual gradient algorithms. We characterize how well these pre-trained representations can linearly predict the value function of any downstream reward function, and use our theoretical analysis to design new unsupervised pre-training rules. We complement our theoretical results with an empirical comparison of these pre-trained representations for different cumulant functions on the four-room (Sutton et al., 1999) and Mountain Car (Moore, 1990) domains, and demonstrate that they speed up online learning.
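
To make the three pre-training rules named above concrete, here is a minimal sketch. It is not the paper's experimental setup: it assumes a small tabular MDP with a known transition matrix P, random cumulants C (one column per auxiliary prediction), full-batch expected updates, and a linear head W on top of learned features Phi. The rules differ only in how they treat the target: Monte Carlo regresses onto the true values (I - gamma P)^{-1} C, TD holds the bootstrapped target C + gamma P Phi W fixed (semi-gradient), and residual gradient differentiates through it. `value_prediction_error` then measures how well span(Phi) linearly predicts the value of an unseen reward. All function names (`random_mdp`, `mc_grads`, `td_grads`, `rg_grads`, `pretrain`, `value_prediction_error`, `subspace_cosines`) are hypothetical.

```python
import numpy as np


def random_mdp(n_states, seed=0):
    """Hypothetical helper: a random row-stochastic transition matrix P (n x n)."""
    rng = np.random.default_rng(seed)
    P = rng.random((n_states, n_states))
    return P / P.sum(axis=1, keepdims=True)


# Each rule returns (grad_Phi, grad_W) for features Phi (n x k) and a linear
# head W (k x m) predicting the values of m cumulants C (n x m).

def mc_grads(Phi, W, P, C, gamma):
    """Monte Carlo: regress Phi @ W onto the true values V = (I - gamma P)^{-1} C."""
    V = np.linalg.solve(np.eye(P.shape[0]) - gamma * P, C)
    delta = Phi @ W - V
    return delta @ W.T, Phi.T @ delta


def td_grads(Phi, W, P, C, gamma):
    """TD (semi-gradient): the bootstrapped target C + gamma P Phi W is held fixed."""
    target = C + gamma * P @ Phi @ W  # no gradient flows through the target
    delta = Phi @ W - target
    return delta @ W.T, Phi.T @ delta


def rg_grads(Phi, W, P, C, gamma):
    """Residual gradient: also differentiate through the bootstrapped target."""
    A = np.eye(P.shape[0]) - gamma * P
    delta = A @ Phi @ W - C  # Bellman residual
    return A.T @ delta @ W.T, Phi.T @ A.T @ delta


def pretrain(grad_fn, P, C, k, gamma=0.9, lr=0.01, steps=5000, seed=0):
    """Full-batch (expected) updates from a small random initialisation."""
    rng = np.random.default_rng(seed)
    n, m = C.shape
    Phi = 0.1 * rng.standard_normal((n, k))
    W = 0.1 * rng.standard_normal((k, m))
    for _ in range(steps):
        g_phi, g_w = grad_fn(Phi, W, P, C, gamma)
        Phi -= lr * g_phi
        W -= lr * g_w
    return Phi, W


def value_prediction_error(Phi, P, r, gamma=0.9):
    """Relative error of the best linear fit, in span(Phi), to the value of reward r."""
    v = np.linalg.solve(np.eye(P.shape[0]) - gamma * P, r)
    w, *_ = np.linalg.lstsq(Phi, v, rcond=None)
    return np.linalg.norm(Phi @ w - v) / np.linalg.norm(v)


def subspace_cosines(A, B):
    """Cosines of the principal angles between span(A) and span(B)."""
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    return np.linalg.svd(Qa.T @ Qb, compute_uv=False)


if __name__ == "__main__":
    n, k, m = 8, 3, 32
    P = random_mdp(n, seed=0)
    C = np.random.default_rng(1).standard_normal((n, m)) / np.sqrt(m)
    rules = {"MC": mc_grads, "TD": td_grads, "RG": rg_grads}
    feats = {name: pretrain(fn, P, C, k)[0] for name, fn in rules.items()}
    r = np.random.default_rng(2).standard_normal(n)  # an unseen downstream reward
    for name, Phi in feats.items():
        print(name, "value error:", round(value_prediction_error(Phi, P, r), 3))
    print("cosines(MC, TD):", subspace_cosines(feats["MC"], feats["TD"]))
    print("cosines(MC, RG):", subspace_cosines(feats["MC"], feats["RG"]))
```

Replacing `random_mdp` with a structured transition matrix (for instance a small grid world) is one way to probe how the three rules' feature subspaces relate; principal-angle cosines near 1 indicate nearly identical spans.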
