
Workshop: Reincarnating Reinforcement Learning

Beyond Temporal Credit Assignment in Reinforcement Learning

Sephora Madjiheurem · Kimberly Stachenfeld · Peter Battaglia · Jessica Hamrick


In reinforcement learning, traditional value-based methods rely heavily on time as the main proxy for propagating information across the state space. This often results in slow learning and does not scale to large and complex environments. Here, we propose to leverage prior information about the structure of the environment to assign credit non-temporally and thereby improve learning efficiency. Specifically, we introduce the concept of structural neighbours: sets of states that share similar semantic structure and have equivalent values under the optimal policy. We augment traditional value-based RL methods (TD(0), Dyna and Dueling DQN) with a learning mechanism based on structural neighbours. Our empirical results show that incorporating structural updates can greatly improve learning efficiency across a variety of environments, ranging from simple tabular grid worlds to those that require function approximation, including the complex and high-dimensional game of Solitaire.
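To make the idea of a structural update concrete, the following is a minimal tabular sketch, not the authors' implementation. It assumes that structural neighbours are supplied as a precomputed mapping from each state to a set of states, and that a structural update simply applies the same TD(0) target to every neighbour of the visited state. The function name `td0_structural_update`, the `neighbours` mapping, and the choice to reuse the step size `alpha` for neighbours are all hypothetical choices made for illustration; the abstract does not specify the exact update rule.

```python
from collections import defaultdict


def td0_structural_update(V, s, r, s_next, neighbours,
                          alpha=0.1, gamma=0.99, done=False):
    """One TD(0) step on state s, followed by an assumed structural update.

    V          : dict-like table of state values
    neighbours : hypothetical mapping state -> iterable of structural neighbours
    """
    # Standard TD(0) target; no bootstrapping from a terminal transition.
    target = r + (0.0 if done else gamma * V[s_next])

    # Temporal update: move the visited state's value towards the target.
    V[s] += alpha * (target - V[s])

    # Structural update (assumption): neighbours are taken to share the
    # visited state's optimal value, so they are moved towards the same target.
    for n in neighbours.get(s, ()):
        if n != s:
            V[n] += alpha * (target - V[n])
    return V


# Usage: states "a" and "b" are declared structural neighbours by hand.
V = defaultdict(float)
neighbours = {"a": {"b"}, "b": {"a"}}
td0_structural_update(V, "a", r=1.0, s_next="terminal",
                      neighbours=neighbours, alpha=0.5, done=True)
```

In this toy example, state "b" receives credit from a transition it never experienced, which is the non-temporal propagation the abstract describes. Whether neighbours should use the same step size or a discounted one is a design choice left open here.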
