AMRL: Aggregated Memory For Reinforcement Learning

Jacob Beck; Kamil Ciosek; Sam Devlin; Sebastian Tschiatschek; Cheng Zhang; Katja Hofmann

Abstract: In many partially observable scenarios, Reinforcement Learning (RL) agents must rely on long-term memory in order to learn an optimal policy. We demonstrate that using techniques from NLP and supervised learning fails at RL tasks due to stochasticity from the environment and from exploration. Utilizing our insights on the limitations of traditional memory methods in RL, we propose AMRL, a class of models that can learn better policies with greater sample efficiency and are resilient to noisy inputs. Specifically, our models use a standard memory module to summarize short-term context, and then aggregate all prior states from the standard model without respect to order. We show that this provides advantages both in terms of gradient decay and signal-to-noise ratio over time. Evaluating in Minecraft and maze environments that test long-term memory, we find that our model improves average return by 19% over a baseline that has the same number of parameters and by 9% over a stronger baseline that has far more parameters.

AMRL: Aggregated Memory For Reinforcement Learning

Jacob Beck, Kamil Ciosek, Sam Devlin, Sebastian Tschiatschek, Cheng Zhang, Katja Hofmann

Similar Papers

Variational Recurrent Models for Solving Partially Observable Control Tasks

Dongqi Han, Kenji Doya, Jun Tani,

SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards

Siddharth Reddy, Anca D. Dragan, Sergey Levine,

Making Sense of Reinforcement Learning and Probabilistic Inference

Brendan O'Donoghue, Ian Osband, Catalin Ionescu,