Workshop: Generalizable Policy Learning in the Physical World

A Study of Off-Policy Learning in Environments with Procedural Content Generation

Andrew Ehrenberg · Robert Kirk · Minqi Jiang · Edward Grefenstette · Tim Rocktaeschel


Environments with procedural content generation (PCG environments) are useful for assessing the generalization capacity of Reinforcement Learning (RL) agents. A growing body of work focuses on generalization in RL in PCG environments, with many methods built on top of on-policy algorithms. Off-policy methods, by contrast, have received less attention. Motivated by this discrepancy, we examine how Deep Q-Networks (DQN; Mnih et al., 2013) perform on the Procgen benchmark (Cobbe et al., 2020) and study how various additions to DQN affect performance. We find that some popular techniques that have improved DQN on benchmarks like the Arcade Learning Environment (ALE; Bellemare et al., 2015) do not carry over to Procgen, implying that some research has overfit to tasks that lack diversity and fails to consider the importance of generalization.
