Poster in Affinity Workshop: Blog Track Session 5
Behavioral Differences in Mode-Switching Exploration for Reinforcement Learning
Loren Anderson · Nathan Bittner
Halle B #2
The exploration versus exploitation dilemma remains a fundamental challenge of reinforcement learning (RL): an agent must exploit its current knowledge of the environment to accrue large returns while also exploring the environment to discover where those returns lie. The vast majority of deep RL (DRL) algorithms manage this dilemma with a monolithic behavior policy that interleaves exploration actions randomly among the far more frequent exploitation actions. In 2022, researchers from Google DeepMind presented an initial study on mode-switching exploration, in which an agent separates its exploitation and exploration actions more coarsely within an episode by intermittently and substantially changing its behavior policy. The study was partly motivated by the exploration strategies of humans and animals, which exhibit similar behavior, and it showed that mode-switching policies outperform monolithic policies when trained on hard-exploration Atari games. In this blog post, we supplement that work by showcasing observed behavioral differences between mode-switching and monolithic exploration on the Atari suite and by presenting illustrative examples of the benefits of mode-switching. This work aids practitioners and researchers by providing practical guidance and by motivating future research directions in mode-switching exploration.
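To make the contrast concrete, below is a minimal sketch in Python of the two behavior-policy styles described above: a monolithic epsilon-greedy policy that scatters single random actions among greedy ones, versus a mode-switching policy that occasionally flips into a prolonged exploration mode. The switching rule and all hyperparameter values here are illustrative assumptions, not the schemes or settings used in the original study.

```python
import numpy as np

rng = np.random.default_rng(0)

def monolithic_action(q_values, epsilon=0.01):
    """Monolithic epsilon-greedy: exploration actions are interleaved
    uniformly at random among the (far more frequent) greedy actions."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

class ModeSwitchingPolicy:
    """Coarse mode-switching: the agent occasionally flips from an
    'exploit' mode into an 'explore' mode and stays there for an
    extended stretch of steps before switching back.

    The switch probability and explore-mode duration are illustrative
    placeholders, not the values used in the original study.
    """

    def __init__(self, switch_prob=0.005, explore_steps=20):
        self.switch_prob = switch_prob
        self.explore_steps = explore_steps
        self.steps_left_in_explore = 0  # 0 means exploit mode

    def action(self, q_values):
        if self.steps_left_in_explore > 0:
            # Explore mode: act uniformly at random until the mode expires.
            self.steps_left_in_explore -= 1
            return int(rng.integers(len(q_values)))
        if rng.random() < self.switch_prob:
            # Intermittently switch into a prolonged exploration mode.
            self.steps_left_in_explore = self.explore_steps - 1
            return int(rng.integers(len(q_values)))
        # Exploit mode: act greedily with respect to the Q-values.
        return int(np.argmax(q_values))
```

Under this sketch, both policies take roughly the same fraction of exploratory actions over a long episode, but the mode-switching policy clusters them into contiguous stretches rather than scattering them, which is the coarse temporal separation the blog post examines.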