Keywords: [ deep reinforcement learning ] [ exploration ]
Goal-oriented Reinforcement Learning (GoRL) is a promising approach for scaling up RL techniques on sparse reward environments requiring long horizon planning. Recent works attempt to build suitable abstraction graph of the environment and enhance GoRL with classical graphical methods such as shortest path searching; however, these approaches mainly focus on either graph construction or agent exploitation, but leave the exploration lack of study. This paper proposes Graph-enhanced GoRL (G2RL), a new GoRL framework for effective exploration and efficient training based on the state-transition graph. We first introduce the optimal goals for exploration on the graph and then use them as supervised signals to train the goal generator in G2RL in a hindsight manner. Furthermore, we define relevant trajectories of a state based on its graph neighborhood and show that giving high priority to these trajectories would lead to an efficient policy learning. In addition to the theoretical results regarding optimal goal generation, our empirical results on standard discrete and continuous control benchmarks show that leveraging the state-transition graph is beneficial for GoRL to learn an effective and informative exploration strategy and outperform the state-of-the-art methods.