Poster
in
Workshop: Bridging the Gap Between Practice and Theory in Deep Learning
Provably Efficient Maximum Entropy Pure Exploration in Reinforcement Learning
Hongyi Guo · Yufeng Zhang · Zhuoran Yang · Zhaoran Wang
Exploration plays a critical role in reinforcement learning (RL): insufficient exploration can lead to sub-optimal policies or slow convergence. The exploration ability of an agent is naturally reflected in the state visitation measure induced by its policy. In this paper, we study how to find a policy that induces a state visitation measure with maximum entropy. The problem is challenging mainly because the maximum-entropy objective cannot be optimized directly. To tackle this challenge, we propose a novel algorithm that optimizes the Fenchel dual of the objective instead of the objective itself. In contrast to many heuristically designed exploration methods, ours comes with complete theoretical guarantees: we establish the global optimality and convergence rate of our algorithm with neural networks. Furthermore, our algorithm extends naturally to multi-agent settings, and we demonstrate empirically that it substantially improves the entropy of the induced joint state visitation measure.
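For intuition, here is a minimal sketch of the kind of Fenchel-dual reformulation the abstract refers to; the notation (state visitation measure $d_\pi$ over a finite state space $\mathcal{S}$, dual variable $y$) is our own assumption and not taken from the paper itself:

$$
\max_{\pi} \; H(d_\pi)
\;=\; \max_{\pi} \; \min_{y \in \mathbb{R}^{|\mathcal{S}|}}
\Big[ \log \sum_{s \in \mathcal{S}} e^{y(s)} \;-\; \mathbb{E}_{s \sim d_\pi}\big[\, y(s) \,\big] \Big],
\qquad
H(d_\pi) \;=\; -\sum_{s \in \mathcal{S}} d_\pi(s) \log d_\pi(s).
$$

The dual form replaces the intractable $\log d_\pi(s)$ term with an expectation under $d_\pi$ that can be estimated from sampled trajectories, so the policy and the dual variable can be updated alternately.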