Virtual Poster Presentation / Poster Accept
Optimistic Exploration with Learned Features Provably Solves Markov Decision Processes with Neural Dynamics
Sirui Zheng · Lingxiao Wang · Shuang Qiu · Zuyue Fu · Zhuoran Yang · Csaba Szepesvari · Zhaoran Wang
Keywords: [ neural network ] [ representation learning ] [ reinforcement learning ]
Building on recent advances in deep learning, deep reinforcement learning (DRL) has achieved tremendous empirical success. However, analyzing DRL remains challenging due to the complexity of the neural network function class. In this paper, we address this challenge by analyzing the Markov decision process (MDP) with neural dynamics, which covers several existing models as special cases, including the kernelized nonlinear regulator (KNR) model and the linear MDP. We propose a novel algorithm that designs exploration incentives via learnable representations of the dynamics model, obtained by embedding the neural dynamics into a kernel space induced by the system noise. We further establish an upper bound on the sample complexity of the algorithm, demonstrating its sample efficiency. We highlight that, unlike previous analyses of RL algorithms with function approximation, our sample complexity bound does not depend on the Eluder dimension of the neural network class, which is known to be exponentially large (Dong et al., 2021).
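To make the idea of "exploration incentives via learned representations" concrete, below is a minimal sketch of the generic optimism-via-feature-covariance pattern that this line of work builds on: an elliptical (UCB-style) bonus computed from a feature map of state-action pairs. This is not the paper's algorithm; the random Fourier feature map stands in for the learned kernel embedding of the dynamics, and the dimensions, regularizer `lam`, and bonus scale `beta` are illustrative assumptions.

```python
import numpy as np

def random_fourier_features(sa, W, b):
    """Map a state-action vector to features; a random Fourier feature map
    stands in here for the learned representation of the dynamics model."""
    return np.sqrt(2.0 / W.shape[1]) * np.cos(sa @ W + b)

# Hypothetical dimensions: 4-dim state-action input, 64-dim feature space.
rng = np.random.default_rng(0)
d_in, d_feat = 4, 64
W = rng.normal(size=(d_in, d_feat))
b = rng.uniform(0.0, 2.0 * np.pi, size=d_feat)

lam, beta = 1.0, 1.0            # regularizer and bonus scale (assumed values)
Lambda = lam * np.eye(d_feat)   # regularized feature covariance of visited data

def exploration_bonus(sa):
    """Elliptical bonus beta * sqrt(phi^T Lambda^{-1} phi): large for
    state-action pairs whose features are poorly covered by past data."""
    phi = random_fourier_features(sa, W, b)
    return beta * np.sqrt(phi @ np.linalg.solve(Lambda, phi))

def update_covariance(sa):
    """After visiting (s, a), add its feature outer product to Lambda."""
    global Lambda
    phi = random_fourier_features(sa, W, b)
    Lambda += np.outer(phi, phi)

# Usage: the bonus shrinks on state-action pairs that are visited repeatedly,
# so an optimistic agent is steered toward under-explored regions.
sa = rng.normal(size=d_in)
print("bonus before:", exploration_bonus(sa))
for _ in range(50):
    update_covariance(sa)
print("bonus after :", exploration_bonus(sa))
```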