

Virtual Poster presentation / poster accept

Optimistic Exploration with Learned Features Provably Solves Markov Decision Processes with Neural Dynamics

Sirui Zheng · Lingxiao Wang · Shuang Qiu · Zuyue Fu · Zhuoran Yang · Csaba Szepesvari · Zhaoran Wang

Keywords: [ neural network ] [ representation learning ] [ reinforcement learning ]


Abstract:

Building on recent advances in deep learning, deep reinforcement learning (DRL) has achieved tremendous empirical success. However, analyzing DRL remains challenging due to the complexity of the neural network function class. In this paper, we address this challenge by analyzing Markov decision processes (MDPs) with neural dynamics, a model that covers several existing settings as special cases, including the kernelized nonlinear regulator (KNR) model and the linear MDP. We propose a novel algorithm that designs exploration incentives via learnable representations of the dynamics model, obtained by embedding the neural dynamics into a kernel space induced by the system noise. We further establish an upper bound on the sample complexity of the algorithm, which demonstrates its sample efficiency. We highlight that, unlike previous analyses of RL algorithms with function approximation, our sample-complexity bound does not depend on the Eluder dimension of the neural network class, which is known to be exponentially large (Dong et al., 2021).
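As a rough illustration of the general principle behind optimism with learned features (not the paper's exact construction), the sketch below computes an elliptical, UCB-style exploration bonus from a learned representation of the dynamics: state-action pairs whose features are poorly covered by past data receive larger bonuses. All names here (`LearnedFeatureMap`, `beta`, `feature_dim`) are hypothetical placeholders, and the feature map is a fixed random projection standing in for a trained neural representation.

```python
# Illustrative sketch of an optimism-based exploration bonus built from
# learned features; NOT the authors' exact algorithm.
import numpy as np


class LearnedFeatureMap:
    """Stand-in for a representation phi(s, a) learned from the dynamics model."""

    def __init__(self, feature_dim: int):
        self.feature_dim = feature_dim
        # In practice phi would come from a trained neural network; here we use
        # a fixed random projection purely for illustration.
        rng = np.random.default_rng(0)
        self.proj = rng.normal(size=(feature_dim, feature_dim))

    def __call__(self, state_action: np.ndarray) -> np.ndarray:
        return np.tanh(self.proj @ state_action)


class OptimisticBonus:
    """Elliptical bonus b(s, a) = beta * sqrt(phi^T Lambda^{-1} phi)."""

    def __init__(self, phi: LearnedFeatureMap, beta: float = 1.0, reg: float = 1.0):
        self.phi = phi
        self.beta = beta
        # Regularized empirical covariance of observed features.
        self.cov = reg * np.eye(phi.feature_dim)

    def update(self, state_action: np.ndarray) -> None:
        f = self.phi(state_action)
        self.cov += np.outer(f, f)

    def bonus(self, state_action: np.ndarray) -> float:
        f = self.phi(state_action)
        return self.beta * float(np.sqrt(f @ np.linalg.solve(self.cov, f)))


if __name__ == "__main__":
    phi = LearnedFeatureMap(feature_dim=4)
    bonus = OptimisticBonus(phi, beta=0.5)
    sa_seen = np.array([1.0, 0.0, 0.0, 0.0])
    sa_new = np.array([0.0, 0.0, 0.0, 1.0])
    for _ in range(50):
        bonus.update(sa_seen)      # repeatedly visit one state-action pair
    print(bonus.bonus(sa_seen))    # small bonus: well-explored direction
    print(bonus.bonus(sa_new))     # larger bonus: incentive to explore
```

Adding such a bonus to the estimated reward yields an optimistic value estimate, so a planner that maximizes the bonus-augmented objective is steered toward state-action pairs whose learned features are still uncertain.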
