Poster in Workshop: Bridging the Gap Between Practice and Theory in Deep Learning
On the Theory of Skill-based Reinforcement Learning: Distribution Recovery and Generalization
Hongyi Guo · Xiaoyu Chen · Sirui Zheng · Zhuoran Yang · Zhaoran Wang
Skill learning without rewards has recently emerged as a focal area in Reinforcement Learning (RL). To understand what "skills" represent, we introduce a new Latent Markov Decision Process (LMDP) model in which the reward function is governed by a latent variable. We interpret a "skill" as the optimal policy corresponding to this latent variable. Drawing inspiration from InfoVAE, we develop a theoretically grounded objective for offline skill learning. Our objective can be seamlessly adapted to the online setting, where it aligns with commonly used online skill-learning objectives. Under certain conditions, it can also be used to solve hindsight information matching problems, establishing a connection between our algorithm, Decision Transformers, and goal-based algorithms. We provide a generalization bound for our objective, showing that the learned policy recovers the trajectory distribution of the dataset.
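As an illustration of the interpretation above (a skill as the optimal policy for the reward selected by a latent variable), the following is a minimal toy sketch, not the paper's construction: it assumes a small tabular LMDP with made-up random dynamics and uses plain value iteration to extract the skill associated with each latent.

```python
import numpy as np

# Toy sketch (illustrative only): a tabular Latent MDP where each latent z
# selects a reward function R[z], and the "skill" for z is the optimal
# policy under that reward. Sizes, dynamics, and rewards are placeholders.

rng = np.random.default_rng(0)
n_states, n_actions, n_latents, gamma = 5, 3, 2, 0.9

# Shared transition kernel P[s, a, s'] and latent-indexed rewards R[z, s, a].
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(size=(n_latents, n_states, n_actions))

def skill_for_latent(z, n_iters=200):
    """Value iteration under the reward chosen by latent z; the greedy
    policy it returns plays the role of the skill associated with z."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        V = Q.max(axis=1)              # state values under current Q
        Q = R[z] + gamma * (P @ V)     # Bellman optimality backup
    return Q.argmax(axis=1)            # deterministic skill: state -> action

skills = [skill_for_latent(z) for z in range(n_latents)]
print(skills)
```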