Sub-policy Adaptation for Hierarchical Reinforcement Learning

Alexander Li; Carlos Florensa; Ignasi Clavera; Pieter Abbeel

Abstract: Hierarchical reinforcement learning is a promising approach to tackle long-horizon decision-making problems with sparse rewards. Unfortunately, most methods still decouple the lower-level skill acquisition process and the training of a higher level that controls the skills in a new task. Leaving the skills fixed can lead to significant sub-optimality in the transfer setting. In this work, we propose a novel algorithm to discover a set of skills, and continuously adapt them along with the higher level even when training on a new task. Our main contributions are two-fold. First, we derive a new hierarchical policy gradient with an unbiased latent-dependent baseline, and we introduce Hierarchical Proximal Policy Optimization (HiPPO), an on-policy method to efficiently train all levels of the hierarchy jointly. Second, we propose a method of training time-abstractions that improves the robustness of the obtained skills to environment changes. Code and videos are available at sites.google.com/view/hippo-rl.

Similar Papers

Composing Task-Agnostic Policies with Deep Reinforcement Learning

Ahmed H. Qureshi, Jacob J. Johnson, Yuzhe Qin, Taylor Henderson, Byron Boots, Michael C. Yip

Option Discovery using Deep Skill Chaining

Akhil Bagaria, George Konidaris

Learning to Coordinate Manipulation Skills via Skill Behavior Diversification

Youngwoon Lee, Jingyun Yang, Joseph J. Lim