Skip to yearly menu bar Skip to main content

Spotlight Poster

Proximal Policy Gradient Arborescence for Quality Diversity Reinforcement Learning

Sumeet Batra · Bryon Tjanaka · Matthew Fontaine · Aleksei Petrenko · Stefanos Nikolaidis · Gaurav Sukhatme

Halle B #293
[ ] [ Project Page ]
Thu 9 May 1:45 a.m. PDT — 3:45 a.m. PDT


Training generally capable agents that thoroughly explore their environment andlearn new and diverse skills is a long-term goal of robot learning. Quality DiversityReinforcement Learning (QD-RL) is an emerging research area that blends thebest aspects of both fields – Quality Diversity (QD) provides a principled formof exploration and produces collections of behaviorally diverse agents, whileReinforcement Learning (RL) provides a powerful performance improvementoperator enabling generalization across tasks and dynamic environments. ExistingQD-RL approaches have been constrained to sample efficient, deterministic off-policy RL algorithms and/or evolution strategies and struggle with highly stochasticenvironments. In this work, we, for the first time, adapt on-policy RL, specificallyProximal Policy Optimization (PPO), to the Differentiable Quality Diversity (DQD)framework and propose several changes that enable efficient optimization anddiscovery of novel skills on high-dimensional, stochastic robotics tasks. Our newalgorithm, Proximal Policy Gradient Arborescence (PPGA), achieves state-of-the-art results, including a 4x improvement in best reward over baselines on thechallenging humanoid domain.

Chat is not available.