ICLR A Minimalist Ensemble Method for Generalizable Offline Deep Reinforcement Learning

Poster in Workshop: Generalizable Policy Learning in the Physical World

Kun Wu · Yinuo Zhao · Zhiyuan Xu · Zhen Zhao · Pei Ren · Zhengping Che · Chi Liu · Feifei Feng · Jian Tang


Abstract:

Deep Reinforcement Learning (DRL) has achieved impressive performance in a variety of applications. However, most existing DRL methods require massive active interaction with the environment, which is impractical in real-world scenarios. Moreover, most current evaluation environments are identical to the training environments, so the generalization ability of the agent is neglected. To fulfill the potential of DRL, an ideal policy should have 1) the ability to learn from a previously collected dataset (i.e., offline DRL) and 2) the ability to generalize to unseen scenarios and objects in the testing environments. Given expert demonstrations collected from the training environments, the goal is to improve the model's performance in both the training and testing environments without any further interaction. In this paper, we propose a minimalist ensemble method based on imitation learning that trains a bundle of agents with simple modifications to the network architecture and hyperparameters, and combines them into an ensemble model. To verify our method, we participated in the No Interaction Track of the SAPIEN Manipulation Skill (ManiSkill) Challenge and conducted extensive experiments on the ManiSkill Benchmark. Our challenge ranking and experimental results demonstrate the effectiveness of our method.
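The general idea of training several imitation-learning agents with different settings and combining them can be illustrated with a toy sketch. The abstract does not specify the network architectures, the hyperparameters, or how the agents' outputs are combined, so everything below is an assumption: each agent is a behavior-cloning policy (a scalar linear model standing in for a neural network) trained with its own learning rate, epoch count, and seed, and the ensemble averages the agents' continuous actions. The names `train_bc_agent` and `ensemble_policy` are hypothetical and are not taken from the authors' code.

```python
import random
from statistics import mean


def train_bc_agent(demos, lr, epochs, seed):
    """Hypothetical behavior-cloning agent: fit a = w * obs + b with SGD.

    A scalar linear model stands in for a neural network policy; each
    ensemble member gets its own hyperparameters (lr, epochs) and seed.
    """
    rng = random.Random(seed)
    w, b = rng.uniform(-0.1, 0.1), 0.0
    data = list(demos)
    for _ in range(epochs):
        rng.shuffle(data)
        for obs, act in data:
            err = (w * obs + b) - act  # gradient of 0.5 * squared error
            w -= lr * err * obs
            b -= lr * err
    return lambda obs: w * obs + b


def ensemble_policy(agents):
    """Assumed combination rule: average the agents' predicted actions."""
    return lambda obs: mean(agent(obs) for agent in agents)


# Toy "expert demonstrations": the expert's action is 2 * obs + 1.
demos = [(x / 10, 2 * (x / 10) + 1) for x in range(-10, 11)]

# One (learning rate, epochs, seed) triple per ensemble member (illustrative values).
configs = [(0.05, 200, 0), (0.1, 100, 1), (0.02, 400, 2)]
agents = [train_bc_agent(demos, lr, ep, seed) for lr, ep, seed in configs]

policy = ensemble_policy(agents)
print(round(policy(0.5), 3))
```

Averaging is only one plausible way to combine continuous-action policies; majority voting, weighted averaging, or selecting a member per task would be equally valid sketches, and the abstract does not say which rule the authors used.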
