ICLR Poster Learning to Search from Demonstration Sequences

Poster

Learning to Search from Demonstration Sequences

Dixant Mittal · Liwei Kang · Wee Sun Lee

Hall 3 + Hall 2B #418

[ Abstract ]

Wed 23 Apr 7 p.m. PDT — 9:30 p.m. PDT

Oral presentation: Oral Session 2A
Thu 24 Apr 12:30 a.m. PDT — 2 a.m. PDT

Abstract:

Search and planning are essential for solving many real-world problems. However, in numerous learning scenarios, only action-observation sequences, such as demonstrations or instruction sequences, are available for learning. Relying solely on supervised learning with these sequences can lead to sub-optimal performance due to the vast, unseen search space encountered during training. In this paper, we introduce Differentiable Tree Search Network (D-TSN), a novel neural network architecture that learns to construct search trees from just sequences of demonstrations by performing gradient descent on a best-first search tree construction algorithm. D-TSN enables the joint learning of submodules, including an encoder, value function, and world model, which are essential for planning. To construct the search tree, we employ a stochastic tree expansion policy and formulate it as another decision-making task. Then, we optimize the tree expansion policy via REINFORCE with an effective variance reduction technique for the gradient computation. D-TSN can be applied to problems with a known world model or to scenarios where it needs to jointly learn a world model with a latent state space. We study problems from these two scenarios, including Game of 24, 2D grid navigation, and Procgen games, to understand when D-TSN is more helpful. Through our experiments, we show that D-TSN is effective, especially when the world model with a latent state space is jointly learned. The code is available at https://github.com/dixantmittal/differentiable-tree-search-network.

Live content is unavailable. Log in and register to view live content