Workshop: Generalizable Policy Learning in the Physical World

One-Shot Imitation with Skill Chaining using a Goal-Conditioned Policy in Long-Horizon Control

Hayato Watahiki · Yoshimasa Tsuruoka


Recent advances in skill learning from a task-agnostic offline dataset enable the agent to acquire various skills that can be used as primitives to perform long-horizon imitation. However, most work implicitly assumes that the offline dataset covers the entire distribution of target demonstrations. If the dataset only contains subtask-local trajectories, existing methods fail to imitate the transitions between subtasks without a sufficient amount of target demonstrations, significantly limiting the scalability of these methods. In this work, we show that a simple goal-conditioned policy can imitate the missing transitions using only the target demonstrations. We combine it with a policy-switching strategy that uses the skills when they are applicable. Furthermore, we present multiple choices that can effectively evaluate the applicability of skills. Our new method successfully performs one-shot imitation with skills learned from a subtask-local offline dataset. We experimentally show that it outperforms other one-shot imitation methods in a challenging kitchen environment, and we also qualitatively analyze how each policy-switching strategy works during imitation.

Chat is not available.