

Poster

Efficient Imitation under Misspecification

Nicolas Espinosa Dice · Sanjiban Choudhury · Wen Sun · Gokul Swamy

Hall 3 + Hall 2B #400
[ Project Page ]
Thu 24 Apr midnight PDT — 2:30 a.m. PDT

Abstract:

Interactive imitation learning (IL) is a powerful paradigm for learning to make sequences of decisions from an expert who demonstrates how to perform a task. Prior work on efficient imitation learning has focused on the realizable setting, where the expert's policy lies within the learner's policy class (i.e., the learner can perfectly imitate the expert in all states). In practice, however, perfect imitation of the expert is often impossible due to differences in state information and action-space expressiveness (e.g., morphological differences between robots and humans). In this paper, we consider the more general misspecified setting, where no assumptions are made about the realizability of the expert's policy. We introduce a novel structural condition, reward-agnostic policy completeness, and prove that it is sufficient for interactive IL algorithms to efficiently avoid the quadratically compounding errors that stymie offline approaches such as behavioral cloning. We also address a practical constraint, the case of limited expert data, and propose a principled method for using additional offline data to further improve the sample efficiency of interactive IL algorithms. Finally, we empirically investigate the optimal reset distribution in efficient IRL under misspecification using a suite of continuous control tasks.
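Background (not stated in the abstract itself): the "quadratically compounding errors" refer to the classical reduction analysis of imitation learning. If the learner disagrees with the expert with probability at most eps on the expert's state distribution, then over a horizon of H steps the standard bounds are roughly

    J(pi_expert) - J(pi_BC)          <=  O(eps * H^2)   (offline behavioral cloning; Ross & Bagnell, 2010)
    J(pi_expert) - J(pi_interactive) <=  O(eps * H)     (interactive IL, e.g. DAgger; Ross et al., 2011)

where the linear bound for interactive methods holds only under additional structural conditions (recoverability in the classical analyses; a policy-completeness-type condition in this paper). The quadratic gap arises because a single early mistake can drive the learner into states the expert never visits, where its error rate is uncontrolled.

To make the "reset distribution" concrete, below is a minimal, hypothetical Python sketch of interactive data collection in which the learner is reset to states drawn from a chosen distribution (here, states covered by the expert) rather than only the environment's initial state. All names (ChainEnv, expert_action, collect_with_resets) are illustrative inventions for this sketch, not the paper's algorithm or API.

    import random

    class ChainEnv:
        """Toy chain MDP: integer states, action 1 moves right, else left."""
        def __init__(self, horizon=10):
            self.horizon = horizon
            self.state = 0
            self.t = 0

        def reset(self, state=0):
            # The ability to reset to an arbitrary state is what makes the
            # choice of reset distribution a meaningful design decision.
            self.state = state
            self.t = 0
            return self.state

        def step(self, action):
            self.state = max(0, self.state + (1 if action == 1 else -1))
            self.t += 1
            return self.state, self.t >= self.horizon

    def expert_action(state):
        # Hypothetical expert: always moves right.
        return 1

    def collect_with_resets(env, policy, reset_states, n_rollouts=100):
        # Roll out the current learner policy from states sampled from the
        # chosen reset distribution, querying the expert along the way.
        data = []
        for _ in range(n_rollouts):
            s = env.reset(state=random.choice(reset_states))
            done = False
            while not done:
                data.append((s, expert_action(s)))
                s, done = env.step(policy(s))
        return data

    # Resetting to expert-covered states exposes the learner to states it
    # might otherwise reach only after compounding mistakes.
    expert_states = list(range(10))
    pairs = collect_with_resets(ChainEnv(), lambda s: random.choice([0, 1]),
                                reset_states=expert_states)
    print(len(pairs), "expert-labeled (state, action) pairs collected")

Varying reset_states (e.g., initial states only vs. expert-visited states vs. a mixture) is one way to frame the reset-distribution question the abstract's final sentence refers to; the paper's actual experimental protocol may differ.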
