In-Person Poster presentation / poster accept
Impossibly Good Experts and How to Follow Them
Aaron Walsman · Muru Zhang · Sanjiban Choudhury · Dieter Fox · Ali Farhadi
MH1-2-3-4 #104
Keywords: [ imitation learning ] [ distillation ] [ experts ] [ reinforcement learning ]
We consider the sequential decision-making problem of learning from an expert that has access to more information than the learner. For many problems, this extra information enables the expert to achieve greater long-term reward than any policy without such privileged access. We call these experts "Impossibly Good" because no learning algorithm can reproduce their behavior. However, in these settings it is still reasonable to attempt to recover the best policy possible given the agent's restricted access to information. We provide a set of necessary criteria on the expert that allow a learner to recover the optimal policy in the reduced information space from the expert's advice alone. We also provide a new approach called Elf Distillation (Explorer Learning from Follower) for cases where these criteria are not met and environmental rewards must be taken into account. We show that this algorithm outperforms a variety of strong baselines on a challenging suite of Minigrid and Vizdoom environments.
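The abstract does not spell out the Elf Distillation update itself, but the setting it describes (a learner restricted to partial observations, an expert acting on privileged information, and environmental reward as a fallback when expert advice alone is insufficient) can be illustrated with a generic sketch. The PyTorch code below is an assumption-laden illustration, not the paper's method: `PartialObsPolicy`, `training_step`, and the loss weights are hypothetical names, and the reward term is plain REINFORCE standing in for whatever RL objective the paper actually uses.

```python
# Minimal sketch (NOT the paper's Elf Distillation algorithm): a generic
# training step for a learner with restricted observations imitating a
# privileged expert, combined with a simple reward-driven term.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PartialObsPolicy(nn.Module):
    """Learner policy that only sees the restricted observation (hypothetical)."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)  # action logits


def training_step(policy, optimizer, obs, expert_probs, actions, returns,
                  distill_weight: float = 1.0, rl_weight: float = 1.0):
    """One combined update: distill expert advice + REINFORCE on returns.

    obs:          learner observations, shape [B, obs_dim]
    expert_probs: expert action distribution computed from privileged state, [B, A]
    actions:      actions taken by the learner, [B]
    returns:      discounted environment returns for those actions, [B]
    """
    logits = policy(obs)
    log_probs = F.log_softmax(logits, dim=-1)

    # Distillation term: match the expert's (privileged) action distribution
    # as well as possible from the restricted observation alone.
    distill_loss = F.kl_div(log_probs, expert_probs, reduction="batchmean")

    # Reward term: plain REINFORCE, used here as a stand-in for the RL
    # objective needed when expert advice alone cannot recover the optimal
    # policy in the reduced information space.
    taken_log_probs = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    rl_loss = -(taken_log_probs * returns).mean()

    loss = distill_weight * distill_loss + rl_weight * rl_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```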