

Workshop

Jointly Learning "What" and "How" from Instructions and Goal-States

Dzmitry Bahdanau · Felix Hill · Jan Leike · Edward Hughes · Pushmeet Kohli · Edward Grefenstette

East Meeting Level 8 + 15 #3

Tue 1 May, 4:30 p.m. PDT

Training agents to follow instructions requires some way of rewarding them for behavior which accomplishes the intent of the instruction. For non-trivial instructions, which may be underspecified or ambiguous, it can be difficult or impossible to specify a reward function or to obtain suitable expert trajectories for the agent to imitate. For these scenarios, we introduce a method which requires only pairs of instructions and examples of positive goal states, from which we can jointly learn a model of the instruction-conditional reward and a policy which executes instructions. Two sets of experiments in a gridworld compare the effectiveness of our method to that of RL when a reward function can be specified, and demonstrate its application when no reward function is defined. We furthermore evaluate the generalization of our approach to unseen instructions, and to scenarios where environment dynamics change outside of training, requiring fine-tuning of the policy "in the wild".
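The abstract describes jointly training an instruction-conditional reward model and a policy from pairs of instructions and positive goal states. The sketch below illustrates one way such a loop could look: a logistic-regression reward model is fit to distinguish the dataset's goal states from states reached by the current policy, and the policy is updated with REINFORCE using the learned reward as its return. The 1-D gridworld, the feature encoding, and both update rules are illustrative assumptions for this sketch, not the authors' implementation.

```python
"""Hedged sketch of jointly learning an instruction-conditional reward model
and a policy from (instruction, positive goal state) pairs.  The environment,
encodings, and update rules are assumptions made for illustration only."""
import numpy as np

rng = np.random.default_rng(0)
N = 8                 # 1-D "gridworld": cells 0..N-1
ACTIONS = (-1, +1)    # step left / step right

# Dataset: instruction g means "reach cell g"; its positive example is state s = g.
instructions = np.arange(N)
goal_states = instructions.copy()

def features(g, s):
    """Concatenate one-hot instruction and one-hot state (an assumed encoding)."""
    x = np.zeros(2 * N)
    x[g] = 1.0
    x[N + s] = 1.0
    return x

w_reward = np.zeros(2 * N)            # logistic reward model D(g, s)
policy_logits = np.zeros((N, N, 2))   # tabular softmax policy pi(a | g, s)

def reward_prob(g, s):
    return 1.0 / (1.0 + np.exp(-(w_reward @ features(g, s))))

def sample_action(g, s):
    logits = policy_logits[g, s]
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return rng.choice(2, p=p), p

for step in range(3000):
    g = rng.integers(N)               # sample an instruction
    s = rng.integers(N)               # random start state
    visited = []
    for _ in range(N):                # short rollout under the current policy
        a, p = sample_action(g, s)
        s_next = int(np.clip(s + ACTIONS[a], 0, N - 1))
        visited.append((s, a, p, s_next))
        s = s_next

    # 1) Reward-model update: positives from the dataset, negatives from rollouts.
    pos = features(g, goal_states[g])
    neg = features(g, visited[-1][3])
    for x, y in ((pos, 1.0), (neg, 0.0)):
        pred = 1.0 / (1.0 + np.exp(-(w_reward @ x)))
        w_reward += 0.1 * (y - pred) * x       # logistic-regression gradient step

    # 2) Policy update: REINFORCE, using the learned reward of the final state.
    R = reward_prob(g, visited[-1][3])
    for s_t, a, p, _ in visited:
        grad = -p                              # d log softmax / d logits
        grad[a] += 1.0
        policy_logits[g, s_t] += 0.1 * R * grad

# After training the policy tends to walk toward the instructed cell.
g, s = 2, 6
for _ in range(N):
    a, _ = sample_action(g, s)
    s = int(np.clip(s + ACTIONS[a], 0, N - 1))
print("instruction:", g, "final state:", s)
```

Treating the policy's own rollouts as negative examples is only a stand-in for however the paper actually constructs its training signal; the point of the sketch is the alternation between reward-model updates and policy updates from nothing more than instructions and positive goal states.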
