Workshop: Reincarnating Reinforcement Learning

Think Before You Act: Unified Policy for Interleaving Language Reasoning with Actions

Lina Mezghani · Sainbayar Sukhbaatar · Piotr Bojanowski · Karteek Alahari


The success of transformer models trained with a language modeling objective presents a promising opportunity for reinforcement learning. The Decision Transformer is a step in this direction, showing how to train transformers with the same next-step prediction objective on offline data. Another important development in this area is the recent emergence of large-scale datasets collected from the internet. One interesting source of such data is tutorial videos with captions, in which people describe what they are doing. To take advantage of this language component, we propose a novel method for unifying language reasoning with actions in a single policy. Specifically, we augment a transformer policy with word outputs, so that it can generate textual captions interleaved with actions. When tested on the most challenging task in BabyAI, with captions describing the next subgoals, our reasoning policy consistently outperforms the caption-free baseline.
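As a rough illustration of what "captions interleaved with actions" could look like as training data, the sketch below flattens a trajectory into a single token sequence over a shared word-and-action vocabulary and builds next-token prediction pairs from it. This is not the authors' implementation: the vocabularies, the `<cap>`/`</cap>` delimiters, the `Step` class, and both helper functions are hypothetical, and observations are left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical vocabularies, chosen only for illustration.
ACTIONS = ["left", "right", "forward", "pickup", "drop", "toggle"]
WORDS = ["<cap>", "</cap>", "go", "to", "open", "pick", "up", "the", "red", "door", "key"]

# One shared output vocabulary: the policy can emit either a word or an action.
TOKENS = [f"act:{a}" for a in ACTIONS] + [f"word:{w}" for w in WORDS]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(TOKENS)}


@dataclass
class Step:
    """One time step: an action, optionally preceded by a caption (assumed format)."""
    action: str
    caption: Optional[List[str]] = None


def interleave(steps: List[Step]) -> List[int]:
    """Flatten a trajectory into one sequence of word and action token ids."""
    ids: List[int] = []
    for step in steps:
        if step.caption is not None:
            # The caption (e.g. the next subgoal) is emitted before the action.
            ids.append(TOKEN_TO_ID["word:<cap>"])
            ids.extend(TOKEN_TO_ID[f"word:{w}"] for w in step.caption)
            ids.append(TOKEN_TO_ID["word:</cap>"])
        ids.append(TOKEN_TO_ID[f"act:{step.action}"])
    return ids


def next_token_pairs(ids: List[int]) -> List[Tuple[List[int], int]]:
    """Build (context, target) pairs for a next-token prediction objective."""
    return [(ids[:i], ids[i]) for i in range(1, len(ids))]


trajectory = [
    Step("forward", caption=["go", "to", "the", "red", "door"]),
    Step("toggle"),
]
sequence = interleave(trajectory)
pairs = next_token_pairs(sequence)
```

Under these assumptions, captions and actions share one sequence, so the same next-token objective covers both. A caption-free baseline would correspond to trajectories in which every `caption` is `None`.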