
Workshop: Reincarnating Reinforcement Learning

Chain-of-Thought Predictive Control with Behavior Cloning

Zhiwei Jia · Fangchen Liu · Vineet Thumuluri · Linghao Chen · Zhiao Huang · Hao Su


We study how to learn generalizable policies from demonstrations for complex continuous-space tasks (e.g., low-level object manipulation). We aim to combine the applicability and scalability of Behavior Cloning (BC) with the planning capability and generalizability of Model Predictive Control (MPC), while overcoming the challenges BC faces with sub-optimal demonstrations and enabling planning-based control over a much longer horizon. Specifically, we exploit the hierarchical structure of object manipulation tasks via key states that mark the boundaries between sub-stages of a trajectory. We couple key-state (the chain-of-thought) and action predictions during both training and evaluation, giving the model a structured peek into the long-term future so it can dynamically adjust its plan. Our method resembles a closed-loop control design, and we call it Chain-of-Thought Predictive Control (CoTPC). We empirically find that key states are governed by learnable patterns shared across demonstrations; as a result, CoTPC eases the optimization of BC and produces policies that generalize much better than existing methods on four challenging object manipulation tasks in ManiSkill2.
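The closed-loop coupling of key-state and action prediction can be sketched as follows. This is a toy stand-in, not the paper's implementation: `predict_key_state` and `predict_action` are hypothetical hand-coded heads (in CoTPC both are learned from demonstrations), and the dynamics are a single integrator.

```python
import numpy as np

# Illustrative sketch only: hand-coded stand-ins show the closed-loop
# structure of coupled key-state ("chain-of-thought") and action
# prediction. All function names here are hypothetical.

def predict_key_state(state, goal):
    # Stand-in for the learned key-state head: propose the next
    # sub-stage boundary partway between the current state and the goal.
    return state + 0.5 * (goal - state)

def predict_action(state, key_state):
    # Stand-in for the learned action head, conditioned on the predicted
    # key state so the low-level action is steered by the long-term plan.
    return 0.5 * (key_state - state)

def cotpc_rollout(state, goal, steps=50, tol=1e-3):
    # Closed-loop control: the key-state prediction is refreshed from the
    # current state at every step, so the plan adjusts dynamically.
    for _ in range(steps):
        key_state = predict_key_state(state, goal)
        action = predict_action(state, key_state)
        state = state + action  # toy single-integrator dynamics
        if np.linalg.norm(goal - state) < tol:
            break
    return state

start, goal = np.zeros(2), np.array([1.0, 1.0])
final = cotpc_rollout(start, goal)
print(np.linalg.norm(goal - final) < 1e-3)  # True: the loop converges
```

The point of the sketch is the control flow: the action is never predicted in isolation but always conditioned on a freshly predicted key state, which is what distinguishes this design from plain step-by-step behavior cloning.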
