

Poster
in
Workshop: 7th Robot Learning Workshop: Towards Robots with Human-Level Abilities

Learning Long-Context Robot Policies via Past-Token Prediction

Marcel Torne Villasevil · Andy Tang · Yuejiang Liu · Chelsea Finn


Abstract:

Complex robotic tasks often require spatiotemporal reasoning over long sequences of actions and observations. Yet learning long-context policies remains difficult: as context length increases, the training process becomes increasingly compute- and memory-intensive, and covariate shift at deployment becomes more pronounced. Recent methods typically sidestep these challenges by discarding significant portions of the historical context, risking the loss of information crucial for subsequent decisions. In this paper, we propose a two-stage training approach that explicitly regularizes the information preserved in the learned representation: first, we pre-train a short-context encoder to predict a long sequence of future actions, thereby maximizing the information each frame encodes about long-range dependencies; then, given pre-computed frame embeddings, we fine-tune a long-context decoder on an auxiliary task in which the policy learns to predict past actions alongside future ones. This simple design yields two surprising benefits: it substantially reduces memory consumption during training, and it greatly improves the history awareness of the learned policy. Moreover, the auxiliary task provides a natural mechanism for self-verification, allowing the policy to assess its sampled predictions at test time. Experiments on manipulation tasks that necessitate extensive historical context demonstrate that our proposed method improves the performance of long-context policies by 3× and accelerates policy training by more than 10×.
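The abstract's auxiliary task and test-time self-verification can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`make_targets`, `self_verification_score`), the window parameters, and the use of scalar actions are all illustrative assumptions. The idea is that, for a context window ending at timestep `t`, the decoder is supervised to reproduce the actions of its past frames in addition to the usual future action chunk; at test time, the discrepancy between predicted past actions and the actions actually executed scores each sampled prediction.

```python
# Hypothetical sketch of past-token prediction target construction.
# Actions here are scalars for simplicity; in practice they would be
# action vectors produced by the policy.

def make_targets(actions, t, history_len, future_len):
    """Return (past_actions, future_actions) targets for a context
    window whose most recent frame is timestep t.

    actions: per-step actions of one trajectory
    history_len: number of past frames in the context window
    future_len: length of the future action chunk to predict
    """
    past = actions[t - history_len + 1 : t + 1]   # auxiliary past-token targets
    future = actions[t + 1 : t + 1 + future_len]  # standard future action chunk
    return past, future

def self_verification_score(predicted_past, executed_past):
    """Lower is better: squared error between the policy's predicted
    past actions and the actions it actually executed. A plausible way
    to rank sampled predictions at test time."""
    return sum((p - e) ** 2 for p, e in zip(predicted_past, executed_past))

# Example: a 20-step trajectory with "actions" 0..19.
traj = list(range(20))
past, future = make_targets(traj, t=10, history_len=4, future_len=3)
# past covers frames 7..10; future covers steps 11..13.
```

Under this sketch, a policy sample whose past-action predictions disagree strongly with the executed history would receive a high `self_verification_score` and could be rejected in favor of a more consistent sample.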
