Skip to yearly menu bar Skip to main content


Poster

Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets

Guangqi Jiang · Yifei Sun · Tao Huang · Huanyu Li · Yongyuan Liang · Huazhe Xu

Hall 3 + Hall 2B #42
[ ] [ Project Page ]
Thu 24 Apr 7 p.m. PDT — 9:30 p.m. PDT

Abstract:

The pre-training of visual representations has enhanced the efficiency of robot learning. Due to the lack of large-scale in-domain robotic datasets, prior works utilize in-the-wild human videos to pre-train robotic visual representation. Despite their promising results, representations from human videos are inevitably subject to distribution shifts and lack the dynamics information crucial for task completion. We first evaluate various pre-trained representations in terms of their correlation to the downstream robotic manipulation tasks (i.e., manipulation centricity). Interestingly, we find that the ''manipulation centricity'' is a strong indicator of success rates when applied to downstream tasks. Drawing from these findings, we propose Manipulation Centric Representation (MCR), a foundation representation learning framework capturing both visual features and the dynamics information such as actions and proprioceptions of manipulation tasks to improve manipulation centricity. Specifically, we pre-train a visual encoder on the DROID robotic dataset and leverage motion-relevant data such as robot proprioceptive states and actions. We introduce a novel contrastive loss that aligns visual observations with the robot's proprioceptive state-action dynamics, combined with an action prediction loss and a time contrastive loss during pre-training. Empirical results across four simulation domains with 20 robotic manipulation tasks demonstrate that MCR outperforms the strongest baseline by 14.8\%. Additionally, MCR significantly boosts the success rate in three real-world manipulation tasks by 76.9\%. Project website: robots-pretrain-robots.github.io

Live content is unavailable. Log in and register to view live content