

Poster in Workshop: 7th Robot Learning Workshop: Towards Robots with Human-Level Abilities

Self-supervised Visual State Representation Learning for robotics from Dynamic Scenes

Taekyung Kim · JeongEun Park · Sangdoo Yun · Dongyoon Han · Byeongho Heo


Abstract:

In robot policy learning, deriving informative state representations that encompass both visual and proprioceptive information is critical. While proprioceptive states are acquired from internal sensors, visual state representations rely primarily on vision backbones; leveraging a backbone that generalizes across diverse tasks and environments is therefore essential for effective robotic perception. Self-supervised learning (SSL) has been a promising approach for pre-training such backbones. However, conventional SSL approaches for visual representation learning have predominantly targeted a holistic understanding of a whole image or video, which falls short of the requisites of robotics such as seamless interaction with the environment. Bearing this in mind, we introduce a novel and intuitive self-supervised visual state representation learning pipeline that acquires state representations through masked autoencoding. Our method implicitly folds the formation of state representations into the encoding process itself, without any additional layers. Extensive experiments in diverse simulated environments demonstrate the superiority of our method over previous baselines on robot manipulation and locomotion tasks. Moreover, deploying our pre-trained model on physical robots confirms its robustness and effectiveness in real-world settings.
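
To make the masked-autoencoding idea concrete, here is a minimal sketch of how an encoder's output can double as the visual state representation during reconstruction-based pre-training, with no extra projection head. This is not the authors' released code: the class name, architecture sizes, masking ratio, and pooling choice are illustrative assumptions, and PyTorch is assumed as the framework.

```python
# Hypothetical sketch (not the paper's implementation): a tiny masked
# autoencoder whose pooled encoder output serves directly as the visual
# state representation, without additional layers on top.
import torch
import torch.nn as nn


class TinyMaskedAutoencoder(nn.Module):
    """Patchify -> mask -> encode visible patches -> decode all patches."""

    def __init__(self, img_size=64, patch=8, dim=128, mask_ratio=0.75):
        super().__init__()
        self.patch = patch
        self.num_patches = (img_size // patch) ** 2
        self.mask_ratio = mask_ratio
        self.patch_embed = nn.Linear(3 * patch * patch, dim)
        self.pos = nn.Parameter(torch.zeros(1, self.num_patches, dim))
        enc_layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        dec_layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.decoder = nn.TransformerEncoder(dec_layer, num_layers=1)
        self.to_pixels = nn.Linear(dim, 3 * patch * patch)

    def patchify(self, imgs):
        B, C, H, W = imgs.shape
        p = self.patch
        x = imgs.unfold(2, p, p).unfold(3, p, p)      # B, C, H/p, W/p, p, p
        return x.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * p * p)

    def forward(self, imgs):
        patches = self.patchify(imgs)                  # B, N, 3*p*p
        B, N, _ = patches.shape
        tokens = self.patch_embed(patches) + self.pos  # B, N, dim

        # Randomly keep a subset of patches; the rest are masked out.
        n_keep = int(N * (1 - self.mask_ratio))
        ids = torch.argsort(torch.rand(B, N, device=imgs.device), dim=1)
        keep = ids[:, :n_keep]
        visible = torch.gather(
            tokens, 1, keep.unsqueeze(-1).expand(-1, -1, tokens.size(-1))
        )

        # Encode only visible patches; the pooled output is the state feature.
        encoded = self.encoder(visible)                # B, n_keep, dim
        state_repr = encoded.mean(dim=1)               # B, dim (no extra head)

        # Decoder sees encoded visibles plus mask tokens, reconstructs pixels.
        full = self.mask_token.expand(B, N, -1).clone()
        full.scatter_(1, keep.unsqueeze(-1).expand(-1, -1, full.size(-1)), encoded)
        recon = self.to_pixels(self.decoder(full + self.pos))
        loss = ((recon - patches) ** 2).mean()
        return loss, state_repr


# Usage sketch: pre-train with the reconstruction loss, then feed state_repr
# (alongside proprioceptive inputs) to a downstream policy network.
model = TinyMaskedAutoencoder()
loss, state = model(torch.randn(2, 3, 64, 64))
print(loss.item(), state.shape)  # scalar loss, torch.Size([2, 128])
```

The design point this sketch illustrates is that the same tensor used during masked reconstruction is handed to the policy as the state representation, so pre-training and policy learning share one encoding path; the specific pooling and architecture choices above are placeholders.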
