Poster in Workshop: 7th Robot Learning Workshop: Towards Robots with Human-Level Abilities
SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation
Haoquan Fang · Markus Grotz · Wilbert Pumacay · Yi Ru Wang · Dieter Fox · Ranjay Krishna · Jiafei Duan
Sat 26 Apr 5:55 p.m. PDT — 3 a.m. PDT
Robotic manipulation systems operating in diverse, dynamic environments must exhibit three critical abilities: generalization to unseen scenarios, multitask interaction, and spatial memory. While significant progress has been made in robotic manipulation, existing approaches often fall short on memory-dependent tasks and generalization to complex environmental variations. To bridge this gap, we introduce SAM2Act, a multi-view robotic transformer that leverages multi-resolution upsampling and visual representations from large-scale foundation models. SAM2Act achieves a state-of-the-art average success rate of 86.8% across 18 tasks in the RLBench benchmark, and demonstrates robust generalization on The Colosseum benchmark, with only a 4.3% performance drop under diverse environmental perturbations. Building on this foundation, we propose SAM2Act+, a memory-augmented architecture inspired by SAM2 that incorporates a memory bank and an attention mechanism to enable spatial memory. To address the need for evaluating memory-dependent tasks, we introduce MemoryBench, a novel benchmark designed to assess spatial memory and action recall in robotic manipulation. SAM2Act+ achieves strong performance on MemoryBench, significantly outperforming existing approaches and pushing the boundaries of memory-enabled robotic systems.
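To make the memory-augmented idea concrete, below is a minimal, hypothetical sketch (not the authors' implementation) of how a memory bank of past observation embeddings could be queried with cross-attention before action prediction. The class name, tensor shapes, and bank-eviction policy are illustrative assumptions; only the general pattern (store past observation tokens, attend to them from the current observation) follows the abstract's description.

```python
# Hypothetical sketch: memory bank + cross-attention for spatial memory.
# All names, shapes, and hyperparameters are illustrative assumptions,
# not the SAM2Act+ implementation.
import torch
import torch.nn as nn


class SpatialMemoryAttention(nn.Module):
    """Cross-attends current observation tokens to a bank of stored memories."""

    def __init__(self, dim: int = 256, num_heads: int = 8, bank_size: int = 16):
        super().__init__()
        self.bank_size = bank_size
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.memories: list[torch.Tensor] = []  # each entry: (num_tokens, dim)

    def write(self, obs_tokens: torch.Tensor) -> None:
        """Store the current observation's tokens, evicting the oldest entry."""
        self.memories.append(obs_tokens.detach())
        if len(self.memories) > self.bank_size:
            self.memories.pop(0)

    def read(self, obs_tokens: torch.Tensor) -> torch.Tensor:
        """Fuse current tokens with past memories via cross-attention."""
        if not self.memories:
            return obs_tokens
        bank = torch.cat(self.memories, dim=0).unsqueeze(0)  # (1, M, dim)
        query = obs_tokens.unsqueeze(0)                       # (1, N, dim)
        fused, _ = self.attn(query, bank, bank)
        return obs_tokens + fused.squeeze(0)                  # residual update


# Usage: write each timestep's tokens, read before predicting the next action.
mem = SpatialMemoryAttention()
tokens = torch.randn(64, 256)  # e.g. 64 multi-view observation tokens
mem.write(tokens)
conditioned = mem.read(torch.randn(64, 256))
print(conditioned.shape)       # torch.Size([64, 256])
```

In such a design, the memory bank lets the policy condition on where objects were seen earlier in the episode, which is what memory-dependent tasks like those in MemoryBench require.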