Hierarchical Foresight: Self-Supervised Learning of Long-Horizon Tasks via Visual Subgoal Generation

Suraj Nair, Chelsea Finn

Keywords: generation, planning, reinforcement learning, scalability, self-supervised learning, self-supervision, uncertainty, video prediction

Abstract: Video prediction models combined with planning algorithms have shown promise in enabling robots to learn to perform many vision-based tasks through only self-supervision, reaching novel goals in cluttered scenes with unseen objects. However, due to the compounding uncertainty of long-horizon video prediction and the poor scalability of sampling-based planning optimizers, a significant limitation of these approaches is their inability to plan over long horizons to reach distant goals. To that end, we propose hierarchical visual foresight (HVF), a framework for subgoal generation and planning that generates subgoal images conditioned on a goal image and uses them for planning. The subgoal images are directly optimized to decompose the task into easy-to-plan segments, and as a result, we observe that the method naturally identifies semantically meaningful states as subgoals. On three of four simulated vision-based manipulation tasks, our method achieves more than a 20% absolute performance improvement over planning without subgoals and over model-free RL approaches. Further, our experiments illustrate that our approach extends to real, cluttered visual scenes.
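To make the two-level optimization in the abstract concrete, below is a minimal sketch under strong simplifying assumptions: a toy 2-D state stands in for images, a trivial `predict` function stands in for the learned video prediction model, and the cross-entropy method (CEM) is used both to plan each segment and to optimize the subgoals so that the worst per-segment planning cost is minimized. All function names and hyperparameters here are illustrative assumptions, not the paper's implementation (in the paper, subgoals are generated by a learned image model and segment costs come from planning under a video prediction model).

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(state, action):
    # Stand-in for a learned video prediction model: simple additive
    # dynamics with per-step action magnitude clipped to 0.1.
    return state + np.clip(action, -0.1, 0.1)

def plan_segment(start, goal, horizon=10, iters=3, pop=32, elites=4):
    """Inner sampling-based planner (CEM): return the best achievable
    cost of reaching `goal` from `start` within `horizon` steps."""
    mean = np.zeros((horizon, 2))
    std = np.full((horizon, 2), 0.1)
    best = np.inf
    for _ in range(iters):
        actions = mean + std * rng.standard_normal((pop, horizon, 2))
        costs = np.empty(pop)
        for i in range(pop):
            s = start
            for a in actions[i]:
                s = predict(s, a)
            costs[i] = np.linalg.norm(s - goal)  # distance to (sub)goal
        elite = actions[np.argsort(costs)[:elites]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3
        best = min(best, costs.min())
    return best

def optimize_subgoals(start, goal, k=2, iters=5, pop=32, elites=4):
    """Outer optimization (also CEM): choose k subgoals minimizing the
    *maximum* per-segment planning cost, so every segment is easy to plan."""
    mean = np.linspace(start, goal, k + 2)[1:-1]  # init along the line
    std = np.full_like(mean, 0.5)
    for _ in range(iters):
        cands = mean + std * rng.standard_normal((pop, k, 2))
        scores = np.empty(pop)
        for i in range(pop):
            pts = [start, *cands[i], goal]
            scores[i] = max(plan_segment(a, b) for a, b in zip(pts, pts[1:]))
        elite = cands[np.argsort(scores)[:elites]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3
    return mean

if __name__ == "__main__":
    subgoals = optimize_subgoals(np.zeros(2), np.array([2.0, 2.0]))
    print("optimized subgoals:\n", subgoals)
```

The max-over-segments objective is what pushes the outer optimizer to split the task into evenly difficult pieces; minimizing the sum instead would let one hard segment dominate while the others collapse.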

Similar Papers

Deep Imitative Models for Flexible Inference, Planning, and Control
Nicholas Rhinehart, Rowan McAllister, Sergey Levine
Dynamics-Aware Unsupervised Skill Discovery
Archit Sharma, Shixiang Gu, Sergey Levine, Vikash Kumar, Karol Hausman