

Poster in Workshop on Reasoning and Planning for Large Language Models

LookPlanGraph: Embodied instruction following method with VLM graph augmentation

Anatolii Onishchenko · Aleksey Kovalev · Aleksandr Panov


Abstract:

Approaches that use Large Language Models as planners for robotic tasks have recently become widespread. In such systems, the LLM must be grounded in the environment in which the robot operates in order to complete tasks successfully. One way to achieve this is to use a scene graph that contains all the information needed for the task, including the presence and location of objects. In this paper, we propose an approach that starts from a scene graph containing only immobile static objects and augments it with the necessary movable objects during instruction following, using a vision-language model (VLM) and an image from the agent's camera. We conduct thorough experiments on the SayPlan Office, BEHAVIOR-1K, and VirtualHome RobotHow datasets and show that the proposed approach handles the task effectively, outperforming approaches that rely on pre-created scene graphs.
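
The abstract describes augmenting a static scene graph on the fly with movable objects detected by a VLM in the agent's camera image, before handing the graph to an LLM planner. The sketch below illustrates one way such an augmentation step could look; the `Node` structure and the `query_vlm` / `query_llm_planner` callables are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of VLM-based scene-graph augmentation (illustrative only).
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    name: str                               # e.g. "kitchen_table"
    category: str                           # "static" or "movable"
    children: List["Node"] = field(default_factory=list)


def augment_scene_graph(
    static_root: Node,
    current_location: Node,
    camera_image: bytes,
    query_vlm: Callable[[bytes], List[str]],  # hypothetical VLM detector
) -> Node:
    """Attach movable objects seen in the camera image to the static node
    the agent is currently at, so the planner can reference them."""
    detected = query_vlm(camera_image)        # e.g. ["mug", "apple"]
    for obj_name in detected:
        if not any(child.name == obj_name for child in current_location.children):
            current_location.children.append(Node(obj_name, "movable"))
    return static_root


def plan_next_action(
    instruction: str,
    scene_graph: Node,
    query_llm_planner: Callable[[str, Node], str],  # hypothetical LLM planner
) -> str:
    """Ask the LLM planner for the next action, grounded in the
    (possibly augmented) scene graph."""
    return query_llm_planner(instruction, scene_graph)
```

In this sketch, augmentation runs each time the agent reaches a new static node, so only the movable objects relevant to the current viewpoint are added rather than a full pre-built inventory of the scene.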
