Lifting Ego World Models for Planning and Control
Alex N. Wang ⋅ Trevor Darrell ⋅ Pavel Izmailov ⋅ Yutong Bai ⋅ Amir Bar
Abstract
World models have shown remarkable ability to predict future observations from high-dimensional action inputs, but planning in complex action spaces such as human joint movement remains a difficult, unsolved problem. Inspired by hierarchical control in humans, we design a goal-conditioned controller policy that generates low-level joint actions conditioned on high-level waypoint inputs. Leveraging waypoint goal-conditioning and short-term motion patterns, we combine our policy with a low-level PEVA world model, lifting its input to the high-level waypoint space. First, we show that waypoint goal conditioning improves Mean Joint Error (MJE) for a human-like agent by $5.8\times$ while being easily controllable and generalizing to unseen actions. Next, we perform visuomotor planning with the lifted PEVA world model for hybrid navigation-interaction tasks in the Nymeria dataset, improving MJE by up to $4.7\times$ while being more efficient and generalizing to entirely unseen environments.
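The abstract reports its gains in Mean Joint Error (MJE) but does not define the metric here. As a rough illustration only, the sketch below assumes MJE is the average Euclidean distance between predicted and ground-truth joint positions, and that an "$N\times$ improvement" is the ratio of baseline error to improved error. The function name `mean_joint_error` and the toy poses are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence

Joint = Sequence[float]  # one joint position, e.g. (x, y, z)


def mean_joint_error(pred: Sequence[Joint], target: Sequence[Joint]) -> float:
    """Average Euclidean distance between matching joints.

    Assumed definition for illustration; the paper may define MJE differently.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and the same length")
    return sum(math.dist(p, t) for p, t in zip(pred, target)) / len(pred)


# Toy two-joint poses with made-up numbers (not results from the paper).
target = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
baseline = [(3.0, 4.0, 0.0), (1.0, 5.0, 0.0)]  # each joint is off by 5
improved = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]  # each joint is off by 1

baseline_mje = mean_joint_error(baseline, target)
improved_mje = mean_joint_error(improved, target)

# Improvement expressed as a ratio of errors (assumed reading of "N times").
improvement = baseline_mje / improved_mje
```

Under these assumptions, a lower MJE is better, so an improvement factor greater than 1 means the second prediction is closer to the ground-truth pose.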