In-Person Poster presentation / poster accept
Interaction-Based Disentanglement of Entities for Object-Centric World Models
Akihiro Nakano · Masahiro Suzuki · Yutaka Matsuo
Keywords: [ generative models ] [ probabilistic deep learning ] [ physics prediction ] [ VAEs ] [ world models ] [ variational autoencoders ] [ structured models ] [ self-supervised learning ] [ video prediction ] [ model-based reinforcement learning ] [ unsupervised ] [ planning ] [ object-oriented ] [ object-centric ]
Perceiving the world compositionally in terms of space and time is essential to understanding object dynamics and solving downstream tasks. Object-centric learning with generative models has improved in its ability to learn distinct representations of individual objects and to predict their interactions, and how to utilize the learned representations to solve untrained, downstream tasks remains a focal question. However, because models struggle to predict object interactions and to track objects accurately, especially in unseen configurations, using object-centric representations in downstream tasks is still a challenge. This paper proposes STEDIE, a new model that disentangles object representations, based on interactions, into interaction-relevant relational features and interaction-irrelevant global features without supervision. Empirical evaluation shows that the proposed model factorizes global features, which are unaffected by interactions, from relational features, which are necessary to predict the outcome of interactions. We also show that STEDIE achieves better performance on planning tasks and in understanding causal relationships. In both tasks, our model not only achieves better reconstruction but also utilizes the disentangled representations to solve the tasks in a structured manner.