
Workshop: Workshop on the Elements of Reasoning: Objects, Structure and Causality

INFERNO: Inferring Object-Centric 3D Scene Representations without Supervision

Lluis Castrejon · Nicolas Ballas · Aaron Courville


We propose INFERNO, a method to infer object-centric representations of visual scenes without annotations. Our method decomposes a scene into multiple objects, each with a structured representation that disentangles its shape, appearance and pose. Each object representation defines a localized neural radiance field, and these fields are used to generate 2D views of the scene through differentiable rendering. The model is then trained by minimizing a reconstruction loss between the inputs and the corresponding rendered scenes. We empirically show that INFERNO discovers objects in a scene without supervision. We also validate the interpretability of the learned representations by manipulating inferred scenes and showing the corresponding effect on the rendered output. Finally, we demonstrate the usefulness of our 3D object representations on a visual reasoning task using the CATER dataset.
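The abstract does not give implementation details, so the following is only a minimal illustrative sketch of the general idea: several localized per-object radiance fields, each controlled by shape, appearance and pose parameters, are combined and volume-rendered along a camera ray, and the result is scored with a reconstruction loss. Several assumptions apply. Each object's field is a toy soft sphere; a real model would decode learned latent vectors with a neural network. The compositing rule (sum the densities, density-weighted average of colors) is borrowed from other compositional NeRF work and is not confirmed for INFERNO. The loss is a plain mean squared error. All function names (`object_field`, `compose`, `render_ray`, `reconstruction_loss`) are hypothetical, and only the forward pass is shown, with no gradients or training.

```python
import numpy as np

# Hypothetical toy object field: "radius" stands in for shape, "color" for appearance,
# and "center" for pose. A learned model would use an MLP decoder instead.
def object_field(points, center, radius, color):
    """Return (density, rgb) of one localized object field at points of shape (N, 3)."""
    local = points - center                              # move points into the object's frame (pose)
    dist = np.linalg.norm(local, axis=-1)
    density = 50.0 * np.clip(radius - dist, 0.0, None)   # zero outside the object (localized support)
    rgb = np.broadcast_to(color, points.shape)           # constant appearance in this toy
    return density, rgb

def compose(points, objects):
    """Assumed compositing rule: sum densities, density-weighted average of colors."""
    densities, colors = zip(*(object_field(points, **o) for o in objects))
    sigma = np.stack(densities)                          # (K, N)
    rgb = np.stack(colors)                               # (K, N, 3)
    total = sigma.sum(axis=0)                            # (N,)
    weights = sigma / np.maximum(total, 1e-8)            # (K, N); zero where no object is present
    return total, (weights[..., None] * rgb).sum(axis=0)

def render_ray(origin, direction, objects, near=0.0, far=6.0, n_samples=128):
    """NeRF-style volume rendering (alpha compositing) along a single ray."""
    t = np.linspace(near, far, n_samples)
    points = origin + t[:, None] * direction
    sigma, rgb = compose(points, objects)
    delta = np.append(np.diff(t), 1e10)                  # spacing between consecutive samples
    alpha = 1.0 - np.exp(-sigma * delta)                 # opacity of each ray segment
    trans = np.cumprod(np.append(1.0, 1.0 - alpha[:-1])) # transmittance up to each sample
    return ((alpha * trans)[:, None] * rgb).sum(axis=0)

def reconstruction_loss(rendered, target):
    """Assumed mean squared error between rendered and observed pixel colors."""
    return float(np.mean((rendered - target) ** 2))

# Usage: a red object in front of a blue one, viewed along the +z axis.
scene = [
    {"center": np.array([0.0, 0.0, 3.0]), "radius": 1.0, "color": np.array([1.0, 0.0, 0.0])},
    {"center": np.array([0.0, 0.0, 5.0]), "radius": 0.5, "color": np.array([0.0, 0.0, 1.0])},
]
origin, direction = np.zeros(3), np.array([0.0, 0.0, 1.0])
before = render_ray(origin, direction, scene)    # red object occludes the blue one
scene[0]["center"] = np.array([3.0, 0.0, 3.0])   # edit only the pose of object 0
after = render_ray(origin, direction, scene)     # the blue object is now visible
loss = reconstruction_loss(after, np.array([0.0, 0.0, 1.0]))
```

Because each object keeps separate pose, shape and appearance parameters, editing one object's pose changes only that object in the rendered output. This mirrors, in toy form, the kind of scene manipulation the abstract uses to probe interpretability.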
