Poster in Workshop on the Elements of Reasoning: Objects, Structure and Causality

INFERNO: Inferring Object-Centric 3D Scene Representations without Supervision

Lluis Castrejon ⋅ Nicolas Ballas ⋅ Aaron Courville


Abstract

We propose INFERNO, a method to infer object-centric representations of visual scenes without annotations. Our method decomposes a scene into multiple objects, with each object having a structured representation that disentangles its shape, appearance and pose. Each object representation defines a localized neural radiance field used to generate 2D views of the scene through differentiable rendering. Our model is subsequently trained by minimizing a reconstruction loss between inputs and corresponding rendered scenes. We empirically show that INFERNO discovers objects in a scene without supervision. We also validate the interpretability of the learned representations by manipulating inferred scenes and showing the corresponding effect in the rendered output. Finally, we demonstrate the usefulness of our 3D object representations in a visual reasoning task using the CATER dataset.
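
As an illustration of the rendering and training signal described above, the following is a minimal sketch and not the authors' implementation. It assumes the standard volume-rendering quadrature used with neural radiance fields, and it assumes that per-object densities simply add up at each sample point, with colors mixed in proportion to density; the abstract does not specify how INFERNO combines its localized fields. The function names composite_ray and reconstruction_loss are hypothetical, and mean squared error is used as one possible reconstruction loss.

```python
import math


def composite_ray(densities, colors, deltas):
    """Render one pixel from samples along a ray through several object fields.

    densities: per-sample list of per-object densities, e.g. [[s_obj0, s_obj1], ...]
    colors:    per-sample list of per-object RGB tuples
    deltas:    distance covered by each sample along the ray
    Returns the pixel color and the transmittance remaining after the last sample.
    """
    transmittance = 1.0
    pixel = [0.0, 0.0, 0.0]
    for sigmas, rgbs, delta in zip(densities, colors, deltas):
        # Assumption: densities of overlapping object fields are summed.
        sigma = sum(sigmas)
        if sigma > 0.0:
            # Density-weighted mix of the object colors at this sample.
            rgb = [sum(s * c[k] for s, c in zip(sigmas, rgbs)) / sigma
                   for k in range(3)]
        else:
            rgb = [0.0, 0.0, 0.0]
        # Opacity of this segment, then accumulate its contribution.
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        pixel = [p + weight * v for p, v in zip(pixel, rgb)]
        transmittance *= 1.0 - alpha
    return pixel, transmittance


def reconstruction_loss(rendered, target):
    """Mean squared error between rendered and target pixel values."""
    diffs = [(r - t) ** 2 for r, t in zip(rendered, target)]
    return sum(diffs) / len(diffs)
```

In a trainable model the densities and colors would come from networks conditioned on each object's shape, appearance and pose, and the loss would be minimized with automatic differentiation; this sketch only shows the forward computation for a single ray.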
