## Workshop on the Elements of Reasoning: Objects, Structure and Causality

### Sungjin Ahn · Wilka Carvalho · Klaus Greff · Tong He · Thomas Kipf · Francesco Locatello · Sindy Löwe

Fri 29 Apr, midnight PDT

Abstract:

Discrete abstractions such as objects, concepts, and events are at the basis of our ability to perceive the world, relate the pieces in it, and reason about their causal structure. The research communities of object-centric representation learning and causal machine learning have, largely independently, pursued a similar agenda of equipping machine learning models with more structured representations and reasoning capabilities. Despite their different languages, these communities have similar premises and overall pursue the same benefits. They operate under the assumption that, compared to a monolithic/black-box representation, a structured model will improve systematic generalization, robustness to distribution shifts, downstream learning efficiency, and interpretability. However, the two communities typically approach the problem from opposite directions. Work on causality often assumes a known (true) decomposition into causal factors and focuses on inferring and leveraging the interactions between them. Object-centric representation learning, on the other hand, typically starts from an unstructured input, aims to infer a useful decomposition into meaningful factors, and has so far been less concerned with their interactions.

This workshop aims to bring together researchers from object-centric and causal representation learning. To help integrate ideas from these areas, we invite perspectives from other fields, including cognitive psychology and neuroscience. We hope that this creates opportunities for discussion, presenting cutting-edge research, establishing new collaborations, and identifying future research directions.

Timezone: America/Los_Angeles

### Schedule

- **Fri 12:00 a.m. - 12:10 a.m.** Introduction and Opening Remarks (Opening remarks). Klaus Greff
- **Fri 12:10 a.m. - 12:50 a.m.** Invited Talk - Bernhard Schölkopf: Towards Causal Representation Learning (Invited talk). Bernhard Schoelkopf
- **Fri 12:50 a.m. - 1:00 a.m.** Q&A - Bernhard Schölkopf (Q&A). Bernhard Schoelkopf · Klaus Greff
- **Fri 1:00 a.m. - 1:15 a.m.** Disentanglement and Generalization Under Correlation Shifts (Oral)

  Correlations between factors of variation are prevalent in real-world data. However, often such correlations are not robust (e.g., they may change between domains, datasets, or applications) and we wish to avoid exploiting them. Disentanglement methods aim to learn representations which capture different factors of variation in latent subspaces. A common approach involves minimizing the mutual information between latent subspaces, such that each encodes a single underlying attribute. However, this fails when attributes are correlated. We solve this problem by enforcing independence between subspaces conditioned on the available attributes, which allows us to remove only dependencies that are not due to the correlation structure present in the training data. We achieve this via an adversarial approach to minimize the conditional mutual information (CMI) between subspaces with respect to categorical variables. We first show theoretically that CMI minimization is a good objective for robust disentanglement on linear problems with Gaussian data. We then apply our method to real-world datasets based on MNIST and CelebA, and show that it yields models that are disentangled and robust under correlation shift, including in weakly supervised settings.

  Authors: Christina Funke · Paul Vicol · Kuan-Chieh Wang · Matthias Kümmerer · Richard Zemel · Matthias Bethge
- **Fri 1:15 a.m. - 1:30 a.m.**
Learning Fourier-Sparse Functions on DAGs (Oral)

  We show that the classical Moebius transform from combinatorics can be interpreted as a causal form of Fourier transform on directed acyclic graphs (DAGs). The associated Fourier basis, which is spanned by the columns of the zeta transform, enables us to make use of Fourier-sparse learning methods to learn functions on the vertices of DAGs from few observations. As a prototypical application example, we construct a DAG from a dynamic contact tracing network, in which each vertex represents an individual at a given timestamp, and learn the function that indicates which of the vertices are infected by a disease.

  Authors: Bastian Seifert · Chris Wendler · Markus Püschel
- **Fri 1:30 a.m. - 2:30 a.m.** Poster Session 1 (Poster session). Sungjin Ahn
- **Fri 2:30 a.m. - 2:38 a.m.** Break
- **Fri 2:38 a.m. - 2:40 a.m.** Speaker introduction (Live intro). Tong He
- **Fri 2:40 a.m. - 3:10 a.m.** Invited Talk - Qianru Sun: Invariant Learning from Insufficient Data (Invited talk)

  If we have sufficient training data for every class, e.g., “dog” and “cat” images with different shapes, poses, colors, and backgrounds (i.e., in different environments), we can obtain a perfect “dog vs. cat” model with a conventional softmax cross-entropy classifier. In reality, however, we don’t have such training data and need to learn models from insufficient data. In this keynote, we will talk about why insufficient data easily biases the model toward the limited environments in the training data, and how to do invariant learning that learns the inherent causality of image recognition and yields models that generalize to the different environments in the test data.

  Speaker: Qianru Sun
- **Fri 3:10 a.m. - 3:20 a.m.** Q&A - Qianru Sun (Q&A). Qianru Sun · Tong He
- **Fri 3:20 a.m. - 3:50 a.m.**
Invited Talk - Karl Stelzner: 3D Geometry: The Latent Variable We Can Touch (Invited talk)

  Scene understanding models seek to extract the latent factors underlying visual observations. But only recently have they started to account for what is arguably the most fundamental of these factors: the 3D geometry of the world around us. In this talk, we investigate recent approaches which learn to infer 3D-aware representations from images in a self-supervised way. In particular, we discuss how we may leverage 3D representations for unsupervised object discovery. We conclude by considering current questions, including how 3D geometry should be built into the model structure, how uncertainty can be handled, and how we might improve the scalability of these models.

  Speaker: Karl Stelzner
- **Fri 3:50 a.m. - 4:00 a.m.** Q&A - Karl Stelzner (Q&A). Karl Stelzner · Thomas Kipf
- **Fri 4:00 a.m. - 6:38 a.m.** Break
- **Fri 6:38 a.m. - 6:40 a.m.** Speaker introduction (Live intro). Thomas Kipf
- **Fri 6:40 a.m. - 7:20 a.m.** Invited Talk - Nikolaus Kriegeskorte: Resource-rational vision: data, time, and space for perception and learning (Invited talk). Nikolaus Kriegeskorte
- **Fri 7:20 a.m. - 7:30 a.m.** Q&A - Nikolaus Kriegeskorte (Q&A). Nikolaus Kriegeskorte · Thomas Kipf
- **Fri 7:30 a.m. - 8:00 a.m.** Invited Talk - Rosemary Ke: From “what” to “why”: Towards causal deep learning (Invited talk). Nan Rosemary Ke
- **Fri 8:00 a.m. - 8:10 a.m.** Q&A - Rosemary Ke (Q&A). Nan Rosemary Ke · Francesco Locatello
- **Fri 8:10 a.m. - 8:25 a.m.** Object Representations as Fixed Points: Training Iterative Inference Algorithms with Implicit Differentiation (Oral)

  Deep generative models, particularly those that aim to factorize the observations into discrete entities (such as objects), must often use iterative inference procedures that break symmetries among equally plausible explanations for the data.
  Such inference procedures include variants of the expectation-maximization algorithm and structurally resemble clustering algorithms in a latent space. However, combining such methods with deep neural networks necessitates differentiating through the inference process, which can make optimization exceptionally challenging. We observe that such iterative amortized inference methods can be made differentiable by means of the implicit function theorem, and develop an implicit differentiation approach that improves the stability and tractability of training such models by decoupling the forward and backward passes. This connection enables us to apply recent advances in optimizing implicit layers to not only improve the stability and optimization of the slot attention module in SLATE, a state-of-the-art method for learning entity representations, but do so with constant space and time complexity in backpropagation and only one additional line of code.

  Authors: Michael Chang · Thomas L. Griffiths · Sergey Levine
- **Fri 8:25 a.m. - 8:40 a.m.** On the Identifiability of Nonlinear ICA with Unconditional Priors (Oral)

  Nonlinear independent component analysis (ICA) aims to recover the underlying marginally independent latent sources from their observable nonlinear mixtures. The identifiability of nonlinear ICA is a major unsolved problem in unsupervised learning. Recent breakthroughs reformulate the standard marginal independence assumption of sources as conditional independence given some auxiliary variables (e.g., class labels) as weak supervision or inductive bias. However, the modified setting is not applicable in many scenarios that do not have auxiliary variables. We explore an alternative path and consider instead only assumptions on the mixing process, such as the pairwise orthogonality among the columns of the Jacobian of the mixing function.
  We show that marginally independent latent sources can be identified from strongly nonlinear mixtures up to a component-wise transformation and a permutation, thus providing, to the best of our knowledge, a first full identifiability result of nonlinear ICA without auxiliary variables. We provide an estimation method and validate the theoretical results experimentally.

  Authors: Yujia Zheng · Zhi Yong Ignavier Ng · Kun Zhang
- **Fri 8:40 a.m. - 9:40 a.m.** Poster Session 2 (Poster session). Francesco Locatello
- **Fri 9:40 a.m. - 9:48 a.m.** Break
- **Fri 9:48 a.m. - 9:50 a.m.** Speaker introduction (Live intro). Wilka Carvalho
- **Fri 9:50 a.m. - 10:30 a.m.** Invited Talk - Alison Gopnik: Causal Learning in Children and Computers (Invited talk)

  I will describe our research showing how even very young human children engage in effective causal inference, discovering new causal relationships through observation and intervention. This includes not only inferring specific causal relationships but also discovering abstract causal “over-hypotheses”, variable discovery, analogical reasoning, and active learning through exploration. I will discuss implications for causal learning in AI systems and for designing machine common sense.

  Speaker: Alison Gopnik
- **Fri 10:30 a.m. - 10:40 a.m.** Q&A - Alison Gopnik (Q&A). Alison Gopnik · Wilka Carvalho
- **Fri 10:40 a.m. - 11:40 a.m.** Panel Discussion (Panel discussion). Nikolaus Kriegeskorte · Nan Rosemary Ke · Wilka Carvalho · Karl Stelzner · Sjoerd van Steenkiste · Sara Magliacane · Sindy Löwe
- **Fri 11:40 a.m. - 11:50 a.m.** Closing Remarks (Closing remarks). Thomas Kipf
- LogicInference: A New Dataset for Teaching Logical Inference to seq2seq Models (Poster)

  Machine learning models such as Transformers or LSTMs struggle with tasks that are compositional in nature, such as those involving reasoning/inference.
  Although many datasets exist to evaluate compositional generalization, options are more limited when it comes to evaluating inference abilities. This paper presents LogicInference, a new dataset to evaluate the ability of models to perform logical inference. The dataset focuses on inference using propositional logic and a small subset of first-order logic, represented both in semi-formal logical notation and in natural language. We also report initial results using a collection of machine learning models to establish an initial baseline on this dataset.

  Authors: Santiago Ontanon · Joshua Ainslie · Vaclav Cvicek · Zachary Fisher
- Finding Structure and Causality in Linear Programs (Poster)

  Linear Programs (LP) are celebrated widely, particularly so in machine learning, where they have allowed for effectively solving probabilistic inference tasks or imposing structure on end-to-end learning systems. Their potential might seem depleted, but we propose a foundational, causal perspective that reveals intriguing intra- and inter-structure relations for LP components. We conduct a systematic, empirical investigation on general, shortest-path, and energy-system LPs.

  Authors: Matej Zečević · Florian Peter Busch · Devendra Dhami · Kristian Kersting
- Weakly supervised causal representation learning (Poster)

  Learning high-level causal representations together with a causal model from unstructured low-level data such as pixels is impossible from observational data alone. We prove under mild assumptions that this representation is identifiable in a weakly supervised setting. This requires a dataset with paired samples before and after random, unknown interventions, but no further labels. Finally, we show that we can infer the representation and causal graph reliably in a simple synthetic domain using a variational autoencoder with a structural causal model as prior.
  Authors: Johann Brehmer · Pim De Haan · Phillip Lippe · Taco Cohen
- CITRIS: Causal Identifiability from Temporal Intervened Sequences (Poster)

  We propose CITRIS, a variational framework that learns causal representations from temporal sequences of images with interventions. In contrast to the recent literature, CITRIS exploits temporality and the observation of intervention targets to identify scalar and multidimensional causal factors. Furthermore, by introducing a normalizing flow, we extend CITRIS to leverage and disentangle representations obtained by already pretrained autoencoders. Extending previous results on scalar causal factors, we prove identifiability in a more general setting, in which only some components of a causal factor are affected by interventions. In experiments on 3D rendered image sequences, CITRIS outperforms previous methods on recovering the underlying causal variables, and can even generalize to unseen instantiations of causal factors, opening future research areas in sim-to-real generalization.

  Authors: Phillip Lippe · Sara Magliacane · Sindy Löwe · Yuki Asano · Taco Cohen · Efstratios Gavves
- Learning Articulated Rigid Body Dynamics Simulations From Video (Poster)

  Because they can reproduce physical phenomena ranging from light interaction to contact mechanics, simulators are becoming increasingly useful in more and more application domains where real-world interaction or labeled data is difficult to obtain. Despite the growing attention, configuring simulators to accurately reproduce real-world behaviors still requires significant human effort. We introduce a pipeline that combines inverse rendering with differentiable simulation to create digital twins of real-world articulated mechanisms from depth or RGB videos.
  Our approach automatically discovers joint types and estimates their kinematic parameters, while the dynamic properties of the overall mechanism are tuned to attain physically accurate simulations. On a real-world coupled pendulum system observed through RGB video, we correctly determine its articulation and simulation parameters, such that its motion can be reproduced accurately in a physics engine. Having learned a simulator from depth video, we demonstrate on a simulated cartpole that a model-predictive controller can leverage such a dynamics model to control nonlinear systems.

  Authors: Eric Heiden · Ziang Liu · Vibhav Vineet · Erwin Coumans · Gaurav Sukhatme
- Towards self-supervised learning of global and object-centric representations (Poster)

  Self-supervision allows learning meaningful representations of natural images, which usually contain one central object. How well does it transfer to multi-entity scenes? We discuss key aspects of learning structured object-centric representations with self-supervision and validate our insights through several experiments on the CLEVR dataset. Regarding the architecture, we confirm the importance of competition for attention-based object discovery, where each image patch is exclusively attended by one object. For training, we show that contrastive losses equipped with matching can be applied directly in a latent space, avoiding pixel-based reconstruction. However, such an optimization objective is sensitive to false negatives (recurring objects) and false positives (matching errors). Thus, careful consideration is required around data augmentation and negative sample selection. Anonymized repository: https://anonymous.4open.science/r/iclr-osc-22

  Authors: Federico Baldassarre · Hossein Azizpour
- DAG Learning on the Permutahedron (Poster)

  We introduce Daguerro, a strategy for learning directed acyclic graphs (DAGs).
  In contrast to previous methods, our problem formulation (i) is guaranteed to learn a DAG, (ii) does not propagate errors over multiple stages, and (iii) can be trained end-to-end without pre-processing steps. Our algorithm leverages advances in differentiable sparse structured inference for learning a total ordering of the variables in the convex hull of permutation vectors (the permutahedron), allowing for a substantial reduction in memory and time complexities w.r.t. existing permutation-based continuous optimization methods.

  Authors: Valentina Zantedeschi · Jean Kaddour · Luca Franceschi · Matt Kusner · Vlad Niculae
- Factorized World Models for Learning Causal Relationships (Poster)

  World models serve as a powerful framework for model-based reinforcement learning, and they can greatly benefit from the shared structure of the world environments. However, learning the high-level causal influence of objects on each other remains a challenge. In this work, we propose CEMA, a structured world model with factorized latent state capable of modeling sparse interaction, with non-zero components corresponding to events of interest. This is possible due to separate states and dynamics for three components: the actor, the object of manipulation, and the latent influence factor between these two states. In a multitask setting, we analyze the mutual information of the hierarchical latent states to show how the model can represent sparse updates and directly model the causal influence of the robot on the object.

  Authors: Artem Zholus · Yaroslav Ivchenkov · Aleksandr Panov
- Compositional Multi-object Reinforcement Learning with Linear Relation Networks (Poster)

  Although reinforcement learning has seen remarkable progress over the last years, solving robust dexterous object-manipulation tasks in multi-object settings remains a challenge.
  In this paper, we focus on models that can learn manipulation tasks in fixed multi-object settings *and* extrapolate this skill zero-shot without any drop in performance when the number of objects changes. We consider the generic task of bringing a specific cube out of a set to a goal position. We find that previous approaches, which primarily leverage attention and graph neural network-based architectures, do not generalize their skills when the number of input objects changes, while scaling as $K^2$ in the number of objects $K$. We propose an alternative plug-and-play module based on relational inductive biases to overcome these limitations. Besides outperforming these approaches in their training environment, we show that our approach, which scales linearly in $K$, allows agents to extrapolate and generalize zero-shot to any new object number.

  Authors: Davide Mambelli · Frederik Träuble · Stefan Bauer · Bernhard Schoelkopf · Francesco Locatello
- Object-Centric Learning as Nested Optimization (Poster)

  Various iterative algorithms have shown promising results in the unsupervised decomposition of simple visual scenes into representations that humans could intuitively consider objects, but all with different algorithmic and implementational design choices for making them work. In this paper, we ask what underlying computational problem all of these iterative approaches are solving. We show that these approaches can all be viewed as instances of algorithms for solving a particular nested optimization problem whose inner optimization is that of maximizing the ELBO with respect to a set of independently initialized parameters for each datapoint. Lastly, we discuss how our nested optimization formulation reveals connections to similar problems studied in other fields, enabling us to leverage tools developed in these other fields to improve our object-centric learning methods.

  Authors: Michael Chang · Sergey Levine · Thomas L.
  Griffiths
- Action-Sufficient State Representation Learning for Control with Structural Constraints (Poster)

  Perceived signals in real-world scenarios are usually high-dimensional and noisy; finding and using a representation of them that contains the essential and sufficient information required by downstream decision-making tasks helps improve computational efficiency and generalization ability in those tasks. In this paper, we focus on partially observable environments and propose to learn a minimal set of state representations that capture sufficient information for decision-making, termed *Action-Sufficient state Representations* (ASRs). We build a generative environment model for the structural relationships among variables in the system and present a principled way to characterize ASRs based on structural constraints and the goal of maximizing cumulative reward in policy learning. We then develop a structured sequential Variational Auto-Encoder to estimate the environment model and extract ASRs. Our empirical results on CarRacing and VizDoom demonstrate a clear advantage of learning and using ASRs for policy learning. Moreover, the estimated environment model and ASRs allow learning behaviors from imagined outcomes in the compact latent space to improve sample efficiency.

  Authors: Biwei Huang · Chaochao Lu · Liu Leqi · José Miguel Hernández Lobato · Clark Glymour · Bernhard Schoelkopf · Kun Zhang
- Invariant Causal Representation Learning for Generalization in Imitation and Reinforcement Learning (Poster)

  A fundamental challenge in imitation and reinforcement learning is to learn policies, representations, or dynamics that do not build on spurious correlations and generalize beyond the specific environments that they were trained on. We investigate these generalization problems from a unified view.
  For this, we propose a general framework to tackle them with theoretical guarantees on both identifiability and generalizability under mild assumptions on environmental changes. By leveraging a diverse set of training environments, we construct a data representation that ignores any spurious features and consistently predicts target variables well across environments. Following this approach, we build invariant predictors in terms of policy, representations, and dynamics. We theoretically show that the resulting policies, representations, and dynamics are able to generalize to unseen environments. Extensive experiments on both synthetic and real-world datasets show that our methods attain improved generalization over a variety of baselines.

  Authors: Chaochao Lu · José Miguel Hernández Lobato · Bernhard Schoelkopf
- Object Representations as Fixed Points: Training Iterative Inference Algorithms with Implicit Differentiation (Poster)

  Poster version of the oral presentation in the 8:10 a.m. slot; the abstract is the same.

  Authors: Michael Chang · Thomas L. Griffiths · Sergey Levine
- Object-centric Compositional Imagination for Visual Abstract Reasoning (Poster)

  Like humans devoid of imagination, current machine learning systems lack the ability to adapt to new, unexpected situations by foreseeing them, which makes them unable to solve new tasks by analogical reasoning. In this work, we introduce a new compositional imagination framework that improves a model's ability to generalize. One of the key components of our framework is a set of object-centric inductive biases that enable models to perceive the environment as a series of objects, properties, and transformations. By composing these key ingredients, it is possible to generate new unseen tasks that, when used to train the model, improve generalization. Experiments on a simplified version of the Abstraction and Reasoning Corpus (ARC) demonstrate the effectiveness of our framework.

  Authors: Rim Assouel · Pau Rodriguez Lopez · Perouz Taslakian · David Vazquez · Yoshua Bengio
- Recognizing Actions using Object States (Poster)

  Object-centric actions cause changes in object states, including their visual appearance and their immediate context. We propose a computational framework that uses only two object states, start and end, and learns to recognize the underlying actions. Our approach has two modules that learn subtle changes induced by the action and suppress spurious correlations. We demonstrate that only two object states are sufficient to recognize object-centric actions.
  Our framework performs better than approaches that use multiple frames and a relatively large model. Moreover, our method generalizes to unseen objects and unseen video datasets.

  Authors: Nirat Saini · Bo He · Gaurav Shrivastava · Sai Saketh Rambhatla · Abhinav Shrivastava
- Inductive Biases for Relational Tasks (Poster)

  Current deep learning approaches have shown good in-distribution performance but struggle in out-of-distribution settings. This is especially true for tasks involving abstract relations, like recognizing rules in sequences, as required in many intelligence tests. In contrast, our brains are remarkably flexible at such tasks, an attribute that is likely linked to anatomical constraints on computations. Inspired by this, recent work has explored how enforcing that relational representations remain distinct from sensory representations can help artificial systems. Building on this work, we further explore and formalize the advantages afforded by “partitioned” representations of relations and sensory details. We investigate inductive biases that ensure abstract relations are learned and represented distinctly from sensory data across several neural network architectures, and show that they outperform existing architectures on out-of-distribution generalization for various relational tasks. These results show that partitioning relational representations from other information streams may be a simple way to augment existing network architectures' robustness when performing relational computations.

  Authors: Giancarlo Kerg · Sarthak Mittal · David Rolnick · Yoshua Bengio · Blake A Richards · Guillaume Lajoie
- Discovering Generalizable Spatial Goal Representations via Graph-based Active Reward Learning (Poster)

  In this work, we consider one-shot imitation learning for object rearrangement tasks, where an AI agent needs to watch a single expert demonstration and learn to perform the same task in different environments.
  To achieve strong generalization, the AI agent must infer the spatial goal specification for the task. However, there can be multiple goal specifications that fit the given demonstration. To address this, we propose a reward learning approach, Graph-based Equivalence Mappings (GEM), that can discover spatial goal representations that are aligned with the intended goal specification, enabling successful generalization in unseen environments. We conducted experiments with simulated oracles and with human subjects. The results show that GEM can drastically improve the generalizability of the learned goal representations over strong baselines.

  Authors: Aviv Netanyahu · Tianmin Shu · Joshua B Tenenbaum · Pulkit Agrawal
- Align-Deform-Subtract: An interventional framework for explaining object differences (Poster)

  Given two object images, how can we explain their differences in terms of the underlying object properties? To address this question, we propose Align-Deform-Subtract (ADS), an interventional framework for explaining object differences. By leveraging semantic alignments in image-space as counterfactual interventions on the underlying object properties, ADS iteratively quantifies and removes differences in object properties. The result is a set of "disentangled" error measures that explain object differences in terms of their underlying properties. Experiments on real and synthetic data illustrate the efficacy of the framework.

  Authors: Cian Eastwood · Li Nanbo · Chris Williams
- Disentanglement and Generalization Under Correlation Shifts (Poster)

  Poster version of the oral presentation in the 1:00 a.m. slot; the abstract is the same.

  Authors: Christina Funke · Paul Vicol · Kuan-Chieh Wang · Matthias Kümmerer · Richard Zemel · Matthias Bethge
- Continuous Relaxation For The Multivariate Non-Central Hypergeometric Distribution (Poster)

  Partitioning a set of elements into a given number of groups of a priori unknown sizes is an important task in many applications. Due to hard constraints, it is a non-differentiable problem, which prohibits its direct use in modern machine learning frameworks. Hence, previous works mostly fall back on suboptimal heuristics or simplified assumptions. The multivariate hypergeometric distribution offers a probabilistic formulation of how to distribute a given number of samples across multiple groups. Unfortunately, as a discrete probability distribution, it is not differentiable either. In this work, we propose a continuous relaxation for the multivariate non-central hypergeometric distribution. We introduce an efficient and numerically stable sampling procedure.
This enables reparameterized gradients for the hypergeometric distribution and its integration into automatic differentiation frameworks. We additionally highlight its advantages on a weakly supervised learning task.
Thomas Sutter · Laura Manduchi · Alain Ryser · Julia E Vogt

- Binding Actions to Objects in World Models (Poster)
We study the problem of binding actions to objects in object-factored world models using action-attention mechanisms. We propose two attention mechanisms for binding actions to objects, soft attention and hard attention, which we evaluate in the context of structured world models for five environments. Our experiments show that hard attention helps contrastively trained structured world models learn to separate individual objects in an object-based grid-world environment. Further, we show that soft attention improves the performance of factored world models trained on a robotic manipulation task. The learned action-attention weights can be used to interpret the factored world model, as the attention focuses on the manipulated object in the environment.
Ondrej Biza · Robert Platt · Jan-Willem van de Meent · Lawson Wong · Thomas Kipf

- Learning to reason about and to act on physical cascading events (Poster)
Reasoning about and interacting with dynamic environments is a fundamental problem in AI, but it becomes extremely challenging when actions can trigger cascades of cross-dependent events. We introduce a new learning setup called Cascade, in which an agent is shown a video of a simulated physical dynamic scene and is asked to intervene and trigger a cascade of events such that the system reaches a "counterfactual" goal. For instance, the agent may be asked to "Make the blue ball hit the red one, by pushing the green ball".
The problem is very challenging because agent interventions come from a continuous space, and cascades of events make the dynamics highly non-linear. We combine semantic tree search with an event-driven forward model and devise an algorithm that learns to search in semantic trees in continuous spaces. We demonstrate that our approach learns to effectively follow instructions to intervene in previously unseen complex scenes. Interestingly, it can use the observed cascade of events to reason about alternative counterfactual outcomes.
Yuval Atzmon · Eli Meirom · Shie Mannor · Gal Chechik

- Learning Two-Step Hybrid Policy for Graph-Based Interpretable Reinforcement Learning (Poster)
We present a two-step hybrid reinforcement learning (RL) policy designed to generate interpretable and robust hierarchical policies for RL problems with graph-based input. Unlike prior deep RL policies parameterized by an end-to-end black-box graph neural network, our approach disentangles the decision-making process into two steps. The first step is a simplified classification problem that maps the graph input to an action group in which all actions share a similar semantic meaning. The second step implements a sophisticated rule miner that conducts explicit one-hop reasoning over the graph and identifies decisive edges in the graph input without requiring heavy domain knowledge. This two-step hybrid policy provides human-friendly interpretations and achieves better generalization and robustness. Extensive experiments on four levels of complex text-based games demonstrate the superiority of the proposed method over the state of the art.
Tongzhou Mu · Kaixiang Lin · Feiyang Niu · Govind Thattai

- A Causal Viewpoint on Motor-Imagery Brainwave Decoding (Poster)
In this work, we employ causal reasoning to break down and analyze important challenges in decoding motor-imagery (MI) electroencephalography (EEG) signals. Furthermore, we present a framework consisting of dynamic convolutions that addresses one of the issues revealed by this causal investigation, namely the subject distribution shift (or inter-subject variability). Using a publicly available MI dataset, we demonstrate improved cross-subject performance in two different MI tasks for four well-established deep architectures.
Konstantinos Barmpas · Yannis Panagakis · Dimitrios Adamos · Nikolaos Laskaris · Stefanos Zafeiriou

- On the Identifiability of Nonlinear ICA with Unconditional Priors (Poster)
Nonlinear independent component analysis (ICA) aims to recover the underlying marginally independent latent sources from their observable nonlinear mixtures. The identifiability of nonlinear ICA is a major unsolved problem in unsupervised learning. Recent breakthroughs reformulate the standard marginal independence assumption of sources as conditional independence given some auxiliary variables (e.g., class labels), used as weak supervision or an inductive bias. However, this modified setting does not apply to the many scenarios that lack auxiliary variables. We explore an alternative path and instead consider only assumptions on the mixing process, such as pairwise orthogonality among the columns of the Jacobian of the mixing function. We show that marginally independent latent sources can be identified from strongly nonlinear mixtures up to a component-wise transformation and a permutation, thus providing, to the best of our knowledge, the first full identifiability result for nonlinear ICA without auxiliary variables. We provide an estimation method and validate the theoretical results experimentally.
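To make the mixing-process assumption above concrete, here is a minimal numerical sketch (Python with NumPy) that checks whether the Jacobian columns of a given mixing function are pairwise orthogonal at a point. It is not the authors' estimation method; the helper names and the two example mixing functions, `polar` and `shear`, are hypothetical illustrations chosen here. The polar-coordinate map has orthogonal Jacobian columns everywhere, while the shear-like map does not.

```python
import numpy as np


def jacobian(f, z, eps=1e-6):
    """Central finite-difference Jacobian; column k holds d f / d z_k."""
    z = np.asarray(z, dtype=float)
    cols = []
    for k in range(z.size):
        dz = np.zeros_like(z)
        dz[k] = eps
        cols.append((f(z + dz) - f(z - dz)) / (2 * eps))
    return np.stack(cols, axis=1)


def column_orthogonality_gap(f, z):
    """Largest |<J[:, i], J[:, j]>| over i != j (0 means pairwise orthogonal)."""
    J = jacobian(f, z)
    gram = J.T @ J
    off_diagonal = gram - np.diag(np.diag(gram))
    return float(np.abs(off_diagonal).max())


def polar(s):
    """Hypothetical mixing: sources (r, theta) -> Cartesian (x, y)."""
    r, theta = s
    return np.array([r * np.cos(theta), r * np.sin(theta)])


def shear(s):
    """Hypothetical mixing whose Jacobian columns are not orthogonal."""
    a, b = s
    return np.array([a, a + b**3])


print(column_orthogonality_gap(polar, [2.0, 0.7]))  # close to 0
print(column_orthogonality_gap(shear, [1.0, 1.0]))  # close to 3
```

Passing this check at a few points does not show that the condition holds globally; the sketch only illustrates what the orthogonality assumption says about a mixing function.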
Yujia Zheng · Zhi Yong Ignavier Ng · Kun Zhang

- Improving Generalization with Approximate Factored Value Functions (Poster)
Reinforcement learning in general unstructured MDPs is a challenging learning problem. However, certain kinds of MDP structure, such as factorization, are known to make the problem simpler. This fact is often of little use in complex tasks, because MDPs with high-dimensional state spaces rarely exhibit such structure, and even when they do, the structure itself is typically unknown. In this work, we turn this observation on its head: rather than developing algorithms for structured MDPs, we propose a representation learning algorithm that approximates an unstructured MDP with one that has factorized structure. We then use these factors as a more convenient state representation for downstream learning. The particular structure that we leverage is reward factorization, which defines a more compact class of MDPs that admit factorized value functions. We show that our proposed approach, Approximately Factored Representations (AFaR), can be easily combined with existing RL algorithms, leading to faster training (better sample complexity) and robust zero-shot transfer (better generalization) on the Procgen benchmark. An interesting direction for future work is to extend AFaR to learn factorized policies that act on the individual factors, which may bring benefits such as better exploration. We empirically verify the effectiveness of our approach in terms of better sample complexity and improved generalization on the Procgen benchmark and the MiniGrid environments.
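Why a factorized reward can yield a factorized value function can be illustrated with a toy Markov reward process (a fixed policy). The sketch below is not AFaR; it additionally assumes that the two state factors evolve independently, and then checks numerically that the value of the joint chain equals the sum of the per-factor values. The function names and chain sizes are hypothetical.

```python
import numpy as np


def policy_value(P, r, gamma):
    """Solve the Bellman equation V = r + gamma * P @ V for a fixed policy."""
    n = len(r)
    return np.linalg.solve(np.eye(n) - gamma * P, r)


def random_stochastic(rng, n):
    """Random row-stochastic transition matrix over n states."""
    M = rng.random((n, n))
    return M / M.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
gamma = 0.9

# Two hypothetical state factors with independent dynamics and their own rewards.
P1, r1 = random_stochastic(rng, 3), rng.random(3)
P2, r2 = random_stochastic(rng, 4), rng.random(4)

# Joint chain over pairs (s1, s2), flattened as s1 * 4 + s2.
# Independent dynamics -> Kronecker product; reward factorization -> r1(s1) + r2(s2).
P_joint = np.kron(P1, P2)
r_joint = (r1[:, None] + r2[None, :]).ravel()

V_joint = policy_value(P_joint, r_joint, gamma)
V_factored = (policy_value(P1, r1, gamma)[:, None]
              + policy_value(P2, r2, gamma)[None, :]).ravel()

print(np.allclose(V_joint, V_factored))  # True
```

If the factors' dynamics are coupled, this additive decomposition generally no longer holds exactly.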
Shagun Sodhani · Sergey Levine · Amy Zhang

- INFERNO: Inferring Object-Centric 3D Scene Representations without Supervision (Poster)
We propose INFERNO, a method to infer object-centric representations of visual scenes without annotations. Our method decomposes a scene into multiple objects, with each object having a structured representation that disentangles its shape, appearance and pose. Each object representation defines a localized neural radiance field used to generate 2D views of the scene through differentiable rendering. The model is then trained by minimizing a reconstruction loss between the inputs and the corresponding rendered scenes. We empirically show that INFERNO discovers objects in a scene without supervision. We also validate the interpretability of the learned representations by manipulating inferred scenes and showing the corresponding effect in the rendered output. Finally, we demonstrate the usefulness of our 3D object representations in a visual reasoning task using the CATER dataset.
Lluis Castrejon · Nicolas Ballas · Aaron Courville

- Coherence Evaluation of Visual Concepts With Objects and Language (Poster)
Meaningful concepts are the fundamental elements of human reasoning. In explainable AI, they are used to provide concept-based explanations of machine learning models. The concepts are often extracted from large-scale image datasets in an unsupervised manner and are therefore not guaranteed to be meaningful to users. In this work, we investigate to what extent we can automatically assess the meaningfulness of such visual concepts using objects and language as forms of supervision. As a step toward discovering more meaningful concepts, we propose the "Semantic-level, Object and Language-Guided Coherence Evaluation" framework for visual concepts (SOLaCE).
SOLaCE assigns semantic meanings in the form of words to automatically discovered visual concepts and evaluates their degree of intelligibility at this higher level without human effort. We consider whether objects are sufficient as possible meanings, or whether a broader vocabulary including more abstract meanings needs to be considered. Through a user study, we confirm that our simulated evaluations agree closely with human perception of coherence. They can improve over purely visual metrics, even when relying only on objects.
Tobias Leemann · Yao Rong · Stefan Kraft · Enkelejda Kasneci · Gjergji Kasneci

- Learning Fourier-Sparse Functions on DAGs (Poster)
We show that the classical Moebius transform from combinatorics can be interpreted as a causal form of Fourier transform on directed acyclic graphs (DAGs). The associated Fourier basis, which is spanned by the columns of the zeta transform, enables us to use Fourier-sparse learning methods to learn functions on the vertices of DAGs from few observations. As a prototypical application, we construct a DAG from a dynamic contact tracing network, in which each vertex represents an individual at a given timestamp, and learn the function that indicates which vertices are infected by a disease.
Bastian Seifert · Chris Wendler · Markus Püschel

- Causal Policy Ranking (Poster)
Policies trained via reinforcement learning (RL) are often very complex, even for simple tasks. In an episode with n time steps, a policy makes n decisions about which actions to take, many of which may appear non-intuitive to an observer. Moreover, it is not clear which of these decisions directly contribute to achieving the reward, or how significant their contribution is.
Given a trained policy, we propose a black-box method based on counterfactual reasoning that estimates the causal effect of these decisions on reward attainment and ranks the decisions according to this estimate. In this preliminary work, we compare our measure against an alternative, non-causal ranking procedure, highlight the benefits of causality-based policy ranking, and discuss potential future work integrating causal algorithms into the interpretation of RL agent policies.
Daniel McNamee · Hana Chockler

- ReMixer: Object-aware Mixing Layer for Vision Transformers and Mixers (Poster)
Patch-based models, e.g., Vision Transformers (ViTs) and Mixers, have shown impressive results on various visual recognition tasks, exceeding classic convolutional networks. While the initial patch-based models treated all patches equally, recent studies reveal that incorporating inductive biases such as spatiality benefits the learned representations. However, most prior work focused solely on the position of patches, overlooking the scene structure of images. This paper aims to further guide the interaction of patches using object information. Specifically, we propose ReMixer, which reweights the patch mixing layers based on patch-wise object labels extracted from pretrained saliency or classification models. We apply ReMixer to patch-based models with different patch mixing layers (ViT, MLP-Mixer, and ConvMixer), and it consistently improves the classification accuracy and background robustness of the baseline models.
Hyunwoo Kang · Sangwoo Mo · Jinwoo Shin
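As a rough illustration of object-aware patch mixing, the sketch below reweights a row-stochastic patch-mixing matrix (for example, attention weights) by an affinity computed from patch-wise soft object labels, then renormalizes each row. The affinity rule `exp(-kappa * L1 distance)` and the names `object_affinity` and `reweight_mixing` are assumptions made for this example, not ReMixer's published formulation.

```python
import numpy as np


def object_affinity(labels, kappa=1.0):
    """Pairwise patch affinity from soft object labels (assumed rule).

    labels: (n_patches, n_classes) array whose rows are probability vectors.
    Returns M with M[i, j] = exp(-kappa * ||y_i - y_j||_1), so patches with
    identical labels get affinity 1 and dissimilar patches get less.
    """
    dist = np.abs(labels[:, None, :] - labels[None, :, :]).sum(axis=-1)
    return np.exp(-kappa * dist)


def reweight_mixing(mix, labels, kappa=1.0):
    """Scale mixing weights element-wise by object affinity, then renormalize rows."""
    weighted = mix * object_affinity(labels, kappa)
    return weighted / weighted.sum(axis=1, keepdims=True)


# Four patches: the first two belong to object A, the last two to object B.
labels = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
uniform = np.full((4, 4), 0.25)  # a mixing layer that treats all patches equally

out = reweight_mixing(uniform, labels, kappa=1.0)
print(out[0].round(3))  # weight shifts toward patches 0 and 1 (same object)
```

Setting `kappa = 0` recovers the original mixing weights, so in this sketch `kappa` controls how strongly object identity gates the interaction between patches.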