

Poster

When does compositional structure yield compositional generalization? A kernel theory.

Samuel Lippl · Kimberly Stachenfeld

Hall 3 + Hall 2B #441
Thu 24 Apr midnight PDT — 2:30 a.m. PDT

Abstract:

Compositional generalization (the ability to respond correctly to novel combinations of familiar components) is thought to be a cornerstone of intelligent behavior. Compositionally structured (e.g. disentangled) representations are essential for this; however, the conditions under which they yield compositional generalization remain unclear. To address this gap, we present a general theory of compositional generalization in kernel models with fixed, compositionally structured representations, a tractable framework for characterizing the impact of dataset statistics on generalization. We find that these models are constrained to adding up values assigned to each combination of components seen during training ("conjunction-wise additivity"). This imposes fundamental restrictions on the set of tasks compositionally structured kernel models can learn, in particular preventing them from transitively generalizing equivalence relations. Even for compositional tasks that they can learn in principle, we identify novel failure modes in compositional generalization (memorization leak and shortcut bias) that arise from biases in the training data. Finally, we empirically validate our theory, showing that it captures the behavior of deep neural networks (convolutional networks, residual networks, and Vision Transformers) trained on a set of compositional tasks with similarly structured data. Ultimately, this work examines how statistical structure in the training data can affect compositional generalization, with implications for how to identify and remedy failure modes in deep learning models.
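To make the "conjunction-wise additivity" constraint concrete, here is a minimal sketch (not the authors' code; the one-hot representation, component values, and ridge parameter are all illustrative assumptions). A kernel model with a linear kernel on a disentangled, one-hot encoding of two components can only assign a value to each component and add them up, so its prediction on a combination never seen during training is a sum of per-component values inferred from the training data.

# Minimal sketch of an additive kernel model on a compositional task.
# Assumptions (not from the paper): two discrete components with 3 values
# each, one-hot (disentangled) features, and kernel ridge regression.
import numpy as np

n_a, n_b = 3, 3  # number of values per component

def featurize(a, b):
    # Concatenated one-hot codes: a compositionally structured representation.
    phi = np.zeros(n_a + n_b)
    phi[a] = 1.0
    phi[n_a + b] = 1.0
    return phi

# A few training combinations, with additive targets y = f_a(a) + f_b(b).
train = [(0, 0), (1, 1), (2, 2), (0, 1)]
f_a = np.array([1.0, 2.0, 3.0])
f_b = np.array([10.0, 20.0, 30.0])
X = np.stack([featurize(a, b) for a, b in train])
y = np.array([f_a[a] + f_b[b] for a, b in train])

# Kernel ridge regression with the linear kernel on this representation.
K = X @ X.T
alpha = np.linalg.solve(K + 1e-6 * np.eye(len(train)), y)

def predict(a, b):
    return alpha @ (X @ featurize(a, b))

# The combination (2, 0) was never seen during training; the prediction is
# the sum of the values the model assigned to a=2 and b=0 separately, which
# need not match the generative f_a(2) + f_b(0) when the training data are
# biased toward particular combinations.
print(predict(2, 0))

With a purely linear kernel there are no conjunction terms, so this sketch shows only the simplest, component-additive case; richer kernels also assign values to multi-component conjunctions, but (per the theory above) only to conjunctions that appear in the training set.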
