Compositional Generalization through Gradient Search in Nonparametric Latent Space
Abstract
Neural network architectures have made considerable advances in their ability to solve reasoning problems, yet many state-of-the-art methods still fail at systematic compositional generalization. To address this, we propose a novel architecture that uses a nonparametric latent space, information-theoretic regularization of that space, and test-time gradient-based search to achieve strong performance on out-of-distribution (OOD) compositional meta-learning benchmarks, including ARC-like program induction, Raven's Progressive Matrices, and tests of linguistic systematicity. Our proposed architecture, Abduction Transformer, uses nonparametric mixture distributions to represent the inferred hidden causes of few-shot meta-learning instances. These representations are refined at test time via gradient descent to better account for the observed few-shot examples, a form of variational posterior inference that allows Abduction Transformer to solve meta-learning tasks requiring novel recombinations of knowledge acquired during training. Our method outperforms standard transformer architectures and prior test-time adaptation approaches, indicating a promising new direction for neural networks capable of systematic generalization.
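
To make the test-time refinement step concrete, the following is a minimal sketch of the general idea described in the abstract: a frozen decoder, a fixed bank of latent components, and gradient descent at test time on mixture weights so that the inferred latent cause better explains a few-shot support set. This is our own illustration rather than the paper's released code; names such as ToyDecoder, test_time_refine, and the specific dimensions are hypothetical, and the full nonparametric mixture formulation and information-theoretic regularization are omitted.

```python
import torch
import torch.nn as nn


class ToyDecoder(nn.Module):
    """Stand-in for a trained decoder (hypothetical, not the paper's model)."""

    def __init__(self, latent_dim: int = 16, in_dim: int = 8, out_dim: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim)
        )

    def forward(self, z: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        # Condition each prediction on the inferred latent cause z.
        z_rep = z.expand(x.shape[0], -1)
        return self.net(torch.cat([z_rep, x], dim=-1))


def test_time_refine(decoder: nn.Module,
                     components: torch.Tensor,   # (K, latent_dim) latent bank, kept fixed
                     support_x: torch.Tensor,    # (N, in_dim) few-shot inputs
                     support_y: torch.Tensor,    # (N, out_dim) few-shot targets
                     steps: int = 100,
                     lr: float = 0.05) -> torch.Tensor:
    """Test-time gradient search: only the mixture logits over latent components
    are optimized; the decoder and the component bank stay frozen."""
    decoder.requires_grad_(False)
    logits = torch.zeros(components.shape[0], requires_grad=True)
    opt = torch.optim.Adam([logits], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        weights = torch.softmax(logits, dim=0)                    # mixture over components
        z = (weights.unsqueeze(-1) * components).sum(0, keepdim=True)
        loss = ((decoder(z, support_x) - support_y) ** 2).mean()  # fit the support set
        loss.backward()
        opt.step()
    with torch.no_grad():
        weights = torch.softmax(logits, dim=0)
        return (weights.unsqueeze(-1) * components).sum(0, keepdim=True)


if __name__ == "__main__":
    # Random stand-in data; in the paper's setting the bank would be learned during meta-training.
    torch.manual_seed(0)
    decoder = ToyDecoder()
    bank = torch.randn(32, 16)
    xs, ys = torch.randn(5, 8), torch.randn(5, 8)
    z_star = test_time_refine(decoder, bank, xs, ys)
    print(z_star.shape)  # torch.Size([1, 16])
```

The design choice mirrored here is that adaptation happens purely in latent space: no network weights change at test time, so knowledge acquired during training can be recombined by searching over how existing latent components are mixed.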