Poster in Workshop: Deep Generative Model in Machine Learning: Theory, Principle and Efficacy
Transcending Bayesian Inference: Transformers Extrapolate Rules Compositionally Under Model Misspecification
Szilvia Ujváry · Anna Mészáros · Wieland Brendel · Patrik Reizinger · Ferenc Huszar
Keywords: [ compositional generalization ] [ implicit Bayesian inference ] [ OOD generalization ] [ language models ]
LLMs' intelligent behaviours, such as emergent reasoning and in-context learning, have been interpreted as implicit Bayesian inference (IBI). Under IBI, the model treats the training data as a mixture, implicitly infers its underlying latent parameters, and makes predictions on in-distribution prompts that are consistent with explicit Bayesian inference. When test prompts are out-of-distribution, Bayesian inference over the training mixture components becomes suboptimal due to model misspecification. We pre-train Transformer models for implicit Bayesian inference and investigate whether they can transcend this behaviour under model misspecification. Our experiments demonstrate that Transformers generalize compositionally, even when the Bayesian posterior is undefined. We hypothesize that this behaviour arises because Transformers learn general algorithms rather than merely fitting the training mixture.
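As an illustrative sketch of the standard implicit-Bayesian-inference view (notation assumed for illustration, not taken from the paper): a model pre-trained on a mixture over latent components $z$ makes next-token predictions on a prompt $x_{1:t}$ that match the Bayesian posterior predictive,

$$ p(x_{t+1} \mid x_{1:t}) \;=\; \sum_{z} p(x_{t+1} \mid x_{1:t}, z)\, p(z \mid x_{1:t}). $$

When a test prompt lies outside the support of every training component, the posterior $p(z \mid x_{1:t})$ is ill-defined, which is the misspecified regime in which the abstract asks whether Transformers can go beyond this Bayesian behaviour.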