Poster in Workshop: 2nd Workshop on Mathematical and Empirical Understanding of Foundation Models
Pre-training and In-context Learning IS Bayesian Inference a la De Finetti
Naimeng Ye · Hanming Yang · Andrew Siah · Hongseok Namkoong
In-context learning (ICL) has emerged as a powerful learning paradigm. Going back to De Finetti’s work on Bayesian inference using observables (as opposed to priors on latent factors/parameters), we establish an explicit equivalence between ICL and Bayesian inference à la De Finetti. From this view, pre-training is precisely empirical Bayes: it optimizes the marginal likelihood of observed sequences; but whereas conventional empirical Bayes fits priors, pre-training fits posterior predictives using transformers. This observation highlights previously under-explored capabilities of ICL: statistical inference and uncertainty quantification. Our theory underscores the importance of predictive coherence and motivates a new regularizer that pre-trains sequence models to be logically coherent Bayesian statisticians. Our preliminary empirical results demonstrate that coherency regularization can substantially improve the inferential capabilities of ICL.
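To make the claimed equivalence concrete, here is the standard De Finetti argument in illustrative notation (the symbols below are not taken from the paper itself). For an exchangeable sequence $X_1, X_2, \dots$, De Finetti’s theorem gives a latent parameter $\theta$ with prior $\pi$ such that

\[
  p(x_{1:T}) \;=\; \int \prod_{t=1}^{T} p(x_t \mid \theta)\,\pi(d\theta)
  \;=\; \prod_{t=1}^{T} p(x_t \mid x_{1:t-1}),
\]

so the marginal likelihood is exactly a product of one-step-ahead posterior predictives. Pre-training a sequence model $p_\phi$ by maximizing $\sum_{t=1}^{T} \log p_\phi(x_t \mid x_{1:t-1})$ over observed sequences therefore maximizes the marginal likelihood (empirical Bayes), while in-context prediction $p_\phi(x_{T+1} \mid x_{1:T})$ plays the role of the Bayesian posterior predictive.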
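The coherence requirement can likewise be sketched in standard terms; the following is one plausible instantiation for illustration, not necessarily the regularizer used in the paper. A coherent Bayesian predictive over exchangeable data must be invariant to permutations of the conditioning sequence,

\[
  p_\phi\big(\cdot \mid x_1, \dots, x_T\big) \;=\; p_\phi\big(\cdot \mid x_{\sigma(1)}, \dots, x_{\sigma(T)}\big)
  \qquad \text{for every permutation } \sigma,
\]

so a coherency penalty could, for example, take the form $\mathbb{E}_{\sigma}\,\mathrm{KL}\!\big(p_\phi(\cdot \mid x_{1:T}) \,\|\, p_\phi(\cdot \mid x_{\sigma(1)}, \dots, x_{\sigma(T)})\big)$ added to the usual pre-training loss.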