Poster in Workshop: 2nd Workshop on Mathematical and Empirical Understanding of Foundation Models
Pre-training and In-context Learning IS Bayesian Inference a la De Finetti
Naimeng Ye · Hanming Yang · Andrew Siah · Hongseok Namkoong
In-context learning (ICL) has emerged as a powerful learning paradigm. Going back to De Finetti’s work on Bayesian inference using observables (as opposed to priors on latent factors/parameters), we establish an explicit equivalence between ICL and Bayesian inference à la De Finetti. From this view, pre-training is precisely empirical Bayes: it optimizes the marginal likelihood of observed sequences; but whereas conventional empirical Bayes fits priors, pre-training fits posterior predictives using transformers. This observation highlights previously under-explored capabilities of ICL: statistical inference and uncertainty quantification. Our theory underscores the importance of predictive coherence and motivates a new regularizer that pre-trains sequence models to be logically coherent Bayesian statisticians. Our preliminary empirical results demonstrate that coherency regularization can substantially improve the inferential capabilities of ICL.
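To make the claimed equivalence concrete, here is the standard De Finetti argument in illustrative notation (the symbols below are not taken from the paper itself). For an exchangeable sequence $X_1, X_2, \dots$, De Finetti’s theorem gives a latent parameter $\theta$ with prior $\pi$ such that

\[
  p(x_{1:T}) \;=\; \int \prod_{t=1}^{T} p(x_t \mid \theta)\,\pi(d\theta)
  \;=\; \prod_{t=1}^{T} p(x_t \mid x_{1:t-1}),
\]

so the marginal likelihood is exactly a product of one-step-ahead posterior predictives. Pre-training a sequence model $p_\phi$ by maximizing $\sum_{t=1}^{T} \log p_\phi(x_t \mid x_{1:t-1})$ over observed sequences therefore maximizes the marginal likelihood (empirical Bayes), while in-context prediction $p_\phi(x_{T+1} \mid x_{1:T})$ plays the role of the Bayesian posterior predictive.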
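The coherence requirement can likewise be sketched in standard terms; the following is one plausible instantiation for illustration, not necessarily the regularizer used in the paper. A coherent Bayesian predictive over exchangeable data must be invariant to permutations of the conditioning sequence,

\[
  p_\phi\big(\cdot \mid x_1, \dots, x_T\big) \;=\; p_\phi\big(\cdot \mid x_{\sigma(1)}, \dots, x_{\sigma(T)}\big)
  \qquad \text{for every permutation } \sigma,
\]

so a coherency penalty could, for example, take the form $\mathbb{E}_{\sigma}\,\mathrm{KL}\!\big(p_\phi(\cdot \mid x_{1:T}) \,\|\, p_\phi(\cdot \mid x_{\sigma(1)}, \dots, x_{\sigma(T)})\big)$ added to the usual pre-training loss.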