In-Person Oral presentation / top 5% paper

What learning algorithm is in-context learning? Investigations with linear models

Ekin Akyürek · Dale Schuurmans · Jacob Andreas · Tengyu Ma · Denny Zhou

AD12
Oral 6 Track 3: Deep Learning and representational learning
Wed 3 May 6:10 a.m. — 6:20 a.m. PDT

Neural sequence models, especially transformers, exhibit a remarkable capacity for in-context learning. They can construct new predictors from sequences of labeled examples $(x, f(x))$ presented in the input without further parameter updates. We investigate the hypothesis that transformer-based in-context learners implement standard learning algorithms implicitly, by encoding context-specific parametric models in their hidden representations, and updating these implicit models as new examples appear in the context. Using linear regression as a model problem, we offer three sources of evidence for this hypothesis. First, we prove by construction that transformers can implement learning algorithms for linear models based on gradient descent and closed-form computation of regression parameters. Second, we show that trained in-context learners closely match the predictors computed by gradient descent, ridge regression, and exact least-squares regression, transitioning between different predictors as transformer depth and dataset noise vary. Third, we present preliminary evidence that in-context learners share algorithmic features with these predictors: learners' late layers encode weight vectors and moment matrices. These results suggest that in-context learning is understandable in algorithmic terms, and that (at least in the linear case) learners may work by rediscovering standard estimation algorithms.
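The three reference predictors the abstract compares against (gradient descent, ridge regression, and exact least squares) can each be written in a few lines. The sketch below is a plain NumPy illustration, not the authors' code and not a transformer. It fits each predictor to one synthetic in-context dataset of pairs $(x_i, f(x_i))$ and evaluates it on a query point. The helper names (`gd_predict`, `ridge_predict`, `ols_predict`), the loss normalization $\|Xw - y\|^2 / (2n)$, the step size $1/L$, the dimensions, and the noiseless synthetic data are all illustrative assumptions.

```python
import numpy as np

def ols_predict(X, y, x_query):
    # Exact least squares via the Moore-Penrose pseudo-inverse (minimum-norm solution).
    w = np.linalg.pinv(X) @ y
    return float(x_query @ w)

def ridge_predict(X, y, x_query, lam):
    # Closed-form ridge regression: w = (X^T X + lam * I)^{-1} X^T y.
    d = X.shape[1]
    w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    return float(x_query @ w)

def gd_predict(X, y, x_query, lr, steps):
    # Batch gradient descent on L(w) = ||Xw - y||^2 / (2n), starting from w = 0.
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n
        w -= lr * grad
    return float(x_query @ w)

# Hypothetical in-context task: n labeled examples with a linear f, plus one query point.
rng = np.random.default_rng(0)
n, d = 16, 4
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true  # noiseless labels f(x) = w_true . x
x_query = rng.normal(size=d)

# Step size 1/L, where L is the largest eigenvalue of the loss Hessian X^T X / n.
lr = 1.0 / np.linalg.eigvalsh(X.T @ X / n).max()

preds = {
    "gd_1_step": gd_predict(X, y, x_query, lr, steps=1),
    "gd_2000_steps": gd_predict(X, y, x_query, lr, steps=2000),
    "ridge_lam_1": ridge_predict(X, y, x_query, lam=1.0),
    "ols": ols_predict(X, y, x_query),
}
for name, p in preds.items():
    print(f"{name:>14}: {p:+.6f}")
```

With noiseless labels and more examples than dimensions, many gradient-descent steps converge to the least-squares prediction while a single step generally does not, and ridge shrinks the estimate toward zero by an amount set by `lam`. This only gives a rough feel for how the predictors differ; it does not reproduce the paper's experiments.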
