Poster
in
Workshop: 2nd Workshop on Mathematical and Empirical Understanding of Foundation Models
The Effect of Model Capacity on the Emergence of In-Context Learning
Berkan Ottlik · Narutatsu Ri · Daniel Hsu · Clayton Sanford
Abstract:
This paper investigates the relationship between model capacity and the emergence of in-context learning in transformers under a simplified statistical framework. When model capacity is sufficiently restricted, transformers shift from learning the Bayes-optimal estimator for the training task distribution to an estimator that generalizes to out-of-distribution tasks. This shift is attributed to the restricted model's inability to fully memorize the training task distribution. Further experiments examine how the transformer's hyperparameters affect its capacity for memorization.
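The abstract does not spell out the statistical framework, but a common instantiation in this line of work is in-context linear regression over a finite set of pretraining tasks. The sketch below is purely illustrative (the task setup, sizes, and estimator names are assumptions, not the paper's actual experiments): it contrasts the Bayes-optimal estimator for a finite training task distribution, a discrete posterior-weighted average that can only "memorize" the training tasks, with ridge regression, which remains suitable on an out-of-distribution task.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n_ctx, M = 8, 32, 4   # input dimension, context length, number of pretraining tasks
sigma = 0.1              # observation noise standard deviation

# Finite pretraining task distribution: M fixed weight vectors (illustrative).
train_tasks = rng.normal(size=(M, d))

def dmmse(X, y, tasks, sigma):
    """Bayes-optimal estimate under a uniform prior over the finite `tasks`:
    a posterior-weighted average of the candidate weight vectors."""
    resid = y[None, :] - tasks @ X.T                      # residuals per task, shape (M, n)
    loglik = -0.5 * np.sum(resid ** 2, axis=1) / sigma**2 # Gaussian log-likelihoods
    post = np.exp(loglik - loglik.max())                  # unnormalized posterior
    post /= post.sum()
    return post @ tasks

def ridge(X, y, sigma):
    """Ridge regression: Bayes-optimal under an isotropic Gaussian task prior,
    so it stays sensible for tasks outside the pretraining set."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + sigma**2 * np.eye(d), X.T @ y)

# Out-of-distribution task: a fresh Gaussian weight vector not in the training set.
w_ood = rng.normal(size=d)
X = rng.normal(size=(n_ctx, d))
y = X @ w_ood + sigma * rng.normal(size=n_ctx)

err_dmmse = np.sum((dmmse(X, y, train_tasks, sigma) - w_ood) ** 2)
err_ridge = np.sum((ridge(X, y, sigma) - w_ood) ** 2)
# The finite-task estimator cannot leave the pretraining task set, so ridge
# achieves much lower error on the out-of-distribution task.
```

In this toy setting, a model that memorizes the training task distribution behaves like `dmmse` and fails off-distribution, while the general-purpose `ridge` estimator transfers; the abstract's claim is that restricting capacity pushes the transformer from the former behavior toward the latter.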