

Poster

Why In-Context Learning Models are Good Few-Shot Learners?

Shiguang Wu · Yaqing Wang · Quanming Yao

Hall 3 + Hall 2B #468
Sat 26 Apr midnight PDT — 2:30 a.m. PDT

Abstract:

We explore in-context learning (ICL) models from a learning-to-learn perspective. Unlike studies that identify specific learning algorithms in ICL models, we compare ICL models with typical meta-learners to understand their superior performance. We theoretically prove the expressiveness of ICL models as learning algorithms and examine their learnability and generalizability. Our findings show that ICL with transformers can effectively construct data-dependent learning algorithms instead of directly following existing ones (including gradient-based, metric-based, and amortization-based meta-learners). The construction of such a learning algorithm is determined by the pre-training process, as a function fitting the training distribution, which raises generalizability as an important issue. With this understanding, we propose strategies for transferring techniques from classical deep networks to the meta-level to further improve ICL. As examples, we implement meta-level meta-learning for domain adaptability with limited data and meta-level curriculum learning for accelerated convergence during pre-training, and demonstrate their empirical effectiveness.
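To make the contrast the abstract draws concrete, here is a minimal sketch in PyTorch. It is not the authors' code: the linear-regression episodes, module names, and layer sizes are all illustrative assumptions. It shows the structural difference between an ICL model, whose "learning algorithm" is whatever episode-to-prediction mapping the transformer fit during pre-training, and an explicit gradient-based meta-learner, whose adaptation procedure is hand-specified.

import copy
import torch
import torch.nn as nn


class ICLModel(nn.Module):
    """Transformer that reads k labeled pairs plus a query and predicts y_query.

    The learning algorithm is implicit: it is the function the transformer
    fit to the pre-training distribution of episodes, hence data-dependent.
    """

    def __init__(self, x_dim=8, d_model=64):
        super().__init__()
        # Pack each (x, y) pair into one token; the query token gets y = 0.
        self.embed = nn.Linear(x_dim + 1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 1)

    def forward(self, xs, ys, x_query):
        # xs: (B, k, x_dim), ys: (B, k), x_query: (B, x_dim)
        support = torch.cat([xs, ys.unsqueeze(-1)], dim=-1)
        query = torch.cat([x_query, torch.zeros_like(ys[:, :1])], dim=-1)
        tokens = self.embed(torch.cat([support, query.unsqueeze(1)], dim=1))
        h = self.encoder(tokens)
        # Prediction for the query token: no parameter update at "learning" time.
        return self.head(h[:, -1]).squeeze(-1)


def gradient_based_adapt(net, xs, ys, lr=0.1, steps=5):
    """An explicit gradient-based learner (MAML-style inner loop) for contrast:
    adaptation here is a fixed, hand-specified procedure, not a mapping the
    model constructs from its pre-training distribution."""
    adapted = copy.deepcopy(net)
    opt = torch.optim.SGD(adapted.parameters(), lr=lr)
    for _ in range(steps):
        loss = nn.functional.mse_loss(adapted(xs).squeeze(-1), ys)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return adapted


if __name__ == "__main__":
    B, k, x_dim = 4, 10, 8
    xs, x_query = torch.randn(B, k, x_dim), torch.randn(B, x_dim)
    w = torch.randn(B, x_dim, 1)              # a fresh linear task per episode
    ys = (xs @ w).squeeze(-1)

    # ICL view: the episode itself is the input; "learning" is one forward pass.
    print(ICLModel(x_dim)(xs, ys, x_query).shape)                      # torch.Size([4])

    # Meta-learner view: adapt an MLP to episode 0 with explicit SGD steps.
    mlp = nn.Sequential(nn.Linear(x_dim, 32), nn.ReLU(), nn.Linear(32, 1))
    print(gradient_based_adapt(mlp, xs[0], ys[0])(x_query[:1]).shape)  # torch.Size([1, 1])

The design point the sketch makes is the one the abstract argues: the ICL model never touches its parameters when presented with a new task, so its effective learning algorithm is determined entirely by pre-training, which is why the paper treats generalizability of that algorithm as the central issue.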
