Finetuned Language Models are Zero-Shot Learners

Jason Wei · Maarten Bosma · Vincent Zhao · Kelvin Guu · Wei Yu · Brian Lester · Nan Du · Andrew Dai · Quoc V Le

Keywords: [ language models ] [ zero-shot learning ] [ natural language processing ]

Poster: Tue 26 Apr 10:30 a.m. — 12:30 p.m. PDT, Spot I0
Oral presentation: Oral 4: Sequence modeling, Thu 28 Apr 1 a.m. — 2:30 a.m. PDT

Abstract


This paper explores a simple method for improving the zero-shot learning abilities of language models. We show that instruction tuning—finetuning language models on a collection of datasets described via instructions—substantially improves zero-shot performance on unseen tasks. We take a 137B parameter pretrained language model and instruction tune it on over 60 NLP datasets verbalized via natural language instruction templates. We evaluate this instruction-tuned model, which we call FLAN, on unseen task types. FLAN substantially improves the zero-shot performance of its unmodified counterpart and surpasses zero-shot 175B GPT-3 on 20 of the 25 datasets that we evaluate. FLAN even outperforms few-shot GPT-3 by a large margin on ANLI, RTE, BoolQ, AI2-ARC, OpenbookQA, and StoryCloze. Ablation studies reveal that the number of finetuning datasets, model scale, and natural language instructions are key to the success of instruction tuning.
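The core data-preparation step the abstract describes—"verbalizing" existing NLP datasets via natural language instruction templates—can be sketched as follows. This is a minimal illustration, not FLAN's actual templates: the example fields, template wording, and `verbalize` helper are all assumptions for demonstration.

```python
# A hypothetical NLI example; field names are illustrative, not FLAN's schema.
example = {
    "premise": "A soccer game with multiple males playing.",
    "hypothesis": "Some men are playing a sport.",
    "label": "yes",
}

# Several natural-language instruction templates for the same underlying task.
# Using multiple phrasings per dataset is the idea the abstract describes;
# the exact wording here is invented for the sketch.
templates = [
    "Premise: {premise}\nHypothesis: {hypothesis}\n"
    "Does the premise entail the hypothesis? OPTIONS: yes, no, maybe",
    "{premise}\nBased on the paragraph above, can we conclude that "
    '"{hypothesis}"? OPTIONS: yes, no, maybe',
]

def verbalize(example, template):
    """Render one (input, target) finetuning pair from a raw example."""
    return template.format(**example), example["label"]

# Each raw example yields one training pair per template.
pairs = [verbalize(example, t) for t in templates]
```

Instruction tuning then finetunes the pretrained model on such (input, target) pairs pooled across many datasets, so that at test time an unseen task phrased as an instruction can be answered zero-shot.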
