Virtual presentation / top 25% paper
Not All Tasks Are Born Equal: Understanding Zero-Shot Generalization
Jing Zhou · Zongyu Lin · Yanan Zheng · Jian Li · Zhilin Yang
Keywords: [ transfer learning ] [ multi-task learning ] [ zero-shot learning ] [ Deep Learning and representational learning ]
Recent work has achieved remarkable zero-shot performance with multi-task prompted pretraining, but little is understood about why it works. For the first time, we show that training on a small number of key tasks beats using all the training tasks, while removing these key tasks substantially hurts performance. We also find that these key tasks are mostly question answering (QA) tasks. Together, these novel findings deepen our understanding of zero-shot generalization: training on certain tasks, such as QA, encodes general knowledge that transfers to a wide range of tasks. In addition, to automate this procedure, we devise a method that (1) identifies key training tasks without observing the test tasks, by examining pairwise generalization results, and (2) resamples training tasks for a better data distribution. Empirically, our approach achieves improved results across various model scales and tasks.
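The two-step procedure in the abstract can be pictured as follows. This is a minimal sketch, not the authors' released code: it assumes a precomputed matrix `pairwise`, where `pairwise[i, j]` is the zero-shot score on held-out task `j` after training only on task `i`; the ranking rule (mean transfer to other tasks), `top_k`, and the equal-budget resampling are illustrative assumptions, not details taken from the paper.

```python
import random
import numpy as np

def select_key_tasks(pairwise: np.ndarray, task_names: list[str], top_k: int) -> list[str]:
    """Rank training tasks by average zero-shot transfer to all *other* tasks."""
    scores = pairwise.astype(float).copy()
    np.fill_diagonal(scores, np.nan)            # ignore a task's score on itself
    transfer = np.nanmean(scores, axis=1)       # mean generalization per source task
    ranked = np.argsort(transfer)[::-1]         # best-transferring tasks first
    return [task_names[i] for i in ranked[:top_k]]

def resample_training_mixture(datasets: dict[str, list], key_tasks: list[str],
                              per_task: int, seed: int = 0) -> list:
    """Build a training mixture with an equal example budget per key task (illustrative)."""
    rng = random.Random(seed)
    mixture = []
    for name in key_tasks:
        examples = datasets[name]
        k = min(per_task, len(examples))
        mixture.extend(rng.sample(examples, k))
    rng.shuffle(mixture)
    return mixture
```

For example, with a 10x10 `pairwise` matrix over 10 training tasks, `select_key_tasks(pairwise, names, top_k=3)` would return the three tasks whose single-task training transfers best on average, and `resample_training_mixture` would then draw an equal number of examples from each of them; both the cutoff and the sampling scheme are placeholders for whatever criteria the method actually uses.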