Not All Tasks Are Born Equal: Understanding Zero-Shot Generalization

Jing Zhou, Zongyu Lin, Yanan Zheng, Jian Li, Zhilin Yang
2023-01-01
Abstract: Recent work has achieved remarkable zero-shot performance with multi-task prompted pretraining, but why it works remains poorly understood. For the first time, we show that training on a small number of key tasks beats using all the training tasks, while removing these key tasks substantially hurts performance. We also find that these key tasks are mostly question answering (QA) tasks. We design a shuffle experiment to further show that training on these QA tasks leads to better cross-task generalization in multi-task learning under various training/test task splits. Together, these findings deepen our understanding of zero-shot generalization: training on certain tasks, such as QA, encodes general knowledge transferable to a wide range of tasks, which explains the improved zero-shot performance reported in recent work. In addition, to automate this procedure, we devise a method, based on cross validation, that identifies and upsamples key training tasks without observing the test tasks. Empirically, our approach achieves improved results across various model scales and tasks.