Exploring Lottery Prompts for Pre-trained Language Models

Yulin Chen, Ning Ding, Xiaobin Wang, Shengding Hu, Hai-Tao Zheng, Zhiyuan Liu, Pengjun Xie
2023-05-31
Abstract: Consistently scaling pre-trained language models (PLMs) imposes substantial burdens on model adaptation, necessitating more efficient alternatives to conventional fine-tuning. Given the advantage of prompting in the zero-shot setting and the observed performance fluctuation among different prompts, we explore instance-level prompts and their generalizability. By searching through the prompt space, we first validate the assumption that for every instance, there is almost always a lottery prompt that induces the correct prediction from the PLM, and such a prompt can be obtained at a low cost thanks to the inherent ability of PLMs. Meanwhile, we find that some strong lottery prompts have high performance over the whole training set, and they are equipped with distinguishable linguistic features. Lastly, we attempt to generalize the searched strong lottery prompts to unseen data with a prompt ensembling method without any parameter tuning. Experiments are conducted on various types of NLP classification tasks and demonstrate that the proposed method can achieve comparable results with other gradient-free and optimization-free baselines.
Computation and Language
What problem does this paper attempt to address?
The central question of this paper is whether, without updating the parameters of a pre-trained language model (PLM), at least one instance-level prompt (a "lottery prompt") that induces the correct output can be found for every data point. Specifically, the researchers test the following hypothesis:

- For a given pre-trained language model and classification dataset, every instance has at least one lottery prompt that induces the correct prediction without any update to the PLM parameters.

To verify this hypothesis, the researchers searched the prompt space for a lottery prompt for each data point. They selected 13 representative classification datasets and designed a reasonable prompt search space. The results show that such lottery prompts can be found in almost all cases, indicating that within a limited textual prompt space, a common set of word combinations almost always yields a prompt that makes the prediction correct.

The researchers also analyzed the characteristics of these lottery prompts and how well they generalize to unseen data. They found some "strong prompts" that perform well on the entire training set and that generalize to test data, without any parameter tuning, through a prompt ensembling method based on mutual information. This achieves performance comparable to or better than many competitive baselines.

### Key Conclusions

1. **Existence of lottery prompts**: For most datasets, a lottery prompt can be found for almost every data point, enabling the pre-trained language model to predict the label correctly.
2. **Low search cost**: For most datasets, the actual cost of finding a lottery prompt is far lower than the theoretical maximum, usually not exceeding 30 API calls.
3. **Impact of model capacity**: As the pre-trained model grows in scale and the number of pre-training steps increases, the success rate of finding lottery prompts rises and the search cost falls.
4. **Discovery of strong prompts**: Some prompts perform very well on the entire training set, have interpretable linguistic features, and generalize effectively to unseen data through the ensembling method.

This research demonstrates the great potential of pre-trained language models and offers new ideas for more efficient prompt search and ensembling methods.
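The per-instance search and the notion of a strong prompt described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `predict` callable, the function names `find_lottery_prompt` and `rank_prompts_by_accuracy`, and the rule-based `toy_predict` are all hypothetical stand-ins, and prompts are assumed to be plain strings. A real setup would replace `toy_predict` with a call to a frozen PLM.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical stand-in for a frozen PLM: maps (prompt, text) to a label.
Predict = Callable[[str, str], str]


def find_lottery_prompt(
    predict: Predict, prompts: Sequence[str], text: str, label: str
) -> Tuple[Optional[str], int]:
    """Try prompts in order and return the first one that yields `label`,
    together with the number of model calls spent on this instance."""
    for calls, prompt in enumerate(prompts, start=1):
        if predict(prompt, text) == label:
            return prompt, calls
    # No prompt in the search space worked for this instance.
    return None, len(prompts)


def rank_prompts_by_accuracy(
    predict: Predict,
    prompts: Sequence[str],
    dataset: Sequence[Tuple[str, str]],
) -> List[Tuple[str, float]]:
    """Score every prompt on the whole labelled set, best first.
    High scorers play the role of 'strong prompts' in this sketch."""
    scores = []
    for prompt in prompts:
        correct = sum(predict(prompt, text) == label for text, label in dataset)
        scores.append((prompt, correct / len(dataset)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def toy_predict(prompt: str, text: str) -> str:
    """Toy rule used in place of a real PLM call, for illustration only."""
    if "sentiment" in prompt:
        return "positive" if "good" in text else "negative"
    return "negative"


if __name__ == "__main__":
    prompts = ["Topic:", "The sentiment is"]
    dataset = [("a good film", "positive"), ("a dull film", "negative")]
    for text, label in dataset:
        print(text, find_lottery_prompt(toy_predict, prompts, text, label))
    print(rank_prompts_by_accuracy(toy_predict, prompts, dataset))
```

The search stops at the first successful prompt, so the call count per instance is the search cost in the sense used above. Ranking by training-set accuracy is only one simple way to surface strong prompts. The mutual-information-based ensembling mentioned in the summary is not reproduced here.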