Investigating the Pre-Training Dynamics of In-Context Learning: Task Recognition vs. Task Learning

Xiaolei Wang, Xinyu Tang, Wayne Xin Zhao, Ji-Rong Wen
2024-06-20
Abstract: The emergence of in-context learning (ICL) is potentially attributed to two major abilities: task recognition (TR) for recognizing the task from demonstrations and utilizing pre-trained priors, and task learning (TL) for learning from demonstrations. However, relationships between the two abilities and how such relationships affect the emergence of ICL are unclear. In this paper, we take the first step by examining the pre-training dynamics of the emergence of ICL. With carefully designed metrics, we find that these two abilities are, in fact, competitive during pre-training. Moreover, we observe a strong negative correlation between the competition and ICL performance. Further analysis of common pre-training factors (i.e., model size, dataset size, and data curriculum) demonstrates possible ways to manage the competition. Based on these insights, we propose a simple yet effective method to better integrate these two abilities for ICL at inference time. Through adaptive ensemble learning, the performance of ICL can be significantly boosted, enabling two small models to outperform a larger one with more than twice the parameters. The code is available at https://github.com/RUCAIBox/Competitive-ICL.
Machine Learning, Computation and Language
What problem does this paper attempt to address?
The paper explores the competitive relationship between task recognition (TR) and task learning (TL) in in-context learning (ICL) and investigates how this relationship affects the emergence of ICL. Specifically:

1. **Revealing the Competitive Relationship**: By analyzing how the two abilities change over the course of pre-training, the authors find that TR and TL compete with each other, and that this competition has a strong negative correlation with ICL performance.
2. **Analysis of Influencing Factors**: The paper further analyzes how common pre-training factors (model size, dataset size, and data curriculum) affect this competition. The results show that increasing the model size leads to earlier competition but lower average competition intensity; expanding the dataset size delays the competition; and specific data curricula can adjust the competition intensity to enhance or specialize the capabilities of large language models (LLMs).
3. **Proposing a Fusion Strategy**: Based on these findings, the authors propose a simple and effective adaptive ensemble learning method that better integrates the TR and TL abilities at inference time, significantly improving ICL performance. Experimental results show that two small models combined this way can outperform a larger model with more than twice the parameters.

In summary, the paper reveals a competitive relationship between TR and TL that underlies the emergence of ICL, and it provides a practical method for improving ICL performance.
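
The text above does not spell out how the adaptive ensemble combines the two models, so the following is only a minimal sketch of one plausible reading: two models each produce a probability distribution over the candidate labels, and the ensemble weights each distribution by how confident it is on that particular input. The function names (`entropy`, `adaptive_ensemble`) and the entropy-based weighting rule are assumptions made for illustration; they are not taken from the paper or its repository.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def adaptive_ensemble(probs_a, probs_b):
    """Mix two label distributions, weighting each by its per-input confidence.

    ASSUMPTION: confidence is defined here as 1 - normalized entropy. This
    rule is illustrative only and is not the weighting used in the paper.
    """
    if len(probs_a) != len(probs_b):
        raise ValueError("both models must score the same label set")
    if len(probs_a) < 2:
        raise ValueError("need at least two candidate labels")

    max_entropy = math.log(len(probs_a))  # entropy of the uniform distribution
    conf_a = max(0.0, 1.0 - entropy(probs_a) / max_entropy)
    conf_b = max(0.0, 1.0 - entropy(probs_b) / max_entropy)

    total = conf_a + conf_b
    # If neither model is confident, fall back to a plain average.
    weight_a = 0.5 if total <= 1e-12 else conf_a / total
    weight_b = 1.0 - weight_a

    return [weight_a * pa + weight_b * pb for pa, pb in zip(probs_a, probs_b)]


if __name__ == "__main__":
    # Hypothetical outputs: model A is unsure, model B is fairly confident.
    mixed = adaptive_ensemble([0.5, 0.5], [0.9, 0.1])
    print(mixed)
```

Because the weights are recomputed for every input, a model that is strong at recognizing the task and one that is strong at learning from demonstrations could each dominate on the inputs where it is more certain. Whether this matches the authors' actual weighting scheme should be checked against the released code.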