Investigating the Pre-Training Dynamics of In-Context Learning: Task Recognition vs. Task Learning

Xiaolei Wang, Xinyu Tang, Wayne Xin Zhao, Ji-Rong Wen

2024-06-20

Abstract: The emergence of in-context learning (ICL) is potentially attributed to two major abilities: task recognition (TR) for recognizing the task from demonstrations and utilizing pre-trained priors, and task learning (TL) for learning from demonstrations. However, the relationships between the two abilities and how such relationships affect the emergence of ICL are unclear. In this paper, we take the first step by examining the pre-training dynamics of the emergence of ICL. With carefully designed metrics, we find that these two abilities are, in fact, competitive during pre-training. Moreover, we observe a strong negative correlation between the competition and ICL performance. Further analysis of common pre-training factors (i.e., model size, dataset size, and data curriculum) demonstrates possible ways to manage the competition. Based on these insights, we propose a simple yet effective method to better integrate these two abilities for ICL at inference time. Through adaptive ensemble learning, the performance of ICL can be significantly boosted, enabling two small models to outperform a larger one with more than twice the parameters. The code is available at https://github.com/RUCAIBox/Competitive-ICL.

Machine Learning, Computation and Language

What problem does this paper attempt to address?

The paper examines the competitive relationship between task recognition (TR) and task learning (TL) in in-context learning (ICL) and investigates how that relationship affects the emergence of ICL. Specifically:

1. **Revealing the Competitive Relationship**: By analyzing how the two abilities evolve over the course of pre-training, the authors find that TR and TL compete with each other, and that this competition is strongly negatively correlated with ICL performance.
2. **Analysis of Influencing Factors**: The paper analyzes how common pre-training factors (model size, dataset size, and data curriculum) affect this competition. The results show that increasing the model size leads to earlier competition but lower average competition intensity; expanding the dataset size delays the competition; and specific data curricula can adjust the competition intensity to either enhance or specialize the capabilities of large language models (LLMs).
3. **Proposing a Fusion Strategy**: Based on these findings, the authors propose a simple and effective adaptive ensemble learning method that better integrates the TR and TL abilities at inference time, significantly improving ICL performance. Experimental results show that two small models combined in this way can outperform a larger model with more than twice the parameters.

In summary, the paper both characterizes the competition between TR and TL that underlies ICL and provides a practical method for improving ICL performance.
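The summary does not describe how the adaptive ensemble weights the two models, so the following is only a minimal sketch of one way such a combination could look. It assumes each model returns a probability distribution over the same candidate labels, and it weights each model by one minus its normalized entropy. That confidence heuristic and the function names `normalized_entropy` and `adaptive_ensemble` are illustrative assumptions, not the authors' method.

```python
import math
from typing import List, Sequence


def normalized_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a distribution, scaled to the range [0, 1]."""
    if len(probs) < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return entropy / math.log(len(probs))


def adaptive_ensemble(probs_a: Sequence[float],
                      probs_b: Sequence[float]) -> List[float]:
    """Mix two label distributions, weighting each by its confidence.

    Assumption (not from the paper): confidence = 1 - normalized entropy,
    so a peaked distribution gets more weight than a near-uniform one.
    """
    if len(probs_a) != len(probs_b):
        raise ValueError("both models must score the same label set")

    conf_a = 1.0 - normalized_entropy(probs_a)
    conf_b = 1.0 - normalized_entropy(probs_b)
    total = conf_a + conf_b

    # If both models are maximally uncertain, fall back to equal weights.
    weight_a = 0.5 if total == 0.0 else conf_a / total
    weight_b = 1.0 - weight_a

    return [weight_a * pa + weight_b * pb for pa, pb in zip(probs_a, probs_b)]


if __name__ == "__main__":
    # Hypothetical per-example outputs from two small models.
    model_a = [0.7, 0.2, 0.1]   # fairly confident
    model_b = [0.3, 0.4, 0.3]   # close to uniform
    mixed = adaptive_ensemble(model_a, model_b)
    print("ensemble distribution:", mixed)
    print("predicted label index:", mixed.index(max(mixed)))
```

Because the weights are recomputed for every input, the mix can shift toward whichever model is more certain on that example, which is the general sense in which an ensemble is "adaptive"; the paper's actual weighting rule may differ.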

What In-Context Learning "Learns" In-Context: Disentangling Task Recognition and Task Learning

Dual Operating Modes of In-Context Learning

Pre-Training to Learn in Context

Pretraining Task Diversity and the Emergence of Non-Bayesian In-Context Learning for Regression

Competition Dynamics Shape Algorithmic Phases of In-Context Learning

Unveiling In-Context Learning: A Coordinate System to Understand Its Working Mechanism

Does In-Context Learning Really Learn? Rethinking How Large Language Models Respond and Solve Tasks via In-Context Learning

How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?

Parallel Structures in Pre-training Data Yield In-Context Learning

Take off the Training Wheels! Progressive In-Context Learning for Effective Alignment

Do pretrained Transformers Learn In-Context by Gradient Descent?

Deeper Insights Without Updates: The Power of In-Context Learning Over Fine-Tuning

What Do Language Models Learn in Context? The Structured Task Hypothesis

How does Multi-Task Training Affect Transformer In-Context Capabilities? Investigations with Function Classes

Towards the Effect of Examples on In-Context Learning: A Theoretical Case Study

Implicit In-context Learning

Let's Learn Step by Step: Enhancing In-Context Learning Ability with Curriculum Learning

Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models

What Makes Good In-context Demonstrations for Code Intelligence Tasks with LLMs?

Explaining Emergent In-Context Learning as Kernel Regression