Abstract: Large language models (LLMs) exhibit remarkable in-context learning (ICL) capabilities. However, the underlying working mechanism of ICL remains poorly understood. Recent research presents two conflicting views on ICL: One emphasizes the impact of similar examples in the demonstrations, stressing the need for label correctness and more shots. The other attributes it to LLMs' inherent ability of task recognition, deeming label correctness and shot numbers of demonstrations as not crucial. In this work, we provide a Two-Dimensional Coordinate System that unifies both views into a systematic framework. The framework explains the behavior of ICL through two orthogonal variables: whether similar examples are presented in the demonstrations (perception) and whether LLMs can recognize the task (cognition). We propose the peak inverse rank metric to detect the task recognition ability of LLMs and study LLMs' reactions to different definitions of similarity. Based on these, we conduct extensive experiments to elucidate how ICL functions across each quadrant on multiple representative classification tasks. Finally, we extend our analyses to generation tasks, showing that our coordinate system can also be used to interpret ICL for generation tasks effectively.
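The abstract names the peak inverse rank metric without defining it. As a rough illustration only, the sketch below assumes one plausible reading: for each layer, take the rank of a target label token among that layer's vocabulary scores, invert it, and keep the maximum over layers. The function names, the per-layer score lists, and the tie-handling rule are all hypothetical and are not taken from the paper.

```python
# Hypothetical sketch of a "peak inverse rank" style score.
# Assumption: each layer yields one score per vocabulary token, and the
# metric is the best (largest) reciprocal rank of the target token
# across layers. This is NOT confirmed to be the paper's definition.

def inverse_rank(scores, target):
    """Return 1 / rank of `target`, ranking scores in descending order."""
    target_score = scores[target]
    # Rank is 1 plus the number of tokens scoring strictly higher.
    rank = 1 + sum(1 for s in scores if s > target_score)
    return 1.0 / rank


def peak_inverse_rank(layer_scores, target):
    """Return the maximum inverse rank of `target` over all layers."""
    return max(inverse_rank(scores, target) for scores in layer_scores)


# Toy data: 3 layers, a 4-token vocabulary, target token at index 2.
toy_layers = [
    [0.4, 0.3, 0.2, 0.1],  # target is ranked 3rd
    [0.5, 0.1, 0.3, 0.1],  # target is ranked 2nd
    [0.4, 0.3, 0.1, 0.2],  # target is ranked 4th
]
pir = peak_inverse_rank(toy_layers, target=2)
```

Under this reading, a value close to 1 would suggest that some layer ranks the label token near the top, which could be taken as a sign that the task has been recognized; the threshold for "recognized" is left open here.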

What problem does this paper attempt to address?

The paper attempts to address the issue that the working mechanism of large language models (LLMs) in In-Context Learning (ICL) is not yet fully understood. Specifically, researchers have two conflicting views on the effectiveness of ICL:

1. **View 1**: Emphasizes the impact of including similar examples in the demonstration, believing that the correctness of labels and a greater number of examples can improve performance.
2. **View 2**: Argues that LLMs have implicitly learned the knowledge required for tasks during the pre-training phase, so the examples provided in the context only offer clues for task recognition, and the correctness of labels and the number of examples are not important.

To unify these two views, the paper proposes a two-dimensional coordinate system that systematically explains the working mechanism of ICL through two orthogonal variables: perception (whether similar examples are included) and cognition (whether the model can recognize the task). The paper introduces the Peak Inverse Rank (PIR) metric to detect the model's task recognition ability and conducts extensive experiments to verify the performance of ICL under different conditions.

### Main Contributions:

1. **Proposed a Two-Dimensional Coordinate System**: Decomposes the working mechanism of ICL into four different scenarios, each corresponding to a quadrant in the coordinate system.
2. **Introduced the PIR Metric**: Used to evaluate whether the model successfully recognizes the task during ICL.
3. **Extensive Experiments**: Conducted experiments on multiple classification and generation tasks to verify the effectiveness and universality of the coordinate system.

### Explanation of the Four Quadrants:

1. **First Quadrant**: The model can recognize the task, and the demonstration includes similar examples. The model not only uses pre-trained knowledge for prediction but also refers to the labels of similar examples. If the labels of similar examples are incorrect, smaller models tend to replicate these incorrect labels, while larger models rely on pre-trained knowledge.
2. **Second Quadrant**: The model can recognize the task, but the demonstration does not include similar examples. The model mainly relies on pre-trained knowledge for prediction, and increasing the number of examples has limited improvement on ICL performance.
3. **Third Quadrant**: The model cannot recognize the task, and the demonstration does not include similar examples. ICL fails, and the model tends to blindly predict the label of the first example.
4. **Fourth Quadrant**: The model cannot recognize the task, but the demonstration includes similar examples. The model directly replicates the labels of similar examples, so the performance of ICL entirely depends on whether the labels of similar examples match the true labels of the test samples. Larger models are better at recognizing similar examples and thus more likely to replicate these labels.

### Conclusion:

The paper systematically explains the working mechanism of ICL by proposing a two-dimensional coordinate system and verifies the performance of ICL under different conditions. The study shows that the correctness of similar example labels is particularly important when the model cannot recognize the task, and increasing the number of examples can improve the effectiveness of ICL. Additionally, this coordinate system is applicable not only to classification tasks but also to generation tasks.
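The four quadrants described above can be read as a lookup on two boolean variables. The following is a minimal sketch of that reading; the names `QuadrantBehavior` and `classify_icl_scenario` are hypothetical, and the behavior strings simply paraphrase the summary rather than reproduce any code from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuadrantBehavior:
    """Hypothetical container for a quadrant number and its summarized behavior."""
    quadrant: int
    behavior: str


def classify_icl_scenario(recognizes_task: bool, has_similar_example: bool) -> QuadrantBehavior:
    """Map cognition (task recognized) and perception (similar example present) to a quadrant."""
    if recognizes_task and has_similar_example:
        return QuadrantBehavior(
            1, "uses pre-trained knowledge and also refers to labels of similar examples")
    if recognizes_task and not has_similar_example:
        return QuadrantBehavior(
            2, "relies mainly on pre-trained knowledge; more examples help little")
    if not recognizes_task and not has_similar_example:
        return QuadrantBehavior(
            3, "ICL fails; tends to predict the label of the first example")
    # Remaining case: task not recognized, but a similar example is present.
    return QuadrantBehavior(
        4, "copies labels of similar examples; accuracy depends on their correctness")


# Example: the model does not recognize the task, but a similar example exists.
scenario = classify_icl_scenario(recognizes_task=False, has_similar_example=True)
print(scenario.quadrant, "-", scenario.behavior)
```

In practice both inputs would be estimates rather than known booleans: the summary suggests the PIR metric for the cognition variable, while the perception variable depends on which definition of similarity is used.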

Unveiling In-Context Learning: A Coordinate System to Understand Its Working Mechanism
