Abstract: Large Language Models (LLMs) have demonstrated impressive in-context learning (ICL) capabilities from few-shot demonstration exemplars. While recent learning-based demonstration selection methods have proven beneficial to ICL by choosing more useful exemplars, their underlying mechanisms are opaque, hindering efforts to address limitations such as high training costs and poor generalization across tasks. These methods generally assume the selection process captures similarities between the exemplar and the target instance; however, it remains unknown what kinds of similarities are captured and which are vital to performing ICL. To investigate this question, we analyze the working mechanisms of learning-based demonstration selection methods and empirically identify two important factors related to similarity measurement: 1) The ability to integrate different levels of task-agnostic text similarities between the inputs of exemplars and test cases enhances generalization across different tasks. 2) Incorporating task-specific labels when measuring the similarities significantly improves the performance on each specific task. We validate these two findings through extensive quantitative and qualitative analyses across ten datasets and various LLMs. Based on our findings, we introduce two effective yet simplified exemplar selection methods catering to task-agnostic and task-specific demands, eliminating the costly LLM inference overhead.

What problem does this paper attempt to address?

The paper addresses how learning-based demonstration selection methods choose exemplars for In-Context Learning (ICL), and why those choices improve the performance of large language models (LLMs) on unseen tasks. Specifically, it examines which types of similarity these methods capture when selecting demonstration examples and how each type affects ICL performance. Although existing learning-based methods outperform task-agnostic similarity methods, the implicit similarities they capture, and the relationship between these similarities and ICL performance, remain unclear. The paper therefore proposes two hypotheses:

1. **Hypothesis H1**: After training, the retriever acts as an ensemble model and can adaptively integrate different levels of task-agnostic similarities between the demonstration example input (\(x\)) and the test case (\(x_t\)).
2. **Hypothesis H2**: In addition to input similarity, the training process also encourages the selection of demonstration examples whose output (\(y\)) is similar to the test-case output (\(y_t\)), which is implicitly predicted during retrieval, thereby enhancing the retriever's ability to discriminate within a specific task.

To verify these two hypotheses, the paper conducts extensive quantitative analysis and proposes two simplified methods, Multi-Layer Similarity Maximization (MLSM) and Test-Task Fine-Tuning (TTF), which aim to reduce the expensive data collection costs required for constructing proxy tasks while meeting cross-task and task-specific requirements, respectively. Experiments on multiple LLMs and various tasks demonstrate the effectiveness of these two methods, support the above hypotheses, and suggest directions for future demonstration selection research.
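As a rough illustration of the two hypotheses, the sketch below scores candidate exemplars with a weighted mix of two task-agnostic input similarities (the H1 idea) and optionally adds a bonus when an exemplar's label matches a guessed test label (the H2 idea). It is not the paper's MLSM or TTF: word-level and character-trigram cosine similarity stand in for the different similarity levels, and the function names, the fixed weights, and the `label_bonus` parameter are assumptions made purely for illustration.

```python
import math
from collections import Counter


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def word_counts(text: str) -> Counter:
    """Word-level view of the input (one 'level' of similarity)."""
    return Counter(text.lower().split())


def char_trigrams(text: str) -> Counter:
    """Character-level view of the input (a second 'level')."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))


def ensemble_similarity(x: str, x_t: str, weights=(0.5, 0.5)) -> float:
    """Weighted mix of the two levels. The weights are fixed here
    (an assumption); a trained retriever would adapt them."""
    word_sim = cosine(word_counts(x), word_counts(x_t))
    char_sim = cosine(char_trigrams(x), char_trigrams(x_t))
    return weights[0] * word_sim + weights[1] * char_sim


def select_demonstrations(pool, x_t, k=2, predicted_label=None, label_bonus=0.0):
    """Return the top-k (x, y) pairs from `pool` for test input `x_t`.

    If `predicted_label` is given, exemplars whose label y matches it
    receive `label_bonus` (a hypothetical stand-in for output similarity).
    """
    scored = []
    for x, y in pool:
        score = ensemble_similarity(x, x_t)
        if predicted_label is not None and y == predicted_label:
            score += label_bonus
        scored.append((score, x, y))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(x, y) for _, x, y in scored[:k]]


if __name__ == "__main__":
    pool = [
        ("the movie was great", "positive"),
        ("the movie was terrible", "negative"),
        ("stock prices fell sharply", "negative"),
    ]
    print(select_demonstrations(pool, "the film was great", k=2))
```

With input similarity alone, the exemplar closest in wording ranks first; setting `predicted_label` and a positive `label_bonus` shifts the ranking toward exemplars that share the guessed output, which is the effect H2 attributes to retriever training.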

Unraveling the Mechanics of Learning-Based Demonstration Selection for In-Context Learning

Revisiting Demonstration Selection Strategies in In-Context Learning

Are Human-generated Demonstrations Necessary for In-context Learning?

In-Context Compositional Generalization for Large Vision-Language Models

In-Context Learning Demonstration Selection via Influence Analysis

Misconfidence-based Demonstration Selection for LLM In-Context Learning

Comparative Analysis of Demonstration Selection Algorithms for LLM In-Context Learning

Comparable Demonstrations Are Important in In-Context Learning: A Novel Perspective on Demonstration Selection

In-Context Learning with Iterative Demonstration Selection

Curriculum Demonstration Selection for In-Context Learning

Effective Demonstration Annotation for In-Context Learning via Language Model-Based Determinantal Point Process

Does In-Context Learning Really Learn? Rethinking How Large Language Models Respond and Solve Tasks via In-Context Learning

Large Language Models Know What Makes Exemplary Contexts

Unveiling In-Context Learning: A Coordinate System to Understand Its Working Mechanism

What In-Context Learning "Learns" In-Context: Disentangling Task Recognition and Task Learning

Not All Demonstration Examples Are Equally Beneficial: Reweighting Demonstration Examples for In-Context Learning

Demonstration Selection for In-Context Learning via Reinforcement Learning

DemoShapley: Valuation of Demonstrations for In-Context Learning

In-Context Demonstration Selection with Cross Entropy Difference

Towards Understanding In-Context Learning with Contrastive Demonstrations and Saliency Maps

Demonstration Notebook: Finding the Most Suited In-Context Learning Example from Interactions