Unraveling the Mechanics of Learning-Based Demonstration Selection for In-Context Learning

Hui Liu, Wenya Wang, Hao Sun, Chris Xing Tian, Chenqi Kong, Xin Dong, Haoliang Li
2024-10-15
Abstract: Large Language Models (LLMs) have demonstrated impressive in-context learning (ICL) capabilities from few-shot demonstration exemplars. While recent learning-based demonstration selection methods have proven beneficial to ICL by choosing more useful exemplars, their underlying mechanisms are opaque, hindering efforts to address limitations such as high training costs and poor generalization across tasks. These methods generally assume the selection process captures similarities between the exemplar and the target instance; however, it remains unknown what kinds of similarities are captured and vital to performing ICL. To dive into this question, we analyze the working mechanisms of the learning-based demonstration selection methods and empirically identify two important factors related to similarity measurement: 1) The ability to integrate different levels of task-agnostic text similarities between the input of exemplars and test cases enhances generalization power across different tasks. 2) Incorporating task-specific labels when measuring the similarities significantly improves the performance on each specific task. We validate these two findings through extensive quantitative and qualitative analyses across ten datasets and various LLMs. Based on our findings, we introduce two effective yet simplified exemplar selection methods catering to task-agnostic and task-specific demands, eliminating the costly LLM inference overhead.
Machine Learning, Artificial Intelligence, Computation and Language
What problem does this paper attempt to address?
The paper addresses how to select demonstration examples more effectively in In-Context Learning (ICL) using learning-based demonstration selection methods, so as to improve the performance of large language models (LLMs) on unseen tasks. Specifically, it examines which types of similarity these methods capture when selecting demonstration examples and how those similarities affect ICL performance. Although existing learning-based methods outperform task-agnostic similarity methods, the implicit similarities they capture, and the relationship between those similarities and ICL performance, remain unclear. The paper therefore proposes two hypotheses:

1. **Hypothesis H1**: After training, the retriever acts as an ensemble model that adaptively integrates different levels of task-agnostic similarity between the demonstration example input (\(x\)) and the test-case input (\(x_t\)).
2. **Hypothesis H2**: Beyond input similarity, training also encourages the selection of demonstration examples whose output (\(y\)) is similar to the test-case output (\(y_t\)), which is implicitly predicted during retrieval, thereby enhancing the retriever's ability to distinguish specific tasks.

To test these hypotheses, the paper conducts extensive quantitative analysis and proposes two simplified methods, Multi-Layer Similarity Maximization (MLSM) and Test-Task Fine-Tuning (TTF), which aim to reduce the expensive data collection required to construct proxy tasks while meeting cross-task and task-specific requirements. Experiments on multiple LLMs and various tasks demonstrate the effectiveness of both methods, support the two hypotheses, and suggest directions for future research on demonstration selection.
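
The intuition behind the two hypotheses can be illustrated with a toy sketch. It is not the paper's MLSM or TTF method: the two surface-level similarity measures, the fixed weights, the `predicted_label` argument, and all function names are assumptions chosen only for illustration. The sketch scores each exemplar by a weighted mix of input similarities (in the spirit of H1) and optionally adds a bonus when the exemplar's label matches a guessed test label (in the spirit of H2).

```python
from typing import List, Optional, Tuple

Exemplar = Tuple[str, str]  # (input text x, label y)


def token_jaccard(a: str, b: str) -> float:
    """Word-level overlap between two texts (one 'level' of similarity)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def char_ngram_jaccard(a: str, b: str, n: int = 3) -> float:
    """Character n-gram overlap (a second, finer-grained 'level')."""
    def grams(s: str) -> set:
        s = s.lower()
        return {s[i:i + n] for i in range(len(s) - n + 1)}

    ga, gb = grams(a), grams(b)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0


def score(
    x_t: str,
    exemplar: Exemplar,
    weights: Tuple[float, float] = (0.5, 0.5),  # hypothetical fixed weights
    predicted_label: Optional[str] = None,      # hypothetical guess of y_t
    label_bonus: float = 0.0,
) -> float:
    x, y = exemplar
    # H1-style: combine several task-agnostic input similarities.
    s = weights[0] * token_jaccard(x_t, x) + weights[1] * char_ngram_jaccard(x_t, x)
    # H2-style: favour exemplars whose label matches the guessed test label.
    if predicted_label is not None and y == predicted_label:
        s += label_bonus
    return s


def select_demonstrations(
    x_t: str, pool: List[Exemplar], k: int = 2, **kwargs
) -> List[Exemplar]:
    """Return the k highest-scoring exemplars for the test input x_t."""
    ranked = sorted(pool, key=lambda ex: score(x_t, ex, **kwargs), reverse=True)
    return ranked[:k]
```

In a learned retriever the weights and the label guess would come from training rather than being fixed by hand; the sketch only shows how input-level and label-level signals could be combined into a single ranking score.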