In-Context Learning with Iterative Demonstration Selection

Chengwei Qin, Aston Zhang, Chen Chen, Anirudh Dagar, Wenming Ye
2024-06-23
Abstract: Spurred by advancements in scale, large language models (LLMs) have demonstrated strong few-shot learning ability via in-context learning (ICL). However, the performance of ICL has been shown to be highly sensitive to the selection of few-shot demonstrations. Selecting the most suitable examples as context remains an ongoing challenge and an open problem. Existing literature has highlighted the importance of selecting examples that are diverse or semantically similar to the test sample while ignoring the fact that the optimal selection dimension, i.e., diversity or similarity, is task-specific. Based on how the test sample is answered, we propose Iterative Demonstration Selection (IDS) to leverage the merits of both dimensions. Using zero-shot chain-of-thought reasoning (Zero-shot-CoT), IDS iteratively selects examples that are diverse but still strongly correlated with the test sample as ICL demonstrations. Specifically, IDS applies Zero-shot-CoT to the test sample before demonstration selection. The output reasoning path is then used to choose demonstrations that are prepended to the test sample for inference. The generated answer is followed by its corresponding reasoning path for extracting a new set of demonstrations in the next iteration. After several iterations, IDS adopts majority voting to obtain the final result. Through extensive experiments on tasks including reasoning, question answering, and topic classification, we demonstrate that IDS can consistently outperform existing ICL demonstration selection methods.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper addresses how to select demonstration examples for In-Context Learning (ICL). ICL performance depends heavily on which demonstrations are chosen: different sets can produce results ranging from nearly random to comparable with state-of-the-art models. Existing methods typically select along a single dimension, either diversity or similarity, and do not account for the fact that the optimal selection dimension is task-specific.

To overcome this problem, the authors propose Iterative Demonstration Selection (IDS), which uses Zero-shot Chain-of-Thought (Zero-shot-CoT) reasoning to select demonstration examples that are both diverse and highly relevant to the test sample. IDS repeats the selection over several iterations and determines the final result by majority voting.

### Main Contributions

1. **Considering Diversity and Similarity**: The authors point out that the optimal dimension for selecting demonstration examples is task-specific and propose IDS, which leverages Zero-shot-CoT reasoning to combine the advantages of both diversity and similarity.
2. **Experimental Validation**: Through extensive experiments and analysis, the authors demonstrate the effectiveness of IDS across various tasks, including mathematical reasoning, common-sense reasoning, logical reasoning, question answering, and topic classification.

### Method Overview

1. **Zero-shot Chain-of-Thought (Zero-shot-CoT)**: Apply Zero-shot-CoT to the test sample first to generate a reasoning path.
2. **Selecting Demonstration Examples**: Based on the generated reasoning path, select the training examples that are most semantically similar to it as demonstration examples.
3. **Iterative Selection**: Prepend the selected demonstration examples to the test sample for ICL inference, generating a new answer and reasoning path. Then select demonstration examples again using the new reasoning path, and repeat for multiple iterations.
4. **Majority Voting**: After several iterations, determine the final result by majority voting over the generated answers.

### Experimental Results

- **Performance Improvement**: IDS significantly outperforms existing baseline methods across all test datasets, with an average performance improvement of approximately 1.7%.
- **Task Specificity**: IDS performs especially well on complex tasks. For example, on mathematical reasoning tasks, IDS shows an average relative performance improvement of approximately 4% over the Top-k-Consistency method.
- **Similarity Analysis**: The demonstration examples selected by IDS exhibit a more balanced semantic similarity, maintaining both diversity and high relevance to the test sample.

Through these contributions, the paper provides an effective method for optimizing the selection of demonstration examples in ICL, thereby improving model performance.
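The four steps in the method overview above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `llm` is a hypothetical callable that returns a `(reasoning_path, answer)` pair for a prompt, `bow_cosine` is a toy bag-of-words similarity standing in for whatever sentence encoder the paper uses, and the prompt templates, the defaults `k=2` and `iterations=3`, and the choice to compare the reasoning path against the training questions are all illustrative guesses.

```python
from collections import Counter
from math import sqrt
from typing import Callable, List, Tuple

Example = Tuple[str, str]            # hypothetical (question, answer) training pair
LLM = Callable[[str], Tuple[str, str]]  # hypothetical: prompt -> (reasoning_path, answer)


def bow_cosine(a: str, b: str) -> float:
    """Toy bag-of-words cosine similarity (a stand-in for a sentence encoder)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def select_demos(reasoning: str, train: List[Example], k: int) -> List[Example]:
    """Pick the k training examples whose question is most similar to the reasoning path."""
    return sorted(train, key=lambda ex: bow_cosine(reasoning, ex[0]), reverse=True)[:k]


def ids(question: str, train: List[Example], llm: LLM, k: int = 2, iterations: int = 3) -> str:
    """Sketch of the iterative loop: Zero-shot-CoT, select, infer, repeat, then vote."""
    # Step 1: Zero-shot-CoT on the test sample to get an initial reasoning path.
    reasoning, _ = llm(f"Q: {question}\nA: Let's think step by step.")
    answers = []
    for _ in range(iterations):
        # Step 2: choose demonstrations based on the current reasoning path.
        demos = select_demos(reasoning, train, k)
        context = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in demos)
        # Step 3: prepend the demonstrations and run ICL inference;
        # the new reasoning path drives selection in the next iteration.
        reasoning, answer = llm(f"{context}\n\nQ: {question}\nA:")
        answers.append(answer)
    # Step 4: majority vote over the answers from all iterations.
    return Counter(answers).most_common(1)[0][0]
```

Because each iteration reselects demonstrations from the latest reasoning path, the demonstration set can change between rounds, which is where the mix of diversity and relevance described above would come from. The initial Zero-shot-CoT answer is left out of the vote here; that is also an assumption of this sketch.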