Augmenting In-Context-Learning in LLMs via Automatic Data Labeling and Refinement

Joseph Shtok, Amit Alfassy, Foad Abo Dahood, Eliyahu Schwartz, Sivan Doveh, Assaf Arbelle
2024-10-14
Abstract: It has been shown that Large Language Models' (LLMs) performance can be improved for many tasks using Chain of Thought (CoT) or In-Context Learning (ICL), which involve demonstrating the steps needed to solve a task using a few examples. However, while datasets with input-output pairs are relatively easy to produce, providing demonstrations which include intermediate steps requires cumbersome manual work. These steps may be executable programs, as in agentic flows, or step-by-step reasoning as in CoT. In this work, we propose Automatic Data Labeling and Refinement (ADLR), a method to automatically generate and filter demonstrations which include the above intermediate steps, starting from a small seed of manually crafted examples. We demonstrate the advantage of ADLR in code-based table QA and mathematical reasoning, achieving up to a 5.5% gain. The code implementing our method is provided in the Supplementary material and will be made available.
Computation and Language
What problem does this paper attempt to address?
This paper addresses the difficulty of improving task performance through in-context learning (ICL) in large language models (LLMs). Demonstrating the steps required to solve a task, for example with chain-of-thought (CoT) reasoning over a few examples or with an executable program, can significantly improve LLM performance on a variety of tasks. However, producing a high-quality set of examples that include these intermediate steps requires manual work that is both time-consuming and labor-intensive.

To overcome this challenge, the authors propose the Automatic Data Labeling and Refinement (ADLR) method. ADLR automatically generates and filters demonstration examples containing the intermediate steps, starting from a small number of hand-crafted examples. The paper demonstrates the advantages of ADLR on code-based table question answering (table QA) and mathematical reasoning tasks, achieving a performance improvement of up to 5.5%.

The ADLR method is divided into three main steps:

1. **Generate a large number of examples**: Start with a dataset containing inputs and final answers, and use the initial hand-crafted context to generate intermediate data for these samples. Verify the correctness of the generated intermediate data by checking whether it leads to the correct final answer. This step yields a set of fully solved examples as well as a set of unsolved (difficult) samples.
2. **Filter and refine examples**: Refine the set of solved examples according to two criteria. First, estimate the difficulty of each sample, measured as the proportion of correct solutions when the LLM is run multiple times on the same input with a non-zero temperature. Second, test each example's utility, selecting those that help solve many difficult samples when included in a single prompt.
3. **Use selected examples for ICL**: Use the refined set of examples to enhance the inference protocol of the underlying algorithm. Build multiple diverse contexts, each containing a large random subset of the examples, run the LLM once per context, and aggregate the results of these runs by majority voting.

With this approach, ADLR improves the performance of existing algorithms in the ICL setting and provides a simple and effective way to generate and filter high-quality demonstration examples, which supports the application of LLMs to complex tasks.
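
The three steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `LLM` callable interface, the function names (`label_and_verify`, `success_rate`, `majority_vote`), the default parameters, and the deterministic `toy_llm` stand-in are all assumptions made for illustration, and the utility-based selection in step 2 is left out for brevity.

```python
import random
from collections import Counter
from typing import Callable, List, Optional, Tuple

# Hypothetical interface (an assumption, not from the paper): an "LLM" is any
# callable mapping (demonstrations, question) to (intermediate steps, answer).
Demo = Tuple[str, str, str]  # (question, intermediate steps, final answer)
LLM = Callable[[List[Demo], str], Tuple[str, str]]


def label_and_verify(llm: LLM, seed: List[Demo], dataset: List[Tuple[str, str]]):
    """Step 1: generate intermediate steps and keep those that reach the known answer."""
    solved, unsolved = [], []
    for question, gold in dataset:
        steps, answer = llm(seed, question)
        if answer == gold:
            solved.append((question, steps, gold))
        else:
            unsolved.append((question, gold))
    return solved, unsolved


def success_rate(llm: LLM, demos: List[Demo], question: str, gold: str,
                 n_runs: int = 5) -> float:
    """Step 2, first criterion: fraction of repeated runs that give the correct answer.

    With a real model the runs would be sampled at non-zero temperature;
    a low rate marks a difficult sample.
    """
    correct = sum(llm(demos, question)[1] == gold for _ in range(n_runs))
    return correct / n_runs


def majority_vote(llm: LLM, pool: List[Demo], question: str, n_contexts: int = 5,
                  k: int = 3, rng: Optional[random.Random] = None) -> str:
    """Step 3: run the model once per random context and return the most common answer."""
    rng = rng or random.Random(0)
    answers = []
    for _ in range(n_contexts):
        context = rng.sample(pool, min(k, len(pool)))  # random subset of demos
        answers.append(llm(context, question)[1])
    return Counter(answers).most_common(1)[0][0]


def toy_llm(demos: List[Demo], question: str) -> Tuple[str, str]:
    """Deterministic stand-in for a model: adds two integers, but is off by one
    on sums above 10 unless it sees at least two demonstrations."""
    a, b = map(int, question.split("+"))
    total = a + b
    if total > 10 and len(demos) < 2:
        total -= 1
    return f"{a} + {b} = {total}", str(total)


seed = [("1+1", "1 + 1 = 2", "2")]
solved, unsolved = label_and_verify(toy_llm, seed, [("2+3", "5"), ("7+8", "15")])
prediction = majority_vote(toy_llm, seed + solved, "7+8", n_contexts=3, k=2)
```

In this toy run, `"7+8"` is left unsolved with the seed context alone, and the enlarged demonstration pool lets the vote recover the correct answer. With a real model, `llm` would wrap an API call, and answer comparison would likely need task-specific normalization rather than exact string equality.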