Large Language Models Are Self-Taught Reasoners: Enhancing LLM Applications via Tailored Problem-Solving Demonstrations

Kai Tzu-iunn Ong, Taeyoon Kwon, Jinyoung Yeo
2024-08-22
Abstract: Guiding large language models with a selected set of human-authored demonstrations is a common practice for improving LLM applications. However, human effort can be costly, especially in specialized domains (e.g., clinical diagnosis), and does not guarantee optimal performance due to the potential discrepancy of target skills between selected demonstrations and real test instances. Motivated by these, this paper explores the automatic creation of customized demonstrations, whose target skills align with the given target instance. We present SELF-TAUGHT, a problem-solving framework, which facilitates demonstrations that are "tailored" to the target problem and "filtered" for better quality (i.e., correctness) in a zero-shot manner. In 15 tasks of multiple-choice questions of diverse domains and the diagnosis of Alzheimer's disease (AD) with real-world patients, SELF-TAUGHT achieves superior performance to strong baselines (e.g., Few-shot CoT, Plan-and-Solve, Auto-CoT). We conduct comprehensive analyses on SELF-TAUGHT, including its generalizability to existing prompting methods and different LLMs, the quality of its intermediate generation, and more.
Artificial Intelligence, Computation and Language
What problem does this paper attempt to address?
This paper addresses two problems in applying large language models (LLMs) to specialized domains: the reliance on manually annotated demonstrations, and the performance lost when those demonstrations do not match the problem at hand. The authors propose SELF-TAUGHT, a framework that automatically generates high-quality demonstrations tailored to the target problem and uses them to guide the LLM in solving it. The approach is particularly suited to domains that require expertise, such as clinical diagnosis, where obtaining high-quality manually annotated data is both expensive and time-consuming.

### Background and Problem

- **Limitations of Existing Methods**: Most LLM-based applications rely on human experts to select representative problems and annotate their solutions, which are then used as demonstrations to guide the LLM (i.e., few-shot prompting). This is costly, and it may not yield optimal performance, because the skills targeted by the selected examples may not match those required by the actual test instances.
- **Challenges of Zero-Shot Prompting**: Zero-shot prompting (i.e., prompting the LLM directly without any examples) removes the reliance on manually annotated data, but it usually performs worse than few-shot prompting, especially in domains requiring expertise.

### Solution

- **SELF-TAUGHT Framework**: The authors propose a zero-shot framework called SELF-TAUGHT, which automatically generates high-quality examples relevant to the target problem in three stages:
  1. **Information Identification**: The LLM identifies the knowledge points or skills involved in the target problem.
  2. **Customized Example Generation**: The LLM generates pseudo-problems similar to the target problem, together with their solutions, and filters the solutions by confidence to ensure their quality.
  3. **Self-Guided Problem Solving**: The automatically generated examples are used as demonstrations to guide the LLM in solving the target problem.

### Experiments and Results

- **Experimental Setup**: The authors evaluate on 15 multiple-choice tasks across diverse domains and on the diagnosis of Alzheimer's disease (AD) with real-world patients. The multiple-choice benchmarks include StrategyQA, ScienceQA, and MedQA, among others.
- **Performance Comparison**: SELF-TAUGHT outperforms strong zero-shot and few-shot baselines (e.g., Few-shot CoT, Plan-and-Solve, Auto-CoT), and does particularly well on tasks requiring expertise.
- **Analysis and Discussion**: The authors also examine the contribution of each stage through ablation experiments, and test how well the framework generalizes to existing prompting methods and to different LLMs.

### Main Contributions

1. **Proposed a Zero-Shot Framework**: SELF-TAUGHT automatically generates high-quality customized examples without relying on manually annotated data, improving LLM performance on real-world problems.
2. **Extensive Experimental Validation**: Experiments across tasks in various domains demonstrate the framework's effectiveness and robustness.
3. **Performance and Cost Trade-off**: Although the API cost of SELF-TAUGHT is relatively high, its cost-effectiveness remains competitive when performance is prioritized.

### Conclusion

The SELF-TAUGHT framework offers an effective way to reduce reliance on manually annotated data and to improve LLM performance in specialized domains. Future work can further optimize the cost-effectiveness of the framework and explore its application in more domains.
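
### Illustrative Sketch

As a rough illustration of the three stages described under Solution, the following Python sketch wires them together around a pluggable model call. It is not the authors' implementation. The `llm` interface (a callable returning a text and a confidence score), the prompt wording, the `threshold`, `n_demos`, and `max_attempts` parameters, and all function names are assumptions made for this example; the summary only states that generated solutions are filtered by confidence, not how that confidence is computed.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical interface: takes a prompt, returns (generated text, confidence in [0, 1]).
# How confidence is obtained is an assumption; the source does not specify it.
LLM = Callable[[str], Tuple[str, float]]


@dataclass
class Demonstration:
    problem: str
    solution: str
    confidence: float


def identify_information(llm: LLM, target: str) -> str:
    """Stage 1: ask the model which knowledge or skills the target problem needs."""
    info, _ = llm(f"Identify the knowledge or skills needed to solve:\n{target}")
    return info


def generate_demonstrations(
    llm: LLM,
    target: str,
    info: str,
    n_demos: int = 3,        # assumed number of demonstrations to keep
    threshold: float = 0.8,  # assumed confidence cut-off
    max_attempts: int = 10,  # assumed retry budget
) -> List[Demonstration]:
    """Stage 2: create pseudo-problems with solutions, keeping only confident ones."""
    demos: List[Demonstration] = []
    for _ in range(max_attempts):
        if len(demos) >= n_demos:
            break
        pseudo, _ = llm(
            f"Write a new problem that requires: {info}\n"
            f"It should resemble this problem:\n{target}"
        )
        solution, confidence = llm(f"Solve step by step:\n{pseudo}")
        if confidence >= threshold:  # confidence filtering
            demos.append(Demonstration(pseudo, solution, confidence))
    return demos


def solve_with_demonstrations(llm: LLM, target: str, demos: List[Demonstration]) -> str:
    """Stage 3: prepend the kept demonstrations and solve the target problem."""
    shots = "".join(f"Problem: {d.problem}\nSolution: {d.solution}\n\n" for d in demos)
    answer, _ = llm(f"{shots}Problem: {target}\nSolution:")
    return answer


def self_taught(llm: LLM, target: str, **kwargs) -> Tuple[str, List[Demonstration]]:
    """Run the three stages in order and return the answer plus the demonstrations used."""
    info = identify_information(llm, target)
    demos = generate_demonstrations(llm, target, info, **kwargs)
    return solve_with_demonstrations(llm, target, demos), demos
```

In use, `llm` would wrap whatever model API is available, for example by returning the completion together with a score derived from token probabilities or from a self-reported certainty. If no generated solution passes the threshold within `max_attempts`, this sketch falls back to solving the target with no demonstrations; whether the original framework behaves the same way is not stated in the summary.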