Abstract: Guiding large language models with a selected set of human-authored demonstrations is a common practice for improving LLM applications. However, human effort can be costly, especially in specialized domains (e.g., clinical diagnosis), and does not guarantee optimal performance due to the potential discrepancy of target skills between selected demonstrations and real test instances. Motivated by these issues, this paper explores the automatic creation of customized demonstrations, whose target skills align with the given target instance. We present SELF-TAUGHT, a problem-solving framework that facilitates demonstrations that are "tailored" to the target problem and "filtered" for better quality (i.e., correctness) in a zero-shot manner. On 15 multiple-choice question tasks from diverse domains and on the diagnosis of Alzheimer's disease (AD) with real-world patients, SELF-TAUGHT outperforms strong baselines (e.g., Few-shot CoT, Plan-and-Solve, Auto-CoT). We conduct comprehensive analyses of SELF-TAUGHT, including its generalizability to existing prompting methods and different LLMs, the quality of its intermediate generations, and more.

What problem does this paper attempt to address?

This paper addresses two related problems that arise when large language models (LLMs) are applied in specialized domains: the reliance on manually annotated demonstrations, and the suboptimal performance that results when those demonstrations do not match the problems actually encountered at test time. The authors propose SELF-TAUGHT, a framework that automatically generates high-quality demonstrations tailored to each target problem and uses them to guide the LLM. The approach is especially relevant to domains that require expertise, such as clinical diagnosis, where high-quality human annotations are expensive and time-consuming to obtain.

### Background and Problem

- **Limitations of Existing Methods**: Most LLM-based applications rely on human experts to select representative problems, annotate their solutions, and use these demonstrations to guide the LLM (i.e., few-shot prompting). This is costly, and it may not yield optimal performance because the selected demonstrations can target different skills than the actual test instances require.
- **Challenges of Zero-Shot Prompting**: Zero-shot prompting (i.e., prompting the LLM directly without any demonstrations) removes the need for annotated data, but its performance is usually inferior to few-shot prompting, especially in domains that require expertise.

### Solution

- **SELF-TAUGHT Framework**: SELF-TAUGHT is a zero-shot framework that automatically generates high-quality demonstrations relevant to the target problem in three stages:
  1. **Information Identification**: The LLM first identifies the knowledge or skills that the target problem requires.
  2. **Customized Example Generation**: The LLM then generates pseudo-problems similar to the target problem, together with their solutions, and keeps only high-quality solutions through confidence filtering.
  3. **Self-Guided Problem Solving**: Finally, the generated demonstrations are used to guide the LLM in solving the target problem.

### Experiments and Results

- **Experimental Setup**: The authors evaluate on 15 multiple-choice question tasks from diverse domains (e.g., StrategyQA, ScienceQA, MedQA) and on the diagnosis of Alzheimer's disease (AD) with real-world patients.
- **Performance Comparison**: SELF-TAUGHT outperforms strong zero-shot and few-shot baselines (e.g., Few-shot CoT, Plan-and-Solve, Auto-CoT) across the tasks, with particularly strong results on tasks that require expertise.
- **Analysis and Discussion**: Through ablation studies and experiments with different LLMs, the authors verify the contribution of each stage and the generality of the framework. They also examine how SELF-TAUGHT combines with existing prompting methods and assess the quality of its intermediate generations.

### Main Contributions

1. **A Zero-Shot Framework**: SELF-TAUGHT automatically generates high-quality, customized demonstrations without manually annotated data, improving LLM performance on real-world problems.
2. **Extensive Experimental Validation**: Experiments across tasks in diverse domains demonstrate the framework's effectiveness and robustness.
3. **Performance and Cost Trade-off**: Although SELF-TAUGHT incurs relatively high API costs, its cost-effectiveness remains competitive when performance is the priority.

### Conclusion

SELF-TAUGHT offers an effective way to reduce reliance on manually annotated data and to improve LLM performance in specialized domains. Future work could further improve the framework's cost-effectiveness and apply it to more domains.
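The three stages described in the Solution section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `llm` callable, the prompt wording, and the helper names (`identify_information`, `generate_demos`, `solve_with_demos`, `self_taught`) are hypothetical. Confidence filtering is approximated here by self-consistency, i.e., sampling several solutions per pseudo-problem and keeping it only if the majority answer reaches an agreement threshold; the paper's actual confidence measure may differ. `fake_llm` is a deterministic stand-in so the sketch runs without an API.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interface: any function that maps a prompt string to a completion.
LLM = Callable[[str], str]


@dataclass
class Demo:
    question: str
    solution: str
    answer: str


def identify_information(llm: LLM, problem: str) -> str:
    """Stage 1: ask the model which knowledge or skills the target problem needs."""
    return llm(f"List the knowledge and skills needed to solve:\n{problem}")


def generate_demos(llm: LLM, problem: str, info: str, n_demos: int = 3,
                   n_samples: int = 5, threshold: float = 0.6) -> List[Demo]:
    """Stage 2: write pseudo-problems that need the same skills, then keep only
    those whose sampled answers agree often enough. Majority-vote agreement is
    an assumed stand-in for the paper's confidence filtering."""
    demos = []
    for i in range(n_demos):
        question = llm(f"Write new problem #{i} that requires: {info}\n"
                       f"Make it similar to: {problem}")
        solutions = [llm(f"Solve step by step (sample {k}):\n{question}")
                     for k in range(n_samples)]
        # Assumes each solution ends with "Answer: <option>".
        answers = [s.rsplit("Answer:", 1)[-1].strip() for s in solutions]
        top_answer, votes = Counter(answers).most_common(1)[0]
        if votes / n_samples >= threshold:
            best = next(s for s, a in zip(solutions, answers) if a == top_answer)
            demos.append(Demo(question, best, top_answer))
    return demos


def solve_with_demos(llm: LLM, problem: str, demos: List[Demo]) -> str:
    """Stage 3: prepend the surviving demonstrations as few-shot examples."""
    shots = "\n\n".join(f"Q: {d.question}\nA: {d.solution}" for d in demos)
    return llm(f"{shots}\n\nQ: {problem}\nA:")


def self_taught(llm: LLM, problem: str) -> str:
    """Run all three stages for a single target problem."""
    info = identify_information(llm, problem)
    demos = generate_demos(llm, problem, info)
    return solve_with_demos(llm, problem, demos)


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model, for illustration only."""
    if prompt.startswith("List the knowledge"):
        return "unit conversion"
    if prompt.startswith("Write new problem #1"):
        return "Hard pseudo-problem"
    if prompt.startswith("Write new problem"):
        return "Pseudo-problem"
    if prompt.startswith("Solve") and "Hard pseudo-problem" in prompt:
        # Every sample gives a different answer, so this demo is filtered out.
        sample_id = prompt.split("sample ")[1][0]
        return f"Reasoning... Answer: {sample_id}"
    if prompt.startswith("Solve"):
        return "Reasoning... Answer: B"
    return "Final answer: B"


target = "Convert 3 km to m. (A) 300 (B) 3000"
demos = generate_demos(fake_llm, target, "unit conversion")
print(len(demos), self_taught(fake_llm, target))  # prints: 2 Final answer: B
```

The `threshold` parameter trades the number of surviving demonstrations against their likely correctness. With a real model, the solving calls would need a nonzero sampling temperature; otherwise every sample agrees and the filter passes everything.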

Large Language Models Are Self-Taught Reasoners: Enhancing LLM Applications via Tailored Problem-Solving Demonstrations

Are Human-generated Demonstrations Necessary for In-context Learning?

Self-Demos: Eliciting Out-of-Demonstration Generalizability in Large Language Models

AlignedCoT: Prompting Large Language Models via Native-Speaking Demonstrations

Demonstration Notebook: Finding the Most Suited In-Context Learning Example from Interactions

Show, Don't Tell: Aligning Language Models with Demonstrated Feedback

Task-Level Thinking Steps Help Large Language Models for Challenging Classification Task

StrategyLLM: Large Language Models as Strategy Generators, Executors, Optimizers, and Evaluators for Problem Solving

Let's Be Self-generated via Step by Step: A Curriculum Learning Approach to Automated Reasoning with Large Language Models

Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

LMAct: A Benchmark for In-Context Imitation Learning with Long Multimodal Demonstrations

Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching

Teaching Large Language Models to Self-Debug

Human-Instruction-Free LLM Self-Alignment with Limited Samples

SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning

Using Advanced LLMs to Enhance Smaller LLMs: An Interpretable Knowledge Distillation Approach

Teaching Language Models to Self-Improve through Interactive Demonstrations

Misconfidence-based Demonstration Selection for LLM In-Context Learning

Auto-Evolve: Enhancing Large Language Model's Performance via Self-Reasoning Framework

Large Language Models Can Self-Improve in Long-context Reasoning

How to Leverage Demonstration Data in Alignment for Large Language Model? A Self-Imitation Learning Perspective