Dynamic Self-Consistency: Leveraging Reasoning Paths for Efficient LLM Sampling

Guangya Wan, Yuqi Wu, Jie Chen, Sheng Li
2024-08-30
Abstract: Self-Consistency (SC) is a widely used method to mitigate hallucinations in Large Language Models (LLMs) by sampling the LLM multiple times and outputting the most frequent solution. Despite its benefits, SC incurs significant computational costs proportional to the number of samples generated. Previous early-stopping approaches, such as Early Stopping Self Consistency and Adaptive Consistency, have aimed to reduce these costs by considering output consistency, but they do not analyze the quality of the reasoning paths (RPs) themselves. To address this issue, we propose Reasoning-Aware Self-Consistency (RASC), an innovative early-stopping framework that dynamically adjusts the number of sample generations by considering both the output answer and the RPs from Chain of Thought (CoT) prompting. RASC assigns confidence scores sequentially to the generated samples, stops when certain criteria are met, and then employs weighted majority voting to optimize sample usage and enhance answer reliability. We comprehensively test RASC with multiple LLMs across varied QA datasets. RASC outperforms existing methods and significantly reduces sample usage, by an average of 80%, while maintaining or improving accuracy by up to 5% compared to the original SC.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper aims to address the hallucination problem in large language models (LLMs) when generating reasoning paths (RPs). Existing self-consistency (SC) methods reduce hallucinations by sampling the model multiple times, but this approach leads to significant computational costs. Although some early-stopping strategies (such as Early Stopping Self-Consistency and Adaptive Consistency) attempt to reduce these costs, they mainly focus on output consistency while neglecting the quality of the reasoning paths.

To overcome this issue, the authors propose the **Reasoning-Aware Self-Consistency (RASC)** framework. RASC dynamically adjusts the number of samples and optimizes sample usage by considering both the quality of the reasoning paths and output consistency, thereby reducing computational costs while improving the reliability of the answers. Specifically, RASC assigns a confidence score to each generated sample, stops sampling when certain conditions are met, and then determines the final answer through weighted majority voting.

### Main Contributions

1. **Reasoning-Aware Self-Consistency (RASC) Framework**: An innovative early-stopping framework is proposed, which dynamically adjusts the number of sample evaluations by considering the quality of the reasoning paths and output consistency.
2. **Confidence Score Approximator**: Seven lightweight text metrics are designed to evaluate content quality and consistency, combined with a weighted majority voting system to enhance sampling efficiency and accuracy.
3. **Robustness Evaluation**: The effectiveness and robustness of RASC are validated on different LLMs and datasets, achieving significant improvements in efficiency and accuracy.

### Experimental Results

- **Reduction in Sample Quantity**: RASC significantly reduces the required number of samples in multiple benchmarks, with an average reduction of 80%, while maintaining or improving accuracy.
- **Cost Analysis**: Experiments on GPT-3.5-Turbo show that RASC reduces API costs by 84.6% while improving accuracy by 7.9%.
- **Generalization Performance**: RASC also performs well on unseen datasets, demonstrating its adaptability to different language models and diverse prompts.

### Conclusion

By combining the quality of reasoning paths and output consistency, RASC effectively reduces the computational costs of self-consistency methods while improving the reliability of answers. This approach shows superior performance across various tasks and datasets, providing a new solution for efficient reasoning in large language models.
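
The loop described above (score each sample as it arrives, stop early, then take a weighted majority vote) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `rasc_style_vote`, the `threshold` stopping rule (stop once the leading answer's accumulated confidence reaches a fixed value), and the toy scorer are all assumptions made for the example. The summary does not specify the actual stopping criteria or the seven text metrics, so they are not reproduced here.

```python
from collections import defaultdict
from itertools import islice
from typing import Callable, Iterable, Tuple


def rasc_style_vote(
    samples: Iterable[Tuple[str, str]],
    score_fn: Callable[[str, str], float],
    threshold: float = 1.5,
    max_samples: int = 10,
) -> Tuple[str, int]:
    """Weighted majority vote with early stopping (illustrative sketch).

    samples: lazily produced (reasoning_path, answer) pairs.
    score_fn: hypothetical confidence scorer; stands in for the paper's
        confidence score approximator, which is not specified here.
    threshold: assumed stopping rule; sampling stops once the leading
        answer's accumulated confidence reaches this value.
    Returns the winning answer and the number of samples consumed.
    """
    weights = defaultdict(float)
    used = 0
    # islice caps the budget without pulling an extra sample from the generator.
    for reasoning, answer in islice(samples, max_samples):
        used += 1
        weights[answer] += score_fn(reasoning, answer)
        leader = max(weights, key=weights.get)
        if weights[leader] >= threshold:
            break  # confident enough: skip the remaining samples
    if not weights:
        raise ValueError("no samples were provided")
    return max(weights, key=weights.get), used


def toy_score(reasoning: str, answer: str) -> float:
    # Placeholder heuristic: rewards reasoning that states a justification.
    return 0.9 if "because" in reasoning else 0.3


demo_samples = [
    ("6 * 7 = 42 because 6 * 7 is six sevens", "42"),
    ("just a guess", "41"),
    ("42 because 40 + 2 = 42", "42"),
    ("42 because 7 * 6 = 42", "42"),
]
result = rasc_style_vote(iter(demo_samples), toy_score)
print(result)  # ('42', 3): the fourth sample is never consumed
```

With plain self-consistency all four samples would be generated and counted equally; here the low-confidence guess contributes little weight and sampling stops after the third sample, which is the source of the savings the summary describes.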