Abstract: Self-Consistency (SC) is a widely used method to mitigate hallucinations in Large Language Models (LLMs) by sampling the LLM multiple times and outputting the most frequent solution. Despite its benefits, SC incurs significant computational costs proportional to the number of samples generated. Previous early-stopping approaches, such as Early-Stopping Self-Consistency and Adaptive Consistency, have aimed to reduce these costs by considering output consistency, but they do not analyze the quality of the reasoning paths (RPs) themselves. To address this issue, we propose Reasoning-Aware Self-Consistency (RASC), an innovative early-stopping framework that dynamically adjusts the number of sample generations by considering both the output answer and the RPs from Chain of Thought (CoT) prompting. RASC assigns confidence scores sequentially to the generated samples, stops when certain criteria are met, and then employs weighted majority voting to optimize sample usage and enhance answer reliability. We comprehensively test RASC with multiple LLMs across varied QA datasets. RASC outperforms existing methods and reduces sample usage by an average of 80% while maintaining or improving accuracy by up to 5% compared to the original SC.
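For orientation, the baseline SC procedure the abstract describes (sample several times, return the most frequent answer) can be sketched as follows. This is a minimal illustration, not code from the paper: `self_consistency` and `sample_answer` are hypothetical names, and the canned answers stand in for real LLM calls.

```python
from collections import Counter
from typing import Callable, List


def self_consistency(sample_answer: Callable[[], str], num_samples: int) -> str:
    """Draw num_samples answers and return the most frequent one."""
    answers: List[str] = [sample_answer() for _ in range(num_samples)]
    # most_common(1) returns a one-element list of (answer, count) pairs.
    return Counter(answers).most_common(1)[0][0]


# Hypothetical stand-in for an LLM: replays a fixed list of final answers.
_canned = iter(["42", "41", "42", "42", "17"])
result = self_consistency(lambda: next(_canned), num_samples=5)
print(result)
```

Every call to `sample_answer` costs one full generation, which is why the cost of plain SC grows linearly with `num_samples`.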

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper aims to address the hallucination problem in large language models (LLMs) when generating reasoning paths (RPs). Existing self-consistency (SC) methods reduce hallucinations by sampling multiple times, but this approach incurs significant computational costs. Although some early-stopping strategies (such as Early-Stopping Self-Consistency and Adaptive Consistency) attempt to reduce these costs, they focus mainly on output consistency and neglect the quality of the reasoning paths.

To overcome this issue, the authors propose the **Reasoning-Aware Self-Consistency (RASC)** framework. RASC dynamically adjusts the number of samples and optimizes sample usage by considering both the quality of the reasoning paths and output consistency, thereby reducing computational costs while improving the reliability of the answers. Specifically, RASC assigns a confidence score to each generated sample, stops sampling when certain conditions are met, and determines the final answer through weighted majority voting.

### Main Contributions

1. **Reasoning-Aware Self-Consistency (RASC) Framework**: An innovative early-stopping framework that dynamically adjusts the number of sample evaluations by considering the quality of the reasoning paths and output consistency.
2. **Confidence Score Approximator**: Seven lightweight text metrics are designed to evaluate content quality and consistency, combined with a weighted majority voting system to enhance sampling efficiency and accuracy.
3. **Robustness Evaluation**: The effectiveness and robustness of RASC are validated on different LLMs and datasets, achieving significant improvements in efficiency and accuracy.

### Experimental Results

- **Reduction in Sample Quantity**: RASC significantly reduces the required number of samples in multiple benchmarks, with an average reduction of 80%, while maintaining or improving accuracy.
- **Cost Analysis**: Experiments on GPT-3.5-Turbo show that RASC reduces API costs by 84.6% while improving accuracy by 7.9%.
- **Generalization Performance**: RASC also performs well on unseen datasets, demonstrating its adaptability to different language models and diverse prompts.

### Conclusion

By combining the quality of reasoning paths and output consistency, RASC effectively reduces the computational costs of self-consistency methods while improving the reliability of answers. This approach shows superior performance across various tasks and datasets, providing a new solution for efficient reasoning in large language models.
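The score-then-stop-then-vote loop summarized above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the summary only says sampling stops "when certain conditions are met", so the accumulated-weight threshold used here is a hypothetical stand-in, and `toy_score` is a placeholder word-count heuristic rather than the paper's seven text metrics. The names `reasoning_aware_vote`, `score_fn`, `stop_threshold`, and `max_samples` are likewise invented for the example.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple


def reasoning_aware_vote(
    samples: Iterable[Tuple[str, str]],      # stream of (reasoning_path, answer)
    score_fn: Callable[[str, str], float],   # hypothetical confidence scorer
    stop_threshold: float = 2.0,             # assumed stopping rule, not from the paper
    max_samples: int = 40,                   # assumed sampling budget
) -> Tuple[str, int]:
    """Return (weighted-majority answer, number of samples consumed)."""
    weights: Dict[str, float] = defaultdict(float)
    used = 0
    for reasoning, answer in samples:
        used += 1
        # Each sample votes with its confidence score instead of a flat 1.
        weights[answer] += score_fn(reasoning, answer)
        # Stop early once one answer has gathered enough weighted support.
        if max(weights.values()) >= stop_threshold or used >= max_samples:
            break
    if not weights:
        raise ValueError("no samples were provided")
    return max(weights, key=weights.get), used


def toy_score(reasoning: str, answer: str) -> float:
    """Placeholder: longer reasoning earns more confidence, capped at 1.0."""
    return min(1.0, len(reasoning.split()) / 10)


stream = [
    ("step one then step two so the total is twelve apples", "12"),
    ("guess", "7"),
    ("add four and eight carefully to get a sum of twelve", "12"),
    ("this sample is never requested", "12"),
]
answer, used = reasoning_aware_vote(iter(stream), toy_score)
print(answer, used)
```

Because `samples` is consumed lazily, a generator that calls the LLM on demand would only pay for the samples drawn before the stopping condition fires, which is where the savings over fixed-budget SC would come from.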

Dynamic Self-Consistency: Leveraging Reasoning Paths for Efficient LLM Sampling

Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning and Coding with LLMs

Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning

Make Every Penny Count: Difficulty-Adaptive Self-Consistency for Cost-Efficient Reasoning

Path-Consistency: Prefix Enhancement for Efficient Inference in LLM

Integrate the Essence and Eliminate the Dross: Fine-Grained Self-Consistency for Free-Form Language Generation

Enhancing Language Model Reasoning via Weighted Reasoning in Self-Consistency

Soft Self-Consistency Improves Language Model Agents

Universal Self-Consistency for Large Language Model Generation

Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation

Self-Training Meets Consistency: Improving LLMs' Reasoning With Consistency-Driven Rationale Evaluation

Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling

RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought

DCR-Consistency: Divide-Conquer-Reasoning for Consistency Evaluation and Improvement of Large Language Models

Improving Retrieval Augmented Language Model with Self-Reasoning

CSCE: Boosting LLM Reasoning by Simultaneous Enhancing of Casual Significance and Consistency

Expanding Search Space with Diverse Prompting Agents: An Efficient Sampling Approach for LLM Mathematical Reasoning

Learning to Reason via Self-Iterative Process Feedback for Small Language Models

Improving Self Consistency in LLMs through Probabilistic Tokenization

Lachesis: Predicting LLM Inference Accuracy using Structural Properties of Reasoning Paths

Atomic Self-Consistency for Better Long Form Generations