Abstract: Large Language Models are increasingly used to build agents to perform more complex tasks. As LLMs perform more complicated reasoning through longer interactions, self-consistency, i.e., the idea that the answer obtained from sampling and marginalising a number of multiple independent inferences is more likely to be correct, has received much attention as a simple validation technique. This paper aims to empirically verify this intuitive hypothesis by predicting the correctness of answers obtained using self-consistency from properties of the samples of reasoning paths. We introduce Lachesis, a predictive model for self-consistency based LLM inferences, and empirically evaluate it using AutoFL, a recently proposed LLM-based fault localisation technique, as the target technique that uses self-consistency. Lachesis converts collected reasoning paths from AutoFL using specifically designed reasoning path representations, and trains LSTM and GCN models to predict whether a given set of reasoning paths would result in a correct answer. The results suggest that Lachesis can predict the correctness of answers with a precision of up to 0.8136, highlighting the possibility of training a predictive model that can allow early termination of inferences that are not likely to be successful.

What problem does this paper attempt to address?

The core problem this paper addresses is: **how to predict the correctness of self-consistency results from the structural characteristics of the reasoning paths produced by large language models (LLMs)**. Specifically, the authors aim to predict, early in the inference process, whether the final answer will be correct by analysing the structure of the reasoning paths, thereby reducing unnecessary computational cost and resource consumption.

### Detailed Explanation

1. **Background and Motivation**
   - As LLMs are used more widely, especially for complex tasks, self-consistency has received extensive attention as a validation technique. Its basic idea is to run multiple independent inferences and aggregate the results: if several reasoning paths converge on the same answer, that answer is more likely to be correct.
   - However, self-consistency requires multiple queries to the LLM, which is computationally costly and may also have a negative impact on the environment.
2. **Research Question**
   - The question raised in the paper is: can the outcome of self-consistency be predicted from the structural characteristics of the reasoning paths before the LLM finishes generating its answers? If so, inferences that are unlikely to succeed could be terminated early, reducing computational cost.
3. **Solution**
   - To this end, the authors introduce Lachesis, a model that predicts the correctness of self-consistency results from the structural characteristics of the reasoning paths.
   - Lachesis represents the reasoning paths in two ways, the LLM Inference Matrix (LIM) and the LLM Inference Graph (LIG), and uses LSTM and GCN models for prediction.
4. **Experimental Verification**
   - The authors use AutoFL, an LLM-based fault localisation technique, as the target technique to evaluate Lachesis.
   - The experimental results show that Lachesis can predict whether a set of reasoning paths will lead to a correct answer with a precision of up to 0.8136.

### Formula Summary

- Self-consistency score formula: \[ \text{confidence}=\max_{m\in M}\text{score}(m) \] where \(M\) is the set of methods covered by failed tests and \(\text{score}(m)\) is the voting score of method \(m\).

Taken together, the paper shows how the structural characteristics of reasoning paths can be used to predict the correctness of LLM inference results, which in turn can reduce resource usage and improve efficiency.
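To make the confidence formula in the summary concrete, here is a minimal Python sketch. The summary only says that \(\text{score}(m)\) is a voting score, so the voting rule used here is an assumption: each run casts one vote, split equally among the candidate methods it names, and the votes are averaged over all runs. The function names `vote_scores` and `confidence`, and the method names in the usage note, are hypothetical and do not come from the paper or from AutoFL.

```python
def vote_scores(runs, candidates):
    """Return a voting score for every candidate method.

    Assumed voting rule (not confirmed by the source): each run casts one
    vote, split equally among the candidate methods it names, and the
    scores are averaged over all runs.
    """
    scores = {m: 0.0 for m in candidates}
    if not runs:
        return scores
    for predicted in runs:
        # Drop duplicates and anything outside the candidate set M.
        valid = [m for m in dict.fromkeys(predicted) if m in scores]
        for m in valid:
            scores[m] += 1.0 / len(valid)
    return {m: s / len(runs) for m, s in scores.items()}


def confidence(scores):
    """confidence = max over m in M of score(m); 0.0 if M is empty."""
    return max(scores.values(), default=0.0)
```

For example, with four runs over the hypothetical candidates `A.foo`, `B.bar` and `C.baz`, where the runs name `["A.foo"]`, `["A.foo", "B.bar"]`, `["B.bar"]` and `["A.foo"]`, the scores under this assumed rule are 0.625, 0.375 and 0.0, so the confidence is 0.625.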

Lachesis: Predicting LLM Inference Accuracy using Structural Properties of Reasoning Paths
