Lachesis: Predicting LLM Inference Accuracy using Structural Properties of Reasoning Paths

Naryeong Kim, Sungmin Kang, Gabin An, Shin Yoo
2024-12-11
Abstract: Large Language Models are increasingly used to build agents to perform more complex tasks. As LLMs perform more complicated reasoning through longer interactions, self-consistency, i.e., the idea that the answer obtained from sampling and marginalising a number of multiple independent inferences is more likely to be correct, has received much attention as a simple validation technique. This paper aims to empirically verify this intuitive hypothesis by predicting the correctness of answers obtained using self-consistency from properties of the samples of reasoning paths. We introduce Lachesis, a predictive model for self-consistency based LLM inferences, and empirically evaluate it using AutoFL, a recently proposed LLM-based fault localisation technique, as the target technique that uses self-consistency. Lachesis converts collected reasoning paths from AutoFL using specifically designed reasoning path representations, and trains LSTM and GCN models to predict whether a given set of reasoning paths would result in a correct answer. The results suggest that Lachesis can predict the correctness of answers with a precision of up to 0.8136, highlighting the possibility of training a predictive model that can allow early termination of inferences that are not likely to be successful.
Software Engineering
What problem does this paper attempt to address?
The core problem this paper addresses is: **how to predict the correctness of self-consistency results from the structural characteristics of the reasoning paths produced during large language model (LLM) inference**. Specifically, the authors aim to predict, early in the inference process, whether the final answer will be correct, so that computation is not spent on inferences that are unlikely to succeed.

### Detailed Explanation

1. **Background and Motivation**
   - As LLMs are used more widely, especially for complex tasks, self-consistency has received extensive attention as a validation technique. The basic idea is to run multiple independent inferences and aggregate the results: if several reasoning paths converge on the same answer, that answer is more likely to be correct.
   - However, self-consistency requires multiple queries to the LLM, which is computationally costly and may also have a negative impact on the environment.
2. **Research Question**
   - The paper asks: can the outcome of self-consistency based inference be predicted from the structural characteristics of the reasoning paths, before the LLM produces its final answer? If so, inferences that are unlikely to succeed can be terminated early, reducing computational cost.
3. **Solution**
   - To this end, the authors introduce Lachesis, a model that predicts the correctness of self-consistency results from the structural characteristics of the reasoning paths.
   - Lachesis represents reasoning paths in two ways, the LLM Inference Matrix (LIM) and the LLM Inference Graph (LIG), and uses LSTM and GCN models for prediction.
4. **Experimental Verification**
   - The authors use AutoFL, an LLM-based fault localisation technique, as the target technique for evaluating Lachesis.
   - The experimental results show that Lachesis can predict whether a set of reasoning paths will lead to the correct answer with a precision of up to 0.8136.

### Formula Summary

- Self-consistency score formula: \[ \text{confidence}=\max_{m\in M}\text{score}(m) \] where \(M\) is the set of methods covered by failed tests, and \(\text{score}(m)\) is the voting score of method \(m\).

In this way, the paper shows how the structural characteristics of reasoning paths can be used to predict the correctness of LLM inference results, thereby optimizing resource utilization and improving efficiency.
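The confidence formula above can be illustrated with a minimal Python sketch. The summary does not say how \(\text{score}(m)\) is computed, so the voting rule here is an assumption: each inference run carries one vote, split equally among the methods it names, and votes are averaged over all runs. The function names `vote_scores` and `confidence` and the method names in the sample data are hypothetical.

```python
from collections import defaultdict


def vote_scores(runs):
    """Return a voting score for every method named in at least one run.

    Assumed rule (not taken from the summary): each run has one vote,
    split equally among the distinct methods it names, and the votes
    are averaged over the number of runs.
    """
    if not runs:
        return {}
    scores = defaultdict(float)
    for predicted in runs:
        unique = set(predicted)  # ignore duplicate names within one run
        for method in unique:
            scores[method] += 1.0 / len(unique) / len(runs)
    return dict(scores)


def confidence(runs):
    """confidence = max over methods of score(m); 0.0 if nothing was named."""
    return max(vote_scores(runs).values(), default=0.0)


# Four hypothetical runs, each naming the methods it suspects.
runs = [
    ["Foo.bar"],
    ["Foo.bar"],
    ["Foo.bar", "Baz.qux"],
    ["Baz.qux"],
]
scores = vote_scores(runs)
```

Under this assumed rule, methods in \(M\) that no run names have a score of zero, so leaving them out of the dictionary does not change the maximum. A confidence close to 1 means the runs largely agree on a single method, while a low value means the votes are spread across many candidates.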