GraphReason: Enhancing Reasoning Capabilities of Large Language Models through A Graph-Based Verification Approach

Lang Cao
2024-04-21
Abstract: Large Language Models (LLMs) have showcased impressive reasoning capabilities, particularly when guided by specifically designed prompts in complex reasoning tasks such as math word problems. These models typically solve tasks using a chain-of-thought approach, which not only bolsters their reasoning abilities but also provides valuable insights into their problem-solving process. However, there is still significant room for enhancing the reasoning abilities of LLMs. Some studies suggest that the integration of an LLM output verifier can boost reasoning accuracy without necessitating additional model training. In this paper, we follow these studies and introduce a novel graph-based method to further augment the reasoning capabilities of LLMs. We posit that multiple solutions to a reasoning task, generated by an LLM, can be represented as a reasoning graph due to the logical connections between intermediate steps from different reasoning paths. Therefore, we propose the Reasoning Graph Verifier (GraphReason) to analyze and verify the solutions generated by LLMs. By evaluating these graphs, models can yield more accurate and reliable results. Our experimental results show that our graph-based verification method not only significantly enhances the reasoning abilities of LLMs but also outperforms existing verifier methods in terms of improving these models' reasoning performance.
Artificial Intelligence
What problem does this paper attempt to address?
This paper asks how to improve the reasoning ability of large language models (LLMs) with a graph-based method, without any additional training of the LLMs themselves. The authors propose the Reasoning Graph Verifier (GraphReason), which analyzes and verifies the solutions an LLM generates, improving accuracy and reliability on complex reasoning tasks such as math word problems.

### Main contributions of the paper

1. **Proposing a graph-based verification method**: GraphReason aims to significantly improve the reasoning ability of LLMs without additional training of the LLMs.
2. **Establishing an arithmetic reasoning benchmark**: Three math word problem datasets are used to measure the baseline reasoning performance of LLMs and to compare existing verifiers fairly.
3. **Experimental results**: The method outperforms other enhancement methods, and the paper also provides an extensive analysis of the limitations and future potential of GraphReason.

### Method overview

1. **Graph construction**: First, group the generated solutions by final answer. Then split each reasoning path into steps and merge intermediate steps that share the same expression into a single node, forming one reasoning graph per final answer.
2. **Graph classification**: Use a Graph Isomorphism Network (GIN) to propagate and aggregate node features, producing a representation of the reasoning graph. In parallel, compute the sum of the scores of all solutions that share the same final answer.
3. **Verifier design**: Train a verifier model that judges whether a final answer is correct from its reasoning graph and the sum of its solution scores.
4. **Answer verification**: At prediction time, use the trained verifier to score each reasoning graph, and select the answer of the highest-scoring graph as the final prediction.

### Experimental results

- **Performance improvement**: GraphReason significantly improves the reasoning ability of gpt-3.5-turbo on three datasets. For example, on the GSM8K dataset, accuracy increases from 72.7% to 85.7%.
- **Outperforming other methods**: Compared with other verifier methods applied to the same LLM outputs, GraphReason achieves state-of-the-art performance on all three datasets.
- **Ablation experiment**: Ablation experiments verify the importance of each component, in particular the solution semantic information provided by the basic verifier and the graph structure.

### Conclusion

GraphReason effectively enhances the reasoning ability of LLMs by introducing a graph-based verification method, especially on complex multi-step reasoning tasks. The method improves the accuracy and reliability of the model and also suggests a new research direction for further improving reasoning ability.
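The graph construction and answer verification steps from the method overview can be illustrated with a small sketch. It is not the authors' implementation. It assumes each solution is a newline-separated list of steps, that two steps count as "the same expression" when they match exactly after whitespace normalization, and that a scoring function is supplied by the caller. The names `split_steps`, `build_reasoning_graphs`, and `select_answer` are hypothetical, and the scorer used in the example is a placeholder (a solution count, which reduces to majority voting) standing in for the trained GIN-based verifier.

```python
from collections import defaultdict
from typing import Callable


def split_steps(solution: str) -> list[str]:
    """Split a solution into steps, one per non-empty line (assumed format)."""
    # Collapse runs of whitespace so trivially different spellings merge.
    return [" ".join(line.split()) for line in solution.splitlines() if line.strip()]


def build_reasoning_graphs(solutions: list[tuple[str, str]]) -> dict[str, dict]:
    """Group (solution_text, final_answer) pairs by answer and build one graph each.

    Identical steps from different solutions become a single node; an edge
    links each step to the step that follows it in some solution.
    """
    graphs: dict[str, dict] = defaultdict(
        lambda: {"nodes": set(), "edges": set(), "n_solutions": 0}
    )
    for text, answer in solutions:
        graph = graphs[answer]
        graph["n_solutions"] += 1
        steps = split_steps(text)
        graph["nodes"].update(steps)
        # Consecutive steps in one reasoning path form directed edges.
        graph["edges"].update(zip(steps, steps[1:]))
    return dict(graphs)


def select_answer(graphs: dict[str, dict], score_graph: Callable[[dict], float]) -> str:
    """Return the final answer whose reasoning graph receives the highest score."""
    return max(graphs, key=lambda answer: score_graph(graphs[answer]))


# Three sampled solutions to one problem; two agree on the answer "17".
solutions = [
    ("3 * 4 = 12\n12 + 5 = 17", "17"),
    ("4 * 3 = 12\n12 + 5 = 17", "17"),
    ("3 + 4 = 7\n7 + 5 = 12", "12"),
]
graphs = build_reasoning_graphs(solutions)

# Placeholder scorer (assumption): counts solutions, i.e. majority voting.
# In the paper this role is played by a trained verifier over the graph.
best = select_answer(graphs, lambda graph: graph["n_solutions"])
```

In this example the two solutions that reach "17" share the node `12 + 5 = 17`, so their graph has three nodes and two edges rather than four separate steps. That shared structure is the signal a graph-level verifier can use beyond simple vote counting.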