Automated Theorem Provers Help Improve Large Language Model Reasoning

Lachlan McGinness, Peter Baumgartner
DOI: https://doi.org/10.29007/2n9m
2024-08-07
Abstract: In this paper we demonstrate how logic programming systems and Automated first-order logic Theorem Provers (ATPs) can improve the accuracy of Large Language Models (LLMs) for logical reasoning tasks where the baseline performance is given by direct LLM solutions. We first evaluate LLM reasoning on steamroller problems using the PRONTOQA benchmark. We show how accuracy can be improved with a neuro-symbolic architecture where the LLM acts solely as a front-end for translating a given problem into a formal logic language and an automated reasoning engine is called for solving it. However, this approach critically hinges on the correctness of the LLM translation. To assess this translation correctness, we secondly define a framework of syntactic and semantic error categories. We implemented the framework and used it to identify errors that LLMs make in the benchmark domain. Based on these findings, we thirdly extended our method with capabilities for automatically correcting syntactic and semantic errors. For semantic error correction we integrate first-order logic ATPs, which is our main and novel contribution. We demonstrate that this approach reduces semantic errors significantly and further increases the accuracy of LLM logical reasoning.
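The neuro-symbolic architecture in the abstract (LLM as translator, reasoning engine as solver) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the Prolog-like clause syntax with a leading `~` for negated predicates is hypothetical, the LLM's output is represented by a hard-coded program, and the functions `parse`, `forward_chain`, and `answer` are illustrative names. The paper calls logic programming systems and first-order ATPs at the solving step; the tiny forward chainer below only handles single-premise, single-variable rules of the kind found in PRONTOQA-style chains.

```python
import re

# Hypothetical clause syntax for the LLM's translation (an assumption, not the
# paper's format): facts "cat(fae)." and rules "carnivore(X) :- cat(X).",
# where a leading "~" marks a negated predicate.
ATOM = r"(~?[a-z_]+)\((\w+)\)"
FACT_RE = re.compile(rf"^{ATOM}\.$")
RULE_RE = re.compile(rf"^{ATOM}\s*:-\s*{ATOM}\.$")


def parse(program: str):
    """Split a translated program into ground facts and single-premise rules."""
    facts, rules = set(), []
    for line in program.splitlines():
        line = line.strip()
        if not line:
            continue
        if m := RULE_RE.match(line):
            head, _, body, _ = m.groups()
            rules.append((body, head))  # for all X: body(X) -> head(X)
        elif m := FACT_RE.match(line):
            facts.add(m.groups())  # (predicate, constant)
        else:
            # An unparsable translation surfaces here as a syntactic error.
            raise SyntaxError(f"cannot parse clause: {line!r}")
    return facts, rules


def negate(pred: str) -> str:
    """Flip the polarity of a predicate name: "p" <-> "~p"."""
    return pred[1:] if pred.startswith("~") else "~" + pred


def forward_chain(facts, rules):
    """Apply the rules until no new (predicate, constant) facts appear."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            for pred, const in list(derived):
                if pred == body and (head, const) not in derived:
                    derived.add((head, const))
                    changed = True
    return derived


def answer(program: str, query: str):
    """True if the query is derived, False if its negation is, else None."""
    facts, rules = parse(program)
    q = FACT_RE.match(query.strip())
    if q is None:
        raise SyntaxError(f"cannot parse query: {query!r}")
    pred, const = q.groups()
    model = forward_chain(facts, rules)
    if (pred, const) in model:
        return True
    if (negate(pred), const) in model:
        return False
    return None


# Stand-in for the LLM front-end's output on a made-up PRONTOQA-style problem:
# "Every cat is a carnivore. Every carnivore is not herbivorous. Fae is a cat.
#  True or false: Fae is not herbivorous."
program = """
cat(fae).
carnivore(X) :- cat(X).
~herbivorous(X) :- carnivore(X).
"""
print(answer(program, "~herbivorous(fae)."))  # True
```

Returning `None` when neither the query nor its negation is derived keeps "unknown" distinct from "false". Forward chaining is enough here only because these problems are chains of simple rules over one individual; the first-order ATPs the paper integrates handle far more general formulas.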
Artificial Intelligence, Computation and Language
What problem does this paper attempt to address?
The paper addresses the limited accuracy of large language models (LLMs) on logical reasoning tasks. Specifically, the authors investigate how integrating Automated Theorem Provers (ATPs) can improve LLM performance on these tasks.

### Main Objectives:

1. **Improve Accuracy**: Increase the accuracy of LLMs on logical reasoning tasks by combining them with automated theorem provers.
2. **Error Classification**: Define a framework that classifies the errors LLMs make when translating natural language problems into formal logic, and use it to evaluate those errors.
3. **Automatic Error Correction**: Propose a method that automatically detects and corrects the syntactic and semantic errors LLMs introduce during translation.

### Method Overview:

- **Neuro-Symbolic Architecture**: The LLM acts as the front-end, translating natural language problems into formal logic representations that an automated reasoning engine then solves.
- **Error Classification Framework**: Syntactic errors (such as symbol errors and natural language errors) are distinguished from semantic errors, which are further divided into shallow and deep semantic errors.
- **SEDAC Algorithm**: Automatically detects and corrects these errors: syntactic errors are fixed automatically, while semantic errors are classified and partially corrected automatically.

### Experimental Results:

- Using an automated reasoning engine improved the accuracy of LLMs on logical reasoning tasks, and adding ATP-based semantic error correction reduced semantic errors significantly and increased accuracy further.
- Beyond raising accuracy, error classification revealed which kinds of translation errors LLMs make, which informs further improvements to the method.

Through this approach, the authors show how LLM performance on logical reasoning tasks can be improved, and they offer new directions and techniques for further research.
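The syntactic side of the error-correction step can be illustrated with a small detect-and-repair pass. This sketch is hypothetical and is not the paper's SEDAC algorithm: it assumes a Prolog-like clause syntax with `~` for negation (e.g. `~herbivorous(X) :- carnivore(X).`), treats any line without parentheses as a natural language error and drops it, rewrites a few assumed symbol variants (`¬`, `not `, `←`, `<-`) as symbol errors, and adds a "missing terminator" category of its own. Semantic errors, which the paper addresses with first-order ATPs, are outside its scope.

```python
import re
from enum import Enum


class SyntacticErrorKind(Enum):
    """Illustrative syntactic error categories for an LLM's logic translation."""
    NATURAL_LANGUAGE = "natural language error"  # prose mixed into the formal output
    SYMBOL = "symbol error"  # wrong connective symbols, e.g. a NOT sign instead of "~"
    MISSING_TERMINATOR = "missing terminator"  # hypothetical category: no final "."


# Hypothetical target syntax: a fact "p(c)." or a rule "h(X) :- b(X).",
# where a leading "~" marks a negated predicate.
CLAUSE_RE = re.compile(r"^~?[a-z_]+\(\w+\)(\s*:-\s*~?[a-z_]+\(\w+\))?\.$")

# Assumed symbol variants an LLM might emit, mapped to the target syntax.
SYMBOL_FIXES = {
    "\u00ac": "~",   # logical NOT sign
    "\u2190": ":-",  # leftwards arrow
    "<-": ":-",
}


def repair(llm_output: str):
    """Detect and repair syntactic errors line by line.

    Returns (clauses, log, unrepaired): the well-formed clauses, a list of
    (error kind, original line) pairs, and lines that still fail to parse.
    """
    clauses, log, unrepaired = [], [], []
    for raw in llm_output.splitlines():
        line = raw.strip()
        if not line:
            continue
        if CLAUSE_RE.match(line):
            clauses.append(line)  # already well-formed
            continue
        if "(" not in line:
            # No atoms at all: treat as natural language and drop the line.
            log.append((SyntacticErrorKind.NATURAL_LANGUAGE, raw))
            continue
        fixed = line
        for bad, good in SYMBOL_FIXES.items():
            fixed = fixed.replace(bad, good)
        fixed = re.sub(r"\bnot\s+", "~", fixed)  # "not p(X)" -> "~p(X)"
        if fixed != line:
            log.append((SyntacticErrorKind.SYMBOL, raw))
        if not fixed.endswith("."):
            fixed += "."
            log.append((SyntacticErrorKind.MISSING_TERMINATOR, raw))
        if CLAUSE_RE.match(fixed):
            clauses.append(fixed)
        else:
            unrepaired.append(raw)  # left for manual inspection
    return clauses, log, unrepaired
```

Logging each repair next to the original line keeps the correction auditable. That matters because a syntactic fix can itself change what a clause means, and such meaning changes are exactly the semantic errors a pass like this cannot detect.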