Abstract: In this paper we demonstrate how logic programming systems and Automated first-order logic Theorem Provers (ATPs) can improve the accuracy of Large Language Models (LLMs) for logical reasoning tasks where the baseline performance is given by direct LLM solutions. We first evaluate LLM reasoning on steamroller problems using the PRONTOQA benchmark. We show how accuracy can be improved with a neuro-symbolic architecture where the LLM acts solely as a front-end for translating a given problem into a formal logic language and an automated reasoning engine is called for solving it. However, this approach critically hinges on the correctness of the LLM translation. To assess this translation correctness, we secondly define a framework of syntactic and semantic error categories. We implemented the framework and used it to identify errors that LLMs make in the benchmark domain. Based on these findings, we thirdly extended our method with capabilities for automatically correcting syntactic and semantic errors. For semantic error correction we integrate first-order logic ATPs, which is our main and novel contribution. We demonstrate that this approach reduces semantic errors significantly and further increases the accuracy of LLM logical reasoning.

What problem does this paper attempt to address?

The paper aims to address the issue of insufficient accuracy of large language models (LLMs) in logical reasoning tasks. Specifically, the authors explore how to improve the performance of LLMs in logical reasoning tasks by integrating Automated Theorem Provers (ATPs).

### Main Objectives:

1. **Improve Accuracy**: Enhance the accuracy of LLMs in logical reasoning tasks by combining them with automated theorem provers.
2. **Error Classification**: Define a framework to classify various errors made by LLMs when translating natural language problems into formal logic and evaluate these errors.
3. **Automatic Error Correction**: Propose a method to automatically detect and correct syntactic and semantic errors generated by LLMs during the translation process.

### Method Overview:

- **Neuro-Symbolic Architecture**: LLMs act as the front-end, responsible for translating natural language problems into formal logic representations, which are then solved by an automated reasoning engine.
- **Error Classification Framework**: Define syntactic errors (such as symbol errors, natural language errors, etc.) and semantic errors (shallow semantic errors and deep semantic errors).
- **SEDAC Algorithm**: Used to automatically detect and correct these errors, including automatic fixing of syntactic errors and classification and partial automatic correction of semantic errors.

### Experimental Results:

- The use of automated theorem provers significantly improved the accuracy of LLMs in logical reasoning tasks, especially when combined with automatic error correction algorithms.
- Error classification and automatic error correction techniques not only improved accuracy but also provided valuable feedback for subsequent improvements.

Through this approach, the authors demonstrate how to effectively enhance the performance of LLMs in logical reasoning tasks and provide new ideas and technical means for further research.
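The neuro-symbolic architecture summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation and not the SEDAC algorithm. The rule syntax (`cat(X) -> mammal(X)`, with `-` for negation), the function names `parse`, `saturate` and `solve`, and the hard-coded `translation` list standing in for LLM output are all assumptions made for illustration. A tiny forward-chaining loop plays the role of the automated reasoning engine, and lines that fit neither pattern are flagged as syntactic errors, loosely mirroring the "natural language error" category.

```python
import re

# Hypothetical formal syntax for this sketch only (not the paper's format):
#   rule: "cat(X) -> mammal(X)"    fact: "cat(tom)"    negation: "-p(...)"
RULE_RE = re.compile(r"^\s*(-?)(\w+)\(X\)\s*->\s*(-?)(\w+)\(X\)\s*$")
FACT_RE = re.compile(r"^\s*(-?)(\w+)\((\w+)\)\s*$")


def parse(lines):
    """Split translated lines into facts, rules, and unparseable lines."""
    facts, rules, errors = set(), [], []
    for line in lines:
        if m := RULE_RE.match(line):
            # A literal is (is_positive, predicate_name).
            rules.append(((m[1] == "", m[2]), (m[3] == "", m[4])))
        elif m := FACT_RE.match(line):
            facts.add((m[1] == "", m[2], m[3]))
        else:
            errors.append(line)  # e.g. leftover natural language
    return facts, rules, errors


def saturate(facts, rules):
    """Apply every rule to every known fact until nothing new is derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, (head_pos, head_pred) in rules:
            for pos, pred, const in list(derived):
                new_fact = (head_pos, head_pred, const)
                if (pos, pred) == body and new_fact not in derived:
                    derived.add(new_fact)
                    changed = True
    return derived


def solve(translation, query):
    """Return "True", "False", or "Unknown" for a ground query atom."""
    facts, rules, errors = parse(translation)
    if errors:
        raise ValueError(f"syntactic errors in translation: {errors}")
    m = FACT_RE.match(query)
    if m is None:
        raise ValueError(f"malformed query: {query!r}")
    pos, pred, const = m[1] == "", m[2], m[3]
    derived = saturate(facts, rules)
    if (pos, pred, const) in derived:
        return "True"
    if (not pos, pred, const) in derived:
        return "False"
    return "Unknown"


# Stand-in for the LLM front-end's output on a PRONTOQA-style problem
# (assumed example, not taken from the benchmark).
translation = [
    "cat(X) -> carnivore(X)",
    "carnivore(X) -> mammal(X)",
    "mammal(X) -> -cold_blooded(X)",
    "cat(tom)",
]
```

Calling `solve(translation, "mammal(tom)")` chains two rules from the single fact, while a translation that still contains a sentence such as `"All cats are carnivores"` is rejected before any reasoning takes place. A real system would hand the translated problem to a logic programming system or first-order ATP instead of this toy loop, and detecting semantic errors requires more than pattern matching.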

Automated Theorem Provers Help Improve Large Language Model Reasoning

Concise and Organized Perception Facilitates Large Language Models for Deductive Reasoning

Steamroller Problems: An Evaluation of LLM Reasoning Capability with Automated Theorem Prover Strategies

Proof Automation with Large Language Models

Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning

Proof of Thought: Neurosymbolic Program Synthesis allows Robust and Interpretable Reasoning

LogicPro: Improving Complex Logical Reasoning via Program-Guided Learning

When Do Program-of-Thought Works for Reasoning?

Coupling Large Language Models with Logic Programming for Robust and General Reasoning from Text

Reliable Reasoning Beyond Natural Language

Enhancing Logical Reasoning in Large Language Models to Facilitate Legal Applications

Towards Large Language Models as Copilots for Theorem Proving in Lean

Strategies for Improving NL-to-FOL Translation with LLMs: Data Generation, Incremental Fine-Tuning, and Verification

Large Language Models as an Indirect Reasoner: Contrapositive and Contradiction for Automated Reasoning

LLM-ARC: Enhancing LLMs with an Automated Reasoning Critic

LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models

LogicAsker: Evaluating and Improving the Logical Reasoning Ability of Large Language Models

Automated Theorem Proving in Intuitionistic Propositional Logic by Deep Reinforcement Learning

Towards Logically Consistent Language Models via Probabilistic Reasoning

Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification