Abstract: Logical reasoning plays a fundamental role in knowledge engineering and artificial intelligence. Recently, Large Language Models (LLMs) have emerged as a noteworthy innovation in natural language processing (NLP). However, whether LLMs can effectively perform logical reasoning, which requires gradual cognitive inference similar to human intelligence, remains an open question. To bridge this gap, we provide comprehensive evaluations in this paper. First, to offer systematic evaluations, we select fifteen typical logical reasoning datasets and organize them into deductive, inductive, abductive and mixed-form reasoning settings. For comprehensiveness, we include three early-era representative LLMs and four trending LLMs. Second, unlike previous evaluations that rely only on simple metrics (e.g., \emph{accuracy}), we propose fine-grained evaluations in both objective and subjective manners, covering answers and explanations through four metrics: \emph{answer correctness}, \emph{explain correctness}, \emph{explain completeness} and \emph{explain redundancy}. Additionally, to uncover the logical flaws of LLMs, we attribute problematic cases to five error types along two dimensions, i.e., \emph{evidence selection process} and \emph{reasoning process}. Third, to avoid the influence of knowledge bias and concentrate purely on benchmarking the logical reasoning capability of LLMs, we propose a new dataset with neutral content. Based on these in-depth evaluations, this paper finally forms a general evaluation scheme of logical reasoning capability along six dimensions (i.e., \emph{Correct}, \emph{Rigorous}, \emph{Self-aware}, \emph{Active}, \emph{Oriented} and \emph{No hallucination}). The scheme reflects the strengths and weaknesses of LLMs and suggests directions for future work.
