Evaluating Large Language Models with NeuBAROCO: Syllogistic Reasoning Ability and Human-like Biases

Risako Ando, Takanobu Morishita, Hirohiko Abe, Koji Mineshima, Mitsuhiro Okada

2023-06-22

Abstract: This paper investigates whether current large language models exhibit biases in logical reasoning, similar to humans. Specifically, we focus on syllogistic reasoning, a well-studied form of inference in the cognitive science of human deduction. To facilitate our analysis, we introduce a dataset called NeuBAROCO, originally designed for psychological experiments that assess human logical abilities in syllogistic reasoning. The dataset consists of syllogistic inferences in both English and Japanese. We examine three types of biases observed in human syllogistic reasoning: belief biases, conversion errors, and atmosphere effects. Our findings demonstrate that current large language models struggle more with problems involving these three types of biases.

Computation and Language, Artificial Intelligence

What problem does this paper attempt to address?

This paper asks whether current large language models exhibit human-like biases in logical reasoning. It focuses on syllogistic reasoning, a form of inference that has been widely studied in the cognitive science of human deduction. For the analysis, the authors introduce a dataset named NeuBAROCO, which was originally designed for psychological experiments that assess human logical abilities in syllogistic reasoning. The dataset contains syllogistic inferences in English and Japanese. The study examines three types of bias observed in human syllogistic reasoning: belief bias, conversion errors, and atmosphere effects. It finds that current large language models have more difficulty with problems involving these three biases.

The main contributions of the paper are:

1. Proposing the NeuBAROCO dataset, designed specifically for syllogistic reasoning, as a resource for evaluating human-like biases in language models.
2. Using this dataset to evaluate the logical reasoning abilities of several recent large language models in English and Japanese.
3. Showing that current large language models have significant deficiencies on problems that are likely to elicit the three biases above.

Through this work, the authors aim to better understand how large language models perform in logical reasoning and how their reasoning differs from human reasoning.
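To make the notion of a logically valid syllogism concrete, here is a minimal Python sketch that checks validity by brute force. It is an illustration only and is not taken from the paper: the names `holds` and `is_valid` are hypothetical, the example syllogisms are made up rather than drawn from NeuBAROCO, and the checker assumes the modern reading of quantifiers, in which "All A are B" is true when nothing is A. The idea is that a three-term syllogism is valid exactly when no arrangement of inhabited Venn-diagram regions makes every premise true and the conclusion false.

```python
from itertools import product

TERMS = ("A", "B", "C")
# A Venn region is a triple of booleans: membership in A, B, and C.
REGIONS = list(product([False, True], repeat=3))


def holds(statement, inhabited):
    """Evaluate one categorical statement against a set of inhabited regions."""
    quantifier, x, y = statement
    i, j = TERMS.index(x), TERMS.index(y)
    some_x_is_y = any(r[i] and r[j] for r in inhabited)
    some_x_is_not_y = any(r[i] and not r[j] for r in inhabited)
    if quantifier == "all":
        return not some_x_is_not_y
    if quantifier == "no":
        return not some_x_is_y
    if quantifier == "some":
        return some_x_is_y
    if quantifier == "some_not":
        return some_x_is_not_y
    raise ValueError(f"unknown quantifier: {quantifier}")


def is_valid(premises, conclusion):
    """Return True if no model satisfies the premises but falsifies the conclusion."""
    for mask in product([False, True], repeat=len(REGIONS)):
        inhabited = [r for r, keep in zip(REGIONS, mask) if keep]
        if all(holds(p, inhabited) for p in premises) and not holds(conclusion, inhabited):
            return False  # counterexample found
    return True


# A valid form: All A are B; all B are C; therefore all A are C.
print(is_valid([("all", "A", "B"), ("all", "B", "C")], ("all", "A", "C")))  # True

# Conversion error: "All A are B" does not license "All B are A".
print(is_valid([("all", "A", "B")], ("all", "B", "A")))  # False

# Atmosphere effect: two "some" premises invite a "some" conclusion, but it does not follow.
print(is_valid([("some", "A", "B"), ("some", "B", "C")], ("some", "A", "C")))  # False
```

The checker looks only at logical form, never at what the terms mean. Belief bias concerns exactly that gap: a conclusion that sounds believable can come from an invalid form, and a valid form can yield a conclusion that sounds implausible.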

Exploring Reasoning Biases in Large Language Models Through Syllogism: Insights from the NeuBAROCO Dataset

Concise and Organized Perception Facilitates Large Language Models for Deductive Reasoning

A Systematic Comparison of Syllogistic Reasoning in Humans and Language Models

A Systematic Analysis of Large Language Models as Soft Reasoners: The Case of Syllogistic Inferences

(Ir)rationality and Cognitive Biases in Large Language Models

LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models

Comparing Inferential Strategies of Humans and Large Language Models in Deductive Reasoning

Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey

Cognitive Biases in Large Language Models: A Survey and Mitigation Experiments

Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond

Language models show human-like content effects on reasoning tasks

Analyzing Social Biases in Japanese Large Language Models

Studying and improving reasoning in humans and machines

Balancing Rigor and Utility: Mitigating Cognitive Biases in Large Language Models for Multiple-Choice Questions

SylloBio-NLI: Evaluating Large Language Models on Biomedical Syllogistic Reasoning

Reliable Reasoning Beyond Natural Language

Birth defects among children born to a population occupationally exposed to pesticides in Colombia

A Study on the Representativeness Heuristics Problem in Large Language Models

Reasoning Beyond Bias: A Study on Counterfactual Prompting and Chain of Thought Reasoning