Abstract:

Background: Cognitive biases in clinical decision-making contribute significantly to diagnostic errors and suboptimal patient outcomes, and addressing them remains a formidable challenge in medicine. This study explores whether large language models (LLMs) can mitigate these biases within a multi-agent framework. We simulate clinical decision-making through multi-agent conversation and evaluate whether it improves diagnostic accuracy.

Methods: Sixteen published and unpublished case reports in which cognitive biases resulted in misdiagnosis were identified from the literature. In the multi-agent system, we used GPT-4 Turbo to drive interactions among four simulated agents that replicate clinical team dynamics. Each agent had a distinct role: 1) making the initial diagnosis and, after considering the discussion, the final diagnosis; 2) acting as devil's advocate to correct confirmation and anchoring bias; 3) tutoring and facilitating the discussion to reduce premature closure bias; and 4) recording and summarizing the findings. A total of 80 simulations were evaluated for the accuracy of the initial diagnosis, the top differential diagnosis, and the final two differential diagnoses.

Findings: Across the 80 responses, the initial diagnosis had an accuracy of 0% (0/80). Following multi-agent discussion, accuracy rose to 71.3% (57/80) for the top differential diagnosis and to 80.0% (64/80) for the final two differential diagnoses. The system was able to reevaluate and correct misconceptions, even in scenarios with misleading initial investigations.

Interpretation: The LLM-driven multi-agent conversation system shows promise for enhancing diagnostic accuracy in diagnostically challenging medical scenarios.
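The four-role setup described in the Methods can be illustrated with a minimal sketch. This is not the study's implementation: the role prompts paraphrase the abstract, the turn order (diagnostician, then devil's advocate and facilitator, then recorder, then diagnostician again) is an assumption, and `run_case`, `ROLES`, and the `llm` callable are hypothetical names. The model call is left as a pluggable function so that any chat model can be substituted.

```python
from dataclasses import dataclass
from typing import Callable, List

# Role prompts paraphrase the four roles in the abstract. The exact prompts
# used in the study are not given, so these strings are assumptions.
ROLES = {
    "diagnostician": "Make the initial diagnosis, then a final diagnosis "
                     "after considering the discussion.",
    "devils_advocate": "Challenge the working diagnosis to counter "
                       "confirmation and anchoring bias.",
    "facilitator": "Guide the discussion and keep alternatives open to "
                   "reduce premature closure.",
    "recorder": "Record and summarize the findings.",
}


@dataclass
class Turn:
    role: str
    text: str


# Hypothetical interface: (system_prompt, transcript_so_far) -> reply text.
LLM = Callable[[str, str], str]


def run_case(case: str, llm: LLM, rounds: int = 1) -> List[Turn]:
    """Run one simulated team discussion and return the full transcript."""
    transcript = [Turn("case", case)]

    def speak(role: str) -> None:
        # Each agent sees the whole conversation so far plus its own role prompt.
        history = "\n".join(f"{t.role}: {t.text}" for t in transcript)
        transcript.append(Turn(role, llm(ROLES[role], history)))

    speak("diagnostician")          # initial diagnosis
    for _ in range(rounds):         # assumed order of the discussion turns
        speak("devils_advocate")
        speak("facilitator")
    speak("recorder")               # summary of the discussion
    speak("diagnostician")          # final diagnosis after the discussion
    return transcript


if __name__ == "__main__":
    # Stand-in model so the sketch runs without any external service.
    def echo_llm(system_prompt: str, history: str) -> str:
        return f"(reply guided by: {system_prompt})"

    for turn in run_case("Example case vignette", echo_llm):
        print(f"{turn.role}: {turn.text}")
```

Keeping the model behind a plain callable means the same loop can be run repeatedly per case, which is how a fixed set of cases could yield a larger number of simulations; the number of repetitions per case is not stated in the abstract.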

MindScope: Exploring cognitive biases in large language models through Multi-Agent Systems

Mitigating Social Bias in Large Language Models: A Multi-Objective Approach within a Multi-Agent Framework

Cognitive Bias in Decision-Making with LLMs

Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate

FairMonitor: A Dual-framework for Detecting Stereotypes and Biases in Large Language Models

Mind vs. Mouth: On Measuring Re-judge Inconsistency of Social Bias in Large Language Models

Enhancing Diagnostic Accuracy through Multi-Agent Conversations: Using Large Language Models to Mitigate Cognitive Bias

A Comprehensive Evaluation of Cognitive Biases in LLMs

Cognitive Biases in Large Language Models: A Survey and Mitigation Experiments

Balancing Rigor and Utility: Mitigating Cognitive Biases in Large Language Models for Multiple-Choice Questions

Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models

MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration

Benchmarking Cognitive Biases in Large Language Models as Evaluators

Quantifying Bias in Agentic Large Language Models: A Benchmarking Approach

Unveiling and Mitigating Bias in Mental Health Analysis with Large Language Models

BiasAlert: A Plug-and-play Tool for Social Bias Detection in LLMs

Exploring Prosocial Irrationality for LLM Agents: A Social Cognition View

CBEval: A framework for evaluating and interpreting cognitive biases in LLMs

Towards Implicit Bias Detection and Mitigation in Multi-Agent LLM Interactions

Metacognitive Myopia in Large Language Models