The Decoy Dilemma in Online Medical Information Evaluation: A Comparative Study of Credibility Assessments by LLM and Human Judges

Jiqun Liu, Jiangen He

2024-11-23

Abstract: Can AI be cognitively biased in automated information judgment tasks? Despite recent progress in measuring and mitigating social and algorithmic biases in AI and large language models (LLMs), it is not clear to what extent LLMs behave "rationally", or whether they are also vulnerable to human cognitive bias triggers. To address this open problem, our study, consisting of a crowdsourcing user experiment and an LLM-enabled simulation experiment, compared credibility assessments by LLM and human judges under potential decoy effects in an information retrieval (IR) setting. It empirically examined the extent to which LLMs are cognitively biased in COVID-19 medical (mis)information assessment tasks, using traditional human assessors as a baseline. The results, collected from a between-subjects user experiment and an LLM-enabled replication experiment, demonstrate that 1) larger and more recent LLMs tend to show a higher level of consistency and accuracy in distinguishing credible information from misinformation, but they are more likely to give higher ratings to misinformation due to the presence of a more salient decoy misinformation result; and 2) while the decoy effect occurred in both human and LLM assessments, it was more prevalent across conditions and topics in LLM judgments than in human credibility ratings. In contrast to the generally assumed "rationality" of AI tools, our study empirically confirms the cognitive bias risks embedded in LLM agents, evaluates the decoy impact on LLMs against human credibility assessments, and thereby highlights the complexity and importance of debiasing AI agents and developing psychology-informed AI audit techniques and policies for automated judgment tasks and beyond.

Information Retrieval, Artificial Intelligence, Human-Computer Interaction
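The decoy effect described in the abstract can be made concrete as a shift in the credibility rating that judges give a target misinformation result when a more salient decoy is shown next to it. The sketch below is illustrative only and is not the authors' analysis: it assumes numeric ratings from a between-subjects design (one group sees no decoy, another sees the decoy), measures the effect as a difference in means, and uses a generic two-sided permutation test, a swapped-in choice, to ask whether the shift could be due to chance. The function names `decoy_effect` and `permutation_p_value` and the example ratings are hypothetical.

```python
import random
from statistics import mean


def decoy_effect(control, decoy):
    """Shift in the mean credibility rating of the target result when a decoy is shown.

    `control` and `decoy` are ratings of the same target item collected without
    and with the decoy result present; a positive value means the decoy raised
    the target's perceived credibility.
    """
    return mean(decoy) - mean(control)


def permutation_p_value(control, decoy, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in mean ratings.

    Condition labels are shuffled `n_resamples` times; the p-value is the share
    of shuffles whose absolute mean difference is at least the observed one,
    with an add-one correction so the estimate is never exactly zero.
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    observed = abs(decoy_effect(control, decoy))
    pooled = list(control) + list(decoy)
    n_control = len(control)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[n_control:]) - mean(pooled[:n_control]))
        if diff >= observed:
            extreme += 1
    return (extreme + 1) / (n_resamples + 1)


if __name__ == "__main__":
    # Hypothetical 1-5 credibility ratings of one misinformation result.
    control_ratings = [2, 1, 2, 3, 2, 1]  # no decoy shown
    decoy_ratings = [3, 3, 2, 4, 3, 3]    # a more salient decoy shown alongside
    print(f"decoy effect: {decoy_effect(control_ratings, decoy_ratings):+.2f}")
    print(f"permutation p: {permutation_p_value(control_ratings, decoy_ratings):.3f}")
```

A permutation test is used here because it makes no normality assumption, which suits small samples of ordinal ratings; the same two functions apply unchanged whether the ratings come from crowd workers or from LLM runs.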

What problem does this paper attempt to address?

The main problem this paper addresses is evaluating the cognitive biases of large language models (LLMs) in online medical information evaluation, in particular whether LLMs are more vulnerable to the decoy effect than human evaluators. The study is organized around two research questions:

1. **How vulnerable are LLM agents to the decoy effect when evaluating the credibility of online medical information in web search, compared to human evaluators?** This question examines how LLMs respond to decoy information, and specifically how they differ from human evaluators in identifying and distinguishing credible information from misinformation.
2. **How does the vulnerability of LLM agents to the decoy effect change across topics and evaluation contexts?** This question refines the first by asking whether LLMs' sensitivity to the decoy effect differs across medical topics and evaluation settings.

By answering these questions, the study aims to reveal potential cognitive biases of LLMs in automated information-evaluation tasks and to assess how these biases affect medical information evaluation. This helps clarify how LLMs perform in complex decision-making tasks and provides a basis for developing more reliable AI auditing techniques and policies.
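The second research question, how vulnerability varies across topics and contexts, amounts to computing the same rating shift separately for each judge type and topic and then asking how often it appears. A minimal sketch, assuming ratings are stored as hypothetical `(judge, topic, condition, rating)` tuples with `condition` either `"control"` or `"decoy"`; the helper names, the "share of topics with a positive shift" summary, and the topic labels in the usage example are illustrative operationalizations, not taken from the paper.

```python
from collections import defaultdict
from statistics import mean


def decoy_effects_by_group(records):
    """Map each (judge, topic) pair to mean(decoy) - mean(control).

    Each record is a (judge, topic, condition, rating) tuple whose condition is
    "control" or "decoy" (any other value raises KeyError). Pairs observed in
    only one condition are skipped because no shift can be computed for them.
    """
    buckets = defaultdict(lambda: {"control": [], "decoy": []})
    for judge, topic, condition, rating in records:
        buckets[(judge, topic)][condition].append(rating)
    return {
        key: mean(groups["decoy"]) - mean(groups["control"])
        for key, groups in buckets.items()
        if groups["control"] and groups["decoy"]
    }


def share_of_topics_with_effect(effects, judge, threshold=0.0):
    """Fraction of a judge's topics where the decoy raised the mean rating by more than `threshold`."""
    values = [effect for (j, _topic), effect in effects.items() if j == judge]
    if not values:
        return 0.0
    return sum(effect > threshold for effect in values) / len(values)


if __name__ == "__main__":
    # Hypothetical ratings of one misinformation result per topic.
    records = [
        ("human", "masks", "control", 2), ("human", "masks", "decoy", 2),
        ("human", "vaccines", "control", 1), ("human", "vaccines", "decoy", 2),
        ("llm", "masks", "control", 1), ("llm", "masks", "decoy", 3),
        ("llm", "vaccines", "control", 2), ("llm", "vaccines", "decoy", 3),
    ]
    effects = decoy_effects_by_group(records)
    print(share_of_topics_with_effect(effects, "human"))  # 0.5
    print(share_of_topics_with_effect(effects, "llm"))    # 1.0
```

Comparing these shares between judge types is one simple way to express whether a decoy effect is "more prevalent across topics" for one kind of judge; in practice each per-topic shift would also need its own significance check.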
