The Decoy Dilemma in Online Medical Information Evaluation: A Comparative Study of Credibility Assessments by LLM and Human Judges

Jiqun Liu, Jiangen He
2024-11-23
Abstract: Can AI be cognitively biased in automated information judgment tasks? Despite recent progress in measuring and mitigating social and algorithmic biases in AI and large language models (LLMs), it is not clear to what extent LLMs behave "rationally", or whether they are also vulnerable to human cognitive bias triggers. To address this open problem, our study, consisting of a crowdsourcing user experiment and an LLM-enabled simulation experiment, compared the credibility assessments by LLM and human judges under potential decoy effects in an information retrieval (IR) setting, and empirically examined the extent to which LLMs are cognitively biased in COVID-19 medical (mis)information assessment tasks, with traditional human assessors as a baseline. The results, collected from a between-subject user experiment and an LLM-enabled replicate experiment, demonstrate that 1) larger and more recent LLMs tend to show a higher level of consistency and accuracy in distinguishing credible information from misinformation; however, they are more likely to give higher ratings to misinformation in the presence of a more salient decoy misinformation result; 2) while the decoy effect occurred in both human and LLM assessments, it is more prevalent across conditions and topics in LLM judgments than in human credibility ratings. In contrast to the generally assumed "rationality" of AI tools, our study empirically confirms the cognitive bias risks embedded in LLM agents, evaluates the decoy impact on LLMs against human credibility assessments, and thereby highlights the complexity and importance of debiasing AI agents and developing psychology-informed AI audit techniques and policies for automated judgment tasks and beyond.
Information Retrieval, Artificial Intelligence, Human-Computer Interaction
What problem does this paper attempt to address?
This paper evaluates the cognitive biases of large language models (LLMs) in online medical information evaluation, in particular whether LLMs are more vulnerable to the decoy effect than human evaluators. It focuses on two research questions:

1. **How vulnerable are LLM agents to the decoy effect when evaluating the credibility of online medical information in web searches, compared to human evaluators?** This question examines how LLMs perform in the presence of decoy information, and how they differ from human evaluators in distinguishing credible information from misinformation.
2. **How does the vulnerability of LLM agents to the decoy effect change across topics and evaluation contexts?** This question refines the first and asks whether the sensitivity of LLMs to the decoy effect varies across medical topics and evaluation settings.

Through these questions, the research aims to reveal the potential cognitive biases of LLMs in automated information evaluation and to assess how those biases affect medical information evaluation tasks. This helps in understanding how LLMs perform in complex decision-making tasks, and it provides a basis for developing more reliable AI auditing techniques and policies.