DiversityMedQA: Assessing Demographic Biases in Medical Diagnosis using Large Language Models

Rajat Rawat, Hudson McBride, Dhiyaan Nirmal, Rajarshi Ghosh, Jong Moon, Dhruv Alamuri, Sean O'Brien, Kevin Zhu
2024-09-03
Abstract: As large language models (LLMs) gain traction in healthcare, concerns about their susceptibility to demographic biases are growing. We introduce DiversityMedQA, a novel benchmark designed to assess LLM responses to medical queries across diverse patient demographics, such as gender and ethnicity. By perturbing questions from the MedQA dataset, which comprises medical board exam questions, we created a benchmark that captures the nuanced differences in medical diagnosis across varying patient profiles. Our findings reveal notable discrepancies in model performance when tested against these demographic variations. Furthermore, to ensure the perturbations were accurate, we also propose a filtering strategy that validates each perturbation. By releasing DiversityMedQA, we provide a resource for evaluating and mitigating demographic bias in LLM medical diagnoses.
Computation and Language
What problem does this paper attempt to address?
The paper addresses demographic bias, particularly with respect to gender and ethnicity, in large language models (LLMs) used for medical diagnosis. The researchers introduce DiversityMedQA, a benchmark that evaluates how LLM responses to medical queries change across patient demographics. They build it by perturbing questions from the MedQA dataset of medical board exam questions, so that each clinical scenario is posed for different patient profiles. Model performance shows notable discrepancies across these demographic variations. To ensure the perturbations are accurate, the researchers also propose a filtering strategy that validates each perturbation. DiversityMedQA is released as a resource for evaluating and mitigating demographic bias in LLM medical diagnoses.
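The perturb-and-compare idea can be illustrated with a minimal sketch. It is not the authors' pipeline: the swap table, the function names perturb_gender and accuracy_gap, and the one-directional male-to-female rule are all assumptions made for illustration, and the paper's filtering strategy is not reproduced here.

```python
import re

# Hypothetical swap table (an assumption, not taken from the paper).
# Only male -> female is handled, to avoid ambiguous reverse mappings
# such as "her" -> "his" versus "her" -> "him".
MALE_TO_FEMALE = {
    "man": "woman",
    "male": "female",
    "boy": "girl",
    "he": "she",
    "his": "her",
    "him": "her",
}

# Word boundaries keep "man" from matching inside "woman" and
# "male" from matching inside "female".
_PATTERN = re.compile(r"\b(" + "|".join(MALE_TO_FEMALE) + r")\b", re.IGNORECASE)


def perturb_gender(question: str) -> str:
    """Return the question with male terms replaced by female terms."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        replacement = MALE_TO_FEMALE[word.lower()]
        # Preserve sentence-initial capitalization.
        return replacement.capitalize() if word[0].isupper() else replacement

    return _PATTERN.sub(swap, question)


def accuracy(flags: list[bool]) -> float:
    """Fraction of questions answered correctly."""
    return sum(flags) / len(flags)


def accuracy_gap(original: list[bool], perturbed: list[bool]) -> float:
    """Accuracy on original questions minus accuracy on perturbed ones."""
    return accuracy(original) - accuracy(perturbed)


if __name__ == "__main__":
    q = "A 45-year-old man presents with chest pain. He reports his symptoms began yesterday."
    print(perturb_gender(q))
    # Illustrative correctness flags only, not results from the paper.
    print(accuracy_gap([True, True, False, True], [True, False, False, True]))
```

A nonzero gap on question pairs whose correct answer should not depend on the patient's gender would point to demographic sensitivity in the model. Pairs where the demographic change legitimately alters the diagnosis must be excluded first, which is the role of a validation step such as the filtering strategy described above.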