Demographic Reporting in Publicly Available Chest Radiograph Data Sets: Opportunities for Mitigating Sex and Racial Disparities in Deep Learning Models

Paul H Yi, Tae Kyung Kim, Eliot Siegel, Noushin Yahyavi-Firouz-Abadi
DOI: https://doi.org/10.1016/j.jacr.2021.08.018
Abstract:
Objective: Data sets with demographic imbalances can introduce bias in deep learning models and potentially amplify existing health disparities. We evaluated the reporting of demographics and potential biases in publicly available chest radiograph (CXR) data sets.
Methods: We reviewed publicly available CXR data sets that were accessible on February 1, 2021, and contained >100 CXRs, and performed a thorough search of various repositories, including Radiopaedia and Kaggle. For each data set, we recorded the total number of images and whether the data set reported demographic variables (age, race or ethnicity, sex, insurance status) in aggregate and on an image-level basis.
Results: Twenty-three CXR data sets were included (range, 105-371,858 images). Most data sets reported demographics in some form (19 of 23; 82.6%) and on an image level (17 of 23; 73.9%). The majority reported age (19 of 23; 82.6%) and sex (18 of 23; 78.2%), but a minority reported race or ethnicity (2 of 23; 8.7%) and insurance status (1 of 23; 4.3%). Of the 13 data sets with sex distribution readily available, the average breakdown was 55.2% male subjects, ranging from 47.8% to 69.7% male representation. Of these, 8 (61.5%) overrepresented male subjects and 5 (38.5%) overrepresented female subjects.
Discussion: Although most publicly available CXR data sets report age and sex on an image-level basis, few report race or ethnicity and insurance status. Furthermore, these data sets frequently underrepresent one of the sexes, more often the female sex. We recommend that data sets report standard demographic variables and, when possible, balance demographic representation to mitigate bias. For researchers using these data sets, we recommend paying attention to balancing demographic labels in addition to disease labels, as well as developing training methods that can account for these imbalances.
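As an illustration of the recommendation to check and balance demographic labels, the following is a minimal sketch, not the authors' method. It assumes image-level sex labels are available as a plain list; the function names `group_proportions` and `balancing_weights` and the sample labels are hypothetical. It computes each group's share and one simple option for rebalancing, inverse-frequency weights, so that every group carries the same total weight during training.

```python
from collections import Counter


def group_proportions(labels):
    """Return the share of each demographic label among per-image labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def balancing_weights(labels):
    """Return one inverse-frequency weight per image.

    Each group's weights sum to len(labels) / number_of_groups,
    so all groups contribute equally in aggregate.
    """
    counts = Counter(labels)
    n_groups = len(counts)
    total = len(labels)
    return [total / (n_groups * counts[label]) for label in labels]


# Hypothetical image-level sex labels (not drawn from any real data set).
sex_labels = ["M"] * 6 + ["F"] * 4

proportions = group_proportions(sex_labels)
weights = balancing_weights(sex_labels)

print(proportions)
print(sorted(set(weights)))
```

Weights of this kind could be passed to a weighted sampler or a weighted loss; whether that is appropriate depends on the task and on how the disease labels are distributed within each group.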