Identifying biases in a multicenter MRI database for Parkinson's disease classification: Is the disease classifier a secret site classifier?
Raissa Souza, Anthony Winder, Emma A. M. Stanley, Vibujithan Vigneshwaran, Milton Camacho, Richard Camicioli, Oury Monchi, Matthias Wilms, Nils D. Forkert
DOI: https://doi.org/10.1109/jbhi.2024.3352513
IF: 7.7
2024-01-01
IEEE Journal of Biomedical and Health Informatics
Abstract: Sharing multicenter imaging datasets can increase data diversity and size, but it may also introduce spurious correlations between site-related biological and non-biological image features and the target labels, which machine learning (ML) models may exploit as shortcuts. To date, few studies have analyzed whether and how deep learning models use such effects as shortcuts. The aim of this work was therefore to investigate whether site-related effects are encoded in the feature space of an established deep learning model designed for Parkinson's disease (PD) classification from T1-weighted MRI datasets. To this end, all layers of the PD classifier were frozen except the last, which was replaced by a linear layer that was exclusively re-trained to predict three potential bias types (biological sex, scanner type, and originating site). Our findings, based on a large database of 1880 MRI scans collected across 41 centers, show that the feature space of the established PD model (74% accuracy) can be used to classify sex (75% accuracy), scanner type (79% accuracy), and site location (71% accuracy) with high accuracy, even though this information was never explicitly provided to the PD model during its original training. Overall, the results suggest that trained image-based classifiers may use unwanted shortcuts that are not meaningful for the actual clinical task. This finding may explain why many image-based deep learning models perform poorly when applied to data from centers that did not contribute to the training set.
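The probing procedure described in the abstract (freeze the trained network, replace the last layer with a linear layer, re-train only that layer on a bias label) can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the fixed random ReLU projection standing in for the frozen PD classifier, the three hypothetical sites, and the names `frozen_backbone`, `train_linear_probe`, and `probe_accuracy` are all assumptions made for the example.

```python
import numpy as np

def frozen_backbone(x, w_frozen):
    """Hypothetical stand-in for the frozen PD classifier layers (fixed ReLU projection)."""
    return np.maximum(x @ w_frozen, 0.0)

def train_linear_probe(features, labels, n_classes, lr=0.1, epochs=500):
    """Fit only a softmax linear layer on fixed features with plain gradient descent."""
    n, d = features.shape
    w = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        logits = features @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n                  # cross-entropy gradient w.r.t. logits
        w -= lr * (features.T @ grad)
        b -= lr * grad.sum(axis=0)
    return w, b

def probe_accuracy(features, labels, w, b):
    """Fraction of samples whose predicted class matches the label."""
    return float(np.mean(np.argmax(features @ w + b, axis=1) == labels))

# Synthetic data (assumption): each hypothetical site shifts the inputs by its own offset.
rng = np.random.default_rng(0)
n_sites, n_per_site, n_inputs, n_features = 3, 100, 20, 16
site = np.repeat(np.arange(n_sites), n_per_site)
site_offsets = rng.normal(0.0, 2.0, size=(n_sites, n_inputs))
x = rng.normal(size=(site.size, n_inputs)) + site_offsets[site]

# The backbone weights are never updated; only the probe is trained.
w_frozen = rng.normal(size=(n_inputs, n_features)) / np.sqrt(n_inputs)
w_frozen_before = w_frozen.copy()
feats = frozen_backbone(x, w_frozen)

# Hold out part of the data to evaluate the probe.
idx = rng.permutation(site.size)
train_idx, test_idx = idx[:240], idx[240:]

# Standardize features with training statistics (preprocessing only, backbone untouched).
mean = feats[train_idx].mean(axis=0)
std = feats[train_idx].std(axis=0) + 1e-8
feats = (feats - mean) / std

w_probe, b_probe = train_linear_probe(feats[train_idx], site[train_idx], n_sites)
acc = probe_accuracy(feats[test_idx], site[test_idx], w_probe, b_probe)
chance = 1.0 / n_sites
print(f"site probe accuracy: {acc:.2f} (chance: {chance:.2f})")
```

If the probe scores well above chance on held-out samples, the frozen features carry site information even though no site label was used to produce them. The same probe could in principle be re-run with sex or scanner type as the label.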
computer science, interdisciplinary applications; mathematical & computational biology; medical informatics, information systems