Benchmarking the generalizability of brain age models: Challenges posed by scanner variance and prediction bias

Robert J. Jirsaraie, Tobias Kaufmann, Vishnu Bashyam, Guray Erus, Joan L. Luby, Lars T. Westlye, Christos Davatzikos, Deanna M. Barch, Aristeidis Sotiras
DOI: https://doi.org/10.1002/hbm.26144
IF: 4.8
2022-11-09
Human Brain Mapping
Abstract: Machine learning has been increasingly applied to neuroimaging data to predict age, deriving a personalized biomarker with potential clinical applications. The scientific and clinical value of these models depends on their applicability to independently acquired scans from diverse sources. Accordingly, we evaluated the generalizability of two previously developed brain age models that were trained across the lifespan by applying them to three distinct early‐life samples with participants aged 8–22 years. These models were chosen based on the size and diversity of their training data, but they also differed greatly in their processing methods and predictive algorithms. Specifically, one brain age model was built by applying gradient tree boosting (GTB) to extracted features of cortical thickness, surface area, and brain volume. The other model, DeepBrainNet (DBN), applied a 2D convolutional neural network to minimally preprocessed slices of T1‐weighted scans. Additional model variants were created to understand how generalizability changed when each model was trained with data that became more similar to the test samples in terms of age and acquisition protocols. Our results illustrated numerous trade‐offs, and no single model was uniformly better across all test samples or facets of generalizability. The GTB predictions were more accurate overall and more reliable when applied to lower quality scans. In contrast, DBN displayed the most utility in detecting associations between brain age gaps and cognitive functioning. Broadly speaking, the largest limitations affecting generalizability were scanner‐related variance from acquisition protocol differences and biased brain age estimates. If such confounds could eventually be removed without post‐hoc corrections, brain age predictions may have greater utility as personalized biomarkers of healthy aging.
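To make the terms "brain age gap", "biased brain age estimates", and "post-hoc corrections" concrete, the following is a minimal Python sketch. It is not the authors' pipeline: it assumes the gap is defined as predicted minus chronological age, and it uses linear residualization of the gap on age, which is one commonly used post-hoc correction and may differ from what the paper applied. The function names and the synthetic data (ages drawn from 8 to 22 years, with predictions deliberately shrunk toward the sample mean) are hypothetical and serve only as an illustration.

```python
import numpy as np


def brain_age_gap(predicted_age, chronological_age):
    """Brain age gap per participant: predicted minus chronological age."""
    predicted = np.asarray(predicted_age, dtype=float)
    actual = np.asarray(chronological_age, dtype=float)
    return predicted - actual


def mean_absolute_error(predicted_age, chronological_age):
    """Average absolute size of the brain age gap (a common accuracy metric)."""
    return float(np.mean(np.abs(brain_age_gap(predicted_age, chronological_age))))


def age_bias(predicted_age, chronological_age):
    """Fit gap = slope * age + intercept; a nonzero slope indicates age-related bias."""
    actual = np.asarray(chronological_age, dtype=float)
    gap = brain_age_gap(predicted_age, actual)
    slope, intercept = np.polyfit(actual, gap, deg=1)
    return float(slope), float(intercept)


def residualized_gap(predicted_age, chronological_age):
    """Post-hoc correction (assumed here): remove the linear age trend from the gap."""
    actual = np.asarray(chronological_age, dtype=float)
    gap = brain_age_gap(predicted_age, actual)
    slope, intercept = age_bias(predicted_age, actual)
    return gap - (slope * actual + intercept)


# Synthetic illustration only: these numbers are made up, not results from the paper.
rng = np.random.default_rng(0)
age = rng.uniform(8, 22, size=200)
# Predictions regress toward the mean age, so young participants are
# overestimated and older participants are underestimated.
predicted = 15 + 0.6 * (age - 15) + rng.normal(0, 1.5, size=200)

slope, intercept = age_bias(predicted, age)
corrected = residualized_gap(predicted, age)

print("MAE:", mean_absolute_error(predicted, age))
print("slope of gap on age before correction:", slope)
print("correlation of corrected gap with age:", np.corrcoef(age, corrected)[0, 1])
```

In this sketch the uncorrected gap has a negative slope with respect to age, and the residualized gap is uncorrelated with age by construction. A correction of this kind depends on the sample it is fitted in, which is one reason the abstract treats removing the bias without post-hoc steps as the preferable outcome.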
radiology, nuclear medicine & medical imaging; neurosciences; neuroimaging
What problem does this paper attempt to address?