Multisite test–retest reliability and compatibility of brain metrics derived from FreeSurfer versions 7.1, 6.0, and 5.3
Elizabeth Haddad, Fabrizio Pizzagalli, Alyssa H. Zhu, Ravi R. Bhatt, Tasfiya Islam, Iyad Ba Gari, Daniel Dixon, Sophia I. Thomopoulos, Paul M. Thompson, Neda Jahanshad
DOI: https://doi.org/10.1002/hbm.26147
IF: 4.8
2022-11-29
Human Brain Mapping
Abstract: Automatic neuroimaging processing tools provide convenient and systematic methods for extracting features from brain magnetic resonance imaging scans. One tool, FreeSurfer, provides an easy‐to‐use pipeline to extract cortical and subcortical morphometric measures. There have been over 25 stable releases of FreeSurfer, with different versions used across published works. The reliability and compatibility of regional morphometric metrics derived from the most recent version releases have yet to be empirically assessed. Here, we used test–retest data from three public data sets to determine within‐version reliability and between‐version compatibility across 42 regional outputs from FreeSurfer versions 7.1, 6.0, and 5.3. Overall, we found generally high within‐version reliability across most versions; however, considerable differences were observed in between‐version compatibility for regional cortical thickness, surface area, and subcortical volumes. Cortical thickness from v7.1 was less compatible with that of older versions, particularly along the cingulate gyrus, where the lowest version compatibility was observed (intraclass correlation coefficient 0.37–0.61). Surface area of the temporal pole, frontal pole, and medial orbitofrontal cortex also showed low to moderate version compatibility. We confirm low compatibility of pallidum and putamen volumes between v6.0 and v5.3, whereas those from v7.1 were compatible with v6.0. Replication in an independent sample showed largely similar results for measures of surface area and subcortical volumes, but lower overall reliability and compatibility for regional thickness.
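The abstract reports compatibility as intraclass correlation coefficients (ICCs) but does not say which ICC form was used. The sketch below therefore assumes ICC(2,1) from Shrout and Fleiss: two-way random effects, absolute agreement, single measurement. Each row is one subject. Each column is either one scan session (within‐version reliability) or one FreeSurfer version applied to the same scan (between‐version compatibility). The function name `icc_2_1` and the thickness values are hypothetical illustrations, not data or code from the study.

```python
import numpy as np


def icc_2_1(ratings) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    ratings: array of shape (n_subjects, k_columns), where columns are
    sessions or software versions measuring the same regional metric.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)  # per-subject means
    col_means = x.mean(axis=0)  # per-session (or per-version) means

    # Mean squares from a two-way ANOVA without replication.
    ms_rows = k * np.sum((row_means - grand) ** 2) / (n - 1)
    ms_cols = n * np.sum((col_means - grand) ** 2) / (k - 1)
    resid = x - row_means[:, None] - col_means[None, :] + grand
    ms_err = np.sum(resid ** 2) / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical cortical thickness values (mm) for five subjects.
thick = np.array([2.5, 2.7, 2.4, 2.9, 2.6])
identical = np.column_stack([thick, thick])        # two versions agree exactly
offset = np.column_stack([thick, thick + 0.2])     # one version reads 0.2 mm thicker
print(round(icc_2_1(identical), 3))  # 1.0
print(round(icc_2_1(offset), 3))     # 0.649
```

Because this form measures absolute agreement, a uniform 0.2 mm shift between versions lowers the ICC even though the subjects keep the same rank order. A consistency form such as ICC(3,1) would not penalize that shift.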
Batch effect correction may adjust for some inter‐version effects when most sites are processed with one version, but results vary as more sites are processed with different versions. Age associations in a quality‐controlled independent sample (N = 106) revealed version differences in the results of downstream statistical analyses. We provide a reference to highlight the regional metrics that may yield inconsistencies across recent versions in published findings. An interactive viewer is provided at http://data.brainescience.org/Freesurfer_Reliability/.
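Batch effect correction treats the FreeSurfer version, like a scanner or site, as a batch label. The sketch below is a deliberately simplified per-batch location‐scale adjustment. It shifts and rescales each batch to the pooled mean and standard deviation. It is not ComBat or any method confirmed by the study: it has no empirical Bayes shrinkage and no protection of biological covariates. The function name `location_scale_harmonize` and the volume values are hypothetical.

```python
import numpy as np


def location_scale_harmonize(values, batches):
    """Map each batch onto the pooled mean and SD (simplified, not ComBat).

    values:  1-D array of one regional metric (e.g. putamen volume).
    batches: 1-D array of batch labels (e.g. FreeSurfer version per scan).
    Each batch needs at least two samples with non-zero spread.
    """
    y = np.asarray(values, dtype=float)
    b = np.asarray(batches)
    pooled_mean = y.mean()
    pooled_sd = y.std(ddof=1)
    out = np.empty_like(y)
    for label in np.unique(b):
        mask = b == label
        batch_mean = y[mask].mean()
        batch_sd = y[mask].std(ddof=1)
        # Standardize within the batch, then rescale onto the pooled distribution.
        out[mask] = (y[mask] - batch_mean) / batch_sd * pooled_sd + pooled_mean
    return out


# Hypothetical volumes (mm^3): the second version reads 300 mm^3 larger.
vols = np.array([5000.0, 5100.0, 4900.0, 5050.0, 5300.0, 5400.0, 5200.0, 5350.0])
version = np.array(["v6.0"] * 4 + ["v7.1"] * 4)
harmonized = location_scale_harmonize(vols, version)
for v in np.unique(version):
    print(v, harmonized[version == v].mean())
```

This naive adjustment removes any mean difference between batches. If a biological variable such as age differs in distribution across batches, the adjustment can also remove part of the true effect. ComBat-style methods address this by modeling covariates explicitly.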
radiology, nuclear medicine & medical imaging, neurosciences, neuroimaging