Comparison of Inter-Method Agreement and Reliability for Automatic Brain Volumetry Using Three Different Clinically Available Software Packages

Kwang Ho Choi, Young Jin Heo, Hye Jin Baek, Jun-Ho Kim, Jeong Yoon Jang
DOI: https://doi.org/10.3390/medicina60050727
2024-04-27
Abstract: Background and Objectives: No comparative study has evaluated the inter-method agreement and reliability between Heuron AD and other clinically available brain volumetric software packages. Hence, we aimed to investigate the inter-method agreement and reliability of three clinically available brain volumetric software packages: FreeSurfer (FS), NeuroQuant® (NQ), and Heuron AD (HAD). Materials and Methods: We retrospectively included 78 patients who underwent conventional three-dimensional (3D) T1-weighted imaging (T1WI) for the evaluation of memory impairment, including 21 with normal objective cognitive function, 24 with mild cognitive impairment, and 33 with Alzheimer's disease (AD). All 3D T1WI scans were analyzed using the three volumetric software packages. Repeated-measures analysis of variance, intraclass correlation coefficient, effect size measurements, and Bland-Altman analysis were used to evaluate the inter-method agreement and reliability. Results: The measured volumes demonstrated substantial to almost perfect agreement for most brain regions bilaterally, except for the bilateral globi pallidi. However, the volumes measured using the three software packages showed significant mean differences for most brain regions, with consistent systematic biases and wide limits of agreement in the Bland-Altman analyses. The pallidum showed the largest effect size in the comparisons between NQ and FS (5.20-6.93) and between NQ and HAD (2.01-6.17), while the cortical gray matter showed the largest effect size in the comparisons between FS and HAD (0.79-1.91). These differences and variations between the software packages were also observed in the subset analyses of 45 patients without AD and 33 patients with AD. Conclusions: Despite their favorable reliability, the software-based brain volume measurements showed significant differences and systematic biases in most regions. Thus, these volumetric measurements should be interpreted based on the type of volumetric software used, particularly for smaller structures. Moreover, users should consider the limited interchangeability of these packages when using them in real-world practice.
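As a rough illustration of the Bland-Altman quantities mentioned in the abstract (systematic bias and limits of agreement), the following minimal Python sketch computes them for paired volume measurements from two packages. The function name `bland_altman`, the variable names, and all numeric values are hypothetical and are not taken from the study; the sketch assumes the conventional 95% limits of agreement, defined as the mean difference ± 1.96 times the sample standard deviation of the differences.

```python
import statistics


def bland_altman(a, b):
    """Return (bias, lower_loa, upper_loa) for paired measurements a and b.

    bias is the mean of the paired differences (a - b); the limits of
    agreement are bias +/- 1.96 * sample SD of the differences.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    half_width = 1.96 * sd
    return bias, bias - half_width, bias + half_width


# Hypothetical volumes (mL) for one structure from two packages;
# these numbers are invented for illustration only.
pkg_a = [3.9, 4.1, 3.5, 3.0, 4.4]
pkg_b = [3.6, 3.8, 3.3, 2.7, 4.0]

bias, lower, upper = bland_altman(pkg_a, pkg_b)
print(f"bias = {bias:.2f} mL, limits of agreement = [{lower:.2f}, {upper:.2f}] mL")
```

A nonzero bias indicates a systematic offset between two packages, and wide limits of agreement indicate that individual measurements can differ substantially even when a correlation-based reliability index is high, which is the pattern the abstract describes.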