Development and evaluation of a manual segmentation protocol for deep grey matter in multiple sclerosis: Towards accelerated semi-automated references

Alexandra de Sitter, Jessica Burggraaff, Fabian Bartel, Miklos Palotai, Yaou Liu, Jorge Simoes, Serena Ruggieri, Katharina Schregel, Stefan Ropele, Maria A Rocca, Claudio Gasperini, Antonio Gallo, Menno M Schoonheim, Michael Amann, Marios Yiannakas, Deborah Pareto, Mike P Wattjes, Jaume Sastre-Garriga, Ludwig Kappos, Massimo Filippi, Christian Enzinger, Jette Frederiksen, Bernard Uitdehaag, Charles R G Guttmann, Frederik Barkhof, Hugo Vrenken
DOI: https://doi.org/10.1016/j.nicl.2021.102659
Abstract:
Background: Deep grey matter (dGM) structures, particularly the thalamus, are clinically relevant in multiple sclerosis (MS). However, segmentation of dGM in MS is challenging; labeled MS-specific reference sets are needed for objective evaluation and training of new methods.
Objectives: This study aimed to (i) create a standardized protocol for manual delineations of dGM; (ii) evaluate the reliability of the protocol with multiple raters; and (iii) evaluate the accuracy of a fast semi-automated segmentation approach (FASTSURF).
Methods: A standardized manual segmentation protocol for the caudate nucleus, putamen, and thalamus was created and applied by three raters to multi-center 3D T1-weighted MRI scans of 23 MS patients and 12 controls. Intra- and inter-rater agreement was assessed with the intra-class correlation coefficient (ICC), and spatial overlap with the Jaccard Index (JI) and the generalized conformity index (CIgen). FASTSURF reconstructed full segmentations from sparse delineations; its accuracy was assessed both volumetrically and spatially.
Results: All structures showed excellent agreement on expert manual outlines: intra-rater JI > 0.83; inter-rater ICC ≥ 0.76 and CIgen ≥ 0.74. FASTSURF reproduced the manual references excellently, with ICC ≥ 0.97 and JI ≥ 0.92.
Conclusions: The manual dGM segmentation protocol showed excellent reproducibility within and between raters. Moreover, combined with FASTSURF, a reliable reference set of dGM segmentations can be produced with a lower workload.
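The two spatial overlap measures named in the Methods can be illustrated with a minimal sketch. It is not the authors' code. It assumes each segmentation is a flattened binary mask (1 = voxel inside the structure), and it assumes the commonly used definition of CIgen as the sum of pairwise intersections divided by the sum of pairwise unions over all rater pairs. The function names and the toy masks are hypothetical.

```python
from itertools import combinations


def jaccard_index(mask_a, mask_b):
    """Jaccard Index |A ∩ B| / |A ∪ B| for two flattened binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must contain the same number of voxels")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    union = sum(1 for a, b in zip(mask_a, mask_b) if a or b)
    # Two empty masks are treated as identical (a convention chosen for this sketch).
    return intersection / union if union else 1.0


def generalized_conformity_index(masks):
    """Assumed CIgen: summed pairwise intersections over summed pairwise unions."""
    total_intersection = 0
    total_union = 0
    for mask_a, mask_b in combinations(masks, 2):
        total_intersection += sum(1 for a, b in zip(mask_a, mask_b) if a and b)
        total_union += sum(1 for a, b in zip(mask_a, mask_b) if a or b)
    return total_intersection / total_union if total_union else 1.0


# Toy example: three hypothetical raters outlining one structure on six voxels.
rater_1 = [0, 1, 1, 1, 0, 0]
rater_2 = [0, 1, 1, 0, 0, 0]
rater_3 = [0, 1, 1, 1, 1, 0]

print(jaccard_index(rater_1, rater_3))                            # 0.75
print(generalized_conformity_index([rater_1, rater_2, rater_3]))  # 7/11, about 0.636
```

With two raters the assumed CIgen reduces to the Jaccard Index, which is why it can serve as the multi-rater counterpart of JI.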