Reproducibility and across-site transferability of an improved deep learning approach for aneurysm detection and segmentation in time-of-flight MR-angiograms

Marius Vach, Luisa Wolf, Daniel Weiss, Vivien Lorena Ivan, Björn B Hofmann, Ludmila Himmelspach, Julian Caspers, Christian Rubbert
DOI: https://doi.org/10.1038/s41598-024-68805-w
2024-08-13
Abstract: This study aimed to (1) replicate a deep-learning-based model for cerebral aneurysm segmentation in TOF-MRAs, (2) improve the approach by testing various fully automatic pre-processing pipelines, and (3) rigorously validate the model's transferability on independent, external test-datasets. A convolutional neural network was trained on 235 TOF-MRAs acquired on local scanners from a single vendor to segment intracranial aneurysms. Different pre-processing pipelines, including bias field correction, resampling, cropping, and intensity normalization, were compared regarding their effect on model performance. The models were tested on independent, external same-vendor and other-vendor test-datasets, each comprising 70 TOF-MRAs and including patients with and without aneurysms. The best-performing model achieved excellent results on the external same-vendor test-dataset, surpassing the results of the previous publication with an improved sensitivity (0.97 vs. ~0.86), a higher Dice score coefficient (DSC, 0.60 ± 0.25 vs. 0.53 ± 0.31), and an improved false-positive rate (0.87 ± 1.35 vs. ~2.7 FPs/case). The model also showed excellent performance on the external other-vendor test-dataset (DSC 0.65 ± 0.26; sensitivity 0.92; 0.96 ± 2.38 FPs/case). Specificity was 0.38 and 0.53 for the same-vendor and other-vendor test-datasets, respectively. Increasing the voxel size from 0.5 × 0.5 × 0.5 mm to 1 × 1 × 1 mm reduced the false-positive rate seven-fold. This study successfully replicated core principles of a previous approach for detecting and segmenting cerebral aneurysms in TOF-MRAs with a robust, fully automatable pre-processing pipeline. The model demonstrated robust transferability on two independent external datasets using TOF-MRAs from the same scanner vendor as the training dataset and from other vendors. These findings are very encouraging regarding the clinical application of such an approach.
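The metrics reported in the abstract (DSC, sensitivity, FPs/case) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `dice_score` and `detection_metrics`, the flat-list mask representation, the convention for two empty masks, and all counts in the usage lines are assumptions chosen for illustration only.

```python
def dice_score(pred, truth):
    """Dice similarity coefficient between two binary masks.

    Masks are given as flat sequences of 0/1 voxel labels (an assumed,
    simplified representation of a 3D segmentation volume).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled as aneurysm in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total number of foreground voxels across both masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


def detection_metrics(tp, fn, fp, n_cases):
    """Lesion-level sensitivity and mean false positives per case."""
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return sensitivity, fp / n_cases


# Hypothetical toy masks: 2 overlapping voxels, 3 foreground voxels each.
pred = [0, 1, 1, 1, 0, 0]
truth = [0, 0, 1, 1, 1, 0]
print(dice_score(pred, truth))

# Hypothetical counts, not taken from the paper.
print(detection_metrics(tp=9, fn=1, fp=14, n_cases=70))
```

DSC measures voxel-wise overlap, whereas sensitivity and FPs/case are counted per aneurysm and per scan, so a model can detect nearly every aneurysm while still showing a moderate DSC.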