Convolutional neural networks for skull-stripping in brain MR imaging using silver standard masks

Oeslle Lucena,Roberto Souza,Letícia Rittner,Richard Frayne,Roberto Lotufo
DOI: https://doi.org/10.1016/j.artmed.2019.06.008
Abstract:Manual annotation is considered to be the "gold standard" in medical imaging analysis. However, medical imaging datasets that include expert manual segmentation are scarce as this step is time-consuming, and therefore expensive. Moreover, single-rater manual annotation is most often used in data-driven approaches making the network biased to only that single expert. In this work, we propose a CNN for brain extraction in magnetic resonance (MR) imaging, that is fully trained with what we refer to as "silver standard" masks. Therefore, eliminating the cost associated with manual annotation. Silver standard masks are generated by forming the consensus from a set of eight, public, non-deep-learning-based brain extraction methods using the Simultaneous Truth and Performance Level Estimation (STAPLE) algorithm. Our method consists of (1) developing a dataset with "silver standard" masks as input, and implementing (2) a tri-planar method using parallel 2D U-Net-based convolutional neural networks (CNNs) (referred to as CONSNet). This term refers to our integrated approach, i.e., training with silver standard masks and using a 2D U-Net-based architecture. We conducted our analysis using three public datasets: the Calgary-Campinas-359 (CC-359), the LONI Probabilistic Brain Atlas (LPBA40), and the Open Access Series of Imaging Studies (OASIS). Five performance metrics were used in our experiments: Dice coefficient, sensitivity, specificity, Hausdorff distance, and symmetric surface-to-surface mean distance. Our results showed that we outperformed (i.e., larger Dice coefficients) the current state-of-the-art skull-stripping methods without using gold standard annotation for the CNNs training stage. CONSNet is the first deep learning approach that is fully trained using silver standard data and is, thus, more generalizable. Using these masks, we eliminate the cost of manual annotation, decreased inter-/intra-rater variability, and avoided CNN segmentation overfitting towards one specific manual annotation guideline that can occur when gold standard masks are used. Moreover, once trained, our method takes few seconds to process a typical brain image volume using modern a high-end GPU. In contrast, many of the other competitive methods have processing times in the order of minutes.
What problem does this paper attempt to address?