Image segmentation of nasopharyngeal carcinoma using 3D CNN with long-range skip connection and multi-scale feature pyramid

Feng Guo, Canghong Shi, Xiaojie Li, Xi Wu, Jiliu Zhou, Jiancheng Lv
DOI: https://doi.org/10.1007/s00500-020-04708-y
IF: 3.732
2020-01-29
Soft Computing
Abstract: Nasopharyngeal carcinoma (NPC) is one of the most common cancers of the nasopharynx. A structural analysis of NPC can provide vital insights into methods of treatment. However, manually marking the boundaries of NPC in images is tedious, time-consuming, and prone to error, so computer-based automatic segmentation algorithms are needed to locate NPC accurately. This remains a challenging task owing to the high variation in the shape and size of the nasopharynx across subjects. Moreover, the nasopharyngeal area is small, which causes a severe imbalance between the foreground and background classes. In this paper, we propose a 3D convolutional neural network with long-range skip connection and multi-scale feature pyramid (SFP) for the segmentation of images of NPC. Unlike the traditional skip connection in residual blocks, which only transfers and fuses features within the same convolutional layer, our long-range skip connection passes the original features from the first convolution to each down-sampling stage using an element-wise sum, which increases the reuse of low-level features and addresses the problems of vanishing and exploding gradients. The multi-scale feature pyramid uses varying atrous rates to adapt to images of different sizes and to learn multi-scale features and hierarchical contextual information about NPC. To accelerate the convergence of our network, we use deep supervision to generate three auxiliary segmentation maps and merge their weighted loss into the objective function. We also fuse these auxiliary segmentation maps to refine the final segmentation result. In our experiments, the proposed network was trained and tested on 3D magnetic resonance imaging (MRI) images of 120 clinical patients using 5-fold cross-validation. The average Dice similarity coefficient (DSC) and average symmetric surface distance (ASSD), used as evaluation metrics, were 0.737 and 1.214 mm, respectively. These results indicate that our method is superior to five state-of-the-art networks and equivalent to the judgment of an experienced physician.
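For readers unfamiliar with the DSC reported above, the following is a minimal NumPy sketch of the standard definition, DSC = 2|A ∩ B| / (|A| + |B|), applied to two binary 3D masks. It is not the authors' evaluation code: the function name `dice_similarity`, the toy volumes, and the convention of returning 1.0 for two empty masks are assumptions made for illustration.

```python
import numpy as np


def dice_similarity(pred, target):
    """Dice similarity coefficient between two binary masks of equal shape.

    Hypothetical helper for illustration; not taken from the paper.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("masks must have the same shape")

    # Number of voxels labelled foreground in both masks.
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return float(2.0 * intersection / denom)


# Toy 4x4x4 volumes: each mask holds a 2x2x2 cube (8 voxels),
# and the cubes overlap in a 1x2x2 slab (4 voxels).
pred_mask = np.zeros((4, 4, 4), dtype=bool)
pred_mask[0:2, 0:2, 0:2] = True
target_mask = np.zeros((4, 4, 4), dtype=bool)
target_mask[1:3, 0:2, 0:2] = True

print(dice_similarity(pred_mask, target_mask))  # 2 * 4 / (8 + 8) = 0.5
```

Because the DSC is computed only from foreground voxels, it is not inflated by the large background region, which is one reason it suits small targets such as the nasopharyngeal area.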
computer science, artificial intelligence, interdisciplinary applications