Development and Validation of A Deep Learning Algorithm for Automated Delineation of Primary Tumor for Nasopharyngeal Carcinoma from Multimodal Magnetic Resonance Images
Y. Sun, L. Lin, Q. Dou, H. Chen, Y. M. Jin, G. Q. Zhou, Y. Q. Tang, W. L. Chen, B. A. Su, F. Liu, C. J. Tao, N. Jiang, J. Y. Li, L. Tang, C. Xie, S. M. Huang, P. A. Heng
DOI: https://doi.org/10.1016/j.ijrobp.2018.07.1017
2018-01-01
Abstract: Applying a deep learning algorithm to automatically delineate the primary gross tumor volume (GTVp) for nasopharyngeal carcinoma (NPC) can potentially improve the accuracy and efficiency of target definition in radiotherapy. In this work, we aimed to develop a deep learning algorithm for automated delineation of GTVp for NPC, evaluate its performance, and compare it extensively with qualified radiation oncologists. Four-modality, radiotherapy-dedicated magnetic resonance (MR) images obtained from a total of 1021 radiation-naive NPC patients between September 1, 2016 and September 30, 2017 were included in this study. GTVp on the four-modality images was delineated by an expert panel consisting of 2 radiation oncologists and a radiologist. A 3-dimensional convolutional neural network was trained and validated using 818 cases, comprising 14,522 tumor-bearing slices. The performance of the deep learning algorithm was evaluated in an independent testing data set of 203 cases, totaling 3736 tumor-bearing slices. Twenty cases from the testing data set were randomly selected for comparison against 8 qualified radiation oncologists in a multicenter setting. Dice similarity coefficient (DSC) and average surface distance (ASD) were used to assess the accuracy of deep-learning delineation. Sensitivity (SEN) and positive predictive value (PPV) were used to assess voxel-wise classification accuracy. Inter-observer variation among the 8 radiation oncologists was assessed using multi-observer DSC. The time radiation oncologists spent on manual delineation and on editing deep-learning delineation was also recorded. In the testing data set, DSC between deep-learning delineation and expert-panel delineation ranged from 0.607 to 0.885, with a mean of 0.780 ± 0.0418, and was higher than 0.7 in almost all cases (194/203, 97.0%). Except for one case with an ASD of 5.827 mm, ASD ranged from 0.849 to 3.332 mm, with a mean of 2.051 ± 0.638 mm. Mean SEN and PPV were 0.883 ± 0.0798 and 0.709 ± 0.0773, respectively.
For the 20 cases, deep-learning delineation outperformed 4 of the 8 radiation oncologists, with a mean DSC of 0.766 against 0.691, 0.699, 0.704, and 0.719 (all P < 0.05), and performed comparably to the other 4 oncologists. With the assistance of deep-learning delineation, delineation accuracy increased for 5 oncologists (mean DSC increased from 0.731 to 0.779; all P < 0.05) and remained stable for 3 oncologists. Furthermore, inter-observer variation decreased (multi-observer DSC, 0.774 ± 0.0773 vs. 0.702 ± 0.114; P < 0.001). Average time spent decreased from 30.2 min for manual delineation to 18.3 min for editing deep-learning delineation (P < 0.001), saving 39.2% of the time. In GTVp delineation for NPC, deep-learning delineation achieved satisfactory agreement with expert-panel delineation and significantly outperformed 4 of 8 qualified radiation oncologists. Assistance from deep-learning delineation consistently improved both accuracy and efficiency.
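
The abstract does not give formulas for its overlap metrics, so the following is only a minimal sketch of how DSC, SEN, and PPV are commonly computed from voxel counts, assuming the standard definitions apply here. The function name `overlap_metrics` and the toy masks are hypothetical and are not taken from the study; ASD is omitted because it requires surface extraction from 3-dimensional volumes.

```python
from typing import Dict, Sequence


def overlap_metrics(pred: Sequence[int], ref: Sequence[int]) -> Dict[str, float]:
    """Voxel-wise DSC, sensitivity, and PPV for two flattened binary masks.

    pred: automated delineation (1 = tumor voxel, 0 = background)
    ref:  reference delineation, e.g. from an expert panel
    Standard textbook definitions are assumed; the study's own
    implementation is not described in the abstract.
    """
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of voxels")

    tp = sum(1 for p, r in zip(pred, ref) if p and r)        # tumor in both
    fp = sum(1 for p, r in zip(pred, ref) if p and not r)    # predicted only
    fn = sum(1 for p, r in zip(pred, ref) if not p and r)    # reference only

    if tp + fp == 0 or tp + fn == 0:
        raise ValueError("both masks must contain at least one tumor voxel")

    return {
        "dsc": 2 * tp / (2 * tp + fp + fn),  # Dice similarity coefficient
        "sen": tp / (tp + fn),               # sensitivity (recall)
        "ppv": tp / (tp + fp),               # positive predictive value
    }


# Toy 8-voxel example (hypothetical data, not from the study)
ref = [0, 1, 1, 1, 1, 0, 0, 0]
pred = [0, 0, 1, 1, 1, 1, 1, 0]
metrics = overlap_metrics(pred, ref)
print(metrics)
```

In this toy case there are 3 true-positive, 2 false-positive, and 1 false-negative voxels, giving SEN = 0.75, PPV = 0.60, and DSC ≈ 0.667. For a single case, DSC under these definitions equals the harmonic mean of SEN and PPV, so a high SEN paired with a lower PPV indicates a delineation that tends to over-cover rather than miss tumor voxels.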