Deep Learning-Based Internal Gross Target Volume Definition in 4D CT Images of Lung Cancer Patients.
Yuanyuan Ma, Jingfang Mao, Xinguo Liu, Zhongying Dai, Hui Zhang, Xinyang Zhang, Qiang Li
DOI: https://doi.org/10.1002/mp.16106
2022-01-01
Abstract: BACKGROUND: Contouring of the internal gross target volume (iGTV) is an essential part of radiotherapy treatment planning, as it mitigates the impact of intra-fractional target motion. However, it is usually time-consuming and prone to intra-observer and inter-observer variability. So far, few studies have attempted to predict the iGTV directly with deep learning, because the iGTV contains not only the gross target volume (GTV) but also the motion information of the GTV. PURPOSE: This exploratory study presents a deep learning-based framework for segmenting the iGTV rapidly and accurately in 4D CT images of lung cancer patients. METHODS: Five models were adopted for iGTV segmentation: 3D UNet, mmUNet with point-wise add merging (mmUNet-add), mmUNet with concatenate fusion (mmUNet-cat), gruUNet with point-wise add fusion (gruUNet-add), and gruUNet with concatenate fusion (gruUNet-cat). All the models were derived from the 3D UNet network, with multi-channel multi-path and convolutional gated recurrent unit (GRU) added in the mmUNet and gruUNet networks, respectively. Data from seventy patients with lung cancer were collected; 55 cases were randomly selected as the training set and the remaining 15 as the testing set. The segmentation results of the five models were compared with the ground truths qualitatively and quantitatively. RESULTS: In terms of Dice Similarity Coefficient (DSC), the four proposed networks (mmUNet-add, mmUNet-cat, gruUNet-add, and gruUNet-cat) increased the DSC from the 0.6945 of 3D UNet to 0.7342, 0.7253, 0.7405, and 0.7365, respectively. However, the differences were not statistically significant (p > 0.05).
After a simple post-processing step that removed small isolated connected regions, the mean 95th percentile Hausdorff distances (HD_95s) of the 3D UNet, mmUNet-add, mmUNet-cat, gruUNet-add, and gruUNet-cat networks were 19.70, 15.75, 15.84, 15.61, and 15.83 mm, respectively, compared with 25.35, 25.96, 25.11, 28.23, and 24.47 mm before post-processing. With regard to runtime, significant increases in elapsed time (about 70 s and 230 s) were observed in the mmUNet and gruUNet architectures, respectively, due to the increased number of parameters; the mmUNet structure showed the smaller increase. CONCLUSION: Our study demonstrated the ability of deep learning to predict iGTVs directly. With the introduction of multi-channel multi-path and convolutional GRU, the segmentation accuracy was improved under certain conditions, at the cost of reduced segmentation efficiency. This also raises a topic for further research: under which conditions the 3D UNet network performs poorly. Less efficiency degradation was observed in the mmUNet structure. In addition, the element-wise add fusion strategy favored a higher DSC, whereas HD_95 benefited from the concatenate merging approach. Nevertheless, the segmentation accuracy achieved by deep learning still needs to be improved.
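
The two quantities that drive the reported results, the DSC and the removal of small isolated connected regions, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not state the connectivity rule or the size threshold used in post-processing, so the 6-connectivity and the `min_voxels` parameter below are assumptions, as are all function names. Masks are represented as sets of (z, y, x) voxel coordinates to keep the example free of external dependencies.

```python
from collections import deque

# Face neighbours only (6-connectivity). Assumption: the abstract does not
# say which connectivity the post-processing used.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def dice(pred, truth):
    """Dice Similarity Coefficient, 2|A n B| / (|A| + |B|), for two voxel sets."""
    if not pred and not truth:
        return 1.0  # convention assumed here: two empty masks agree perfectly
    return 2.0 * len(pred & truth) / (len(pred) + len(truth))


def connected_components(mask):
    """Split a set of (z, y, x) voxels into connected components (BFS)."""
    remaining = set(mask)
    components = []
    while remaining:
        seed = remaining.pop()
        component = {seed}
        queue = deque([seed])
        while queue:
            z, y, x = queue.popleft()
            for dz, dy, dx in NEIGHBOURS:
                neighbour = (z + dz, y + dy, x + dx)
                if neighbour in remaining:
                    remaining.remove(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


def remove_small_regions(mask, min_voxels):
    """Keep only components with at least min_voxels voxels.

    min_voxels is a hypothetical threshold chosen for illustration.
    """
    kept = [c for c in connected_components(mask) if len(c) >= min_voxels]
    return set().union(*kept)


# Toy example: a 2x2x2 ground-truth cube and a prediction that adds one
# isolated false-positive voxel far from the target.
truth = {(z, y, x) for z in range(2) for y in range(2) for x in range(2)}
pred = truth | {(5, 5, 5)}

print("DSC before post-processing:", dice(pred, truth))
cleaned = remove_small_regions(pred, min_voxels=2)
print("DSC after post-processing: ", dice(cleaned, truth))
```

In this toy case the isolated voxel changes the DSC only slightly, while a distance-based metric such as HD_95 is much more sensitive to far-away false positives. That is consistent with the large HD_95 reductions the abstract reports after post-processing.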