Self-channel-and-spatial-attention Neural Network for Automated Multi-Organ Segmentation on Head and Neck CT Images.
Shuiping Gou, Nuo Tong, Sharon Qi, Shuyuan Yang, Robert Chin, Ke Sheng
DOI: https://doi.org/10.1088/1361-6560/ab79c3
2020-01-01
Abstract: Accurate segmentation of organs at risk (OARs) is necessary for adaptive head and neck (H&N) cancer treatment planning, but manual delineation is tedious, slow, and inconsistent. A self-channel-and-spatial-attention neural network (SCSA-Net) is developed for H&N OAR segmentation on CT images. To ease training and improve segmentation performance at the same time, the proposed SCSA-Net exploits the self-attention ability of the network. Spatial and channel-wise attention learning mechanisms are both employed so that the network adaptively emphasizes meaningful features and suppresses irrelevant ones. The proposed network was first evaluated on a public dataset of 48 patients, then on a separate serial CT dataset of ten patients who received weekly diagnostic fan-beam CT scans. On the second dataset, the accuracy of using SCSA-Net to track parotid and submandibular gland volume changes during radiotherapy was quantified. The Dice similarity coefficient (DSC), positive predictive value (PPV), sensitivity (SEN), average surface distance (ASD), and 95% maximum surface distance (95SD) were calculated on the brainstem, optic chiasm, optic nerves, mandible, parotid glands, and submandibular glands to evaluate the proposed SCSA-Net. SCSA-Net consistently outperforms the state-of-the-art methods on the public dataset. Specifically, compared with Res-Net and SE-Net (the latter built from residual blocks equipped with squeeze-and-excitation blocks), SCSA-Net improves the DSC of the optic nerves and submandibular glands by 0.06, 0.03 and 0.05, 0.04, respectively. Moreover, the proposed method achieves statistically significant DSC improvements on all nine OARs over Res-Net and on eight of the nine over SE-Net.
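For readers unfamiliar with the overlap metrics named above, the following is a minimal sketch of how DSC, PPV, and SEN are commonly computed from a pair of binary masks. It uses the standard textbook definitions, not code from the paper; the function name `overlap_metrics`, the toy masks, and the handling of empty masks are assumptions made for illustration. The surface-distance metrics (ASD, 95SD) are not covered here.

```python
import numpy as np

def overlap_metrics(pred, truth):
    """Return (DSC, PPV, SEN) for two binary masks of the same shape.

    Standard definitions (hypothetical helper, not the authors' code):
      DSC = 2*TP / (2*TP + FP + FN)
      PPV = TP / (TP + FP)
      SEN = TP / (TP + FN)
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.logical_and(pred, truth).sum())    # voxels labeled in both masks
    fp = int(np.logical_and(pred, ~truth).sum())   # predicted but not in reference
    fn = int(np.logical_and(~pred, truth).sum())   # in reference but missed
    # Assumed conventions for empty masks: two empty masks agree perfectly (DSC = 1);
    # PPV and SEN fall back to 0 when their denominators vanish.
    dsc = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) > 0 else 1.0
    ppv = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    sen = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return dsc, ppv, sen

# Toy 2D example: the reference is a 2x2 square, the prediction a 2x3 rectangle
# that fully covers it (TP = 4, FP = 2, FN = 0).
truth = np.zeros((4, 4), dtype=bool)
truth[1:3, 1:3] = True
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 1:4] = True

dsc, ppv, sen = overlap_metrics(pred, truth)
print(f"DSC={dsc:.3f} PPV={ppv:.3f} SEN={sen:.3f}")
```

In this toy case the prediction over-segments, so sensitivity is perfect while PPV and DSC are penalized, which is why the three metrics are usually reported together.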
The trained network achieved good segmentation results on the serial dataset, and the results improved further after the model was fine-tuned on the simulation CT images. For the parotids and submandibular glands, the volume changes of individual patients are highly consistent between the automated and manual segmentation (Pearson's correlation 0.97-0.99). The proposed SCSA-Net is computationally efficient, taking about 2 s to segment one CT.
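The volume-tracking comparison can be illustrated with a short sketch: compute the relative volume change of a gland across serial scans for both the manual and the automated contours, then take Pearson's correlation between the two series. The helper names `relative_change` and `pearson_r` and the weekly volumes below are purely synthetic placeholders, not data or code from the study, and the choice of the first scan as baseline is an assumption.

```python
import numpy as np

def relative_change(volumes):
    """Fractional volume change of each later scan relative to the first scan."""
    v = np.asarray(volumes, dtype=float)
    return (v[1:] - v[0]) / v[0]

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length series.

    Returns NaN if either series is constant (zero variance).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

# Synthetic weekly gland volumes in cm^3 for one hypothetical patient.
manual_volumes = [30.0, 28.5, 27.0, 25.0, 24.0]
auto_volumes = [29.5, 28.8, 26.4, 25.3, 23.6]

r = pearson_r(relative_change(manual_volumes), relative_change(auto_volumes))
print(f"Pearson r between manual and automated volume changes: {r:.3f}")
```

A high correlation indicates that the automated contours follow the same shrinkage trend as the manual ones, even if the absolute volumes differ slightly.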