Three-Dimensional Deep Learning Normal Tissue Complication Probability Model to Predict Late Xerostomia in Patients With Head and Neck Cancer

Hung Chu, Suzanne P M de Vette, Hendrike Neh, Nanna M Sijtsema, Roel J H M Steenbakkers, Amy Moreno, Johannes A Langendijk, Peter M A van Ooijen, Clifton D Fuller, Lisanne V van Dijk
DOI: https://doi.org/10.1016/j.ijrobp.2024.07.2334
2024-08-13
Abstract:
Purpose: Conventional normal tissue complication probability (NTCP) models for patients with head and neck cancer are typically based on single-value variables, which, for radiation-induced xerostomia, are baseline xerostomia and mean salivary gland doses. This study aimed to improve the prediction of late xerostomia by using 3-dimensional information from radiation dose distributions, computed tomography imaging, organ-at-risk segmentations, and clinical variables with deep learning (DL).
Methods and materials: An international cohort of 1208 patients with head and neck cancer from 2 institutes was used to train and twice validate DL models (deep convolutional neural network, EfficientNet-v2, and ResNet) with 3-dimensional dose distribution, computed tomography scan, organ-at-risk segmentations, baseline xerostomia score, sex, and age as input. The NTCP endpoint was moderate-to-severe xerostomia 12 months postradiation therapy. The DL models' prediction performance was compared with a reference model: a recently published xerostomia NTCP model that used baseline xerostomia score and mean salivary gland doses as input. Attention maps were created to visualize the focus regions of the DL predictions. Transfer learning was conducted to improve the DL model performance on the external validation set.
Results: All DL-based NTCP models showed better performance (area under the receiver operating characteristic curve, AUC_test, 0.78-0.79) than the reference NTCP model (AUC_test, 0.74) in the independent test. Attention maps showed that the DL model focused on the major salivary glands, particularly the stem cell-rich region of the parotid glands. DL models obtained lower external validation performance (AUC_external, 0.63) than the reference model (AUC_external, 0.66). After transfer learning on a small external subset, the DL model (AUC_tl,external, 0.66) performed better than the reference model (AUC_tl,external, 0.64).
Conclusion: DL-based NTCP models performed better than the reference model when validated in data from the same institute. Improved performance in the external data set was achieved with transfer learning, demonstrating the need for multicenter training data to realize generalizable DL-based NTCP models.
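Illustrative sketch (not the authors' code): the comparison above rests on two ingredients, a single-value NTCP model and the AUC used to score it. The Python below shows one plausible shape for each, assuming a plain logistic form for the reference-style model. The abstract does not state the functional form or the coefficients, so the function `ntcp_logistic`, the coefficient dictionary, and the toy patients are all hypothetical and serve only to show how baseline xerostomia and mean gland doses could map to a probability that is then ranked against observed outcomes.

```python
import math


def ntcp_logistic(baseline_score, mean_parotid_dose_gy, mean_submandibular_dose_gy, coef):
    """Hypothetical logistic NTCP: 1 / (1 + exp(-s)), with s a linear predictor.

    The logistic form and every coefficient are assumptions for illustration;
    they are not taken from the paper or from the published reference model.
    """
    s = (
        coef["intercept"]
        + coef["baseline"] * baseline_score
        + coef["parotid"] * mean_parotid_dose_gy
        + coef["submandibular"] * mean_submandibular_dose_gy
    )
    return 1.0 / (1.0 + math.exp(-s))


def auc(labels, scores):
    """AUC as the probability that a random positive case outranks a random
    negative case, counting ties as one half (pairwise Mann-Whitney form)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up coefficients and patients, for demonstration only.
toy_coef = {"intercept": -3.0, "baseline": 0.8, "parotid": 0.04, "submandibular": 0.02}
toy_patients = [
    # (baseline score, mean parotid dose [Gy], mean submandibular dose [Gy], xerostomia at 12 months)
    (0, 10.0, 20.0, 0),
    (1, 25.0, 40.0, 0),
    (0, 35.0, 55.0, 1),
    (2, 30.0, 60.0, 1),
]
toy_scores = [ntcp_logistic(b, p, s, toy_coef) for b, p, s, _ in toy_patients]
toy_labels = [y for *_, y in toy_patients]
toy_auc = auc(toy_labels, toy_scores)
```

A DL-based model would replace `ntcp_logistic` with a network that takes the 3-dimensional dose, CT, and segmentation volumes as input; the same `auc` function could then score both models on an identical test set, which is the kind of comparison the reported AUC values summarize.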