Deep Learning Based Time-to-event Analysis with PET, CT and Joint PET/CT for Head and Neck Cancer Prognosis
Yiling Wang, Elia Lombardo, Michele Avanzo, Sebastian Zschaek, Julian Weingaertner, Adrien Holzgreve, Nathalie L. Albert, Sebastian Marschner, Giuseppe Fanetti, Giovanni Franchin, Joseph Stancanello, Franziska Walter, Stefanie Corradini, Maximilian Niyazi, Jinyi Lang, Claus Belka, Marco Riboldi, Christopher Kurz, Guillaume Landry
DOI: https://doi.org/10.1016/j.cmpb.2022.106948
IF: 6.1
2022-01-01
Computer Methods and Programs in Biomedicine
Abstract: Objectives: Recent studies have shown that deep learning based on pre-treatment positron emission tomography (PET) or computed tomography (CT) is promising for distant metastasis (DM) and overall survival (OS) prognosis in head and neck cancer (HNC). However, lesion segmentation is typically required, which makes the predictive power susceptible to variations in primary and lymph node gross tumor volume (GTV) segmentation. This study aimed to achieve prognosis without GTV segmentation and to extend single-modality prognosis to joint PET/CT, so that the predictive performance of combined-modality and single-modality inputs could be compared. Methods: We employed a 3D-ResNet combined with a time-to-event outcome model to incorporate censoring information. We focused on the prognosis of DM and OS for HNC patients. For each clinical endpoint, five models with PET and/or CT images as input were compared: PET-GTV, PET-only, CT-GTV, CT-only, and PET/CT-GTV models, where -GTV indicates that the corresponding images were masked using the GTV contour. Publicly available delineated CT and PET scans from 4 different Canadian hospitals (293 patients) and the MAASTRO clinic (74 patients) were used for training with 3-fold cross-validation (CV). For independent testing, we used 110 patients from a collaborating institution. The predictive performance was evaluated via Harrell's Concordance Index (HCI) and Kaplan-Meier curves. Results: In a 5-year time-to-event analysis, all models produced CV HCIs with median values around 0.8 for DM and 0.7 for OS. The best performance was obtained with the PET-only model, which achieved a median testing HCI of 0.82 for DM and 0.69 for OS. Compared with the PET/CT-GTV model, the PET-only model still had an advantage of up to 0.07 in testing HCI. The Kaplan-Meier curves and corresponding log-rank test results also demonstrated significant stratification capability of our models for the testing cohort.
Conclusion: Deep learning-based DM and OS time-to-event models showed predictive capability and could provide indications for personalized radiotherapy (RT). The best predictive performance, achieved by the PET-only model, suggests that GTV segmentation might be less relevant for PET-based prognosis. (C) 2022 Elsevier B.V. All rights reserved.
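The abstract reports performance as Harrell's Concordance Index (HCI). As a minimal sketch of how such an index can be computed for right-censored data, the following pure-Python function counts concordant pairs. It is not the authors' implementation: the function name `harrell_c_index`, its arguments, and the toy data are hypothetical, and it assumes that a higher model output means a higher predicted risk (an earlier expected event). Pairs with tied times are simply skipped, which is a simplification.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's concordance index for right-censored data (illustrative sketch).

    times       : follow-up time per patient (event or censoring time)
    events      : 1 if the event (e.g. DM or death) was observed, 0 if censored
    risk_scores : model output, assumed higher = higher risk (an assumption)
    """
    if not (len(times) == len(events) == len(risk_scores)):
        raise ValueError("all inputs must have the same length")

    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        # A pair is only comparable if the patient with the shorter
        # follow-up time actually experienced the event.
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # earlier event, higher risk: concordant
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs; index is undefined")
    return concordant / comparable


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy_times = [2.0, 4.0, 6.0, 8.0]
    toy_events = [1, 1, 0, 1]
    toy_risk = [0.9, 0.1, 0.4, 0.7]
    print(harrell_c_index(toy_times, toy_events, toy_risk))
```

An index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the reported median values of about 0.8 for DM and 0.7 for OS.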