Constructing Novel Prognostic Biomarkers of Advanced Nasopharyngeal Carcinoma from Multiparametric MRI Radiomics Using Ensemble-Model Based Iterative Feature Selection

Ting-ting Yu, Sai-kit Lam, Lok-hang To, Ka-yan Tse, Nong-yi Cheng, Yeuk-nam Fan, Cheuk-lai Lo, Ka-wa Or, Man-lok Chan, Ka-ching Hui, Fong-chi Chan, Wai-ming Hui, Lo-kin Ngai, Francis Kar-ho Lee, Kwok-hung Au, Celia Wai-yi Yip, Yong Zhang, Jing Cai
DOI: https://doi.org/10.1109/icmipe47306.2019.9098211
2019-01-01
Abstract: Although different treatment strategies have been developed for nasopharyngeal carcinoma (NPC), recurrence and distant metastasis remain major challenges in advanced NPC. This study aims to identify pre-treatment radiomics models that predict progression-free survival (PFS) using pre-treatment T2-weighted short tau inversion recovery (STIR) magnetic resonance (MR) images and contrast-enhanced T1-weighted (CET1-w) MR images separately. To address the problem of imbalanced and small datasets in model training, we developed a novel method, ensemble-model based iterative feature selection, to determine the predictive feature sets. The least absolute shrinkage and selection operator (LASSO) was used in both feature selection and model construction. A model ensemble was constructed from subsets of patients during feature selection and model construction. In model construction, selected models built from the predictive feature sets were then internally validated using 1000 bootstrap resamples of the whole patient cohort. The corrected AUC of the joint CET1-w and T2-w model was the highest, and the corrected AUC of the T2-w model was the lowest. Rad-scores were calculated for each patient as a linear combination of the selected features and were evaluated by stratified Kaplan-Meier analysis and Cox proportional hazards regression. Significant differences (p < 0.001) were observed between the survival curves of high-risk and low-risk patients stratified by Rad-score. Our results demonstrate the capability of the ensemble-model based iterative feature selection method to handle imbalanced and small datasets when building MRI-based biomarkers that stratify patients into high-risk and low-risk groups.
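The Rad-score step described in the abstract (a linear combination of selected features, followed by risk stratification) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy feature values and coefficients, and the use of the cohort median as the high-risk/low-risk cutoff are all assumptions made for illustration, since the abstract does not state how the cutoff was chosen.

```python
import numpy as np


def rad_scores(features, coefficients, intercept=0.0):
    """Return one Rad-score per patient: intercept + weighted sum of features.

    features:     (n_patients, n_selected_features) array
    coefficients: (n_selected_features,) array, e.g. non-zero LASSO weights
    """
    features = np.asarray(features, dtype=float)
    coefficients = np.asarray(coefficients, dtype=float)
    return intercept + features @ coefficients


def stratify(scores, cutoff=None):
    """Return True for patients whose score exceeds the cutoff (high risk).

    The median default is an assumption of this sketch, not taken from the paper.
    """
    scores = np.asarray(scores, dtype=float)
    if cutoff is None:
        cutoff = np.median(scores)
    return scores > cutoff


# Hypothetical toy data: 4 patients, 3 selected radiomic features.
X = np.array([
    [0.2, 1.0, -0.5],
    [1.5, 0.3, 0.8],
    [-0.7, -1.2, 0.1],
    [0.9, 0.4, 1.1],
])
w = np.array([0.6, -0.2, 0.4])  # hypothetical model coefficients

scores = rad_scores(X, w)
high_risk = stratify(scores)
```

The resulting boolean groups would then be passed, together with follow-up times and event indicators, to a Kaplan-Meier or Cox analysis in a survival library of choice.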