Predicting progression-free survival in patients with epithelial ovarian cancer using an interpretable random forest model

Lian Jian, Xiaoyan Chen, Pingsheng Hu, Handong Li, Chao Fang, Jing Wang, Nayiyuan Wu, Xiaoping Yu
DOI: https://doi.org/10.1016/j.heliyon.2024.e35344
IF: 3.776
2024-07-26
Heliyon
Abstract: Prognostic models play a crucial role in providing personalised risk assessment, guiding treatment decisions, and facilitating the counselling of patients with cancer. However, previous imaging-based artificial intelligence models of epithelial ovarian cancer lacked interpretability. In this study, we aimed to develop an interpretable machine-learning model to predict progression-free survival in patients with epithelial ovarian cancer using clinical variables and radiomic features. A total of 102 patients with epithelial ovarian cancer who underwent contrast-enhanced computed tomography scans were enrolled in this retrospective study. Pre-surgery clinical data, including age, performance status, body mass index, tumour stage, venous blood cancer antigen-125 (CA-125) level, white blood cell count, neutrophil count, red blood cell count, haemoglobin level, and platelet count, were obtained from medical records. The volume of interest for each tumour was manually delineated slice-by-slice along the boundary. A total of 2074 radiomic features were extracted from the pre- and post-contrast computed tomography images. Optimal radiomic features were selected using Least Absolute Shrinkage and Selection Operator (LASSO) logistic regression. Multivariate Cox analysis was performed to identify independent predictors of three-year progression-free survival. Radiomic and combined models were developed with the random forest algorithm using four-fold cross-validation. Finally, the Shapley additive explanation algorithm was applied to interpret the predictions of the combined model. Multivariate Cox analysis identified CA-125 level (P = 0.015), tumour stage (P = 0.019), and Radscore (P < 0.001) as independent predictors of progression-free survival. The combined model based on these factors achieved an area under the curve of 0.812 (95% confidence interval: 0.802-0.822) in the training cohort and 0.772 (95% confidence interval: 0.727-0.817) in the validation cohort. The features with the greatest impact on the model output were Radscore, followed by tumour stage and CA-125. In conclusion, the Shapley additive explanation-based interpretation of the prognostic model enables clinicians to better understand the reasoning behind its predictions.
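To make the Shapley additive explanation step concrete, the sketch below computes exact Shapley values for a three-feature toy model by enumerating every feature coalition. It is an illustration only, not the authors' implementation. The `toy_risk` function, its coefficients, the feature scaling, and the all-zero baseline are hypothetical assumptions, and the resulting ranking reflects those invented coefficients, not the study data. Practical SHAP tooling for tree ensembles averages over a background dataset instead of using a single baseline, which this sketch deliberately simplifies.

```python
from itertools import combinations
from math import factorial

# Feature names mirror the three predictors named in the abstract.
FEATURES = ["radscore", "stage", "ca125"]


def toy_risk(x):
    """Hypothetical risk score; NOT the paper's random forest model."""
    # Coefficients are invented for illustration only.
    score = 0.5 * x["radscore"] + 0.3 * x["stage"] + 0.2 * x["ca125"]
    # Interaction term, so that contributions are not trivially additive.
    score += 0.1 * x["radscore"] * x["stage"]
    return score


def shapley_values(model, x, baseline):
    """Exact Shapley values relative to a single baseline input."""
    n = len(FEATURES)

    def value(subset):
        # Features in the coalition take the patient's value;
        # all others are held at the baseline value.
        mixed = {f: (x[f] if f in subset else baseline[f]) for f in FEATURES}
        return model(mixed)

    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(len(others) + 1):
            # Shapley weight for coalitions of size k that exclude f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                coalition = set(subset)
                total += weight * (value(coalition | {f}) - value(coalition))
        phi[f] = total
    return phi


# Hypothetical, already-normalised inputs for one patient.
patient = {"radscore": 1.0, "stage": 1.0, "ca125": 1.0}
baseline = {"radscore": 0.0, "stage": 0.0, "ca125": 0.0}

phi = shapley_values(toy_risk, patient, baseline)
for name, contribution in sorted(phi.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {contribution:.3f}")
```

The per-feature contributions sum to the difference between the model output for the patient and for the baseline. This additivity property is what allows a single prediction to be decomposed into feature-level explanations.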