Risk factors affecting patients survival with colorectal cancer in Morocco: survival analysis using an interpretable machine learning approach

Imad El Badisy,Zineb Ben Brahim,Mohamed Khalis,Soukaina Elansari,Youssef ElHitmi,Fouad Abbass,Nawfal Mellas,Karima EL Rhazi
DOI: https://doi.org/10.1038/s41598-024-51304-3
IF: 4.6
2024-02-14
Scientific Reports
Abstract: The aim of our study was to assess the overall survival rates for colorectal cancer at 3 years and to identify associated strong prognostic factors among patients in Morocco through an interpretable machine learning approach. This approach is based on a fully non-parametric survival random forest (RSF), incorporating variable importance and partial dependence effects. The data were provided by a retrospective study of 343 patients diagnosed and followed at Hassan II University Hospital. Covariate selection was performed using permutation-based variable importance, and partial dependence plots were displayed to explore in depth the relationship between the estimated partial effect of a given predictor and survival rates. The predictive performance was measured by two metrics, the Concordance Index (C-index) and the Brier Score (BS). Overall survival rates at 1, 2 and 3 years were, respectively, 87% (SE = 0.02; CI-95% 0.84–0.91), 77% (SE = 0.02; CI-95% 0.73–0.82) and 60% (SE = 0.03; CI-95% 0.54–0.66). In the Cox model, after adjustment for all covariates, sex and tumor differentiation had no significant effect on prognosis, whereas tumor site had a significant effect. The variable importance obtained from RSF confirms that surgery, stage, insurance, residency, and age were the most important prognostic factors. The discriminative capacity of the Cox PH and RSF was, respectively, 0.771 and 0.798 for the C-index, while the accuracy of the Cox PH and RSF was, respectively, 0.257 and 0.207 for the BS. This shows that RSF had both better discriminative capacity and predictive accuracy. Our results show that patients who are older than 70, living in rural areas, without health insurance, at a distant stage and who have not had surgery constitute a subgroup of patients with poor prognosis.
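
To make the two performance metrics concrete, here is a minimal sketch of how a C-index and a Brier score can be computed. It is not the authors' code: the toy data, the names `concordance_index` and `brier_score_at`, and the 24-month horizon are all assumptions made for illustration. The Brier score below also skips subjects censored before the horizon rather than reweighting them (the usual inverse-probability-of-censoring correction), so it is a simplification.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: share of comparable pairs ranked correctly.

    A pair (i, j) is comparable when subject i has the shorter time and
    that time is an observed death. Tied risk scores count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable


def brier_score_at(horizon, times, events, surv_probs):
    """Mean squared error of predicted survival at one time horizon.

    Simplification (assumption): subjects censored before the horizon
    are dropped instead of being handled with censoring weights.
    """
    total, used = 0.0, 0
    for t, e, s in zip(times, events, surv_probs):
        if t > horizon:
            alive = 1.0   # known to be alive at the horizon
        elif e == 1:
            alive = 0.0   # died at or before the horizon
        else:
            continue      # censored early: status at horizon unknown
        total += (alive - s) ** 2
        used += 1
    return total / used


# Hypothetical toy data: follow-up in months, 1 = death, 0 = censored.
toy_times = [5, 12, 20, 30, 36, 40]
toy_events = [1, 1, 0, 1, 0, 0]
toy_risk = [0.9, 0.7, 0.4, 0.15, 0.2, 0.1]     # higher = worse prognosis
toy_surv_24 = [0.2, 0.4, 0.7, 0.6, 0.8, 0.9]   # predicted P(alive at 24 months)

print(concordance_index(toy_times, toy_events, toy_risk))
print(brier_score_at(24, toy_times, toy_events, toy_surv_24))
```

A higher C-index means better ranking of patients by risk, and a lower Brier score means predicted survival probabilities are closer to observed outcomes, which is the sense in which the abstract compares 0.798 against 0.771 and 0.207 against 0.257.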
multidisciplinary sciences
What problem does this paper attempt to address?
This paper evaluates the overall survival of Moroccan colorectal cancer (CRC) patients over three years and identifies strong prognostic factors for survival using an interpretable machine-learning method. Specifically, the research aims to:

1. **Evaluate the overall survival rate**: Determine the overall survival rate of Moroccan CRC patients at 1 year, 2 years, and 3 years.
2. **Identify prognostic factors**: Identify strong prognostic factors affecting the survival rate of CRC patients through an interpretable machine-learning method (based on a fully non-parametric survival random forest, combined with variable importance and partial dependence effects).

### Research background

- **Global impact of CRC**: CRC is the third most common cancer in the world. In 2020, there were more than 1.93 million new cases globally, among which there were 4,558 cases in Morocco, accounting for 7.7% of all new cancer cases in the country.
- **Importance of prognostic factors**: Many factors significantly affect the prognosis of CRC patients, including patient characteristics, treatment methods, and various aspects of the healthcare system. Studying these factors is crucial for formulating care strategies adapted to local conditions.
- **Economic burden**: CRC imposes a huge economic burden on patients and society, including direct medical costs and indirect costs (such as loss of productivity).

### Research methods

- **Data sources**: The research data were obtained from 343 CRC patients at Hassan II University Hospital, with a time range from January 2009 to January 2015.
- **Statistical analysis**:
  - Use the Kaplan-Meier estimator to calculate the overall survival rate at 1 year, 2 years, and 3 years and its 95% confidence interval.
  - Use the Cox proportional hazards model and survival random forest (RSF) to identify prognostic factors affecting survival rate.
  - Further explore the influence of key prognostic factors through variable importance and partial dependence plots (PDP).
  - Evaluate the predictive performance of the model, measured by the concordance index (C-index) and Brier score (BS).

### Main findings

- **Overall survival rate**:
  - The 1-year survival rate is 87% (SE = 0.02; 95% CI 0.84–0.91).
  - The 2-year survival rate is 77% (SE = 0.02; 95% CI 0.73–0.82).
  - The 3-year survival rate is 60% (SE = 0.03; 95% CI 0.54–0.66).
- **Prognostic factors**:
  - Surgery, staging, insurance status, place of residence, and age are the most important prognostic factors.
  - The risk of death for patients who did not have surgery is 3.21 times that of patients who had surgery (HR 3.21; CI 1.83–5.63; p < 0.001).
  - The risk of death for patients with distant staging is 6.64 times that of patients with local staging (HR 6.64; CI 2.80–15.72; p < 0.001).
  - The risk of death for patients without health insurance is 2.85 times that of patients with insurance (HR 2.85; CI 1.63–4.98; p < 0.001).
  - The risk of death for patients living in rural areas is 1.88 times that of patients in urban areas (HR 1.88; CI 1.18–2.98; p < 0.001).
  - The risk of death for patients with tumors located in the rectum is 1.86 times that of patients with tumors located in the colon (HR 1.86; CI 1.21–2.88; p = 0.005).

### Model performance

- **Cox proportional hazards model**:
  - The concordance index (C-index) is 0.771.
  - The Brier score is 0.257.
- **Survival random forest (RSF)**:
  - The concordance index (C-index) is 0.798.
  - The Brier score is 0.207.

### Conclusions

- **Superiority of RSF**: Compared with the Cox proportional hazards model, RSF performs better in terms of discrimination ability and prediction accuracy.
- **High-risk subgroups**: Patients over 70 years old, living in rural areas, without health insurance, with distant staging, and without surgery belong to subgroups with a poor prognosis.

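
The Kaplan-Meier step named in the research methods can be sketched in a few lines. This is an illustrative implementation on made-up follow-up data, not the study's analysis: the function names `kaplan_meier` and `survival_at` and the toy arrays are assumptions. The standard error uses Greenwood's formula, which is a common choice but is not confirmed by the source.

```python
import math


def kaplan_meier(times, events):
    """Return [(t, S(t), se)] at each distinct death time.

    S(t) is the running product of (1 - d_i / n_i), where d_i is the
    number of deaths at time t_i and n_i the number still at risk just
    before t_i. The standard error follows Greenwood's formula.
    """
    curve = []
    surv = 1.0
    greenwood_sum = 0.0
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in death_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        surv *= 1.0 - deaths / at_risk
        if at_risk > deaths:
            greenwood_sum += deaths / (at_risk * (at_risk - deaths))
        curve.append((t, surv, surv * math.sqrt(greenwood_sum)))
    return curve


def survival_at(curve, t):
    """Step-function lookup: S at the last death time <= t, else 1.0."""
    value = 1.0
    for time, surv, _ in curve:
        if time <= t:
            value = surv
    return value


# Hypothetical toy data: follow-up in months, 1 = death, 0 = censored.
toy_times = [3, 6, 6, 9, 12, 15, 18, 24, 30, 36]
toy_events = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]

curve = kaplan_meier(toy_times, toy_events)
for horizon in (12, 24, 36):
    print(horizon, round(survival_at(curve, horizon), 3))
```

Censored patients leave the risk set without lowering the curve, which is why the estimator can report 1-, 2- and 3-year survival even when not every patient was followed for the full three years.
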
Through these findings, the research provides practical insights for CRC patients in Morocco and helps to guide specific...