Interpretable machine learning model to predict survival days of malignant brain tumor patients

Snehal Rajput, Mohendra Roy, Rupal Kapdi, Mehul S. Raval
DOI: https://doi.org/10.1088/2632-2153/acd5a9
2023-05-16
Machine Learning: Science and Technology
Abstract: An artificial intelligence (AI) model's performance is strongly influenced by its input features, so finding the optimal feature set is vital. This is especially crucial for survival prediction in glioblastoma multiforme (GBM), a type of brain tumor. In this study, we identify the best feature set for predicting the survival days (SD) of GBM patients, one that outperforms the state-of-the-art methodologies currently in use. The proposed approach is an end-to-end AI model. The model first segments tumors from healthy brain tissue in patients' MRI images, extracts features from the segmentation results, performs feature selection, and predicts patients' survival days from the selected features. The extracted features are primarily shape-based, location-based, and radiomics-based. Patient metadata is also included as a feature. The feature selection methods include recursive feature elimination (RFE), permutation importance (PI), and correlation analysis between features. We examine feature behavior at the local (single sample) and global (all samples) levels. We find that, out of 1265 extracted features, only 29 dominant features play a crucial role in predicting SD. Furthermore, we explain these features using post-hoc interpretability methods to validate the robustness of the model's predictions. Finally, we analyze the behavioral impact of the top six features on survival prediction; the findings drawn from the explanations are coherent with medical facts. We find that after the age of 50 years the likelihood of survival deteriorates, and survival after 80 years is rare. For location-based features, SD is lower if the tumor is located in the central or back part of the brain. The results show an overall 33% improvement in the accuracy of SD prediction compared to the top-performing methods of the BraTS-2020 challenge.
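Two of the feature selection ideas named in the abstract, correlation analysis between features and permutation importance, can be illustrated with a minimal sketch. This is not the authors' implementation: the feature names, the synthetic data, the 0.95 correlation threshold, and the hand-written `toy_model` are all hypothetical stand-ins chosen for illustration, and recursive feature elimination is omitted for brevity.

```python
import math
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.95):
    """Keep a feature only if |r| with every already-kept feature is <= threshold."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def permutation_importance(predict, columns, target, n_repeats=20, seed=0):
    """Mean increase in mean absolute error when one feature column is shuffled."""
    rng = random.Random(seed)

    def mae(cols):
        rows = zip(*(cols[name] for name in cols))
        preds = [predict(dict(zip(cols, row))) for row in rows]
        return mean(abs(p - t) for p, t in zip(preds, target))

    baseline = mae(columns)
    scores = {}
    for name in columns:
        increases = []
        for _ in range(n_repeats):
            shuffled = columns[name][:]
            rng.shuffle(shuffled)
            increases.append(mae({**columns, name: shuffled}) - baseline)
        scores[name] = mean(increases)
    return scores


# Hypothetical synthetic data: survival days depend on age only.
data_rng = random.Random(42)
age = [data_rng.uniform(30, 85) for _ in range(60)]
volume = [data_rng.uniform(5, 120) for _ in range(60)]
features = {
    "age": age,
    "volume_cm3": volume,
    "volume_mm3": [v * 1000 for v in volume],  # redundant rescaled copy
    "noise": [data_rng.gauss(0, 1) for _ in range(60)],
}
survival_days = [900 - 8 * a + data_rng.gauss(0, 30) for a in age]


def toy_model(row):
    """Hypothetical stand-in for a trained regressor; it uses age only."""
    return 900 - 8 * row["age"]


kept = drop_correlated(features)
scores = permutation_importance(toy_model, {k: features[k] for k in kept}, survival_days)
print(kept)
print(scores)
```

In this toy setup the rescaled volume column is dropped as redundant, and only the age feature receives a non-zero importance, because the stand-in model ignores everything else. With a real trained regressor, the importance scores would reflect whatever features that model actually relies on.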