Predictive machine learning models based on VASARI features for WHO grading, isocitrate dehydrogenase mutation, and 1p19q co-deletion status: a multicenter study

Wei Zhao, Chao Xie, Kukun Hanjiaerbieke, Rui Xu, Tuxunjiang Pahati, Shaoyu Wang, Junjie Li, Yunling Wang
DOI: https://doi.org/10.62347/MZLF2460
2024-08-25
Abstract: The objective of our study was to develop predictive models using Visually Accessible Rembrandt Images (VASARI) magnetic resonance imaging (MRI) features combined with machine learning techniques to predict the World Health Organization (WHO) grade, isocitrate dehydrogenase (IDH) mutation status, and 1p19q co-deletion status of high-grade gliomas. To achieve this, we retrospectively included 485 patients with high-grade glioma from the First Affiliated Hospital of Xinjiang Medical University, of which 312 patients were randomly divided into a training set (n=218) and a test set (n=94) in a 7:3 ratio. Twenty-five VASARI MRI features were selected from an initial set of 30, and three machine learning models - Multilayer Perceptron (MP), Bernoulli Naive Bayes (BNB), and Logistic Regression (LR) - were trained using the training set. The most informative features were identified using recursive feature elimination. Model performance was assessed using the test set and an independent validation set of 173 patients from Beijing Tiantan Hospital. The results indicated that the MP model exhibited the highest predictive accuracy on the training set, achieving an area under the curve (AUC) close to 1, indicating near-perfect discrimination. However, its performance decreased in the test and validation sets; for predicting the 1p19q co-deletion status in particular, the AUC was only 0.703, suggesting potential overfitting. In contrast, the BNB model demonstrated robust generalization on the test and validation sets, with AUC values of 0.8292 and 0.8106, respectively, for predicting IDH mutation status and 1p19q co-deletion status, indicating high accuracy, sensitivity, and specificity. The LR model also showed good performance with AUCs of 0.7845 and 0.8674 on the test and validation sets, respectively, for predicting IDH mutation status, although it was slightly inferior to the BNB model for the 1p19q co-deletion status.
In conclusion, integrating VASARI MRI features with machine learning techniques shows promise for the non-invasive prediction of glioma molecular markers, which could guide treatment strategies and improve prognosis in glioma patients. Nonetheless, further model optimization and validation are necessary to enhance its clinical utility.
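As a rough illustration of two quantities the abstract relies on, the sketch below shows how a 7:3 random split of 312 patients yields the reported 218/94 partition, and how an AUC can be computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The abstract does not describe the authors' implementation, so the function names `split_7_3` and `auc`, the random seed, and the toy labels and scores are all assumptions made purely for illustration; none of the numbers in the toy example are study data.

```python
import random


def split_7_3(n_patients, seed=0):
    """Shuffle patient indices and split them roughly 7:3 (hypothetical helper)."""
    indices = list(range(n_patients))
    random.Random(seed).shuffle(indices)  # the seed is an arbitrary assumption
    n_train = int(n_patients * 0.7)       # 312 patients -> 218 for training
    return indices[:n_train], indices[n_train:]


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    correct = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return correct / (len(pos) * len(neg))


train_idx, test_idx = split_7_3(312)
print(len(train_idx), len(test_idx))

# Toy labels and scores, invented for illustration only (not study data).
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.2]
print(auc(toy_labels, toy_scores))
```

Under this definition, an AUC close to 1 on the training set means almost every positive case outscored almost every negative case there, which is why a markedly lower value on the test and validation sets is read as a sign of overfitting.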