Automated machine learning to predict the co-occurrence of isocitrate dehydrogenase mutations and O6-methylguanine-DNA methyltransferase promoter methylation in patients with gliomas
Simin Zhang, Huaiqiang Sun, Xiaorui Su, Xibiao Yang, Weina Wang, Xinyue Wan, Qiaoyue Tan, Ni Chen, Qiang Yue, Qiyong Gong
DOI: https://doi.org/10.1002/jmri.27498
Abstract: The combination of isocitrate dehydrogenase mutation (IDHmut) and O6-methylguanine-DNA methyltransferase promoter methylation (MGMTmet) has been identified as a critical prognostic molecular marker for gliomas. The aim of this study was to determine whether glioma radiomics features from magnetic resonance imaging (MRI) can predict the co-occurrence of IDHmut and MGMTmet by applying the tree-based pipeline optimization tool (TPOT), an automated machine learning (autoML) approach. This retrospective study evaluated 162 patients with gliomas, including 58 patients with co-occurrence of IDHmut and MGMTmet and 104 patients with other status, comprising IDH wildtype and MGMT unmethylated (n = 67), IDH wildtype and MGMTmet (n = 36), and IDHmut and MGMT unmethylated (n = 1). Three-dimensional (3D) T1-weighted images, gadolinium-enhanced 3D T1-weighted images (Gd-3DT1WI), T2-weighted images, and fluid-attenuated inversion recovery (FLAIR) images acquired at 3.0 T were used. Radiomics features were extracted from the FLAIR and Gd-3DT1WI images. TPOT was employed to generate the best machine learning pipeline, containing both a feature selector and a classifier, for each input feature set. Four-fold cross-validation was used to evaluate the performance of the automatically generated models. For each iteration, the training set included 121 subjects and the test set included 41 subjects. Student's t-test or a chi-square test was used to compare clinical characteristics between the two groups. Sensitivity, specificity, accuracy, kappa score, and area under the receiver operating characteristic curve (AUC) were used to evaluate the performance of the TPOT-generated models, and these metrics were compared across models to identify the best-performing one. Patient age and tumor grade differed significantly between the two groups (p = 0.002 and p < 0.001, respectively).
The 4-fold cross-validation showed that a gradient boosting classifier trained on shape and texture features from the Laplacian-of-Gaussian-filtered Gd-3DT1WI achieved the best performance (average sensitivity = 81.1%, average specificity = 94%, average accuracy = 89.4%, average kappa score = 0.76, average AUC = 0.951). Using autoML based on radiomics features from MRI, a high discriminatory accuracy was achieved for predicting co-occurrence of IDHmut and MGMTmet in gliomas. LEVEL OF EVIDENCE: 3 TECHNICAL EFFICACY STAGE: 3.
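As a rough illustration of how the evaluation metrics named above relate to one another, the following Python sketch computes sensitivity, specificity, accuracy, and Cohen's kappa from the confusion counts of a single test fold, and a pairwise (Mann-Whitney) estimate of the AUC from classifier scores. It does not use TPOT and is not the authors' code. The function names are hypothetical, and the counts and scores in the usage note are invented for illustration; they are not data from the study.

```python
def classification_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, accuracy and Cohen's kappa for a binary task.

    tp/fn count the positive class (here it would be IDHmut with MGMTmet),
    fp/tn count the negative class (all other molecular statuses).
    """
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    accuracy = (tp + tn) / n       # observed agreement
    # Agreement expected by chance, from the row and column marginals.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "kappa": kappa,
    }


def auc_from_scores(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def mean_over_folds(fold_metrics, key):
    """Average one metric over a list of per-fold metric dictionaries."""
    return sum(m[key] for m in fold_metrics) / len(fold_metrics)
```

For a hypothetical 41-subject test fold, `classification_metrics(tp=12, fn=3, fp=2, tn=24)` returns the four metrics for that fold, and `mean_over_folds` can then average any one of them across the four folds, which is how an "average" figure of the kind reported above would typically be formed.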