Generative AI in glioma: Ensuring diversity in training image phenotypes to improve diagnostic performance for IDH mutation prediction

Hye Hyeon Moon, Jiheon Jeong, Ji Eun Park, Namkug Kim, Changyong Choi, Young-Hoon Kim, Sang Woo Song, Chang-Ki Hong, Jeong Hoon Kim, Ho Sung Kim
DOI: https://doi.org/10.1093/neuonc/noae012
2024-01-22
Neuro-Oncology
Abstract:
Background: This study evaluated whether generative artificial intelligence (AI)-based augmentation (GAA) can provide diverse and realistic imaging phenotypes and improve deep learning-based classification of isocitrate dehydrogenase (IDH) type in glioma compared with neuroradiologists.
Methods: For model development, 565 patients (346 IDH-wildtype, 219 IDH-mutant) with paired contrast-enhanced T1 and FLAIR MRI scans were collected from tertiary hospitals and The Cancer Imaging Archive. Performance was tested on an internal test set (n = 119; 78 IDH-wildtype, 41 IDH-mutant [IDH1 and 2]) and an external test set (n = 108; 72 IDH-wildtype, 36 IDH-mutant). GAA was developed using a score-based diffusion model and a ResNet50 classifier. The optimal GAA was selected in comparison with the null model. Two neuroradiologists (R1, R2) assessed the realism and diversity of imaging phenotypes and predicted IDH mutation. The performance of a classifier trained with optimal GAA was compared with that of the neuroradiologists using the area under the receiver operating characteristic curve (AUC). The effect of tumor size and contrast enhancement on GAA performance was tested.
Results: Generated images demonstrated realism (Turing test: 47.5–50.5%) and diversity indicating IDH type. Optimal GAA was achieved by augmenting with 110 000 generated slices (AUC: 0.938). The classifier trained with optimal GAA demonstrated significantly higher AUC values than the neuroradiologists in both the internal (R1, P = .003; R2, P < .001) and external test sets (R1, P < .01; R2, P < .001). GAA with large-sized tumors or predominant enhancement showed performance comparable to optimal GAA (internal test: AUC 0.956 and 0.922; external test: 0.810 and 0.749).
Conclusions: The application of generative AI with realistic and diverse images provided better diagnostic performance than neuroradiologists for predicting IDH type in glioma.
oncology, clinical neurology