SIENNA: Lightweight Generalizable Machine Learning Platform for Brain Tumor Diagnostics

Sreya Sunil, Rahul S. Rajeev, Ayan Chatterjee, Julie Pilitsis, Amitava Mukherjee, Janet L. Paluh
DOI: https://doi.org/10.1101/2024.04.03.24305210
2024-04-04
Abstract: The transformative integration of Machine Learning (ML) for Artificial General Intelligence (AGI)-enhanced clinical imaging diagnostics is itself in development. In brain tumor pathologies, magnetic resonance imaging (MRI) is a critical step that impacts the decision for invasive surgery, yet expert MRI tumor typing is inconsistent and misdiagnosis can reach levels as high as 85%. Current state-of-the-art (SOTA) ML brain tumor models struggle with data overfitting and susceptibility to shortcut learning, further exacerbated in large-sized models with many tunable parameters. In a comparison with multiple SOTA models, our deep ML brain tumor diagnostics model, SIENNA, surpassed limitations in four key areas: prioritized minimal data preprocessing, an optimized architecture that reduces shortcut learning and overfitting, an integrated inductive cross-validation method for generalizability, and a smaller neural architecture. SIENNA is applicable across MRI machines and field strengths of 1.5 and 3.0 Tesla, and achieves high average accuracies on clinical DICOM MRI data across three-way classification: 92% (non-tumor), 91% (GBM), and 93% (MET), with retained high F1 and AUROC values for limited false positives/negatives. SIENNA is a lightweight clinical-ready AGI framework compatible with future multimodal expanded data integration.
Radiology and Imaging
What problem does this paper attempt to address?
The paper addresses several problems in brain tumor diagnosis from Magnetic Resonance Imaging (MRI). The research team developed SIENNA, a lightweight, generalizable machine learning platform, to improve the accuracy of brain tumor diagnosis. The main issues raised in the paper are:
1. **Consistency issues in MRI diagnosis**: Expert MRI-based tumor typing is inconsistent, and misdiagnosis can reach levels as high as 85%.
2. **Problems with existing machine learning models**: Current state-of-the-art (SOTA) machine learning models suffer from overfitting and shortcut learning on brain tumor data. Large models with many tunable parameters are particularly prone to these problems.
3. **Challenges in data preprocessing**: Heavy preprocessing of public datasets (such as the BraTS dataset) limits the generalization ability of models trained on them, making it difficult for those models to adapt to real-world clinical data.
4. **Insufficient model generalization**: Existing models perform poorly on new patients or unseen data. Under traditional cross-validation, images from the same patient can appear in both the training and the evaluation data, which does not match clinical reality, where each new patient is unseen.
To address these issues, the research team proposed the SIENNA platform, which has the following features:
- **Minimal preprocessing**: SIENNA applies minimal data preprocessing to retain the key features of the original MRI DICOM data.
- **Optimized architecture design**: The architecture is designed to reduce the risk of shortcut learning and overfitting, so that it generalizes better to new data.
- **Inductive cross-validation method**: A cross-validation strategy is used that evaluates the model's performance on data from new patients.
- **Small neural networks**: SIENNA uses a smaller neural network architecture, which helps improve the model's generalization ability.
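The patient-level separation behind the inductive cross-validation item can be illustrated with a minimal sketch. This is not SIENNA's actual procedure, which is not detailed here. It assumes each image carries a patient identifier, and the function name `patient_level_folds`, its parameters, and the toy identifiers are all hypothetical. The only property it demonstrates is that no patient contributes images to both the training and the held-out side of a fold.

```python
import random
from collections import defaultdict


def patient_level_folds(patient_ids, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs in which no patient appears on both sides.

    patient_ids[i] is the (hypothetical) patient identifier of image i.
    """
    # Group image indices by the patient they came from.
    by_patient = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        by_patient[pid].append(idx)

    patients = sorted(by_patient)
    if n_folds < 2 or n_folds > len(patients):
        raise ValueError("n_folds must be between 2 and the number of patients")

    # Shuffle patients (not images) reproducibly, then deal them into folds.
    random.Random(seed).shuffle(patients)
    folds = [patients[i::n_folds] for i in range(n_folds)]

    for held_out in folds:
        held = set(held_out)
        test_idx = [i for p in held_out for i in by_patient[p]]
        train_idx = [i for p in patients if p not in held for i in by_patient[p]]
        yield sorted(train_idx), sorted(test_idx)


# Toy example: seven images from four made-up patients.
image_patients = ["p1", "p1", "p2", "p3", "p3", "p3", "p4"]
for train_idx, test_idx in patient_level_folds(image_patients, n_folds=2):
    print("train:", train_idx, "held out:", test_idx)
```

Splitting by patient rather than by image is the design choice that matters: an image-level shuffle would let near-duplicate slices from one patient land on both sides of the split and inflate the measured accuracy.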
With these improvements, SIENNA achieved high average accuracies on clinical DICOM MRI data in a three-way classification: non-tumor (92%), glioblastoma (GBM) (91%), and metastatic tumor (MET) (93%), while retaining high F1 and AUROC values, which indicates limited false positives and false negatives. The paper also reports that SIENNA generalizes better than other SOTA models trained on highly preprocessed datasets.
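For readers unfamiliar with how per-class figures such as accuracy and F1 arise in a three-way classification, the following sketch derives them from a confusion matrix. The counts are invented for illustration and are not the paper's results. The text above does not say how the per-class accuracies were computed, so the one-vs-rest accuracy used here is an assumption, and the function name `per_class_metrics` is hypothetical.

```python
def per_class_metrics(confusion, labels):
    """Compute one-vs-rest metrics per class.

    confusion[i][j] is the number of samples whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    total = sum(sum(row) for row in confusion)
    metrics = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                # true class k, predicted other
        fp = sum(row[k] for row in confusion) - tp  # other class, predicted k
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        accuracy = (tp + tn) / total if total else 0.0
        metrics[label] = {
            "accuracy": accuracy,
            "precision": precision,
            "recall": recall,
            "f1": f1,
        }
    return metrics


# Invented counts for illustration only; rows are true classes, columns predictions.
labels = ["non-tumor", "GBM", "MET"]
toy_confusion = [
    [8, 1, 1],
    [1, 7, 2],
    [0, 2, 8],
]
for label, values in per_class_metrics(toy_confusion, labels).items():
    print(label, {name: round(v, 3) for name, v in values.items()})
```

A high F1 for a class requires both few false positives (precision) and few false negatives (recall), which is why it is reported alongside accuracy.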