Cross-Modality Synthetic Data Augmentation using GANs: Enhancing Brain MRI and Chest X-ray Classification

Kunaal Dhawan, Siddharth S. Nijhawan
DOI: https://doi.org/10.1101/2024.06.09.24308649
2024-06-10
Abstract: Brain MRI scans and chest X-ray imaging are pivotal in diagnosing and managing neurological and respiratory diseases, respectively. Despite their importance in diagnosis, datasets for training artificial intelligence (AI) models for automated diagnosis remain scarce. For example, annotated chest X-ray datasets, especially those containing rare or abnormal cases such as bacterial pneumonia, are limited. Conventional dataset collection methods are labor-intensive and costly, exacerbating the data scarcity issue. To overcome these challenges, we propose a specialized Generative Adversarial Network (GAN) architecture for generating synthetic chest X-ray data representing healthy lungs and various pneumonia conditions, including viral and bacterial pneumonia. Additionally, we extended our experiments to brain MRI scans by simply swapping the training dataset, demonstrating the power of our GAN approach across different medical imaging contexts. Our method aims to streamline data collection and labeling processes while addressing privacy concerns associated with patient data. We demonstrate the effectiveness of synthetic data in facilitating the development and evaluation of machine learning algorithms, particularly leveraging an EfficientNet v2 model. Through comprehensive experimentation, we evaluate our approach on both real and synthetic datasets, showcasing the potential of synthetic data augmentation in improving disease classification accuracy across diverse pathological conditions. Indeed, on the brain MRI classification task, the classifier trained on combined synthetic and real data achieves the highest accuracy, 85.9%. Our findings underscore the promising role of synthetic data in advancing automated diagnosis and treatment planning for pneumonia, other respiratory conditions, and brain pathologies.